Pandas Assignment– 5

Data Manipulation

Basic Questions

Create a DataFrame with columns [‘id’,’name’,’age’]. Sort by ‘age’ in ascending order using sort_values().
Sort the same DataFrame by ‘name’ in descending order.
Create a DataFrame with a non-sequential index [101, 105, 103]. Sort rows by index using sort_index().
Create a Series with index [3,1,2] and values [30,10,20]. Sort by index.
Reindex a DataFrame with new index [0,1,2,3,4] and fill missing rows with NaN.
Reindex a DataFrame column order from [‘A’,’B’,’C’] to [‘C’,’B’,’A’].
Create two DataFrames and concatenate them row-wise using pd.concat([df1, df2]).
Concatenate the same DataFrames column-wise.
Append a new row to a DataFrame using .append() (note: show deprecation warning but use for practice).
Create two DataFrames with a common column ‘id’. Perform an inner merge on ‘id’.
Perform a left merge on the same DataFrames using ‘id’.
Use .join() to join two DataFrames on their index.
Create a DataFrame of students with ‘class’ and ‘marks’. Group by ‘class’ and compute average marks.
Use .groupby(‘class’)[‘marks’].count() to count number of students in each class.
Group a DataFrame by ‘dept’ and find number of unique values in ’emp_id’ using .nunique().
Create a small DataFrame and pivot it with ‘id’ as index, ‘subject’ as columns, and ‘marks’ as values.
Create a DataFrame with duplicate ‘subject’ rows and use pivot_table() with aggfunc=’mean’.
Create a DataFrame with columns ‘id’,’month’,’sales’. Reshape it to long format using melt().
Create a pivoted DataFrame and use .stack() to reshape it to long form.
Use .unstack() on a MultiIndex DataFrame to reshape into wide form.

Intermediate Questions

Sort a DataFrame by two columns ‘dept’ and ‘salary’ (salary descending).
Sort a Series by values and then by index; compare outputs.
Reindex a DataFrame with new column labels and fill missing columns with default values.
Concatenate three DataFrames vertically with keys [‘x’,’y’,’z’] and access one block using .loc[‘x’].
Concatenate with ignore_index=True and show difference.
Append a Series as a new row to a DataFrame.
Perform an outer merge between two DataFrames with different ‘id’ sets.
Merge on multiple columns (‘id’,’dept’).
Use join with how=’outer’ to join on index.
Create a sales DataFrame grouped by ‘region’. Compute total and mean sales per region with .agg([‘sum’,’mean’]).
Use .groupby([‘region’,’product’]) to compute count of records for each combination.
Create a pivot table summarizing sales by ‘region’ (rows) and ‘product’ (columns).
Create a pivot table with multiple aggregation functions (sum and mean).
Use melt() to unpivot a wide DataFrame into long format, keeping ‘id’ as identifier.
Reshape a melted DataFrame back to wide using pivot().
Demonstrate .stack() on a DataFrame with two columns; print resulting Series with MultiIndex.
Apply .unstack() to move inner index level to columns.
Create a MultiIndex DataFrame and sort it by index at level 1.
Group by ‘class’ and compute both sum and unique count of ‘marks’ in one .agg() call.
Pivot a DataFrame with missing values and show how pivot_table() fills them using fill_value=0.

Advanced Questions

Create a DataFrame with employees across multiple departments. Sort by department and salary (descending), then group by department to get top 2 salaries per dept.
Demonstrate reindexing a time-series DataFrame to a new date range, filling missing with forward fill.
Concatenate multiple DataFrames with hierarchical keys and demonstrate selection using .xs() (cross-section).
Merge three DataFrames on a common ‘id’, each having different attributes, to create a wide combined DataFrame.
Perform a left join between two DataFrames where one has duplicate keys; explain and show resulting duplication.
Use groupby on a sales dataset to compute total, mean, and unique product counts per region. Present results in one DataFrame.
Create a pivot table with hierarchical indexes (region + quarter) and hierarchical columns (product + metric), aggregating with both sum and mean.
Reshape a MultiIndex DataFrame using .stack() followed by .unstack() to show equivalence with pivot tables.
Create a reshaping pipeline: start with a wide student DataFrame (id, marks for multiple subjects), melt it to long, compute average per subject with groupby, and pivot back to wide.
Build a dashboard-like summary:

- Total sales per region
- Count of unique products
- Average sales per product per region (pivot table)
- Long-format representation of all sales using melt()