Pandas Assignment– 5
Data Manipulation
Basic Questions
- Create a DataFrame with columns [‘id’,’name’,’age’]. Sort by ‘age’ in ascending order using sort_values().
- Sort the same DataFrame by ‘name’ in descending order.
- Create a DataFrame with a non-sequential index [101, 105, 103]. Sort rows by index using sort_index().
- Create a Series with index [3,1,2] and values [30,10,20]. Sort by index.
- Reindex a DataFrame with new index [0,1,2,3,4] and fill missing rows with NaN.
- Reindex a DataFrame column order from [‘A’,’B’,’C’] to [‘C’,’B’,’A’].
- Create two DataFrames and concatenate them row-wise using pd.concat([df1, df2]).
- Concatenate the same DataFrames column-wise.
- Append a new row to a DataFrame using .append() (note: show deprecation warning but use for practice).
- Create two DataFrames with a common column ‘id’. Perform an inner merge on ‘id’.
- Perform a left merge on the same DataFrames using ‘id’.
- Use .join() to join two DataFrames on their index.
- Create a DataFrame of students with ‘class’ and ‘marks’. Group by ‘class’ and compute average marks.
- Use .groupby(‘class’)[‘marks’].count() to count number of students in each class.
- Group a DataFrame by ‘dept’ and find number of unique values in ’emp_id’ using .nunique().
- Create a small DataFrame and pivot it with ‘id’ as index, ‘subject’ as columns, and ‘marks’ as values.
- Create a DataFrame with duplicate ‘subject’ rows and use pivot_table() with aggfunc=’mean’.
- Create a DataFrame with columns ‘id’,’month’,’sales’. Reshape it to long format using melt().
- Create a pivoted DataFrame and use .stack() to reshape it to long form.
- Use .unstack() on a MultiIndex DataFrame to reshape into wide form.
Intermediate Questions
- Sort a DataFrame by two columns ‘dept’ and ‘salary’ (salary descending).
- Sort a Series by values and then by index; compare outputs.
- Reindex a DataFrame with new column labels and fill missing columns with default values.
- Concatenate three DataFrames vertically with keys [‘x’,’y’,’z’] and access one block using .loc[‘x’].
- Concatenate with ignore_index=True and show difference.
- Append a Series as a new row to a DataFrame.
- Perform an outer merge between two DataFrames with different ‘id’ sets.
- Merge on multiple columns (‘id’,’dept’).
- Use join with how=’outer’ to join on index.
- Create a sales DataFrame grouped by ‘region’. Compute total and mean sales per region with .agg([‘sum’,’mean’]).
- Use .groupby([‘region’,’product’]) to compute count of records for each combination.
- Create a pivot table summarizing sales by ‘region’ (rows) and ‘product’ (columns).
- Create a pivot table with multiple aggregation functions (sum and mean).
- Use melt() to unpivot a wide DataFrame into long format, keeping ‘id’ as identifier.
- Reshape a melted DataFrame back to wide using pivot().
- Demonstrate .stack() on a DataFrame with two columns; print resulting Series with MultiIndex.
- Apply .unstack() to move inner index level to columns.
- Create a MultiIndex DataFrame and sort it by index at level 1.
- Group by ‘class’ and compute both sum and unique count of ‘marks’ in one .agg() call.
- Pivot a DataFrame with missing values and show how pivot_table() fills them using fill_value=0.
Advanced Questions
- Create a DataFrame with employees across multiple departments. Sort by department and salary (descending), then group by department to get top 2 salaries per dept.
- Demonstrate reindexing a time-series DataFrame to a new date range, filling missing with forward fill.
- Concatenate multiple DataFrames with hierarchical keys and demonstrate selection using .xs() (cross-section).
- Merge three DataFrames on a common ‘id’, each having different attributes, to create a wide combined DataFrame.
- Perform a left join between two DataFrames where one has duplicate keys; explain and show resulting duplication.
- Use groupby on a sales dataset to compute total, mean, and unique product counts per region. Present results in one DataFrame.
- Create a pivot table with hierarchical indexes (region + quarter) and hierarchical columns (product + metric), aggregating with both sum and mean.
- Reshape a MultiIndex DataFrame using .stack() followed by .unstack() to show equivalence with pivot tables.
- Create a reshaping pipeline: start with a wide student DataFrame (id, marks for multiple subjects), melt it to long, compute average per subject with groupby, and pivot back to wide.
- Build a dashboard-like summary:
- Total sales per region
- Count of unique products
- Average sales per product per region (pivot table)
- Long-format representation of all sales using melt()