Pandas Assignment– 5

Data Manipulation

Basic Questions

  1. Create a DataFrame with columns [‘id’,’name’,’age’]. Sort by ‘age’ in ascending order using sort_values().
  2. Sort the same DataFrame by ‘name’ in descending order.
  3. Create a DataFrame with a non-sequential index [101, 105, 103]. Sort rows by index using sort_index().
  4. Create a Series with index [3,1,2] and values [30,10,20]. Sort by index.
  5. Reindex a DataFrame with new index [0,1,2,3,4] and fill missing rows with NaN.
  6. Reindex a DataFrame column order from [‘A’,’B’,’C’] to [‘C’,’B’,’A’].
  7. Create two DataFrames and concatenate them row-wise using pd.concat([df1, df2]).
  8. Concatenate the same DataFrames column-wise.
  9. Append a new row to a DataFrame using .append() (note: show deprecation warning but use for practice).
  10. Create two DataFrames with a common column ‘id’. Perform an inner merge on ‘id’.
  11. Perform a left merge on the same DataFrames using ‘id’.
  12. Use .join() to join two DataFrames on their index.
  13. Create a DataFrame of students with ‘class’ and ‘marks’. Group by ‘class’ and compute average marks.
  14. Use .groupby(‘class’)[‘marks’].count() to count number of students in each class.
  15. Group a DataFrame by ‘dept’ and find number of unique values in ’emp_id’ using .nunique().
  16. Create a small DataFrame and pivot it with ‘id’ as index, ‘subject’ as columns, and ‘marks’ as values.
  17. Create a DataFrame with duplicate ‘subject’ rows and use pivot_table() with aggfunc=’mean’.
  18. Create a DataFrame with columns ‘id’,’month’,’sales’. Reshape it to long format using melt().
  19. Create a pivoted DataFrame and use .stack() to reshape it to long form.
  20. Use .unstack() on a MultiIndex DataFrame to reshape into wide form.

Intermediate Questions

  1. Sort a DataFrame by two columns ‘dept’ and ‘salary’ (salary descending).
  2. Sort a Series by values and then by index; compare outputs.
  3. Reindex a DataFrame with new column labels and fill missing columns with default values.
  4. Concatenate three DataFrames vertically with keys [‘x’,’y’,’z’] and access one block using .loc[‘x’].
  5. Concatenate with ignore_index=True and show difference.
  6. Append a Series as a new row to a DataFrame.
  7. Perform an outer merge between two DataFrames with different ‘id’ sets.
  8. Merge on multiple columns (‘id’,’dept’).
  9. Use join with how=’outer’ to join on index.
  10. Create a sales DataFrame grouped by ‘region’. Compute total and mean sales per region with .agg([‘sum’,’mean’]).
  11. Use .groupby([‘region’,’product’]) to compute count of records for each combination.
  12. Create a pivot table summarizing sales by ‘region’ (rows) and ‘product’ (columns).
  13. Create a pivot table with multiple aggregation functions (sum and mean).
  14. Use melt() to unpivot a wide DataFrame into long format, keeping ‘id’ as identifier.
  15. Reshape a melted DataFrame back to wide using pivot().
  16. Demonstrate .stack() on a DataFrame with two columns; print resulting Series with MultiIndex.
  17. Apply .unstack() to move inner index level to columns.
  18. Create a MultiIndex DataFrame and sort it by index at level 1.
  19. Group by ‘class’ and compute both sum and unique count of ‘marks’ in one .agg() call.
  20. Pivot a DataFrame with missing values and show how pivot_table() fills them using fill_value=0.

Advanced Questions

  1. Create a DataFrame with employees across multiple departments. Sort by department and salary (descending), then group by department to get top 2 salaries per dept.
  2. Demonstrate reindexing a time-series DataFrame to a new date range, filling missing with forward fill.
  3. Concatenate multiple DataFrames with hierarchical keys and demonstrate selection using .xs() (cross-section).
  4. Merge three DataFrames on a common ‘id’, each having different attributes, to create a wide combined DataFrame.
  5. Perform a left join between two DataFrames where one has duplicate keys; explain and show resulting duplication.
  6. Use groupby on a sales dataset to compute total, mean, and unique product counts per region. Present results in one DataFrame.
  7. Create a pivot table with hierarchical indexes (region + quarter) and hierarchical columns (product + metric), aggregating with both sum and mean.
  8. Reshape a MultiIndex DataFrame using .stack() followed by .unstack() to show equivalence with pivot tables.
  9. Create a reshaping pipeline: start with a wide student DataFrame (id, marks for multiple subjects), melt it to long, compute average per subject with groupby, and pivot back to wide.
  10. Build a dashboard-like summary:
    • Total sales per region
    • Count of unique products
    • Average sales per product per region (pivot table)
    • Long-format representation of all sales using melt()