Pandas Assignment 7

Advanced Data Analysis

Basic Questions

  1. Create a Series of 20 random integers between 1 and 100. Apply .rolling(3).mean() to compute a 3-period moving average.
  2. Using the same Series, apply .rolling(5).sum() to compute 5-period rolling sums.
  3. Create a Series of 10 numbers and apply .expanding().mean() to calculate the cumulative mean.
  4. Demonstrate .expanding().max() on a Series of random integers.
  5. Create a categorical Series from ['low','medium','high','medium','low']. Convert it to category dtype using .astype('category').
  6. Print .cat.categories of the above categorical Series.
  7. Print .cat.codes for the same categorical Series.
  8. Reorder categories to ['low','medium','high'] using .cat.reorder_categories().
  9. Build a DataFrame with a MultiIndex (two levels: city and year). Print the DataFrame.
  10. Use .stack() to move columns into index for a MultiIndex DataFrame.
  11. Apply .unstack() to reverse the stacking process.
  12. Use .swaplevel() to swap two levels of a MultiIndex.
  13. Create a DataFrame with columns ['gender','dept'] and some sample data. Use pd.crosstab(df['gender'], df['dept']) to create a cross-tab.
  14. Create a Series of exam marks (0–100). Bin marks into grade categories ['Fail','Pass','Good','Excellent'] using pd.cut().
  15. Use pd.qcut() to divide the same marks into 4 equal-frequency bins.
  16. Print the count of values in each bin created by pd.cut().
  17. Use .transform() on a DataFrame column to standardize values: (x - mean) / std.
  18. Apply .transform('sum') on grouped data to compute group-wise sums.
  19. Create a function that scales a Series by dividing by its maximum. Apply it to a DataFrame column using .pipe().
  20. Demonstrate chaining with .pipe() to apply multiple functions sequentially on a DataFrame.
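
Hint: the window, categorical, and binning questions above can be approached along the following lines. This is a minimal sketch rather than a model answer. The random seed, the sample marks, and the bin edges [0, 40, 60, 80, 100] are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (an assumption, not a requirement).
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 101, size=20))  # integers from 1 to 100

# Rolling window: the first window-1 results are NaN (incomplete window).
rolling_mean = s.rolling(3).mean()
# Expanding window: uses every value up to and including the current row.
expanding_max = s.expanding().max()

# astype('category') sorts string categories alphabetically (high, low, medium),
# so reorder_categories is what gives the order low, medium, high.
levels = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype('category')
ordered = levels.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(list(levels.cat.categories), list(levels.cat.codes))
print(list(ordered.cat.categories), list(ordered.cat.codes))

# Binning with hypothetical grade boundaries; include_lowest=True keeps a
# mark of 0 inside the first bin instead of turning it into NaN.
marks = pd.Series([5, 35, 40, 55, 72, 88, 100, 0])
grades = pd.cut(marks, bins=[0, 40, 60, 80, 100],
                labels=['Fail', 'Pass', 'Good', 'Excellent'],
                include_lowest=True)
print(grades.value_counts())
```

Note that pd.cut() uses right-closed intervals by default, so a mark of exactly 40 falls in the first bin here.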

Intermediate Questions

  1. Create a Series of 50 random numbers. Apply .rolling(window=7, min_periods=1).mean() to handle edge cases at the start.
  2. Compute .rolling(10).std() for the same Series to get rolling standard deviation.
  3. Compare .rolling(5).mean() with .expanding().mean() on the same dataset.
  4. Build a categorical DataFrame column 'grade' with repeated values. Convert to category dtype and display memory usage before and after conversion.
  5. Use .cat.add_categories() to add a new category 'very high' to an existing categorical Series.
  6. Create a MultiIndex DataFrame with two levels of rows (dept, team) and columns (year1, year2). Use .stack() and .unstack() to reshape.
  7. Swap column MultiIndex levels using .swaplevel(axis=1) and print the new order.
  8. Use pd.crosstab() with margins=True to add row and column totals.
  9. Create a crosstab with normalize='index' to show each row as proportions of its row total (multiply by 100 for percentages).
  10. Create 100 random marks and use pd.cut() into bins [0,40,60,80,100]. Count students per bin.
  11. Use pd.qcut() on marks to split into quartiles and analyze frequencies.
  12. Group a DataFrame by 'dept' and apply .transform('mean') on salary to assign mean salary to each row.
  13. Apply a custom lambda with .transform() to normalize salaries by subtracting group-wise min.
  14. Define a function that returns (x - mean) / std and apply it to a column using .pipe().
  15. Chain .pipe() with two different functions (e.g., fill missing then normalize) on a DataFrame column.
  16. Create a daily time-series DataFrame and compute .rolling(30).mean() to approximate the monthly trend.
  17. Use .expanding().sum() to compute cumulative sales for a product dataset.
  18. Generate a MultiIndex from product and region, reshape sales data with .stack() and .unstack() to switch perspectives.
  19. Build a cross-tab of gender vs pass/fail status from a student dataset.
  20. Apply pd.cut() to bin employees into experience groups (0–2, 3–5, 6–10 years) and count employees per bin.
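
Hint: the crosstab, transform, .pipe(), and reshaping questions above might be sketched as follows. The staff table, the city/year sales table, and the helper functions fill_missing and add_group_mean are hypothetical and exist only for illustration. Filling gaps with the median is one possible choice, not a prescribed one.

```python
import pandas as pd

# Hypothetical staff table; all names and values are made up.
df = pd.DataFrame({
    'dept':   ['HR', 'HR', 'IT', 'IT', 'IT', 'Ops'],
    'gender': ['F', 'M', 'F', 'M', 'M', 'F'],
    'salary': [40.0, 50.0, 60.0, None, 80.0, 55.0],
})

# Counts with row/column totals, then each row as proportions of its total.
counts = pd.crosstab(df['dept'], df['gender'], margins=True)
row_share = pd.crosstab(df['dept'], df['gender'], normalize='index')

def fill_missing(frame, col):
    """Return a copy with NaN in `col` replaced by the column median."""
    out = frame.copy()
    out[col] = out[col].fillna(out[col].median())
    return out

def add_group_mean(frame, by, col):
    """Return a copy with the group-wise mean of `col` on every row."""
    out = frame.copy()
    out[col + '_group_mean'] = out.groupby(by)[col].transform('mean')
    return out

# .pipe() passes the DataFrame as the first argument of each function,
# so the steps read in the order they run.
result = df.pipe(fill_missing, 'salary').pipe(add_group_mean, 'dept', 'salary')

# Hypothetical MultiIndex table: rows are (city, year), columns are quarters.
idx = pd.MultiIndex.from_product([['Delhi', 'Pune'], [2022, 2023]],
                                 names=['city', 'year'])
sales = pd.DataFrame({'q1': [1, 2, 3, 4], 'q2': [5, 6, 7, 8]}, index=idx)
long = sales.stack()                        # columns become the innermost index level
back = long.unstack()                       # moves that level back to the columns
swapped = sales.swaplevel('city', 'year')   # row index order becomes (year, city)
```

Unlike .agg(), .transform('mean') returns a result with the same index as the input, which is why it can be assigned straight back as a new column.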

Advanced Questions

  1. Generate a 100-day stock price series. Compute the 7-day and 30-day rolling means, and plot both against the original prices.
  2. On a sales DataFrame with a region column, convert region to category dtype, reorder its categories so that the integer codes are reassigned, and compare groupby speed against the object dtype version.
  3. Create a MultiIndex DataFrame (region, product) and compute total sales per product by unstacking and stacking in different ways.
  4. Use .swaplevel() on a 3-level MultiIndex DataFrame and demonstrate how sorting affects selections.
  5. Create a crosstab of employees (dept vs gender) and normalize results column-wise.
  6. Bin customer ages into deciles using pd.qcut() and analyze average purchase amount per bin.
  7. Use .transform() to calculate z-scores of marks within each class in a grouped DataFrame.
  8. Build a small ETL pipeline with .pipe(): clean missing values, standardize numeric columns, and add a computed ratio column.
  9. Combine .rolling() and .transform() to compute rolling z-scores of stock returns.
  10. Create a MultiIndex sales dataset (region, product, quarter). Use .stack() and .unstack() to build a pivot-like structure, then apply group-wise .transform() to scale values within each region.
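
Hint: the z-score and decile questions above could start from something like the sketch below. The simulated price path, the 7-day window, the column names cls and mark, and the age and spend ranges are all assumptions made for illustration, and the data is randomly generated rather than real.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily

# Hypothetical 100-day price path: a random walk starting near 100.
dates = pd.date_range('2024-01-01', periods=100, freq='D')
prices = pd.Series(100 + rng.normal(0, 1, size=100).cumsum(), index=dates)
returns = prices.pct_change()  # first value is NaN

# Rolling z-score: compare each return with the mean and std of its own window.
window = 7
roll = returns.rolling(window)
rolling_z = (returns - roll.mean()) / roll.std()

# Group-wise z-score: transform keeps the result aligned with the original rows.
marks = pd.DataFrame({
    'cls':  ['A', 'A', 'A', 'B', 'B', 'B'],
    'mark': [50, 60, 70, 80, 90, 100],
})
marks['z'] = marks.groupby('cls')['mark'].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Deciles: qcut chooses edges so each bin holds roughly the same number of rows.
# duplicates='drop' merges bins if tied values produce identical edges.
ages = pd.Series(rng.integers(18, 80, size=200))
spend = pd.Series(rng.uniform(10, 500, size=200))
decile = pd.qcut(ages, q=10, duplicates='drop')
avg_spend = spend.groupby(decile, observed=True).mean()
print(avg_spend)
```

Series.std() uses the sample standard deviation (ddof=1) by default, so the z-scores in a three-row group come out as -1, 0, and 1 rather than the population-based values.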