Pandas Assignment 7

Advanced Data Analysis

Basic Questions

Create a Series of 20 random integers between 1 and 100. Apply .rolling(3).mean() to compute the 3-period moving average.
Use the same Series and compute .rolling(5).sum() to calculate rolling sums.
Create a Series of 10 numbers and apply .expanding().mean() to calculate the cumulative mean.
Demonstrate .expanding().max() on a Series of random integers.
Create a categorical Series from ['low','medium','high','medium','low']. Convert it to category dtype using .astype('category').
Print .cat.categories of the above categorical Series.
Print .cat.codes for the same categorical Series.
Reorder categories to ['low','medium','high'] using .cat.reorder_categories().
Build a DataFrame with a MultiIndex (two levels: city and year). Print the DataFrame.
Use .stack() to move the columns into the index of a MultiIndex DataFrame.
Apply .unstack() to reverse the stacking process.
Use .swaplevel() to swap two levels of a MultiIndex.
Create a DataFrame with columns ['gender','dept'] and some sample data. Use pd.crosstab(df['gender'], df['dept']) to create a cross-tab.
Create a Series of exam marks (0–100). Bin marks into grade categories ['Fail','Pass','Good','Excellent'] using pd.cut().
Use pd.qcut() to divide the same marks into 4 equal-frequency bins.
Print counts of values in each bin created by cut().
Use .transform() on a DataFrame column to standardize values: (x - mean) / std.
Apply .transform('sum') on grouped data to compute group-wise sums.
Create a function that scales a Series by dividing by its maximum. Apply it to a DataFrame column using .pipe().
Demonstrate chaining with .pipe() to apply multiple functions sequentially on a DataFrame.
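
As a starting point for the basic questions, here is a minimal sketch of the rolling, expanding, categorical, binning, and .pipe() calls. It uses small fixed Series instead of random data so the results are reproducible. The bin edges [0, 40, 60, 80, 100] and the helper name scale_by_max are assumptions made for illustration; the assignment does not specify them.

```python
import pandas as pd

# Fixed values (not random) so the results are easy to check by hand.
s = pd.Series([10, 20, 30, 40, 50])

# The first two entries are NaN because the 3-value window is not yet full.
rolling_mean = s.rolling(3).mean()
# Mean of all values seen so far.
expanding_mean = s.expanding().mean()

# Categories are sorted alphabetically by default ('high', 'low', 'medium'),
# so reorder them explicitly to get a meaningful order.
levels = pd.Series(["low", "medium", "high", "medium", "low"]).astype("category")
levels = levels.cat.reorder_categories(["low", "medium", "high"], ordered=True)
codes = levels.cat.codes  # integer position of each value's category

# Bin marks into grades. The bin edges are an assumption for this sketch.
marks = pd.Series([35, 55, 72, 91])
grades = pd.cut(
    marks,
    bins=[0, 40, 60, 80, 100],
    labels=["Fail", "Pass", "Good", "Excellent"],
    include_lowest=True,  # so a mark of exactly 0 falls in the first bin
)
grade_counts = grades.value_counts()


def scale_by_max(col):
    """Hypothetical helper: divide a Series by its maximum value."""
    return col / col.max()


# .pipe() passes the Series as the first argument of the function.
scaled = marks.pipe(scale_by_max)
```

Replacing the fixed values with np.random.randint(1, 101, size=20) (the upper bound is exclusive) would match the wording of the first question; the calls themselves stay the same.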

Intermediate Questions

Create a Series of 50 random numbers. Apply .rolling(window=7, min_periods=1).mean() to handle edge cases at the start.
Compute .rolling(10).std() for the same Series to get rolling standard deviation.
Compare .rolling(5).mean() with .expanding().mean() on the same dataset.
Build a categorical DataFrame column 'grade' with repeated values. Convert to category dtype and display memory usage before and after conversion.
Use .cat.add_categories() to add a new category 'very high' to an existing categorical Series.
Create a MultiIndex DataFrame with two levels of rows (dept, team) and columns (year1, year2). Use .stack() and .unstack() to reshape.
Swap column MultiIndex levels using .swaplevel(axis=1) and print the new order.
Use pd.crosstab() with margins=True to add row and column totals.
Create a crosstab with normalize='index' to show row-wise proportions (multiply by 100 for percentages).
Create 100 random marks and use pd.cut() into bins [0,40,60,80,100]. Count students per bin.
Use pd.qcut() on marks to split into quartiles and analyze frequencies.
Group a DataFrame by 'dept' and apply .transform('mean') on salary to assign mean salary to each row.
Apply a custom lambda with .transform() to normalize salaries by subtracting group-wise min.
Define a function that returns (x - mean) / std and apply it to a column using .pipe().
Chain .pipe() with two different functions (e.g., fill missing then normalize) on a DataFrame column.
Create a daily time-series DataFrame and compute .rolling(30).mean() to approximate a monthly trend.
Use .expanding().sum() to compute cumulative sales for a product dataset.
Generate a MultiIndex from product and region, reshape sales data with .stack() and .unstack() to switch perspectives.
Build a cross-tab of gender vs pass/fail status from a student dataset.
Apply cut() to bin employees into experience groups (0–2, 3–5, 6–10 years) and count employees per bin.
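
Several of the intermediate questions rely on the same few patterns: min_periods on a rolling window, pd.crosstab() with margins or normalization, and group-wise .transform(). The sketch below shows them on a tiny hand-made DataFrame; the column values and the window size of 3 are assumptions chosen to keep the output easy to verify, not data from the assignment.

```python
import pandas as pd

# Small illustrative dataset (values are made up for this sketch).
df = pd.DataFrame({
    "dept": ["HR", "HR", "IT", "IT", "IT"],
    "gender": ["F", "M", "F", "M", "M"],
    "salary": [40, 50, 60, 70, 80],
})

# min_periods=1 lets the window produce a value before it is full,
# so there are no leading NaN entries.
roll = df["salary"].rolling(window=3, min_periods=1).mean()

# margins=True adds row and column totals, labelled "All" by default.
counts = pd.crosstab(df["dept"], df["gender"], margins=True)

# normalize="index" divides each row by its total, so every row sums to 1.
shares = pd.crosstab(df["dept"], df["gender"], normalize="index")

# transform() returns a result aligned with the original rows,
# which makes it easy to assign back as a new column.
df["dept_mean"] = df.groupby("dept")["salary"].transform("mean")
df["above_min"] = df.groupby("dept")["salary"].transform(lambda x: x - x.min())
```

Unlike .agg(), which returns one row per group, .transform() keeps the original shape; that is why it suits questions that ask for a group statistic on every row.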

Advanced Questions

Generate a 100-day stock price series. Compute 7-day rolling mean, 30-day rolling mean, and plot both against original prices.
On a sales DataFrame with region as category, convert region column to category dtype, reassign codes, and compare groupby speed with object dtype.
Create a MultiIndex DataFrame (region, product) and compute total sales per product by unstacking and stacking in different ways.
Use .swaplevel() on a 3-level MultiIndex DataFrame and demonstrate how sorting affects selections.
Create a crosstab of employees (dept vs gender) and normalize results column-wise.
Bin customer ages into deciles using pd.qcut() and analyze average purchase amount per bin.
Use .transform() to calculate z-scores of marks within each class in a grouped DataFrame.
Build a small ETL pipeline with .pipe(): clean missing values, standardize numeric columns, and add a computed ratio column.
Combine .rolling() and .transform() to compute rolling z-scores of stock returns.
Create a MultiIndex sales dataset (region, product, quarter). Use .stack() and .unstack() to build a pivot-like structure, then apply group-wise .transform() to scale values within each region.
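
The advanced questions combine reshaping with group-wise and rolling computations. The following is a minimal sketch under stated assumptions: the sales figures, class marks, return values, and the window of 3 are all invented for illustration, and a two-level index (region, product) is used in place of the three-level one to keep it short.

```python
import pandas as pd

# Two-level MultiIndex sales data (figures are made up for this sketch).
idx = pd.MultiIndex.from_product(
    [["North", "South"], ["A", "B"]], names=["region", "product"]
)
sales = pd.Series([10.0, 30.0, 20.0, 60.0], index=idx, name="sales")

# unstack() moves the 'product' level into the columns (one row per region).
wide = sales.unstack("product")
per_product = wide.sum()   # column sums: total sales per product
back = wide.stack()        # stack() returns to the long (region, product) form

# Scale each value by its region's total using a group-wise transform.
region_share = sales / sales.groupby(level="region").transform("sum")

# Z-scores of marks within each class. Series.std() uses ddof=1 by default.
scores = pd.DataFrame({
    "class": ["X", "X", "X", "Y", "Y", "Y"],
    "marks": [50, 60, 70, 80, 85, 90],
})
scores["z"] = scores.groupby("class")["marks"].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Rolling z-score: compare each return with the mean and std of its window.
returns = pd.Series([0.01, -0.02, 0.03, 0.00, 0.02, -0.01])
window = returns.rolling(3)
rolling_z = (returns - window.mean()) / window.std()
```

The rolling z-score here is written with plain arithmetic on the rolling mean and standard deviation; inside a grouped DataFrame the same expression could be wrapped in a function and passed to .transform() so it is computed per group.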