Pandas Assignment 7
Advanced Data Analysis
Basic Questions
- Create a Series of 20 random integers between 1 and 100. Apply .rolling(3).mean() to compute the 3-period moving average.
- Use the same Series and compute .rolling(5).sum() to calculate rolling sums.
- Create a Series of 10 numbers and apply .expanding().mean() to calculate the cumulative mean.
- Demonstrate .expanding().max() on a Series of random integers.
- Create a Series from ['low', 'medium', 'high', 'medium', 'low'] and convert it to category dtype using .astype('category').
- Print .cat.categories of the above categorical Series.
- Print .cat.codes for the same categorical Series.
- Reorder categories to ['low', 'medium', 'high'] using .cat.reorder_categories().
- Build a DataFrame with a MultiIndex (two levels: city and year). Print the DataFrame.
- Use .stack() to move columns into index for a MultiIndex DataFrame.
- Apply .unstack() to reverse the stacking process.
- Use .swaplevel() to swap two levels of a MultiIndex.
- Create a DataFrame with columns ['gender', 'dept'] and some sample data. Use pd.crosstab(df['gender'], df['dept']) to create a cross-tab.
- Create a Series of exam marks (0–100). Bin marks into grade categories ['Fail', 'Pass', 'Good', 'Excellent'] using pd.cut().
- Use pd.qcut() to divide the same marks into 4 equal-frequency bins.
- Print the count of values in each bin created by pd.cut().
- Use .transform() on a DataFrame column to standardize values: (x - mean) / std.
- Apply .transform('sum') on grouped data to compute group-wise sums.
- Create a function that scales a Series by dividing by its maximum. Apply it to a DataFrame column using .pipe().
- Demonstrate chaining with .pipe() to apply multiple functions sequentially on a DataFrame.
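A few of the basic exercises above can be sketched as follows. This is only an illustrative starting point, not a model answer: the random seed, the sample marks, the bin edges [0, 40, 60, 80, 100], and the helper name scale_by_max are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) so runs are repeatable

# Rolling vs. expanding windows on 20 random integers in [1, 100]
s = pd.Series(rng.integers(1, 101, size=20))
rolling_mean = s.rolling(3).mean()    # first two values are NaN (window not full)
expanding_max = s.expanding().max()   # running maximum from the start

# Category dtype: integer codes follow the category order
levels = pd.Series(['low', 'medium', 'high', 'medium', 'low']).astype('category')
levels = levels.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(list(levels.cat.categories))    # ['low', 'medium', 'high']
print(list(levels.cat.codes))         # [0, 1, 2, 1, 0]

# Binning marks into grades; the bin edges are illustrative
marks = pd.Series([35, 52, 68, 91, 40])
grades = pd.cut(marks, bins=[0, 40, 60, 80, 100],
                labels=['Fail', 'Pass', 'Good', 'Excellent'],
                include_lowest=True)  # so a mark of 0 falls in the first bin
print(grades.value_counts().sort_index())

# .pipe() passes the Series to a function as its first argument
def scale_by_max(col):
    """Hypothetical helper: divide every value by the column maximum."""
    return col / col.max()

scaled = marks.pipe(scale_by_max)
```

Note that pd.cut() uses right-closed intervals by default, so a mark of exactly 40 lands in the 'Fail' bin here; pass right=False if the lower edge should be inclusive instead.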
Intermediate Questions
- Create a Series of 50 random numbers. Apply .rolling(window=7, min_periods=1).mean() to handle edge cases at the start.
- Compute .rolling(10).std() for the same Series to get rolling standard deviation.
- Compare .rolling(5).mean() with .expanding().mean() on the same dataset.
- Build a categorical DataFrame column 'grade' with repeated values. Convert to category dtype and display memory usage before and after conversion.
- Use .cat.add_categories() to add a new category 'very high' to an existing categorical Series.
- Create a MultiIndex DataFrame with two levels of rows (dept, team) and columns (year1, year2). Use .stack() and .unstack() to reshape.
- Swap column MultiIndex levels using .swaplevel(axis=1) and print the new order.
- Use pd.crosstab() with margins=True to add row and column totals.
- Create a crosstab with normalize='index' to show row-wise proportions (multiply by 100 for percentages).
- Create 100 random marks and use pd.cut() into bins [0,40,60,80,100]. Count students per bin.
- Use pd.qcut() on marks to split into quartiles and analyze frequencies.
- Group a DataFrame by 'dept' and apply .transform('mean') on salary to assign the group's mean salary to each row.
- Apply a custom lambda with .transform() to normalize salaries by subtracting the group-wise minimum.
- Define a function that returns (x - mean) / std and apply it to a column using .pipe().
- Chain .pipe() with two different functions (e.g., fill missing then normalize) on a DataFrame column.
- Create a daily time-series DataFrame and compute .rolling(30).mean() to approximate a monthly trend.
- Use .expanding().sum() to compute cumulative sales for a product dataset.
- Generate a MultiIndex from product and region, reshape sales data with .stack() and .unstack() to switch perspectives.
- Build a cross-tab of gender vs pass/fail status from a student dataset.
- Apply pd.cut() to bin employees into experience groups (0–2, 3–5, 6–10 years) and count employees per bin.
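As a rough sketch of the reshaping, cross-tab, and group-wise transform exercises above, the following uses a tiny made-up dataset. The column values, the team labels 'A' and 'B', and the salary figures are assumptions for illustration only.

```python
import pandas as pd

# Made-up sample data (assumption)
df = pd.DataFrame({
    'dept':   ['HR', 'HR', 'IT', 'IT', 'IT'],
    'gender': ['F', 'M', 'F', 'M', 'M'],
    'salary': [40, 50, 60, 70, 80],
})

# Cross-tab with row/column totals, and with row-wise proportions
counts = pd.crosstab(df['gender'], df['dept'], margins=True)
shares = pd.crosstab(df['gender'], df['dept'], normalize='index')

# transform returns a result aligned to the original rows
df['dept_mean'] = df.groupby('dept')['salary'].transform('mean')
df['above_min'] = df.groupby('dept')['salary'].transform(lambda x: x - x.min())

# MultiIndex rows (dept, team) with columns year1 and year2
idx = pd.MultiIndex.from_product([['HR', 'IT'], ['A', 'B']],
                                 names=['dept', 'team'])
wide = pd.DataFrame({'year1': [1, 2, 3, 4], 'year2': [5, 6, 7, 8]}, index=idx)

long = wide.stack()             # column labels become the innermost index level
back = long.unstack()           # innermost level moves back to the columns
by_team = wide.unstack('team')  # 'team' level moves from rows to columns

print(counts)
print(by_team)
```

Because .transform() keeps the original index, its result can be assigned straight back as a new column, which is the main difference from .agg(), which returns one row per group.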
Advanced Questions
- Generate a 100-day stock price series. Compute 7-day rolling mean, 30-day rolling mean, and plot both against original prices.
- On a sales DataFrame with a region column, convert region to category dtype, reassign its codes by reordering the categories, and compare groupby speed against the object dtype.
- Create a MultiIndex DataFrame (region, product) and compute total sales per product by unstacking and stacking in different ways.
- Use .swaplevel() on a 3-level MultiIndex DataFrame and demonstrate how sorting affects selections.
- Create a crosstab of employees (dept vs gender) and normalize results column-wise.
- Bin customer ages into deciles using pd.qcut() and analyze average purchase amount per bin.
- Use .transform() to calculate z-scores of marks within each class in a grouped DataFrame.
- Build a small ETL pipeline with .pipe(): clean missing values, standardize numeric columns, and add a computed ratio column.
- Combine .rolling() and .transform() to compute rolling z-scores of stock returns.
- Create a MultiIndex sales dataset (region, product, quarter). Use .stack() and .unstack() to build a pivot-like structure, then apply group-wise .transform() to scale values within each region.
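One possible way to approach the z-score and pipeline exercises above is sketched below. The 20-day window, the synthetic random-walk prices, the column names sales and target, and the helper functions fill_missing and add_ratio are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed (assumption)

# Synthetic 100-day price series built from a random walk
prices = pd.Series(100 + rng.normal(0, 1, 100).cumsum(),
                   index=pd.date_range('2024-01-01', periods=100, freq='D'))
returns = prices.pct_change()   # first value is NaN

# Rolling z-score: each return compared with its own trailing window
window = 20                     # window length is an assumption
roll = returns.rolling(window)
rolling_z = (returns - roll.mean()) / roll.std()

# Group-wise z-score of marks within each class
scores = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B', 'B'],
                       'marks': [50, 60, 70, 80, 85, 90]})
scores['z'] = (scores.groupby('class')['marks']
                     .transform(lambda x: (x - x.mean()) / x.std()))

# Small .pipe() pipeline: fill gaps, then add a ratio column
def fill_missing(df):
    """Hypothetical step: replace NaN with the column mean."""
    return df.fillna(df.mean(numeric_only=True))

def add_ratio(df):
    """Hypothetical step: add sales / target as a new column."""
    return df.assign(ratio=df['sales'] / df['target'])

raw = pd.DataFrame({'sales': [10.0, np.nan, 30.0],
                    'target': [20.0, 20.0, 20.0]})
clean = raw.pipe(fill_missing).pipe(add_ratio)
print(clean)
```

The rolling z-score stays NaN until the window holds 20 valid returns, so the first 20 entries are empty; setting min_periods on .rolling() would trade that gap for noisier early estimates.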