Pandas Assignment 3

Data Exploration & Summarization

Basic Questions

Create a DataFrame with numeric columns 'math', 'phy', 'chem' for 5 students. Use df.describe() to print summary statistics.
From the same DataFrame, compute column-wise mean using df.mean().
Compute the median of the 'math' column using df['math'].median().
Compute the mode of the 'phy' column using df['phy'].mode().
Count non-null values in the entire DataFrame using df.count().
Compute the sum of each column in the DataFrame using df.sum().
Find the minimum marks in 'chem' using df['chem'].min().
Find the maximum marks in 'math' using df['math'].max().
Create a Series ['A', 'B', 'A', 'C', 'B', 'A']. Print the unique values using .unique().
Print the number of unique values in the Series using .nunique().
Print frequency counts of values in the Series using .value_counts().
Create a DataFrame with some missing values (NaN). Use df.isna() to check nulls.
Use df.notna() to check non-null values.
Count total missing values in each column using df.isna().sum().
Create a DataFrame of 10 rows and use df.head() to display the first 5 rows.
Use df.tail(3) to display the last 3 rows.
Use df.sample(4) to display 4 random rows from the DataFrame.
For a Series with values [1,2,3,3,4,4,4,5], compute value_counts() and print the most frequent value.
Create a DataFrame of employees with a 'dept' column. Print unique departments using df['dept'].unique().
Count unique departments using df['dept'].nunique().
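
The basic questions above can be approached along the following lines. This is only a minimal sketch, not a model answer: the marks, the grades Series, and the small df_missing frame are made-up illustration data, and the variable names are assumptions of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical marks for 5 students (values invented for illustration).
df = pd.DataFrame({
    "math": [78, 85, 92, 85, 60],
    "phy": [70, 70, 88, 95, 64],
    "chem": [81, 59, 77, 90, 68],
})

print(df.describe())              # count, mean, std, min, quartiles, max
col_means = df.mean()             # column-wise mean
math_median = df["math"].median()
phy_mode = df["phy"].mode()       # a Series, because several values can tie
chem_min = df["chem"].min()
math_max = df["math"].max()

# Unique values and frequency counts on a categorical Series.
grades = pd.Series(["A", "B", "A", "C", "B", "A"])
unique_grades = grades.unique()   # distinct values in order of appearance
n_unique = grades.nunique()       # number of distinct values
counts = grades.value_counts()    # sorted by frequency, highest first

# Most frequent value: the index label with the largest count.
numbers = pd.Series([1, 2, 3, 3, 4, 4, 4, 5])
most_frequent = numbers.value_counts().idxmax()

# Missing values: isna() gives a boolean mask, sum() counts the True cells.
df_missing = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, 6.0]})
nulls_per_column = df_missing.isna().sum()
non_null_per_column = df_missing.count()

print(col_means, math_median, phy_mode, chem_min, math_max, sep="\n")
print(unique_grades, n_unique, counts, most_frequent, sep="\n")
print(nulls_per_column, non_null_per_column, sep="\n")
```

Note that .mode() returns a Series rather than a single number, so take phy_mode.iloc[0] if only one value is wanted.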

Intermediate Questions

Create a DataFrame of sales with columns 'product', 'region', 'sales'. Use df.describe(include='all') to inspect both numeric and categorical data.
Compute row-wise sum of numeric columns in a DataFrame using df.sum(axis=1).
Compute the mean of the 'sales' column grouped by 'region' using df.groupby('region')['sales'].mean().
For a numeric DataFrame, calculate column-wise min and max and print them side by side.
Create a Series with duplicate categorical values and find the top 2 most frequent using .value_counts().head(2).
Compare the outputs of df['col'].unique() and df['col'].nunique() for a categorical column.
Generate a DataFrame with missing values; use df.isna().sum().sum() to count total null values in the DataFrame.
Use df.notna().all() to check if all values in each column are non-null.
Demonstrate that df['col'].count() excludes nulls, while df['col'].size counts every element, including nulls.
Use df.describe(percentiles=[0.1,0.25,0.75,0.9]) to compute custom percentile summary statistics.
Create a DataFrame of student marks and compute mode for each column using df.mode().
For a sales DataFrame, use .sum() and .count() to compute average sales without using .mean().
Use .value_counts(normalize=True) on a Series to compute relative frequencies.
Create a Series with NaN values and print how many are null using .isna().sum().
Use .notna().sum() to count how many are not null.
Print first 7 rows of a DataFrame using .head(7) and compare with .iloc[:7].
Print last 4 rows of a DataFrame using .tail(4) and compare with .iloc[-4:].
Use .sample(frac=0.5, random_state=1) to select half the rows randomly.
Demonstrate .sample(n=3, replace=True) to randomly sample with replacement.
Build a DataFrame and check the data types of each column with .dtypes alongside df.info().
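
Several of the intermediate questions can be tried on one small frame. The sketch below assumes a made-up sales table with a single missing value; the column contents and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing 'sales' value.
sales = pd.DataFrame({
    "product": ["pen", "book", "pen", "bag", "book", "bag"],
    "region": ["north", "south", "north", "east", "south", "north"],
    "sales": [100.0, 250.0, np.nan, 400.0, 150.0, 300.0],
})

print(sales.describe(include="all"))   # numeric and categorical summaries together

# Group-wise mean; NaN values are skipped by default.
region_mean = sales.groupby("region")["sales"].mean()

# Column-wise min and max side by side: one row per numeric column.
min_max = sales.select_dtypes("number").agg(["min", "max"]).T

# count() ignores nulls, size does not.
non_null = sales["sales"].count()
total = sales["sales"].size

# Average without .mean(): sum of non-null values over their count.
manual_avg = sales["sales"].sum() / sales["sales"].count()

total_nulls = sales.isna().sum().sum()     # nulls in the whole frame
all_present = sales.notna().all()          # per column: True if no nulls

rel_freq = sales["region"].value_counts(normalize=True)  # fractions summing to 1

# head(n) and iloc[:n] select the same rows.
same_head = sales.head(3).equals(sales.iloc[:3])

half = sales.sample(frac=0.5, random_state=1)                # reproducible half
with_repl = sales.sample(n=3, replace=True, random_state=1)  # rows may repeat

print(region_mean, min_max, manual_avg, rel_freq, sep="\n")
print(sales.dtypes)
sales.info()
```

Passing random_state makes .sample() repeatable, which helps when comparing results between runs.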

Advanced Questions

Build a DataFrame with mixed numeric and categorical columns; use .describe(include='all') and interpret the difference in summary for each dtype.
Create a time series DataFrame (dates + values). Compute mean, median, min, and max grouped by month using groupby.
Generate a DataFrame with 100 rows containing random integers and NaNs. Compute null counts, fill missing values with column median, and re-check with .isna().sum().
For a survey dataset with 'gender' and 'preference' columns, use .value_counts(normalize=True) on both to get distribution percentages.
Create a DataFrame of marks and find students who scored above the column-wise mean using boolean filtering.
Build a DataFrame with missing categorical values. Use .isna() mask with .value_counts() to compute missing percentage per column.
Use .head(), .tail(), and .sample() together to design a quick exploration report: print top 3, bottom 3, and 5 random rows.
Write a function that prints descriptive statistics (mean, median, mode, min, max, unique count) for each column in a DataFrame. Apply it on a test DataFrame.
Construct a DataFrame and verify if any column has only one unique value using .nunique() == 1.
Create a DataFrame with dept, salary, and bonus. Group by dept and compute aggregate stats: sum, mean, min, max, and count in one .agg() call.
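
As a starting point for the advanced questions, the sketch below combines a per-column summary function, median imputation, boolean filtering, a constant-column check, and a multi-statistic .agg() call. The summarize helper, the emp data, and all variable names are hypothetical, and restricting mean, median, min, and max to numeric columns is a design choice of this example, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Print and return mean, median, mode, min, max, unique count per column."""
    rows = {}
    for col in df.columns:
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        modes = s.mode()
        rows[col] = {
            "mean": s.mean() if numeric else None,
            "median": s.median() if numeric else None,
            # Only the first mode is reported when several values tie.
            "mode": modes.iloc[0] if not modes.empty else None,
            "min": s.min() if numeric else None,
            "max": s.max() if numeric else None,
            "unique": s.nunique(),
        }
    report = pd.DataFrame(rows).T   # one row per column of df
    print(report)
    return report


# Hypothetical employee data with one missing salary.
emp = pd.DataFrame({
    "dept": ["HR", "IT", "IT", "HR", "Ops"],
    "salary": [40000.0, 60000.0, np.nan, 50000.0, 45000.0],
    "bonus": [2000, 5000, 4000, 3000, 2500],
})

nulls_before = emp.isna().sum()

# Fill numeric gaps with each column's median, then re-check.
emp = emp.fillna(emp.median(numeric_only=True))
nulls_after = emp.isna().sum()

# Rows whose bonus is above the column mean.
above_avg = emp[emp["bonus"] > emp["bonus"].mean()]

# Columns that hold a single unique value.
n_unique = emp.nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()

# Several aggregates per department in one .agg() call.
stats = emp.groupby("dept")[["salary", "bonus"]].agg(
    ["sum", "mean", "min", "max", "count"]
)

report = summarize(emp)
print(stats)
```

The .agg() result has two-level column labels, so a single cell is read as stats.loc["IT", ("salary", "sum")].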