Pandas Assignment - 3
Data Exploration & Summarization
Basic Questions
- Create a DataFrame with numeric columns 'math', 'phy', 'chem' for 5 students. Use df.describe() to print summary statistics.
- From the same DataFrame, compute column-wise mean using df.mean().
- Compute the median of the 'math' column using df['math'].median().
- Compute the mode of the 'phy' column using df['phy'].mode().
- Count non-null values in the entire DataFrame using df.count().
- Compute sum of each column in the DataFrame using df.sum().
- Find the minimum marks in 'chem' using df['chem'].min().
- Find the maximum marks in 'math' using df['math'].max().
- Create a Series ['A', 'B', 'A', 'C', 'B', 'A']. Print the unique values using .unique().
- Print the number of unique values in the Series using .nunique().
- Print frequency counts of values in the Series using .value_counts().
- Create a DataFrame with some missing values (NaN). Use df.isna() to check nulls.
- Use df.notna() to check non-null values.
- Count total missing values in each column using df.isna().sum().
- Create a DataFrame of 10 rows and use df.head() to display the first 5 rows.
- Use df.tail(3) to display the last 3 rows.
- Use df.sample(4) to display 4 random rows from the DataFrame.
- For a Series with values [1,2,3,3,4,4,4,5], compute value_counts() and print the most frequent value.
- Create a DataFrame of employees with a 'dept' column. Print unique departments using df['dept'].unique().
- Count unique departments using df['dept'].nunique().
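As a starting point for the basic questions, here is a minimal sketch of the summary, uniqueness, and null-check calls named above. The marks and the position of the missing value are invented for illustration; they are assumptions, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical marks for 5 students (values invented for illustration)
df = pd.DataFrame({
    "math": [80, 65, 90, 70, 85],
    "phy": [75, 60, 75, 88, 92],
    "chem": [68, 72, 95, 55, 81],
})

print(df.describe())               # count, mean, std, min, quartiles, max per column
col_means = df.mean()              # column-wise mean
math_median = df["math"].median()
phy_mode = df["phy"].mode()        # returns a Series, since several values can tie

# Unique values and frequency counts on a Series
s = pd.Series(["A", "B", "A", "C", "B", "A"])
unique_vals = s.unique()           # array of distinct values
n_unique = s.nunique()             # number of distinct values
counts = s.value_counts()          # frequencies, most frequent first

# Introduce one missing value (assumed position) and count nulls per column
df_nan = df.astype(float)
df_nan.loc[1, "phy"] = np.nan
nulls_per_col = df_nan.isna().sum()
print(nulls_per_col)
```

Note that .mode() returns a Series rather than a scalar, so take phy_mode.iloc[0] when a single value is needed.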
Intermediate Questions
- Create a DataFrame of sales with columns 'product', 'region', 'sales'. Use df.describe(include='all') to inspect both numeric and categorical data.
- Compute row-wise sum of numeric columns in a DataFrame using df.sum(axis=1).
- Compute the mean of the 'sales' column grouped by 'region' using df.groupby('region')['sales'].mean().
- For a numeric DataFrame, calculate column-wise min and max and print them side by side.
- Create a Series with duplicate categorical values and find the top 2 most frequent using .value_counts().head(2).
- Compare the outputs of df['col'].unique() and df['col'].nunique() for a categorical column.
- Generate a DataFrame with missing values; use df.isna().sum().sum() to count total null values in the DataFrame.
- Use df.notna().all() to check if all values in each column are non-null.
- Demonstrate that df['col'].count() excludes nulls while df['col'].size counts all values, including nulls.
- Use df.describe(percentiles=[0.1,0.25,0.75,0.9]) to compute custom percentile summary statistics.
- Create a DataFrame of student marks and compute mode for each column using df.mode().
- For a sales DataFrame, use .sum() and .count() to compute average sales without using .mean().
- Use .value_counts(normalize=True) on a Series to compute relative frequencies.
- Create a Series with NaN values and print how many are null using .isna().sum().
- Use .notna().sum() to count how many are not null.
- Print first 7 rows of a DataFrame using .head(7) and compare with .iloc[:7].
- Print last 4 rows of a DataFrame using .tail(4) and compare with .iloc[-4:].
- Use .sample(frac=0.5, random_state=1) to select half the rows randomly.
- Demonstrate .sample(n=3, replace=True) to randomly sample with replacement.
- Build a DataFrame and check the data types of each column with .dtypes alongside df.info().
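Several of the intermediate questions depend on how nulls are treated. The sketch below, on a small invented sales table (all values are assumptions), contrasts .count() with .size, rebuilds the mean from .sum() and .count(), and shows reproducible sampling with random_state.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (invented for illustration)
sales = pd.DataFrame({
    "product": ["pen", "book", "pen", "bag", "book", "bag"],
    "region": ["north", "south", "north", "south", "north", "south"],
    "sales": [100.0, 250.0, np.nan, 400.0, 150.0, 300.0],
})

# .count() skips NaN, .size counts every element
n_non_null = sales["sales"].count()
n_total = sales["sales"].size

# Average without .mean(): .sum() also skips NaN, so this matches .mean()
manual_avg = sales["sales"].sum() / sales["sales"].count()

# Total nulls in the whole DataFrame, and a per-column "all non-null" check
total_nulls = sales.isna().sum().sum()
all_present = sales.notna().all()

# Group-wise mean and relative frequencies
region_mean = sales.groupby("region")["sales"].mean()
rel_freq = sales["product"].value_counts(normalize=True)

# Reproducible random sampling: half the rows, then 3 rows with replacement
half = sales.sample(frac=0.5, random_state=1)
with_repl = sales.sample(n=3, replace=True, random_state=1)

# .head(n) and .iloc[:n] select the same rows
same_head = sales.head(3).equals(sales.iloc[:3])

print(sales.dtypes)
sales.info()
```

Because .sum() and .count() both ignore NaN by default, the manual average agrees with .mean(); dividing by .size instead would give a lower value.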
Advanced Questions
- Build a DataFrame with mixed numeric and categorical columns; use .describe(include='all') and interpret the difference in summary for each dtype.
- Create a time series DataFrame (dates + values). Compute mean, median, min, and max grouped by month using groupby.
- Generate a DataFrame with 100 rows containing random integers and NaNs. Compute null counts, fill missing values with column median, and re-check with .isna().sum().
- For a survey dataset with 'gender' and 'preference' columns, use .value_counts(normalize=True) on both to get distribution percentages.
- Create a DataFrame of marks and find students who scored above the column-wise mean using boolean filtering.
- Build a DataFrame with missing categorical values. Apply .value_counts(normalize=True) to the .isna() mask of each column to compute the missing percentage per column.
- Use .head(), .tail(), and .sample() together to design a quick exploration report: print top 3, bottom 3, and 5 random rows.
- Write a function that prints descriptive statistics (mean, median, mode, min, max, unique count) for each column in a DataFrame. Apply it on a test DataFrame.
- Construct a DataFrame and verify if any column has only one unique value using .nunique() == 1.
- Create a DataFrame with dept, salary, and bonus. Group by dept and compute aggregate stats: sum, mean, min, max, and count in one .agg() call.
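For the advanced questions, the following is one possible sketch of median imputation, a per-column summary function, boolean filtering against the mean, and a multi-statistic .agg() call. The helper name summarize and the employee data are hypothetical, introduced only for illustration; treat this as one approach under those assumptions rather than the expected answer.

```python
import numpy as np
import pandas as pd

def summarize(df):
    """Hypothetical helper: one row of descriptive statistics per column."""
    rows = {}
    for col in df.columns:
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        modes = s.mode()
        rows[col] = {
            # mean/median/min/max are only computed for numeric columns here
            "mean": s.mean() if numeric else None,
            "median": s.median() if numeric else None,
            "mode": modes.iloc[0] if not modes.empty else None,
            "min": s.min() if numeric else None,
            "max": s.max() if numeric else None,
            "unique": s.nunique(),
        }
    return pd.DataFrame(rows).T

# Hypothetical employee data with missing values (invented for illustration)
emp = pd.DataFrame({
    "dept": ["HR", "IT", "IT", "HR", "Ops"],
    "salary": [50000, 70000, np.nan, 55000, 60000],
    "bonus": [5000, 7000, 6500, np.nan, 6000],
})

# Fill missing numeric values with each column's median, then re-check
numeric_cols = ["salary", "bonus"]
filled = emp.copy()
filled[numeric_cols] = emp[numeric_cols].fillna(emp[numeric_cols].median())
print(filled.isna().sum())

# Rows whose salary is above the column mean (boolean filtering)
above = filled[filled["salary"] > filled["salary"].mean()]

# Columns that hold only a single distinct value
constant_cols = filled.nunique() == 1

# Several aggregates per department in one .agg() call
stats = filled.groupby("dept")[numeric_cols].agg(["sum", "mean", "min", "max", "count"])

summary = summarize(filled)
print(summary)
print(stats)
```

The .agg() result has two-level column labels, so an individual statistic is read with a tuple such as stats.loc["IT", ("salary", "sum")].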