Pandas Assignment 3

Data Exploration & Summarization

Basic Questions

  1. Create a DataFrame with numeric columns 'math', 'phy', 'chem' for 5 students. Use df.describe() to print summary statistics.
  2. From the same DataFrame, compute column-wise mean using df.mean().
  3. Compute the median of the 'math' column using df['math'].median().
  4. Compute the mode of the 'phy' column using df['phy'].mode().
  5. Count non-null values in the entire DataFrame using df.count().
  6. Compute the sum of each column in the DataFrame using df.sum().
  7. Find the minimum marks in 'chem' using df['chem'].min().
  8. Find the maximum marks in 'math' using df['math'].max().
  9. Create a Series ['A', 'B', 'A', 'C', 'B', 'A']. Print the unique values using .unique().
  10. Print the number of unique values in the Series using .nunique().
  11. Print frequency counts of values in the Series using .value_counts().
  12. Create a DataFrame with some missing values (NaN). Use df.isna() to check nulls.
  13. Use df.notna() to check non-null values.
  14. Count total missing values in each column using df.isna().sum().
  15. Create a DataFrame of 10 rows and use df.head() to display the first 5 rows.
  16. Use df.tail(3) to display the last 3 rows.
  17. Use df.sample(4) to display 4 random rows from the DataFrame.
  18. For a Series with values [1,2,3,3,4,4,4,5], compute value_counts() and print the most frequent value.
  19. Create a DataFrame of employees with a 'dept' column. Print unique departments using df['dept'].unique().
  20. Count unique departments using df['dept'].nunique().
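
Hint: the basic questions mostly follow one pattern: build a small DataFrame or Series, then call a single summary method on it. The sketch below shows that pattern for a few of the methods named above. The marks and grade values are made up for illustration, and the variable names are only suggestions.

```python
import numpy as np
import pandas as pd

# Hypothetical marks for 5 students (illustrative values only).
df = pd.DataFrame({
    "math": [78, 85, 92, 85, 60],
    "phy": [65, 70, 70, 88, 91],
    "chem": [55, 72, np.nan, 80, 67],   # one missing value on purpose
})

summary = df.describe()             # count, mean, std, min, quartiles, max
col_means = df.mean()               # column-wise mean; NaN is skipped by default
math_median = df["math"].median()
phy_mode = df["phy"].mode()         # a Series, because ties are possible
missing_per_col = df.isna().sum()   # number of NaN values in each column

grades = pd.Series(["A", "B", "A", "C", "B", "A"])
unique_grades = grades.unique()     # distinct values, in order of appearance
n_unique = grades.nunique()         # how many distinct values
counts = grades.value_counts()      # frequency of each value, largest first

print(summary)
print(col_means)
print(math_median, phy_mode.tolist())
print(missing_per_col)
print(unique_grades, n_unique)
print(counts)
```

Note that .mode() returns a Series rather than a single number, so use .iloc[0] or .tolist() when a plain value is needed.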

Intermediate Questions

  1. Create a DataFrame of sales with columns 'product', 'region', 'sales'. Use df.describe(include='all') to inspect both numeric and categorical data.
  2. Compute row-wise sum of numeric columns in a DataFrame using df.sum(axis=1).
  3. Compute the mean of the 'sales' column grouped by 'region' using df.groupby('region')['sales'].mean().
  4. For a numeric DataFrame, calculate column-wise min and max and print them side by side.
  5. Create a Series with duplicate categorical values and find the top 2 most frequent using .value_counts().head(2).
  6. Compare the outputs of df['col'].unique() and df['col'].nunique() for a categorical column.
  7. Generate a DataFrame with missing values; use df.isna().sum().sum() to count total null values in the DataFrame.
  8. Use df.notna().all() to check if all values in each column are non-null.
  9. Demonstrate that df['col'].count() excludes nulls, while df['col'].size gives the total number of elements, nulls included.
  10. Use df.describe(percentiles=[0.1,0.25,0.75,0.9]) to compute custom percentile summary statistics.
  11. Create a DataFrame of student marks and compute mode for each column using df.mode().
  12. For a sales DataFrame, use .sum() and .count() to compute average sales without using .mean().
  13. Use .value_counts(normalize=True) on a Series to compute relative frequencies.
  14. Create a Series with NaN values and print how many are null using .isna().sum().
  15. Use .notna().sum() to count how many are not null.
  16. Print first 7 rows of a DataFrame using .head(7) and compare with .iloc[:7].
  17. Print last 4 rows of a DataFrame using .tail(4) and compare with .iloc[-4:].
  18. Use .sample(frac=0.5, random_state=1) to select half the rows randomly.
  19. Demonstrate .sample(n=3, replace=True) to randomly sample with replacement.
  20. Build a DataFrame and check the data types of each column with .dtypes alongside df.info().
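
Hint: several intermediate questions depend on how pandas treats NaN. The sketch below uses a small, made-up sales table to show the difference between .count() and .size, an average built from .sum() and .count(), a grouped mean, relative frequencies, and a reproducible random sample. All data values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (illustrative only).
sales = pd.DataFrame({
    "product": ["pen", "book", "pen", "bag", "book", "pen"],
    "region": ["east", "west", "east", "north", "west", "north"],
    "sales": [100.0, 250.0, np.nan, 400.0, 150.0, 300.0],
})

non_null = sales["sales"].count()   # counts only non-null entries
total_len = sales["sales"].size     # counts every entry, NaN included

# Average without .mean(): sum of non-null values over their count.
manual_mean = sales["sales"].sum() / sales["sales"].count()

# Mean sales per region; NaN rows are skipped inside each group.
region_mean = sales.groupby("region")["sales"].mean()

# Relative frequency of each product (the fractions sum to 1).
rel_freq = sales["product"].value_counts(normalize=True)

# Half of the rows, chosen at random but reproducibly.
half = sales.sample(frac=0.5, random_state=1)

# .head(n) and .iloc[:n] select the same rows.
same_rows = sales.head(4).equals(sales.iloc[:4])

print(non_null, total_len)
print(manual_mean)
print(region_mean)
print(rel_freq)
print(half)
print(same_rows)
```

Because .sum() and .count() both skip NaN by default, the manual average matches sales['sales'].mean().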

Advanced Questions

  1. Build a DataFrame with mixed numeric and categorical columns; use .describe(include='all') and interpret the difference in summary for each dtype.
  2. Create a time series DataFrame (dates + values). Compute mean, median, min, and max grouped by month using groupby.
  3. Generate a DataFrame with 100 rows containing random integers and NaNs. Compute null counts, fill missing values with column median, and re-check with .isna().sum().
  4. For a survey dataset with 'gender' and 'preference' columns, use .value_counts(normalize=True) on both to get distribution percentages.
  5. Create a DataFrame of marks and find students who scored above the column-wise mean using boolean filtering.
  6. Build a DataFrame with missing categorical values. Use .isna() mask with .value_counts() to compute missing percentage per column.
  7. Use .head(), .tail(), and .sample() together to design a quick exploration report: print top 3, bottom 3, and 5 random rows.
  8. Write a function that prints descriptive statistics (mean, median, mode, min, max, unique count) for each column in a DataFrame. Apply it on a test DataFrame.
  9. Construct a DataFrame and verify if any column has only one unique value using .nunique() == 1.
  10. Create a DataFrame with dept, salary, and bonus. Group by dept and compute aggregate stats: sum, mean, min, max, and count in one .agg() call.
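
Hint: the advanced questions combine grouping, missing-value handling, and uniqueness checks. The sketch below is one possible approach on a made-up employee table, not a model answer. The data values are assumptions, and the missing percentage is computed with .isna().mean() as a shortcut rather than the .isna() plus .value_counts() route the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data with a few missing values (illustrative only).
emp = pd.DataFrame({
    "dept": ["HR", "IT", "IT", "HR", "Sales", "IT"],
    "salary": [40000, 65000, 70000, 42000, 50000, np.nan],
    "bonus": [2000, 5000, np.nan, 2500, 3000, 4500],
})

# Several aggregates per department in a single .agg() call.
dept_stats = emp.groupby("dept")["salary"].agg(
    ["sum", "mean", "min", "max", "count"]
)

# Percentage of missing values per column: the mean of a boolean mask
# is the fraction of True values.
missing_pct = emp.isna().mean() * 100

# Fill numeric NaNs with each column's median, then re-check.
numeric_cols = ["salary", "bonus"]
filled = emp.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(
    filled[numeric_cols].median()
)
remaining_nulls = filled.isna().sum()

# True for any column that holds a single distinct value.
constant_cols = emp.nunique() == 1

print(dept_stats)
print(missing_pct)
print(remaining_nulls)
print(constant_cols)
```

Passing a list of function names to .agg() yields one output column per statistic, which keeps the grouped summary in a single table.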