Pandas Assignment - 2

Indexing, Selection & Slicing

Basic Questions

  1. Create a small DataFrame with columns ['id', 'name', 'age'] and 4 rows; access the 'name' column using df['name'] and print the first two names.
  2. Using the same DataFrame, select rows 0–1 and columns 'id' and 'age' with df.loc[0:1, ['id', 'age']].
  3. Select rows 0–1 and columns by integer positions 0 and 2 using df.iloc[0:2, [0, 2]].
  4. Access a single scalar by label using df.at[1, 'age']; then increment it by 1 and show the updated value.
  5. Access a single scalar by integer position using df.iat[2, 0]; replace it with a new integer and show the row.
  6. Compare label-based vs integer-based indexing by printing df.loc[0, 'name'] and df.iloc[0, 1] and confirming they point to the same value.
  7. Create a boolean mask df['age'] > 25 and use it to filter rows; print the filtered DataFrame.
  8. Filter rows where 'name' is in a list (e.g., ['A', 'C']) using boolean indexing; print the result.
  9. Use conditional filtering with two conditions: 'age' >= 20 and 'age' <= 30 combined with &; print the result.
  10. Add a column 'city' and filter rows where 'city' equals a chosen value using df.loc[df['city'] == 'X'].
  11. Select the last two rows using df.iloc[-2:]; print them.
  12. Reorder columns to ['name', 'age', 'id'] using label-based indexing; print column order.
  13. Slice by label with df.loc[1:3] and by position with df.iloc[1:4]; print both and note that loc includes the end label while iloc excludes the end position.
  14. From a DataFrame with numeric columns 'a', 'b', select only 'a' using df[['a']] (2D) and compare with df['a'] (1D) by printing .ndim.
  15. Use df.loc[:, 'age'] to select the 'age' Series and print its .head(1) and .index.dtype.
  16. Create a simple boolean mask from a string column using df['name'].str.startswith('A'); filter and print.
  17. Use df.loc[df['age'].between(18, 30), ['name', 'age']] to print names of young adults.
  18. Demonstrate chained vs single-step indexing: show why df['age'][0] is less preferred; instead use df.at[0, 'age'] and print both values.
  19. Build a DataFrame with a non-default index (e.g., ['r1', 'r2', 'r3', 'r4']); access the row with label 'r3' via df.loc['r3'].
  20. With the same labeled index, access the third row by position using df.iloc[2]; print both label- and position-based selections for comparison.
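
A minimal sketch of several of the basic tasks above, offered as one possible starting point rather than a model answer. The sample rows (ids, names, ages) and all variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the ids, names and ages are invented.
df = pd.DataFrame(
    {
        "id": [101, 102, 103, 104],
        "name": ["Asha", "Bob", "Chen", "Dia"],
        "age": [22, 31, 27, 19],
    }
)

# Label-based slicing: 0:1 includes BOTH end labels (rows 0 and 1).
by_label = df.loc[0:1, ["id", "age"]]

# Position-based slicing: 0:2 excludes the end position (rows 0 and 1 again).
by_position = df.iloc[0:2, [0, 2]]

# Scalar access: at works with labels, iat with integer positions.
df.at[1, "age"] += 1          # Bob's age goes from 31 to 32
first_id = df.iat[0, 0]

# Boolean masks: wrap each condition in parentheses when combining with &.
older = df[df["age"] > 25]
in_range = df[(df["age"] >= 20) & (df["age"] <= 30)]
named = df[df["name"].isin(["Asha", "Chen"])]

# Non-default index: loc looks up the label, iloc the position.
labeled = df.copy()
labeled.index = ["r1", "r2", "r3", "r4"]
row_by_label = labeled.loc["r3"]
row_by_position = labeled.iloc[2]

print(by_label)
print(older)
print(row_by_label)
```

Here df.loc[0:1] and df.iloc[0:2] return the same two rows only because the default index labels happen to match the positions; with the string labels 'r1' to 'r4' the two accessors are no longer interchangeable.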

Intermediate Questions

  1. Create a DataFrame indexed by dates (use three consecutive dates as index) and columns ['open', 'close', 'vol']; select the middle date row with loc and the last row with iloc.
  2. With the same DataFrame, select the 'open' and 'vol' columns for the first two rows using df.loc[df.index[:2], ['open', 'vol']].
  3. Build a DataFrame of 6 rows; set 'id' as index; select a block with loc[start_label:end_label, ['col1', 'col2']].
  4. Show mixed selections: use df.loc[df['col1'] > 50, 'col2'] to get a filtered Series; then assign 0 to those positions and print changes.
  5. Use iloc to select every other row (step slicing) and only the last two columns; print the result.
  6. Combine boolean conditions (df['a'] % 2 == 0) & (df['b'] > df['a']) to filter rows; print the subset.
  7. Use df.query('age >= 25 and city == "X"') to filter; then recreate the same filter with boolean indexing and confirm equality (index/values).
  8. Create a DataFrame with duplicate index labels; use df.loc['k'] to show all rows with that label; compare with df.iloc selection of the same positions.
  9. Use df.loc[:, df.dtypes.eq('int64')] to select only integer dtype columns; print selected columns.
  10. Given columns 'math', 'phy', 'chem', select rows where any score is below 40 using df[['math', 'phy', 'chem']].lt(40).any(axis=1); print the risky rows.
  11. Select rows where all scores are at least 60 using .ge(60).all(axis=1); print names and scores.
  12. Using a string column 'dept', filter rows not equal to 'HR' with ~(df['dept'] == 'HR'); print remaining.
  13. Slice rows by label range (inclusive) and columns by integer positions simultaneously: df.loc['r2':'r5', df.columns[[0, 2]]]; print.
  14. Use at to set a single scalar (e.g., row 'r3', column 'status') to 'active'; verify with loc.
  15. Use iat to increment a numeric cell by 10 at row position 2 and column position 1; print row before and after.
  16. Demonstrate safe reindexing: create a new index adding a label that doesn't exist; select with df.reindex(new_index).loc['missing'] and show NaN.
  17. For a DataFrame with an index of integers not starting at 0 (e.g., [10, 20, 30, 40]), show difference between df.loc[20] and df.iloc[1].
  18. Reproduce .between_time()-style logic: create a DataFrame with an hourly time index, then slice it with loc between '10:00' and '14:00'; print the slice.
  19. Show column slicing by name range using df.loc[:, 'b':'d'] versus explicit list df.loc[:, ['b', 'c', 'd']]; print both.
  20. Build a DataFrame with a categorical column 'grade' in ['A', 'B', 'C']; filter rows where 'grade' is in ['A', 'B'] using isin; print.
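
A rough sketch for a few of the intermediate tasks above (date index, query vs boolean indexing, any/all row filters, duplicate labels, reindexing, and a non-zero-based integer index). All data values and variable names are hypothetical and chosen only to keep the results easy to check by eye.

```python
import pandas as pd

# Date-indexed prices (invented numbers).
prices = pd.DataFrame(
    {"open": [10.0, 11.0, 12.0], "close": [10.5, 11.5, 12.5], "vol": [100, 200, 300]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
middle = prices.loc[pd.Timestamp("2024-01-02")]   # by label
last = prices.iloc[-1]                            # by position
first_two = prices.loc[prices.index[:2], ["open", "vol"]]

# People with scores (invented numbers).
people = pd.DataFrame(
    {
        "name": ["Asha", "Bob", "Chen", "Dia"],
        "age": [30, 22, 25, 41],
        "city": ["X", "X", "Y", "X"],
        "math": [35, 70, 65, 90],
        "phy": [50, 62, 38, 85],
        "chem": [60, 75, 70, 61],
    }
)
subjects = ["math", "phy", "chem"]

# The same filter written two ways.
via_query = people.query('age >= 25 and city == "X"')
via_mask = people[(people["age"] >= 25) & (people["city"] == "X")]

# Row-wise any/all over the score columns.
risky = people[people[subjects].lt(40).any(axis=1)]
strong = people[people[subjects].ge(60).all(axis=1)]

# Duplicate labels: loc returns every row carrying the label.
dup = pd.DataFrame({"v": [1, 2, 3]}, index=["k", "m", "k"])
all_k = dup.loc["k"]
same_positions = dup.iloc[[0, 2]]

# Integer index not starting at 0: loc uses labels, iloc uses positions.
offset = pd.DataFrame({"v": [1, 2, 3, 4]}, index=[10, 20, 30, 40])
by_label = offset.loc[20, "v"]
by_position = offset.iloc[1, 0]
try:
    offset.loc[1]              # there is no LABEL 1, so this raises KeyError
    label_1_exists = True
except KeyError:
    label_1_exists = False

# Reindexing with an unknown label yields NaN instead of raising.
missing_value = offset.reindex([10, 20, 99]).loc[99, "v"]

print(via_query)
print(risky)
print(missing_value)
```

One thing worth noticing when running this: after reindex, column 'v' becomes float because NaN cannot be stored in an int64 column.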

Advanced Questions

  1. Construct a MultiIndex DataFrame with levels ('city'=['DEL', 'MUM'], 'dept'=['HR', 'ENG']) and columns ['emp', 'salary']; select 'ENG' rows in 'DEL' using df.loc[('DEL', 'ENG'), :].
  2. From the same MultiIndex, select all rows for 'MUM' using partial indexing df.loc['MUM']; then only the 'salary' column for both depts with df.loc[('MUM', slice(None)), 'salary'].
  3. Create a 3-level MultiIndex (country, year, quarter); slice all rows for a given country and year across all quarters using pd.IndexSlice; print the block.
  4. Swap MultiIndex levels (.swaplevel(0, 1)) and sort the index; then select a label across the swapped level to show effect on selection.
  5. Use .xs() (cross-section) on a MultiIndex to get a single dept across all cities; compare with a loc selection that achieves the same result.
  6. Build a MultiIndex on columns (e.g., metrics ('min', 'max', 'avg') under subjects) and select a column slice using df.loc[:, (slice(None), 'avg')].
  7. Given a MultiIndex DataFrame of daily sales by ('store', 'date'), filter stores where any day's sales exceed 1000 using a groupwise boolean mask and df.loc[mask].
  8. Create a large DataFrame and compare performance of boolean filtering using chained & vs precomputed masks stored in variables; time both and print timings.
  9. Demonstrate aligned assignment on MultiIndex: select ('DEL', 'HR') rows and set 'salary' to df.loc[('DEL', 'HR'), 'salary'] * 1.05; verify only those rows changed.
  10. Convert a flat DataFrame to a MultiIndex by setting index to ['city', 'dept']; show selections with loc for single label, tuple label, label-slice, and list of labels; print each result.
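
A possible sketch for some of the MultiIndex tasks above (tuple and partial selection, pd.IndexSlice, .xs(), swapped levels, a groupwise mask, and a targeted assignment). The employee counts, salaries and sales figures are invented, and the variable names are only illustrative. The index is sorted up front because slicing a MultiIndex with slice(None) or pd.IndexSlice generally expects a sorted index.

```python
import pandas as pd

# Flat table turned into a (city, dept) MultiIndex; all values are invented.
flat = pd.DataFrame(
    {
        "city": ["DEL", "DEL", "MUM", "MUM"],
        "dept": ["HR", "ENG", "HR", "ENG"],
        "emp": [5, 12, 7, 20],
        "salary": [40000.0, 65000.0, 42000.0, 70000.0],
    }
)
df = flat.set_index(["city", "dept"]).sort_index()

# Full tuple key -> one row; single outer label -> partial indexing.
del_eng = df.loc[("DEL", "ENG"), :]
mum = df.loc["MUM"]
mum_salary = df.loc[("MUM", slice(None)), "salary"]

# One dept across all cities, written with IndexSlice and with xs.
idx = pd.IndexSlice
eng_loc = df.loc[idx[:, "ENG"], :]      # keeps both index levels
eng_xs = df.xs("ENG", level="dept")     # drops the 'dept' level

# Swapping levels makes 'dept' the outer level for partial indexing.
swapped = df.swaplevel(0, 1).sort_index()
hr_rows = swapped.loc["HR"]

# Targeted assignment: only the ('DEL', 'HR') salary should change.
before = df["salary"].copy()
df.loc[("DEL", "HR"), "salary"] *= 1.05
changed = df["salary"].ne(before)

# Groupwise mask: keep every row of stores with any day above 1000.
sales = pd.DataFrame(
    {
        "store": ["S1", "S1", "S2", "S2"],
        "date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
        "sales": [500, 1200, 300, 900],
    }
).set_index(["store", "date"])
mask = sales.groupby(level="store")["sales"].transform("max") > 1000
big_stores = sales.loc[mask]

print(del_eng)
print(eng_xs)
print(big_stores)
```

The transform("max") call is one way to build the groupwise mask: it broadcasts each store's maximum back onto every row of that store, so the resulting boolean Series lines up with the original index and can be passed straight to loc.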