Pandas Assignment– 2
Indexing, Selection & Slicing
Basic Questions
- Create a small DataFrame with columns [‘id’,’name’,’age’] and 4 rows; access the ‘name’ column using df[‘name’] and print the first two names.
- Using the same DataFrame, select rows 0–1 and columns ‘id’ and ‘age’ with df.loc[0:1, [‘id’,’age’]].
- Select rows 0–1 and columns by integer positions 0 and 2 using df.iloc[0:2, [0, 2]].
- Access a single scalar by label using df.at[1, ‘age’]; then increment it by 1 and show the updated value.
- Access a single scalar by integer position using df.iat[2, 0]; replace it with a new integer and show the row.
- Compare label-based vs integer-based indexing by printing df.loc[0, ‘name’] and df.iloc[0, 1] and confirming they point to the same value.
- Create a boolean mask df[‘age’] > 25 and use it to filter rows; print the filtered DataFrame.
- Filter rows where ‘name’ is in a list (e.g., [‘A’,’C’]) using boolean indexing; print the result.
- Use conditional filtering with two conditions: ‘age’ >= 20 and ‘age’ <= 30 combined with &; print the result.
- Add a column ‘city’ and filter rows where ‘city’ equals a chosen value using df.loc[df[‘city’] == ‘X’].
- Select the last two rows using df.iloc[-2:]; print them.
- Reorder columns to [‘name’,’age’,’id’] using label-based indexing; print column order.
- Use a slice of labels df.loc[1:3] and compare with df.iloc[1:4]; print both to note differences.
- From a DataFrame with numeric columns ‘a’,’b’, select only ‘a’ using df[[‘a’]] (2D) and compare with df[‘a’] (1D) by printing .ndim.
- Use df.loc[:, ‘age’] to select the ‘age’ Series and print its .head(1) and .index.dtype.
- Create a simple boolean mask from string column using df[‘name’].str.startswith(‘A’); filter and print.
- Use df.loc[df[‘age’].between(18, 30), [‘name’,’age’]] to print names of young adults.
- Demonstrate chained vs single-step indexing: show why df[‘age’][0] is less preferred; instead use df.at[0, ‘age’] and print both values.
- Build a DataFrame with a non-default index (e.g., [‘r1′,’r2′,’r3′,’r4’]); access the row with label ‘r3’ via df.loc[‘r3’].
- With the same labeled index, access the third row by position using df.iloc[2]; print both label- and position-based selections for comparison.
Intermediate Questions
- Create a DataFrame indexed by dates (use three consecutive dates as index) and columns [‘open’,’close’,’vol’]; select the middle date row with loc and the last row with iloc.
- With the same DataFrame, select the ‘open’ and ‘vol’ columns for the first two rows using df.loc[df.index[:2], [‘open’,’vol’]].
- Build a DataFrame of 6 rows; set ‘id’ as index; select a block with loc[start_label:end_label, [‘col1′,’col2’]].
- Show mixed selections: use df.loc[df[‘col1’] > 50, ‘col2’] to get a filtered Series; then assign 0 to those positions and print changes.
- Use iloc to select every other row (step slicing) and only the last two columns; print the result.
- Combine boolean conditions (df[‘a’] % 2 == 0) & (df[‘b’] > df[‘a’]) to filter rows; print the subset.
- Use df.query(‘age >= 25 and city == “X”‘) to filter; then recreate the same filter with boolean indexing and confirm equality (index/values).
- Create a DataFrame with duplicate index labels; use df.loc[‘k’] to show all rows with that label; compare with df.iloc selection of the same positions.
- Use df.loc[:, df.dtypes.eq(‘int64’)] to select only integer dtype columns; print selected columns.
- Given columns ‘math’,’phy’,’chem’, select rows where any score is below 40 using df[[‘math’,’phy’,’chem’]].lt(40).any(axis=1); print the risky rows.
- Select rows where all scores are at least 60 using .ge(60).all(axis=1); print names and scores.
- Using a string column ‘dept’, filter rows not equal to ‘HR’ with ~(df[‘dept’] == ‘HR’); print remaining.
- Slice rows by label range (inclusive) and columns by integer positions simultaneously: df.loc[‘r2′:’r5’, df.columns[[0,2]]]; print.
- Use at to set a single scalar (e.g., row ‘r3’, column ‘status’) to ‘active’; verify with loc.
- Use iat to increment a numeric cell by 10 at row position 2 and column position 1; print row before and after.
- Demonstrate safe reindexing: create a new index adding a label that doesn’t exist; select with df.reindex(new_index).loc[‘missing’] and show NaN.
- For a DataFrame with an index of integers not starting at 0 (e.g., [10,20,30,40]), show difference between df.loc[20] and df.iloc[1].
- Use .between_time() equivalent logic: create a time-indexed DataFrame (hours), then loc-slice between ’10:00′ and ’14:00′; print the slice.
- Show column slicing by name range using df.loc[:, ‘b’:’d’] versus explicit list df.loc[:, [‘b’,’c’,’d’]]; print both.
- Build a DataFrame with a categorical column ‘grade’ in [‘A’,’B’,’C’]; filter rows where ‘grade’ is in [‘A’,’B’] using isin; print.
Advanced Questions
- Construct a MultiIndex DataFrame with levels (‘city’=[‘DEL’,’MUM’], ‘dept’=[‘HR’,’ENG’]) and columns [’emp’,’salary’]; select ‘ENG’ rows in ‘DEL’ using df.loc[(‘DEL’,’ENG’), :].
- From the same MultiIndex, select all rows for ‘MUM’ using partial indexing df.loc[‘MUM’]; then only the ‘salary’ column for both depts with df.loc[(‘MUM’, slice(None)), ‘salary’].
- Create a 3-level MultiIndex (country, year, quarter); slice all rows for a given country and year across all quarters using pd.IndexSlice; print the block.
- Swap MultiIndex levels (.swaplevel(0,1)) and sort the index; then select a label across the swapped level to show effect on selection.
- Use .xs() (cross-section) on a MultiIndex to get a single dept across all city; compare with loc that achieves the same selection.
- Build a MultiIndex on columns (e.g., metrics (‘min’,’max’,’avg’) under subjects) and select a column slice using df.loc[:, (slice(None), ‘avg’)].
- Given a MultiIndex DataFrame of daily sales by (‘store’,’date’), filter stores where any day’s sales exceed 1000 using a groupwise boolean mask and df.loc[mask].
- Create a large DataFrame and compare performance of boolean filtering using chained & vs precomputed masks stored in variables; time both and print timings.
- Demonstrate aligned assignment on MultiIndex: select (‘DEL’,’HR’) rows and set ‘salary’ to df.loc[(‘DEL’,’HR’), ‘salary’] * 1.05; verify only those rows changed.
- Convert a flat DataFrame to a MultiIndex by setting index to [‘city’,’dept’]; show selections with loc for single label, tuple label, label-slice, and list of labels; print each result.