Pandas Assignment– 2

Indexing, Selection & Slicing

Basic Questions

Create a small DataFrame with columns [‘id’,’name’,’age’] and 4 rows; access the ‘name’ column using df[‘name’] and print the first two names.
Using the same DataFrame, select rows 0–1 and columns ‘id’ and ‘age’ with df.loc[0:1, [‘id’,’age’]].
Select rows 0–1 and columns by integer positions 0 and 2 using df.iloc[0:2, [0, 2]].
Access a single scalar by label using df.at[1, ‘age’]; then increment it by 1 and show the updated value.
Access a single scalar by integer position using df.iat[2, 0]; replace it with a new integer and show the row.
Compare label-based vs integer-based indexing by printing df.loc[0, ‘name’] and df.iloc[0, 1] and confirming they point to the same value.
Create a boolean mask df[‘age’] > 25 and use it to filter rows; print the filtered DataFrame.
Filter rows where ‘name’ is in a list (e.g., [‘A’,’C’]) using boolean indexing; print the result.
Use conditional filtering with two conditions: ‘age’ >= 20 and ‘age’ <= 30 combined with &; print the result.
Add a column ‘city’ and filter rows where ‘city’ equals a chosen value using df.loc[df[‘city’] == ‘X’].
Select the last two rows using df.iloc[-2:]; print them.
Reorder columns to [‘name’,’age’,’id’] using label-based indexing; print column order.
Use a slice of labels df.loc[1:3] and compare with df.iloc[1:4]; print both to note differences.
From a DataFrame with numeric columns ‘a’,’b’, select only ‘a’ using df[[‘a’]] (2D) and compare with df[‘a’] (1D) by printing .ndim.
Use df.loc[:, ‘age’] to select the ‘age’ Series and print its .head(1) and .index.dtype.
Create a simple boolean mask from string column using df[‘name’].str.startswith(‘A’); filter and print.
Use df.loc[df[‘age’].between(18, 30), [‘name’,’age’]] to print names of young adults.
Demonstrate chained vs single-step indexing: show why df[‘age’][0] is less preferred; instead use df.at[0, ‘age’] and print both values.
Build a DataFrame with a non-default index (e.g., [‘r1′,’r2′,’r3′,’r4’]); access the row with label ‘r3’ via df.loc[‘r3’].
With the same labeled index, access the third row by position using df.iloc[2]; print both label- and position-based selections for comparison.

Intermediate Questions

Create a DataFrame indexed by dates (use three consecutive dates as index) and columns [‘open’,’close’,’vol’]; select the middle date row with loc and the last row with iloc.
With the same DataFrame, select the ‘open’ and ‘vol’ columns for the first two rows using df.loc[df.index[:2], [‘open’,’vol’]].
Build a DataFrame of 6 rows; set ‘id’ as index; select a block with loc[start_label:end_label, [‘col1′,’col2’]].
Show mixed selections: use df.loc[df[‘col1’] > 50, ‘col2’] to get a filtered Series; then assign 0 to those positions and print changes.
Use iloc to select every other row (step slicing) and only the last two columns; print the result.
Combine boolean conditions (df[‘a’] % 2 == 0) & (df[‘b’] > df[‘a’]) to filter rows; print the subset.
Use df.query(‘age >= 25 and city == “X”‘) to filter; then recreate the same filter with boolean indexing and confirm equality (index/values).
Create a DataFrame with duplicate index labels; use df.loc[‘k’] to show all rows with that label; compare with df.iloc selection of the same positions.
Use df.loc[:, df.dtypes.eq(‘int64’)] to select only integer dtype columns; print selected columns.
Given columns ‘math’,’phy’,’chem’, select rows where any score is below 40 using df[[‘math’,’phy’,’chem’]].lt(40).any(axis=1); print the risky rows.
Select rows where all scores are at least 60 using .ge(60).all(axis=1); print names and scores.
Using a string column ‘dept’, filter rows not equal to ‘HR’ with ~(df[‘dept’] == ‘HR’); print remaining.
Slice rows by label range (inclusive) and columns by integer positions simultaneously: df.loc[‘r2′:’r5’, df.columns[[0,2]]]; print.
Use at to set a single scalar (e.g., row ‘r3’, column ‘status’) to ‘active’; verify with loc.
Use iat to increment a numeric cell by 10 at row position 2 and column position 1; print row before and after.
Demonstrate safe reindexing: create a new index adding a label that doesn’t exist; select with df.reindex(new_index).loc[‘missing’] and show NaN.
For a DataFrame with an index of integers not starting at 0 (e.g., [10,20,30,40]), show difference between df.loc[20] and df.iloc[1].
Use .between_time() equivalent logic: create a time-indexed DataFrame (hours), then loc-slice between ’10:00′ and ’14:00′; print the slice.
Show column slicing by name range using df.loc[:, ‘b’:’d’] versus explicit list df.loc[:, [‘b’,’c’,’d’]]; print both.
Build a DataFrame with a categorical column ‘grade’ in [‘A’,’B’,’C’]; filter rows where ‘grade’ is in [‘A’,’B’] using isin; print.

Advanced Questions

Construct a MultiIndex DataFrame with levels (‘city’=[‘DEL’,’MUM’], ‘dept’=[‘HR’,’ENG’]) and columns [’emp’,’salary’]; select ‘ENG’ rows in ‘DEL’ using df.loc[(‘DEL’,’ENG’), :].
From the same MultiIndex, select all rows for ‘MUM’ using partial indexing df.loc[‘MUM’]; then only the ‘salary’ column for both depts with df.loc[(‘MUM’, slice(None)), ‘salary’].
Create a 3-level MultiIndex (country, year, quarter); slice all rows for a given country and year across all quarters using pd.IndexSlice; print the block.
Swap MultiIndex levels (.swaplevel(0,1)) and sort the index; then select a label across the swapped level to show effect on selection.
Use .xs() (cross-section) on a MultiIndex to get a single dept across all city; compare with loc that achieves the same selection.
Build a MultiIndex on columns (e.g., metrics (‘min’,’max’,’avg’) under subjects) and select a column slice using df.loc[:, (slice(None), ‘avg’)].
Given a MultiIndex DataFrame of daily sales by (‘store’,’date’), filter stores where any day’s sales exceed 1000 using a groupwise boolean mask and df.loc[mask].
Create a large DataFrame and compare performance of boolean filtering using chained & vs precomputed masks stored in variables; time both and print timings.
Demonstrate aligned assignment on MultiIndex: select (‘DEL’,’HR’) rows and set ‘salary’ to df.loc[(‘DEL’,’HR’), ‘salary’] * 1.05; verify only those rows changed.
Convert a flat DataFrame to a MultiIndex by setting index to [‘city’,’dept’]; show selections with loc for single label, tuple label, label-slice, and list of labels; print each result.