Pandas Assignments — Load. Transform. Analyze.

Move from beginner data wrangling to production-ready analytics. Each topic-focused set includes 20 Basic, 20 Intermediate, and 10 Advanced questions so learners can practice deliberately and level up fast.

Your eight attached assignments span: basics & I/O, indexing/selection, exploration & summarization, cleaning & preprocessing, data manipulation (sort/merge/group/pivot/reshape), time series, advanced analysis (categoricals, MultiIndex, rolling/expanding, pipe/transform), and exporting & optimization.

Why practice with these assignments?

  • Go beyond theory—ingest CSV/JSON/Excel/SQL, slice & filter, group & pivot, and build tidy data pipelines.
  • Develop real habits: null handling, dtype control, performance profiling, and memory-savvy workflows.
  • Get job-ready with time series, MultiIndex, categoricals, and export/round-trip reliability.

How it works

  • Open any assignment and attempt questions in order: Basic → Intermediate → Advanced.
  • Run in a notebook or REPL; print shape/dtypes and sample outputs to verify steps.
  • Compare multiple approaches (e.g., query vs boolean masks, eval vs direct ops); record timings and notes.
  • Keep a short “what I learned” note per task—syntax + reasoning + gotchas.
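The compare-approaches step can be sketched like this — the frame and column names below are invented purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Illustrative frame; any numeric columns will do
df = pd.DataFrame({
    "price": rng.uniform(1, 100, 10_000),
    "qty": rng.integers(1, 50, 10_000),
})

# Approach 1: boolean mask
mask_result = df[(df["price"] > 50) & (df["qty"] < 10)]

# Approach 2: query string -- same logic, often more readable
query_result = df.query("price > 50 and qty < 10")

# The two approaches must agree exactly; time each with %timeit in a notebook
assert mask_result.equals(query_result)
print(len(mask_result), "rows matched")
```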

What you’ll achieve

  • Fluency with Series/DataFrame creation, I/O (CSV/JSON/Excel/HTML/SQL), and dtype management.
  • Confident indexing/selection: loc/iloc/at/iat, boolean masks, label vs position, and safe assignments.
  • Practical describe/groupby/agg/value_counts exploration skills.
  • Reliable cleaning: missing values, duplicates, renaming, string ops, vectorized transforms.
  • Powerful manipulation: sort/reindex, concat/merge/join, groupby, pivot/pivot_table, melt/stack/unstack.
  • Solid time series: to_datetime, resample/rolling/expanding, lags & returns, business frequencies.
  • Advanced analysis with categoricals, MultiIndex, crosstabs, bins, transform/pipe.
  • Performance & export proficiency: chunksize, dtype downcasting, eval/query, and round-trips to CSV/Excel/SQL/HTML/JSON.

Browse the Assignments

  • Pandas Basics & I/O — Series/DataFrame creation; CSV/JSON/Excel/HTML/SQL round-trips; memory introspection.
  • Indexing, Selection & Slicing — loc/iloc/at/iat, masks, label vs position, chained vs single-step, non-default indexes.
  • Data Exploration & Summarization — describe, counts/uniques, custom percentiles, group summaries, sampling.
  • Data Cleaning & Preprocessing — dropna/fillna, duplicates, astype, string ops, apply/pipe/map (applymap is deprecated in recent Pandas).
  • Data Manipulation — sort/reindex, concat/merge/join, groupby/agg, pivot/pivot_table, melt/stack/unstack.
  • Time Series & Date Handling — parsing dates, frequencies, resample/rolling/expanding, lags, monthly/quarterly analyses.
  • Advanced Data Analysis — categoricals, MultiIndex, crosstab, cut/qcut, transform, chaining with pipe.
  • Exporting & Optimization — CSV/Excel/JSON/HTML/SQL; chunksize pipelines; dtype downcast; eval/query; memory & speed tests.

Tips for success

  • Print df.shape/df.dtypes early; most bugs are shape/dtype mismatches.
  • Prefer vectorized ops; benchmark vs loops and keep notes.
  • Be explicit with dtypes on load (dtype=…, parse_dates=…) and when converting (astype, categoricals).
  • Use loc for assignment; avoid chained indexing.
  • For large data, stream with chunksize and downcast early; measure memory with memory_usage(deep=True).
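Several of these tips combine into one short loading sketch — the tiny inline CSV here stands in for a real file:

```python
import io
import pandas as pd

csv = io.StringIO("id,city,signup\n1,NYC,2024-01-05\n2,LA,2024-02-10\n3,NYC,2024-03-15\n")

# Be explicit about dtypes and date parsing at load time
df = pd.read_csv(csv, dtype={"id": "int32", "city": "category"}, parse_dates=["signup"])
print(df.shape, df.dtypes.to_dict())  # sanity-check shape/dtypes early

# Single-step .loc assignment -- no chained indexing
df.loc[df["city"] == "NYC", "region"] = "East"

# Measure real memory, including object payloads
print(df.memory_usage(deep=True).sum(), "bytes")
```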

Ready to build real confidence in Pandas? Pick a set below and start solving!

FAQs

Q1. Which Python/Pandas versions should I use?
Any recent Python 3 with a current Pandas release works. All tasks use mainstream APIs such as I/O, groupby, pivot, resample, and categoricals supported in modern versions.

Q2. How do I verify my answers?
Check shapes and dtypes, preview rows, and compare results using equals, all(), or assert_frame_equal for merges and round-trips. Many tasks include explicit checks.
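A minimal verification pattern for a CSV round-trip might look like this (an in-memory buffer stands in for a file on disk):

```python
import io
import pandas as pd
from pandas.testing import assert_frame_equal

a = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})

# Round-trip through CSV and check it survives intact
buf = io.StringIO()
a.to_csv(buf, index=False)
buf.seek(0)
b = pd.read_csv(buf)

assert a.shape == b.shape
assert (a.dtypes == b.dtypes).all()
assert_frame_equal(a, b)   # raises with a precise diff on any mismatch
print("round-trip verified")
```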

Q3. Why are my string columns object and not “string”?
Historically, DataFrames defaulted to object dtype for strings. Convert to Pandas’ string dtype for consistency, memory efficiency, and clearer semantics.
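The conversion itself is a one-liner; here is a small illustration:

```python
import pandas as pd

s = pd.Series(["alpha", "beta", None])
# On NumPy-backed defaults (Pandas 2.x and earlier), this prints `object`
print(s.dtype)

# Opt in to the dedicated string dtype
t = s.astype("string")
print(t.dtype)            # string
print(t.isna().sum())     # missing values become a single <NA> marker
```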

Q4. What’s the safest way to handle missing data?
Quantify with .isna().sum(), then choose a strategy such as dropna, mean/median/mode fills, or forward/backward fill. Exercises demonstrate each approach.
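The quantify-then-choose workflow, shown on a toy column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temp": [20.0, np.nan, 22.0, np.nan, 25.0]})

# 1. Quantify the gaps first
print(df.isna().sum())

# 2. Pick a strategy per column
dropped = df.dropna()                       # remove incomplete rows
mean_filled = df.fillna(df["temp"].mean())  # impute with the mean
ffilled = df.ffill()                        # carry the last value forward

print(len(dropped), mean_filled["temp"].tolist(), ffilled["temp"].tolist())
```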

Q5. When should I use merge vs join vs concat?
Use merge for key-based column joins, join for index-based joins, and concat to stack or align along rows and columns. You’ll practice all three with edge cases.
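A compact side-by-side, using made-up orders/customers frames:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": ["a", "b", "a"]})
customers = pd.DataFrame({"cust": ["a", "b"], "city": ["NYC", "LA"]})

# merge: key-based join on columns
merged = orders.merge(customers, on="cust", how="left")

# join: index-based join
joined = orders.set_index("cust").join(customers.set_index("cust"))

# concat: stack frames along an axis
stacked = pd.concat([orders, orders], ignore_index=True)

print(merged.shape, joined.shape, stacked.shape)
```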

Q6. loc or iloc—which is better?
Use loc for label-based selection and assignment, and iloc for position-based slices. Avoid chained indexing to prevent unintended copies.
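Both selectors on the same frame (labels are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

print(df.loc["b", "score"])    # label-based -> 20
print(df.iloc[1, 0])           # position-based -> 20

# Safe single-step assignment with loc: rows and column in one call
df.loc[df["score"] > 15, "flag"] = True
print(df["flag"].tolist())
```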

Q7. How do I speed up large workflows?
Load only needed columns with usecols, stream with chunksize, downcast dtypes, convert low-cardinality text to category, and apply eval or query where useful.
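A sketch of the load-lean-and-stream pattern, with an in-memory CSV standing in for a large file:

```python
import io
import pandas as pd

csv = io.StringIO("id,grade,note\n" + "\n".join(f"{i},{'ABC'[i % 3]},x" for i in range(1000)))

# Load only the needed columns, with lean dtypes
df = pd.read_csv(csv, usecols=["id", "grade"], dtype={"id": "int32", "grade": "category"})

# Downcasting plus category can shrink memory dramatically
print(df.memory_usage(deep=True).sum(), "bytes")

# Stream in chunks instead of loading everything at once
csv.seek(0)
total = sum(len(chunk) for chunk in pd.read_csv(csv, chunksize=250))
print(total, "rows processed")
```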

Q8. What’s the difference between resample and rolling?
resample changes frequency (e.g., daily to weekly) and aggregates, while rolling computes windowed statistics at the original frequency. Assignments include comparisons.
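The contrast in one snippet, on synthetic daily data:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=14, freq="D")
s = pd.Series(np.arange(14.0), index=idx)

# resample: change frequency (daily -> weekly) and aggregate
weekly = s.resample("W").mean()

# rolling: windowed statistic at the original daily frequency
smooth = s.rolling(window=7).mean()

print(len(weekly), len(smooth))   # weekly has fewer points; rolling keeps all 14
```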

Q9. How do I avoid “SettingWithCopy” warnings?
Use single-step indexing such as df.loc[mask, 'col'] = … and return new objects from transforms. Assignments highlight safe coding patterns.
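The safe pattern next to the risky one (data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"score": [40, 80, 60], "passed": [False, False, False]})

# Risky: chained indexing such as df[df["score"] >= 50]["passed"] = True
# may write to a temporary copy and trigger SettingWithCopyWarning.

# Safe: one .loc call selects rows and the column in a single step
df.loc[df["score"] >= 50, "passed"] = True
print(df["passed"].tolist())   # [False, True, True]
```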

Q10. How much time should I allocate per assignment?
Plan 10–60 minutes per difficulty band. Time series, reshaping, and optimization exercises may take longer—prioritize correctness and clarity.