Seaborn Assignment 6

Statistical Plots

Basic Questions

  1. Using the tips dataset, draw a linear regression fit of total_bill vs tip with sns.regplot; keep the default confidence interval.
  2. Using tips, draw sns.regplot with no scatter points (scatter=False) to show only the regression line.
  3. With tips, draw sns.regplot and set ci=95; then repeat with ci=None on a second axes.
  4. Using tips, fit a polynomial regression of order 2 for total_bill vs tip using sns.regplot(order=2).
  5. Plot a residual plot (sns.residplot) for total_bill vs tip on tips.
  6. Using tips, show a robust linear fit with sns.regplot(robust=True) for total_bill vs tip.
  7. Using tips, create a binary 0/1 column from smoker, then fit a logistic regression with sns.regplot(logistic=True) predicting that column from total_bill.
  8. Using tips, draw sns.lmplot of total_bill vs tip with hue="sex" (separate lines per group).
  9. With tips, draw sns.lmplot of total_bill vs tip and show separate confidence bands for Lunch vs Dinner using hue="time".
  10. Using penguins, draw sns.regplot of bill_length_mm vs bill_depth_mm; set truncate=False so the fitted line extends to the x-axis limits.
  11. Using penguins, add jitter along x in sns.regplot with x_jitter=.2 to reduce overlap.
  12. Create a faceted linear fit with sns.lmplot(col="sex") on penguins for bill_length_mm vs bill_depth_mm.
  13. Using tips, plot sns.residplot with lowess=True to visualize nonlinearity in residuals.
  14. Using synthetic data (x linear, y with noise), draw sns.regplot and show the slope and intercept estimated via np.polyfit in the plot title.
  15. Using tips, plot a cubic fit (order=3) and compare it visually to a linear fit (order=1) in two subplots.
  16. Using tips, draw sns.lmplot with col="day" to see if slopes vary by day.
  17. Using penguins, draw sns.regplot and set custom scatter transparency scatter_kws={"alpha": 0.5}.
  18. Using tips, draw sns.lmplot with markers=["o", "s"] for hue="sex" and palette="deep".
  19. Using tips, draw sns.regplot and disable the confidence interval (ci=None) while increasing line width.
  20. Save any one regression figure as PNG and PDF (300 DPI).
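
As a starting point for question 14, the following is a minimal sketch rather than a reference solution. It assumes synthetic data with a true slope of 2.0 and intercept of 1.0 (values chosen purely for illustration) and uses the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: assumed true slope 2.0 and intercept 1.0 (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Degree-1 least-squares fit; np.polyfit returns the highest power first
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(data=df, x="x", y="y", ci=95, scatter_kws={"alpha": 0.5}, ax=ax)
ax.set_title(f"y = {slope:.2f}x + {intercept:.2f}")
```

For question 20, the same figure can be written out with fig.savefig("fit.png", dpi=300) and fig.savefig("fit.pdf"); the file names here are placeholders.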

Intermediate Questions

  1. Using tips, compare linear vs polynomial (order=2) fits in a 1×2 figure; keep identical axes limits for fair comparison.
  2. With tips, draw sns.lmplot of total_bill vs tip with hue="smoker" and col="time"; ensure independent regression lines per facet.
  3. Using penguins, build a robust regression (robust=True) vs a standard OLS on the same axes (two regplot calls with different linestyles).
  4. On tips, set the number of bootstrap resamples to n_boot=5000 in sns.regplot and compare the CI width to the default (n_boot=1000) in two subplots.
  5. Create a binary target from tips: high_tip = (tip/total_bill >= 0.20); fit logistic regression curve vs total_bill using sns.regplot(logistic=True).
  6. Using tips, compare residuals by sex: sns.residplot has no hue parameter, so draw one residual plot per group on two stacked axes and label each clearly.
  7. Using penguins, facet a polynomial fit with sns.lmplot(col="species", order=2); keep shared y-limits.
  8. With synthetic data containing an outlier cluster, compare OLS (robust=False) vs robust (robust=True) in one axes; annotate the outlier region.
  9. Using tips, compute tip percentage and regress tip_pct on total_bill; show CI bands and annotate slope on the axes.
  10. Using tips, plot sns.lmplot with hue="day" and set legend_out=False (in seaborn 0.12+, pass it through facet_kws); reposition the legend inside the top-left corner.
  11. Using penguins, add scatter_kws={"s": 40, "edgecolor": "k"} to improve readability; keep CI visible.
  12. Using tips, build a lowess-smoothed residual diagnostic for a quadratic fit with sns.residplot(order=2, lowess=True); describe the trend in caption text on the figure.
  13. With tips, draw separate logistic curves for sex using sns.lmplot(logistic=True, hue="sex") predicting high_tip from total_bill.
  14. Using synthetic data with heteroscedastic noise (variance grows with x), draw regplot and then plot residuals vs fitted values to diagnose heteroscedasticity.
  15. Using tips, limit the fit line to the data range via truncate=True; compare to truncate=False.
  16. Using penguins, fit polynomial order=3 and overlay a piecewise visual (manually plot two linear fits over ranges) to compare shapes.
  17. Using tips, demonstrate CI styling: one regplot with the default confidence band, and another emulating error bars by manually drawing ax.errorbar on aggregated bins (regplot has no err_style parameter).
  18. Using tips, add x_bins=10 in regplot to show binned means overlay; compare to standard scatter.
  19. Using penguins, build a 2×2 grid: (A) OLS fit, (B) Robust fit, (C) Residuals OLS, (D) Residuals Robust; harmonize titles and axes.
  20. Using tips, compute predictions from a simple np.poly1d fit and overlay them on the regplot line; verify alignment.
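
The alignment check in question 20 can be sketched as follows. This is an illustration under stated assumptions, not a definitive solution: the data frame is a synthetic stand-in whose column names mirror tips (swap in sns.load_dataset("tips") when network access is available), and the code assumes that with default settings the first Line2D on the axes is the regression line drawn by regplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (assumption: same column names as the real dataset)
rng = np.random.default_rng(42)
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + 1.0 + rng.normal(scale=1.0, size=200)
df = pd.DataFrame({"total_bill": total_bill, "tip": tip})

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(data=df, x="total_bill", y="tip", ci=None, ax=ax)

# Assumption: the first line on the axes is the fitted regression line
line_x, line_y = ax.lines[0].get_data()

# Independent least-squares fit, evaluated on the same x grid
model = np.poly1d(np.polyfit(total_bill, tip, deg=1))
pred_y = model(np.asarray(line_x))

# Overlay the np.poly1d predictions as a dashed line for visual comparison
ax.plot(line_x, pred_y, linestyle="--", color="black", label="np.poly1d fit")
ax.legend()

# Numerical check that both lines coincide
max_gap = float(np.max(np.abs(np.asarray(line_y) - pred_y)))
aligned = bool(np.allclose(line_y, pred_y, atol=1e-6))
print(f"largest vertical gap: {max_gap:.2e}, aligned: {aligned}")
```

Reading the line back from ax.lines depends on how regplot happens to draw its artists, so treat it as a convenience for checking rather than a documented interface.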

Advanced Questions

  1. Predictive Diagnostics Suite (OLS): On tips, create a 2×2 panel:

    • A: OLS regplot with CI

    • B: Residuals vs Fitted (residplot with manually computed fitted values passed as x)

    • C: QQ-style check (plot residual quantiles vs normal quantiles using Matplotlib)

    • D: Scale–Location (|resid|^0.5 vs fitted)
      Keep one theme and export PNG + PDF.

  2. Robust vs OLS Case Study: Inject synthetic outliers into tips (add 5 extreme total_bill/tip points). Build a 1×3 figure: OLS fit, Robust fit, Residuals comparison. Discuss impact with axis captions.

  3. Logistic Regression Dashboard: Construct a binary target high_tip and create a 2×2 figure:

    • A: logistic regplot curve vs total_bill

    • B: predicted probability vs actual class (jittered scatter)

    • C: residual-like plot using deviance residuals (compute manually)

    • D: threshold sweep plot (precision/recall vs threshold using Matplotlib)

  4. Polynomial Model Selection: Generate synthetic nonlinear data. Compare order=1,2,3,4 in a 2×2 lmplot/subplot grid. Annotate each panel with cross-validated MSE (computed) as figure text.

  5. Grouped Trends with Uncertainty: With penguins, create lmplot (hue=species, col=sex) and ensure consistent y-limits. Add per-panel slope text extracted from np.polyfit.

  6. Facet Residual Diagnostics: Build a FacetGrid over tips by time (Lunch/Dinner):

    • Top row: regplot fits

    • Bottom row: residplot for the same panels
      Share x/y axes and add a suptitle.

  7. Heteroscedasticity Illustration: Simulate y = x + ε·x where ε ~ N(0, σ²). Show OLS fit with CI, then plot residuals vs x. Add a fitted LOWESS line (via sns.regplot(lowess=True, scatter=False) on the residual panel).

  8. Logit with Multiple Groups: Build lmplot(logistic=True) predicting smoker from total_bill with hue="sex" and col="day". Ensure balanced legends and add a single shared legend below using figure handles.

  9. End-to-End Publishing Figure: Create a 3-panel predictive story on tips:

    • Panel 1: polynomial fit (order=2) with CI

    • Panel 2: residual diagnostics

    • Panel 3: logistic high_tip curve
      Apply a publishing style (font scale, grid, palette) and export as a 300 DPI PNG and as an SVG.

  10. Model Drift Scenario (Synthetic): Simulate two periods with different slopes. Create a side-by-side regplot comparison and a residuals delta plot. Annotate slope change and CI overlap.
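
For the cross-validated MSE requested in Advanced question 4, one possible approach is a plain k-fold loop around np.polyfit. The helper name cv_mse, the choice of 5 folds, and the quadratic ground truth below are all assumptions made for illustration; the assignment does not prescribe them.

```python
import numpy as np

def cv_mse(x, y, order, k=5, seed=0):
    """Hypothetical helper: k-fold cross-validated MSE of a polynomial fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))      # shuffle before splitting into folds
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], deg=order)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic nonlinear data (assumed quadratic ground truth, illustration only)
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 120)
y = 0.5 * x**2 - x + rng.normal(scale=0.5, size=x.size)

# One score per candidate polynomial order
scores = {order: cv_mse(x, y, order) for order in (1, 2, 3, 4)}
for order, mse in scores.items():
    print(f"order={order}: CV MSE = {mse:.3f}")
```

Each score can then be placed on its panel with ax.text(0.05, 0.95, f"CV MSE = {mse:.3f}", transform=ax.transAxes, va="top").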