Seaborn Assignment 6
Statistical Plots
Basic Questions
- Using tips, draw a linear regression fit of total_bill vs tip with sns.regplot; keep default confidence interval.
- Using tips, draw sns.regplot with no scatter points (scatter=False) to show only the regression line.
- With tips, draw sns.regplot and set ci=95; then repeat with ci=None on a second axes.
- Using tips, fit a polynomial regression of order 2 for total_bill vs tip using sns.regplot(order=2).
- Plot a residual plot (sns.residplot) for total_bill vs tip on tips.
- Using tips, show a robust linear fit with sns.regplot(robust=True) for total_bill vs tip.
- Using tips, create a binary 0/1 column from smoker (Yes = 1, No = 0), then fit a logistic regression predicting that column from total_bill with sns.regplot(logistic=True).
- Using tips, draw sns.lmplot of total_bill vs tip with hue="sex" (separate lines per group).
- With tips, draw sns.lmplot of total_bill vs tip and show separate confidence bands for Lunch vs Dinner using hue="time".
- Using penguins, draw sns.regplot of bill_length_mm vs bill_depth_mm; set truncate=False so the regression line extends to the x-axis limits instead of stopping at the data range.
- Using penguins, add jitter along x in sns.regplot with x_jitter=.2 to reduce overlap.
- Create a faceted linear fit with sns.lmplot(col="sex") on penguins for bill_length_mm vs bill_depth_mm.
- Using tips, plot sns.residplot with lowess=True to visualize nonlinearity in residuals.
- Using synthetic data (x linear, y with noise), draw sns.regplot and show the slope and intercept estimated via np.polyfit in the plot title.
- Using tips, plot a polynomial fit with order=3 and compare it visually to order=1 (two subplots).
- Using tips, draw sns.lmplot with col="day" to see if slopes vary by day.
- Using penguins, draw sns.regplot and set custom scatter transparency with scatter_kws={"alpha": 0.5}.
- Using tips, draw sns.lmplot with markers=["o", "s"] for hue="sex" and palette="deep".
- Using tips, draw sns.regplot and disable the confidence interval (ci=None) while increasing line width.
- Save any one regression figure as PNG and PDF (300 DPI).
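As a starting point for the synthetic-data and export questions in this list, here is a minimal sketch. It assumes seaborn, Matplotlib, and NumPy are installed. The true slope (2.0), intercept (1.0), random seed, and output file names are arbitrary choices made for illustration and are not part of the assignment.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data: x is linear, y adds Gaussian noise (values are assumptions).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Degree-1 least-squares fit; np.polyfit returns the highest power first.
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(x=x, y=y, ax=ax, scatter_kws={"alpha": 0.5})
ax.set_title(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Save the same figure as PNG and PDF at 300 DPI (paths are hypothetical).
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "regression.png")
pdf_path = os.path.join(out_dir, "regression.pdf")
fig.savefig(png_path, dpi=300, bbox_inches="tight")
fig.savefig(pdf_path, dpi=300, bbox_inches="tight")
```

The same pattern should carry over to tips or penguins by passing data=, x=, and y= column names to sns.regplot instead of arrays.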
Intermediate Questions
- Using tips, compare linear vs polynomial (order=2) fits in a 1×2 figure; keep identical axes limits for fair comparison.
- With tips, draw sns.lmplot of total_bill vs tip with hue="smoker" and col="time"; ensure independent regression lines per facet.
- Using penguins, build a robust regression (robust=True) vs a standard OLS on the same axes (two regplot calls with different linestyles).
- On tips, set the number of bootstrap resamples with n_boot=5000 in sns.regplot and compare the CI width to the default (two subplots).
- Create a binary target from tips: high_tip = (tip/total_bill >= 0.20); fit logistic regression curve vs total_bill using sns.regplot(logistic=True).
- Using tips, draw residual plots grouped by sex. sns.residplot has no hue parameter, so as a workaround draw one residual plot per group on two stacked axes and label each clearly.
- Using penguins, facet a polynomial fit with sns.lmplot(col=”species”, order=2); keep shared y-limits.
- With synthetic data containing an outlier cluster, compare OLS (robust=False) vs robust (robust=True) in one axes; annotate the outlier region.
- Using tips, compute tip percentage and regress tip_pct on total_bill; show CI bands and annotate slope on the axes.
- Using tips, plot sns.lmplot with hue="day" and set legend_out=False (in recent seaborn releases this is passed as facet_kws={"legend_out": False}); reposition the legend inside the axes at the top left.
- Using penguins, add scatter_kws={"s": 40, "edgecolor": "k"} to improve readability; keep CI visible.
- Using tips, build a lowess-smoothed residual diagnostic for a quadratic fit with sns.residplot(order=2, lowess=True); describe the trend in caption text on the figure.
- With tips, draw separate logistic curves for sex using sns.lmplot(logistic=True, hue=”sex”) predicting high_tip from total_bill.
- Using synthetic data with heteroscedastic noise (variance grows with x), draw regplot and then plot residuals vs fitted values to diagnose heteroscedasticity.
- Using tips, limit the fit line to the data range via truncate=True; compare to truncate=False.
- Using penguins, fit polynomial order=3 and overlay a piecewise visual (manually plot two linear fits over ranges) to compare shapes.
- Using tips, demonstrate CI styling: one regplot with its default shaded CI band (regplot has no err_style parameter; that option belongs to sns.lineplot) and another emulating bars by manually drawing errorbar on aggregated bins.
- Using tips, add x_bins=10 in regplot to show binned means overlay; compare to standard scatter.
- Using penguins, build a 2×2 grid: (A) OLS fit, (B) Robust fit, (C) Residuals OLS, (D) Residuals Robust; harmonize titles and axes.
- Using tips, compute predictions from a simple np.poly1d fit and overlay them on the regplot line; verify alignment.
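The last question in this list (overlaying np.poly1d predictions on the regplot line) can be sketched as follows. This is an illustration under assumptions: the data are synthetic stand-ins for total_bill and tip, and the check relies on regplot drawing its scatter as a collection so that the fitted line is the first Line2D on the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic stand-ins for total_bill (x) and tip (y); values are assumptions.
rng = np.random.default_rng(1)
x = rng.uniform(3, 50, size=200)
y = 0.1 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(x=x, y=y, ax=ax, ci=None, scatter_kws={"alpha": 0.4})

# The scatter points live in ax.collections, so the fit is the first line.
line_x, line_y = ax.lines[0].get_data()

# Independent degree-1 fit, evaluated on the same x grid seaborn used.
model = np.poly1d(np.polyfit(x, y, 1))
pred = model(line_x)
ax.plot(line_x, pred, linestyle="--", color="k", label="np.poly1d")
ax.legend()

# Largest vertical gap between the two lines; should be numerically tiny.
max_gap = float(np.max(np.abs(pred - line_y)))
ax.set_title(f"max |regplot - poly1d| = {max_gap:.2e}")
```

If the gap is not close to zero, the usual cause is a mismatch in settings such as order, robust, or logx between the two fits.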
Advanced Questions
Predictive Diagnostics Suite (OLS): On tips, create a 2×2 panel:
A: OLS regplot with CI
B: Residuals vs Fitted (compute the fitted values manually and pass them as x to residplot)
C: QQ-style check (plot residual quantiles vs normal quantiles using Matplotlib)
D: Scale–Location (|resid|^0.5 vs fitted)
Keep one theme and export PNG + PDF.
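One possible layout for this 2×2 diagnostics panel is sketched below. It is a sketch under assumptions, not a reference solution: the data are synthetic stand-ins for tips, panel B is drawn with plain Matplotlib rather than residplot, residuals are scaled by their standard deviation without a leverage adjustment, and normal quantiles come from the standard library's statistics.NormalDist. The export step is omitted.

```python
from statistics import NormalDist

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

sns.set_theme(style="whitegrid")  # one theme for all four panels

# Synthetic stand-ins for total_bill (x) and tip (y); values are assumptions.
rng = np.random.default_rng(2)
x = rng.uniform(3, 50, size=200)
y = 0.1 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# OLS fit, fitted values, and residuals computed by hand.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
resid = y - fitted
std_resid = resid / resid.std(ddof=2)  # simple scaling, no leverage term

# Theoretical normal quantiles at plotting positions (i - 0.5) / n.
n = resid.size
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(p) for p in probs])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

sns.regplot(x=x, y=y, ax=axes[0, 0], scatter_kws={"alpha": 0.4})
axes[0, 0].set_title("A: OLS fit with CI")

axes[0, 1].scatter(fitted, resid, alpha=0.4)
axes[0, 1].axhline(0, color="k", linewidth=1)
axes[0, 1].set_title("B: Residuals vs fitted")

axes[1, 0].scatter(theo_q, np.sort(std_resid), alpha=0.4)
axes[1, 0].plot(theo_q, theo_q, color="k", linewidth=1)  # 45-degree reference
axes[1, 0].set_title("C: Normal QQ check")

axes[1, 1].scatter(fitted, np.sqrt(np.abs(std_resid)), alpha=0.4)
axes[1, 1].set_title("D: Scale-Location")

fig.tight_layout()
```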
Robust vs OLS Case Study: Inject synthetic outliers into tips (add 5 extreme total_bill/tip points). Build a 1×3 figure: OLS fit, Robust fit, Residuals comparison. Discuss impact with axis captions.
Logistic Regression Dashboard: Construct a binary target high_tip and create a 2×2 figure:
A: logistic regplot curve vs total_bill
B: predicted probability vs actual class (scatter jittered)
C: residual-like plot using deviance residuals (compute manually)
D: threshold sweep plot (precision/recall vs threshold using Matplotlib)
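Panels C and D need quantities that seaborn does not expose, so they have to be computed by hand. The sketch below is one way to do that, under stated assumptions: the data are synthetic stand-ins for total_bill and high_tip, and the logistic model is fitted with a small hand-written Newton-Raphson loop (the helper fit_logistic is hypothetical) instead of the statsmodels fit that sns.regplot(logistic=True) relies on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def fit_logistic(z, y, n_iter=25):
    """Fit P(y=1) = sigmoid(b0 + b1*z) with Newton-Raphson updates."""
    X = np.column_stack([np.ones_like(z), z])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Synthetic stand-ins for total_bill and a binary high_tip (assumptions).
rng = np.random.default_rng(4)
total_bill = rng.uniform(3, 50, size=300)
p_true = 1.0 / (1.0 + np.exp(-(1.0 - 0.08 * total_bill)))
high_tip = (rng.uniform(size=total_bill.size) < p_true).astype(float)

# Standardize the predictor so the Newton steps are well conditioned.
z = (total_bill - total_bill.mean()) / total_bill.std()
beta = fit_logistic(z, high_tip)
p_hat = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * z)))
p_hat = np.clip(p_hat, 1e-12, 1 - 1e-12)

# Deviance residuals: sign(y - p) * sqrt(-2 * per-point log-likelihood).
log_lik_terms = high_tip * np.log(p_hat) + (1 - high_tip) * np.log(1 - p_hat)
dev_resid = np.sign(high_tip - p_hat) * np.sqrt(-2.0 * log_lik_terms)

# Threshold sweep for precision and recall.
thresholds = np.linspace(0.05, 0.95, 19)
precision, recall = [], []
for t in thresholds:
    pred = p_hat >= t
    tp = np.sum(pred & (high_tip == 1))
    fp = np.sum(pred & (high_tip == 0))
    fn = np.sum(~pred & (high_tip == 1))
    precision.append(tp / (tp + fp) if tp + fp else np.nan)
    recall.append(tp / (tp + fn) if tp + fn else np.nan)

fig, (ax_c, ax_d) = plt.subplots(1, 2, figsize=(10, 4))
ax_c.scatter(total_bill, dev_resid, alpha=0.4)
ax_c.axhline(0, color="k", linewidth=1)
ax_c.set_title("C: Deviance residuals vs total_bill")
ax_d.plot(thresholds, precision, marker="o", label="precision")
ax_d.plot(thresholds, recall, marker="s", label="recall")
ax_d.set_title("D: Threshold sweep")
ax_d.legend()
fig.tight_layout()
```

The squared deviance residuals sum to the model deviance, which gives a quick sanity check on the manual computation.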
Polynomial Model Selection: Generate synthetic nonlinear data. Compare order=1,2,3,4 in a 2×2 lmplot/subplot grid. Annotate each panel with cross-validated MSE (computed) as figure text.
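The cross-validated MSE has to be computed outside seaborn. A minimal sketch follows, assuming a quadratic ground truth, 5 folds, and a hypothetical helper named cv_mse; all of these are illustrative choices rather than requirements of the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def cv_mse(x, y, order, k=5, seed=0):
    """Mean k-fold cross-validated MSE of a degree-`order` polynomial fit."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], order)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


# Synthetic nonlinear data (quadratic trend plus noise; values are assumptions).
rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 - x + rng.normal(scale=0.5, size=x.size)

scores = {order: cv_mse(x, y, order) for order in (1, 2, 3, 4)}

# One panel per polynomial order, annotated with its CV MSE.
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
for ax, order in zip(axes.flat, scores):
    sns.regplot(x=x, y=y, order=order, ci=None, ax=ax,
                scatter_kws={"alpha": 0.3})
    ax.set_title(f"order={order}")
    ax.text(0.05, 0.92, f"CV MSE = {scores[order]:.3f}",
            transform=ax.transAxes)
fig.tight_layout()
```

Using the same fold split (same seed) for every order keeps the comparison between panels fair.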
Grouped Trends with Uncertainty: With penguins, create lmplot (hue=species, col=sex) and ensure consistent y-limits. Add per-panel slope text extracted from np.polyfit.
Facet Residual Diagnostics: Build a FacetGrid over tips by time (Lunch/Dinner):
Top row: regplot fits
Bottom row: residplot for the same panels
Share x/y axes and add a suptitle.
Heteroscedasticity Illustration: Simulate y = x + ε·x where ε~N(0,σ). Show OLS fit with CI, then plot residuals vs x. Add a fitted LOWESS line (via sns.regplot(lowess=True, scatter=False) on residual panel).
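The simulation and the two panels can be sketched as below. The value of σ, the x range, and the seed are assumptions. The LOWESS overlay is left out of this sketch because seaborn's lowess option requires statsmodels; a simple comparison of residual spread in the lower and upper halves of x stands in as a numeric check.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Simulate y = x + eps * x with eps ~ N(0, sigma); sigma is an assumption.
rng = np.random.default_rng(5)
sigma = 0.5
x = rng.uniform(1, 10, size=300)
y = x + rng.normal(scale=sigma, size=x.size) * x

# OLS residuals computed by hand.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))
sns.regplot(x=x, y=y, ax=ax_fit, scatter_kws={"alpha": 0.4})
ax_fit.set_title("OLS fit with CI")

ax_res.scatter(x, resid, alpha=0.4)
ax_res.axhline(0, color="k", linewidth=1)
ax_res.set_title("Residuals vs x")

# Residual spread below and above the median of x.
low_spread = float(resid[x < np.median(x)].std())
high_spread = float(resid[x >= np.median(x)].std())
ax_res.set_xlabel(f"x (resid SD: low half {low_spread:.2f}, "
                  f"high half {high_spread:.2f})")
fig.tight_layout()
```

A fan shape in the right-hand panel, with the spread growing along x, is the visual signature of heteroscedasticity.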
Logit with Multiple Groups: Build lmplot(logistic=True) predicting smoker from total_bill with hue="sex" and col="day". Keep the legend entries consistent across facets and add a single shared legend below the grid using figure handles.
End-to-End Publishing Figure: Create a 3-panel predictive story on tips:
Panel 1: polynomial fit (order=2) with CI
Panel 2: residual diagnostics
Panel 3: logistic high_tip curve
Apply a publishing style (font scale, grid, palette) and export at 300 DPI PNG & SVG.
Model Drift Scenario (Synthetic): Simulate two periods with different slopes. Create a side-by-side regplot comparison and a residuals delta plot. Annotate slope change and CI overlap.
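A possible sketch of the drift scenario follows. Several choices here are assumptions rather than part of the question: the two true slopes (1.0 and 1.6), the percentile bootstrap used for the slope intervals (the helper boot_slope_ci is hypothetical), and the reading of "residuals delta" as period-2 observations minus the period-1 model's predictions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def boot_slope_ci(x, y, n_boot=1000, seed=0):
    """95% percentile-bootstrap interval for the OLS slope."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, x.size, size=x.size)
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    return np.percentile(slopes, [2.5, 97.5])


# Two simulated periods with different slopes (values are assumptions).
rng = np.random.default_rng(6)
x1 = rng.uniform(0, 10, size=150)
y1 = 1.0 * x1 + 2.0 + rng.normal(scale=1.0, size=x1.size)
x2 = rng.uniform(0, 10, size=150)
y2 = 1.6 * x2 + 2.0 + rng.normal(scale=1.0, size=x2.size)

coef1 = np.polyfit(x1, y1, 1)
coef2 = np.polyfit(x2, y2, 1)
ci1 = boot_slope_ci(x1, y1, seed=1)
ci2 = boot_slope_ci(x2, y2, seed=2)
overlap = bool(ci1[0] <= ci2[1] and ci2[0] <= ci1[1])

# Period-2 observations minus what the period-1 model would predict.
delta = y2 - np.polyval(coef1, x2)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.regplot(x=x1, y=y1, ax=axes[0], scatter_kws={"alpha": 0.4})
axes[0].set_title(f"Period 1: slope {coef1[0]:.2f} "
                  f"[{ci1[0]:.2f}, {ci1[1]:.2f}]")
sns.regplot(x=x2, y=y2, ax=axes[1], scatter_kws={"alpha": 0.4})
axes[1].set_title(f"Period 2: slope {coef2[0]:.2f} "
                  f"[{ci2[0]:.2f}, {ci2[1]:.2f}]")
axes[2].scatter(x2, delta, alpha=0.4)
axes[2].axhline(0, color="k", linewidth=1)
axes[2].set_title("Period 2 minus period-1 prediction")
fig.suptitle(f"Slope change = {coef2[0] - coef1[0]:+.2f}; "
             f"CI overlap: {overlap}")
fig.tight_layout()
```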