Portfolio Project
Pizza Tips Regression Modeling
Excel Analytics & Regression Modeling
Context
Tips varied widely across neighborhoods and housing types, so I set out to identify what actually drives them.
Approach
- Merged 1,251 delivery tickets with NOAA weather, then cleaned the data in Power Query.
- Ran a multiple regression in Excel: Tip = f(cost, delivery time, rain, max/min temperature).
Impact
- Order cost explains ~38% of tip variance (about +$1.10 tip per +$10 bill).
- Apartment customers tipped ~28% less than house residents (p < 0.001).
- Weather and delivery time didn’t show a meaningful effect on tip size.
Data Integration
I merged delivery tickets with NOAA weather to test common ideas about what drives tipping.
- Combined 1,251 deliveries with daily weather features (rain, max/min temperature, wind).
- Cleaned the dataset in Power Query and derived tip percentage and delivery time (minutes).
- Split customers by housing type (apartment vs. house) to test whether tips differed between the two groups.
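The integration steps above were done in Power Query. As a rough illustration of the same logic, here is a minimal Python sketch that joins each ticket to its day's weather and derives delivery time and tip percentage. The field names, timestamp format, units, and sample rows are all hypothetical and are not taken from the original dataset.

```python
from datetime import date, datetime

# Hypothetical ticket rows; field names are illustrative, not the original schema.
tickets = [
    {"ordered": "2023-07-01 18:05", "delivered": "2023-07-01 18:32",
     "cost": 24.50, "tip": 4.00, "housing": "house"},
    {"ordered": "2023-07-02 19:10", "delivered": "2023-07-02 19:51",
     "cost": 18.00, "tip": 2.00, "housing": "apartment"},
]

# Hypothetical daily weather keyed by date (units assumed for illustration).
weather = {
    date(2023, 7, 1): {"rain": 0.0, "tmax": 88, "tmin": 67},
    date(2023, 7, 2): {"rain": 0.4, "tmax": 79, "tmin": 64},
}

FMT = "%Y-%m-%d %H:%M"


def enrich(ticket, weather_by_day):
    """Join one ticket to its day's weather and add the derived columns."""
    ordered = datetime.strptime(ticket["ordered"], FMT)
    delivered = datetime.strptime(ticket["delivered"], FMT)
    row = dict(ticket)
    # Left join: a ticket with no matching weather day keeps None in those fields.
    day = weather_by_day.get(ordered.date(), {})
    for key in ("rain", "tmax", "tmin"):
        row[key] = day.get(key)
    # Derived columns: delivery time in minutes and tip as a percentage of cost.
    row["delivery_minutes"] = (delivered - ordered).total_seconds() / 60
    row["tip_pct"] = 100 * ticket["tip"] / ticket["cost"]
    return row


merged = [enrich(t, weather) for t in tickets]
```

Keeping the join as a left join means a ticket is never dropped just because a weather day is missing, which makes gaps in the weather data visible instead of silently shrinking the sample.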
Exploratory Analysis
- Found a strong positive correlation (0.62) between order cost and tip amount.
- Rainfall correlated only weakly with delivery duration (correlation 0.14).
- Order counts more than doubled in summer/early fall (clear seasonality).
Regression and Hypothesis Tests
- Ran a multiple regression: Tip = f(cost, delivery time, rain, max/min temperature).
- Result: order cost was the main driver, explaining ~38% of tip variance (≈ +$1.10 tip per +$10 bill); weather and delivery time showed no meaningful effect.
- Validated the housing difference with a two-sample t-test: apartment customers tipped ~28% less than house residents (p < 0.001).
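The analysis itself was run in Excel. The following is a minimal Python sketch of the two techniques named above, using only the standard library: an ordinary least-squares fit with several predictors, and a two-sample t statistic. The toy data are constructed (the cost slope is set to 0.11 to mirror the reported +$1.10 per +$10), the function names are hypothetical, and the t-test uses Welch's unequal-variance form, which is an assumption since the variant used in the project is not stated.

```python
from math import sqrt
from statistics import mean, variance


def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system (X'X | X'y).
    A = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * b for a, b in zip(A[idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Constructed predictors: (order cost in dollars, delivery time in minutes).
X = [(10, 20), (20, 35), (30, 25), (40, 45), (25, 30)]
# Constructed response with a known cost slope of 0.11 and no noise.
y = [1.0 + 0.11 * cost - 0.02 * minutes for cost, minutes in X]
intercept, b_cost, b_minutes = ols(X, y)

# Constructed tip samples for the two housing groups.
house_tips = [5.0, 4.5, 6.0, 5.5, 4.0]
apartment_tips = [3.5, 4.0, 3.0, 4.5, 3.0]
t_stat, dof = welch_t(house_tips, apartment_tips)
```

Turning the t statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; a statistics package or Excel's T.TEST would cover that step.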
What I'd Improve
- Add distance, time-of-day, and driver controls to reduce omitted-variable bias.
- Model tip percentage and tip amount separately to avoid conflating larger orders with generosity.
- Use mixed-effects models to account for repeat customers and neighborhood-level variance.