Context
My wife asked me to suggest baby names. I wanted something that learns her taste instead of guessing.
Approach
- Aggregated and cleaned 140+ years of SSA records and engineered trend features.
- Built a simple 'quiz' script to collect like/dislike labels.
- Trained several models and averaged their scores to produce recommendations.
Impact
- Generated personalized Top 50 name lists for boys and girls.
- Helped us narrow the list when naming our child.
Data and Labeling
I combined Social Security Administration name data with preference labels to build a personalized recommender.
- Aggregated 140+ years of SSA records and added recency/trend features so the model doesn’t just recommend the most common names.
- Focused on Colorado to keep suggestions closer to what my wife actually hears day to day.
- Collected labels through quick quizzes so the model learns her taste over time.
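
As a rough illustration of the aggregation step, the sketch below rolls yearly counts up into per-name totals plus a simple recency signal. It assumes rows shaped like `(state, sex, year, name, count)`; the sample counts are made up, and `popularity_features`, `recent_years`, and `recent_share` are hypothetical names chosen for this example, not taken from the project.

```python
from collections import defaultdict

# Made-up rows in the assumed (state, sex, year, name, count) layout.
ROWS = [
    ("CO", "F", 1990, "Emma", 20),
    ("CO", "F", 2020, "Emma", 180),
    ("CO", "F", 2021, "Emma", 170),
    ("CO", "F", 1960, "Linda", 400),
    ("CO", "F", 2021, "Linda", 5),
]


def popularity_features(rows, recent_years=5):
    """Aggregate per-(name, sex) total, peak year, and recent counts."""
    latest = max(year for _state, _sex, year, _name, _count in rows)
    by_year = defaultdict(lambda: defaultdict(int))
    for _state, sex, year, name, count in rows:
        by_year[(name, sex)][year] += count

    features = {}
    for key, counts in by_year.items():
        total = sum(counts.values())
        # Counts from the last `recent_years` years, including the latest one.
        recent = sum(c for y, c in counts.items() if y > latest - recent_years)
        features[key] = {
            "total_count": total,
            "peak_year": max(counts, key=counts.get),
            "recent_count": recent,
            "recent_share": recent / total,
        }
    return features


FEATURES = popularity_features(ROWS)
```

With a share-style feature like this, a name that peaked decades ago and a name that is rising now look different even when their all-time totals are similar.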
Feature Engineering
- Name shape features: length, vowel/consonant mix, syllable count, start/end vowel flags, and entropy.
- Popularity features: total count, peak year, and recent-count features to capture saturation and momentum.
- Origin feature: the language `langdetect` guesses for each name, used as a rough, lightweight proxy for linguistic origin.
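
The name shape features above could be computed along these lines. This is a minimal sketch under stated assumptions: "entropy" is taken to mean Shannon entropy over the letters of the name, syllables are approximated by counting runs of vowels, and `y` is treated as a vowel. The function name and feature keys are hypothetical.

```python
import math
import re
from collections import Counter

VOWELS = set("aeiouy")  # assumption: treat 'y' as a vowel


def name_shape_features(name):
    """Return simple shape features for a single name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")

    n = len(letters)
    vowel_count = sum(ch in VOWELS for ch in letters)
    # Heuristic: each run of consecutive vowels counts as one syllable.
    syllables = max(1, len(re.findall(r"[aeiouy]+", "".join(letters))))
    # Shannon entropy (in bits) of the letter distribution within the name.
    counts = Counter(letters)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "vowel_ratio": vowel_count / n,
        "syllables": syllables,
        "starts_with_vowel": letters[0] in VOWELS,
        "ends_with_vowel": letters[-1] in VOWELS,
        "entropy": entropy,
    }
```

The vowel-run heuristic miscounts names with silent vowels (for example a trailing "e"), so it is only a cheap approximation of syllable count.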
Modeling Strategy
- Trained multiple models (Random Forest, XGBoost, SVM, KNN, plus a deep learning baseline) with randomized hyperparameter search.
- Optimized for weighted F1 to account for the class imbalance between 'liked' and 'not liked' labels.
- Averaged predictions across models to reduce quirks from any single model.
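
To make the metric and the averaging step concrete, here is a small dependency-free sketch. It assumes "averaged predictions" means taking the mean of each model's predicted probability of 'liked' for a name; the function names and the sample probabilities are hypothetical. `weighted_f1` follows the usual definition: per-class F1, weighted by how many true examples each class has.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    pairs = list(zip(y_true, y_pred))
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn)
        support = tp + fn  # number of true examples of this class
        score += f1 * support / total
    return score


def average_scores(per_model):
    """Mean predicted 'liked' probability per name across all models."""
    names = next(iter(per_model.values())).keys()
    return {
        name: sum(scores[name] for scores in per_model.values()) / len(per_model)
        for name in names
    }


# Made-up probabilities from two hypothetical models.
PER_MODEL = {
    "random_forest": {"Ada": 0.8, "Bo": 0.2},
    "knn": {"Ada": 0.6, "Bo": 0.4},
}
ENSEMBLE = average_scores(PER_MODEL)
```

Averaging probabilities rather than hard votes keeps a usable ranking signal even when the models agree on the label but differ in confidence.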
Recommendation Workflow
- Generated a ranked Top 50 list for boys and girls and exported results for easy review.
- Designed the loop so new feedback becomes new training data.
- Kept it explainable by surfacing which features correlated with higher predicted preference.
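
The ranking, export, and feedback steps might look roughly like this. It is a sketch, not the project's code: skipping names that already have a label is an assumption, and `top_n`, `to_csv`, and `record_feedback` are hypothetical helpers.

```python
import csv
import io


def top_n(scores, already_labeled, n=50):
    """Rank unlabeled names by score, highest first (ties broken by name)."""
    candidates = [
        (name, score)
        for name, score in scores.items()
        if name not in already_labeled
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:n]


def to_csv(ranked):
    """Serialize a ranked list as CSV text with rank, name, and score."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "name", "score"])
    for rank, (name, score) in enumerate(ranked, start=1):
        writer.writerow([rank, name, f"{score:.3f}"])
    return buffer.getvalue()


def record_feedback(labels, name, liked):
    """Store a new like/dislike so the next training run can use it."""
    labels[name] = bool(liked)
    return labels
```

For example, after `record_feedback(labels, "Cy", True)`, the next call to `top_n` with `labels` as `already_labeled` no longer suggests "Cy", and the new label is available for retraining.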
What I'd Improve
- Collect more labels and add calibration so probabilities map to real acceptance rates.
- Use name embeddings (character-level or phonetic) instead of hand-crafted features alone.
- Add diversity constraints so recommendations aren’t overly similar to each other.
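
One possible shape for the diversity constraint is a greedy filter: walk the ranked list and skip any name that is too similar to one already selected. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure; the 0.7 threshold and the function name are arbitrary assumptions, and a phonetic distance might suit names better.

```python
from difflib import SequenceMatcher


def diversify(ranked_names, n=50, max_similarity=0.7):
    """Greedily keep top-ranked names that are not near-duplicates."""
    selected = []
    for name in ranked_names:
        too_close = any(
            SequenceMatcher(None, name.lower(), kept.lower()).ratio()
            > max_similarity
            for kept in selected
        )
        if not too_close:
            selected.append(name)
        if len(selected) == n:
            break
    return selected
```

Because the filter walks the list in rank order, the highest-scoring name in each cluster of look-alikes is the one that survives.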