Portfolio Project

Smart Sentence Retriever

NLP Embeddings & Serverless Retrieval

Data Science · Machine Learning · Automation · Python · AWS · Docker · NLP

Search a fixed Alice in Wonderland corpus by meaning, not exact wording.

  • Type a query (a phrase or full sentence).
  • Adjust “Top results” to control how many matches are returned.
  • Click “Find Sentences” and review the ranked semantic matches.
  • If the demo is warming up after a period of inactivity, wait for the status to show ready, then try again.
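Behind the "Top results" control, ranking can be pictured as a top-k search by cosine similarity over precomputed sentence embeddings. The sketch below is a minimal, dependency-free illustration under that assumption. The function names `cosine_similarity` and `top_k` and the toy two-dimensional vectors are hypothetical, not the demo's actual code or model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    sentence_vecs: Sequence[Sequence[float]],
    sentences: Sequence[str],
    k: int,
) -> List[Tuple[str, float]]:
    """Return up to k (sentence, score) pairs, highest cosine similarity first."""
    if k <= 0:
        return []
    scored = [
        (sentence, cosine_similarity(query_vec, vec))
        for sentence, vec in zip(sentences, sentence_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" (hypothetical; real sentence embeddings have hundreds of dimensions).
sentences = [
    "Alice fell down the rabbit hole.",
    "The Queen shouted at the gardeners.",
    "The Hatter poured the tea.",
]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]

for sentence, score in top_k(query, vectors, sentences, k=2):
    print(f"{score:.3f}  {sentence}")
```

Sorting the full score list is fine for a corpus the size of one novel; a much larger index would usually call for vectorized math or an approximate nearest-neighbor structure.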

STAR Summary

Situation
I wanted a fast way to retrieve sentences that answer a question even when their wording doesn't match the question exactly.
Task
Build the retrieval pipeline, the Lambda API, and the browser demo end to end.
Action
  • Cleaned Project Gutenberg text, split it into sentences, and precomputed embeddings for the fixed corpus.
  • Compared several embedding models on a sample of the corpus, weighing clustering quality against model size and serverless deployment cost.
  • Deployed the selected model behind an AWS Lambda Function URL and built a browser demo that checks `/health` before sending queries, then displays the top-k matches ranked by cosine similarity.
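The cleaning and sentence-splitting step could look roughly like the sketch below. It assumes the raw file still carries the standard Project Gutenberg `*** START OF ... ***` and `*** END OF ... ***` marker lines and uses a deliberately naive regex splitter. The function names are hypothetical, the sample text is a toy input rather than a quotation, and the embedding step that would follow is not shown.

```python
import re
from typing import List

# Project Gutenberg plain-text files wrap the book in START/END marker lines.
# Older files say "THIS" instead of "THE", so both spellings are accepted.
_START = re.compile(r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^*]*\*\*\*", re.IGNORECASE)
_END = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^*]*\*\*\*", re.IGNORECASE)

# Naive sentence boundary: ., ! or ? (optionally followed by one closing quote),
# then whitespace, then an optional opening quote/paren and an uppercase letter.
# Each lookbehind is fixed-width, as Python's re module requires.
_BOUNDARY = re.compile(r"(?:(?<=[.!?])|(?<=[.!?][\"'”’]))\s+(?=[\"'“‘(]?[A-Z])")


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Keep only the text between the START and END markers, if present."""
    start = _START.search(raw)
    end = _END.search(raw)
    begin = start.end() if start else 0
    stop = end.start() if end else len(raw)
    return raw[begin:stop]


def split_sentences(text: str, min_words: int = 3) -> List[str]:
    """Normalize whitespace, split on naive boundaries, drop very short fragments."""
    text = text.replace("_", "")  # Gutenberg marks italics with _underscores_
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    parts = _BOUNDARY.split(text)
    return [p.strip() for p in parts if len(p.split()) >= min_words]


# Toy input (not a quotation from the book).
raw = (
    "Produced by volunteers.\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n"
    "Alice was beginning to get very tired. “Where is the rabbit?” she asked.\n"
    "The Cat grinned.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n"
    "License text follows."
)
for sentence in split_sentences(strip_gutenberg_boilerplate(raw)):
    print(sentence)
```

Requiring an uppercase letter after the break keeps `“Where is the rabbit?” she asked.` together, but a regex splitter will still mis-split abbreviations such as "Mr."; a library sentence tokenizer (for example NLTK's or spaCy's) is the sturdier choice if that matters.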
Result
  • Shipped a working semantic-search demo for Alice in Wonderland that returns ranked sentence matches instead of keyword hits.
  • Kept the demo usable when embedded in the site with a warm-up status indicator, inline similarity bars, and an open-in-new-tab option for a larger view.

Notes

The live demo searches a fixed Alice in Wonderland corpus and checks endpoint health before enabling queries, because the Lambda function can cold-start after a period of inactivity.