Portfolio Project

Smart Sentence Retriever

NLP Embeddings & Serverless Retrieval

Machine Learning · Automation · Python · AWS · Docker · NLP

Search a fixed corpus (Alice in Wonderland) by meaning, not exact wording.

  • Type a query (a phrase or full sentence).
  • Adjust “Top results” to control how many matches are returned.
  • Click “Find Sentences” and review the ranked results.
  • If the status shows “Warming up”, wait a moment and try again.

STAR Summary

Situation
I wanted a fast way to find sentences that match a question, even when the wording is different.
Task
I owned the end-to-end build, from text preprocessing and model selection through the deployed demo.
Action
  • Cleaned the text of Alice in Wonderland, split it into sentences, and precomputed embeddings.
  • Experimented with several embedding models on a small sample (~800 sentences) and compared clustering quality using silhouette score as a quick heuristic.
  • Deployed the selected model behind an AWS Lambda API (with CORS) that returns the top-k semantic matches.
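
As a rough illustration of the model-comparison step, the sketch below computes a mean silhouette score in plain Python and uses it to compare two candidate embeddings of the same sentences. It rests on several assumptions not stated in the project: cosine distance as the metric, cluster labels that are already available (for example from k-means), and tiny hand-made vectors standing in for real model output. The names `silhouette_score`, `model_a`, and `model_b` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other points in the same cluster
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


# Toy stand-ins for two models' embeddings of the same four sentences.
labels = [0, 0, 1, 1]  # assumed cluster assignments
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # clusters well separated
model_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # clusters mixed together

scores = {
    "model_a": silhouette_score(model_a, labels),
    "model_b": silhouette_score(model_b, labels),
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

A higher score only says the clusters look tighter and better separated for this sample and this choice of k, which is why it works as a quick heuristic rather than a measure of retrieval quality.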
Result
  • Selected an embedding model that clustered the ~800-sentence sample cleanly by silhouette score (the score depends on the dataset and on the number of clusters k).
  • Shipped a serverless demo: embed the query, then rank cached sentence embeddings by cosine similarity.
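
The request flow described above (embed the query, rank cached embeddings by cosine similarity, return the top k with CORS headers) could look roughly like the following sketch. It is not the deployed code. The `embed` function is a hypothetical keyword-count placeholder standing in for the real embedding model, the three sentences are toy examples rather than the actual corpus, and the request fields `query` and `top_k` are assumed names. The event and response shapes follow the usual Lambda proxy-style convention of a JSON string in `body`.

```python
import json
import math
import re

# Hypothetical placeholder vocabulary; a real model would return dense vectors.
VOCAB = ["rabbit", "tea", "queen"]


def embed(text):
    """Stand-in for the embedding model: counts of a few keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


# Toy corpus; embeddings are computed once at import time, mirroring the cache.
SENTENCES = [
    "The White Rabbit checked his watch.",
    "The Hatter poured more tea.",
    "The Queen shouted at the gardeners.",
]
EMBEDDINGS = [embed(s) for s in SENTENCES]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def top_k_matches(query_vec, k):
    """Rank every cached sentence by similarity and keep the best k."""
    scored = [
        (cosine_similarity(query_vec, emb), sentence)
        for sentence, emb in zip(SENTENCES, EMBEDDINGS)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"sentence": s, "score": round(score, 4)} for score, s in scored[:k]]


def lambda_handler(event, context):
    """Entry point: parse the JSON body, rank, and reply with CORS headers."""
    body = json.loads(event.get("body") or "{}")
    query = body.get("query", "")
    k = int(body.get("top_k", 3))
    results = top_k_matches(embed(query), k)
    return {
        "statusCode": 200,
        "headers": {
            "Access-Control-Allow-Origin": "*",  # assumed permissive CORS policy
            "Content-Type": "application/json",
        },
        "body": json.dumps({"results": results}),
    }


# Local usage example, no AWS required.
response = lambda_handler(
    {"body": json.dumps({"query": "a rabbit in a hurry", "top_k": 2})}, None
)
print(response["body"])
```

Computing the sentence embeddings once at module load, outside the handler, means warm invocations only pay for embedding the query and a linear scan over the cache.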