Portfolio Project

Smart Sentence Retriever

NLP Embeddings & Serverless Retrieval

Machine Learning · Automation · Python · AWS · Docker · NLP

Context

I wanted a fast way to find sentences that match a question, even when the wording is different.

Approach

  • Cleaned the text of Alice in Wonderland, split it into sentences, and precomputed embeddings.
  • Tested 6+ embedding models on 800 sentences (k=2–6) and tracked both quality (silhouette score) and model size.
  • Deployed the best-quality model behind an AWS Lambda API (with CORS) that returns the top-k matches.
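The first step above can be sketched with a naive regex-based splitter (a simplified sketch; the actual cleaning rules and sentence splitter used in the project are not shown in this write-up):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: collapse whitespace, then break on
    ., !, or ? followed by whitespace. A real pipeline would also
    strip chapter headings, stray quotes, and similar noise."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    parts = re.split(r"(?<=[.!?])\s+", cleaned)
    return [p for p in parts if p]

sents = split_sentences(
    "Alice was beginning to get very tired. "
    "She peeped into the book her sister was reading! "
    "What is the use of a book without pictures?"
)
```

Each sentence is then embedded once, so query time only pays for a single forward pass over the query.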

Impact

  • Best silhouette score: 0.313 with Snowflake Arctic Embed L v2.0 (k=2).
  • Best score per million parameters: 0.0116 with Jina Embeddings v3 (k=6).
  • Deployed Arctic Embed L v2.0; the demo calls a Lambda endpoint to rank sentences by meaning.

System Design

This uses a fixed corpus (Alice in Wonderland): embed each sentence once, then embed each query and rank by cosine similarity.

  • Offline: clean the text, split into sentences, and store embeddings.
  • Online: embed the query, score against the cached matrix, and return the top-k with scores.
  • Frontend: a small demo that calls `/health` and `/rank` and shows the top matches.
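The online scoring step can be sketched in a few lines of NumPy (a minimal sketch; the deployed service also handles tokenization, pooling, and the `/health` and `/rank` routes):

```python
import numpy as np

def rank_top_k(query_vec: np.ndarray, corpus: np.ndarray, k: int = 3):
    """Rank corpus rows against the query by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    C = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = C @ q                  # cosine similarity per sentence
    top = np.argsort(-scores)[:k]   # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

# toy corpus of three 4-d "embeddings"
corpus = np.array([[1.0, 0, 0, 0],
                   [0, 1.0, 0, 0],
                   [0.9, 0.1, 0, 0]])
hits = rank_top_k(np.array([1.0, 0, 0, 0]), corpus, k=2)
```

Because the corpus matrix is precomputed and normalized once, a query costs one matrix-vector product plus a partial sort.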

Model Selection

  • Compared several embedding models using silhouette score.
  • Tracked silhouette score per million parameters to keep size and cost in mind.
  • Kept the setup fixed (same corpus, same sample, same k range) so results are comparable.
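The two metrics above can be reproduced with a small pure-NumPy implementation (a sketch; the real benchmark clustered 800 sentences at k=2–6, and the parameter counts below are illustrative, not the actual model sizes):

```python
import numpy as np

def silhouette(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient: (b - a) / max(a, b) per point, where
    a = mean intra-cluster distance and b = nearest other-cluster mean."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    s = np.empty(n)
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def score_per_million(score: float, n_params: int) -> float:
    """Normalize quality by model size, as in the comparison above."""
    return score / (n_params / 1e6)

# two tight, well-separated toy clusters -> silhouette near 1
X = np.array([[0.0, 0], [0, 1], [10, 10], [10, 11]])
score = silhouette(X, np.array([0, 0, 1, 1]))
```

Holding the corpus, sample, and k range fixed means differences in these numbers come from the embedding model alone.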

Serverless Deployment

  • Built a Lambda container with CPU PyTorch, FastAPI/Mangum, and the precomputed artifacts.
  • Used the plain Hugging Face stack (no sentence-transformers) to keep cold starts smaller.
  • Exposed a Lambda Function URL with CORS so the website can call it from the browser.
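The container build can be sketched as a Dockerfile on the AWS Lambda Python base image (a hedged sketch; file names such as `app.py`, `embeddings.npz`, and `sentences.json` are illustrative, not the actual repository layout):

```dockerfile
# AWS Lambda Python base image
FROM public.ecr.aws/lambda/python:3.11

# CPU-only PyTorch wheels keep the image, and cold starts, smaller
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir transformers fastapi mangum

# Precomputed artifacts: sentence list + embedding matrix
COPY embeddings.npz sentences.json ${LAMBDA_TASK_ROOT}/
COPY app.py ${LAMBDA_TASK_ROOT}/

# Mangum adapter object exported from app.py wraps the FastAPI app
CMD ["app.handler"]
```

Baking the artifacts into the image means no S3 fetch on cold start; CORS is then configured on the Function URL itself.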

What I'd Improve

  • Let users upload documents and build embeddings in the background.
  • Add ANN search (HNSW/FAISS) for bigger corpora and faster top-k.
  • Add a small labeled set and evaluate with metrics like nDCG@k.
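The evaluation idea in the last bullet is simple to implement; a minimal nDCG@k sketch (the graded relevance labels here are made up for illustration, not from a real labeled set):

```python
import numpy as np

def ndcg_at_k(relevances, k: int) -> float:
    """nDCG@k: discounted cumulative gain of the ranking, normalized
    by the gain of the ideal (relevance-sorted) ranking."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, len(rel) + 2))  # log2(rank + 1)
    dcg = float((rel / discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal / discounts[: len(ideal)]).sum())
    return dcg / idcg if idcg > 0 else 0.0

# hypothetical graded labels for 6 retrieved sentences (3 = perfect match)
score = ndcg_at_k([3, 2, 3, 0, 1, 2], k=6)
```

A score of 1.0 means the retriever returned results in the ideal relevance order; anything lower quantifies how far off the ranking is.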

Links