RAG Chatbot Fine-Tuned with LoRA

Machine Learning · Automation · Python · Ollama · AWS · Docker

Context

Off-the-shelf chatbots didn't match Visit Grand Junction's voice, and they rarely pointed visitors to our own pages.

Role

  • Built the prototype from crawl to deployment, including the web demo.

Approach

  • Crawled Visit Grand Junction pages and built a FAISS retrieval index.
  • Generated a fine-tuning dataset with GPT-OSS 20B (via Ollama).
  • Fine-tuned Mistral 7B with LoRA on the Q&A set and deployed it to AWS SageMaker.
  • Added Lambda endpoints so the website can talk to the model.
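The LoRA step above trains only a small low-rank update on top of the frozen base weights, which is what makes fine-tuning a 7B model tractable. A back-of-envelope sketch of the savings per weight matrix (4096 matches Mistral 7B's hidden size; the rank of 16 is an illustrative assumption, not the run's actual config):

```python
# LoRA replaces a full d x d weight update with two low-rank factors:
# B (d x r) and A (r x d). Per attention projection in Mistral 7B (d = 4096),
# the trainable parameter count drops dramatically. Rank is an assumption.
d, r = 4096, 16

full_params = d * d          # dense update: one parameter per weight
lora_params = d * r + r * d  # only the two low-rank factors are trained

print(full_params, lora_params, full_params // lora_params)
```

At rank 16 the low-rank factors hold 128x fewer trainable parameters than a dense update, which is why the adapter trains quickly and stays small on disk.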

Impact

  • The demo answers with citations, but after roughly 10 minutes of inactivity the endpoint needs a warm-up before it responds.

System Design

Pipeline: crawl → index → generate training data → LoRA fine-tune → deploy behind an API the site can call.

  • Ingestion: crawl pages and store cleaned text chunks.
  • Retrieval: use FAISS so answers can cite the right source passages.
  • Fine-tuning: train Mistral 7B with LoRA on auto-generated Q&A pairs.
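The retrieval step is what lets answers cite their sources: each stored chunk keeps the URL it came from, and the top-scoring chunk's URL travels with the answer. A minimal stand-in for that flow (the real pipeline embeds chunks and searches a FAISS index; this term-overlap scorer and the sample chunks are illustrative assumptions):

```python
# Stand-in for the retrieval step: score stored chunks against a query and
# return the best match along with its source URL for citation.
# The real version uses dense embeddings + FAISS; the data here is made up.
chunks = [
    {"text": "The Colorado National Monument offers scenic drives and hiking.",
     "url": "https://visitgrandjunction.com/monument"},
    {"text": "Downtown Grand Junction hosts a weekly farmers market in summer.",
     "url": "https://visitgrandjunction.com/farmers-market"},
]

def retrieve(query: str, chunks: list[dict]) -> dict:
    q_terms = set(query.lower().split())
    # Score = number of query terms the chunk shares; FAISS would use
    # vector similarity instead of set overlap.
    return max(chunks, key=lambda c: len(q_terms & set(c["text"].lower().split())))

hit = retrieve("where is the farmers market", chunks)
print(hit["url"])  # the URL the answer cites
```

Keeping the URL alongside each chunk is the whole citation mechanism: whatever passage grounds the answer also identifies the page to link.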

Dataset Generation

  • Used GPT-OSS 20B (Ollama) to turn crawled content into Q&A pairs.
  • Automated crawl → index → dataset → fine-tune so runs are repeatable.
  • Kept it Docker-friendly so I could run locally and deploy the same artifacts.
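The dataset step can be sketched as: prompt a local model with each crawled chunk, ask for a question-and-answer pair as JSON, and parse the reply into a fine-tuning record. In the real pipeline the model call goes to Ollama's HTTP API; here `fake_generate` stands in for it, and the prompt wording and record schema are assumptions:

```python
import json

# Sketch of turning a crawled chunk into a Q&A training record.
# The prompt text and the instruction/output field names are assumptions;
# `generate` is injected so the real version can call the Ollama API.
PROMPT = (
    "Write one question a visitor might ask that the passage below answers, "
    "then the answer, as JSON with keys 'question' and 'answer'.\n\nPassage:\n{chunk}"
)

def to_training_record(chunk: str, generate) -> dict:
    reply = generate(PROMPT.format(chunk=chunk))
    pair = json.loads(reply)
    return {"instruction": pair["question"], "output": pair["answer"], "source": chunk}

def fake_generate(prompt: str) -> str:
    # Stand-in for the local model call; returns a canned JSON reply.
    return json.dumps({"question": "When is the farmers market?",
                       "answer": "Weekly in summer."})

record = to_training_record("Downtown hosts a weekly farmers market in summer.",
                            fake_generate)
print(record["instruction"])
```

Injecting the `generate` callable is also what keeps the pipeline repeatable and Docker-friendly: the same code runs against a local Ollama instance or a stub in tests.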

Serving and Deployment

  • Merged the LoRA adapter into the base model and served the 4-bit quantized result with FastAPI.
  • Hosted it behind AWS (SageMaker + Lambda) so the browser only calls an API.
  • Added a status check and a warm-up message to handle cold starts.
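The cold-start handling above boils down to polling a status check until the model is warm, with a pause between tries. A minimal sketch with the check injected (so the real version can hit the deployed API's status endpoint; the attempt count and delay are assumptions, and the delay is shortened here for illustration):

```python
import time

# Poll a readiness check until it passes or attempts run out.
# `check` is injected; the real version would call the API's status route
# and back off for seconds, not milliseconds.
def wait_until_warm(check, attempts: int = 5, delay: float = 0.01) -> bool:
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# Simulate an endpoint that only reports ready on the third poll.
state = {"polls": 0}
def fake_status() -> bool:
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_until_warm(fake_status))
```

Pairing this with a visible warm-up message keeps the first post-idle request from looking like a failure to the user.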

What I'd Improve

  • Add basic eval for retrieval quality and citation accuracy.
  • Cache common questions and stream tokens to make replies feel faster.
  • Add stronger guardrails against prompt injection.
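The caching idea in the list above can be sketched simply: normalize the question text and memoize answers keyed on the normalized form, so repeat queries skip the model call entirely. The normalization rules here are assumptions, and a real version would also expire entries:

```python
# Sketch of caching common questions: repeat queries (modulo case,
# whitespace, and trailing punctuation) reuse the stored answer.
def normalize(q: str) -> str:
    return " ".join(q.lower().strip("?! ").split())

cache: dict[str, str] = {}

def answer(question: str, model) -> str:
    key = normalize(question)
    if key not in cache:
        cache[key] = model(question)  # only miss on a genuinely new question
    return cache[key]

calls = []
def fake_model(q: str) -> str:  # stand-in for the deployed model call
    calls.append(q)
    return "It runs weekly in summer."

answer("When is the Farmers Market?", fake_model)
answer("when is the farmers market", fake_model)
print(len(calls))  # the second query hits the cache
```

Even this naive cache would absorb the most common visitor questions; combined with token streaming it would cover both halves of the "feel faster" goal.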

Notes

Uses public Visit Grand Junction pages. Answers include citations.