Portfolio Project
RAG Chatbot Fine-Tuned with LoRA
Context
Off-the-shelf chatbots didn't match Visit Grand Junction's voice, and they rarely pointed visitors to our own pages.
Role
- Built the prototype from crawl to deployment, including the web demo.
Approach
- Crawled Visit Grand Junction pages and built a FAISS retrieval index.
- Generated a fine-tuning dataset with GPT-OSS 20B (via Ollama).
- Fine-tuned Mistral 7B with LoRA on the Q&A set and deployed it to AWS SageMaker.
- Added Lambda endpoints so the website can talk to the model.
Impact
- The demo answers with source citations; after roughly 10 minutes idle, the endpoint cools down and needs a short warm-up before it responds.
System Design
Pipeline: crawl → index → generate training data → LoRA fine-tune → deploy behind an API the site can call.
- Ingestion: crawl pages and store cleaned text chunks.
- Retrieval: use FAISS so answers can cite the right source passages.
- Fine-tuning: train Mistral 7B with LoRA on auto-generated Q&A pairs.
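The retrieval step above can be sketched with a toy bag-of-words nearest-neighbor search. This is illustrative only: the real pipeline uses FAISS over dense embeddings, and the corpus, URLs, and `embed` function here are stand-ins.

```python
import re
import numpy as np

# Toy corpus: each chunk keeps its source URL so answers can cite it.
chunks = [
    {"url": "https://visitgrandjunction.com/wineries",
     "text": "Wine tasting along the Palisade fruit and wine byway."},
    {"url": "https://visitgrandjunction.com/trails",
     "text": "Mountain biking on the Lunch Loops trail system."},
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Bag-of-words vectors over a fixed vocabulary; the real pipeline would
# use dense embeddings from a sentence-encoder model here instead.
vocab = sorted({w for c in chunks for w in tokenize(c["text"])})

def embed(text):
    words = tokenize(text)
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.stack([embed(c["text"]) for c in chunks])  # FAISS's job at scale

def retrieve(query, k=1):
    scores = index @ embed(query)  # cosine similarity (unit vectors)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

print(retrieve("Where can I go wine tasting?")[0]["url"])
# → https://visitgrandjunction.com/wineries
```

Because each chunk carries its URL through retrieval, the generation step can quote the passage and cite its source page.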
Dataset Generation
- Used GPT-OSS 20B (Ollama) to turn crawled content into Q&A pairs.
- Automated crawl → index → dataset → fine-tune so runs are repeatable.
- Kept it Docker-friendly so I could run locally and deploy the same artifacts.
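A minimal sketch of the dataset-generation step, assuming Ollama's standard non-streaming `/api/generate` HTTP endpoint; the prompt wording, model tag, and `n_pairs` parameter are illustrative, not the actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_prompt(chunk: str, n_pairs: int = 3) -> str:
    """Instruction prompt asking the generator model for Q&A pairs as JSON."""
    return (
        f"Write {n_pairs} question-and-answer pairs a visitor might ask about "
        "the following Visit Grand Junction page content. Reply as a JSON list "
        'of {"question": ..., "answer": ...} objects.\n\n'
        f"Content:\n{chunk}"
    )

def generate_pairs(chunk: str, model: str = "gpt-oss:20b") -> str:
    """One non-streaming generation call against the local Ollama server."""
    body = json.dumps({"model": model,
                       "prompt": build_prompt(chunk),
                       "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a running Ollama server with the model pulled.
    print(build_prompt("The Lunch Loops trail system offers beginner to expert rides."))
```

Running this over every crawled chunk and parsing the JSON replies yields the Q&A set that the LoRA fine-tune trains on.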
Serving and Deployment
- Merged the LoRA adapter into the base weights, quantized the result to 4-bit, and served it with FastAPI.
- Hosted it behind AWS (SageMaker + Lambda) so the browser only calls an API.
- Added a status check and a warm-up message to handle cold starts.
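The warm-up handling can be sketched as a small poller the site runs before sending the first question. `ping` is a stand-in for whatever status check the API exposes (an assumption, not the deployed code); injecting it as a callable keeps the logic testable without a live endpoint.

```python
import time

def wait_until_warm(ping, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll a status check until the model endpoint reports ready.

    `ping` is any zero-arg callable returning True once the endpoint is warm
    (e.g. a GET against the status route). Returns False if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ping():
            return True
        time.sleep(interval_s)
    return False

# Example with a fake ping that becomes ready on the third poll:
calls = {"n": 0}
def fake_ping():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until_warm(fake_ping, timeout_s=10, interval_s=0.01))  # → True
```

While the poller waits, the front end can show the warm-up message instead of a failed request.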
What I'd Improve
- Add basic eval for retrieval quality and citation accuracy.
- Cache common questions and stream tokens to make replies feel faster.
- Add stronger guardrails against prompt injection.
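The caching idea above could start as simply as memoizing answers keyed on a normalized question, so near-duplicate phrasings hit the same entry. A minimal sketch; `cached_answer` is a hypothetical stand-in for the real retrieve-then-generate call:

```python
from functools import lru_cache

def normalize(question: str) -> str:
    """Fold case and whitespace so near-duplicate questions share one cache key."""
    return " ".join(question.lower().split())

@lru_cache(maxsize=256)
def cached_answer(normalized_question: str) -> str:
    # Stand-in for the expensive retrieve-then-generate call
    # that the cache would skip on repeat questions.
    return f"answer for: {normalized_question}"

def answer(question: str) -> str:
    return cached_answer(normalize(question))

answer("What trails are near Grand Junction?")
answer("what trails are  NEAR grand junction?")  # same key → cache hit
print(cached_answer.cache_info().hits)  # → 1
```

An in-process LRU only helps one worker; a shared cache (e.g. Redis) would be the next step, but the keying idea is the same.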
Notes
Uses public Visit Grand Junction pages. Answers include citations.