About the Project

The Challenge

The challenge was to build an AI-powered document Q&A application: users upload a document (PDF, Excel, Word, or plain text), ask questions in natural language, and receive accurate answers drawn directly from the document's content.

The Approach

My approach was to implement a full RAG (Retrieval-Augmented Generation) pipeline rather than feeding entire documents to an LLM, which would be expensive and would exceed context limits on large files. I split each document into overlapping chunks, generated embeddings with an OpenAI embedding model, and stored them in a Chroma vector database. At query time, the application retrieves the chunks most semantically similar to the question and passes only those to the LLM as context. This keeps token costs low and grounds each answer in the relevant passages.
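The chunk-and-retrieve idea can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the function names (chunk_text, embed, cosine, retrieve) are hypothetical, the character-based chunk sizes are assumed values, the bag-of-words vector is a toy stand-in for an OpenAI embedding, and the in-memory sort stands in for a Chroma similarity search.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (assumed stand-in for a vector store)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap. Only the top k chunks reach the LLM, which is where the cost saving over sending the whole document comes from.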

LangChain orchestrated the pipeline: document loaders for each file type, the chunking strategy, embedding generation, vector retrieval, and prompt construction, all composed into a clean, modular chain. The application also integrated Apify scrapers to crawl web pages on demand, extending Q&A beyond uploaded files to live web content.
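Two of the modular pieces described above, per-file-type loading and prompt construction, can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's API or the project's code: LOADERS, load_document, and build_prompt are hypothetical names, only a plain-text loader is registered, and the prompt wording is invented for the example.

```python
from pathlib import Path
from typing import Callable


def load_plain_text(path: Path) -> str:
    """Read a UTF-8 text file; PDF, Excel, and Word loaders would register the same way."""
    return path.read_text(encoding="utf-8")


# Hypothetical registry mapping a file extension to the function that extracts its text.
LOADERS: dict[str, Callable[[Path], str]] = {".txt": load_plain_text}


def load_document(path: Path) -> str:
    """Dispatch to the loader registered for the file's extension."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return loader(path)


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble retrieved chunks and the user's question into one prompt string."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Keeping loaders behind a registry means a new source, such as text scraped from a web page, only needs one more entry; the chunking, retrieval, and prompt steps stay unchanged.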

The Outcome

The result was a full-stack AI application that demonstrates a practical RAG architecture, the same pattern used in enterprise knowledge management tools, built with production-ready components rather than toy implementations.