ML | October 2025

AI-Powered Document Q&A App with LangChain & Chroma VectorDB | OpenAI Agent Building + RAG + LangChain | Claude Cursor Development

Imagine having a smart assistant that instantly answers questions from any document: PDF, Excel, Word, or text. 🌊 I built an AI-powered Document Q&A app where users upload files, and the system extracts the text, splits it into chunks, and stores their embeddings in a Chroma vector database. Using a LangChain RAG pipeline with OpenAI embeddings, it retrieves the most relevant passages and responds in natural language. The app also crawls web pages using Apify scrapers. 🚀 This project highlights my skills in full-stack development, AI integration, vector databases, and building user-friendly, real-world AI apps.


About the Project

The Challenge

The goal was to build an AI-powered Document Q&A application that lets users upload a document (PDF, Excel, Word, or plain text) and ask natural-language questions, receiving answers drawn directly from the document content.

The Approach

My approach was to implement a full RAG (Retrieval-Augmented Generation) pipeline rather than feeding entire documents to an LLM, which would be expensive and would hit context limits with large files. I split documents into overlapping chunks, generated embeddings with OpenAI's embedding model, and stored them in a Chroma vector database. At query time, only the most semantically relevant chunks are retrieved and passed to the LLM as context, which keeps per-query token costs low and grounds the answers in the source text.
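The chunk-then-retrieve idea can be illustrated with a minimal, standard-library-only sketch. This is not the project's code: `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical names, the bag-of-words `embed` is a toy stand-in for OpenAI's embedding model, and the in-memory ranking stands in for a Chroma similarity query. The chunk size and overlap values are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a vector DB query)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: each window repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. A real pipeline would usually split on tokens or sentence boundaries rather than raw characters.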

LangChain was used to orchestrate the pipeline — handling document loaders for different file types, the chunking strategy, embedding generation, vector retrieval, and prompt construction in a clean, modular chain. The application also integrated Apify scrapers to crawl web pages on demand, extending the Q&A capability beyond uploaded files to live web content.
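Two of the steps that LangChain handles here, picking a loader by file type and assembling the prompt, can be sketched in plain Python. This is an illustrative assumption rather than the project's implementation or LangChain's API: `LOADERS`, `load_document`, and `build_prompt` are hypothetical names, only a plain-text loader is included, and the prompt wording is invented for the example.

```python
from pathlib import Path
from typing import Callable

Loader = Callable[[Path], str]


def load_plain_text(path: Path) -> str:
    """Read a UTF-8 text file and return its contents."""
    return path.read_text(encoding="utf-8")


# Hypothetical registry mapping file extensions to loader functions.
# PDF, Excel, and Word loaders would be registered the same way,
# each backed by a third-party parsing library.
LOADERS: dict[str, Loader] = {
    ".txt": load_plain_text,
    ".md": load_plain_text,
}


def load_document(path: Path) -> str:
    """Dispatch to the loader registered for the file's extension."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"No loader registered for '{path.suffix}' files")
    return loader(path)


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the loaders behind a single registry means a new source, such as crawled web pages, only needs to produce text in the same shape; chunking, embedding, and retrieval stay unchanged.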

The Outcome

The result was a full-stack AI application demonstrating a practical RAG architecture, the same pattern used in enterprise knowledge management tools, built on established components (LangChain, Chroma, and OpenAI embeddings) rather than toy implementations.