RAG in Practice: Working Example to Get You Started
Summary
The article presents a practical, working RAG (Retrieval-Augmented Generation) sample application implemented in Node.js with TypeScript, Next.js 16, and React 19. The Dockerized project uses OpenAI's gpt-4o-mini as the LLM and text-embedding-3-small (1536 dimensions) for embeddings, with Qdrant as the vector database using cosine similarity. Users upload PDF, Markdown, or plain-text files and then ask questions, and the answers are derived exclusively from the uploaded content. The article describes the architecture, including the API endpoints for querying, uploading, and listing documents, and the UI components for file upload and query interaction. Key code snippets illustrate PDF parsing with LangChain's PDFLoader, document chunking, embedding generation, and similarity search.
Key takeaway
For AI engineers and software engineers building RAG applications, especially with Node.js and TypeScript, this project is a solid starting point. Clone the GitHub repository, configure your `.env` file, and run `docker compose up` to get a working RAG pipeline quickly. Then review the code, particularly `ingest.service.ts` and `rag.service.ts`, to understand the core logic, adapt it to your use case, and address any remaining TODOs before going to production.
Key insights
A practical RAG pipeline example is provided, demonstrating a full-stack implementation with modern web technologies.
Principles
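- RAG uses embeddings to retrieve the document chunks most relevant to a question and supplies them to the LLM as context.
- Chunking documents (e.g., into pieces of 400-500 words) keeps the context passed to the LLM small and focused.
- Filtering retrieved chunks by similarity score drops weak matches, which reduces the risk of the LLM hallucinating from irrelevant context.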
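A fixed-size word chunker in the spirit of the 400-500-word guideline could look like the following. It is a minimal sketch, not the project's implementation: the function name `chunkByWords`, the 450-word default, and the 50-word overlap between consecutive chunks are all assumptions.

```typescript
/**
 * HYPOTHETICAL helper: split text into chunks of at most `maxWords` words,
 * repeating the last `overlapWords` words of each chunk at the start of the next.
 * The defaults (450 / 50) are assumptions based on the 400-500-word guideline.
 */
function chunkByWords(text: string, maxWords = 450, overlapWords = 50): string[] {
  if (overlapWords >= maxWords) {
    throw new Error("overlapWords must be smaller than maxWords");
  }
  // Split on runs of whitespace and drop empty strings from leading/trailing space.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlapWords;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    // Stop once this window has reached the end of the text.
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

The overlap repeats the tail of each chunk at the start of the next, so text near a boundary keeps some surrounding context, at the cost of storing slightly more vectors.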
Method
The RAG pipeline extracts text from the uploaded files, splits it into chunks, generates an embedding for each chunk, and stores the vectors in the vector database. At query time, the user's question is embedded, the database is searched for the most similar chunks, and those chunks are passed to the LLM as context for generating the answer.
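The pipeline described above can be sketched end to end without external services. This is a minimal, self-contained sketch under loud assumptions: `toyEmbed` is a hypothetical hashed bag-of-words function standing in for text-embedding-3-small, `InMemoryVectorStore` is a hypothetical brute-force stand-in for Qdrant, and `buildPrompt` is an illustrative prompt, not the project's own. Only the cosine-similarity ranking reflects what the text describes.

```typescript
// HYPOTHETICAL stand-in for an embeddings API such as text-embedding-3-small:
// a hashed bag-of-words vector, used only so this sketch runs offline.
function toyEmbed(text: string, dims = 256): number[] {
  const vector: number[] = [];
  for (let i = 0; i < dims; i++) vector.push(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0; // unsigned 32-bit hash
    }
    vector[hash % dims] += 1;
  }
  return vector;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), returning 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredChunk {
  text: string;
  vector: number[];
}

// HYPOTHETICAL in-memory stand-in for the vector database (Qdrant in the project).
class InMemoryVectorStore {
  private readonly chunks: StoredChunk[] = [];

  add(text: string, vector: number[]): void {
    this.chunks.push({ text, vector });
  }

  // Brute-force search: score every stored chunk and return the topK best.
  search(query: number[], topK: number): { text: string; score: number }[] {
    return this.chunks
      .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

// Illustrative prompt that grounds the LLM in the retrieved chunks only.
function buildPrompt(question: string, context: string[]): string {
  const numbered = context.map((c, i) => `[${i + 1}] ${c}`);
  return [
    "Answer using only the context below. If the answer is not in it, say so.",
    ...numbered,
    `Question: ${question}`,
  ].join("\n");
}
```

In the real project the two stand-ins would be replaced by calls to the OpenAI embeddings API and to Qdrant, which performs the same cosine ranking with an index instead of scanning every vector.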
In practice
- Use LangChain's `PDFLoader` to extract text content from PDF files.
- Use Zod to validate API inputs and LLM responses.
- Filter retrieved chunks using a `MIN_SIMILARITY_SCORE` (e.g., 0.4).
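The threshold filter from the last bullet can be sketched as follows. This is a minimal, dependency-free sketch: the value `MIN_SIMILARITY_SCORE = 0.4` comes from the text, but `filterRelevant` and `planAnswer` are hypothetical names, and the inclusive `>=` comparison is an assumption.

```typescript
// Example threshold from the text; the right value depends on the embedding model.
const MIN_SIMILARITY_SCORE = 0.4;

interface ScoredChunk {
  text: string;
  score: number; // cosine similarity returned by the vector search
}

// HYPOTHETICAL helper: keep only chunks at or above the threshold, best match first.
// Assumption: ">=" is used; the project may use a strict ">" instead.
function filterRelevant(
  hits: ScoredChunk[],
  minScore: number = MIN_SIMILARITY_SCORE,
): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// HYPOTHETICAL helper: decide whether to call the LLM at all.
// With no relevant chunks, the app can reply that the answer is not in the
// uploaded documents instead of asking the model to answer from weak matches.
function planAnswer(hits: ScoredChunk[]): { callLlm: boolean; context: string[] } {
  const relevant = filterRelevant(hits);
  return {
    callLlm: relevant.length > 0,
    context: relevant.map((hit) => hit.text),
  };
}
```

Skipping the LLM call when nothing clears the threshold is one way to keep answers restricted to the uploaded content; a lower threshold returns more context but admits weaker matches.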
Topics
- Retrieval-Augmented Generation
- Node.js
- TypeScript
- Qdrant
- OpenAI API
- Large Language Models
- Vector Databases
Best for: AI Engineer, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.