RAG in Practice: Working Example to Get You Started

2026-05-31 · Source: LLM on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

A practical, working RAG (Retrieval-Augmented Generation) sample application is presented, implemented in Node.js with TypeScript, Next.js 16, and React 19. The Dockerized project uses OpenAI's gpt-4o-mini as the LLM and text-embedding-3-small (1536 dims) for embeddings, with Qdrant as the vector database using cosine similarity. Users upload PDF, Markdown, or plain text files and then ask questions; answers are derived exclusively from the uploaded content. The article walks through the architecture, including API endpoints for querying, uploading, and listing documents, alongside UI components for file upload and query interaction. Key code snippets illustrate PDF parsing with LangChain's PDFLoader, document chunking, embedding generation, and similarity search.

Key takeaway

For AI Engineers or Software Engineers building RAG applications, especially with Node.js and TypeScript, this project offers a robust starting point. You can clone the GitHub repository, set up your `.env` file, and run it with `docker compose up` to quickly get a functional RAG pipeline. Review the provided code, particularly the `ingest.service.ts` and `rag.service.ts` files, to understand the core logic and adapt it for your specific use cases, addressing any TODOs for production readiness.

Key insights

A practical RAG pipeline example is provided, demonstrating a full-stack implementation with modern web technologies.

Principles

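RAG uses embeddings to find the document chunks most relevant to a question and passes them to the LLM as context.
Chunking documents (e.g., 400-500 words per chunk) keeps each piece small enough to embed and to fit within the LLM's input.
Filtering by similarity score drops weak matches, which reduces the risk of LLM hallucinations.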
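
To make the chunking principle concrete, here is a minimal sketch of a word-based chunker. It is not the project's code: the function name `chunkByWords`, the default of 450 words, and the 50-word overlap between neighbouring chunks are assumptions chosen for illustration.

```typescript
// Hypothetical helper (not from the source repository): split text into
// chunks of at most `maxWords` words. Consecutive chunks share `overlap`
// words so a sentence cut at a boundary still appears whole in one chunk.
function chunkByWords(text: string, maxWords = 450, overlap = 50): string[] {
  if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
    throw new Error("require maxWords > 0 and 0 <= overlap < maxWords");
  }
  // Split on any whitespace and drop empty tokens.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    // Stop once the current window has reached the end of the text.
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

Counting words is only a rough proxy for model tokens; a production pipeline might size chunks with a tokenizer instead.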

Method

The RAG pipeline has two phases. At ingestion, text is extracted from the uploaded files, split into chunks, embedded, and stored in a vector DB. At query time, the user's question is embedded, the vector DB is searched for the most similar chunks, and those chunks are passed to an LLM as context for answer generation.
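
The flow above can be sketched end to end without any external services. Everything named here is an illustrative assumption: `toyEmbed` is a word-hashing stand-in for a real embedding model (the project uses text-embedding-3-small), `InMemoryVectorStore` stands in for Qdrant, and `buildPrompt` only assembles the text that would be sent to the LLM rather than calling one.

```typescript
interface StoredChunk { text: string; vector: number[]; }
interface SearchHit { text: string; score: number; }

// Toy stand-in for an embedding model: hash each word into one of `dims`
// buckets and count occurrences. Real embeddings capture meaning; this
// only captures shared words.
function toyEmbed(text: string, dims = 64): number[] {
  const vec = new Array<number>(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) % dims;
    vec[h] += 1;
  }
  return vec;
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// In-memory stand-in for a vector database.
class InMemoryVectorStore {
  private chunks: StoredChunk[] = [];

  // Ingestion: embed a chunk and store it alongside its text.
  add(text: string): void {
    this.chunks.push({ text, vector: toyEmbed(text) });
  }

  // Query: embed the question, score every chunk, return the best `topK`.
  search(question: string, topK: number): SearchHit[] {
    const q = toyEmbed(question);
    return this.chunks
      .map((c) => ({ text: c.text, score: cosine(q, c.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

// Assemble the prompt that would be sent to the LLM.
function buildPrompt(question: string, hits: SearchHit[]): string {
  const context = hits.map((h, i) => `[${i + 1}] ${h.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

A caller would `add` each chunk once at upload time, then for every question call `search` and pass the hits to `buildPrompt`. Swapping the toy pieces for a real embedding API and a vector database client should leave this overall shape unchanged.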

In practice

Use `PDFLoader` from LangChain for efficient PDF content extraction.
Use Zod to validate API input and LLM responses.
Filter retrieved chunks using a `MIN_SIMILARITY_SCORE` (e.g., 0.4).
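
As a rough illustration of the last point, the sketch below filters scored chunks against a `MIN_SIMILARITY_SCORE` of 0.4 and reports a "no match" result when nothing passes, so the caller can skip the LLM call. The `ScoredChunk` shape and the functions `selectContext` and `contextOrNoMatch` are hypothetical names, not taken from the project.

```typescript
interface ScoredChunk { text: string; score: number; }

// Example threshold from the summary; a suitable value depends on the
// embedding model and the data, so treat 0.4 as a starting point.
const MIN_SIMILARITY_SCORE = 0.4;

// Keep only chunks at or above the threshold, best match first.
function selectContext(
  hits: ScoredChunk[],
  minScore: number = MIN_SIMILARITY_SCORE,
): ScoredChunk[] {
  return hits
    .filter((h) => h.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

type ContextResult =
  | { kind: "context"; context: string }
  | { kind: "no_match" };

// If no chunk passes the threshold, signal that explicitly so the caller
// can reply "not found in the uploaded documents" instead of asking the
// LLM to answer from weak context.
function contextOrNoMatch(hits: ScoredChunk[]): ContextResult {
  const kept = selectContext(hits);
  if (kept.length === 0) return { kind: "no_match" };
  return { kind: "context", context: kept.map((h) => h.text).join("\n---\n") };
}
```

Returning an explicit `no_match` keeps the "answers only from uploaded content" behaviour in application code rather than relying on the prompt alone.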

Topics

Retrieval-Augmented Generation
Node.js
TypeScript
Qdrant
OpenAI API
Large Language Models
Vector Databases

Code references

aligorkem/traditional-rag-pipeline

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.