I Built a RAG System in 2025. The “RAG Is Dead” Posts Keep Telling Me to Delete It.

Source: AI Advances - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, quick

Summary

The author revisits a RAG tutorial they published in early 2025 and finds that half of it is now obsolete because of advances in large language models. Context windows have grown dramatically: Llama 4 Scout accepts 10 million tokens, Gemini 3 Pro 2 million, and Claude 1 million. That growth fuels the "RAG is dead" narrative, which holds that chunking documents and maintaining a vector database are unnecessary when entire documents can be pasted directly into a prompt. After a year of running RAG in production, however, the author concludes that the other half of the tutorial describes capabilities that even these large context windows cannot yet replicate, so RAG remains relevant in certain scenarios.

Key takeaway

AI engineers evaluating retrieval architectures should not abandon RAG solely because LLM context windows have grown. Even with models like Llama 4 Scout offering 10 million tokens, RAG provides advantages for specific use cases that raw context cannot address. Assess your application's needs critically before removing it; some RAG functionality remains essential despite expanded LLM capabilities.
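
As a starting point for that assessment, the first question is whether a corpus would even fit in a given context window. The sketch below is illustrative only and is not taken from the original article: the function names, the 4-characters-per-token heuristic, and the 50% headroom factor are all assumptions, and the window sizes are simply the figures quoted in the summary.

```python
import math

# Context-window sizes (in tokens) as quoted in the summary above.
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Gemini 3 Pro": 2_000_000,
    "Claude": 1_000_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The 4 chars/token ratio is an assumed
    heuristic, not the output of a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def choose_strategy(docs: list[str], window_tokens: int, headroom: float = 0.5) -> str:
    """Return "full_context" if all docs fit within a fraction of the window,
    otherwise "retrieval". The headroom factor (hypothetical) reserves space
    for the question, instructions, and the model's answer."""
    total = sum(estimate_tokens(d) for d in docs)
    budget = int(window_tokens * headroom)
    return "full_context" if total <= budget else "retrieval"


if __name__ == "__main__":
    corpus = ["example document text " * 1000]  # placeholder corpus
    for model, window in CONTEXT_WINDOWS.items():
        print(model, choose_strategy(corpus, window))
```

A "full_context" result only says that pasting everything is possible. It does not settle whether it is the better design, which is exactly the point the author makes about the capabilities that context size alone does not cover.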

Key insights

RAG's utility persists despite large LLM context windows, offering unique benefits.

Best for: AI Architect, NLP Engineer, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.