I Built a RAG System in 2025. The “RAG Is Dead” Posts Keep Telling Me to Delete It.
Summary
The author reflects on a RAG tutorial published in early 2025 and notes that half of its content is now obsolete because of advances in large language models. Context windows have expanded dramatically: Llama 4 Scout accepts 10 million tokens, Gemini 3 Pro 2 million, and Claude 1 million. This fuels a "RAG is dead" narrative, which holds that chunking documents and maintaining a vector database are unnecessary when entire documents can be pasted directly into a prompt. After a year of running RAG in production, however, the author finds that the other half of the tutorial describes capabilities that even these large context windows cannot yet replicate, so RAG remains relevant in certain scenarios.
Key takeaway
For AI engineers evaluating information retrieval architectures: do not abandon RAG solely because LLM context windows have grown. Although models like Llama 4 Scout offer a 10-million-token context window, RAG still provides advantages in specific use cases that raw context cannot address. Assess your application's needs critically, recognizing that some RAG functionality remains essential even with expanded LLM capabilities.
Key insights
RAG remains useful despite large LLM context windows because it covers needs that raw context alone does not.
Principles
- "RAG is dead" claims often oversimplify LLM capabilities.
- Large context windows don't fully replace RAG's specific strengths.
- Early RAG practices lacked rigorous chunking strategies.
In practice
- Re-evaluate RAG components against current LLM capabilities.
- Consider RAG for tasks that exceed what even large context windows can handle.
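As a first step in that re-evaluation, it can help to check whether a corpus would fit into a given context window at all. The sketch below is illustrative and not taken from the original article. The window sizes are the figures quoted in the summary. The 4-characters-per-token ratio, the `reserve_tokens` parameter, and the function names are assumptions made for this example, and a real tokenizer would produce different counts.

```python
# Context window sizes (in tokens) as quoted in the summary.
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Gemini 3 Pro": 2_000_000,
    "Claude": 1_000_000,
}

# Assumption: roughly 4 characters per token, a crude heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer would give different counts."""
    return len(text) // CHARS_PER_TOKEN + 1


def models_that_fit(documents: list[str], reserve_tokens: int = 8_000) -> list[str]:
    """Return the models whose window could hold every document plus a
    reserve (hypothetical default) for the question and the answer."""
    total = sum(estimate_tokens(doc) for doc in documents) + reserve_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if total <= window]


if __name__ == "__main__":
    corpus = ["a" * 4_000_000, "a" * 4_000_000]  # two synthetic ~1M-token documents
    print(models_that_fit(corpus))
```

An empty result means that no listed window can hold the corpus, so some form of retrieval or filtering is unavoidable. A non-empty result only shows that pasting everything is possible, not that it is the better design.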
Topics
- RAG Systems
- Large Language Models
- Context Windows
- Vector Databases
- Information Retrieval
- LLM Architectures
Best for: AI Architect, NLP Engineer, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.