Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research
Summary
A preregistered study compared two ways of helping a large language model (LLM) answer questions over a small research corpus: a single-round Vector RAG system and an LLM-compiled markdown wiki. Both systems answered the same 13 questions over 24 papers with the same answer-generating model, and blinded LLM judges scored the responses. The wiki significantly outperformed RAG at connecting findings across papers, though its advantage in organization diminished after judge adjustment. RAG met the preregistered criteria on single-fact lookup questions. Contrary to expectations, the wiki's query-side token costs were substantially higher than RAG's, so its upfront build cost could not be recovered through cheaper queries. In exploratory analyses, the wiki did better on claim-level citation checking even though RAG scored higher on overall groundedness. A decomposition-based RAG variant captured most of the wiki's cross-paper synthesis benefit at a lower token cost, but did not match the wiki's claim-by-claim citation support.
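The build-cost point above comes down to simple arithmetic: an upfront cost is only repaid if each query is cheaper afterwards. The sketch below illustrates that reasoning. The function name `break_even_queries` and all token counts are hypothetical and do not come from the study.

```python
from typing import Optional


def break_even_queries(
    build_tokens: int, wiki_query_tokens: int, rag_query_tokens: int
) -> Optional[float]:
    """Queries needed before the wiki's upfront build cost is repaid.

    Returns None when the wiki is not cheaper per query, because the
    build cost can then never be recovered through query savings.
    """
    saving_per_query = rag_query_tokens - wiki_query_tokens
    if saving_per_query <= 0:
        return None
    return build_tokens / saving_per_query


# Hypothetical token counts, chosen only to show both cases.
cheaper_wiki = break_even_queries(1_000_000, 2_000, 6_000)
costlier_wiki = break_even_queries(1_000_000, 9_000, 6_000)
print(cheaper_wiki)   # 250.0
print(costlier_wiki)  # None
```

The second case mirrors the direction of the reported result: when the wiki costs more per query than RAG, no number of queries repays the build.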
Key takeaway
For AI engineers designing knowledge retrieval systems, this research indicates that no single architecture is best across all aspects of grounded research synthesis. Choose between Vector RAG and an LLM-compiled wiki based on the task: the wiki is stronger for cross-paper synthesis and claim-level citation support, while RAG is effective for single-fact lookups and cheaper per query. Consider hybrid approaches, such as decomposition-based RAG, to balance synthesis capability against token cost.
Key insights
Grounded research synthesis involves several distinct capabilities, and neither retrieval architecture excelled on every metric.
Principles
- Different retrieval architectures excel at different synthesis tasks.
- Citation quality can differ from overall groundedness scores.
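One way the second principle can arise is when an answer's claims are supported somewhere in the corpus but attributed to the wrong source. The toy sketch below shows that gap. It is an illustration under stated assumptions, not the study's scoring procedure: the word-overlap `supported` check, the passages, and the claims are all hypothetical stand-ins for an LLM judge and real data.

```python
def supported(claim: str, passage: str) -> bool:
    """Toy support check: every word of the claim appears in the passage."""
    return set(claim.lower().split()) <= set(passage.lower().split())


# Hypothetical corpus and answer, invented for illustration.
passages = {
    "p1": "method a improves recall on long documents",
    "p2": "method b reduces latency at query time",
}
# Each entry is (claim text, id of the passage the answer cites for it).
answer = [
    ("method a improves recall", "p1"),
    ("method b reduces latency", "p1"),  # true of the corpus, wrong citation
]

# Overall groundedness: is each claim supported by any passage at all?
grounded = sum(
    any(supported(claim, p) for p in passages.values()) for claim, _ in answer
) / len(answer)

# Claim-level citation support: does the cited passage support the claim?
cited_ok = sum(supported(claim, passages[pid]) for claim, pid in answer) / len(answer)

print(grounded, cited_ok)  # 1.0 0.5
```

Here the answer is fully grounded in the corpus yet only half of its citations hold up, which is why the two measures are worth reporting separately.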
Method
Compared a single-round Vector RAG system and an LLM-compiled markdown wiki on 13 questions over 24 papers, using the same answer-generating model for both, blinded LLM judges for scoring, and an analysis of query-side token costs.
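Blinded judging generally means the judge sees answers under neutral labels, with the mapping back to systems kept separate until scoring is done. The sketch below shows one minimal way to do that. The helpers `blind_answers` and `unblind_scores`, the sample answers, and the scores are hypothetical, and the study's actual protocol may differ.

```python
import random
import string
from typing import Dict, Tuple


def blind_answers(
    answers: Dict[str, str], rng: random.Random
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Shuffle system answers and relabel them A, B, ... for the judge."""
    systems = list(answers)
    rng.shuffle(systems)
    key = dict(zip(string.ascii_uppercase, systems))  # label -> system, hidden from the judge
    blinded = {label: answers[system] for label, system in key.items()}
    return blinded, key


def unblind_scores(scores: Dict[str, float], key: Dict[str, str]) -> Dict[str, float]:
    """Map judge scores on neutral labels back to system names."""
    return {key[label]: score for label, score in scores.items()}


# Hypothetical answers and judge scores for a single question.
answers = {"rag": "Answer text from RAG.", "wiki": "Answer text from the wiki."}
blinded, key = blind_answers(answers, random.Random(0))
judge_scores = {label: float(len(text)) for label, text in blinded.items()}  # stand-in judge
print(unblind_scores(judge_scores, key))
```

Drawing a fresh shuffle per question keeps label position from correlating with either system.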
In practice
- Consider a wiki for cross-paper synthesis and claim-level citation support.
- Use RAG for single-fact lookups.
- Evaluate systems on specific synthesis capabilities.
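As a rough illustration of why decomposition can help cross-paper questions, the sketch below contrasts single-round retrieval with retrieval per sub-question. It assumes a common form of decomposition-based RAG and may not match the variant the study tested. The word-overlap `retrieve` is a stand-in for vector search, `decompose` is a hypothetical stand-in for an LLM call, and the corpus and question are invented.

```python
from typing import List


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank passages by word overlap (stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(
        corpus, key=lambda p: len(words & set(p.lower().split())), reverse=True
    )
    return ranked[:k]


def decompose(question: str) -> List[str]:
    """Hypothetical stand-in for an LLM that splits a question into sub-questions."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def decomposed_context(question: str, corpus: List[str], k: int = 1) -> List[str]:
    """Retrieve per sub-question and merge the results without duplicates."""
    context: List[str] = []
    for sub_question in decompose(question):
        for passage in retrieve(sub_question, corpus, k):
            if passage not in context:
                context.append(passage)
    return context


# Hypothetical corpus and question.
corpus = [
    "paper one reports recall gains for dense retrieval",
    "paper two reports latency costs for rerankers",
    "paper three surveys tokenizers",
]
question = (
    "what recall gains does dense retrieval show "
    "and what latency costs do rerankers add"
)

print(retrieve(question, corpus, k=1))       # single round: only paper one
print(decomposed_context(question, corpus))  # per sub-question: papers one and two
```

With the same per-call budget of one passage, the decomposed version surfaces both papers the question spans, at the cost of an extra retrieval call per sub-question.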
Topics
- Vector RAG
- LLM-Compiled Wiki
- Research Synthesis
- Question Answering Systems
- LLM Groundedness
Best for: Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist, NLP Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.