Vector RAG vs LLM-Compiled Wiki: A Preregistered Comparison on a Small Multi-Domain Research

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

A preregistered study compared two ways of helping a Large Language Model (LLM) answer questions over a small research corpus: a single-round Vector RAG system and an LLM-compiled markdown wiki. Both systems answered 13 questions across 24 papers using the same answer-generating model, and blinded LLM judges scored the responses. The wiki significantly outperformed RAG at connecting findings across papers, though its organizational advantage diminished after judge adjustment. RAG, for its part, met the preregistered criteria on single-fact lookup questions. Contrary to expectations, the wiki incurred substantially higher query-side token costs than RAG, so its upfront build cost could not be recovered through cheaper queries. Exploratory analyses found the wiki stronger on claim-level citation checking, even though RAG had the better overall groundedness score. A decomposition-based RAG variant captured most of the wiki's cross-paper synthesis benefit at a lower token cost, but did not match the wiki's claim-by-claim citation support.

Key takeaway

For AI engineers designing knowledge retrieval systems, this research indicates that no single architecture is universally optimal for grounded research synthesis. Evaluate systems such as Vector RAG and LLM-compiled wikis against your specific needs: wikis suit complex cross-paper synthesis and claim-level citation accuracy, while RAG suits single-fact lookups. Consider hybrid approaches, such as decomposition-based RAG, to balance synthesis capability against token cost.
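To make the contrast between single-round retrieval and a decomposition-based variant concrete, here is a minimal Python sketch. It is not the study's implementation: the toy corpus, the word-overlap scorer (standing in for embedding similarity), the hand-written sub-questions (an LLM would normally generate them), and every function name are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy corpus (paper id -> one-line abstract); not data from the study.
CORPUS = {
    "paper_a": "sparse attention reduces memory use in long context transformers",
    "paper_b": "retrieval augmentation improves factual accuracy of language models",
    "paper_c": "quantization lowers inference memory use with small accuracy loss",
}

def tokenize(text):
    """Lowercase and split into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def score(query, doc):
    """Word-overlap count; an assumed stand-in for vector similarity."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[w], d[w]) for w in q)

def retrieve(query, k=1):
    """Return the ids of the k best-scoring papers for one query."""
    ranked = sorted(CORPUS, key=lambda pid: score(query, CORPUS[pid]), reverse=True)
    return ranked[:k]

def single_round(question, k=1):
    """Single-round retrieval: one lookup using the whole question."""
    return retrieve(question, k)

def decomposed(sub_questions, k=1):
    """Decomposition-style retrieval: one lookup per sub-question, merged
    in first-seen order without duplicates."""
    seen = []
    for sub_question in sub_questions:
        for pid in retrieve(sub_question, k):
            if pid not in seen:
                seen.append(pid)
    return seen

def context_tokens(paper_ids):
    """Rough query-side cost proxy: words of retrieved text sent to the model."""
    return sum(len(tokenize(CORPUS[pid])) for pid in paper_ids)

question = "How do sparse attention and quantization compare on memory use?"
sub_questions = ["sparse attention memory use", "quantization memory use"]

one_shot = single_round(question)
multi = decomposed(sub_questions)
print(one_shot, context_tokens(one_shot))
print(multi, context_tokens(multi))
```

In this toy setup the single lookup surfaces only one of the two relevant papers, while the decomposed lookups surface both and send more retrieved text to the model. That mirrors the trade-off described above in miniature only; it says nothing about the magnitudes reported in the study.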

Key insights

Grounded research synthesis involves distinct capabilities, with no single LLM architecture excelling across all metrics.

Method

The study compared a single-round Vector RAG system with an LLM-compiled markdown wiki on 13 questions over 24 papers. Blinded LLM judges scored the responses, and query-side token costs were analyzed.

Best for: Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist, NLP Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.