Reducing Hallucinations in Complex Question Answering using Simple Graph-based Retrieval-Augmented Generation (long version)

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Advanced, extended

Summary

A study by Wedge et al. introduces an agentic system that reduces hallucinations and improves factual correctness in complex question answering (QA) by augmenting Retrieval-Augmented Generation (RAG) with a lightweight graph-based knowledge base. The system was evaluated on 510 questions from the MoNaCo Wikipedia QA benchmark, using a curated August 2025 Wikipedia snapshot. The "vector+graph RAG" configuration achieved a fine-grained truthfulness score of approximately 63, an 80% increase over the baseline vector RAG score of 35. It also more than doubled factual correctness precision and recall compared to vector RAG, and halved hallucinated answers compared to a zero-shot approach (coarse truthfulness improved from -127 to -49). The system is powered by GPT-5.4 and Microsoft's Harrier 0.6B embedding model, and uses handwritten Cypher queries over a Neo4j graph database containing 5.7 million nodes and 22 million relationships. These gains came with only a modest increase in token usage.

Key takeaway

For machine learning engineers building complex question answering systems, integrating a lightweight knowledge graph with agentic RAG can substantially improve answer accuracy and reduce hallucinations. Prioritize designing explicit, efficient graph query tools over relying solely on vector search, even if LLMs show a bias towards simpler tools. The resulting systems should deliver more trustworthy and factually correct responses, especially for multi-hop and multi-entity queries, at the cost of a slight increase in token usage.

Key insights

Integrating lightweight knowledge graphs into RAG systems significantly reduces LLM hallucination and improves complex QA accuracy.

Principles

Agentic RAG with structured query tools outperforms pure vector search for complex QA.
Handwritten graph queries enhance LLM focus and avoid prompt injection risks.
Evaluating RAG systems requires metrics that penalize hallucination more severely than refusal.
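
The last principle can be illustrated with a small scoring sketch. The outcome labels and weights below are assumptions chosen for illustration; the summary does not give the formula behind the study's truthfulness scores. The only property the sketch is meant to show is that a wrong answer costs more than a refusal, so totals can go negative.

```python
from collections import Counter

# Hypothetical labels and weights, not the metric used in the study.
CORRECT, REFUSED, HALLUCINATED = "correct", "refused", "hallucinated"
WEIGHTS = {CORRECT: 1.0, REFUSED: 0.0, HALLUCINATED: -2.0}


def truthfulness_score(outcomes, weights=WEIGHTS):
    """Sum per-answer weights so that a hallucination costs more than a refusal."""
    counts = Counter(outcomes)
    return sum(weights[label] * n for label, n in counts.items())


# A cautious system refuses when unsure; a reckless one guesses instead.
cautious = [CORRECT] * 5 + [REFUSED] * 5
reckless = [CORRECT] * 6 + [HALLUCINATED] * 4

print(truthfulness_score(cautious))  # 5.0
print(truthfulness_score(reckless))  # -2.0
```

Under these assumed weights the reckless system answers more questions correctly yet scores lower, which is the behavior a hallucination-sensitive metric is meant to produce.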

Method

Construct a lightweight knowledge graph from semi-structured documents. Equip an LLM agent with vector search, structural navigation, and relational query tools (e.g., Cypher) to retrieve information efficiently.
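
A minimal in-memory sketch of this method, assuming each document arrives as a title, a list of sections, and a list of outgoing links. The `Graph` class, the `HAS_SECTION` and `LINKS_TO` relation names, and the word-overlap ranking are all hypothetical; in particular, the word-overlap function is a plain stand-in for the embedding-based vector search, not an implementation of it.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel, dst):
        self.edges.append((src, rel, dst))

    def neighbors(self, src, rel):
        """Relational query: targets reachable from src over one relation type."""
        return [d for s, r, d in self.edges if s == src and r == rel]


def build_graph(docs):
    """Turn semi-structured documents into Page and Section nodes plus edges."""
    g = Graph()
    for doc in docs:
        page_id = f"page:{doc['title']}"
        g.add_node(page_id, kind="Page", title=doc["title"])
        for sec in doc["sections"]:
            sec_id = f"{page_id}#{sec['title']}"
            g.add_node(sec_id, kind="Section", title=sec["title"], text=sec["text"])
            g.add_edge(page_id, "HAS_SECTION", sec_id)
        for target in doc.get("links", []):
            g.add_edge(page_id, "LINKS_TO", f"page:{target}")
    return g


def title_search(g, query):
    """Stand-in for vector search: rank pages by word overlap with the query."""
    q = set(query.lower().split())
    scored = []
    for node_id, props in g.nodes.items():
        if props["kind"] == "Page":
            overlap = len(q & set(props["title"].lower().split()))
            if overlap:
                scored.append((overlap, node_id))
    return [node_id for _, node_id in sorted(scored, reverse=True)]


def section_titles(g, page_id):
    """Structural navigation: list a page's sections without loading their text."""
    return [g.nodes[s]["title"] for s in g.neighbors(page_id, "HAS_SECTION")]


docs = [
    {"title": "Ada Lovelace",
     "sections": [{"title": "Early life", "text": "Born in London in 1815."},
                  {"title": "Work", "text": "Wrote notes on the Analytical Engine."}],
     "links": ["Charles Babbage"]},
    {"title": "Charles Babbage",
     "sections": [{"title": "Work", "text": "Designed the Analytical Engine."}],
     "links": []},
]
graph = build_graph(docs)
```

An agent would call `title_search` to find an entry point, `section_titles` to inspect structure cheaply, and `neighbors` to follow relations such as `LINKS_TO` for multi-hop questions.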

In practice

Use Neo4j and Cypher for graph-based RAG implementation.
Employ smaller, efficient embedding models like Harrier 0.6B for large datasets.
Explicitly prompt LLM agents for efficient tool use (e.g., "Title search" -> "Section titles" -> "Get sections").

Topics

Retrieval-Augmented Generation
Knowledge Graphs
LLM Hallucination
Complex Question Answering
Neo4j Cypher
Agentic AI Systems

Code references

vibrantlabsai/ragas

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.