MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Summary
The Multi-Agent Reinforced Self-Check for Hallucination (MARCH) framework addresses hallucination in large language models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems. Existing LLM-as-a-judge methods for hallucination detection often suffer from confirmation bias, because the verifier can reproduce errors from the initial generation rather than catch them. MARCH mitigates this through deliberate information asymmetry across three specialized agents: a Solver, a Proposer, and a Checker. The Solver generates an initial RAG response, which the Proposer breaks down into atomic, verifiable claims. The Checker then validates each claim against the retrieved evidence without access to the Solver's original output, which breaks the self-confirmation cycle. The pipeline is trained with multi-agent reinforcement learning (MARL), so the agents co-evolve toward better factual adherence. In experiments, MARCH substantially reduces hallucination rates, and an 8B-parameter LLM trained with it performs competitively with strong closed-source models.
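The data flow described above can be sketched as follows. This is a minimal illustration, assuming each agent is a plain Python callable; in MARCH every role is played by an LLM, and all names here (`run_pipeline`, `unsupported_rate`, the `toy_*` stubs) are hypothetical. The point is structural: the Checker's signature admits only a single claim and the evidence, so it never receives the Solver's response.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces; in MARCH each role is an LLM, not a function.
SolverFn = Callable[[str, List[str]], str]     # (question, evidence) -> response
ProposerFn = Callable[[str], List[str]]        # response -> atomic claims
CheckerFn = Callable[[str, List[str]], bool]   # (claim, evidence) -> supported?


@dataclass(frozen=True)
class Verdict:
    claim: str
    supported: bool


def run_pipeline(question: str, evidence: List[str], solver: SolverFn,
                 proposer: ProposerFn, checker: CheckerFn) -> List[Verdict]:
    response = solver(question, evidence)
    claims = proposer(response)
    # Information asymmetry: the checker sees one claim plus the evidence,
    # never the Solver's full response.
    return [Verdict(claim, checker(claim, evidence)) for claim in claims]


def unsupported_rate(verdicts: List[Verdict]) -> float:
    """Hypothetical proxy metric: fraction of claims not grounded in evidence."""
    if not verdicts:
        return 0.0
    return sum(not v.supported for v in verdicts) / len(verdicts)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_solver(question: str, evidence: List[str]) -> str:
    return "The Eiffel Tower is in Paris. It was completed in 1901."


def toy_proposer(response: str) -> List[str]:
    # Split on periods and restore the terminal period on each sentence.
    return [s.strip() + "." for s in response.split(".") if s.strip()]


def toy_checker(claim: str, evidence: List[str]) -> bool:
    return claim in evidence  # exact match against evidence sentences


evidence = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]
verdicts = run_pipeline("Where is the Eiffel Tower and when was it built?",
                        evidence, toy_solver, toy_proposer, toy_checker)
print(unsupported_rate(verdicts))  # 0.5
```

Keeping the isolation in the function signature, rather than relying on prompt instructions, makes it impossible for the verification step to condition on the original answer.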
Key takeaway
For AI Architects and Research Scientists developing RAG systems, MARCH offers a robust method to combat LLM hallucination. By adopting its multi-agent, information-asymmetric verification pipeline, you can significantly improve factual accuracy and reliability. Consider integrating this reinforcement learning approach to enable LLMs to self-improve their factual adherence, potentially making 8B-parameter models competitive with larger, closed-source alternatives.
Key insights
MARCH uses multi-agent reinforcement learning and information asymmetry to reduce LLM hallucination in RAG systems.
Principles
- Information asymmetry breaks confirmation bias.
- Decompose responses into atomic propositions.
- Co-evolution optimizes factual adherence.
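Why decomposition into atomic propositions helps can be seen in a small sketch. It assumes a naive rule-based splitter in place of the LLM Proposer and a toy word-overlap test in place of the LLM Checker; both functions (`decompose`, `supported`) are hypothetical stand-ins, not MARCH's components. A compound response judged as a whole is simply rejected, while per-claim checks pinpoint which proposition is unsupported.

```python
import re
from typing import List


def decompose(response: str) -> List[str]:
    """Naive stand-in for the Proposer: split on sentence ends and semicolons."""
    parts = re.split(r"(?<=[.!?])\s+|;\s*", response.strip())
    return [p.strip().rstrip(".!?") for p in parts if p.strip()]


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def supported(claim: str, evidence: str) -> bool:
    """Toy lexical check: every word of the claim occurs in the evidence."""
    vocab = set(_words(evidence))
    return all(w in vocab for w in _words(claim))


evidence = ("Marie Curie won the Nobel Prize in Physics in 1903 "
            "and the Nobel Prize in Chemistry in 1911.")
response = ("Marie Curie won the Nobel Prize in Physics in 1903; "
            "Marie Curie won the Nobel Prize in Chemistry in 1921.")

whole = supported(response, evidence)           # one verdict for everything
claims = decompose(response)                    # two atomic propositions
per_claim = [supported(c, evidence) for c in claims]
print(whole, per_claim)  # False [True, False]
```

The whole-response verdict discards the correct 1903 fact along with the wrong 1921 date; the atomic verdicts keep the first and flag only the second.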
Method
MARCH uses a Solver to generate the initial RAG response, a Proposer to decompose it into atomic claims, and a Checker to validate each claim against the retrieved evidence in isolation; the agents are trained together via MARL.
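The summary states only that the agents are trained with MARL and co-evolve; it does not give the reward. The sketch below is therefore a hypothetical illustration of how Checker verdicts could be turned into a training signal: a shared episode reward equal to the fraction of supported claims (zero for an empty decomposition), plus group-normalized advantages borrowed from GRPO-style policy gradients as a swapped-in technique. `episode_reward` and `group_advantages` are invented names, not MARCH's objective.

```python
from statistics import mean, pstdev
from typing import List


def episode_reward(verdicts: List[bool], min_claims: int = 1) -> float:
    """Hypothetical shared reward: fraction of claims the Checker supports.

    A decomposition with fewer than `min_claims` claims earns 0, so the
    Proposer cannot inflate the score by emitting no claims at all.
    """
    if len(verdicts) < min_claims:
        return 0.0
    return sum(verdicts) / len(verdicts)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: score each rollout against its sibling rollouts
    for the same prompt (subtract group mean, divide by group std)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Checker verdicts for four sampled rollouts on the same question.
rollouts = [[True, True, True], [True, False], [], [False, False, True, True]]
rewards = [episode_reward(v) for v in rollouts]
advantages = group_advantages(rewards)
print(rewards)                            # [1.0, 0.5, 0.0, 0.5]
print([round(a, 2) for a in advantages])  # [1.41, 0.0, -1.41, 0.0]
```

This signal only makes sense for the Solver and Proposer; a Checker updated on the same reward could learn to approve every claim, so it would need separate supervision.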
In practice
- Implement multi-agent systems for verification.
- Isolate verifiers from original outputs.
- Apply MARL for factual self-improvement.
Topics
- LLM Hallucination
- Retrieval-Augmented Generation
- Multi-Agent Reinforcement Learning
- Factual Alignment
- Information Asymmetry
Best for: Research Scientist, AI Architect, AI Engineer, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.