MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

2026-03-25 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The Multi-Agent Reinforced Self-Check for Hallucination (MARCH) framework targets hallucination in large language models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems. Existing LLM-as-a-judge methods for hallucination detection often suffer from confirmation bias: a verifier that sees the initial generation can reproduce its errors. MARCH mitigates this through deliberate information asymmetry across three specialized agents: a Solver, a Proposer, and a Checker. The Solver produces an initial RAG response, which the Proposer breaks down into atomic, verifiable claims. The Checker then validates each claim against the retrieved evidence without access to the Solver's original output, which breaks the self-confirmation cycle. The pipeline is trained with multi-agent reinforcement learning (MARL), so the agents co-evolve toward better factual adherence. In the reported experiments, MARCH significantly reduces hallucination rates, and an 8B-parameter LLM achieves performance competitive with powerful closed-source models.

Key takeaway

For AI architects and research scientists building RAG systems, MARCH offers a method for reducing LLM hallucination. Its multi-agent, information-asymmetric verification pipeline is designed to improve factual accuracy and reliability. Consider its reinforcement learning approach if you want models to improve their own factual adherence; the reported results suggest this can make 8B-parameter models competitive with larger, closed-source alternatives.

Key insights

MARCH uses multi-agent reinforcement learning and information asymmetry to reduce LLM hallucination in RAG systems.

Principles

Information asymmetry breaks confirmation bias.
Decompose responses into atomic propositions.
Co-evolution optimizes factual adherence.

Method

MARCH employs a Solver to generate the initial RAG response, a Proposer to decompose it into atomic claims, and a Checker to validate those claims against the retrieved evidence without seeing the Solver's output. All three agents are trained via MARL.
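
The three-role flow can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the MARCH repository: the prompt wording, the `LLM` callable type, the `Verdict` class, and all function names are assumptions introduced here. The point it shows is the information asymmetry: the Checker's prompt is built only from one claim and the retrieved evidence, never from the Solver's answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: any text-in, text-out model call.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    claim: str
    supported: bool


def solve(llm: LLM, question: str, evidence: str) -> str:
    # Solver: drafts the initial RAG answer from question + evidence.
    return llm(f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:")


def propose(llm: LLM, answer: str) -> List[str]:
    # Proposer: splits the answer into atomic claims, one per line.
    raw = llm(f"List each atomic claim on its own line:\n{answer}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def check(llm: LLM, claim: str, evidence: str) -> Verdict:
    # Checker: sees ONLY the claim and the evidence (assumed prompt format).
    reply = llm(f"Evidence:\n{evidence}\n\nClaim: {claim}\nSupported? yes/no:")
    return Verdict(claim, reply.strip().lower().startswith("yes"))


def run_pipeline(
    solver: LLM, proposer: LLM, checker: LLM, question: str, evidence: str
) -> Tuple[str, List[Verdict]]:
    answer = solve(solver, question, evidence)
    claims = propose(proposer, answer)
    # The Solver's full answer is deliberately not passed to check().
    return answer, [check(checker, c, evidence) for c in claims]
```

Passing the three roles as separate callables keeps the sketch agnostic about whether they share weights or are distinct models; the source does not say which, so no choice is made here.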

In practice

Implement multi-agent systems for verification.
Isolate verifiers from original outputs.
Apply MARL for factual self-improvement.
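
To make the last point concrete, the Checker's verdicts have to be turned into a scalar training signal. This summary does not describe the reward MARCH actually uses, so the function below is purely a hypothetical illustration: it scores a response by the fraction of its atomic claims that the Checker marked as supported. The name `factuality_reward` and the `empty_penalty` parameter are assumptions introduced here.

```python
from typing import Sequence


def factuality_reward(supported: Sequence[bool], empty_penalty: float = 0.0) -> float:
    """Hypothetical reward: share of claims the Checker judged supported.

    This is an illustrative assumption, not the reward defined by MARCH.
    """
    if not supported:
        # No claims were extracted, so there is nothing to verify.
        return empty_penalty
    return sum(supported) / len(supported)


# Example: three of four claims supported by the evidence.
print(factuality_reward([True, False, True, True]))  # 0.75
```

A real multi-agent setup would also need to decide how such a signal is shared or split between the agents, which this sketch leaves open.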

Topics

LLM Hallucination
Retrieval-Augmented Generation
Multi-Agent Reinforcement Learning
Factual Alignment
Information Asymmetry

Code references

Qwen-Applications/MARCH

Best for: Research Scientist, AI Architect, AI Engineer, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.