HIVE: Hidden-Evidence Verification for Hallucination Detection in Diffusion Large Language Models

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

HIVE is a novel hidden-evidence verification framework designed to detect hallucinations in Diffusion Large Language Models (D-LLMs) by analyzing their multi-step denoising trajectories. Unlike existing methods that rely on final output uncertainty or coarse trace statistics, HIVE extracts compressed hidden evidence from intermediate denoising steps and layers, selects the most informative "step-layer" evidence, and conditions a verifier language model (Qwen2.5-7B-Instruct) on this selected evidence via prefix embeddings. This framework provides both a continuous hallucination score and structured verification outputs, including hallucination types, evidence pairs, and rationales. Evaluated across two D-LLMs (Dream-7B-Instruct and LLaDA-8B-Instruct) and three QA benchmarks (TriviaQA, HotpotQA, NQOpenLike), HIVE consistently outperformed eight strong baselines, achieving up to 0.9236 AUROC and 0.9537 AUPRC, demonstrating the value of trajectory-level analysis for D-LLM reliability.
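
The two reported metrics can be read as ranking measures over the detector's continuous scores. The following is a minimal sketch of how AUROC and AUPRC (computed here as average precision) could be obtained from labels and scores. The function names and the toy data are illustrative assumptions, not the paper's evaluation code.

```python
def auroc(labels, scores):
    """Probability that a random hallucinated example (label 1) gets a higher
    score than a random faithful one (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.
    Tied scores keep their input order (Python's sort is stable)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Toy data (hypothetical): 1 = hallucinated, 0 = faithful.
toy_labels = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
toy_auprc = average_precision(toy_labels, toy_scores)
```

Both metrics depend only on how the scores rank the examples, so they compare detectors without fixing a decision threshold.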

Key takeaway

For AI engineers and research scientists developing or deploying D-LLMs, HIVE offers an approach to hallucination detection that draws on the model's internal denoising trajectory rather than its final output alone. Consider integrating trajectory-aware evidence selection and verifier conditioning to improve the reliability and interpretability of D-LLM applications, especially in domains requiring high factual consistency. The method provides both a continuous hallucination score and structured diagnostic outputs (hallucination types, evidence pairs, and rationales), which support more informed debugging and oversight.

Key insights

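In the reported experiments, analyzing hidden evidence from D-LLM denoising trajectories improves hallucination detection over methods based on final-output uncertainty or coarse trace statistics, and it yields structured outputs that make each verdict easier to interpret.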

Method

HIVE extracts compressed hidden evidence, learns to select informative step-layer units, and injects them as prefix embeddings into a verifier LLM to produce continuous scores and structured verification outputs.
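
The three stages named above (compress, select, inject) can be sketched on toy data as follows. This is only an illustration under loud assumptions: mean pooling as the compressor, a linear scorer as the selector, a linear projection as the prefix mapper, and random weights in place of learned ones are all hypothetical stand-ins, since this summary does not specify the paper's actual components.

```python
import random


def mean_pool(token_states):
    """Compress one (tokens x dim) hidden-state matrix into a single dim-vector.
    ASSUMPTION: mean pooling stands in for the paper's compression step."""
    n = len(token_states)
    return [sum(column) / n for column in zip(*token_states)]


def select_top_k(evidence, scorer_weights, k):
    """Score every (step, layer) unit with a linear scorer and keep the k best.
    ASSUMPTION: a learned selector would replace this fixed linear scorer."""
    scored = []
    for key, vec in evidence.items():
        score = sum(w * x for w, x in zip(scorer_weights, vec))
        scored.append((score, key))
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


def to_prefix(vec, projection):
    """Map a compressed evidence vector to the verifier's embedding width."""
    return [sum(w * x for w, x in zip(row, vec)) for row in projection]


random.seed(0)
steps, layers, tokens, dim, embed_dim = 4, 3, 5, 8, 16

# Toy denoising trajectory: one hidden-state matrix per (step, layer) unit.
trajectory = {
    (s, l): [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
    for s in range(steps)
    for l in range(layers)
}
evidence = {key: mean_pool(states) for key, states in trajectory.items()}

# Random weights stand in for parameters that would be learned in training.
scorer_weights = [random.gauss(0, 1) for _ in range(dim)]
projection = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(embed_dim)]

selected = select_top_k(evidence, scorer_weights, k=2)
prefix = [to_prefix(evidence[key], projection) for key in selected]
```

In a real system, the rows of `prefix` would be prepended to the verifier's input token embeddings, so the verifier reads the selected evidence before the question and answer it is asked to check.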

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.