Zero-source LLM Hallucination Detection with Human-like Criteria Probing

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Human-like Criteria Probing for Hallucination Detection (HCPD) is an interpretable, zero-source method for identifying factually incorrect or unfaithful content generated by Large Language Models. Operating solely on query-answer pairs, HCPD uses an LLM agent that adaptively decomposes each truthfulness judgment into a weighted set of interpretable criteria, such as factual accuracy and logical consistency, and then aggregates the criterion-specific scores. The agent acquires this adaptive behavior through a reward-based alignment scheme that draws weak supervision from semantic consistency metrics such as BLEURT. At inference, a multi-sampling aggregation strategy stabilizes the agent's stochastic judgments. HCPD consistently outperforms state-of-the-art baselines, achieving an average AUROC of 88.19% on LLaMA-3.1-8b and 88.02% on Qwen-3-8b across datasets including TriviaQA, SciQ, NQ Open, and CoQA.
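
For readers less familiar with the metric quoted above, the following is a minimal sketch of how AUROC can be computed from detector scores. It is not taken from the paper: the function name `auroc`, the label convention (1 for a truthful answer, 0 for a hallucinated one), and the toy data are assumptions made for illustration. AUROC here is the probability that a randomly chosen truthful answer receives a higher truthfulness score than a randomly chosen hallucinated one, with ties counted as one half.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Pairwise AUROC. Assumed convention: label 1 = truthful, 0 = hallucinated."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class")
    # Count how often a truthful answer outranks a hallucinated one (ties = 0.5).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

The quadratic pairwise loop is chosen for readability. On large evaluation sets a rank-based implementation from an established library would be the more practical choice.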

Key takeaway

For Machine Learning Engineers deploying LLMs in safety-critical applications or auditing black-box models, HCPD offers an interpretable hallucination detector with high reported AUROC scores. Because it is zero-source and scores answers against multiple criteria, you can assess an answer's truthfulness from the query-answer pair alone, without access to model internals or an external knowledge source. Consider integrating HCPD into your CI/CD pipeline for pre-deployment auditing or continuous monitoring, both to build trust in model outputs and to debug them.

Key insights

Emulating human multi-criteria reasoning with adaptive, weakly supervised agents improves zero-source LLM hallucination detection.

Principles

Decompose LLM evaluation into weighted, interpretable criteria.
Align LLM evaluators using weak semantic consistency supervision.
Multi-sampling aggregation stabilizes stochastic LLM judgments.
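
The first principle can be made concrete with a small sketch. It assumes that each criterion carries a non-negative weight and a score in [0, 1], and that aggregation is a weight-normalized mean. The paper's exact aggregation rule is not given in this summary, so the `Criterion` class, the `aggregate_truthfulness` function, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str      # e.g. "factual accuracy"
    weight: float  # relative importance proposed by the agent (assumed non-negative)
    score: float   # criterion-specific score, assumed to lie in [0, 1]


def aggregate_truthfulness(criteria: list[Criterion]) -> float:
    """Weight-normalized mean of the criterion scores (an assumed aggregation rule)."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    return sum(c.weight * c.score for c in criteria) / total_weight


# Hypothetical criteria for a single query-answer pair.
example = [
    Criterion("factual accuracy", weight=0.6, score=0.9),
    Criterion("logical consistency", weight=0.4, score=0.5),
]
truthfulness = aggregate_truthfulness(example)
```

Keeping the per-criterion scores alongside the aggregate is what makes the verdict inspectable: a low overall value can be traced back to the criterion that caused it.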

Method

An LLM agent adaptively generates context-aware criteria and weights, scores each response against them, and aggregates the scores into a final truthfulness measure. The agent is trained via Group Relative Policy Optimization (GRPO) with weak semantic consistency supervision.
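
As a rough illustration of the training signal, the sketch below shows the group-relative advantage at the core of GRPO: several rollouts are sampled for the same input, and each rollout's reward is standardized against the group's mean and standard deviation. The reward function is an assumption, since this summary does not give the paper's reward design. `weak_reward` simply rewards agreement between the agent's truthfulness score and a weak consistency signal in [0, 1], and both function names are hypothetical.

```python
from statistics import mean, pstdev


def weak_reward(predicted_truthfulness: float, consistency_signal: float) -> float:
    """Hypothetical reward: 1 when the agent agrees with the weak signal, 0 at maximal disagreement."""
    return 1.0 - abs(predicted_truthfulness - consistency_signal)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of rollouts, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one query-answer pair, judged against a weak signal of 0.9.
rollout_scores = [0.85, 0.40, 0.95, 0.10]
rewards = [weak_reward(s, consistency_signal=0.9) for s in rollout_scores]
advantages = group_relative_advantages(rewards)
```

Rollouts that agree with the weak signal more than their group peers receive positive advantages and are reinforced. No separate value network is needed, which is the main practical appeal of the group-relative formulation.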

In practice

Instantiate a Qwen-2.5-7b agent for zero-source detection.
Apply multi-sampling (e.g., K=5) for robust hallucination scores.
Use BLEURT or DeepSeek-V3 to provide the weak supervision signal.
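
The multi-sampling step might look like the sketch below, which assumes that aggregation is a plain average of K independently sampled judgments. The summary does not specify the aggregation rule. `noisy_judge` is a hypothetical stand-in for a call to the trained agent, and the 0.5 decision threshold is likewise an assumption.

```python
import random
from statistics import mean
from typing import Callable


def multi_sample_score(
    judge: Callable[[str, str], float],
    query: str,
    answer: str,
    k: int = 5,
) -> float:
    """Average k stochastic truthfulness judgments for one query-answer pair."""
    if k < 1:
        raise ValueError("k must be at least 1")
    samples = [judge(query, answer) for _ in range(k)]
    return mean(samples)


def noisy_judge(query: str, answer: str) -> float:
    """Hypothetical stand-in for one sampled agent judgment, clipped to [0, 1]."""
    return min(1.0, max(0.0, random.gauss(0.7, 0.1)))


random.seed(0)  # fixed seed so the sketch is reproducible
score = multi_sample_score(noisy_judge, "Who wrote Hamlet?", "William Shakespeare", k=5)
is_hallucination = score < 0.5  # assumed threshold; tune on held-out data
```

Larger values of K reduce the variance of the final score at the cost of K agent calls per answer, so K=5 is a trade-off between stability and inference cost.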

Topics

LLM Hallucination Detection
Zero-source Evaluation
Human-like Criteria Probing
Reward-based Alignment
Group Relative Policy Optimization
Explainable AI

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.