Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LUCID is presented as the first hallucination detection method for Large Language Model-based Knowledge Graph (LLM-KG) reasoning frameworks. Even when they incorporate KG information, LLMs hallucinate at an average rate of 29.65% across three frameworks (Readi, ToG, StructGPT) on KBQA datasets such as GrailQA, WebQSP, and QALD-10. LUCID addresses this by combining LLM attention scores, KG semantics, and structural information in a graph neural network (GNN). On manually annotated benchmark datasets, it achieves state-of-the-art performance against 15 baselines, outperforming SelfCheckGPT by 6.76% and ReDeEP (chunk) by 5.48% on average. It is also efficient, with an inference time of 0.04 milliseconds per sample, and can reduce QA costs by 55.4% while maintaining accuracy.

Key takeaway

MLOps engineers deploying LLM-based knowledge graph reasoning systems should integrate a specialized hallucination detector such as LUCID to improve reliability and manage costs. Its hallucination probabilities let you reprocess only the high-risk outputs with a more powerful, more expensive model, which cuts overall API costs by more than 55% while keeping accuracy comparable to running the larger model on every query. The result is more trustworthy output for critical applications.
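The selective reprocessing described above can be sketched as a simple threshold router. This is a minimal illustration, not the paper's pipeline: the `Answer` type, the `route` and `relative_cost` helpers, the threshold of 0.5, and the per-call costs are all hypothetical, and the numbers in the example are made up for arithmetic only.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    question: str
    text: str
    halluc_prob: float  # detector output in [0, 1] (assumed interface)


def route(answers, threshold, strong_model):
    """Keep low-risk answers; re-ask the strong model for high-risk ones."""
    final, escalated = [], 0
    for a in answers:
        if a.halluc_prob >= threshold:
            final.append(strong_model(a.question))
            escalated += 1
        else:
            final.append(a.text)
    return final, escalated


def relative_cost(n_total, n_escalated, cheap_cost, strong_cost):
    """Cost of selective routing divided by the cost of sending every query to the strong model."""
    routed = n_total * cheap_cost + n_escalated * strong_cost
    return routed / (n_total * strong_cost)


# Hypothetical data: four cheap-model answers with detector scores.
answers = [
    Answer("q1", "a1", 0.1),
    Answer("q2", "a2", 0.8),
    Answer("q3", "a3", 0.3),
    Answer("q4", "a4", 0.9),
]
# Stand-in for a call to a larger, more expensive model.
final, escalated = route(answers, threshold=0.5, strong_model=lambda q: f"strong({q})")
ratio = relative_cost(len(answers), escalated, cheap_cost=1.0, strong_cost=10.0)
```

In this toy setup two of the four answers are escalated, so the routed pipeline costs 24 units against 40 for the strong model alone. The actual savings depend on the detector's accuracy, the chosen threshold, and the real price gap between models.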

Key insights

LUCID detects LLM hallucinations in KG reasoning by fusing LLM attention, KG semantics, and structural information via a GNN.

Principles

LLM attention, KG semantics, and structure are crucial for robust hallucination detection.
RAG-specific detection methods significantly outperform general-purpose approaches.
Graph neural networks effectively model KG topological relationships for enhanced detection.

Method

LUCID extracts node and edge features from LLM attention scores and KG semantic similarities, then feeds them into a GINE model (a Graph Isomorphism Network variant that takes edge features into account) to predict a hallucination probability.
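To make the GINE step concrete, here is a minimal pure-Python sketch of one GINE-style message-passing layer followed by mean pooling and a sigmoid. It follows the standard GINE update, h_i' = f((1 + eps) * h_i + sum over neighbors j of ReLU(h_j + e_ji)), with f simplified to a single linear layer plus ReLU. The layer sizes, parameter names, pooling choice, and hand-set weights are assumptions for illustration; the summary does not specify LUCID's architecture at this level.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weight, bias):
    # weight is a list of rows with shape (out_dim, in_dim)
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]


def gine_layer(node_feats, edges, edge_feats, weight, bias, eps=0.0):
    """One GINE-style update with a single linear + ReLU in place of the MLP."""
    dim = len(node_feats[0])
    agg = [[0.0] * dim for _ in node_feats]
    # Each directed edge (src, dst) sends ReLU(h_src + e) to dst.
    for (src, dst), e in zip(edges, edge_feats):
        msg = relu([h + f for h, f in zip(node_feats[src], e)])
        agg[dst] = [a + m for a, m in zip(agg[dst], msg)]
    out = []
    for h, a in zip(node_feats, agg):
        combined = [(1.0 + eps) * x + y for x, y in zip(h, a)]
        out.append(relu(linear(combined, weight, bias)))
    return out


def hallucination_probability(node_feats, edges, edge_feats, params):
    """Graph-level score: one layer, mean pooling over nodes, then a sigmoid."""
    h = gine_layer(node_feats, edges, edge_feats, params["w1"], params["b1"])
    dim = len(h[0])
    pooled = [sum(row[k] for row in h) / len(h) for k in range(dim)]
    logit = sum(w * x for w, x in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy reasoning path with two entities and one relation (all values hypothetical).
node_feats = [[1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1)]
edge_feats = [[0.5, -2.0]]
params = {
    "w1": [[1.0, 0.0], [0.0, 1.0]],  # identity weights, untrained
    "b1": [0.0, 0.0],
    "w_out": [0.4, -0.2],
    "b_out": 0.0,
}
hidden = gine_layer(node_feats, edges, edge_feats, params["w1"], params["b1"])
prob = hallucination_probability(node_feats, edges, edge_feats, params)
```

A real detector would learn the weights from annotated examples and would typically stack several layers using a GNN library; the point here is only how edge features enter each message.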

In practice

Employ LUCID's hallucination probabilities for cost-effective QA refinement.
Integrate LLM attention scores and KG semantics for robust detection.
Utilize GNNs to capture structural consistency in KG reasoning.
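As a rough illustration of combining attention scores with KG semantics, the sketch below builds features for a single KG triple: the mean attention the LLM placed on the tokens naming each entity, and the cosine similarity between a question embedding and a relation embedding. The function names, the token-span representation, and the choice of these three features are assumptions made for this example, not LUCID's published feature set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attention_mass(attn_row, token_span):
    """Mean attention weight over the prompt tokens [start, end) that name a KG item."""
    start, end = token_span
    span = attn_row[start:end]
    return sum(span) / len(span) if span else 0.0


def triple_features(attn_row, spans, question_vec, relation_vec):
    """Candidate node features (attention) and edge feature (semantic similarity)."""
    return {
        "head_attention": attention_mass(attn_row, spans["head"]),
        "tail_attention": attention_mass(attn_row, spans["tail"]),
        "relation_similarity": cosine(question_vec, relation_vec),
    }


# Hypothetical inputs: one attention row over five prompt tokens,
# token spans for the head and tail entities, and 2-d toy embeddings.
attn_row = [0.1, 0.4, 0.2, 0.0, 0.3]
spans = {"head": (1, 3), "tail": (4, 5)}
features = triple_features(attn_row, spans, question_vec=[1.0, 0.0], relation_vec=[1.0, 1.0])
```

In a full system the attention row would come from the LLM's attention maps and the vectors from a text encoder; features like these could then serve as the node and edge inputs to the GNN.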

Topics

Large Language Models
Knowledge Graph Reasoning
Hallucination Detection
Graph Neural Networks
Retrieval-Augmented Generation
QA Refinement

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.