Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LUCID is introduced as the first hallucination detection method for Large Language Model-based Knowledge Graph (LLM-KG) reasoning frameworks. Even when they incorporate KG information, LLMs hallucinate at an average rate of 29.65% across three frameworks (Readi, ToG, StructGPT) on KBQA datasets such as GrailQA, WebQSP, and QALD-10. LUCID addresses this by combining LLM attention scores, KG semantics, and structural information in a graph neural network (GNN). Evaluated on manually annotated benchmark datasets, LUCID achieves state-of-the-art performance against 15 baselines, outperforming SelfCheckGPT by 6.76% and ReDeEP (chunk) by 5.48% on average. It is also efficient, with an inference time of 0.04 milliseconds per sample, and can reduce QA costs by 55.4% while maintaining accuracy.

Key takeaway

If you are an MLOps engineer deploying LLM-based knowledge graph reasoning systems, consider integrating a specialized hallucination detector such as LUCID to improve reliability and manage costs. With LUCID's hallucination probabilities, you can reprocess only the high-risk outputs with a more powerful, more expensive model. This is reported to cut QA costs by 55.4% while keeping accuracy comparable to running the larger model on every query, which makes the outputs more trustworthy for critical applications.
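The selective reprocessing idea can be sketched in a few lines of Python. This is a minimal illustration, not LUCID's actual pipeline: the `Answer` class, the `route` and `relative_cost` helpers, the 0.5 threshold, and the per-query costs are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    question: str
    text: str
    halluc_prob: float  # detector output in [0, 1] (hypothetical field name)


def route(answers, threshold=0.5):
    """Split answers into those kept as-is and those escalated to the stronger model."""
    keep = [a for a in answers if a.halluc_prob < threshold]
    escalate = [a for a in answers if a.halluc_prob >= threshold]
    return keep, escalate


def relative_cost(n_total, n_escalated, cheap_cost, strong_cost):
    """Cost of selective escalation relative to running the strong model on every query."""
    selective = n_total * cheap_cost + n_escalated * strong_cost
    return selective / (n_total * strong_cost)


# Made-up detector scores for four answers produced by the cheaper model.
answers = [
    Answer("q1", "a1", 0.1),
    Answer("q2", "a2", 0.8),
    Answer("q3", "a3", 0.3),
    Answer("q4", "a4", 0.6),
]
keep, escalate = route(answers, threshold=0.5)

# Assumed unit costs: the cheap model costs 1 per query, the strong model 10.
ratio = relative_cost(len(answers), len(escalate), cheap_cost=1.0, strong_cost=10.0)
```

The savings depend on how many answers cross the threshold and on the price gap between the two models, so the threshold would need tuning against held-out accuracy in any real deployment.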

Key insights

LUCID detects LLM hallucinations in KG reasoning by fusing LLM attention, KG semantics, and structural information via a GNN.

Method

LUCID extracts node/edge features from LLM attention scores and KG semantic similarities, then feeds these into a GINE model to predict hallucination probability.
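To make the method concrete, here is a dependency-free sketch of a single GINE-style message-passing step followed by mean pooling and a sigmoid. It is an illustration under stated assumptions, not the authors' implementation: the two-dimensional features (an attention score and a semantic similarity per node and edge), the single linear map standing in for GINE's MLP, and all weights are hypothetical and untrained. The update rule used is the standard GINE form, h_i' = MLP((1 + eps) * h_i + sum over neighbors j of ReLU(h_j + e_ji)).

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gine_layer(node_feats, edges, edge_feats, W, eps=0.0):
    """One GINE-style update; a single linear map + ReLU stands in for the MLP."""
    agg = [[0.0] * len(h) for h in node_feats]
    for (src, dst), e in zip(edges, edge_feats):
        # Message from src to dst mixes the source node and edge features.
        agg[dst] = add(agg[dst], relu(add(node_feats[src], e)))
    out = []
    for h, a in zip(node_feats, agg):
        combined = add([(1.0 + eps) * x for x in h], a)
        out.append(relu(matvec(W, combined)))
    return out


def hallucination_probability(node_feats, edges, edge_feats, W, w_out, b_out=0.0):
    """Mean-pool node embeddings and map them to a probability with a sigmoid."""
    h = gine_layer(node_feats, edges, edge_feats, W)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(w * x for w, x in zip(w_out, pooled)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical reasoning path with three entities and two directed relations.
# Each feature vector is [attention score, semantic similarity] (assumed layout).
node_feats = [[0.9, 0.8], [0.2, 0.1], [0.7, 0.6]]
edges = [(0, 1), (1, 2)]
edge_feats = [[0.5, 0.4], [0.1, 0.3]]
W = [[0.5, -0.2], [0.1, 0.3]]  # placeholder weights, not trained
w_out = [-1.0, 0.5]            # placeholder readout weights

p = hallucination_probability(node_feats, edges, edge_feats, W, w_out)
```

In practice the weights would be learned from annotated hallucination labels, and a library such as PyTorch Geometric would replace the hand-written layer.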

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.