Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection?

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

A study systematically investigated how Parameter-Efficient Fine-Tuning (PEFT) affects hallucination detection in Large Language Models (LLMs). The researchers evaluated LoRA, DoRA, and PiSSA across three open-weight LLM backbones (LLaMA-3.2-3B-Instruct, Qwen2.5-3B-Instruct, Mistral-7B-Instruct-v0.3) and three fact-seeking QA benchmarks (TriviaQA, NQ-Open, SQuAD). The evaluation used seven unsupervised hallucination detection methods, grouped into semantic-consistency, confidence-based, and entropy-based approaches, alongside white-box linear probes. The results indicate that PEFT consistently strengthens hallucination detection, significantly improving AUROC across a wide range of detectors even though QA accuracy gains are only marginal (0.1% to 6.1%). PEFT primarily reshapes how uncertainty is encoded and surfaced: it shifts detector scores away from overconfident regimes, which makes hallucinations easier to detect, particularly for semantic-consistency and confidence-based detectors. However, PEFT can disrupt supervised linear-probe detectors, whose performance after fine-tuning is inconsistent.
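
To make the AUROC comparison concrete, here is a minimal sketch of how a detector's AUROC can be computed from per-answer scores and correctness labels. AUROC is the probability that a randomly chosen hallucinated answer receives a higher detector score than a randomly chosen correct one. The function name `auroc` and all scores and labels below are hypothetical illustrations, not data or code from the study.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (hallucinated, correct) pairs in which the
    hallucinated answer gets the higher score. Ties count as half a win.
    labels: 1 = hallucinated, 0 = correct."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores for the same six answers (higher = more suspect).
labels = [0, 1, 0, 1, 0, 1]
scores_before = [0.10, 0.20, 0.15, 0.30, 0.25, 0.12]  # scores bunched together
scores_after = [0.10, 0.60, 0.15, 0.70, 0.25, 0.55]   # hallucinations pushed apart

print("AUROC before:", round(auroc(scores_before, labels), 3))
print("AUROC after: ", round(auroc(scores_after, labels), 3))
```

The toy numbers only illustrate the mechanism described above: the set of wrong answers is identical in both runs, yet the detector separates them better once their scores move away from the overconfident region.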

Key takeaway

For AI engineers deploying LLMs in knowledge-intensive applications, integrating PEFT methods such as LoRA, DoRA, or PiSSA can significantly improve how detectable hallucinations are, even if it only modestly reduces how often they occur. Prioritize semantic-consistency and confidence-based hallucination detectors, since they show consistent gains with PEFT. Be aware that supervised linear-probe detectors may not benefit consistently, which suggests that PEFT shifts how uncertainty is represented internally.
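
As a rough illustration of what a semantic-consistency detector measures, the sketch below samples several answers to one question and computes the entropy over groups of matching answers. It assumes normalized string matching as a stand-in for semantic equivalence, which is a simplification and not necessarily how the detectors in the study group answers. The helper names `normalize` and `consistency_entropy` and the sample answers are hypothetical.

```python
import math
from collections import Counter


def normalize(answer):
    """Crude equivalence key: lowercase, trim, drop a trailing period,
    and collapse internal whitespace. (Assumption: a real detector would
    use a semantic equivalence check instead of string matching.)"""
    return " ".join(answer.lower().strip().rstrip(".").split())


def consistency_entropy(samples):
    """Shannon entropy (in nats) over groups of matching sampled answers.
    0.0 means every sample agrees; higher values mean the model's answers
    scatter, which is treated as a sign of possible hallucination."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical samples for two questions.
agreeing = ["Paris", "paris.", " Paris "]
scattered = ["Lyon", "Paris", "Marseille"]

print(consistency_entropy(agreeing))
print(consistency_entropy(scattered))
```

A deployment would threshold this score, or feed it to an AUROC evaluation against correctness labels, to decide which answers to flag.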

Key insights

PEFT enhances LLM hallucination detectability by reshaping uncertainty signals, not primarily by improving factual accuracy.

Method

The study systematically compared three PEFT methods (LoRA, DoRA, PiSSA) on three LLM backbones and three QA benchmarks, evaluating seven unsupervised hallucination detectors alongside supervised white-box linear probes.
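
For readers less familiar with the "small updates" these methods make, here is a minimal sketch of the standard LoRA formulation: the frozen weight W is adjusted by a scaled low-rank product, W + (alpha / r) * B A, and only A and B are trained. DoRA and PiSSA build on the same low-rank idea with different parameterization and initialization and are not shown. The helper names, matrix values, and the 4096 x 4096 layer size are hypothetical and are not taken from the study's configuration.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.
    w: d_out x d_in (frozen), b: d_out x r, a: r x d_in (both trainable)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d_out, d_in, r):
    """Number of adapter parameters for one weight matrix."""
    return r * (d_in + d_out)


w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen 2 x 3 weight
a = [[1.0, 2.0, 3.0]]                    # rank-1 adapter factor A
b_init = [[0.0], [0.0]]                  # B starts at zero in LoRA
b_trained = [[1.0], [2.0]]               # hypothetical values after training

print(lora_effective_weight(w, b_init, a, alpha=2.0))
print(lora_effective_weight(w, b_trained, a, alpha=2.0))
print(trainable_params(4096, 4096, 8), "adapter params vs", 4096 * 4096, "full")
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and the update touches a small fraction of the parameters of the full matrix.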

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.