Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Software Development & Engineering · Depth: Expert, extended

Summary

Free-Energy Signatures (Fes) is a novel spectral descriptor for detecting hallucinations in large language models (LLMs) at inference time, without retraining. It treats each layer's attention Laplacian as a Hamiltonian and extracts thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. Across six open-weight LLMs and six benchmarks, Fes achieves the strongest aggregate AUROC among attention-spectral baselines, improving on LapEig by 6.5 AUROC points and on GoR-4 by 2.4 points on average. A lightweight probe on Fes descriptors (typically 2880 features for a 32-layer model) takes only about 0.4 seconds per sample. The unsupervised RMT-deviation score reaches a mean AUROC of 0.71. Spectrally, correct generations exhibit Wigner–Dyson-like statistics, whereas hallucinations show Poisson-like statistics.
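
For readers who want these quantities pinned down, the standard statistical-mechanics forms below treat the eigenvalues $\lambda_1, \dots, \lambda_N$ of a layer's Laplacian $L$ as energy levels at inverse temperature $\beta$. This is a sketch under assumptions: the summary does not give the paper's normalizations, its $\beta$ and $t$ grids, or whether $g(t)$ is divided by $N$ or $N^2$. The $1/N^2$ convention used here makes $g(0) = 1$.

```latex
\begin{aligned}
Z(\beta) &= \sum_{i=1}^{N} e^{-\beta \lambda_i}, \qquad
p_i(\beta) = \frac{e^{-\beta \lambda_i}}{Z(\beta)}, \qquad
\langle f \rangle_\beta = \sum_{i=1}^{N} p_i(\beta)\, f(\lambda_i), \\
F(\beta) &= -\frac{1}{\beta} \ln Z(\beta), \\
S(\beta) &= -\sum_{i=1}^{N} p_i \ln p_i = \beta \bigl( \langle \lambda \rangle_\beta - F(\beta) \bigr), \\
C(\beta) &= \beta^{2} \bigl( \langle \lambda^{2} \rangle_\beta - \langle \lambda \rangle_\beta^{2} \bigr), \\
g(t) &= \frac{1}{N^{2}} \Bigl| \sum_{j=1}^{N} e^{i \lambda_j t} \Bigr|^{2}.
\end{aligned}
```

The quoted descriptor size also fixes the per-layer budget: $2880 / 32 = 90$ features per layer, though how those 90 are split between the potentials and the form-factor samples is not stated in this summary.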

Key takeaway

MLOps engineers deploying LLMs should consider integrating Free-Energy Signatures (Fes) for inference-time hallucination detection. Fes achieves higher AUROC than prior attention-spectral methods, requires no retraining of the LLM, and needs only a small labeled set to calibrate its probe. This can improve model reliability and user trust at modest computational cost, including in single-sample inference settings. Note that the sign of the unsupervised RMT detector may flip on structured math tasks, so it requires per-task calibration.
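
Because the sign of the unsupervised RMT score may flip on structured math tasks, one simple way to perform the per-task calibration is to measure the score's AUROC on a small labeled split and negate the score when that AUROC falls below 0.5. The Python sketch below assumes that higher scores mean "more likely hallucinated" and that hallucinations are labeled 1. The function names are hypothetical and are not part of any released Fes code.

```python
from collections.abc import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibrate_sign(scores: Sequence[float], labels: Sequence[int]) -> int:
    """Return +1 if the raw score already ranks hallucinations higher, else -1."""
    return 1 if auroc(scores, labels) >= 0.5 else -1


def apply_sign(scores: Sequence[float], sign: int) -> list[float]:
    """Orient raw scores so that higher always means 'more likely hallucinated'."""
    return [sign * s for s in scores]


# Example: a (hypothetical) math-task calibration split where the raw score is inverted.
cal_scores = [0.2, 0.1, 0.9, 0.8]
cal_labels = [1, 1, 0, 0]
sign = calibrate_sign(cal_scores, cal_labels)
oriented = apply_sign(cal_scores, sign)
```

Here the raw score ranks every hallucination below every correct answer (AUROC 0.0), so calibration returns -1 and the flipped score ranks them perfectly. Because the flip is task-dependent, the calibration split should come from the same task as the traffic being scored.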

Key insights

Fes uses thermodynamic and RMT spectral analysis of attention Laplacians to detect LLM hallucinations.
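
The RMT side of this analysis can be illustrated with a generic random-matrix diagnostic: the mean ratio of consecutive level spacings $\langle r \rangle$, which needs no spectral unfolding. Its well-known reference values are $2\ln 2 - 1 \approx 0.386$ for uncorrelated (Poisson) levels and about 0.53 for the Gaussian orthogonal ensemble (a Wigner–Dyson class). This is a swapped-in technique, not the paper's RMT-deviation score, whose formula is not given here; `poisson_likeness` is a hypothetical score for illustration only.

```python
import numpy as np

POISSON_R = 2 * np.log(2) - 1  # ~0.386: mean spacing ratio for uncorrelated levels
GOE_R = 0.5307                 # large-N numerical value for the GOE


def mean_spacing_ratio(eigenvalues: np.ndarray) -> float:
    """Mean of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over consecutive spacings."""
    levels = np.sort(np.asarray(eigenvalues, dtype=float))
    spacings = np.diff(levels)
    s_prev, s_next = spacings[:-1], spacings[1:]
    denom = np.maximum(s_prev, s_next)
    valid = denom > 0  # skip exactly degenerate triples
    r = np.minimum(s_prev, s_next)[valid] / denom[valid]
    return float(r.mean())


def poisson_likeness(eigenvalues: np.ndarray) -> float:
    """Hypothetical score in [0, 1]: near 0 for GOE-like spectra, near 1 for Poisson-like."""
    r = mean_spacing_ratio(eigenvalues)
    return float(np.clip((GOE_R - r) / (GOE_R - POISSON_R), 0.0, 1.0))


# Reference spectra: a GOE matrix (level repulsion) vs. independent uniform levels.
rng = np.random.default_rng(0)
n = 1000
a = rng.standard_normal((n, n))
goe_levels = np.linalg.eigvalsh((a + a.T) / 2)
poisson_levels = np.sort(rng.uniform(0.0, n, size=n))

r_goe = mean_spacing_ratio(goe_levels)          # expected to land near 0.53
r_poisson = mean_spacing_ratio(poisson_levels)  # expected to land near 0.386
```

A Laplacian spectrum whose $\langle r \rangle$ sits near 0.39 shows the uncorrelated, Poisson-like level statistics the summary associates with hallucinations, while one near 0.53 shows the level repulsion characteristic of Wigner–Dyson statistics.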

Principles

Method

For each layer, symmetrize and mean-pool the post-softmax attention maps, then form the graph Laplacian of the resulting weighted graph. From the Laplacian eigenvalues, compute the thermodynamic potentials ($Z, F, S, C$) and the spectral form factor ($g$). Concatenate these features across layers to form the Fes descriptor, and score it with either a logistic probe or the unsupervised RMT-deviation score.
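
These steps can be sketched end to end in NumPy. Several choices here are assumptions rather than details from the summary: heads are mean-pooled and the result is symmetrized as $(W + W^\top)/2$, the combinatorial Laplacian $L = D - W$ is used, $\ln Z$ is stored instead of $Z$ to keep the feature range manageable, and the $\beta$ and $t$ grids are arbitrary. The probe is plain gradient-descent logistic regression, and all function names are hypothetical.

```python
import numpy as np


def layer_laplacian(attn: np.ndarray) -> np.ndarray:
    """Graph Laplacian for one layer; attn has shape (heads, T, T), rows sum to 1.

    Assumption: mean-pool over heads, symmetrize, then L = D - W.
    """
    w = attn.mean(axis=0)
    w = 0.5 * (w + w.T)
    return np.diag(w.sum(axis=1)) - w


def layer_features(lam: np.ndarray, betas: np.ndarray, times: np.ndarray) -> np.ndarray:
    """[ln Z, F, S, C] at each beta, followed by the form factor g(t) at each t."""
    feats = []
    lam0 = lam.min()
    for beta in betas:
        boltz = np.exp(-beta * (lam - lam0))  # shifted Boltzmann weights, max = 1
        p = boltz / boltz.sum()
        log_z = np.log(boltz.sum()) - beta * lam0  # undo the shift: ln Z
        free_energy = -log_z / beta
        mean_e = p @ lam
        entropy = -np.sum(p * np.log(p + 1e-300))
        heat_capacity = beta**2 * (p @ lam**2 - mean_e**2)
        feats += [log_z, free_energy, entropy, heat_capacity]
    sff = np.abs(np.exp(1j * np.outer(times, lam)).sum(axis=1)) ** 2 / lam.size**2
    return np.concatenate([np.array(feats), sff])


def fes_descriptor(attentions, betas, times) -> np.ndarray:
    """Concatenate per-layer features; attentions is a sequence of (heads, T, T) arrays."""
    parts = [
        layer_features(np.linalg.eigvalsh(layer_laplacian(a)), betas, times)
        for a in attentions
    ]
    return np.concatenate(parts)


def fit_logistic_probe(x, y, lr=0.5, steps=2000, l2=1e-3):
    """Gradient-descent logistic regression on standardized features."""
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-8

    def sigmoid(z):
        return 0.5 * (1.0 + np.tanh(0.5 * z))  # overflow-free logistic function

    xs = (x - mu) / sd
    w, b = np.zeros(xs.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(xs @ w + b) - y
        w -= lr * (xs.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return lambda x_new: sigmoid(((x_new - mu) / sd) @ w + b)


# Toy input: 3 layers x 4 heads of random row-softmaxed 16x16 attention (synthetic).
rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 4, 16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
betas = np.array([0.5, 1.0, 2.0, 4.0])
times = np.linspace(0.5, 20.0, 8)
desc = fes_descriptor(list(attn), betas, times)
```

With 4 values of $\beta$ (4 potentials each) and 8 form-factor samples, each layer contributes 24 features, so the 3-layer toy input yields a 72-dimensional descriptor. Real attention would come from the model's forward pass and, for a decoder-only LLM, would be causally masked; descriptors from labeled generations would then be stacked row-wise and passed to `fit_logistic_probe`.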

In practice

Topics

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.