Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
Summary
Free-Energy Signatures (Fes) is a spectral descriptor for detecting hallucinations in large language models (LLMs) at inference time, without retraining the model. It treats each layer's attention Laplacian as a Hamiltonian and extracts thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. Across six open-weight LLMs and six benchmarks, Fes achieves the strongest aggregate AUROC among attention-spectral baselines, improving on LapEig by 6.5 points and on GoR-4 by 2.4 points on average. A lightweight probe on Fes descriptors (typically 2880 features for a 32-layer model) requires only ~0.4 seconds per sample. The unsupervised RMT-deviation score, which needs no labels, yields a mean AUROC of 0.71. Correct generations exhibit Wigner–Dyson-like spectral statistics, while hallucinations show Poisson-like statistics.
Key takeaway
MLOps engineers deploying LLMs should consider integrating Free-Energy Signatures (Fes) for inference-time hallucination detection. Fes reaches a higher aggregate AUROC than prior attention-spectral methods (+6.5 points over LapEig and +2.4 over GoR-4 on average), requires no LLM retraining, and needs only a small labeled set to calibrate its probe. At roughly 0.4 seconds per sample, it can flag unreliable outputs with modest overhead, especially in single-sample inference scenarios. Be aware that the sign of the unsupervised RMT detector may flip on structured math tasks, so that detector requires per-task calibration.
Key insights
Fes uses thermodynamic and RMT spectral analysis of attention Laplacians to detect LLM hallucinations.
Principles
- Attention Laplacians' full spectrum contains rich hallucination signals.
- Wigner–Dyson statistics correlate with valid reasoning, Poisson with hallucination.
- Thermodynamic potentials offer multiscale spectral views.
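For orientation, the potentials named above have standard textbook forms when a Laplacian spectrum $\lambda_1, \dots, \lambda_n$ is treated as a set of energy levels at inverse temperature $\beta$. The block below writes them out. The symbols $\beta$, $\tau$, $p_i$ and the $1/n^2$ normalization of $g$ are assumptions made here for illustration; this summary does not state the exact conventions the paper uses.

$$
\begin{aligned}
Z(\beta) &= \sum_{i=1}^{n} e^{-\beta \lambda_i}, \qquad p_i = \frac{e^{-\beta \lambda_i}}{Z(\beta)}, \\
F(\beta) &= -\frac{1}{\beta} \ln Z(\beta), \\
S(\beta) &= -\sum_{i=1}^{n} p_i \ln p_i, \\
C(\beta) &= \beta^2 \left( \sum_{i=1}^{n} p_i \lambda_i^2 - \Big( \sum_{i=1}^{n} p_i \lambda_i \Big)^2 \right), \\
g(\tau) &= \frac{1}{n^2} \left| \sum_{j=1}^{n} e^{-i \tau \lambda_j} \right|^2 .
\end{aligned}
$$

Under these definitions, a large $\beta$ concentrates the weights $p_i$ on the smallest eigenvalues, while a small $\beta$ weights the whole spectrum almost uniformly. Sweeping $\beta$ is one way to read the "multiscale" view mentioned above.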
Method
Symmetrize and mean-pool post-softmax attention maps into graph Laplacians per layer. Extract thermodynamic potentials ($Z, F, S, C$) and spectral form factor ($g$) from Laplacian eigenvalues. Concatenate these features across layers to form the Fes descriptor. Use a logistic probe or RMT-deviation score for hallucination detection.
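The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of the combinatorial Laplacian $D - W$, pooling over attention heads, and the particular grids of inverse temperatures `betas` and times `taus` are all hypothetical choices, so the feature count here does not match the 2880 figure quoted in the summary.

```python
import numpy as np

def attention_laplacian(attn):
    """Build a graph Laplacian from one layer's post-softmax attention.

    attn: array of shape (heads, n, n). Pooling over heads and using the
    combinatorial Laplacian D - W are assumptions of this sketch.
    """
    A = attn.mean(axis=0)              # mean-pool over heads
    W = 0.5 * (A + A.T)                # symmetrize into an undirected graph
    return np.diag(W.sum(axis=1)) - W  # self-loop weights cancel on the diagonal

def thermo_features(L, betas=(0.5, 1.0, 2.0), taus=(1.0, 2.0, 4.0)):
    """Thermodynamic potentials and spectral form factor of a Laplacian."""
    lam = np.linalg.eigvalsh(L)        # real eigenvalues of a symmetric matrix
    feats = []
    for beta in betas:
        x = -beta * lam
        m = x.max()
        log_z = m + np.log(np.exp(x - m).sum())   # stable log partition function
        log_p = x - log_z                          # log Boltzmann weights
        p = np.exp(log_p)
        free_energy = -log_z / beta
        entropy = -(p * log_p).sum()
        mean_e = (p * lam).sum()
        heat_capacity = beta**2 * ((p * lam**2).sum() - mean_e**2)
        feats += [log_z, free_energy, entropy, heat_capacity]
    for tau in taus:
        # Spectral form factor, normalized to lie in [0, 1] (assumed convention).
        feats.append(np.abs(np.exp(-1j * tau * lam).sum()) ** 2 / lam.size**2)
    return np.array(feats)

def fes_descriptor(attn_per_layer):
    """Concatenate per-layer features into one descriptor vector."""
    return np.concatenate(
        [thermo_features(attention_laplacian(a)) for a in attn_per_layer]
    )
```

With these defaults each layer contributes 15 numbers (four potentials at three values of `beta`, plus three form-factor values), and the resulting vector is what a logistic probe would take as input.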
In practice
- Apply Fes to frozen LLMs for training-free hallucination detection.
- Use a small labeled set (e.g., 100-500 examples) to calibrate the supervised Fes probe.
- Monitor spectral form factor for Wigner–Dyson vs. Poisson statistics as a reasoning quality indicator.
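One simple way to check whether a spectrum looks Wigner–Dyson-like or Poisson-like is the mean adjacent-gap ratio, a standard RMT diagnostic. Note that this is a swapped-in stand-in for illustration, not the spectral form factor or the RMT-deviation score from the paper. The function names and the midpoint threshold are hypothetical; the reference values (about 0.386 for Poisson, about 0.531 for the Gaussian orthogonal ensemble) are the commonly quoted ones from the RMT literature.

```python
import numpy as np

POISSON_R = 2 * np.log(2) - 1   # about 0.386: uncorrelated levels
GOE_R = 0.5307                  # commonly quoted value for Wigner-Dyson (GOE)

def mean_spacing_ratio(eigvals):
    """Mean of min(s_k, s_{k+1}) / max(s_k, s_{k+1}) over adjacent gaps s_k.

    The ratio needs no spectral unfolding, which keeps the sketch short.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))
    s = np.diff(lam)
    lo = np.minimum(s[:-1], s[1:])
    hi = np.maximum(s[:-1], s[1:])
    r = lo / np.where(hi > 0, hi, 1.0)   # exact degeneracies give ratio 0
    return float(r.mean())

def closer_to_wigner_dyson(r):
    """Hypothetical decision rule: compare r to the midpoint of the two references."""
    return r > 0.5 * (POISSON_R + GOE_R)
```

Applied to the eigenvalues of a layer's attention Laplacian, a ratio near `GOE_R` would suggest level repulsion and a ratio near `POISSON_R` would suggest uncorrelated levels. Since the summary notes that the unsupervised detector's sign may flip on structured math tasks, any threshold of this kind would need per-task calibration.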
Topics
- LLM Hallucination Detection
- Attention Mechanisms
- Graph Laplacians
- Random Matrix Theory
- Thermodynamic Potentials
- Spectral Analysis
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.