Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
Summary
Free-Energy Signatures (Fes) is a spectral descriptor for detecting hallucinations in large language models (LLMs) from the spectrum of attention-derived graph Laplacians. Where prior spectral diagnostics use only a limited number of eigenvalues, Fes treats each layer's attention Laplacian as a Hamiltonian and extracts thermodynamic potentials (partition function, free energy, spectral entropy, and heat capacity) together with the spectral form factor from random matrix theory (RMT). The authors show that the descriptor is Lipschitz-stable under perturbations of the attention matrix, that it enriches finite spectral summaries, and that a training-free detector built on it admits a finite-sample PAC bound on AUROC. Empirically, across six open-weight LLMs and six benchmarks, a lightweight Fes probe achieved the strongest aggregate AUROC, beating LapEig by 6.5 points and GoR-4 by 2.4 points on average without modifying the LLM. An unsupervised RMT-deviation score reached a mean AUROC of 0.71 and indicates that correct generations exhibit Wigner-Dyson-like spectral statistics, while hallucinations exhibit Poisson-like statistics. The code was published on 2026-06-17.
Key takeaway
For machine learning engineers deploying large language models, Free-Energy Signatures (Fes) offer a way to detect hallucinations without modifying the underlying LLM. In the reported experiments, a lightweight Fes probe outperformed the spectral baselines LapEig and GoR-4 by 6.5 and 2.4 AUROC points on average, which makes Fes diagnostics worth considering for LLM evaluation pipelines. When labels are unavailable, the unsupervised RMT-deviation score (mean AUROC 0.71) provides a label-free detection option.
Key insights
Free-Energy Signatures (Fes) leverage thermodynamic and random-matrix-theory diagnostics on attention Laplacians to detect LLM hallucinations effectively.
Principles
- Attention Laplacian spectra signal reasoning quality.
- Thermodynamic potentials characterize attention dynamics.
- RMT spectral statistics differentiate correct vs. hallucinated outputs.
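The Wigner-Dyson versus Poisson distinction can be illustrated with a standard RMT statistic, the mean adjacent-gap ratio. This is a minimal sketch and not the paper's RMT-deviation score, whose exact definition is not given in this summary. The function names `mean_gap_ratio` and `poisson_likeness` are hypothetical, and the reference values are the well-known means for Poisson spectra (2 ln 2 - 1, about 0.386) and for the Gaussian orthogonal ensemble (about 0.531).

```python
import numpy as np

# Reference means of the adjacent-gap ratio <r> from random matrix theory.
POISSON_R = 2.0 * np.log(2.0) - 1.0  # uncorrelated (Poisson) levels, ~0.386
GOE_R = 0.5307                       # Wigner-Dyson (GOE) levels, numerical value


def mean_gap_ratio(eigenvalues):
    """Mean of r_i = min(g_i, g_{i+1}) / max(g_i, g_{i+1}) over consecutive gaps g."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    gaps = np.diff(lam)
    if gaps.size < 2:
        return float("nan")
    lo = np.minimum(gaps[:-1], gaps[1:])
    hi = np.maximum(gaps[:-1], gaps[1:])
    mask = hi > 0  # skip pairs of exactly degenerate gaps
    if not mask.any():
        return float("nan")
    return float((lo[mask] / hi[mask]).mean())


def poisson_likeness(eigenvalues):
    """Hypothetical score in [0, 1]: 0 is GOE-like, 1 is Poisson-like."""
    r = mean_gap_ratio(eigenvalues)
    return float(np.clip((GOE_R - r) / (GOE_R - POISSON_R), 0.0, 1.0))
```

Under the summary's finding, one would expect a score like this to be higher for hallucinated generations than for correct ones when applied to attention-Laplacian eigenvalues, although the threshold and the layers to use would need to be validated on real data.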
Method
Treat each LLM layer's attention Laplacian as a Hamiltonian to extract thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) and the random-matrix-theory spectral form factor for hallucination detection.
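The quantities named above can be sketched with the textbook statistical-mechanics definitions applied to Laplacian eigenvalues. This is an illustration under stated assumptions, not the authors' implementation: the symmetrization used to build the Laplacian, the inverse temperature `beta`, the normalization of the form factor, and all function names are assumptions that do not come from the source.

```python
import numpy as np


def attention_laplacian(attn):
    """Combinatorial Laplacian L = D - W of a symmetrized attention matrix.

    Assumption: W = (A + A^T) / 2 with self-loops removed; the paper's exact
    construction is not specified in this summary.
    """
    attn = np.asarray(attn, dtype=float)
    w = 0.5 * (attn + attn.T)
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w


def thermodynamic_potentials(eigenvalues, beta=1.0):
    """Treat eigenvalues as energy levels and compute Gibbs-ensemble potentials."""
    lam = np.asarray(eigenvalues, dtype=float)
    e0 = lam.min()
    weights = np.exp(-beta * (lam - e0))  # shifted for numerical stability
    probs = weights / weights.sum()
    log_z = np.log(weights.sum()) - beta * e0  # log of Z = sum_j exp(-beta * lam_j)
    mean_energy = float(np.sum(probs * lam))
    energy_var = float(np.sum(probs * (lam - mean_energy) ** 2))
    return {
        "log_partition": float(log_z),
        "mean_energy": mean_energy,
        "free_energy": float(-log_z / beta),           # F = -(1/beta) log Z
        "entropy": float(beta * mean_energy + log_z),  # S = beta * (U - F)
        "heat_capacity": float(beta**2 * energy_var),  # C = beta^2 * Var(E)
    }


def spectral_form_factor(eigenvalues, times):
    """K(t) = |sum_j exp(-i * lam_j * t)|^2 / n^2, normalized so that K(0) = 1."""
    lam = np.asarray(eigenvalues, dtype=float)
    t = np.asarray(times, dtype=float)
    phases = np.exp(-1j * np.outer(t, lam))
    return np.abs(phases.sum(axis=1)) ** 2 / lam.size**2


# Usage on a random row-stochastic matrix standing in for one attention head.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
lam = np.linalg.eigvalsh(attention_laplacian(attn))
features = thermodynamic_potentials(lam, beta=1.0)
form_factor = spectral_form_factor(lam, times=[0.0, 0.5, 1.0])
```

In a real pipeline the matrix `attn` would come from a model's attention weights for a given layer and head, and the resulting features could be concatenated across layers before being passed to a probe.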
In practice
- Implement Fes for training-free hallucination detection.
- Use RMT-deviation score for unsupervised detection.
- Apply spectral analysis to attention mechanisms.
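To check whether any such score separates hallucinated from correct generations on your own data, AUROC can be computed directly from scores and binary labels. The sketch below uses the pairwise (Mann-Whitney) definition; the function name `auroc` and the convention that label 1 marks a hallucination are assumptions made for illustration.

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a random positive scores higher than a random negative.

    Ties count as one half. Assumes labels are 0/1 with 1 = hallucination.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

The pairwise form uses memory proportional to the number of positive-negative pairs, so for large evaluation sets a rank-based routine such as scikit-learn's `roc_auc_score` is the more practical choice.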
Topics
- Large Language Models
- Hallucination Detection
- Attention Mechanisms
- Spectral Graph Theory
- Random Matrix Theory
- Free-Energy Signatures
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.