Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

Source: Machine Learning · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Free-Energy Signatures (Fes) is a spectral descriptor for detecting hallucinations in large language models (LLMs) from the spectrum of attention-derived graph Laplacians. Unlike prior spectral diagnostics that use only a limited number of eigenvalues, Fes treats each layer's attention Laplacian as a Hamiltonian and extracts thermodynamic potentials (partition function, free energy, spectral entropy, and heat capacity) alongside the random-matrix-theory (RMT) spectral form factor. On the theory side, the method is shown to be Lipschitz-stable under attention perturbation, to enrich finite spectral summaries, and to admit a finite-sample PAC bound on AUROC for a training-free detector.

Empirically, across six open-weight LLMs and six benchmarks, a lightweight Fes probe achieved the strongest aggregate AUROC, improving on LapEig by 6.5 points and on GoR-4 by 2.4 points on average, without modifying the LLM. An unsupervised RMT-deviation score reached a mean AUROC of 0.71 and revealed that correct generations exhibit Wigner-Dyson-like spectral statistics, while hallucinations show Poisson-like statistics. The code was published on 2026-06-17.
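The Wigner-Dyson versus Poisson distinction can be made concrete with a standard RMT statistic, the mean ratio of consecutive level spacings. The sketch below is illustrative only: the summary does not say how the RMT-deviation score is defined, so the spacing-ratio statistic is swapped in here, and the function names and the `poisson_likeness` rescaling are hypothetical. The reference values (about 0.386 for Poisson statistics and about 0.531 for the Gaussian orthogonal ensemble) are the standard ones from the RMT literature.

```python
import numpy as np

# Standard reference values for the mean consecutive-spacing ratio <r>.
R_POISSON = 2.0 * np.log(2.0) - 1.0   # about 0.386, uncorrelated levels
R_GOE = 0.5307                        # about 0.531, Wigner-Dyson (GOE) levels


def mean_spacing_ratio(eigs):
    """Mean of min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over sorted eigenvalues."""
    lam = np.sort(np.asarray(eigs, dtype=float))
    s = np.diff(lam)                              # nearest-neighbour spacings
    lo = np.minimum(s[:-1], s[1:])
    hi = np.maximum(np.maximum(s[:-1], s[1:]), 1e-300)  # avoid division by zero
    return float(np.mean(lo / hi))


def poisson_likeness(eigs):
    """Hypothetical score: about 0 for GOE-like spectra, about 1 for Poisson-like."""
    r = mean_spacing_ratio(eigs)
    return (R_GOE - r) / (R_GOE - R_POISSON)
```

Because the ratio is scale-free, no unfolding of the spectrum is needed, which is convenient for small per-layer spectra. Applied to the eigenvalues of a layer's attention Laplacian, a score near 1 would correspond to the Poisson-like behaviour the summary associates with hallucinations, though how well this proxy tracks the reported score is not something the summary establishes.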

Key takeaway

For machine learning engineers deploying large language models, Free-Energy Signatures (Fes) offers a way to detect hallucinations without any updates to the underlying LLM. Consider integrating Fes diagnostics into your LLM evaluation pipelines: in the reported experiments, a lightweight Fes probe achieved the strongest aggregate AUROC, beating the spectral baselines LapEig and GoR-4 by 6.5 and 2.4 points on average. When labels are unavailable, the unsupervised RMT-deviation score is a label-free option, with a reported mean AUROC of 0.71.

Key insights

Free-Energy Signatures (Fes) applies thermodynamic and random-matrix-theory diagnostics to attention Laplacians to detect LLM hallucinations.

Principles

Method

Treat each LLM layer's attention Laplacian as a Hamiltonian to extract thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) and the random-matrix-theory spectral form factor for hallucination detection.
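A minimal sketch of this recipe in NumPy follows, under stated assumptions. The summary does not specify how the Laplacian is built from attention, which inverse temperature is used, or how the form factor is normalized, so the symmetrized normalized Laplacian, the `beta` parameter, the 1/N² normalization, and all function names here are illustrative choices rather than the authors' implementation. The thermodynamic quantities are the textbook canonical-ensemble definitions with eigenvalues as energy levels: Z = Σ exp(-βλ), F = -log Z / β, S = -Σ p log p, and C = β² Var(λ).

```python
import numpy as np


def attention_laplacian(attn):
    """Symmetric normalized Laplacian of an attention map (assumed construction)."""
    w = 0.5 * (attn + attn.T)                     # symmetrize into undirected weights
    deg = w.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    return 0.5 * (lap + lap.T)                    # remove rounding asymmetry


def thermo_signature(eigs, beta=1.0):
    """Canonical-ensemble potentials, treating eigenvalues as energy levels."""
    eigs = np.asarray(eigs, dtype=float)
    e_min = eigs.min()
    weights = np.exp(-beta * (eigs - e_min))      # shifted for numerical stability
    p = weights / weights.sum()                   # Boltzmann distribution
    log_z = np.log(weights.sum()) - beta * e_min  # log partition function
    mean_e = float((p * eigs).sum())
    return {
        "log_partition": float(log_z),
        "free_energy": float(-log_z / beta),
        "mean_energy": mean_e,
        "entropy": float(-(p * np.log(p + 1e-300)).sum()),
        "heat_capacity": float(beta**2 * (p * (eigs - mean_e) ** 2).sum()),
    }


def spectral_form_factor(eigs, times):
    """K(t) = |sum_j exp(-i * lambda_j * t)|^2 / N^2, so that K(0) = 1."""
    phases = np.exp(-1j * np.outer(times, eigs))
    return np.abs(phases.sum(axis=1)) ** 2 / len(eigs) ** 2
```

In use, one would compute `np.linalg.eigvalsh(attention_laplacian(attn))` for each layer (and, as a further assumption, each head), evaluate these quantities on a small grid of `beta` and `times` values, and concatenate the results into a feature vector for a downstream detector.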

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.