ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation
Summary
ActTraitBench is a human-grounded evaluation framework that quantifies the Knowledge-Decision Gap ($G_{\text{KD}}$) in Large Language Models (LLMs): the discrepancy between a model's explicit self-reports and its implicit behavioral decisions. Existing benchmarks often fall short because of limited construct validity and biases. ActTraitBench establishes one-to-one mappings between psychometric facets and behavioral paradigms, and uses a Distributional Calibration via Quantile Mapping procedure to align LLM-judge scores with human norms. Experiments on 14 mainstream LLMs reveal a pervasive knowledge-decision asymmetry: larger and more capable models often exhibit stronger behavioral divergence despite highly consistent self-reports. To mitigate this, the authors introduce the Chain of Cognitive Alignment (CoCA), a plug-and-play inference-time intervention that improves alignment in reasoning-capable frontier models while exposing limitations in smaller architectures.
Key takeaway
AI scientists and machine learning engineers evaluating LLM persona consistency should consider ActTraitBench for its human-grounded approach to quantifying the Knowledge-Decision Gap. The framework identifies behavioral divergence even in larger models with strong self-reports. The Chain of Cognitive Alignment (CoCA) can be applied as a plug-and-play intervention to improve alignment in reasoning-capable frontier models, though it shows limitations on smaller architectures.
Key insights
ActTraitBench quantifies the Knowledge-Decision Gap in LLMs, revealing behavioral divergence despite consistent self-reports.
Principles
- LLMs exhibit a Knowledge-Decision Gap ($G_{\text{KD}}$).
- Larger LLMs can show stronger behavioral divergence.
- Evaluation needs human-grounded psychometric validity.
Method
ActTraitBench uses psychometric-behavioral mappings and Distributional Calibration via Quantile Mapping. CoCA is an inference-time intervention for alignment.
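The summary does not spell out the calibration procedure, so the following is only a minimal sketch of generic empirical quantile mapping, not the paper's implementation. It assumes a reference sample of LLM-judge scores and a sample of human norm scores for the same facet; the function name `quantile_map` and its arguments are hypothetical.

```python
import numpy as np


def quantile_map(judge_scores, judge_reference, human_norms):
    """Map judge scores onto the human-norm scale by matching quantiles.

    Hypothetical helper: generic empirical quantile mapping, which may
    differ from the calibration used in ActTraitBench.
    """
    judge_reference = np.sort(np.asarray(judge_reference, dtype=float))
    human_norms = np.asarray(human_norms, dtype=float)
    judge_scores = np.asarray(judge_scores, dtype=float)

    # Empirical CDF value of each score within the judge reference sample.
    ranks = np.searchsorted(judge_reference, judge_scores, side="right")
    probs = ranks / judge_reference.size

    # Read off the human-norm value at the same quantile.
    return np.quantile(human_norms, probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data for illustration only: a judge that scores too high
    # and too narrowly compared with the human norm distribution.
    judge_ref = rng.normal(loc=4.2, scale=0.3, size=1000)
    human = rng.normal(loc=3.5, scale=0.8, size=1000)
    print(quantile_map([3.9, 4.2, 4.5], judge_ref, human))
```

The mapping is monotone, so it preserves the ranking of judge scores while reshaping their distribution to match the human sample. Scores outside the reference range are clipped to the minimum or maximum of the human norms.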
In practice
- Measure LLM personality consistency.
- Apply CoCA to improve frontier model alignment.
- Identify capability limits in smaller LLM architectures.
Topics
- Large Language Models
- ActTraitBench
- Knowledge-Decision Gap
- Personality Consistency
- Behavioral Validation
- Chain of Cognitive Alignment
- Psychometrics
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.