Mechanistic Decoding of Cognitive Constructs in LLMs
Summary
A Cognitive Reverse-Engineering framework, based on Representation Engineering (RepE), has been developed to analyze how Large Language Models (LLMs) process complex emotions like social-comparison jealousy. This framework combines appraisal theory with subspace orthogonalization, regression-based weighting, and bidirectional causal steering to isolate and quantify two psychological antecedents: "Superiority of Comparison Person" and "Domain Self-Definitional Relevance." Experiments on eight LLMs from the Llama, Qwen, and Gemma families indicate that these models natively encode jealousy as a structured linear combination of these factors. The internal representations align with human psychological constructs, where Superiority acts as a foundational trigger and Relevance as an intensity multiplier. The framework also shows potential for detecting and suppressing toxic emotional states in LLMs.
Key takeaway
For research scientists investigating LLM interpretability or AI safety, this framework offers a concrete method to reverse-engineer and intervene on complex emotional states. You should consider applying this Cognitive Reverse-Engineering approach to other nuanced cognitive constructs to enhance model transparency and control, particularly in multi-agent systems where emotional dynamics are critical.
Key insights
- LLMs encode complex emotions like jealousy as structured linear combinations of psychological antecedents.
Principles
- Complex emotions are decomposable into constituent factors.
- Internal representations can be causally steered.
Method
The Cognitive Reverse-Engineering framework builds on RepE. Appraisal theory supplies the two candidate antecedents of jealousy; subspace orthogonalization and regression-based weighting isolate and quantify them; bidirectional causal steering tests their effect on the model.
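To make the pipeline concrete, here is a minimal sketch on synthetic data, not the authors' implementation. It assumes hidden activations are already extracted as a matrix, uses difference-of-means vectors as a stand-in for RepE direction finding, Gram-Schmidt for the orthogonalization step, and ordinary least squares for the weighting step. All names (`acts`, `diff_of_means`, `steer`), the toy jealousy score, and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 400  # hidden size and prompt count (arbitrary toy values)

# Hypothetical ground-truth directions, used only to synthesize toy activations.
true_sup = rng.normal(size=d)
true_rel = rng.normal(size=d)

# Binary labels per prompt: is the comparison person superior? is the domain self-relevant?
sup = rng.integers(0, 2, size=n)
rel = rng.integers(0, 2, size=n)
acts = np.outer(sup, true_sup) + np.outer(rel, true_rel) + 0.1 * rng.normal(size=(n, d))

def diff_of_means(acts, labels):
    """Direction separating activations with label 1 from those with label 0."""
    return acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)

v_sup = diff_of_means(acts, sup)
v_rel = diff_of_means(acts, rel)

# Subspace orthogonalization (Gram-Schmidt): strip the Superiority component from Relevance.
u_sup = v_sup / np.linalg.norm(v_sup)
v_rel_orth = v_rel - (v_rel @ u_sup) * u_sup
u_rel = v_rel_orth / np.linalg.norm(v_rel_orth)

# Regression-based weighting: fit a toy jealousy score on the two projections plus a bias.
jealousy = 1.0 * sup + 0.5 * rel + 0.05 * rng.normal(size=n)  # assumed toy target
X = np.column_stack([acts @ u_sup, acts @ u_rel, np.ones(n)])
weights, *_ = np.linalg.lstsq(X, jealousy, rcond=None)

def steer(h, direction, alpha):
    """Bidirectional steering: alpha > 0 amplifies the factor, alpha < 0 suppresses it."""
    return h + alpha * direction

h = acts[0]
h_up = steer(h, u_sup, 2.0)
h_down = steer(h, u_sup, -2.0)
```

In a real model the steered vector would be written back into the residual stream at a chosen layer and the change in generated text measured; which layer and which `alpha` to use are open choices this sketch does not address.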
In practice
- Detect and suppress toxic emotional states in LLMs.
- Monitor AI representations in multi-agent environments.
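The detect-and-suppress use case can be sketched as follows, assuming orthonormal factor directions and regression weights have already been obtained. This is an illustrative toy, not the paper's procedure: the functions `jealousy_score` and `suppress`, the weights, and the threshold are all hypothetical, and suppression here is a simple projection that removes the factor subspace from a hidden state.

```python
import numpy as np

def jealousy_score(h, directions, weights, bias=0.0):
    """Weighted sum of the hidden state's projections onto each factor direction."""
    return float((directions @ h) @ weights + bias)

def suppress(h, directions):
    """Remove the component of h lying in the span of the orthonormal rows of directions."""
    return h - directions.T @ (directions @ h)

rng = np.random.default_rng(1)
d = 32

# Two orthonormal toy directions (rows), obtained from a reduced QR factorization.
q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
directions = q.T
weights = np.array([1.0, 0.5])  # hypothetical regression weights

# Toy hidden state: background noise outside the factor subspace plus known factor loadings.
noise = rng.normal(size=d)
noise = noise - directions.T @ (directions @ noise)
h = noise + 3.0 * directions[0] + 2.0 * directions[1]

THRESHOLD = 1.0  # assumed monitoring threshold
score_before = jealousy_score(h, directions, weights)
h_clean = suppress(h, directions) if score_before > THRESHOLD else h
score_after = jealousy_score(h_clean, directions, weights)
```

In a multi-agent setting the same score could be logged per agent and per turn, with suppression applied only when the threshold is crossed.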
Topics
- Cognitive Reverse-Engineering
- Representation Engineering
- LLM Interpretability
- Social-Comparison Jealousy
- Appraisal Theory
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.