Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact
Summary
A recent study finds that the psychological profiles assigned to large language models (LLMs) with human-designed instruments are largely measurement artifacts. The researchers administered personality and risk-preference instruments to 56 instruction-tuned LLMs and to human samples. Among the LLMs, 81-90% of between-model variation came from a directional response bias, a tendency to favor one end of a scale regardless of item content, compared with 9-16% in the human samples. The bias shrinks as model capability increases but does not disappear. The study also introduces "response orthogonality": an instrument's apparent reliability for LLMs is predicted by the proportion of its items where trait and bias point in opposite directions. As a result, a model's apparent profile can be manipulated through item selection, which shows that these profiles depend on the instrument and are not inherent properties of the model.
Key takeaway
AI scientists and NLP engineers who assess LLM safety, usability, or suitability as research proxies should critically evaluate any psychological profile assigned to a model. Such profiles are likely artifacts of the measurement instrument, driven largely by directional response bias rather than by genuine traits. Prefer assessments designed specifically for LLMs and built with response orthogonality in mind, so that characterizations of model behavior are valid and reliable.
Key insights
Apparent psychological profiles of LLMs are primarily measurement artifacts driven by directional response bias, not inherent traits.
Principles
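- Differences between LLMs on psychological assessments stem largely from directional response bias, not from the traits the instruments target.
- An instrument's apparent reliability for LLMs is predicted by its response orthogonality: the proportion of items where trait and bias point in opposite directions.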
- LLM psychological profiles are mutable via item selection.
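The last principle can be illustrated with a toy example. The sketch below is not the study's procedure: the items, the 1-5 scale, and the `always_agree` respondent are all hypothetical, chosen only to show how a respondent with pure directional bias receives a different "profile" depending on which items are selected.

```python
# Toy illustration only: items, scale, and respondent are hypothetical.
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5


def score(raw, reverse_keyed):
    """Reverse-score a raw Likert response when the item is reverse-keyed."""
    return SCALE_MIN + SCALE_MAX - raw if reverse_keyed else raw


def scale_score(items, respond):
    """Mean keyed score over a list of (text, reverse_keyed) items."""
    return mean(score(respond(text), rev) for text, rev in items)


def always_agree(_text):
    """Pure directional bias: answers 4 whatever the item says."""
    return 4


forward_only = [("I enjoy taking risks.", False), ("I seek out thrills.", False)]
reverse_only = [("I avoid taking risks.", True), ("I prefer safe choices.", True)]
balanced = forward_only + reverse_only

high = scale_score(forward_only, always_agree)  # looks risk-seeking
low = scale_score(reverse_only, always_agree)   # looks risk-averse
mid = scale_score(balanced, always_agree)       # looks neutral
```

The respondent never reads the items, yet the three item sets yield three different apparent risk profiles, which is the sense in which a profile belongs to the instrument and not to the model.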
Method
Dedicated LLM assessments should prioritize response orthogonality in instrument design.
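As a rough sketch of what this could mean in practice, the function below computes the share of items whose keyed direction opposes the bias direction. This follows the summary's wording ("the proportion of items where trait and bias point in opposite directions"); the study's exact definition may differ, and the function name and the +1/-1 encoding are assumptions made for illustration.

```python
def response_orthogonality(item_keys, bias_direction):
    """Share of items whose keyed direction opposes the response bias.

    item_keys: +1 for forward-keyed items (agreeing raises the trait score),
               -1 for reverse-keyed items (agreeing lowers it).
    bias_direction: +1 if the respondent drifts toward the high end of the
                    scale, -1 if it drifts toward the low end.
    Both encodings are hypothetical conventions for this sketch.
    """
    if bias_direction not in (-1, 1):
        raise ValueError("bias_direction must be +1 or -1")
    if not item_keys:
        raise ValueError("item_keys must not be empty")
    opposed = sum(1 for key in item_keys if key * bias_direction < 0)
    return opposed / len(item_keys)


mostly_forward = response_orthogonality([1, 1, 1, -1], bias_direction=1)
balanced = response_orthogonality([1, -1, 1, -1], bias_direction=1)
```

Here `mostly_forward` is 0.25 and `balanced` is 0.5. An instrument designer could use a number like this to check how many items let trait and bias be told apart before administering the instrument to a model.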
In practice
- Do not directly apply human psychological instruments to LLMs for profiling.
- Scrutinize LLM psychological profiles for underlying response biases.
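One crude way to screen an existing profile is to compare raw (unreversed) responses on forward-keyed and reverse-keyed items. A respondent tracking item content should lean in opposite directions on the two groups, while a direction-driven respondent leans the same way on both. The sketch below is a heuristic under stated assumptions, not the study's estimator: the function names, the midpoint of 3, and the tolerance `tol` are hypothetical.

```python
from statistics import mean


def raw_offset(responses, midpoint=3.0):
    """Mean raw response minus the scale midpoint (no reverse-scoring)."""
    return mean(responses) - midpoint


def looks_direction_driven(forward_raw, reverse_raw, midpoint=3.0, tol=0.5):
    """True when raw answers lean the same way on forward and reverse items.

    tol is an arbitrary illustrative threshold for a meaningful lean.
    """
    f = raw_offset(forward_raw, midpoint)
    r = raw_offset(reverse_raw, midpoint)
    return abs(f) > tol and abs(r) > tol and (f > 0) == (r > 0)


# Agrees with everything, including items that contradict each other.
biased = looks_direction_driven([4, 5, 4], [4, 4, 5])
# Agrees with forward items and disagrees with their reversed counterparts.
consistent = looks_direction_driven([4, 5, 4], [2, 1, 2])
```

With these toy inputs, `biased` is `True` and `consistent` is `False`. The check only works when the instrument contains both forward-keyed and reverse-keyed items.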
Topics
- Large Language Models
- Psychometrics
- Measurement Artifacts
- Response Bias
- AI Safety
- Human-Computer Interaction
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.