Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Science & Research — Research Methodology & Innovation, Social Sciences & Behavioral Studies, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A study of 56 instruction-tuned large language models (LLMs) and large human reference samples finds that the psychological profiles attributed to LLMs are primarily measurement artifacts. The researchers administered a battery of personality and risk-preference instruments, including the IPIP-NEO-300 and Frey et al.'s risk battery, and found that 81–90% of between-model variation stems from a directional response bias (a tendency to respond toward one end of a scale regardless of item content), compared with 9–16% in humans. The bias is attenuated in more capable models but still present. The study introduces "response orthogonality" and shows that an instrument's apparent reliability for LLMs is almost entirely predicted by it. Consequently, a model's apparent psychological profile can be steered through item selection, which indicates that these profiles are properties of the measurement instrument rather than of the LLM itself.

Key takeaway

If you assess LLM behavior or use models as human proxies, treat apparent psychological profiles as largely artifacts of the measurement instrument: existing assessments may reflect response bias rather than genuine model traits. Prefer instruments with high response orthogonality, or use a formal psychometric framework that separates trait from bias, so that the characterizations you rely on in safety and usability evaluations are valid and stable.

Key insights

LLM psychological profiles are largely measurement artifacts driven by response bias, not genuine traits.

Principles

Directional response bias, not latent trait, accounts for 81–90% of between-model variation in LLM responses (9–16% in humans).
An instrument's apparent reliability for LLMs is almost entirely predicted by its response orthogonality.
Psychological profiles of LLMs are unstable and steerable by item selection.
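
To make the first principle concrete, the sketch below computes a bias share of between-model variance under a deliberately simple assumption: each model's score is the sum of a bias component and a trait component, and the two are uncorrelated so their variances add. The function name `bias_share` and all numbers are invented for illustration; they are not the study's data, and the study's own variance decomposition may differ.

```python
from statistics import pvariance


def bias_share(bias, trait):
    """Fraction of between-model variance attributable to the bias component.

    Assumes score = bias + trait with uncorrelated components, so the
    variances add. This is an illustrative simplification, not the
    study's estimator.
    """
    var_bias = pvariance(bias)
    var_trait = pvariance(trait)
    return var_bias / (var_bias + var_trait)


# Hypothetical per-model components on a centered scale (invented values,
# chosen so that bias and trait are uncorrelated across the four models).
bias = [-1.5, -0.5, 0.5, 1.5]
trait = [0.5, -0.5, -0.5, 0.5]

share = bias_share(bias, trait)
print(f"bias share of between-model variance: {share:.2f}")  # prints 0.83
```

With these made-up values, bias accounts for about 83% of the variance, which is the kind of figure the 81–90% range describes.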

Method

The study used a formal psychometric framework to decompose LLM responses into latent trait and response bias components, applying it to 56 LLMs across 29 self-report and behavioral instruments.
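
The idea of splitting responses into a bias and a trait component can be sketched for a single respondent on a balanced Likert instrument. This is a minimal illustration under stated assumptions, not the study's framework: the function `decompose`, the 1–5 scale with midpoint 3, and the example answers are all hypothetical. Bias is estimated as the mean raw answer regardless of item keying, and trait as the mean answer after flipping reverse-keyed items.

```python
def decompose(responses, keys, midpoint=3):
    """Split one respondent's Likert answers into (bias, trait) estimates.

    responses: raw answers (e.g. 1..5) before any reverse scoring
    keys: +1 for forward-keyed items, -1 for reverse-keyed items
    """
    if len(responses) != len(keys):
        raise ValueError("responses and keys must have the same length")
    if sum(keys) != 0:
        raise ValueError("this sketch assumes equally many forward and reverse items")
    centered = [r - midpoint for r in responses]
    n = len(centered)
    # Bias: tendency toward one end of the scale regardless of item content.
    bias = sum(centered) / n
    # Trait: agreement with item content once keying is taken into account.
    trait = sum(k * c for k, c in zip(keys, centered)) / n
    return bias, trait


keys = [1, -1, 1, -1]
yea_sayer = decompose([5, 5, 5, 5], keys)     # agrees with everything
trait_driven = decompose([5, 1, 5, 1], keys)  # answers follow item content
print(yea_sayer, trait_driven)  # prints (2.0, 0.0) (0.0, 2.0)
```

Because the keys sum to zero, the two estimates are orthogonal contrasts over the same answers: a respondent who agrees with everything shows pure bias and zero trait, while one whose answers track item content shows the reverse.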

In practice

Re-evaluate existing LLM psychological profiles in light of the response orthogonality of the instruments that produced them.
Design LLM assessment instruments with balanced forward/reverse-keyed items.
Consider balancing label direction within instruments to separate bias from trait.
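
The case for balanced keying can be illustrated with conventional scale scoring. In this hedged sketch, a hypothetical respondent answers "agree" (4 on a 1–5 scale) to every item regardless of content, and the apparent trait score depends entirely on which items are selected. The item pool, the helper names `scale_score` and `score_subset`, and the answers are assumptions made for illustration.

```python
def scale_score(responses, keys, low=1, high=5):
    """Conventional scoring: reverse-score reverse-keyed items, then average."""
    scored = [r if k == 1 else (low + high) - r for r, k in zip(responses, keys)]
    return sum(scored) / len(scored)


# Hypothetical item pool: six items for one trait, half of them reverse-keyed.
pool_keys = [1, 1, 1, -1, -1, -1]
# A respondent that answers "agree" (4) to every item regardless of content.
answers = [4] * len(pool_keys)


def score_subset(indices):
    """Score only the selected items from the pool."""
    return scale_score([answers[i] for i in indices],
                       [pool_keys[i] for i in indices])


forward_only = score_subset([0, 1, 2])   # only forward-keyed items
reverse_only = score_subset([3, 4, 5])   # only reverse-keyed items
balanced = score_subset([0, 1, 3, 4])    # two forward, two reverse
print(forward_only, reverse_only, balanced)  # prints 4.0 2.0 3.0
```

The same answers yield a high, a low, or a neutral trait score depending on item selection; only the balanced subset cancels the directional bias.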

Topics

Large Language Models
Psychometric Assessment
Response Bias
Measurement Artifacts
Model Reliability
Personality Profiling

Code references

jelenameyer/llm-profile-artifact

Best for: AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.