Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
Summary
A study investigated whether five Large Language Models (LLMs) possess Theory of Mind (ToM), the ability to infer beliefs, intentions, and emotions from text, and compared their performance with that of human controls. The research used an adapted version of the text-based "Strange Stories" paradigm, a tool commonly used in human ToM research, in which models answer questions about story characters' mental states. Results revealed a significant performance gap among the LLMs. Earlier and smaller models were sensitive to the number of inferential cues and vulnerable to distracting information. In contrast, GPT-4o achieved high accuracy and robustness, performing comparably to humans even under the most challenging test conditions, which suggests advanced social-cognitive reasoning capabilities.
Key takeaway
For research scientists evaluating LLM social intelligence, this study indicates that GPT-4o performs on par with humans on text-based Theory of Mind tasks. Consider GPT-4o for applications that demand sophisticated inference of beliefs, intentions, and emotions, but recognize that smaller models may struggle with distracting information or limited inferential cues.
Key insights
GPT-4o demonstrates human-comparable Theory of Mind capabilities in text-based evaluations, outperforming smaller LLMs.
Principles
- LLM ToM capabilities vary significantly across models, with earlier and smaller models performing worse.
- Robust ToM requires handling irrelevant information and sparse cues.
Method
The study adapted the "Strange Stories" paradigm, a human ToM research tool, to evaluate LLMs' ability to infer character beliefs, intentions, and emotions from text-based scenarios.
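To make the evaluation design concrete, here is a minimal Python sketch of a per-condition scoring harness. It is not the authors' code: the Item fields cue_count and has_distractor, the build_prompt template, and the ask_model and is_correct callables are all hypothetical, chosen only to mirror the two factors the summary reports (number of inferential cues and presence of distracting information).

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    """One hypothetical Strange Stories-style test item."""
    story: str
    question: str
    cue_count: int        # assumed factor: number of inferential cues left in the story
    has_distractor: bool  # assumed factor: whether irrelevant text was inserted


def build_prompt(item: Item) -> str:
    # Hypothetical prompt template; the study's actual wording is not given here.
    return f"Story: {item.story}\nQuestion: {item.question}\nAnswer briefly."


def accuracy_by_condition(
    items: Iterable[Item],
    ask_model: Callable[[str], str],
    is_correct: Callable[[Item, str], bool],
) -> dict[tuple[int, bool], float]:
    """Return the fraction of correct answers for each (cue_count, has_distractor) cell."""
    hits: dict[tuple[int, bool], int] = defaultdict(int)
    totals: dict[tuple[int, bool], int] = defaultdict(int)
    for item in items:
        key = (item.cue_count, item.has_distractor)
        answer = ask_model(build_prompt(item))  # query the model under test
        totals[key] += 1
        hits[key] += int(is_correct(item, answer))  # grade with the supplied rubric
    return {key: hits[key] / totals[key] for key in totals}
```

In practice, ask_model would wrap a call to whichever LLM is being tested and is_correct would encode a scoring rubric (for example, human ratings or reference-answer matching); both are placeholders here. Grouping scores by condition cell is what lets a drop in accuracy under fewer cues or added distractors show up directly.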
In practice
- Use GPT-4o for tasks requiring complex social-cognitive reasoning.
- Be aware of cue sensitivity in smaller LLMs for ToM-related applications.
Topics
- Large Language Models
- Theory of Mind
- GPT-4o
- Cognitive Status
- Natural Language Processing
Best for: Research Scientist, AI Researcher, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.