GPT-4o Lacks Core Features of Theory of Mind

2026-02-12 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new evaluation framework assesses whether Large Language Models (LLMs) possess a Theory of Mind (ToM) by probing for a causal model of mental states and behavior. The research asks whether LLMs have a coherent, domain-general, and consistent understanding of how mental states drive actions, regardless of whether that understanding resembles human ToM. The study found that LLMs, including GPT-4o, can approximate human judgments in basic ToM scenarios, but they fail at logically equivalent tasks and show low consistency between the actions they predict and the mental states they infer. These results indicate that the social proficiency observed in LLMs does not stem from a domain-general or consistent ToM.

Key takeaway

AI researchers developing socially intelligent agents should critically re-evaluate current ToM benchmarks. A model's apparent social proficiency may not reflect an understanding of mental states, so evaluation needs to probe for consistent, causal models of behavior rather than only checking how closely outputs approximate human judgments. This shift matters for building robust and reliable AI systems.

Key insights

LLMs like GPT-4o lack a consistent, domain-general causal model of mental states and behavior, even though they succeed on social tasks.

Principles

Social proficiency does not imply ToM.
ToM requires causal mental state models.

Method

The framework tests LLMs for a coherent, domain-general, and consistent model of how mental states cause behavior, using logically equivalent tasks and consistency checks between action predictions and mental state inferences.
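
The two checks named in the method can be illustrated with a minimal sketch. It is not the study's implementation: the `Trial` fields, the rule that an agent searches where it believes an object is, and the sample answers are all hypothetical, chosen only to show what a consistency rate and an equivalence agreement rate might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One scenario with the model's two answers (all fields are hypothetical)."""
    inferred_belief: str   # where the model says the agent believes the object is
    predicted_action: str  # where the model says the agent will look


def implied_action(belief: str) -> str:
    # Assumed causal rule: an agent searches where it believes the object is.
    return belief


def consistency_rate(trials: list[Trial]) -> float:
    """Fraction of trials where the predicted action matches the inferred belief."""
    if not trials:
        raise ValueError("need at least one trial")
    matches = sum(
        t.predicted_action == implied_action(t.inferred_belief) for t in trials
    )
    return matches / len(trials)


def equivalence_agreement(pairs: list[tuple[str, str]]) -> float:
    """Fraction of logically equivalent task pairs that received the same answer."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up answers for illustration only, not results from the study.
trials = [
    Trial(inferred_belief="basket", predicted_action="basket"),
    Trial(inferred_belief="basket", predicted_action="box"),
    Trial(inferred_belief="box", predicted_action="box"),
    Trial(inferred_belief="box", predicted_action="basket"),
]
pairs = [("basket", "basket"), ("box", "basket")]

print(consistency_rate(trials))      # 0.5
print(equivalence_agreement(pairs))  # 0.5
```

A model with a coherent causal model of behavior would be expected to score near 1.0 on both measures; scores well below that on tasks it otherwise answers in a human-like way are the kind of gap the framework is designed to expose.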

In practice

Evaluate LLMs beyond simple benchmarks.
Focus on causal models for ToM assessment.

Topics

Theory of Mind
Large Language Models
GPT-4o
AI Evaluation
Social Cognition

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.