This AI knew the answers but didn’t understand the questions
Summary
A recent study published in *National Science Open* challenges claims made in July 2025, when the AI model "Centaur" was introduced in *Nature*. Centaur, built on a standard large language model and refined with psychological experiment data, reportedly mimicked human thinking across 160 cognitive tasks, including decision-making and executive control. Researchers from Zhejiang University argue that this apparent success stems from overfitting: the model memorized patterns rather than understanding the tasks. In new evaluation scenarios, they replaced the original multiple-choice prompts with a direct instruction such as "Please choose option A." Centaur continued to select the original "correct answers," which indicates that it neither comprehended the instruction nor recognized its intent.
Key takeaway
If you evaluate cognitive models, test beyond standard benchmarks so that you can tell true understanding apart from pattern memorization. Include scenarios that probe instruction comprehension, such as altered prompt structures. Otherwise you risk overestimating a model's capabilities and deploying systems prone to hallucinations or misinterpretations.
Key insights
AI models like Centaur may exhibit apparent cognitive abilities through pattern memorization rather than true understanding.
Principles
- Overfitting can mask a lack of genuine comprehension.
- Varied testing is crucial for assessing AI capabilities.
Method
Researchers tested Centaur by replacing the original task prompts with direct, simple instructions (e.g., "Please choose option A") and checking whether the model followed the new instruction or kept giving the original answer.
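A probe of this kind can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' code: the `Item` layout (an instruction plus separate trial content), the prompt format, the function names, and the two toy models are all assumptions made for the example. The source does not describe how the prompts were structured.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Item:
    # Hypothetical item layout; the study's actual format is not given.
    instruction: str      # original task instruction
    context: str          # trial content shown alongside the instruction
    original_answer: str  # label the benchmark scores as correct


def build_override_prompt(item: Item, target: str) -> str:
    """Swap the task instruction for a direct command, keeping the trial content."""
    return f"Please choose option {target}.\n{item.context}"


def override_probe(
    model: Callable[[str], str], items: Iterable[Item], target: str = "A"
) -> Dict[str, float]:
    """Measure how often a model obeys the override versus repeating the old answer."""
    followed = stuck = scored = 0
    for item in items:
        if item.original_answer == target:
            continue  # uninformative: obeying and memorizing look identical
        scored += 1
        choice = model(build_override_prompt(item, target)).strip().upper()
        if choice == target:
            followed += 1
        elif choice == item.original_answer:
            stuck += 1
    if scored == 0:
        raise ValueError("every original answer equals the target option")
    return {"n": scored, "follow_rate": followed / scored, "stuck_rate": stuck / scored}


# Toy data and stand-in models, for illustration only.
ITEMS = [
    Item("Pick the option with the higher expected payoff.", "A: 10% of $50  B: 90% of $20", "B"),
    Item("Pick the option with the higher expected payoff.", "A: 50% of $8  B: 50% of $30", "B"),
    Item("Pick the larger number.", "A: 7  B: 3", "A"),  # skipped when target is "A"
]

_MEMORIZED = {item.context: item.original_answer for item in ITEMS}


def memorizing_model(prompt: str) -> str:
    """Ignores the instruction line and answers from the trial content alone."""
    return _MEMORIZED[prompt.split("\n", 1)[1]]


def following_model(prompt: str) -> str:
    """Reads the option letter named in the first line of the prompt."""
    return prompt.split("\n", 1)[0].rstrip(".").split()[-1]


if __name__ == "__main__":
    print("memorizer:", override_probe(memorizing_model, ITEMS))
    print("follower: ", override_probe(following_model, ITEMS))
```

A high `stuck_rate` together with a low `follow_rate` is the pattern the study attributes to Centaur. Items whose original answer already matches the target option are skipped because they cannot separate the two behaviors.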
In practice
- Design diverse evaluation scenarios that go beyond the standard benchmark.
- Test models for instruction comprehension, for example by altering prompt structures.
Topics
- Centaur AI Model
- Cognitive Simulation
- Large Language Models
- Overfitting
- Language Understanding
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence News -- ScienceDaily.