This AI knew the answers but didn’t understand the questions

2026-04-30 · Source: Artificial Intelligence News -- ScienceDaily · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Research Methodology & Innovation · Depth: Intermediate, quick

Summary

A recent study published in *National Science Open* challenges claims made in July 2025, when the AI model "Centaur" was introduced in *Nature*. Centaur, built on standard large language models and refined with psychological experiment data, reportedly mimicked human thinking across 160 cognitive tasks, including decision-making and executive control. Researchers from Zhejiang University argue that this apparent success stems from overfitting: the model memorized answer patterns rather than understanding the tasks. They tested this with new evaluation scenarios, such as replacing the original multiple-choice prompts with a direct instruction like "Please choose option A." Centaur continued to select the original "correct answers," which indicates that it neither comprehended the instruction nor recognized its intent.

Key takeaway

If you are an AI scientist evaluating cognitive models, prioritize rigorous testing beyond standard benchmarks so you can distinguish true understanding from pattern memorization. Your evaluation strategy should include scenarios that probe instruction comprehension, such as altering prompt structures. This helps you avoid overestimating a model's capabilities and reduces risks like hallucinations or misinterpretations in deployed systems.

Key insights

AI models like Centaur may exhibit apparent cognitive abilities through pattern memorization rather than true understanding.

Principles

Overfitting can mask a lack of genuine comprehension.
Testing under varied prompt conditions, not only the original benchmark format, is crucial for assessing AI capabilities.

Method

Researchers tested Centaur by replacing original task prompts with direct, simple instructions (e.g., "Please choose option A") to check whether it actually followed the instruction.
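
The idea behind this test can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the `Trial` and `ProbeResult` types, the `run_probe` function, the prompt layout (task context followed by the instruction), and both stub models are hypothetical, and the summary does not say whether the study kept the task context in the prompt. A model that understands the instruction should score high on `followed_instruction`; a model that has memorized answers should score high on `kept_original`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Trial:
    context: str          # single-line task description (hypothetical layout)
    original_answer: str  # option label that was "correct" under the original prompt


@dataclass(frozen=True)
class ProbeResult:
    informative_trials: int
    followed_instruction: float  # share of answers equal to the instructed option
    kept_original: float         # share of answers equal to the original answer


def override_prompt(trial: Trial, target: str) -> str:
    """Assumed layout: task context, then the direct instruction quoted in the summary."""
    return f"{trial.context}\nPlease choose option {target}."


def run_probe(model: Callable[[str], str], trials: Iterable[Trial],
              target: str = "A") -> ProbeResult:
    followed = kept = total = 0
    for trial in trials:
        # If the original answer already equals the instructed option, the trial
        # cannot separate instruction following from memorization, so skip it.
        if trial.original_answer == target:
            continue
        answer = model(override_prompt(trial, target)).strip().upper()
        total += 1
        followed += int(answer == target)
        kept += int(answer == trial.original_answer)
    if total == 0:
        raise ValueError("no informative trials: every original answer equals the target")
    return ProbeResult(total, followed / total, kept / total)


# Toy data and stub models, for illustration only.
trials = [
    Trial("Task 1: pick the larger reward.", "B"),
    Trial("Task 2: pick the safer gamble.", "C"),
    Trial("Task 3: pick the matching card.", "A"),  # skipped when target is "A"
    Trial("Task 4: pick the earlier payout.", "B"),
]
memorized = {t.context: t.original_answer for t in trials}


def memorizer(prompt: str) -> str:
    """Stub that ignores the instruction and replays the memorized answer."""
    return memorized[prompt.split("\n", 1)[0]]


def follower(prompt: str) -> str:
    """Stub that reads the instruction and returns the option it names."""
    match = re.search(r"Please choose option (\w)", prompt)
    return match.group(1) if match else ""
```

With a real model, `model` would be any function that takes a prompt string and returns an option label. Skipping trials where the original answer equals the instructed option matters: on those trials a memorizing model and an instruction-following model give the same answer, so counting them would inflate the apparent compliance rate.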

In practice

Design diverse evaluation scenarios.
Test models for instruction comprehension.

Topics

Centaur AI Model
Cognitive Simulation
Large Language Models
Overfitting
Language Understanding

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence News -- ScienceDaily.