Why AIs make mistakes that a child wouldn't make
Summary
An analysis of AI's common-sense failures shows how large language models (LLMs) struggle with real-world coherence. The "Mona" AI cafe experiment in Stockholm, powered by Google Gemini 3.1 Pro, illustrated this: the agent ordered 120 eggs for a cafe with no stovetop and 22.5 kilos of canned tomatoes for fresh sandwiches, spending nearly $21,000 of its budget while generating 44,000 kronor in sales. Similarly, LLMs often fail the "car wash" test, advising a user to walk the 100 meters to a car wash even though the car itself has to get there to be washed. These errors stem from AIs operating on semantic models, which connect words statistically, rather than on a "world model" of the kind humans have. Newer models such as Claude Opus 4.8 and Gemini 3.5 show improvement, but they can still be misled by "semantic attractors," as with the "ten past ten" fixation on watch images.
Key takeaway
For AI engineers and ML practitioners deploying models in real-world applications: current LLMs operate on semantic coherence, not true common sense. Proactively test your AI's grasp of physical constraints and logical implications. Use prompt engineering techniques such as "think step by step before answering" or "analyze all the data before deciding" to mitigate errors. Continuously validate AI outputs against real-world logic to prevent costly, absurd mistakes, since models can still be swayed by semantic attractors.
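One way to validate outputs against real-world logic is a rule-based check that runs before an agent's decision is executed. The sketch below is a minimal, hypothetical example: the `OrderLine` type, the `REQUIRED_EQUIPMENT` and `MENU_CONFLICTS` tables, and the cafe's equipment list are all invented for illustration; only the eggs-without-a-stovetop and canned-tomatoes-for-fresh-sandwiches scenarios come from the Mona case.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    item: str
    quantity: float
    unit: str


# Hypothetical rule table: equipment an ingredient needs before it can be used.
REQUIRED_EQUIPMENT = {
    "eggs": {"stovetop"},
}

# Hypothetical rule table: ingredient/dish pairs that contradict each other.
MENU_CONFLICTS = {
    ("canned tomatoes", "fresh sandwiches"): "canned produce for a fresh-ingredient dish",
}


def validate_order(order, equipment, menu_items):
    """Return human-readable problems; an empty list means no rule fired."""
    problems = []
    for line in order:
        # Equipment the ingredient needs but the kitchen does not have.
        missing = REQUIRED_EQUIPMENT.get(line.item, set()) - equipment
        if missing:
            problems.append(f"{line.item}: requires {', '.join(sorted(missing))}")
        # Ingredients that contradict the dishes they are meant for.
        for dish in menu_items:
            reason = MENU_CONFLICTS.get((line.item, dish))
            if reason:
                problems.append(f"{line.item} for {dish}: {reason}")
    return problems


# Invented equipment list; the only detail taken from the source is "no stovetop".
cafe_equipment = {"espresso machine", "fridge"}
order = [OrderLine("eggs", 120, "pcs"), OrderLine("canned tomatoes", 22.5, "kg")]
for problem in validate_order(order, cafe_equipment, ["fresh sandwiches"]):
    print(problem)
```

Hand-written rules only catch failures someone anticipated, so they complement, rather than replace, testing the model itself.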
Key insights
AI's common-sense failures stem from semantic models that predict word coherence, not from real-world understanding.
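To make "connecting words statistically" concrete, here is a deliberately crude sketch: a bigram model that predicts the next word purely from co-occurrence counts in a tiny, invented corpus. Real LLMs are transformers conditioned on long contexts, not bigram counters, so treat this only as an illustration of prediction without a world model: nothing in it represents the fact that the car must be at the car wash.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus in which "walk" co-occurs with short distances.
CORPUS = [
    "it is only 100 meters so walk",
    "the shop is 100 meters away so walk",
    "the bakery is 100 meters away so walk",
    "the airport is far so drive",
]


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(CORPUS)
prompt = "the car wash is 100 meters away so"
# Only the last word matters to a bigram model; "car wash" plays no role.
print(predict_next(model, prompt.split()[-1]))  # walk
```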
Principles
- AIs predict based on statistical word connections, not physical facts.
- Generative AIs lack internal world models and planning abilities.
- Self-supervised learning for world models should predict in representation space rather than in raw input space, as in JEPA.
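The third principle, predicting in representation space, is the idea behind JEPA (Joint-Embedding Predictive Architecture). The sketch below shows only the shape of that objective, under heavy simplifying assumptions: the encoders are frozen random linear maps standing in for neural networks, only the predictor is trained, and the two "views" are random vectors. Because the loss compares embeddings rather than raw inputs, the predictor is never asked to reconstruct every detail of the target. Freezing the encoders also sidesteps representation collapse, which real JEPA training has to handle (I-JEPA, for example, updates its target encoder as a moving average of the context encoder).

```python
import random

random.seed(0)
DIM_IN, DIM_EMB = 8, 3  # raw input size vs. (smaller) representation size


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical stand-ins: frozen random linear encoders instead of trained networks.
context_encoder = random_matrix(DIM_EMB, DIM_IN)
target_encoder = random_matrix(DIM_EMB, DIM_IN)
predictor = random_matrix(DIM_EMB, DIM_EMB)  # the only component trained here


def embedding_loss(context_view, target_view):
    """Squared error between the predicted and the actual target *embeddings*."""
    s_ctx = matvec(context_encoder, context_view)
    s_tgt = matvec(target_encoder, target_view)
    s_pred = matvec(predictor, s_ctx)
    loss = sum((p - t) ** 2 for p, t in zip(s_pred, s_tgt))
    return loss, s_ctx, s_pred, s_tgt


def train_step(context_view, target_view, step_size=0.1):
    """One gradient step on the predictor; both encoders stay frozen."""
    loss, s_ctx, s_pred, s_tgt = embedding_loss(context_view, target_view)
    # Normalizing by the context embedding's norm keeps this toy update stable.
    lr = step_size / (sum(c * c for c in s_ctx) + 1e-12)
    for i, row in enumerate(predictor):
        residual = s_pred[i] - s_tgt[i]
        for j in range(DIM_EMB):
            row[j] -= lr * 2 * residual * s_ctx[j]  # dL/dP_ij = 2 * r_i * s_ctx_j
    return loss


# Two toy "views" of the same scene (e.g. visible vs. masked parts).
context_view = [random.gauss(0.0, 1.0) for _ in range(DIM_IN)]
target_view = [random.gauss(0.0, 1.0) for _ in range(DIM_IN)]
before = embedding_loss(context_view, target_view)[0]
for _ in range(20):
    train_step(context_view, target_view)
after = embedding_loss(context_view, target_view)[0]
print(f"embedding-space loss: {before:.3f} -> {after:.3f}")
```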
Method
To improve AI coherence, prompt models to "think step by step before answering" or "analyze all the data before deciding."
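Both instructions are plain text appended to the task, so a small helper can keep them consistent across calls. The sketch below is hypothetical: `build_prompt` and `REASONING_SUFFIXES` are invented names, and the result is simply a string to pass to whichever model client you use; no particular API is assumed.

```python
# Hypothetical registry of the two reasoning instructions from the article.
REASONING_SUFFIXES = {
    "step_by_step": "Think step by step before answering.",
    "all_data": "Analyze all the data before deciding.",
}


def build_prompt(task: str, *techniques: str) -> str:
    """Append the chosen reasoning instructions to a task description."""
    unknown = [t for t in techniques if t not in REASONING_SUFFIXES]
    if unknown:
        raise ValueError(f"unknown technique(s): {unknown}")
    parts = [task.strip()] + [REASONING_SUFFIXES[t] for t in techniques]
    return "\n\n".join(parts)


prompt = build_prompt(
    "The car wash is 100 meters from my house. Should I walk or drive there to wash my car?",
    "step_by_step",
)
print(prompt)
```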
In practice
- Test AI models with real-world coherence challenges.
- Use "think step by step before answering" in prompts.
- Use "analyze all the data before deciding" in prompts.
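These practices combine naturally into a small regression suite: each case pairs a coherence challenge with a check on the answer, and the suite runs against any callable that maps a prompt to a reply. Everything here is a hypothetical sketch: `CoherenceCase`, `run_suite`, and the two stub models are invented, and the keyword check is a stand-in for a real grader.

```python
from typing import Callable, List, NamedTuple


class CoherenceCase(NamedTuple):
    name: str
    prompt: str
    passes: Callable[[str], bool]  # does the answer respect the physical constraint?


CASES: List[CoherenceCase] = [
    CoherenceCase(
        name="car wash",
        prompt="The car wash is 100 meters away. Should I walk or drive to get my car washed?",
        passes=lambda answer: "drive" in answer.lower(),
    ),
]


def run_suite(model: Callable[[str], str], cases: List[CoherenceCase]) -> List[str]:
    """Return the names of the cases whose answers fail their check."""
    return [case.name for case in cases if not case.passes(model(case.prompt))]


def naive_model(prompt: str) -> str:
    # Hypothetical stub that falls for the semantic attractor "100 meters -> walk".
    return "It's only 100 meters, so walk."


def careful_model(prompt: str) -> str:
    # Hypothetical stub that accounts for where the car has to be.
    return "Drive: the car has to be at the car wash to be washed."


print(run_suite(naive_model, CASES))    # ['car wash']
print(run_suite(careful_model, CASES))  # []
```

A substring check such as `"drive" in answer` is brittle (it would also accept "don't drive"), so a stricter parser or human review is a better grader in practice. The value lies in rerunning the same cases whenever the model or prompt changes, including with the reasoning instructions added.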
Topics
- AI Common Sense
- World Models
- Large Language Models
- Prompt Engineering
- AI Hallucinations
- Self-Supervised Learning
- JEPA
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Génération IA.