When AI goes haywire: the case of the skyscraper and the slide trombone
Summary
Generative AI, exemplified by models such as ChatGPT, Gemini, and Mistral, has seen unprecedented adoption since ChatGPT's launch in November 2022, with ChatGPT alone reportedly reaching 800 million weekly active users. Although these models are widely used and can perform complex tasks such as passing bar exams or interpreting medical scans, they show a fundamental lack of common sense and of understanding of the physical world. In experiments, prompts asking for images of vastly different-sized objects side by side, such as a skyscraper and a trombone, consistently produce illogical results in which the objects appear at similar scales. The limitation stems from the statistical way these systems learn: diffusion models are trained on image-text pairs and have no internal representation of a concept like "compare", nor of the relative dimensions of objects that rarely appear together in their training data. Because the models rely on statistical inference rather than logical reasoning, they produce "glitches" that expose their inability to grasp real-world context.
Key takeaway
If you are an AI scientist evaluating generative models, keep in mind that current systems such as Gemini and Mistral, despite their advanced capabilities, lack common sense and a logical world model. Include tests that go beyond learned patterns, such as side-by-side comparisons of unrelated objects or complex logical queries, to expose limits in contextual understanding before they lead to "off the mark" outputs in real-world applications.
Key insights
Generative AI lacks common sense and real-world understanding, relying solely on statistical patterns from training data.
Principles
- AI results are based on learned data patterns.
- Models lack internal representation of concepts.
- Statistical inference can lead to logical glitches.
Method
Diffusion models are trained on image-text pairs and generate images by reversing a noise-addition process. They struggle with novel object comparisons because they lack contextual understanding.
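To make the noise-addition idea concrete, here is a minimal sketch of the forward (noising) step and its algebraic inverse on a toy list of numbers standing in for pixels. The article does not describe any particular formulation; the linear beta schedule, the function names, and the parameter values are all assumptions chosen for illustration. In a real diffusion model the noise is not known at generation time and has to be estimated by a trained network conditioned on the text prompt, which is where the statistical learning comes in.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumed values, not from the article.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, noise)
    ]
    return xt, noise


def remove_noise(xt, noise_estimate, alpha_bar):
    """Invert the forward step, given an estimate of the noise that was added."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, noise_estimate)
    ]


rng = random.Random(0)
x0 = [0.5, -1.0, 0.25, 0.0]  # toy stand-in for image pixels
alpha_bars = alpha_bar_schedule(1000)

# Noise the signal at a mid-schedule step, then undo it using the true noise.
xt, noise = add_noise(x0, alpha_bars[500], rng)
recovered = remove_noise(xt, noise, alpha_bars[500])
```

Here the reversal is exact only because the true noise is passed back in. A trained model replaces that argument with a learned guess, so its output reflects whatever regularities the training pairs contained.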
In practice
- Test AI with prompts combining disparate objects.
- Verify AI-generated content for logical consistency.
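The two checks above could be sketched as a small test harness: build side-by-side prompts from pairs of disparate objects, then compare the depicted height ratio against a rough real-world ratio. Everything here is hypothetical: the object list, the approximate heights, the tolerance, and the function names are illustrative assumptions, and the pixel heights are assumed to come from manual measurement or a separate object detector.

```python
from itertools import combinations

# Rough, illustrative real-world heights in metres (assumptions, not from the article).
APPROX_HEIGHT_M = {
    "skyscraper": 300.0,
    "slide trombone": 1.2,
    "teaspoon": 0.15,
}


def build_prompts(objects):
    """Return one (object_a, object_b, prompt) tuple per pair of objects."""
    return [
        (a, b, f"A {a} and a {b} side by side, drawn to scale")
        for a, b in combinations(objects, 2)
    ]


def scale_is_plausible(name_a, name_b, pixels_a, pixels_b, tolerance=10.0):
    """True if the depicted height ratio is within `tolerance`x of the real ratio.

    pixels_a and pixels_b are the measured heights of each object in the image.
    """
    real_ratio = APPROX_HEIGHT_M[name_a] / APPROX_HEIGHT_M[name_b]
    depicted_ratio = pixels_a / pixels_b
    factor = max(real_ratio, depicted_ratio) / min(real_ratio, depicted_ratio)
    return factor <= tolerance


prompts = build_prompts(list(APPROX_HEIGHT_M))

# An image showing both objects at nearly the same height fails the check.
same_size = scale_is_plausible("skyscraper", "slide trombone", 400, 380)
# An image that keeps roughly the real 250:1 ratio passes.
to_scale = scale_is_plausible("skyscraper", "slide trombone", 500, 2)
```

The generous tolerance is deliberate: the aim is to flag gross scale errors like the skyscraper-sized trombone, not to grade perspective or framing.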
Topics
- Generative AI
- AI Limitations
- Diffusion Models
- Large Language Models
- AI Training Data
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Data Scientist, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial intelligence (AI) – The Conversation.