📺 Watch: Elorian wants to fix AI's toddler vision
Summary
Elorian, a new research lab co-founded by former Google Brain and DeepMind expert Andrew Dai, aims to resolve AI's critical deficiency in visual reasoning. Current AI models, despite excelling at coding and language, struggle with complex visual tasks that even toddlers can perform, such as understanding spatial relationships, counting objects, or identifying broken UI layouts. This limitation, described as a "toddler vision" problem, significantly bottlenecks agentic engineering and agent-driven software development. Elorian, backed by $55M in funding, is developing models to natively understand and reason through images, diagrams, designs, and the physical world, moving beyond simple image-to-text translation to enable true visual comprehension for applications like design review, engineering, and robotics.
Key takeaway
For AI engineers and product teams developing agent-driven software or physical world automation, recognize that current AI's visual reasoning is a significant bottleneck. Prioritize integrating models capable of native visual understanding, like those Elorian is developing, to move beyond superficial image descriptions. This shift will enable agents to truly "see" and reason about interfaces, designs, and physical environments, preventing costly errors and unlocking new automation possibilities.
Key insights
Current AI lacks human-like visual reasoning, hindering agentic software development and physical world applications.
Principles
- AI's visual reasoning lags its language and coding capabilities.
- Complex visual relationships are difficult to textualize for AI.
- Visual benchmarks require frequent updates to prevent test data from leaking into training sets.
Method
Elorian is building models for native visual understanding, focusing on spatial relationships, physical constraints, and design intent, rather than translating images to text.
In practice
- Improve AI for UI/UX design review.
- Accelerate mechanical engineering iterations.
- Enhance robotics' real-time environmental understanding.
Topics
- Visual Reasoning
- AI Agents
- Multimodal AI
- Elorian
- Andrew Dai
- Computer Vision
Best for: Computer Vision Engineer, Research Scientist, Investor, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.