Three ways AI is learning to understand the physical world

· Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, long

Summary

Large language models (LLMs) are encountering limitations in applications requiring physical world understanding, such as robotics and autonomous driving, due to their lack of grounding in physical causality. This constraint is driving significant investment into "world models," with AMI Labs raising $1.03 billion and World Labs securing $1 billion in seed funding. World models act as internal simulators, allowing AI systems to test hypotheses before physical action. Three distinct architectural approaches are emerging: Joint Embedding Predictive Architecture (JEPA) for real-time, efficient latent representation learning; Gaussian splats for generating complete, interactive 3D spatial environments; and end-to-end generative models like Google DeepMind's Genie 3 and Nvidia's Cosmos, which continuously generate scenes and physical dynamics on the fly. Hybrid architectures are also beginning to appear, combining strengths from different approaches.

Key takeaway

For AI scientists building systems that interact with the physical world, understanding the three world model architectures is crucial. Your choice among JEPA for real-time efficiency, Gaussian splats for spatial environment creation, and end-to-end generation for synthetic data and complex physics will dictate system capabilities. Consider hybrid approaches that combine these strengths to help your AI operate reliably in dynamic, real-world scenarios.

Key insights

World models address LLM limitations in physical causality by providing AI with internal simulation capabilities.
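
The "internal simulation" idea can be sketched as a tiny planning loop: candidate action sequences are tried inside a model of the world, and only the best one would be executed physically. This is an illustrative sketch, not any vendor's method. The names State, predict_next, rollout_cost, and plan are hypothetical, and the hand-written point-mass physics stands in for what would normally be a learned dynamics model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class State:
    position: float
    velocity: float


def predict_next(state: State, action: float, dt: float = 0.1) -> State:
    """Stand-in for a learned world model: simple point-mass physics (assumption)."""
    velocity = state.velocity + action * dt
    position = state.position + velocity * dt
    return State(position, velocity)


def rollout_cost(state: State, actions, goal: float) -> float:
    """Simulate an action sequence inside the model and score the final state."""
    for action in actions:
        state = predict_next(state, action)
    return abs(state.position - goal)


def plan(state: State, goal: float, horizon: int = 3, choices=(-1.0, 0.0, 1.0)):
    """Exhaustively test every action sequence in simulation; return the cheapest."""
    candidates = product(choices, repeat=horizon)
    return min(candidates, key=lambda actions: rollout_cost(state, actions, goal))


if __name__ == "__main__":
    best = plan(State(position=0.0, velocity=0.0), goal=1.0)
    print("first action to execute:", best[0])
```

Exhaustive search is only workable for tiny action sets and short horizons. It is used here to keep the example deterministic, and a real system would sample or optimize candidate actions instead.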

Principles

Method

World models employ JEPA for latent feature learning, Gaussian splats for 3D environment generation, or end-to-end generative models for continuous scene and physics simulation.
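
To make "latent feature learning" concrete, here is a minimal sketch of a JEPA-style objective: the model predicts the embedding of a target observation from the embedding of a context observation, and the error is measured in latent space rather than pixel space. Everything here is an assumption for illustration. The encoders and predictor are plain linear maps with hand-picked weights, the function names are hypothetical, and no training loop or gradient handling is shown.

```python
def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def latent_prediction_loss(context_obs, target_obs,
                           encoder_w, target_encoder_w, predictor_w):
    """Mean squared error between predicted and actual target embeddings."""
    z_context = linear(encoder_w, context_obs)
    # In a trained system the target embedding is typically treated as a
    # fixed regression target (no gradient flows through it).
    z_target = linear(target_encoder_w, target_obs)
    z_predicted = linear(predictor_w, z_context)
    return sum((p - t) ** 2 for p, t in zip(z_predicted, z_target)) / len(z_target)


if __name__ == "__main__":
    # Toy 4-dimensional observations compressed to 2-dimensional embeddings.
    encoder = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]]
    predictor = [[1.0, 0.0],
                 [0.0, 1.0]]
    context = [1.0, 2.0, 3.0, 4.0]
    target = [2.0, 2.0, 9.0, 9.0]
    print(latent_prediction_loss(context, target, encoder, encoder, predictor))
```

Note that the last two observation dimensions never affect the loss, because the encoder discards them. Comparing compact embeddings instead of reconstructing every input detail is the source of the efficiency the summary attributes to this approach.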

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.