Robbyant Open Sources LingBot World: a Real Time World Model for Interactive Simulation and Embodied AI

· Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Gaming & Interactive Media · Depth: Advanced, quick

Summary

Robbyant, from Ant Group, has open-sourced LingBot World, an action-conditioned world model for real-time, interactive video simulation in embodied AI, driving, and gaming. The model turns text and control inputs into long-horizon simulations. It is built on a 28B-parameter mixture-of-experts diffusion transformer initialized from Wan2.2, and it learns dynamics from a unified data engine that integrates web videos, game logs with actions, and Unreal Engine trajectories. Hierarchical captions separate static layout from motion, and actions enter the model through camera embeddings and adaptive keyboard adapters. A distilled version, LingBot World Fast, runs at approximately 16 frames per second at 480p on a single GPU node with under 1 second of latency. The release also reports strong emergent memory, structural consistency, and leading VBench scores.
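To put the real-time figures in perspective, the short Python sketch below converts the quoted throughput into a per-frame time budget. It is plain arithmetic on the numbers in the summary (about 16 frames per second, under 1 second of latency); the function names are illustrative and are not part of any LingBot World API.

```python
# Back-of-the-envelope budget from the figures quoted in the summary.
FPS = 16.0            # approximate throughput reported for LingBot World Fast
MAX_LATENCY_S = 1.0   # reported upper bound on latency


def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available to produce one frame."""
    return 1000.0 / fps


def frames_within_latency(fps: float, latency_s: float) -> int:
    """Number of frames emitted during one latency window at steady throughput."""
    return int(fps * latency_s)


print(frame_budget_ms(FPS))                       # 62.5
print(frames_within_latency(FPS, MAX_LATENCY_S))  # 16
```

At roughly 16 frames per second, each frame has about 62.5 ms of compute on average, and a 1 second latency bound spans at most about 16 frames of output.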

Key takeaway

For AI scientists developing embodied agents or interactive simulations, LingBot World offers an open-source option for learning long-horizon dynamics. Its architecture combines a large diffusion transformer with hierarchical captions and action conditioning, which moves beyond frame-to-frame reactive models. Consider integrating LingBot World into your simulation environments to improve agent planning stability and obtain more consistent, memory-aware behavior.

Key insights

LingBot World enables long-horizon, interactive video simulation for embodied AI using a 28B-parameter diffusion transformer.

Principles

Method

LingBot World uses a 28B-parameter diffusion transformer initialized from Wan2.2. It is trained on data from a unified data engine that combines web videos, game logs, and Unreal Engine trajectories, annotated with hierarchical captions.
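As a rough illustration of what a record from such a data engine might look like, the Python sketch below keeps the static layout caption and the motion caption in separate fields and treats action labels as optional. Every name here (`Clip`, `layout_caption`, `motion_caption`, `actions`, `conditioning_text`) is hypothetical, and the assumption that web videos carry no action labels is ours; the source does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    """Hypothetical training record; field names are illustrative only."""
    source: str                          # e.g. "web_video" or "game_log"
    layout_caption: str                  # static scene description
    motion_caption: str                  # what changes over time
    actions: Optional[List[str]] = None  # per-step key presses, if logged

    def has_actions(self) -> bool:
        return self.actions is not None


def conditioning_text(clip: Clip) -> str:
    """Join the two caption levels while keeping them distinguishable."""
    return f"Scene: {clip.layout_caption} Motion: {clip.motion_caption}"


clips = [
    # Assumption: a web video has captions but no logged actions.
    Clip("web_video", "A narrow stone alley.", "The camera walks forward."),
    # A game log pairs the same kind of captions with recorded key presses.
    Clip("game_log", "A desert race track.", "The car turns left.", ["W", "A"]),
]

# Only clips with logged actions can supervise action conditioning.
action_labeled = [c for c in clips if c.has_actions()]
print(len(action_labeled))
print(conditioning_text(clips[0]))
```

Keeping the two caption levels in separate fields mirrors the stated goal of distinguishing static layout from motion, and the optional `actions` field lets one record type cover sources with and without control logs.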

Best for: AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.