Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
Summary
Embodied-R1.5 is a unified Embodied Foundation Model (EFM) that integrates embodied reasoning (cognition, task planning, correction, and pointing) within a single architecture aimed at general physical intelligence. Three automated data construction pipelines produce a training corpus of over 15B tokens, and a multi-task balanced RL recipe manages conflicts among heterogeneous tasks. A Planner-Grounder-Corrector (PGC) closed-loop framework enables autonomous execution and self-correction on long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves state-of-the-art performance on 16 of 24 embodied VLM benchmarks, outperforming models such as Gemini-Robotics-ER-1.5 and GPT-5.4. It can be fine-tuned into a Vision-Language-Action (VLA) model with minimal data, surpassing $π_{0.5}$ across four popular manipulation benchmark suites. Extensive zero-shot real-robot experiments confirm strong generalization in instruction following and complex manipulation tasks. The project open-sources its model weights, datasets, training code, and EmbodiedEvalKit.
Key takeaway
For robotics engineers developing embodied AI systems, Embodied-R1.5 offers a strong foundation for general physical intelligence. Consider integrating this 8B-parameter EFM, given its state-of-the-art performance on 16 of 24 embodied VLM benchmarks and its zero-shot generalization to real robots. Use the open-sourced model weights and EmbodiedEvalKit to accelerate development and evaluation of long-horizon, self-correcting robotic tasks.
Key insights
Embodied-R1.5 unifies cognition, task planning, correction, and pointing in a single 8B-parameter EFM, reaching state-of-the-art results on 16 of 24 embodied VLM benchmarks and generalizing zero-shot to real robots.
Principles
- Unified architecture for diverse embodied reasoning.
- Large-scale automated data construction improves coverage.
- Closed-loop PGC framework enables self-correction.
Method
The Planner-Grounder-Corrector (PGC) framework enables autonomous execution and self-correction by integrating planning, grounding, and corrective actions in a closed loop.
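The closed loop described above can be illustrated with a minimal sketch. This is not the Embodied-R1.5 implementation or API: the function `run_pgc`, the `RunLog` record, the four callable interfaces, and the retry limit are all hypothetical names and choices made for illustration, and the source does not specify how the three roles exchange information.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not the Embodied-R1.5 API).
Planner = Callable[[str, str], List[str]]        # (instruction, observation) -> ordered subtasks
Grounder = Callable[[str, str], str]             # (subtask, observation) -> executable action
Corrector = Callable[[str, str], Optional[str]]  # (subtask, observation) -> None if done, else a hint
Executor = Callable[[str], str]                  # action -> new observation


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)  # subtasks verified as done
    corrections: int = 0                                 # times the corrector flagged a failure
    success: bool = False


def run_pgc(instruction: str, observation: str, plan: Planner, ground: Grounder,
            correct: Corrector, execute: Executor, max_retries: int = 2) -> RunLog:
    """Plan once, then ground, execute, and verify each subtask, retrying on failure."""
    log = RunLog()
    for subtask in plan(instruction, observation):
        attempt = subtask
        for _ in range(max_retries + 1):
            action = ground(attempt, observation)    # turn the subtask into an action
            observation = execute(action)            # act, then observe the result
            hint = correct(subtask, observation)     # check the outcome against the subtask
            if hint is None:
                log.completed.append(subtask)
                break
            log.corrections += 1
            attempt = f"{subtask} ({hint})"          # feed the hint into the next attempt
        else:
            return log                               # retries exhausted: stop, success stays False
    log.success = True
    return log
```

In this sketch the planner runs once up front; a variant that replans after each correction would fit the same interfaces. Any model or rule-based stub can be plugged in for the four callables, which keeps the control flow testable without a robot.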
In practice
- Fine-tune Embodied-R1.5 into a VLA model with a small dataset.
- Use EmbodiedEvalKit to evaluate embodied tasks.
- Apply the model zero-shot to real-robot instruction following.
Topics
- Embodied Foundation Models
- Robotics
- Vision-Language Models
- Embodied AI
- Planner-Grounder-Corrector
- Open-source AI
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.