Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760
Summary
Nikita Rudin, co-founder and CEO of Flexion Robotics, discusses the current state of robotics, highlighting the significant gap between existing capabilities and the requirements for fully autonomous real-world deployment. He explains that while reinforcement learning and simulation have advanced robot locomotion, particularly for "blind" quadruped robots, adding visual inputs introduces noise and complicates sim-to-real transfer. The discussion covers the pragmatic benefits of modular approaches over end-to-end models for planning and locomotion, and Flexion's hierarchical strategy using pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Rudin also introduces "real-to-sim" for refining simulation parameters and shares insights into the challenges of humanoid robot demos, reward tuning, and the future of robotics in industrial and home settings.
Key takeaway
AI and research scientists developing autonomous robots should prioritize closing the sim-to-real gap, especially when incorporating visual perception. Focus on modular architectures that use pre-trained VLMs for high-level reasoning, and refine low-level control through robust simulation whose parameters are calibrated with real-to-sim data. This approach should speed the transition from controlled demos to reliable, value-generating deployments in industrial settings, before tackling the complexities of home environments.
Key insights
Bridging the sim-to-real gap and integrating perception are key challenges for deploying autonomous robots.
Principles
- Locomotion is not "solved" until robots can reliably navigate any human-accessible terrain.
- Modular approaches can pragmatically overcome end-to-end training challenges in complex robotics.
- High-fidelity simulation requires deep understanding of both simulated and real-world robot physics.
Method
Flexion Robotics employs a hierarchical approach: VLMs orchestrate high-level tasks, VLAs plan kinematic motions, and whole-body trackers control motors. The approach combines reinforcement learning (RL), imitation learning, and teleoperation data, with "real-to-sim" processes used to refine simulation parameters.
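The three-layer split described above can be illustrated with a toy sketch. This is not Flexion's implementation: the function names (`orchestrate`, `plan_motion`, `track`), the lookup-table "planner", the linear interpolation, and the proportional tracker are all hypothetical stand-ins chosen only to show how each layer hands a narrower problem to the layer below it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subtask:
    name: str
    target: List[float]  # desired joint positions (hypothetical units)


def orchestrate(task: str) -> List[Subtask]:
    """Stand-in for the VLM layer: map a task to an ordered list of subtasks.

    A real system would query a pre-trained model; a lookup table is used here.
    """
    plans = {
        "pick up box": [
            Subtask("reach", [0.5, 0.2]),
            Subtask("grasp", [0.5, 0.0]),
            Subtask("lift", [0.1, 0.0]),
        ]
    }
    return plans.get(task, [])


def plan_motion(current: List[float], subtask: Subtask, steps: int = 5) -> List[List[float]]:
    """Stand-in for the VLA layer: build a kinematic reference trajectory.

    Assumption: simple linear interpolation from the current pose to the target.
    """
    return [
        [c + (t - c) * k / steps for c, t in zip(current, subtask.target)]
        for k in range(1, steps + 1)
    ]


def track(current: List[float], reference: List[float], kp: float = 0.8) -> List[float]:
    """Stand-in for the whole-body tracker: one proportional step toward the reference."""
    return [c + kp * (r - c) for c, r in zip(current, reference)]


def run(task: str, state: List[float]) -> Tuple[List[float], List[str]]:
    """Run the full hierarchy and return the final state plus completed subtask names."""
    completed = []
    for subtask in orchestrate(task):
        for reference in plan_motion(state, subtask):
            state = track(state, reference)
        completed.append(subtask.name)
    return state, completed


final_state, completed = run("pick up box", [0.0, 0.0])
```

The point of the modular layout is that each layer can be trained, tested, or swapped on its own, which is the pragmatic benefit over end-to-end training that the episode highlights.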
In practice
- Use pre-trained VLMs for high-level task orchestration in complex robotic systems.
- Employ "real-to-sim" to refine simulation parameters using real-world robot data.
- Start with simpler, repetitive industrial tasks for early humanoid robot deployment.
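As a rough illustration of the "real-to-sim" idea in the list above, the sketch below tunes one simulation parameter until simulated rollouts match recorded data. The summary does not describe the actual procedure, so everything here is an assumption: the one-dimensional damped-velocity model, the `damping` parameter, the grid search, and the "real" trajectory, which is synthetic placeholder data rather than robot measurements.

```python
from typing import List


def simulate(damping: float, v0: float = 1.0, dt: float = 0.1, steps: int = 20) -> List[float]:
    """Toy simulator: a velocity that decays under linear damping (explicit Euler)."""
    v = v0
    trajectory = []
    for _ in range(steps):
        v = v - damping * v * dt
        trajectory.append(v)
    return trajectory


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_damping(real_trajectory: List[float], candidates: List[float]) -> float:
    """Pick the candidate damping whose simulated rollout best matches the real data."""
    return min(candidates, key=lambda d: mse(simulate(d), real_trajectory))


# Placeholder for logged robot data: generated by the same toy model with damping 0.7.
real_trajectory = simulate(0.7)

candidates = [i / 10 for i in range(0, 21)]  # 0.0, 0.1, ..., 2.0
best_damping = fit_damping(real_trajectory, candidates)
```

In a realistic setting the grid search would likely give way to a gradient-based or sampling-based optimizer over many parameters at once, but the loop is the same: simulate, compare against real data, adjust.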
Topics
- Reinforcement Learning
- Sim-to-Real Transfer
- Humanoid Robotics
- Vision-Language Models
- Robot Locomotion
Best for: AI Scientist, Research Scientist, Robotics Engineer, Machine Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence).