Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760

Source: The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

Nikita Rudin, co-founder and CEO of Flexion Robotics, discusses the current state of robotics, highlighting the significant gap between existing capabilities and the requirements for fully autonomous real-world deployment. He explains that while reinforcement learning and simulation have advanced robot locomotion, particularly for "blind" quadruped robots, adding visual inputs introduces noise and complicates sim-to-real transfer. The discussion covers the pragmatic benefits of modular approaches over end-to-end models for planning and locomotion, and Flexion's hierarchical strategy, in which pre-trained Vision-Language Models (VLMs) orchestrate high-level tasks, Vision-Language-Action (VLA) models plan motions, and low-level whole-body trackers execute them. Rudin also introduces "real-to-sim" for refining simulation parameters and shares insights into the challenges of humanoid robot demos, reward tuning, and the future of robotics in industrial and home settings.
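The episode summary describes "real-to-sim" only as a way of refining simulation parameters. One common reading of that idea is parameter fitting: adjust a simulator parameter until simulated behavior matches what was measured on hardware. The sketch below illustrates that reading with a toy sliding-block model and a grid search over a friction coefficient. The model, the function names, and the candidate values are all illustrative assumptions, not Flexion's method.

```python
from typing import List, Sequence

G = 9.81  # gravitational acceleration in m/s^2


def simulate_velocity(friction: float, v0: float, dt: float, steps: int) -> List[float]:
    """Toy simulator (assumption): a block sliding on a flat surface,
    slowed only by Coulomb friction, stepped with explicit Euler."""
    v = v0
    trace = []
    for _ in range(steps):
        v = max(0.0, v - friction * G * dt)  # the block cannot move backwards
        trace.append(v)
    return trace


def fit_friction(real_trace: Sequence[float], candidates: Sequence[float],
                 v0: float, dt: float) -> float:
    """Return the candidate friction coefficient whose simulated velocity
    trace has the smallest squared error against the measured trace."""
    def squared_error(mu: float) -> float:
        sim = simulate_velocity(mu, v0, dt, len(real_trace))
        return sum((s - r) ** 2 for s, r in zip(sim, real_trace))

    return min(candidates, key=squared_error)


# Stand-in for hardware logs: here the "real" data is generated by the same
# toy model with a hidden friction value, purely for demonstration.
measured = simulate_velocity(0.3, v0=2.0, dt=0.01, steps=50)
best_mu = fit_friction(measured, [0.1, 0.2, 0.3, 0.4, 0.5], v0=2.0, dt=0.01)
```

In a real pipeline the measured trace would come from robot logs, the parameter set would be much larger (masses, motor delays, contact properties), and a gradient-based or sampling-based optimizer would likely replace the grid search.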

Key takeaway

AI Scientists and Research Scientists developing autonomous robots should prioritize closing the sim-to-real gap, especially when incorporating visual perception. Focus on modular architectures that use pre-trained VLMs for high-level reasoning, and refine low-level control through robust simulation and real-to-sim data. This approach should accelerate the transition from controlled demos to reliable, value-generating deployments in industrial settings, before tackling the complexities of home environments.

Key insights

Bridging the sim-to-real gap and integrating perception are key challenges for deploying autonomous robots.

Method

Flexion Robotics employs a hierarchical approach: VLMs orchestrate high-level tasks, VLAs plan kinematic motions, and whole-body trackers control the motors. The approach combines reinforcement learning (RL), imitation learning, and teleoperation data, and uses "real-to-sim" to refine simulation parameters.
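The three-layer hierarchy described above can be sketched as a simple pipeline in which each layer only talks to the one below it. This is a minimal illustration under stated assumptions: the data types, the function names, and the stub layers are hypothetical, and the stubs stand in for learned models (a VLM, a VLA, and a whole-body tracker) that the source does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    instruction: str  # natural-language step produced by the high-level layer


@dataclass
class KinematicTarget:
    joint_positions: List[float]  # desired joint angles for the next step


@dataclass
class MotorCommand:
    torques: List[float]  # per-joint torque sent to the motors


def run_task(goal: str,
             orchestrate: Callable[[str], List[Subtask]],
             plan: Callable[[Subtask, List[float]], KinematicTarget],
             track: Callable[[KinematicTarget, List[float]], MotorCommand],
             joint_state: List[float]) -> List[MotorCommand]:
    """Run one goal through the hierarchy: orchestrate -> plan -> track."""
    commands = []
    for subtask in orchestrate(goal):
        target = plan(subtask, joint_state)
        commands.append(track(target, joint_state))
        # Simplifying assumption: the tracker reaches the target exactly.
        joint_state = list(target.joint_positions)
    return commands


# Stub layers (hypothetical stand-ins for the learned models).
def stub_vlm(goal: str) -> List[Subtask]:
    return [Subtask(step.strip()) for step in goal.split(" then ")]


def stub_vla(subtask: Subtask, joint_state: List[float]) -> KinematicTarget:
    return KinematicTarget([q + 0.1 for q in joint_state])


def stub_tracker(target: KinematicTarget, joint_state: List[float]) -> MotorCommand:
    kp = 10.0  # proportional gain, chosen arbitrarily for the example
    return MotorCommand([kp * (t - q) for t, q in zip(target.joint_positions, joint_state)])


commands = run_task("pick up box then place box",
                    stub_vlm, stub_vla, stub_tracker, joint_state=[0.0, 0.0])
```

Keeping the layers behind plain function interfaces reflects the modular argument in the episode: each layer can be trained, tested, or swapped independently, which is harder with a single end-to-end model.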

Best for: AI Scientist, Research Scientist, Robotics Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence).