ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies
Summary
ReCoVLA is a failure-conditioned residual recovery framework that makes Vision-Language-Action (VLA) policies more robust in off-nominal states. The pretrained VLA policy stays frozen, and an external Vision-Language Model (VLM) infers the failure mode and the recovery stage required. The VLM does not generate actions or rewards directly. Instead, it acts as a semantic reward selector: it predicts a recovery descriptor and a reward mask, and ReCoVLA compiles a structured reward from the task-relevant components that the mask selects. That reward is used to train a residual policy in simulation, and the trained recovery policies are then deployed zero-shot on physical hardware. This design decouples high-level failure understanding from low-level corrective control. In the reported experiments, average simulation success rises from 36.7% to 66.7% (a gain of 30.0 percentage points), and average success on physical tasks under zero-shot sim-to-real transfer is 61.7%.
Key takeaway
For machine learning engineers building manipulation policies, ReCoVLA offers a way to address the brittleness of VLA policies. If your VLA system struggles in off-nominal states, consider a VLM-guided reward compilation framework that decouples high-level failure understanding from low-level control. In the reported experiments, this approach raised average simulation success from 36.7% to 66.7% and reached 61.7% average success in zero-shot real-world deployment.
Key insights
ReCoVLA uses a VLM to compile structured rewards for residual policy training, enabling robust VLA policy recovery.
Principles
- Decouple failure understanding from control.
- Freeze pretrained policies for stability.
- Use the VLM as a semantic reward selector, not as a direct action or reward generator.
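A minimal sketch of how a frozen base policy and a learned residual could be combined at action time. The summary does not say how ReCoVLA composes the two, so the additive form, the `residual_scale` and `action_limit` parameters, and the function names are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Observation = Sequence[float]
Action = List[float]
Policy = Callable[[Observation], Action]


def compose_action(
    base_policy: Policy,
    residual_policy: Policy,
    obs: Observation,
    residual_scale: float = 0.1,   # hypothetical: bounds how far the residual can move the action
    action_limit: float = 1.0,     # hypothetical: symmetric per-dimension action bound
) -> Action:
    """Add a scaled residual correction to the frozen base policy's action.

    The base policy is only queried, never updated; only the residual
    policy would receive gradient updates during recovery training.
    """
    base = base_policy(obs)
    delta = residual_policy(obs)
    if len(base) != len(delta):
        raise ValueError("base and residual actions must have the same dimension")
    # Clip each dimension so the corrected action stays within the limits.
    return [
        max(-action_limit, min(action_limit, b + residual_scale * d))
        for b, d in zip(base, delta)
    ]


# Stand-in policies for illustration only.
def frozen_vla(obs: Observation) -> Action:
    return [0.5, -0.95]


def recovery_residual(obs: Observation) -> Action:
    return [1.0, -1.0]


corrected = compose_action(frozen_vla, recovery_residual, obs=[0.0])
```

Keeping the residual small relative to the base action is one common way to limit how far the recovery policy can depart from the pretrained behavior.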
Method
Infer failure mode and recovery stage using a VLM. Compile structured rewards via VLM-predicted recovery descriptors and reward masks. Train residual policies in simulation, then deploy zero-shot sim-to-real.
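The reward compilation step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the component names, their formulas, the `RecoveryDescriptor` fields, and the unweighted sum are all hypothetical. The point it shows is that the VLM output only selects among predefined reward terms and never produces a reward value itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical simulator state: a flat mapping of named scalar readings.
State = Dict[str, float]

# Hypothetical library of reward components (names and formulas are illustrative).
REWARD_COMPONENTS: Dict[str, Callable[[State], float]] = {
    "reach": lambda s: -s["gripper_to_object_dist"],
    "grasp": lambda s: 1.0 if s["object_grasped"] > 0.5 else 0.0,
    "lift": lambda s: min(s["object_height"], 0.2),
    "place": lambda s: -s["object_to_goal_dist"],
}


@dataclass(frozen=True)
class RecoveryDescriptor:
    """Assumed shape of the VLM's structured output."""
    failure_mode: str
    recovery_stage: str
    reward_mask: Dict[str, bool]  # component name -> selected or not


def compile_reward(descriptor: RecoveryDescriptor) -> Callable[[State], float]:
    """Build a reward function from the components that the mask selects."""
    unknown = set(descriptor.reward_mask) - set(REWARD_COMPONENTS)
    if unknown:
        # Reject names outside the fixed library so the VLM cannot invent terms.
        raise ValueError(f"unknown reward components: {sorted(unknown)}")
    active = [
        REWARD_COMPONENTS[name]
        for name, selected in descriptor.reward_mask.items()
        if selected
    ]

    def reward(state: State) -> float:
        return sum(component(state) for component in active)

    return reward


# Example: a dropped object calls for re-reaching and re-grasping only.
descriptor = RecoveryDescriptor(
    failure_mode="object_dropped",
    recovery_stage="regrasp",
    reward_mask={"reach": True, "grasp": True, "lift": False, "place": False},
)
reward_fn = compile_reward(descriptor)
example_state = {
    "gripper_to_object_dist": 0.1,
    "object_grasped": 1.0,
    "object_height": 0.05,
    "object_to_goal_dist": 0.3,
}
example_reward = reward_fn(example_state)
```

The compiled function would then serve as the training signal for the residual policy in simulation.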
In practice
- Enhance VLA policy robustness.
- Improve manipulation task success rates.
- Apply zero-shot sim-to-real recovery.
Topics
- Vision-Language-Action Policies
- Failure Recovery
- Vision-Language Models
- Reward Compilation
- Sim-to-Real Transfer
- Robotic Manipulation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.