ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

2026-06-08 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ReCoVLA is a failure-conditioned residual recovery framework for making Vision-Language-Action (VLA) policies robust in off-nominal states. It keeps a pretrained VLA policy frozen and uses an external Vision-Language Model (VLM) to infer the failure mode and the recovery stage the robot is in. The VLM acts as a semantic reward selector rather than producing actions or writing rewards directly: it predicts a recovery descriptor and a reward mask, and ReCoVLA compiles a structured reward from the task-relevant components the mask selects. That reward trains a residual recovery policy in simulation, which is then deployed zero-shot on real hardware. The design decouples high-level failure understanding from low-level corrective control. In the reported experiments, ReCoVLA raises average simulation success from 36.7% to 66.7% and reaches 61.7% average success on physical tasks under zero-shot sim-to-real transfer.
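To make the "semantic reward selector" role concrete, here is a minimal Python sketch under stated assumptions: the VLM is assumed to reply with JSON containing a failure mode, a recovery stage, and a list of component names, and the component library (`REWARD_COMPONENTS`), the labels, and `parse_vlm_reply` are all hypothetical, since the summary does not specify ReCoVLA's actual components or output format. What it illustrates is that the VLM only chooses from a fixed vocabulary; a name outside that vocabulary is rejected instead of becoming a new reward term.

```python
import json
from dataclasses import dataclass

# Hypothetical reward-component library; ReCoVLA's real components are not listed here.
REWARD_COMPONENTS = ("reach", "grasp", "lift", "place", "avoid_collision")


@dataclass(frozen=True)
class RecoveryDescriptor:
    failure_mode: str            # e.g. "missed_grasp" (hypothetical label)
    recovery_stage: str          # e.g. "regrasp" (hypothetical label)
    reward_mask: dict[str, bool]  # component name -> selected or not


def parse_vlm_reply(reply: str) -> RecoveryDescriptor:
    """Turn an (assumed) JSON VLM reply into a descriptor plus a mask.

    The VLM may only select names from REWARD_COMPONENTS; unknown names raise,
    so the selector can never introduce a reward term of its own.
    """
    data = json.loads(reply)
    selected = set(data["selected_components"])
    unknown = selected - set(REWARD_COMPONENTS)
    if unknown:
        raise ValueError(f"VLM selected unknown components: {sorted(unknown)}")
    # Mask covers the whole library, in library order.
    mask = {name: name in selected for name in REWARD_COMPONENTS}
    return RecoveryDescriptor(data["failure_mode"], data["recovery_stage"], mask)


# Usage with a hand-written reply standing in for a real VLM call.
reply = ('{"failure_mode": "missed_grasp", "recovery_stage": "regrasp", '
         '"selected_components": ["reach", "grasp"]}')
descriptor = parse_vlm_reply(reply)
print(descriptor.recovery_stage, [k for k, v in descriptor.reward_mask.items() if v])
```

Constraining the VLM to a closed vocabulary is one way to keep its output checkable; the mask can be logged and validated before any training starts.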

Key takeaway

For machine learning engineers building robust manipulation policies, ReCoVLA offers a practical strategy for addressing VLA policy brittleness. If your VLA systems struggle in off-nominal states, consider a VLM-guided reward compilation framework that decouples high-level failure understanding from low-level control. The reported results show gains both in simulation and in zero-shot real-world deployment, which suggests this approach can make robotic systems more reliable without retraining the base policy.

Key insights

ReCoVLA uses a VLM to compile structured rewards for residual policy training, enabling robust VLA policy recovery.

Principles

Decouple failure understanding from control.
Freeze pretrained policies for stability.
VLMs can select semantic rewards.
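The "freeze pretrained policies" principle is commonly realized through residual composition: a learned correction is added to the frozen policy's output. The sketch below assumes additive residuals, a fixed scale, and box action bounds; ReCoVLA's exact residual parameterization is not described in this summary, and `compose_action`, `residual_scale`, and the bounds are hypothetical.

```python
from typing import Callable, Sequence

Action = list[float]


def compose_action(
    base_policy: Callable[[Sequence[float]], Action],
    residual_policy: Callable[[Sequence[float]], Action],
    obs: Sequence[float],
    residual_scale: float = 0.1,
    low: float = -1.0,
    high: float = 1.0,
) -> Action:
    """Return clip(a_base + residual_scale * a_residual, low, high).

    base_policy is only called, never updated (treated as frozen);
    in a residual setup only residual_policy would be trained.
    """
    base = base_policy(obs)
    delta = residual_policy(obs)
    if len(base) != len(delta):
        raise ValueError("base and residual actions must have the same dimension")
    return [min(high, max(low, b + residual_scale * d)) for b, d in zip(base, delta)]


# Usage with stand-in policies (a real system would wrap the VLA and a trained network).
def frozen_vla(obs: Sequence[float]) -> Action:
    return [0.5, -0.2, 0.95]


def residual(obs: Sequence[float]) -> Action:
    return [1.0, 0.0, 1.0]


action = compose_action(frozen_vla, residual, obs=[0.0])
# The third entry (0.95 + 0.1) exceeds the bound and is clipped to 1.0.
```

A small `residual_scale` keeps the corrected action close to the base policy's behavior, which is one reason residual learning on top of a frozen policy tends to be stable.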

Method

Infer failure mode and recovery stage using a VLM. Compile structured rewards via VLM-predicted recovery descriptors and reward masks. Train residual policies in simulation, then deploy zero-shot sim-to-real.
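The compilation step can be sketched as a masked, weighted sum over a fixed library of reward terms. Everything named here (`COMPONENTS`, the state keys, `compile_reward`, unit default weights) is a hypothetical illustration rather than ReCoVLA's actual reward library; the sketch assumes the mask arrives as a name-to-boolean mapping like the one a VLM selector would produce.

```python
from typing import Callable, Mapping, Optional

State = Mapping[str, float]
RewardFn = Callable[[State], float]

# Hypothetical component library: each term is a dense shaping reward on simulator state.
COMPONENTS: dict[str, RewardFn] = {
    "reach": lambda s: -s["gripper_to_object_dist"],
    "grasp": lambda s: 1.0 if s["object_in_gripper"] > 0.5 else 0.0,
    "lift": lambda s: max(0.0, s["object_height"] - 0.02),
    "place": lambda s: -s["object_to_target_dist"],
}


def compile_reward(mask: Mapping[str, bool],
                   weights: Optional[Mapping[str, float]] = None) -> RewardFn:
    """Build one reward function from the components the mask selects.

    Unselected components are never evaluated; missing weights default to 1.0.
    """
    selected = [name for name in COMPONENTS if mask.get(name, False)]
    if not selected:
        raise ValueError("reward mask selects no components")
    w = {name: (weights or {}).get(name, 1.0) for name in selected}

    def reward(state: State) -> float:
        return sum(w[name] * COMPONENTS[name](state) for name in selected)

    return reward


# Usage: a "regrasp" mask selecting reach + grasp, evaluated on one simulator state.
regrasp_reward = compile_reward({"reach": True, "grasp": True})
state = {"gripper_to_object_dist": 0.05, "object_in_gripper": 1.0,
         "object_height": 0.0, "object_to_target_dist": 0.3}
value = regrasp_reward(state)  # -0.05 (reach) + 1.0 (grasp)
```

Keeping the library fixed and varying only the mask means the same simulator and reward code can serve different failure modes, with the VLM deciding which terms apply rather than writing new ones.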

In practice

Enhance VLA policy robustness.
Improve manipulation task success rates.
Apply zero-shot sim-to-real recovery.

Topics

Vision-Language-Action Policies
Failure Recovery
Vision-Language Models
Reward Compilation
Sim-to-Real Transfer
Robotic Manipulation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.