CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

Source: cs.CV updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

CreFlow is an online reinforcement learning (RL) framework for improving embodied video generation models (VGMs) by addressing physical implausibility in manipulation tasks. VGMs trained on heterogeneous data often produce rollouts that look plausible but are physically unrealistic. Existing video-RL rewards typically rely on low-level visual metrics, which fail to capture the compositional task requirements of manipulation. CreFlow introduces a compositional, constraint-based reward model that automatically formulates task requirements as Linear Temporal Logic (LTL) constraints, yielding both an accurate reward and localized error information. The framework adds two designs: a credit-aware NFT loss that confines RL updates to reward-relevant regions so that unrelated areas are not perturbed, and a corrective reflow loss that uses within-group positive samples to estimate correction directions, which stabilizes and accelerates training. In experiments on eight bimanual manipulation tasks, CreFlow's reward judgments align better with human and simulator success labels, and downstream execution success improves by 23.8 percentage points.
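To make the idea of an LTL-based reward with localized error information concrete, here is a minimal sketch of a finite-trace monitor. It is an illustration only, not CreFlow's reward model: the per-frame predicates (`grasped`, `lifted`, `collision_free`), the three operators, and the `monitor` function are all hypothetical, and the sketch assumes the predicates have already been extracted from each non-empty rollout.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = Dict[str, bool]   # hypothetical per-frame predicate values
Trace = List[Frame]       # one generated rollout, assumed non-empty
Check = Callable[[Trace], Optional[int]]  # None if satisfied, else violating frame


def eventually(p: str) -> Check:
    """F p: p must hold at some frame; a failure is reported at the last frame."""
    def check(trace: Trace) -> Optional[int]:
        return None if any(f[p] for f in trace) else len(trace) - 1
    return check


def always(p: str) -> Check:
    """G p: p must hold at every frame; report the first frame where it fails."""
    def check(trace: Trace) -> Optional[int]:
        for t, f in enumerate(trace):
            if not f[p]:
                return t
        return None
    return check


def before(p: str, q: str) -> Check:
    """(not q) U p: q must not hold until p has held, and p must hold eventually."""
    def check(trace: Trace) -> Optional[int]:
        for t, f in enumerate(trace):
            if f[p]:
                return None
            if f[q]:
                return t
        return len(trace) - 1
    return check


def monitor(trace: Trace, constraints: Dict[str, Check]) -> Tuple[float, Dict[str, int]]:
    """Return a sparse reward (1.0 only if every constraint holds) and,
    for each violated constraint, the frame index where it was violated."""
    violations = {}
    for name, check in constraints.items():
        frame = check(trace)
        if frame is not None:
            violations[name] = frame
    return (0.0 if violations else 1.0), violations


# Hypothetical task specification: grasp before lifting, lift eventually, never collide.
constraints = {
    "grasp_before_lift": before("grasped", "lifted"),
    "lift": eventually("lifted"),
    "no_collision": always("collision_free"),
}

good = [
    {"grasped": False, "lifted": False, "collision_free": True},
    {"grasped": True, "lifted": False, "collision_free": True},
    {"grasped": True, "lifted": True, "collision_free": True},
]
bad = [
    {"grasped": False, "lifted": False, "collision_free": True},
    {"grasped": False, "lifted": True, "collision_free": True},   # lifted without grasp
    {"grasped": True, "lifted": True, "collision_free": False},   # collision
]

good_reward, good_violations = monitor(good, constraints)
bad_reward, bad_violations = monitor(bad, constraints)
```

The violation map is what makes the signal localized: a failed rollout comes with the frames at which each constraint broke, which a training loop could then use to decide where updates should apply.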

Key takeaway

For research scientists building robotic manipulation systems on top of video generation models, CreFlow offers one way to reduce physical implausibility. Consider expressing task requirements as compositional LTL constraints to obtain more faithful reward signals, and confining RL updates to the regions those rewards implicate. In the reported experiments, this combination gave more stable training and higher downstream task success, particularly on long-horizon, multi-stage manipulation tasks.

Key insights

CreFlow enhances embodied video generation by localizing RL updates with LTL-based compositional rewards and corrective reflow.

Method

CreFlow uses a compositional LTL monitor for reward and violation traces. It applies a credit-aware NFT loss to focus updates on relevant regions and a corrective reflow loss using the empirical mean of successful rollouts as a target.
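The two loss ideas can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the array shapes, the function names, the choice to give successful rollouts a zero correction, and the masked squared error, which is a simple stand-in and not the NFT objective. The sketch only shows (a) taking the empirical mean of the successful rollouts in a group as a target and (b) restricting a loss to frames flagged as reward-relevant.

```python
import numpy as np


def positive_mean_target(latents: np.ndarray, rewards: np.ndarray):
    """Empirical mean of the successful rollouts in a group.

    latents: [G, T, D] array (group size, frames, features) -- assumed layout.
    rewards: [G] array of sparse rewards in {0, 1}.
    Returns a [T, D] array, or None if the group has no successful rollout.
    """
    positive = rewards > 0.5
    if not positive.any():
        return None
    return latents[positive].mean(axis=0)


def correction_directions(latents: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """Direction from each failed rollout toward the positive mean.

    Successful rollouts get a zero direction (an assumption of this sketch).
    """
    target = positive_mean_target(latents, rewards)
    if target is None:
        return np.zeros_like(latents)
    directions = target[None] - latents
    directions[rewards > 0.5] = 0.0
    return directions


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Squared error averaged only over frames where mask == 1.

    mask: [G, T] array marking reward-relevant frames, e.g. derived from
    a violation trace. Frames with mask == 0 contribute nothing.
    """
    weights = mask[..., None].astype(pred.dtype)
    denom = weights.sum() * pred.shape[-1]
    if denom == 0:
        return 0.0
    return float((((pred - target) ** 2) * weights).sum() / denom)


# Toy group: three rollouts, two frames, one feature. The first two succeed.
latents = np.array([[[1.0], [1.0]], [[3.0], [3.0]], [[0.0], [0.0]]])
rewards = np.array([1.0, 1.0, 0.0])
mask = np.array([[0, 0], [0, 0], [0, 1]])  # only frame 1 of the failed rollout

target = positive_mean_target(latents, rewards)
directions = correction_directions(latents, rewards)
loss = masked_mse(np.zeros_like(latents), directions, mask)
```

In this toy group the failed rollout is pulled toward the mean of the two successes, and the loss only counts the single frame flagged by the mask, so the other frames are left untouched.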

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.