VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model
Summary
VisionCreator-R1 is a native visual-generation agent developed by Tencent Hunyuan and the Hong Kong University of Science and Technology. It is designed to overcome limitations of existing plan-driven agents by incorporating explicit reflection mechanisms. The model is trained with Reflection–Plan Co-Optimization (RPCO), a methodology that addresses an asymmetry the authors identify in reinforcement learning: planning is reliably optimized, whereas reflection learning is hindered by noisy credit assignment in multi-image tasks. RPCO first trains on the VCR-SFT dataset, which combines reflection-strong single-image trajectories with planning-strong multi-image trajectories, and then co-optimizes both capabilities on the VCR-RL dataset via RL. VisionCreator-R1 is reported to consistently outperform Gemini 2.5 Pro on existing benchmarks and on the new VCR-Bench, which covers single-image, multi-image, and image-to-image tasks; it achieves an overall score of 7.23 on GEdit-Bench and significant gains on the multi-image tasks in VCR-Bench.
Key takeaway
For computer vision engineers developing advanced visual-generation agents, integrating explicit reflection mechanisms is critical for overcoming error accumulation in complex, multi-image workflows. Consider adopting a "decoupled-then-fused" training strategy: start with supervised fine-tuning on diverse, high-quality datasets, then apply multi-task reinforcement learning to co-optimize planning and reflection in long-horizon stochastic environments.
Key insights
Explicit reflection and co-optimization of planning and reflection are crucial for robust visual generation agents.
Principles
- Reflection optimization is difficult in long-horizon RL due to noisy credit assignment.
- Planning rewards are stable; reflection rewards are highly stochastic.
- Decoupled-then-fused training improves reflection and planning synergy.
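A toy calculation, not taken from the paper, may help show why a trajectory-level reward gives reflection a weaker learning signal as the number of images grows. It assumes that each of n images succeeds independently with probability p, that the trajectory reward is the mean success rate, and that one reflection step raises a single image's success probability by delta. The function name, the model, and the default values are all illustrative assumptions.

```python
import math


def reflection_snr(n_images: int, p: float = 0.5, delta: float = 0.2) -> float:
    """Signal-to-noise ratio of one reflection step's effect on a mean reward.

    Toy assumptions (not from the paper): each image succeeds independently
    with probability p, the trajectory reward is the mean success over
    n_images, and reflection lifts one image's success probability by delta.
    """
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Shift in the expected trajectory reward caused by the reflection step.
    signal = delta / n_images
    # Standard deviation of the mean reward, approximated at the baseline p.
    noise = math.sqrt(p * (1.0 - p) / n_images)
    return signal / noise


for n in (1, 4, 16):
    print(f"{n:>2} images: SNR = {reflection_snr(n):.3f}")
```

Under these assumptions the ratio shrinks in proportion to 1/sqrt(n), which is one simple way to picture why reflection rewards look far noisier in multi-image tasks than in single-image ones.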
Method
Reflection–Plan Co-Optimization (RPCO) proceeds in two stages: supervised fine-tuning on VCR-SFT, a mixture of reflection-strong single-image trajectories and planning-strong multi-image trajectories, followed by multi-task reinforcement learning on VCR-RL to jointly optimize planning and reflection.
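The two-stage schedule can be sketched roughly as follows. This is a minimal illustration of pooling the two trajectory sources for the SFT stage and recording the stage order. The `Trajectory` type, the `build_sft_mixture` helper, and the `STAGES` tuple are hypothetical names introduced here, not the authors' code, and the paper's actual mixing ratios are not specified in this summary.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record for one agent trajectory."""
    task_type: str  # "single_image" or "multi_image"
    focus: str      # "reflection" or "planning"
    prompt: str


def build_sft_mixture(
    reflection_single: Sequence[Trajectory],
    planning_multi: Sequence[Trajectory],
    seed: int = 0,
) -> List[Trajectory]:
    """Pool both trajectory sources and shuffle them deterministically."""
    mixture = list(reflection_single) + list(planning_multi)
    random.Random(seed).shuffle(mixture)
    return mixture


# Stage order described in the summary: SFT first, then multi-task RL.
STAGES = (
    ("sft", "VCR-SFT"),
    ("multi_task_rl", "VCR-RL"),
)

# Placeholder data standing in for real trajectories.
reflection_single = [
    Trajectory("single_image", "reflection", f"r{i}") for i in range(3)
]
planning_multi = [
    Trajectory("multi_image", "planning", f"p{i}") for i in range(2)
]
sft_data = build_sft_mixture(reflection_single, planning_multi, seed=0)

for stage, dataset in STAGES:
    print(f"stage={stage} dataset={dataset}")
```

Shuffling with a fixed seed keeps the mixture reproducible. A real pipeline would likely also control the ratio between the two sources.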
In practice
- Use VLM-based judges for reflection reward computation.
- Combine reflection-strong and planning-strong data for SFT initialization.
- Implement multi-dimensional reward systems for comprehensive RL guidance.
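One simple way to realize a multi-dimensional reward is a weighted average over per-dimension scores, sketched below. The dimension names (`plan`, `reflection`, `image_quality`), the weights, and the `combine_rewards` helper are assumptions made for illustration. The summary does not state which dimensions or weights the paper uses, and in practice the reflection score might come from a VLM-based judge.

```python
from typing import Mapping


def combine_rewards(
    scores: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing reward dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * scores[k] for k in weights) / total_weight


# Hypothetical dimensions and weights, chosen only for this example.
WEIGHTS = {"plan": 0.4, "reflection": 0.3, "image_quality": 0.3}

scores = {"plan": 1.0, "reflection": 0.5, "image_quality": 0.0}
reward = combine_rewards(scores, WEIGHTS)
print(f"combined reward: {reward:.2f}")
```

Normalizing by the total weight keeps the combined reward in the same range as the individual scores, so the weights can be retuned without rescaling the RL objective.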
Topics
- Visual Generation Agents
- Reflection Mechanisms
- Reinforcement Learning Optimization
- Multi-Image Generation
- RPCO Training Methodology
Best for: Research Scientist, Computer Vision Engineer, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.