Vero: An Open RL Recipe for General Visual Reasoning

2025-08-07 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Vero is a new family of fully open vision-language models (VLMs) for general visual reasoning that match or exceed existing open-weight models. The models are trained with an open reinforcement learning (RL) recipe that combines Vero-600K, a 600,000-sample dataset compiled from 59 datasets across six broad task categories, with task-routed reward functions. Applied to four base models, the recipe raises their average scores across VeroEval, a suite of 30 challenging benchmarks, by 3.7–5.5 points. For instance, Vero-Qwen3T-8B outperforms Qwen3-VL-8B-Thinking on 24 of 30 benchmarks, including a gain of +12.1 points on ScreenSpotPro. Ablations show that broad data coverage, uniform task category weighting, and task-specific reward design are critical for robust RL scaling and for preserving visual chat capabilities. All data, code, and models are publicly released.
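The headline comparisons are simple aggregates over per-benchmark scores: a mean score difference across the suite and a count of benchmarks on which one model beats another. A minimal Python sketch of that arithmetic, assuming scores are stored per benchmark name on a common scale; the function name, benchmark names, and scores are made up for illustration and are not VeroEval code or results.

```python
from statistics import mean


def compare_on_suite(
    model_scores: dict[str, float], baseline_scores: dict[str, float]
) -> tuple[float, int, int]:
    """Compare two models on the benchmarks they share.

    Returns (mean score difference, number of benchmarks where the model
    scores strictly higher, number of benchmarks compared).
    """
    shared = sorted(model_scores.keys() & baseline_scores.keys())
    if not shared:
        raise ValueError("the two score tables share no benchmarks")
    deltas = [model_scores[name] - baseline_scores[name] for name in shared]
    wins = sum(1 for d in deltas if d > 0)
    return mean(deltas), wins, len(shared)


# Toy scores on three hypothetical benchmarks (not VeroEval results).
model = {"bench_a": 50.0, "bench_b": 60.0, "bench_c": 70.0}
baseline = {"bench_a": 45.0, "bench_b": 62.0, "bench_c": 64.0}
mean_delta, wins, total = compare_on_suite(model, baseline)
print(f"mean delta {mean_delta:+.1f}, wins {wins}/{total}")  # mean delta +3.0, wins 2/3
```

A model can post a positive mean delta while still losing on individual benchmarks, which is why the summary reports both the average gain and the 24-of-30 win count.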

Key takeaway

Machine learning engineers developing general visual reasoning VLMs should prioritize diverse, multi-task training data and task-specific reward functions. Vero's open recipe shows that broad data coverage, uniform category mixing, and routed rewards are critical for high performance and for avoiding reasoning blind spots. Consider adopting the Vero-600K dataset for training and the VeroEval suite for measuring your model's capabilities across varied visual tasks.

Key insights

Broad data diversity and task-routed rewards are crucial for general visual reasoning in open VLMs.

Principles

Broad data coverage drives strong RL scaling for VLMs.
Uniform task category weighting optimizes multi-task VLM training.
Task-routed reward design is essential for heterogeneous answer formats.
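Uniform task category weighting means each category contributes roughly equally to training, regardless of how many samples it holds. A minimal sketch of one way to realize that, assuming the pool is grouped by category in a dict; the two-level scheme (pick a category uniformly, then a sample within it), the function name, and the category names are illustrative assumptions, not code from the Vero repository.

```python
import random
from collections import Counter
from itertools import islice
from typing import Any, Iterator


def uniform_category_sampler(
    pool: dict[str, list[Any]], seed: int = 0
) -> Iterator[tuple[str, Any]]:
    """Endlessly yield (category, sample) pairs.

    Each draw first picks a task category uniformly at random, then a sample
    uniformly within that category, so small categories are not drowned out
    by large ones.
    """
    rng = random.Random(seed)
    categories = [name for name, items in pool.items() if items]  # skip empty ones
    if not categories:
        raise ValueError("pool has no non-empty categories")
    while True:
        category = rng.choice(categories)
        yield category, rng.choice(pool[category])


# Hypothetical pool: one category is 100x larger than the other.
pool = {"cat_large": list(range(1000)), "cat_small": list(range(10))}
counts = Counter(cat for cat, _ in islice(uniform_category_sampler(pool), 10_000))
print(counts)  # close to 5,000 draws per category
```

Sampling in proportion to dataset size would instead give the small category only about 1% of draws; uniform category weighting trades that for repeated exposure to small categories.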

Method

Vero's RL recipe trains in a single stage on Vero-600K (59 datasets, 6 categories) with task-routed reward functions. The training data passes through multi-stage filtering, and samples are mixed uniformly across task categories.
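The individual filtering stages are not described here. As a hedged illustration only, the sketch below shows how a multi-stage filter pipeline can be chained and audited in Python; the `Sample` record, the three stages (missing answers, task types without a matching reward, duplicate prompts), and all names are hypothetical and are not taken from the Vero repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    prompt: str
    answer: str
    task_type: str


Filter = Callable[[Sample], bool]  # returns True to keep the sample


def make_dedup_filter() -> Filter:
    """Stateful filter that drops prompts already seen (case/whitespace-insensitive)."""
    seen: set[str] = set()

    def keep(sample: Sample) -> bool:
        key = sample.prompt.strip().lower()
        if key in seen:
            return False
        seen.add(key)
        return True

    return keep


def run_filters(
    samples: Iterable[Sample], stages: list[tuple[str, Filter]]
) -> tuple[list[Sample], dict[str, int]]:
    """Apply filter stages in order; report how many samples each stage dropped."""
    kept = list(samples)
    dropped: dict[str, int] = {}
    for name, keep in stages:
        before = len(kept)
        kept = [s for s in kept if keep(s)]
        dropped[name] = before - len(kept)
    return kept, dropped


# Hypothetical stages; the actual Vero-600K filters are not reproduced here.
REWARDABLE_TYPES = {"multiple_choice", "numeric"}
stages = [
    ("has_answer", lambda s: bool(s.answer.strip())),
    ("rewardable_type", lambda s: s.task_type in REWARDABLE_TYPES),
    ("dedup", make_dedup_filter()),
]
samples = [
    Sample("What is 2+2?", "4", "numeric"),
    Sample("what is 2+2? ", "4", "numeric"),  # duplicate after normalization
    Sample("Pick the red shape.", "", "multiple_choice"),  # missing reference answer
    Sample("Describe the image.", "A cat.", "free_form"),  # no matching reward type
]
kept, dropped = run_filters(samples, stages)
print(len(kept), dropped)  # 1 {'has_answer': 1, 'rewardable_type': 1, 'dedup': 1}
```

Recording per-stage drop counts makes it easy to spot a stage that removes far more data than expected.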

In practice

Use the Vero-600K dataset for VLM RL training.
Implement task-routed rewards for diverse visual tasks.
Evaluate with VeroEval's 30 benchmarks.
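Task-routed rewards send each sample to a scorer that matches its answer format instead of applying one string-match rule to everything. A minimal Python sketch, assuming the model's final answer has already been extracted as a string; the three task types, their answer formats (an option letter, a number, and an "x,y" click point checked against an "x1,y1,x2,y2" box), the tolerance, and all function names are hypothetical illustrations, not Vero's actual reward code.

```python
import math
from typing import Callable

RewardFn = Callable[[str, str], float]


def _normalize_option(text: str) -> str:
    # "(b) " -> "B": strip whitespace and parentheses, uppercase.
    return text.strip().strip("()").upper()


def multiple_choice_reward(pred: str, ref: str) -> float:
    """1.0 if the predicted option letter matches the reference, else 0.0."""
    return 1.0 if _normalize_option(pred) == _normalize_option(ref) else 0.0


def numeric_reward(pred: str, ref: str, rel_tol: float = 1e-2) -> float:
    """1.0 if the numbers agree within a relative tolerance; unparseable -> 0.0."""
    try:
        p = float(pred.replace(",", ""))
        r = float(ref.replace(",", ""))
    except ValueError:
        return 0.0
    return 1.0 if math.isclose(p, r, rel_tol=rel_tol, abs_tol=1e-9) else 0.0


def point_in_box_reward(pred: str, ref: str) -> float:
    """1.0 if a predicted 'x,y' point falls inside a reference 'x1,y1,x2,y2' box."""
    try:
        x, y = (float(v) for v in pred.split(","))
        x1, y1, x2, y2 = (float(v) for v in ref.split(","))
    except ValueError:  # malformed coordinates or wrong number of values
        return 0.0
    return 1.0 if x1 <= x <= x2 and y1 <= y <= y2 else 0.0


REWARD_ROUTER: dict[str, RewardFn] = {
    "multiple_choice": multiple_choice_reward,
    "numeric": numeric_reward,
    "grounding": point_in_box_reward,
}


def routed_reward(task_type: str, pred: str, ref: str) -> float:
    """Dispatch to the reward function registered for this task type."""
    try:
        reward_fn = REWARD_ROUTER[task_type]
    except KeyError:
        raise ValueError(f"no reward registered for task type {task_type!r}") from None
    return reward_fn(pred, ref)


print(routed_reward("multiple_choice", "(b)", "B"))  # 1.0
print(routed_reward("numeric", "1,000", "1000.0"))  # 1.0
print(routed_reward("grounding", "120,45", "100,30,200,60"))  # 1.0
```

Raising an error for an unregistered task type, rather than silently falling back to exact match, is one way to avoid scoring a task with a reward that does not fit its answer format.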

Topics

Vision-Language Models
Reinforcement Learning
Visual Reasoning
Multi-task Learning
Vero-600K Dataset
VeroEval Benchmark
Reward Design

Code references

zlab-princeton/vero

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Architect


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.