Vero: An Open RL Recipe for General Visual Reasoning

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Vero is a new family of fully open vision-language models (VLMs) for general visual reasoning that matches or exceeds existing open-weight models. It is trained with an open reinforcement learning (RL) recipe built on Vero-600K, a 600,000-sample dataset compiled from 59 datasets across six broad task categories, together with task-routed reward functions. Applied to four base models, the recipe improves average scores by 3.7–5.5 points on VeroEval, a suite of 30 challenging benchmarks. For instance, Vero-Qwen3T-8B surpasses Qwen3-VL-8B-Thinking on 24 of 30 benchmarks, with notable gains such as +12.1 on ScreenSpotPro. Ablations indicate that broad data coverage, uniform task category weighting, and task-specific reward design are critical for robust RL scaling and for preserving visual chat capabilities. All data, code, and models are publicly released.

Key takeaway

Machine learning engineers building VLMs for general visual reasoning should prioritize diverse, multi-task training data and task-specific reward functions. Vero's open recipe shows that broad data coverage, uniform category mixing, and routed rewards are critical for achieving high performance and avoiding reasoning blind spots. Consider adopting the Vero-600K dataset and the VeroEval suite to strengthen your model across varied visual tasks.
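One possible reading of uniform category mixing is sketched below: pick a task category with equal probability, then pick a sample within it, so large categories do not crowd out small ones. This is an illustrative sketch, not Vero's released code. The function name `sample_uniform_by_category`, the category names, and the pool sizes are all hypothetical placeholders.

```python
import random
from collections import Counter


def sample_uniform_by_category(pools, n, seed=0):
    """Draw n samples so every category is equally likely, whatever its pool size.

    pools: dict mapping a category name to a list of samples (hypothetical layout).
    """
    rng = random.Random(seed)
    categories = sorted(pools)  # fixed order keeps the draw reproducible
    batch = []
    for _ in range(n):
        category = rng.choice(categories)       # uniform over categories
        sample = rng.choice(pools[category])    # then uniform within the category
        batch.append((category, sample))
    return batch


# Hypothetical pools with very different sizes (names are placeholders,
# not the six categories used in Vero-600K).
pools = {
    "grounding": [f"g{i}" for i in range(1000)],
    "math": [f"m{i}" for i in range(50)],
    "captioning": [f"c{i}" for i in range(10)],
}

batch = sample_uniform_by_category(pools, 3000)
counts = Counter(category for category, _ in batch)
print(counts)  # each category appears roughly 1000 times
```

The trade-off with this scheme is that samples from small categories are revisited far more often than samples from large ones, which may or may not match how the released recipe handles repetition.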

Key insights

Broad data diversity and task-routed rewards are crucial for general visual reasoning in open VLMs.

Method

Vero trains with a single-stage RL recipe on Vero-600K (59 datasets, six task categories), scoring each sample with a reward function routed by its task. The training data is prepared with multi-stage filtering and mixed uniformly across task categories.
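Task-routed rewards can be pictured as a lookup from task type to scoring function. The sketch below is a minimal illustration under assumptions, not the reward code released with Vero: the task names (`multiple_choice`, `numeric`, `grounding`), the three scoring functions, and `routed_reward` are hypothetical.

```python
import re


def exact_match_reward(prediction, target):
    """1.0 if the answer strings match after trimming and lowercasing."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


def numeric_reward(prediction, target, tol=1e-6):
    """Compare the last number found in the prediction with the target value."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", prediction)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - float(target)) <= tol else 0.0


def box_iou_reward(prediction, target):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = prediction
    tx1, ty1, tx2, ty2 = target
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical routing table: each task type gets its own reward function.
REWARD_ROUTER = {
    "multiple_choice": exact_match_reward,
    "numeric": numeric_reward,
    "grounding": box_iou_reward,
}


def routed_reward(task, prediction, target):
    """Dispatch to the reward function registered for this task type."""
    if task not in REWARD_ROUTER:
        raise ValueError(f"no reward function registered for task {task!r}")
    return REWARD_ROUTER[task](prediction, target)


print(routed_reward("multiple_choice", " B ", "b"))
print(routed_reward("numeric", "The answer is 42", "42"))
print(routed_reward("grounding", (0, 0, 2, 2), (1, 0, 3, 2)))
```

Routing keeps each scoring rule simple and verifiable for its own answer format, instead of forcing one generic matcher to handle option letters, numbers, and bounding boxes alike.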

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.