On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training

2026-05-28 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new diagnostic framework reveals a consistent perception-reasoning asymmetry in frontier Vision-Language Models (VLMs) during post-training, where reasoning gains significantly more than perception, creating an end-to-end visual reasoning bottleneck. For supervised fine-tuning (SFT), this imbalance stems from perception occupying fewer tokens in chain-of-thought supervision, leading to a weaker training signal. Dynamically reweighting the loss mitigates this, boosting end-to-end performance by up to 18.2%. In reinforcement learning (RL), the asymmetry arises from reward coupling, where outcome rewards correlate more strongly with reasoning. Adding a perception-aware reward improves end-to-end accuracy by up to 6.0%, with a reliable surrogate reward still yielding gains of 3.2 points.

Key takeaway

Machine learning engineers post-training Vision-Language Models should address the perception-reasoning asymmetry directly. In supervised fine-tuning, reweighting the loss can boost end-to-end performance by up to 18.2%. In reinforcement learning, adding a perception-aware reward can improve end-to-end accuracy by up to 6.0%, and a reliable surrogate reward still yields gains of 3.2 points. Both fixes aim to keep perception and reasoning improving together.

Key insights

VLM post-training creates a perception-reasoning asymmetry due to token imbalance (SFT) or reward coupling (RL), hindering end-to-end performance.

Principles

Post-training improves VLM perception far less than it improves reasoning.
Token imbalance weakens SFT perception signals.
Reward coupling weakens RL perception signals.

Method

For SFT, dynamically reweight loss; for RL, add perception-aware or surrogate rewards to balance training signals for perception and reasoning.
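
The SFT side of the method can be illustrated with a minimal sketch. The summary does not give the paper's actual weighting rule, so everything here is an assumption: the function name reweighted_sft_loss, the per-token is_perception mask, and the perception_share parameter are all hypothetical. The idea shown is only that averaging each segment separately keeps the perception signal from shrinking as the reasoning trace grows longer.

```python
from typing import Sequence


def reweighted_sft_loss(
    token_losses: Sequence[float],
    is_perception: Sequence[bool],
    perception_share: float = 0.5,
) -> float:
    """Mix the mean perception loss and mean reasoning loss at a fixed share.

    Hypothetical illustration, not the paper's formula. `perception_share`
    is an assumed knob; a training loop could adjust it over time.
    """
    if not token_losses:
        raise ValueError("token_losses must not be empty")
    if len(token_losses) != len(is_perception):
        raise ValueError("token_losses and is_perception must match in length")

    perception = [l for l, p in zip(token_losses, is_perception) if p]
    reasoning = [l for l, p in zip(token_losses, is_perception) if not p]

    # With only one segment present there is nothing to rebalance,
    # so fall back to the ordinary token-level average.
    if not perception or not reasoning:
        return sum(token_losses) / len(token_losses)

    perception_mean = sum(perception) / len(perception)
    reasoning_mean = sum(reasoning) / len(reasoning)
    return perception_share * perception_mean + (1.0 - perception_share) * reasoning_mean


# Two perception tokens followed by eight reasoning tokens.
losses = [2.0, 2.0] + [1.0] * 8
mask = [True, True] + [False] * 8
plain_mean = sum(losses) / len(losses)          # perception holds 2 of 10 token slots
balanced = reweighted_sft_loss(losses, mask)    # perception holds half the weight
```

In this toy case the plain token average is 1.2 and the rebalanced loss is 1.5, because the two high-loss perception tokens are no longer diluted by the eight reasoning tokens.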

In practice

Implement loss reweighting in SFT for VLMs.
Design perception-aware rewards for RL-trained VLMs.
Utilize surrogate perception rewards if ground truth is unavailable.
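
A minimal sketch of the RL side follows, again under stated assumptions. The summary does not describe how the perception-aware reward is computed or combined, so the additive form, the perception_weight parameter, and the function name combined_reward are hypothetical. The sketch only shows the structure: an outcome reward plus a separate perception term, with a surrogate score used when no ground-truth perception score is available.

```python
from typing import Optional


def combined_reward(
    answer_correct: bool,
    perception_score: Optional[float],
    surrogate_score: Optional[float] = None,
    perception_weight: float = 0.5,
) -> float:
    """Add a perception term to the outcome reward.

    Hypothetical illustration, not the paper's reward. Scores are assumed
    to lie in [0, 1]; values outside that range are clipped.
    """
    outcome = 1.0 if answer_correct else 0.0

    # Prefer the ground-truth perception score; otherwise use the surrogate.
    score = perception_score if perception_score is not None else surrogate_score
    if score is None:
        # No perception signal at all: plain outcome reward.
        return outcome

    score = min(1.0, max(0.0, score))
    return outcome + perception_weight * score


with_ground_truth = combined_reward(True, perception_score=1.0)
with_surrogate = combined_reward(False, perception_score=None, surrogate_score=0.4)
outcome_only = combined_reward(True, perception_score=None)
```

Keeping the perception term separate from the outcome term means a rollout that perceives the image correctly but reasons to a wrong answer still receives some credit, which is the decoupling the summary attributes to perception-aware rewards.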

Topics

Vision-Language Models
Post-training Optimization
Supervised Fine-tuning
Reinforcement Learning
Perception-Reasoning Asymmetry
Reward Design

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.