On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training
Summary
A new diagnostic framework reveals a consistent perception-reasoning asymmetry in frontier Vision-Language Models (VLMs): post-training improves reasoning significantly more than perception, which bottlenecks end-to-end visual reasoning. In supervised fine-tuning (SFT), the imbalance stems from perception occupying fewer tokens in chain-of-thought supervision, so it receives a weaker training signal. Dynamically reweighting the loss mitigates this, boosting end-to-end performance by up to 18.2%. In reinforcement learning (RL), the asymmetry arises from reward coupling: outcome rewards correlate more strongly with reasoning than with perception. Adding a perception-aware reward improves end-to-end accuracy by up to 6.0%, and a reliable surrogate reward still yields gains of 3.2 points.
Key takeaway
Machine learning engineers optimizing Vision-Language Models should address the perception-reasoning asymmetry directly. In supervised fine-tuning, reweighting the loss can boost end-to-end performance by up to 18.2%. In reinforcement learning, adding perception-aware rewards, or even reliable surrogates, can improve accuracy by up to 6.0%. Both interventions aim to keep perception gains in step with reasoning gains.
Key insights
VLM post-training creates a perception-reasoning asymmetry due to token imbalance (SFT) or reward coupling (RL), hindering end-to-end performance.
Principles
- Post-training improves VLM perception less than it improves reasoning.
- Token imbalance weakens SFT perception signals.
- Reward coupling weakens RL perception signals.
Method
For SFT, dynamically reweight the loss; for RL, add a perception-aware or surrogate reward. In both cases the goal is to balance the training signals for perception and reasoning.
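To make the SFT idea concrete, here is a minimal sketch of one possible reweighting scheme. It is not the original article's formulation, which this summary does not specify. It assumes each target token is already labeled as perception or reasoning (the `is_perception` labels are hypothetical) and gives each segment a fixed share of the loss regardless of how many tokens it contains. The function name and the `perception_share` parameter are illustrative.

```python
def reweighted_sft_loss(token_losses, is_perception, perception_share=0.5):
    """Combine per-token losses so perception tokens get a fixed share of the signal.

    token_losses: per-token negative log-likelihoods for one chain-of-thought target.
    is_perception: parallel booleans, True where a token describes visual content
        (hypothetical labels; how to obtain them is an assumption of this sketch).
    perception_share: fraction of the total loss weight given to perception tokens.
    """
    if not token_losses:
        raise ValueError("token_losses must not be empty")
    if len(token_losses) != len(is_perception):
        raise ValueError("token_losses and is_perception must have the same length")

    perception = [l for l, p in zip(token_losses, is_perception) if p]
    reasoning = [l for l, p in zip(token_losses, is_perception) if not p]

    if not perception or not reasoning:
        # Only one segment is present: fall back to the plain token mean.
        return sum(token_losses) / len(token_losses)

    # Average within each segment first, so a short perception span is not
    # drowned out by a long reasoning span, then mix the two segment means.
    perception_mean = sum(perception) / len(perception)
    reasoning_mean = sum(reasoning) / len(reasoning)
    return perception_share * perception_mean + (1.0 - perception_share) * reasoning_mean


# Two perception tokens with loss 2.0, eight reasoning tokens with loss 1.0.
losses = [2.0, 2.0] + [1.0] * 8
mask = [True, True] + [False] * 8
plain_mean = sum(losses) / len(losses)        # perception carries 2 of 10 token weights
balanced = reweighted_sft_loss(losses, mask)  # perception carries half of the weight
```

In this toy case the plain token mean is 1.2, while the balanced loss is 1.5, because the higher perception loss is no longer diluted by the longer reasoning span. A scheme that adjusts `perception_share` during training would be closer to "dynamic" reweighting, but the schedule would be a further assumption.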
In practice
- Implement loss reweighting in SFT for VLMs.
- Design perception-aware rewards for RL-trained VLMs.
- Utilize surrogate perception rewards if ground truth is unavailable.
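The reward-design points above can be sketched as a small function. This is an illustrative assumption rather than the article's reward definition: the outcome reward is taken to be binary, the perception score is assumed to lie in [0, 1], and the additive form, the `perception_weight` parameter, and all names are hypothetical. When no ground-truth perception score is available, the sketch falls back to a surrogate score.

```python
def combined_reward(answer_correct, perception_score=None, surrogate_score=None,
                    perception_weight=0.5):
    """Outcome reward plus a weighted perception term.

    answer_correct: whether the final answer matches the reference.
    perception_score: score in [0, 1] from ground-truth perception labels, if any.
    surrogate_score: score in [0, 1] from a surrogate judge, used only when
        perception_score is None.
    perception_weight: hypothetical mixing coefficient for the perception term.
    """
    outcome = 1.0 if answer_correct else 0.0

    # Prefer the ground-truth perception score; otherwise use the surrogate.
    score = perception_score if perception_score is not None else surrogate_score
    if score is None:
        # No perception signal at all: plain outcome reward.
        return outcome
    if not 0.0 <= score <= 1.0:
        raise ValueError("perception scores are assumed to lie in [0, 1]")

    return outcome + perception_weight * score


# Correct answer with accurate perception is rewarded more than a correct
# answer reached despite poor perception.
good = combined_reward(True, perception_score=0.8)
lucky = combined_reward(True, perception_score=0.1)
fallback = combined_reward(False, surrogate_score=0.6)
```

The additive form means a rollout with a wrong answer but accurate perception still receives some reward, which gives perception a signal that does not depend entirely on the final outcome. Whether that trade-off is desirable depends on the task and is a design choice, not something this summary settles.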
Topics
- Vision-Language Models
- Post-training Optimization
- Supervised Fine-tuning
- Reinforcement Learning
- Perception-Reasoning Asymmetry
- Reward Design
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.