Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

2026-04-16 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision, Natural Language Processing · Depth: Expert, quick

Summary

Visual reasoning models (VRMs) often generate reasoning chains far longer than the task requires, a problem termed "Reasoning Path Redundancy." The AVR (Adaptive Visual Reasoning) framework addresses this by decomposing visual reasoning into three cognitive functions: visual perception, logical reasoning, and answer application. Depending on task complexity, the model dynamically selects one of three response formats: Full Format, Perception-Only Format, or Direct Answer. AVR is trained with FS-GRPO, a modified version of Group Relative Policy Optimization that optimizes for reasoning efficiency without sacrificing accuracy. Evaluations on several vision-language benchmarks show that AVR reduces token usage by 50-90% while maintaining accuracy, with the largest gains on perception-heavy tasks.

Key takeaway

AI engineers building visual reasoning models should consider adaptive reasoning frameworks such as AVR. The reported 50-90% reduction in token usage, achieved without compromising accuracy and most pronounced on perception-heavy tasks, makes deployments more efficient and cost-effective. The provided code and data are a starting point for adapting AVR's principles to your own VRM architecture.

Key insights

Adaptive visual reasoning can cut token usage in VRMs by 50-90% by letting the model select a reasoning path matched to task complexity.

Principles

Decompose visual reasoning into distinct cognitive functions.
Dynamically choose reasoning formats based on task needs.
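
The two principles can be pictured as a small data structure: each response format includes only a subset of the three cognitive functions. The sketch below is illustrative only. The stage names follow the summary, but the mapping of formats to stages, the bracketed stage labels, and the `build_response` helper are assumptions made for this example, not the AVR output format.

```python
from enum import Enum

# Stage names follow the summary; everything else here is a hypothetical layout.
STAGES = ("visual_perception", "logical_reasoning", "answer_application")


class ResponseFormat(Enum):
    # Assumed mapping: each format keeps a subset of the three stages.
    FULL = STAGES
    PERCEPTION_ONLY = ("visual_perception", "answer_application")
    DIRECT_ANSWER = ("answer_application",)


def build_response(fmt, parts):
    """Join only the stages that the chosen format includes."""
    return "\n".join(f"[{stage}] {parts[stage]}" for stage in fmt.value)


if __name__ == "__main__":
    parts = {
        "visual_perception": "Three red cubes sit on the table.",
        "logical_reasoning": "The question asks for the count of red cubes.",
        "answer_application": "3",
    }
    for fmt in ResponseFormat:
        text = build_response(fmt, parts)
        # Whitespace word count is a crude stand-in for a real tokenizer.
        print(fmt.name, len(text.split()))
```

Shorter formats drop whole stages rather than trimming each one, which is where the token savings in this toy layout come from.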

Method

AVR decomposes visual reasoning into perception, logical reasoning, and answer application, then uses FS-GRPO to train models to dynamically select among Full, Perception-Only, or Direct Answer formats for efficiency.
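
The summary does not say how FS-GRPO modifies Group Relative Policy Optimization, so the following is only a sketch of the standard GRPO ingredient it builds on: rewards for a group of sampled responses to the same prompt are normalized by the group mean and standard deviation. The `efficiency_reward` function, its `length_weight` and `max_tokens` parameters, and the sample numbers are hypothetical stand-ins for an efficiency-aware reward, not the authors' formulation.

```python
import statistics


def efficiency_reward(correct, n_tokens, max_tokens=512, length_weight=0.2):
    """Hypothetical reward: 1 for a correct answer, minus a small length penalty."""
    accuracy = 1.0 if correct else 0.0
    penalty = length_weight * min(n_tokens, max_tokens) / max_tokens
    return accuracy - penalty


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization within one group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # (correct, token count) for three sampled responses to one prompt; toy values.
    samples = [(True, 40), (True, 400), (False, 40)]
    rewards = [efficiency_reward(c, n) for c, n in samples]
    for sample, adv in zip(samples, group_relative_advantages(rewards)):
        print(sample, round(adv, 3))
```

Under this assumed reward, a short correct response receives a higher advantage than a long correct one, and both beat an incorrect response, which is the kind of pressure toward shorter formats that the method description implies.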

In practice

Implement dynamic reasoning path selection in VRMs.
Utilize FS-GRPO for training efficiency-aware models.
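
When trying either practice, it helps to track token reduction and accuracy side by side, since the claimed benefit is fewer tokens at unchanged accuracy. The helpers below are a minimal sketch for that bookkeeping. The function names and the toy numbers are made up for illustration and are not results from the paper.

```python
def token_reduction(baseline_tokens, adaptive_tokens):
    """Fraction of total tokens saved relative to the baseline."""
    base = sum(baseline_tokens)
    if base == 0:
        raise ValueError("baseline token count must be positive")
    return 1.0 - sum(adaptive_tokens) / base


def accuracy(predictions, labels):
    """Share of predictions that exactly match their labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    # Toy per-example token counts for a full-chain baseline and an adaptive model.
    baseline = [100, 200]
    adaptive = [30, 60]
    labels = ["3", "cat"]
    print(f"token reduction: {token_reduction(baseline, adaptive):.0%}")
    print(f"accuracy: {accuracy(['3', 'cat'], labels):.0%}")
```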

Topics

Visual Reasoning Models
Adaptive Visual Reasoning
Reasoning Path Redundancy
FS-GRPO
Vision-Language Benchmarks

Code references

RunRiotComeOn/AVR

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.