StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video

2026-05-19 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Researchers have introduced StreamPro-Bench, a benchmark that evaluates streaming video understanding models along three dimensions: Perception Understanding, Temporal Reasoning, and Proactive Agency. It addresses the limitations of existing "see-then-answer" paradigms by testing whether a model can make early yet reliable decisions from partial observations. Alongside the benchmark, the authors propose StreamPro, a two-stage training framework. The first stage uses CB-Stream Loss during supervised fine-tuning (SFT) to mitigate the severe imbalance between silence and response supervision. The second stage applies Group Relative Policy Optimization (GRPO) with a multi-grained reward design that combines turn-level and trajectory-level rewards. In experiments, StreamPro scores 41.5 on StreamPro-Bench, well above the previous best of 10.4, while remaining strong on real-time streaming benchmarks, with 78.9 on StreamingBench-RTVU.

Key takeaway

For research scientists developing streaming video understanding models, this work highlights the need to move beyond reactive "see-then-answer" paradigms toward genuine proactive agency. Consider adopting StreamPro-Bench for comprehensive evaluation and exploring the two-stage StreamPro training framework, particularly its CB-Stream Loss and multi-grained GRPO rewards, to improve your models' ability to make timely, reliable decisions under incomplete observations.

Key insights

Proactive video understanding requires models to decide *when* to respond, balancing early prediction with sufficient evidence.
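As a toy illustration of this trade-off, and not StreamPro's actual mechanism, the sketch below treats proactive responding as a stopping decision over a stream: the policy stays silent until a per-frame confidence score crosses a threshold. The function name `first_response_step`, the confidence values, and the threshold rule are all hypothetical.

```python
from typing import List, Optional


def first_response_step(confidences: List[float], threshold: float) -> Optional[int]:
    """Return the first step whose confidence reaches `threshold`.

    This is the step at which the toy policy stops emitting silence and
    responds. Returns None if the policy stays silent for the whole stream.
    """
    for step, conf in enumerate(confidences):
        if conf >= threshold:
            return step
    return None


# Hypothetical per-frame confidence that the answer is already determined.
stream = [0.10, 0.35, 0.55, 0.72, 0.91]
print(first_response_step(stream, 0.5))   # 2: early response on weaker evidence
print(first_response_step(stream, 0.9))   # 4: later response on stronger evidence
print(first_response_step(stream, 0.95))  # None: never responds
```

A low threshold answers early but risks being wrong; a high one waits for more evidence and may miss the moment entirely. A proactive model has to learn where to sit on this curve rather than have it fixed by hand.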

Principles

Proactive agency demands timely, reliable decisions under partial observations.
Training proactive models requires addressing severe silence-response imbalance.
Multi-grained rewards improve holistic proactive behavior optimization.
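The silence-response imbalance can be pictured with class-balanced re-weighting of control tokens. The sketch below assumes that "CB" stands for class-balanced and borrows the effective-number weighting of Cui et al. (2019). The source does not give CB-Stream Loss's exact formula, so treat this as one plausible instance rather than the paper's loss. The token names `<silence>` and `<response>`, the counts, `beta`, and both function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def class_balanced_weights(counts: Dict[str, int], beta: float = 0.999) -> Dict[str, float]:
    """Effective-number class weights: w_c is proportional to (1 - beta) / (1 - beta**n_c).

    Rare tokens (small n_c) get larger weights. `beta` must lie in [0, 1).
    Weights are rescaled so they sum to the number of token classes.
    """
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every token class needs a positive count")
    raw = {tok: (1.0 - beta) / (1.0 - beta ** n) for tok, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {tok: w * scale for tok, w in raw.items()}


def cb_weighted_nll(target_probs: Sequence[float], targets: Sequence[str],
                    weights: Dict[str, float]) -> float:
    """Weighted mean of -log p(target) over control-token positions."""
    numerator = sum(weights[t] * -math.log(p) for p, t in zip(target_probs, targets))
    denominator = sum(weights[t] for t in targets)
    return numerator / denominator


# Hypothetical label counts: silence dominates the control-token positions.
weights = class_balanced_weights({"<silence>": 9500, "<response>": 500})
# Probabilities the model assigns to the correct token at three positions.
loss = cb_weighted_nll([0.95, 0.90, 0.30], ["<silence>", "<silence>", "<response>"], weights)
print(weights, loss)
```

With these counts the `<response>` weight comes out at roughly 2.5 times the `<silence>` weight, much gentler than the 19x that plain inverse-frequency weighting would give. `beta` moves the weighting between no re-weighting (beta = 0) and inverse frequency (beta approaching 1).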

Method

StreamPro uses a two-stage framework: supervised fine-tuning (SFT) with CB-Stream Loss to counter the silence-response imbalance, followed by GRPO with multi-grained rewards (a turn-level F1 reward and a trajectory-level rubric reward) to optimize proactive behavior.
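The second stage's reward plumbing can be sketched as follows, assuming a simple linear blend of the two reward granularities. Each rollout's turn-level F1 and trajectory-level rubric score (both assumed to lie in [0, 1]) are mixed with a hypothetical weight `alpha`. The GRPO-style group-relative advantage then standardises each rollout's reward against the other rollouts sampled for the same input. The blending rule, `alpha`, and the function names are assumptions; the source does not state how the two rewards are combined.

```python
import statistics
from typing import List, Sequence


def combined_reward(turn_f1: float, trajectory_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of a turn-level F1 and a trajectory-level rubric score.

    Both inputs are assumed to lie in [0, 1]; `alpha` weights the turn-level term.
    """
    return alpha * turn_f1 + (1.0 - alpha) * trajectory_score


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardise each rollout's reward within its group.

    Uses the population standard deviation; `eps` avoids division by zero
    when every rollout in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts for the same streaming video and query:
# (turn-level F1, trajectory-level rubric score) per rollout.
scores = [(0.8, 0.9), (0.4, 0.5), (0.6, 0.6), (0.2, 0.3)]
rewards = [combined_reward(f1, rubric) for f1, rubric in scores]
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for the best rollout, negative for the worst
```

Because advantages are relative to the group, a rollout is pushed up only if it beats its siblings on the blended score, so the trajectory-level term can reward good overall pacing even when per-turn F1 is similar across rollouts.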

In practice

Use CB-Stream Loss to re-weight imbalanced streaming control tokens.
Employ multi-grained rewards for RL, combining per-response and holistic trajectory evaluation.
A larger temporal tolerance in RL rewards provides denser optimization signals.
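The turn-level F1 reward and the effect of temporal tolerance can be sketched as below, assuming predicted response times are matched greedily and one-to-one to reference times within a tolerance window measured in seconds. The matching rule, the function name, the reward of 1.0 for correctly staying silent, and the timestamps are hypothetical; the source only states that a larger tolerance gives denser signals.

```python
from typing import List, Sequence


def turn_level_f1(pred_times: Sequence[float], gt_times: Sequence[float],
                  tolerance: float) -> float:
    """F1 between predicted and reference response times (seconds).

    A prediction counts as a hit if it lies within `tolerance` seconds of a
    reference time that has not been matched yet; each reference can be
    matched at most once (greedy nearest-neighbour matching).
    """
    if not pred_times and not gt_times:
        return 1.0  # assumption: staying silent when nothing is expected is perfect
    if not pred_times or not gt_times:
        return 0.0
    unmatched: List[float] = sorted(gt_times)
    hits = 0
    for p in sorted(pred_times):
        candidates = [g for g in unmatched if abs(p - g) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda g: abs(p - g)))
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(pred_times)
    recall = hits / len(gt_times)
    return 2 * precision * recall / (precision + recall)


gt = [10.0, 25.0, 40.0]    # hypothetical reference response times
pred = [11.5, 27.5, 60.0]  # hypothetical model response times
print(turn_level_f1(pred, gt, tolerance=1.0))  # 0.0: no prediction is close enough
print(turn_level_f1(pred, gt, tolerance=3.0))  # about 0.667: two of three matched
```

With a 1-second window none of the three predictions lands close enough, so the reward is 0 and the policy gets no signal to distinguish near-misses from wild guesses. Widening the window to 3 seconds credits the two near-misses, which is the densifying effect described above, at the cost of rewarding less precise timing.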

Topics

Proactive Video Understanding
StreamPro-Bench
Proactive Agency
CB-Stream Loss
Reinforcement Learning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.