StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video
Summary
Researchers have introduced StreamPro-Bench, a benchmark that evaluates streaming video understanding models along three dimensions: Perception Understanding, Temporal Reasoning, and Proactive Agency. It addresses the limits of the "see-then-answer" paradigm by testing whether a model can make early yet reliable decisions from partial observations. Alongside the benchmark, the authors propose StreamPro, a two-stage training framework. The first stage applies CB-Stream Loss during supervised fine-tuning (SFT) to mitigate severe supervision imbalance; the second stage uses Group Relative Policy Optimization (GRPO) with a multi-grained reward design that combines turn-level and trajectory-level rewards. In the reported experiments, StreamPro scores 41.5 on StreamPro-Bench, against a previous best of 10.4, while maintaining 78.9 on the real-time streaming benchmark StreamingBench-RTVU.
Key takeaway
For research scientists developing streaming video understanding models, this work highlights the need to move beyond the reactive "see-then-answer" paradigm toward proactive agency. Consider adopting StreamPro-Bench for evaluation and exploring the two-stage StreamPro training framework, particularly its CB-Stream Loss and multi-grained GRPO rewards, to improve a model's ability to make timely and reliable decisions under incomplete observations.
Key insights
Proactive video understanding requires models to decide *when* to respond, balancing early prediction with sufficient evidence.
Principles
- Proactive agency demands timely, reliable decisions under partial observations.
- Training proactive models requires addressing severe silence-response imbalance.
- Multi-grained rewards improve holistic proactive behavior optimization.
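To make the silence-response imbalance in the second principle concrete: in a stream, most time steps call for silence and only a few call for a response, so an unweighted loss rewards a model that never speaks. The summary does not give the formula for CB-Stream Loss, so the sketch below is not that loss. It shows a generic inverse-frequency re-weighting of a per-step negative log-likelihood, with hypothetical label names (`"silence"`, `"respond"`) and hypothetical function names.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each class a weight inversely proportional to its frequency.

    Weights are scaled so that they sum to len(labels) over the dataset.
    This is a generic balancing heuristic, NOT the paper's CB-Stream Loss.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_nll(probs, labels, weights):
    """Weighted mean negative log-likelihood over time steps.

    probs:   list of dicts mapping class name -> predicted probability
    labels:  list of ground-truth class names, one per time step
    weights: dict mapping class name -> loss weight
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical stream: 9 silent steps, 1 step where a response is due.
labels = ["silence"] * 9 + ["respond"]
# A degenerate model that almost always predicts silence.
probs = [{"silence": 0.99, "respond": 0.01}] * len(labels)

uniform = {"silence": 1.0, "respond": 1.0}
balanced = inverse_frequency_weights(labels)

print("unweighted loss:", weighted_nll(probs, labels, uniform))
print("balanced loss:  ", weighted_nll(probs, labels, balanced))
```

Under the balanced weights, the always-silent model is penalized far more heavily for the single missed response, which is the effect any re-weighting scheme for this setting is after.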
Method
StreamPro trains in two stages: SFT with CB-Stream Loss to counter supervision imbalance, then GRPO with multi-grained rewards (a turn-level F1 reward and a trajectory-level rubric reward) to optimize proactive behavior.
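The second stage can be illustrated with the core of GRPO: several rollouts are sampled for the same input, each receives a scalar reward, and each rollout's advantage is its reward standardized against the group. The sketch below assumes a simple weighted sum of the turn-level and trajectory-level rewards; the mixing weight `alpha`, the function names, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(turn_f1, trajectory_score, alpha=0.5):
    """Mix a turn-level and a trajectory-level reward.

    alpha is a hypothetical mixing weight; the summary does not say how
    the two reward granularities are combined.
    """
    return alpha * turn_f1 + (1 - alpha) * trajectory_score


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for the same streaming video and query,
# each scored as (turn-level F1, trajectory-level rubric score in [0, 1]).
rollouts = [(0.8, 0.9), (0.0, 0.2), (0.5, 0.5), (0.4, 0.7)]
rewards = [combined_reward(f1, traj) for f1, traj in rollouts]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f}  advantage={a:+.3f}")
```

Rollouts that beat the group average get a positive advantage and are reinforced; the rest are discouraged. If every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient signal.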
In practice
- Use CB-Stream Loss to re-weight imbalanced streaming control tokens.
- Employ multi-grained rewards for RL, combining per-response and holistic trajectory evaluation.
- A larger temporal tolerance in RL rewards provides denser optimization signals.
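The last point about temporal tolerance can be sketched as follows. One plausible reading of a turn-level F1 reward is that each predicted response time is matched to at most one ground-truth response time within a tolerance window, and F1 is computed over the matches. The matching rule and all names below are assumptions for illustration; the summary does not specify how the paper computes its reward.

```python
def turn_level_f1(pred_times, gt_times, tolerance):
    """F1 over response timings with one-to-one matching inside a window.

    pred_times: timestamps (seconds) at which the model chose to respond
    gt_times:   timestamps at which a response was expected
    tolerance:  maximum |pred - gt| for a prediction to count as a hit
    Returns 0.0 when nothing matches (including when either list is empty).
    """
    unmatched = sorted(gt_times)
    true_positives = 0
    for t in sorted(pred_times):
        # Candidate ground-truth times still unmatched and within the window.
        candidates = [g for g in unmatched if abs(g - t) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda g: abs(g - t))
            unmatched.remove(nearest)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_times)
    recall = true_positives / len(gt_times)
    return 2 * precision * recall / (precision + recall)


# Hypothetical rollout: two responses are slightly off, one is spurious.
pred = [4.0, 11.5, 30.0]
gt = [5.0, 10.0]

for tol in (0.5, 2.0):
    print(f"tolerance={tol}s  F1={turn_level_f1(pred, gt, tol):.2f}")
```

With a 0.5 s window none of the predictions count and the reward is zero, giving the optimizer nothing to distinguish this rollout from one that never responded. With a 2.0 s window two predictions match and the reward is about 0.8, which is the sense in which a larger tolerance yields a denser signal.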
Topics
- Proactive Video Understanding
- StreamPro-Bench
- Proactive Agency
- CB-Stream Loss
- Reinforcement Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.