RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
Summary
The Real-time Autoregressive Video Extrapolation Network (RAVEN) is a training-time test framework for real-time streaming video generation with causal autoregressive video diffusion models. These models extrapolate future video chunks from previously generated content. However, the history they condition on during training differs in distribution from the self-generated history they see at inference, and this mismatch often degrades long-horizon generation quality. RAVEN addresses this by repacking self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states. This aligns training-time attention with inference-time extrapolation and makes the history representations directly supervisable. The paper also introduces Consistency-model Group Relative Policy Optimization (CM-GRPO), which applies online reinforcement learning directly to a consistency sampling step by treating it as a conditional Gaussian transition. In experiments, RAVEN outperforms existing causal video distillation baselines on quality, semantic, and dynamic-degree evaluations, and CM-GRPO yields further improvements.
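The summary does not spell out the exact sequence layout or attention pattern, so what follows is a minimal sketch under stated assumptions, not RAVEN's implementation. The assumptions are:
- Each chunk of a self-rollout contributes one intermediate noisy state plus its clean endpoint, and the two are interleaved.
- A block mask lets every segment attend only to the clean endpoints of earlier chunks and to itself. That is the history a causal model actually has at inference.

`repack_rollout`, `interleaved_mask`, and the one-noisy-state-per-chunk choice are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating interleaved repacking; not RAVEN's released code.

def repack_rollout(clean_chunks, noisy_chunks):
    """Interleave [noisy_0, clean_0, noisy_1, clean_1, ...] from one self-rollout.

    clean_chunks[i]: clean endpoint the model itself produced for chunk i, shape (T, D).
    noisy_chunks[i]: an intermediate noisy denoising state of chunk i, shape (T, D).
    """
    if len(clean_chunks) != len(noisy_chunks):
        raise ValueError("need one noisy state per clean chunk")
    segments, labels = [], []
    for i, (clean, noisy) in enumerate(zip(clean_chunks, noisy_chunks)):
        segments += [noisy, clean]
        labels += [("noisy", i), ("clean", i)]
    return np.concatenate(segments, axis=0), labels


def interleaved_mask(labels, tokens_per_chunk):
    """Boolean attention mask (True = may attend) over the packed sequence.

    Every segment sees the clean endpoints of earlier chunks, mirroring the
    history available at inference, plus its own segment. A noisy chunk never
    sees its own clean endpoint, so the denoising target does not leak.
    """
    n = len(labels)
    seg = np.zeros((n, n), dtype=bool)
    for q, (q_kind, q_idx) in enumerate(labels):
        for k, (k_kind, k_idx) in enumerate(labels):
            earlier_clean = k_kind == "clean" and k_idx < q_idx
            same_segment = k_kind == q_kind and k_idx == q_idx
            seg[q, k] = earlier_clean or same_segment
    # Expand the segment-level mask to token level (full attention inside a segment).
    return seg.repeat(tokens_per_chunk, axis=0).repeat(tokens_per_chunk, axis=1)


# Usage: 3 chunks of 2 tokens with 4-dim features.
clean = [np.full((2, 4), float(i)) for i in range(3)]
noisy = [np.full((2, 4), 10.0 + i) for i in range(3)]
seq, labels = repack_rollout(clean, noisy)
mask = interleaved_mask(labels, tokens_per_chunk=2)
print(seq.shape, mask.shape)  # (12, 4) (12, 12)
```

The clean endpoints sit in the sequence as ordinary tokens rather than in a detached cache. A loss on the noisy positions can therefore backpropagate into the history representations. This is one plausible reading of "enabling supervision of history representations", not a confirmed detail of the paper.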
Key takeaway
For research scientists building real-time video generation systems, RAVEN offers a way to reduce the distribution shift between training-time and inference-time history. The authors report that this yields higher-quality, more consistent long-horizon video extrapolation. Consider adopting RAVEN's training-time test framework, together with CM-GRPO's online reinforcement learning on consistency sampling steps, to improve your autoregressive video diffusion models.
Key insights
RAVEN and CM-GRPO enhance real-time video extrapolation by aligning training with inference and applying RL to consistency models.
Principles
- Align training history with inference history.
- Supervise history representations for future predictions.
Method
RAVEN repacks self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states. CM-GRPO applies online RL directly to a consistency sampling step, treating it as a conditional Gaussian transition.
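The summary says only that CM-GRPO treats a consistency sampling step as a conditional Gaussian transition. The following reading is our assumption, not the paper's stated formula. A multistep consistency sampler maps x_t to a clean prediction f_θ(x_t, t) and re-noises it to x_s = α_s f_θ(x_t, t) + σ_s ε. The step is then a Gaussian policy π_θ(x_s | x_t) = N(α_s f_θ(x_t, t), σ_s² I) with a tractable log-likelihood.

The sketch pairs that log-probability with the standard GRPO recipe: group-normalized rewards and a PPO-style clipped ratio. Several pieces are hypothetical:
- the function names
- the α_s and σ_s values
- the rewards
- the omission of a KL penalty

NumPy stands in for an autodiff framework, so no gradients are computed.

```python
import numpy as np

# Hypothetical sketch of a GRPO update on one consistency sampling step.

def gaussian_step_logprob(x_next, x0_pred, alpha, sigma):
    """log N(x_next; alpha * x0_pred, sigma^2 I) per sample (first axis = group)."""
    g = x_next.shape[0]
    d = x_next[0].size
    sq = ((x_next - alpha * x0_pred) ** 2).reshape(g, -1).sum(axis=1)
    return -0.5 * sq / sigma**2 - 0.5 * d * np.log(2.0 * np.pi * sigma**2)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO advantage: normalize rewards within the group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def grpo_clipped_loss(logp_new, logp_old, advantages, clip=0.2):
    """PPO-style clipped surrogate, averaged over the group (KL term omitted)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


# Usage with stand-in arrays; a real model would supply x0 = f_theta(x_t, t).
rng = np.random.default_rng(0)
G, alpha, sigma = 4, 0.8, 0.6
x0_old = rng.normal(size=(G, 2, 3))                               # old-policy prediction
x_next = alpha * x0_old + sigma * rng.normal(size=x0_old.shape)  # sampled transition
logp_old = gaussian_step_logprob(x_next, x0_old, alpha, sigma)
adv = group_relative_advantages([0.1, 0.5, 0.3, 0.9])             # hypothetical rewards
x0_new = x0_old + 0.01                                            # stand-in for updated model
logp_new = gaussian_step_logprob(x_next, x0_new, alpha, sigma)
print(grpo_clipped_loss(logp_new, logp_old, adv))
```

The sampled x_next is held fixed from the old-policy rollout, so the ratio depends on θ only through the Gaussian mean. With a fixed σ_s, the normalizing constants cancel and the log-ratio reduces to a difference of scaled squared errors.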
In practice
- Use RAVEN for improved causal video distillation.
- Integrate CM-GRPO for additional quality gains.
Topics
- RAVEN
- Video Extrapolation
- Autoregressive Video Diffusion
- Consistency Models
- Reinforcement Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.