RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

2026-05-14 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

The Real-time Autoregressive Video Extrapolation Network (RAVEN) is a new training-time test framework for real-time streaming video generation with causal autoregressive video diffusion models. These models extrapolate future video chunks from past content, but the history they see during training differs in distribution from the history they generate at inference, and this mismatch often degrades long-horizon quality. RAVEN addresses it by repacking self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states, which aligns training attention with inference-time extrapolation and lets history representations be supervised. The paper also introduces Consistency-model Group Relative Policy Optimization (CM-GRPO), which applies online reinforcement learning directly to a consistency sampling step by treating that step as a conditional Gaussian transition. Experiments show RAVEN outperforms existing causal video distillation baselines in quality, semantic, and dynamic-degree evaluations, with CM-GRPO providing further improvements.

Key takeaway

For research scientists building real-time video generation systems, RAVEN offers a way to reduce the distribution shift between training and inference histories, which the paper reports improves long-horizon extrapolation over existing causal video distillation baselines. Its training-time test framework and CM-GRPO's online reinforcement learning step are worth evaluating if you train causal autoregressive video diffusion models.

Key insights

RAVEN and CM-GRPO enhance real-time video extrapolation by aligning training with inference and applying RL to consistency models.

Principles

Align training history with inference history.
Supervise history representations for future predictions.

Method

RAVEN repacks self-rollouts into interleaved clean historical endpoints and noisy denoising states. CM-GRPO applies online RL to a consistency sampling step as a conditional Gaussian transition.
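
The repacking idea can be illustrated with a toy attention mask. This is a minimal sketch, not the paper's implementation: the slot order (noisy state before clean endpoint within each chunk), the exact visibility rules, and the names pack_rollout, can_attend, and build_mask are all assumptions made for illustration. The assumed rule is that a noisy denoising state sees only itself and the clean endpoints of strictly earlier chunks, which mirrors what a causal model would see when extrapolating at inference.

```python
from typing import List, Tuple

# A slot is (kind, chunk_index); kind is "noisy" (a denoising state)
# or "clean" (the finished endpoint of that chunk). Hypothetical layout.
Slot = Tuple[str, int]


def pack_rollout(num_chunks: int) -> List[Slot]:
    """Interleave one noisy state and one clean endpoint per chunk."""
    slots: List[Slot] = []
    for i in range(num_chunks):
        slots.append(("noisy", i))
        slots.append(("clean", i))
    return slots


def can_attend(query: Slot, key: Slot) -> bool:
    """Assumed visibility rule for the packed sequence."""
    q_kind, q_idx = query
    k_kind, k_idx = key
    if k_kind == "clean":
        # Clean history from strictly earlier chunks is visible to every
        # later slot; a clean slot additionally sees itself.
        return k_idx < q_idx or (q_kind == "clean" and k_idx == q_idx)
    # A noisy key is visible only to the same noisy slot, so noisy
    # states never leak into the history of other slots.
    return q_kind == "noisy" and k_idx == q_idx


def build_mask(slots: List[Slot]) -> List[List[bool]]:
    """Boolean mask: mask[q][k] is True when slot q may attend to slot k."""
    return [[can_attend(q, k) for k in slots] for q in slots]


if __name__ == "__main__":
    slots = pack_rollout(3)
    for slot, row in zip(slots, build_mask(slots)):
        print(slot, "".join("x" if allowed else "." for allowed in row))
```

Under these assumptions every noisy slot is conditioned on clean history only, so the attention pattern in one packed training pass matches chunk-by-chunk generation.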

In practice

Use RAVEN for improved causal video distillation.
Integrate CM-GRPO for additional quality gains.
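
As a rough guide to what integrating CM-GRPO might involve, the sketch below shows the two generic ingredients the summary implies: a Gaussian log-probability for one sampling step, and a group-relative, clipped policy objective. It assumes the consistency step re-noises the model's prediction with isotropic Gaussian noise of scale sigma, and it uses the standard GRPO-style normalization and clipping; the paper's actual schedule, objective, and hyperparameters are not given here, and all function names are hypothetical.

```python
import math
from typing import List


def gaussian_log_prob(x_next: List[float], mean: List[float], sigma: float) -> float:
    """Log density of x_next under N(mean, sigma^2 I).

    Assumption: one consistency sampling step draws x_next around a mean
    computed from the model's prediction, with fixed noise scale sigma.
    """
    d = len(x_next)
    sq_dist = sum((a - m) ** 2 for a, m in zip(x_next, mean))
    return -0.5 * sq_dist / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(
    logp_new: List[float],
    logp_old: List[float],
    advantages: List[float],
    clip: float = 0.2,
) -> float:
    """PPO/GRPO-style clipped objective averaged over the group (to maximize)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # likelihood ratio of new vs. old policy
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    rewards = [0.2, 0.5, 0.9, 0.4]  # illustrative reward-model scores
    adv = group_relative_advantages(rewards)
    logp_old = [gaussian_log_prob([0.1, -0.3], [0.0, 0.0], 0.5) for _ in rewards]
    logp_new = [lp + 0.05 for lp in logp_old]  # pretend the policy moved slightly
    print(clipped_surrogate(logp_new, logp_old, adv))
```

Treating the sampling step as a Gaussian transition is what makes the likelihood ratio computable in closed form, which is why an online policy-gradient method can be applied to it at all.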

Topics

RAVEN
Video Extrapolation
Autoregressive Video Diffusion
Consistency Models
Reinforcement Learning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.