TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment
Summary
TetherCache is a training-free, plug-and-play cache management strategy designed to stabilize autoregressive long-form video generation. It targets the failure modes of minute-level generation, such as visual artifacts, quality degradation, and temporal drift, which arise from limited KV-cache budgets and accumulated context distribution shift. TetherCache employs two mechanisms: GRAB (Gated Recall with Attention-Diversity Balancing), which selects diverse, informative long-range memory frames, and TAME (Trusted Alignment via Memory Editing), which aligns newly recalled memory tokens to a trusted context distribution to reduce feature pollution. Applied to Self-Forcing, TetherCache consistently improves long-video generation quality on VBench-Long in the 30s, 60s, and 240s settings, and for 240s generation it reduces quality drift from 7.84 to 1.33.
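As a quick reading of the headline number, the two 240s drift values imply the following relative reduction. This is arithmetic on the reported figures only, not a separately reported result:

```latex
\frac{7.84 - 1.33}{7.84} = \frac{6.51}{7.84} \approx 0.830,
\qquad
\frac{7.84}{1.33} \approx 5.89
```

That is, roughly an 83% relative reduction in quality drift, or about 5.9 times less drift than the Self-Forcing baseline at 240s.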
Key takeaway
For machine learning engineers extending autoregressive video diffusion models to minute-level durations, TetherCache offers a training-free fix. Its GRAB and TAME mechanisms target accumulated context distribution shift, the source of the visual artifacts and temporal drift that appear in long rollouts, and substantially reduce both. Consider adopting these cache management principles to keep long-horizon generation stable, especially for outputs of 240s or longer.
Key insights
TetherCache stabilizes long-form video generation by controlling which historical frames stay in the KV cache and by aligning recalled context with a trusted distribution.
Principles
- Gated recall can preserve diverse historical context.
- Memory editing reduces pollution from drifted features.
Method
TetherCache organizes the KV cache into three regions: sink, memory, and recent. GRAB selects which long-range frames occupy the memory region, using a gated score that combines attention relevance and temporal diversity. TAME then edits the recalled memory tokens, aligning their statistics to a trusted context distribution.
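The three-region layout and a GRAB-style selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: each frame is summarized by a single key vector, relevance is a softmax over scaled dot products with the current query (rescaled so the top frame scores 1), diversity is the normalized temporal distance to frames already kept, and the two are mixed by a fixed scalar `gate`. `FrameEntry`, `TetherLayout`, `gated_recall`, `gate`, and `horizon` are hypothetical names; the actual gating function may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class FrameEntry:
    """One cached frame: its position in the rollout and a summary key vector."""
    index: int
    key: List[float]  # hypothetical per-frame summary, e.g. the mean of its key tokens


@dataclass
class TetherLayout:
    """Hypothetical three-region cache: sink, recalled memory, and recent window."""
    sink: List[FrameEntry] = field(default_factory=list)
    memory: List[FrameEntry] = field(default_factory=list)
    recent: List[FrameEntry] = field(default_factory=list)


def relevance_scores(query: Sequence[float], candidates: Sequence[FrameEntry]) -> List[float]:
    """Softmax over scaled dot products between the current query and each frame key."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, c.key)) / scale for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_recall(
    query: Sequence[float],
    candidates: Sequence[FrameEntry],
    recent: Sequence[FrameEntry],
    budget: int,
    gate: float = 0.5,
    horizon: Optional[int] = None,
) -> List[FrameEntry]:
    """Greedily pick `budget` frames by gate * relevance + (1 - gate) * diversity.

    Diversity is the temporal distance from a candidate to the nearest frame
    already kept (recent window or earlier picks), normalized by `horizon`.
    """
    if not candidates or budget <= 0:
        return []
    rel = relevance_scores(query, candidates)
    top = max(rel)
    rel = [r / top for r in rel]  # rescale so relevance and diversity share a [0, 1] range
    kept = [f.index for f in recent]
    if horizon is None:
        spread = [c.index for c in candidates] + kept
        horizon = max(1, max(spread) - min(spread))
    chosen: List[FrameEntry] = []
    remaining = list(range(len(candidates)))
    while remaining and len(chosen) < budget:

        def score(i: int) -> float:
            idx = candidates[i].index
            if kept:
                div = min(1.0, min(abs(idx - k) for k in kept) / horizon)
            else:
                div = 1.0  # nothing kept yet, so every frame is maximally novel
            return gate * rel[i] + (1 - gate) * div

        best = max(remaining, key=score)
        chosen.append(candidates[best])
        kept.append(candidates[best].index)  # later picks are pushed away from this one
        remaining.remove(best)
    return sorted(chosen, key=lambda f: f.index)


# Usage: 12 toy frames; frame 0 is the sink, frames 9-11 are the recent window.
frames = [FrameEntry(i, [math.cos(i), math.sin(i)]) for i in range(12)]
layout = TetherLayout(sink=frames[:1], recent=frames[-3:])
pool = frames[1:-3]  # middle frames eligible for long-range recall
layout.memory = gated_recall([1.0, 0.0], pool, layout.recent, budget=3)
print([f.index for f in layout.memory])
```

Re-scoring diversity after every pick is the design choice that matters here: a cluster of highly relevant but temporally adjacent frames cannot fill the whole memory budget, because each selected frame lowers the diversity score of its neighbors. Setting `gate=1.0` reduces the sketch to pure attention-based top-k recall.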
In practice
- Combine attention-based relevance with temporal diversity for cache selection.
- Align statistics of recalled tokens to a trusted distribution.
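The second practice can be illustrated with per-channel mean and variance matching, an AdaIN-style re-normalization that moves recalled tokens onto the statistics of a trusted token set. This is a hedged sketch: the source does not say which statistics TAME aligns or which tokens count as trusted, so treating sink and recent-window tokens as trusted, and the helper names `channel_stats`, `align_to_trusted`, and the `strength` blend, are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Tokens = Sequence[Sequence[float]]  # shape: (num_tokens, channels)


def channel_stats(tokens: Tokens, eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Per-channel mean and standard deviation over a set of tokens."""
    n, dims = len(tokens), len(tokens[0])
    means = [sum(t[c] for t in tokens) / n for c in range(dims)]
    stds = [
        math.sqrt(sum((t[c] - means[c]) ** 2 for t in tokens) / n + eps)
        for c in range(dims)
    ]
    return means, stds


def align_to_trusted(
    recalled: Tokens, trusted: Tokens, strength: float = 1.0, eps: float = 1e-6
) -> List[List[float]]:
    """Shift and rescale recalled tokens so their per-channel stats match the trusted set.

    strength=1.0 fully adopts the trusted statistics; strength=0.0 returns the input.
    """
    r_mean, r_std = channel_stats(recalled, eps)
    t_mean, t_std = channel_stats(trusted, eps)
    edited = []
    for tok in recalled:
        # Standardize with the recalled stats, then re-color with the trusted stats.
        aligned = [
            (x - r_mean[c]) / r_std[c] * t_std[c] + t_mean[c] for c, x in enumerate(tok)
        ]
        edited.append([(1 - strength) * x + strength * a for x, a in zip(tok, aligned)])
    return edited


# Usage: hypothetical trusted tokens (e.g. sink + recent) vs. drifted memory tokens.
trusted = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
recalled = [[10.0, -4.0], [14.0, -2.0], [18.0, 0.0]]
edited = align_to_trusted(recalled, trusted)
print(channel_stats(edited))
```

Because only first- and second-order statistics are edited, the relative structure among the recalled tokens (their ordering and spacing within each channel) is preserved while their overall offset and scale are pulled back toward the trusted context.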
Topics
- Autoregressive Video Generation
- Video Diffusion Models
- Cache Management
- Temporal Drift
- Gated Recall
- Memory Editing
- VBench-Long
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.