TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion

2026-05-15 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

TeDiO (Temporal Diagonal Optimization) is a training-free, inference-time method for improving temporal coherence in text-to-video diffusion transformers. Recent models such as Wan2.1 and CogVideoX generate visually compelling frames, yet their videos often flicker, drift, or show unstable motion. TeDiO builds on the observation that incoherent videos exhibit irregular, fragmented temporal diagonals in their intermediate self-attention maps, whereas stable motion corresponds to smooth, band-diagonal patterns. The method regularizes these internal attention patterns in three steps: it estimates diagonal smoothness, identifies unstable regions, and performs lightweight latent updates. This promotes coherent frame-to-frame dynamics without modifying model weights or requiring external motion supervision, yielding smoother motion while preserving per-frame visual quality.

Key takeaway

For research scientists developing or deploying text-to-video diffusion models, TeDiO offers a plug-and-play way to improve temporal coherence and reduce artifacts such as flickering. Because it is training-free, you can integrate it at inference time to obtain smoother motion without retraining the model or collecting additional data. Note that TeDiO acts on latents during generation, guided by the model's own attention maps, so treat it as part of the inference loop rather than as a post-processing pass over finished frames.

Key insights

Temporal coherence in video diffusion models correlates with smooth, band-diagonal self-attention patterns.

Principles

Incoherent video manifests as fragmented temporal diagonals.
Regularizing internal attention patterns improves video stability.
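
To make the contrast between smooth and fragmented diagonals concrete, here is a minimal sketch on synthetic frame-to-frame attention maps. Everything in it is an illustrative assumption rather than the paper's definition: the helper names (band_diagonal_map, fragmented_map, diagonal_variation), the 8-frame toy maps, and the choice of mean absolute jump along a diagonal as the roughness score.

```python
def band_diagonal_map(n, width=1):
    """Row-normalized n x n map with mass spread evenly within `width` of the diagonal."""
    rows = []
    for i in range(n):
        row = [1.0 if abs(i - j) <= width else 0.0 for j in range(n)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows


def fragmented_map(n, width=1):
    """Same band map, but every odd row sends all of its attention to a distant frame."""
    rows = band_diagonal_map(n, width)
    for i in range(1, n, 2):
        rows[i] = [0.0] * n
        rows[i][(i + n // 2) % n] = 1.0
    return rows


def diagonal_variation(attn, k=0):
    """Mean absolute jump between consecutive entries of the k-th diagonal (assumed score)."""
    n = len(attn)
    diag = [attn[i][i + k] for i in range(max(0, -k), min(n, n - k))]
    jumps = [abs(b - a) for a, b in zip(diag, diag[1:])]
    return sum(jumps) / len(jumps)


smooth_score = diagonal_variation(band_diagonal_map(8))
fragmented_score = diagonal_variation(fragmented_map(8))
print(f"smooth: {smooth_score:.3f}  fragmented: {fragmented_score:.3f}")
```

The fragmented map scores higher because its main diagonal alternates between attended and unattended entries, which is the kind of irregularity the principle above associates with incoherent video.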

Method

TeDiO estimates diagonal smoothness in self-attention maps, identifies unstable regions, and applies lightweight latent updates to promote coherent frame-to-frame dynamics, all without training or external supervision.
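
The three steps can be sketched on a toy problem. This is not the authors' implementation: it assumes one scalar latent per frame, a made-up attention function (softmax of negative squared latent distance), a mean-roughness threshold for flagging unstable frames, and finite differences with backtracking standing in for whatever gradient computation the actual method uses. All function names are hypothetical.

```python
import math


def frame_attention(z, tau=1.0):
    """Toy frame-to-frame attention: row-wise softmax of -(z_i - z_j)^2 / tau."""
    rows = []
    for zi in z:
        logits = [-((zi - zj) ** 2) / tau for zj in z]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def frame_roughness(attn, offsets=(-1, 0, 1)):
    """Step 1: squared jumps along each diagonal, credited to both frames involved."""
    n = len(attn)
    rough = [0.0] * n
    for k in offsets:
        start = max(0, -k)
        diag = [attn[i][i + k] for i in range(start, min(n, n - k))]
        for p in range(len(diag) - 1):
            jump = (diag[p + 1] - diag[p]) ** 2
            rough[start + p] += jump
            rough[start + p + 1] += jump
    return rough


def smoothness_loss(z):
    return sum(frame_roughness(frame_attention(z)))


def unstable_frames(z):
    """Step 2: flag frames whose roughness exceeds the mean (an assumed threshold)."""
    rough = frame_roughness(frame_attention(z))
    mean = sum(rough) / len(rough)
    return [i for i, r in enumerate(rough) if r > mean]


def latent_update(z, step=0.5, eps=1e-4):
    """Step 3: one masked gradient step with backtracking; never increases the loss."""
    base = smoothness_loss(z)
    grad = [0.0] * len(z)
    for i in unstable_frames(z):
        up, down = list(z), list(z)
        up[i] += eps
        down[i] -= eps
        grad[i] = (smoothness_loss(up) - smoothness_loss(down)) / (2 * eps)
    for _ in range(20):
        candidate = [zi - step * g for zi, g in zip(z, grad)]
        if smoothness_loss(candidate) < base:
            return candidate
        step *= 0.5
    return list(z)  # no improving step found; leave latents unchanged


smooth = [float(t) for t in range(8)]  # steady motion
flicker = list(smooth)
flicker[3] = 5.5                       # one frame jumps out of sequence
updated = flicker
for _ in range(5):
    updated = latent_update(updated)
print(unstable_frames(flicker), smoothness_loss(flicker), smoothness_loss(updated))
```

Only the flagged frames receive a nonzero gradient, so stable frames are left untouched, which mirrors the "lightweight" character of the update described above. No weights appear anywhere in the loop; the only thing that changes is the latent.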

In practice

Apply TeDiO to existing video diffusion models.
Improve motion stability in generated videos.
Preserve per-frame visual quality.
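
As a rough picture of what plug-and-play integration could look like, the sketch below wraps an unchanged denoising step with an optional latent-update callback. The sampler, the callback signature, the update_until cutoff, and both toy functions are assumptions made for illustration; in particular, the neighbor-averaging update is only a stand-in and is not TeDiO's attention-based update, and the source does not say at which steps updates are applied.

```python
def sample(latents, num_steps, denoise_step, latent_update=None, update_until=0):
    """Generic sampling loop; the optional callback edits latents before early steps."""
    for t in range(num_steps):
        if latent_update is not None and t < update_until:
            latents = latent_update(latents, t)
        latents = denoise_step(latents, t)  # the model itself is never modified
    return latents


def toy_denoise_step(latents, t):
    """Stand-in for a model call: shrinks every per-frame latent slightly."""
    return [0.9 * v for v in latents]


def toy_smoothing_update(latents, t):
    """Stand-in update: blend each frame with its temporal neighbors."""
    n = len(latents)
    out = []
    for i, v in enumerate(latents):
        left = latents[max(i - 1, 0)]
        right = latents[min(i + 1, n - 1)]
        out.append(0.5 * v + 0.25 * (left + right))
    return out


def jitter(latents):
    """Total absolute frame-to-frame change; lower means steadier motion."""
    return sum(abs(b - a) for a, b in zip(latents, latents[1:]))


start = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]  # one scalar latent per frame, alternating
baseline = sample(start, 4, toy_denoise_step)
guided = sample(start, 4, toy_denoise_step, toy_smoothing_update, update_until=2)
print(jitter(baseline), jitter(guided))
```

Passing no callback reproduces the baseline sampler exactly, which is the property that makes this style of method easy to add to or remove from an existing pipeline.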

Topics

TeDiO
Video Diffusion
Temporal Coherence
Self-Attention Maps
Training-Free Optimization

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.