Real-Time AttentionBender: Granular Interactive Network Bending of Video Diffusion Transformers
Summary
Real-Time AttentionBender is a tool for granular, interactive manipulation of Video Diffusion Transformers (DiTs), intended to give creators more agency over generative video models. Released as a plugin within the DayDream Scope ecosystem and built on open-source real-time Wan pipelines, it exposes self-attention, cross-attention, and feed-forward networks for independent control. Users can target specific diffusion steps, DiT layers, prompt tokens, and individual hidden neurons. Because the manipulation is live, users see immediately how each component shapes the generated video, which the authors describe as fostering "material intimacy" with the model. The authors, Adam Cole, Rebecca Fiebrink, and Mick Grierson, position AttentionBender as both an XAIxArts probe into transformer internals and an expressive instrument for exploring aesthetics beyond default model outputs. The paper was accepted to the ACM Creativity & Cognition XAIxArts Workshop 2026 and revised on June 8, 2026.
Key takeaway
For creative technologists and AI scientists building generative video applications, Real-Time AttentionBender offers an alternative to prompt-only interfaces. Consider integrating granular network bending to gain "material intimacy" with Video Diffusion Transformers by directly manipulating their attention and feed-forward networks. This lets you explore aesthetic outcomes beyond the model's default outputs, see how individual components affect the result, and build more expressive, interactive video generation workflows.
Key insights
Real-Time AttentionBender offers granular, interactive control over Video Diffusion Transformers, enhancing creative agency and model understanding.
Principles
- Direct network bending boosts creative agency.
- Live manipulation builds model intimacy.
- XAI tools can be expressive instruments.
Method
Real-Time AttentionBender runs as a DayDream Scope plugin on top of open-source real-time Wan pipelines. It exposes three DiT components (self-attention, cross-attention, and feed-forward networks) for independent, real-time manipulation, and each manipulation can be scoped to specific diffusion steps, layers, prompt tokens, and hidden neurons.
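To make this kind of targeting concrete, here is a minimal pure-Python sketch of how a bend might be scoped to a component, diffusion step, layer, and set of neurons or prompt tokens. It is an illustration only, not the plugin's code: `BendTarget`, `bend_neurons`, and `bend_token_attention` are hypothetical names, the activations are plain lists rather than tensors, and scaling followed by renormalization is just one assumed bending operation among many possible ones.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class BendTarget:
    """Hypothetical description of where a bend applies and how strongly."""
    component: str                      # "self_attn", "cross_attn", or "ffn"
    steps: Optional[Set[int]] = None    # diffusion steps to affect; None = all
    layers: Optional[Set[int]] = None   # DiT layers to affect; None = all
    indices: Optional[Set[int]] = None  # neuron or token indices; None = all
    scale: float = 1.0                  # multiplier for the selected values

    def matches(self, component: str, step: int, layer: int) -> bool:
        """True if this bend applies to the given component, step, and layer."""
        return (
            component == self.component
            and (self.steps is None or step in self.steps)
            and (self.layers is None or layer in self.layers)
        )

    def selects(self, index: int) -> bool:
        """True if the given neuron or token index is targeted."""
        return self.indices is None or index in self.indices


def bend_neurons(hidden: List[float], target: BendTarget,
                 component: str, step: int, layer: int) -> List[float]:
    """Scale the targeted hidden neurons; return an unchanged copy otherwise."""
    if not target.matches(component, step, layer):
        return list(hidden)
    return [v * target.scale if target.selects(i) else v
            for i, v in enumerate(hidden)]


def bend_token_attention(weights: List[float], target: BendTarget,
                         step: int, layer: int) -> List[float]:
    """Scale the attention one query pays to targeted prompt tokens.

    `weights` is a single row of cross-attention weights that sums to 1.
    Renormalizing after scaling is a simplifying assumption of this sketch.
    """
    if not target.matches("cross_attn", step, layer):
        return list(weights)
    scaled = [w * target.scale if target.selects(i) else w
              for i, w in enumerate(weights)]
    total = sum(scaled)
    # If every weight was zeroed out, fall back to the original row.
    return [w / total for w in scaled] if total > 0 else list(weights)
```

For example, `BendTarget("ffn", steps={0}, layers={2}, indices={1}, scale=3.0)` would triple only neuron 1 of the feed-forward activation at layer 2 during diffusion step 0 and leave every other step, layer, and neuron untouched. In a real pipeline the same scoping logic would presumably sit inside hooks on the transformer blocks and operate on tensors.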
In practice
- Explore novel video aesthetics.
- Probe transformer internal workings.
- Enhance creative control in video generation.
Topics
- Video Diffusion Transformers
- Network Bending
- Generative Video
- Human-Computer Interaction
- Explainable AI
- Creative AI Tools
Best for: Computer Vision Engineer, Research Scientist, Creative Technologist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.