TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

TimeROME-DLM is presented as the first training-free, gradient-free, inference-time knowledge-editing framework designed specifically for masked diffusion language models (MDLMs). Knowledge-editing methods built for autoregressive LLMs fail on MDLMs because MDLMs generate text through an iterative denoising process. TimeROME-DLM combines two components: a Temporal Indirect Effect (TIE) causal-tracing protocol that pinpoints the (layer, denoising step, module) coordinate where a fact is recoverable, and a closed-form, low-rank residual edit memory applied at that coordinate. On TOFU forget01 with LLaDA-8B-Base, the framework reduces forget-set log-probability by approximately 83 nats, keeps retain-set log-probability within ~1 nat across 50 sequential insertions, and delivers a four- to fourteen-fold wall-clock speedup with zero additional VRAM. It scales sub-linearly to 400 facts and transfers across six MDLM backbones, including LLaDA-8B-Instruct and Dream-7B.

Key takeaway

AI scientists and machine learning engineers working with masked diffusion language models should consider TimeROME-DLM for knowledge-editing and unlearning tasks. The framework removes or modifies facts at inference time without gradients, avoiding the high VRAM and compute costs of traditional gradient-based methods. Because it scales to hundreds of facts and keeps utility stable across sequential insertions, it is a robust option for managing knowledge in MDLM deployments.

Key insights

Knowledge editing in MDLMs requires tracing causality along the denoising trajectory, not only across layers. Once the responsible coordinate is located, updates can be applied gradient-free at inference time.
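
To make the idea of tracing along the denoising trajectory concrete, here is a minimal sketch of activation patching over (step, layer, module) coordinates. It is an illustration under loud assumptions, not the paper's TIE protocol: the toy denoiser, the linear readout standing in for a fact's log-probability, and the names run_toy_denoiser and tie_map are all hypothetical. Each entry of the resulting map records how much the score recovers when one clean activation slice is restored inside a corrupted run.

```python
import numpy as np

def run_toy_denoiser(x0, weights, n_steps, patch=None):
    """Toy stand-in for an MDLM (an assumption): n_steps passes through the same layers.

    patch = (step, layer, module_slice, clean_hidden) overwrites one slice of the
    hidden state at that coordinate with the value from a clean run.
    """
    x = x0.copy()
    cache = {}
    for t in range(n_steps):
        h = x
        for l, W in enumerate(weights):
            h = np.tanh(W @ h)
            if patch is not None and (patch[0], patch[1]) == (t, l):
                module_slice, clean_hidden = patch[2], patch[3]
                h[module_slice] = clean_hidden[module_slice]
            cache[(t, l)] = h.copy()
        x = h  # output of this denoising step feeds the next one
    return x, cache

def tie_map(x_clean, x_corrupt, weights, n_steps, readout, modules):
    """Score recovered by restoring one clean (step, layer, module) slice in a corrupted run."""
    _, clean_cache = run_toy_denoiser(x_clean, weights, n_steps)
    y_corrupt, _ = run_toy_denoiser(x_corrupt, weights, n_steps)
    base = float(readout @ y_corrupt)
    tie = np.zeros((n_steps, len(weights), len(modules)))
    for t in range(n_steps):
        for l in range(len(weights)):
            for m, module_slice in enumerate(modules):
                patch = (t, l, module_slice, clean_cache[(t, l)])
                y, _ = run_toy_denoiser(x_corrupt, weights, n_steps, patch=patch)
                tie[t, l, m] = float(readout @ y) - base
    return tie

rng = np.random.default_rng(0)
d, n_layers, n_steps = 8, 3, 4
weights = [rng.normal(scale=0.5, size=(d, d)) for _ in range(n_layers)]
readout = rng.normal(size=d)                   # hypothetical proxy for the fact's log-probability
x_clean = rng.normal(size=d)
x_corrupt = x_clean + rng.normal(size=d)       # corrupted input that "loses" the fact
modules = [slice(0, d // 2), slice(d // 2, d)] # two toy "modules" as halves of the hidden state

tie = tie_map(x_clean, x_corrupt, weights, n_steps, readout, modules)
best = np.unravel_index(np.argmax(np.abs(tie)), tie.shape)
print("largest-effect (step, layer, module):", best)
```

In a real MDLM the score would be the log-probability of the fact's tokens and the modules would be components such as attention or MLP outputs; the sketch only shows the bookkeeping of a three-axis trace.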

Principles

Method

TimeROME-DLM first uses Temporal Indirect Effect (TIE) tracing to find the (layer, denoising-step, module) coordinate where a fact is most recoverable. It then applies a single low-rank residual update, aggregated over all forget facts, at that coordinate during every diffusion forward pass.
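
One way to picture a closed-form, low-rank residual update that aggregates several facts is a ridge-regression solve, sketched below. This is an assumption made for illustration, not the paper's formula: the key vectors, the target residuals, the ridge value, and the names build_edit_memory and apply_edit are all hypothetical. With n facts, the update is stored as two factors of rank at most n and added to the hidden state at the traced coordinate.

```python
import numpy as np

def build_edit_memory(keys, residuals, ridge=1e-6):
    """Closed-form ridge solution for delta = A @ B with delta @ keys ~= residuals.

    keys:      (d, n) hidden states at the traced coordinate, one column per fact
    residuals: (d, n) desired shift of the hidden state for each fact
    Returns A (d, n) and B (n, d), so the update has rank at most n.
    """
    n = keys.shape[1]
    gram = keys.T @ keys + ridge * np.eye(n)     # (n, n), symmetric positive definite
    A = np.linalg.solve(gram, residuals.T).T     # equals residuals @ inv(gram)
    B = keys.T
    return A, B

def apply_edit(h, A, B):
    """Add the low-rank residual to a hidden state h of shape (d,)."""
    return h + A @ (B @ h)

rng = np.random.default_rng(0)
d, n_facts = 16, 3
keys = rng.normal(size=(d, n_facts))
residuals = rng.normal(size=(d, n_facts))
A, B = build_edit_memory(keys, residuals)

# A hidden state orthogonal to every key is left untouched by the edit.
v = rng.normal(size=d)
unrelated = v - keys @ np.linalg.lstsq(keys, v, rcond=None)[0]

shift_error = np.abs(apply_edit(keys[:, 0], A, B) - (keys[:, 0] + residuals[:, 0])).max()
print("max error on the first edited key:", shift_error)
```

Storing the factors instead of a dense d-by-d matrix keeps the memory cost proportional to the number of facts, and the same apply_edit call would be repeated on each denoising forward pass in this sketch.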

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.