Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation

2026-06-14 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study of per-token counterfactual credit estimation finds that re-feeding a transcript prefix as a fresh prompt, a common practice that assumes the model's decode-time state is reproduced, introduces significant "replay noise." The effect was measured across six configurations and three language models, including a GRPO-trained checkpoint. At low-margin decision tokens, re-feeding changes credit estimates by 14-28 percentage points above a replica noise floor (7-21pp under treatment-independent conditioning). Averaged quantities remain largely safe, but critical-token selection is affected: Jaccard overlap is 0.34-0.90, against a replica ceiling of 0.63-0.96. The study, which cost under 10 USD, reports that batch-invariant kernels, such as those in vLLM, eliminate this noise, achieving zero disagreement.

Key takeaway

For machine learning engineers evaluating token attribution in language models: re-feeding transcript prefixes introduces significant replay noise and alters critical-token selection. Resume decoder state or use batch-invariant kernels, such as those in vLLM, to obtain accurate counterfactual credit estimates. Always report a replica floor to account for inherent measurement unreliability, since even replica passes show 9-23% disagreement.

Key insights

Re-feeding transcript prefixes in counterfactual token-credit estimation introduces significant replay noise, impacting critical token selection.

Principles

Single-sample credit measurements are unreliable under any replay.
Averaged credit quantities are largely safe from re-feed noise.
Batch-invariant kernels eliminate replay noise in credit estimation.

Method

Measure replay noise using a three-pass design: exact resume from KV state, an identical replica pass, and a re-feed pass, then compare outcomes.
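
The comparison step of the three-pass design can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each pass yields one discrete outcome per decision token (for example, whether the counterfactual continuation succeeded), and that noise is read as the disagreement rate against the exact-resume pass. The function names and the sample outcome lists are hypothetical.

```python
from typing import Dict, Sequence


def disagreement_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of decision tokens where two passes gave different outcomes."""
    if len(a) != len(b):
        raise ValueError("passes must cover the same decision tokens")
    if len(a) == 0:
        raise ValueError("need at least one decision token")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def replay_noise_pp(
    resume: Sequence[int],
    replica: Sequence[int],
    refeed: Sequence[int],
) -> Dict[str, float]:
    """Compare replica and re-feed passes against the exact-resume pass.

    All values are in percentage points. `excess_pp` is the re-feed
    disagreement above the replica floor (an assumed definition).
    """
    floor = disagreement_rate(resume, replica)
    refeed_rate = disagreement_rate(resume, refeed)
    return {
        "replica_floor_pp": 100.0 * floor,
        "refeed_pp": 100.0 * refeed_rate,
        "excess_pp": 100.0 * (refeed_rate - floor),
    }


# Hypothetical outcomes for ten decision tokens (illustrative only).
resume_pass = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
replica_pass = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
refeed_pass = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]

report = replay_noise_pp(resume_pass, replica_pass, refeed_pass)
for name, value in report.items():
    print(f"{name}: {value:.1f}")
```

Reporting the floor next to the re-feed figure keeps the two sources of disagreement separate: whatever the replica pass already disagrees on cannot be attributed to re-feeding.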

In practice

Resume decoder state for counterfactual credit studies.
Utilize batch-invariant kernels for accurate credit estimation.
Report a replica floor to quantify measurement unreliability.
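
For critical-token selection, the same reporting habit applies to Jaccard overlap: quote the re-feed overlap alongside the replica ceiling. The sketch below assumes critical tokens are the k positions with the largest absolute credit, which is an illustrative choice and may differ from the study's selection rule. The helper names and credit values are hypothetical.

```python
from typing import Sequence, Set


def top_k_tokens(credit: Sequence[float], k: int) -> Set[int]:
    """Indices of the k tokens with the largest absolute credit (assumed rule)."""
    ranked = sorted(range(len(credit)), key=lambda i: abs(credit[i]), reverse=True)
    return set(ranked[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical per-token credit from three passes (illustrative only).
credit_resume = [0.90, 0.10, 0.50, 0.05, 0.70]
credit_replica = [0.85, 0.10, 0.55, 0.05, 0.72]
credit_refeed = [0.80, 0.10, 0.75, 0.05, 0.70]

k = 2
critical_resume = top_k_tokens(credit_resume, k)
replica_ceiling = jaccard(critical_resume, top_k_tokens(credit_replica, k))
refeed_overlap = jaccard(critical_resume, top_k_tokens(credit_refeed, k))

print(f"replica ceiling: {replica_ceiling:.2f}")
print(f"re-feed overlap: {refeed_overlap:.2f}")
```

A re-feed overlap well below the replica ceiling indicates that the selected set changes because of re-feeding and not only because of ordinary run-to-run variation.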

Topics

Counterfactual Credit Estimation
Replay Noise
Language Models
Token Attribution
vLLM
Decoder State
Batch-Invariant Kernels

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.