Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation
Summary
A study of per-token counterfactual credit estimation finds that the common practice of re-feeding a transcript prefix as a fresh prompt introduces significant "replay noise." The practice assumes that re-feeding reproduces the model's decode-time state. The study tests that assumption across six configurations and three language models, including a GRPO-trained checkpoint. At low-margin decision tokens, re-feeding changes credit estimates by 14-28 percentage points above a replica noise floor (7-21pp under treatment-independent conditioning). Averaged quantities remain largely safe, but critical-token selection is affected: Jaccard overlap is 0.34-0.90, against a replica ceiling of 0.63-0.96. The study, which cost under 10 USD, also reports that batch-invariant kernels, such as those in vLLM, eliminate this noise, reaching zero disagreement.
Key takeaway
For machine learning engineers evaluating token attribution in language models: re-feeding transcript prefixes introduces significant replay noise that alters critical-token selection. Resume decoder state or use batch-invariant kernels, such as those in vLLM, to obtain accurate counterfactual credit estimates. Always report a replica floor to account for inherent measurement unreliability, since even replica passes show 9-23% disagreement.
Key insights
Re-feeding transcript prefixes in counterfactual token-credit estimation introduces significant replay noise, impacting critical token selection.
Principles
- Single-sample credit measurements are unreliable under any replay.
- Averaged credit quantities are largely safe from re-feed noise.
- Batch-invariant kernels eliminate replay noise in credit estimation.
Method
Measure replay noise using a three-pass design: exact resume from KV state, an identical replica pass, and a re-feed pass, then compare outcomes.
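The comparison step of the three-pass design can be sketched as follows. This is a minimal illustration, not the study's code. It assumes each pass yields one discrete outcome per decision token and that "disagreement" means the fraction of tokens whose outcome differs from the exact-resume pass. The function names and the toy outcome lists are hypothetical.

```python
from typing import Dict, Sequence


def disagreement_rate(reference: Sequence[int], other: Sequence[int]) -> float:
    """Fraction of decision tokens where two passes produced different outcomes."""
    if len(reference) != len(other):
        raise ValueError("both passes must cover the same decision tokens")
    if not reference:
        raise ValueError("need at least one decision token")
    return sum(a != b for a, b in zip(reference, other)) / len(reference)


def replay_noise_pp(
    resume: Sequence[int], replica: Sequence[int], refeed: Sequence[int]
) -> Dict[str, float]:
    """Compare replica and re-feed passes against the exact-resume pass.

    All values are in percentage points. The replica rate is the noise floor;
    the excess is what re-feeding adds on top of it.
    """
    floor = disagreement_rate(resume, replica)
    refeed_rate = disagreement_rate(resume, refeed)
    return {
        "replica_floor_pp": 100 * floor,
        "refeed_pp": 100 * refeed_rate,
        "excess_pp": 100 * (refeed_rate - floor),
    }


# Toy outcomes for ten decision tokens (made up for illustration only).
resume = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
replica = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # differs from resume at one token
refeed = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]   # differs from resume at three tokens

report = replay_noise_pp(resume, replica, refeed)
print(report)
```

Reporting the excess alongside the floor, rather than the raw re-feed rate alone, separates noise attributable to re-feeding from noise that appears under any replay.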
In practice
- Resume decoder state for counterfactual credit studies.
- Use batch-invariant kernels for accurate credit estimation.
- Report a replica floor to quantify measurement unreliability.
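One way to report critical-token selection stability against a replica reference is a Jaccard overlap between selected token sets. The sketch below assumes that "critical tokens" are the top-k positions by absolute credit. That selection rule, the helper names, and the toy credit values are assumptions for illustration, not details taken from the study.

```python
from typing import Sequence, Set


def top_k_tokens(credits: Sequence[float], k: int) -> Set[int]:
    """Indices of the k tokens with the largest absolute credit (assumed rule)."""
    ranked = sorted(range(len(credits)), key=lambda i: abs(credits[i]), reverse=True)
    return set(ranked[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy per-token credits from three passes (made up for illustration only).
resume_credits = [0.90, 0.10, 0.50, 0.05, 0.70, 0.20]
replica_credits = [0.85, 0.15, 0.55, 0.05, 0.65, 0.20]
refeed_credits = [0.90, 0.60, 0.30, 0.05, 0.70, 0.20]

k = 3
reference = top_k_tokens(resume_credits, k)
replica_ceiling = jaccard(reference, top_k_tokens(replica_credits, k))
refeed_overlap = jaccard(reference, top_k_tokens(refeed_credits, k))

print(f"replica ceiling: {replica_ceiling:.2f}")
print(f"re-feed overlap: {refeed_overlap:.2f}")
```

The replica overlap acts as a ceiling: a re-feed overlap is only informative when read against how well an identical replica pass reproduces the same selection.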
Topics
- Counterfactual Credit Estimation
- Replay Noise
- Language Models
- Token Attribution
- vLLM
- Decoder State
- Batch-Invariant Kernels
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.