Research
Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation
The paper studies per-token counterfactual credit estimation in language models and shows the limits of re-feeding a transcript as a fresh prompt to estimate token contributions. It uses a three-pass experimental design: a pass from a verified decode-time KV state, a replica pass that establishes the noise floor, and a re-feed pass. Re-feeding shifts credit estimates by 14-28 percentage points relative to a stable baseline. The findings show that single-sample credit measurements are unreliable, and the paper recommends resumed decoder states or batch-invariant kernels to make counterfactual credit studies more accurate.
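As a rough illustration of how the three passes can be compared, the sketch below computes a noise floor and a re-feed shift from per-token credit vectors. It is not the paper's code. The function names, the choice of mean absolute difference as the metric, the use of the KV-state pass as the baseline, and the convention that credits are fractions in [0, 1] are all assumptions made for this example.

```python
from statistics import mean


def credit_shift_pp(baseline, other):
    """Mean absolute per-token difference between two credit vectors.

    Credits are assumed to be fractions in [0, 1]; the result is scaled
    to percentage points. This metric is an assumption, not the paper's.
    """
    if len(baseline) != len(other):
        raise ValueError("credit vectors must cover the same tokens")
    if not baseline:
        raise ValueError("credit vectors must not be empty")
    return mean(abs(b - o) for b, o in zip(baseline, other)) * 100


def compare_passes(kv_state, replica, refeed):
    """Compare the three passes (hypothetical helper).

    kv_state: credits from the verified decode-time KV state (baseline here)
    replica:  credits from a repeat of the same pass (gives the noise floor)
    refeed:   credits from re-feeding the transcript as a fresh prompt
    """
    noise_floor = credit_shift_pp(kv_state, replica)
    refeed_shift = credit_shift_pp(kv_state, refeed)
    return {
        "noise_floor_pp": noise_floor,
        "refeed_shift_pp": refeed_shift,
        # Shift attributable to re-feeding beyond run-to-run noise.
        "excess_pp": refeed_shift - noise_floor,
    }
```

Reporting the re-feed shift next to the noise floor, rather than alone, is what separates an effect of re-feeding from ordinary run-to-run variation.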
token-credit, llm, replay