Research
Residual Context Diffusion Language Models
The paper introduces Residual Context Diffusion (RCD), a module that improves diffusion large language models (dLLMs) by recycling information from tokens discarded during decoding. RCD converts these discarded tokens into contextual residuals and feeds them back into subsequent decoding iterations. The authors report accuracy gains of 4-11 percentage points across various benchmarks after training on only ~300 million tokens. The approach also reduces the number of denoising steps required, particularly on challenging AIME tasks.
diffusion llm context
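To make the recycling idea in the summary concrete, here is a minimal toy sketch of one decoding step. It is not the paper's implementation: the summary does not say how residuals are built, so the probability-weighted mix of token embeddings, the confidence `threshold`, and the names `decode_step`, `embeddings`, and `mask_embedding` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(logits_per_pos, embeddings, mask_embedding, threshold=0.9):
    """One toy denoising step over the currently masked positions.

    Returns (committed, next_inputs):
      committed   -- {position: token_id} for confident predictions
      next_inputs -- {position: vector} for positions that stay masked
    """
    committed, next_inputs = {}, {}
    dim = len(mask_embedding)
    for pos, logits in logits_per_pos.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            committed[pos] = best
            continue
        # A plain decoder would drop this prediction and feed the bare mask
        # embedding again. ASSUMPTION: the discarded distribution is turned
        # into a residual (probability-weighted mix of token embeddings)
        # and added to the mask embedding for the next iteration.
        residual = [
            sum(p * embeddings[tok][d] for tok, p in enumerate(probs))
            for d in range(dim)
        ]
        next_inputs[pos] = [m + r for m, r in zip(mask_embedding, residual)]
    return committed, next_inputs


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask_embedding = [0.0, 0.0]
logits = {0: [10.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}

committed, next_inputs = decode_step(logits, embeddings, mask_embedding)
print(committed)
print(next_inputs)
```

In this toy run, position 0 is confident and gets committed, while position 1 stays masked but carries a residual vector into the next step instead of a bare mask embedding. A real dLLM would compute the logits with the model and, presumably, learn how to weight the residual.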