Research
DPRM: A Plug-in Doob h-Transform-Induced Token-Ordering Module for Diffusion Language Models
The article introduces DPRM (Doob-transform Process Reward Model), a plug-in token-ordering module for diffusion language models. DPRM changes only the order in which tokens are generated. It leaves the host model's architecture and denoising objective untouched. The usual approach orders tokens by confidence; DPRM replaces this with ordering guided by a process reward. Across nine host models, covering domains that include language reasoning and multimodal tasks, this reward-guided ordering improves performance. For practitioners, the appeal is that a better ordering policy can be added to an existing diffusion language model without redesigning it, with potential gains in both performance and efficiency.
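To make the contrast between confidence-driven and reward-guided ordering concrete, here is a minimal sketch. It is not DPRM's actual method. The function names, the `probs`/`masked`/`h` inputs, and the idea of scoring positions with a per-position value `h` are all hypothetical illustrations. The second function borrows the general shape of a discrete Doob h-transform, where a base distribution p is tilted as p^h(y) ∝ p(y)·h(y) and then renormalized. Here that tilt is applied to a preference over which masked position to unmask next.

```python
import numpy as np


def confidence_order(probs: np.ndarray, masked: list[int], k: int) -> list[int]:
    """Baseline: unmask the k masked positions whose top-token probability is highest.

    probs: (seq_len, vocab) per-position token distributions from the host denoiser.
    """
    conf = probs[masked].max(axis=-1)            # confidence of each masked position
    top = np.argsort(-conf, kind="stable")[:k]   # highest confidence first
    return [masked[i] for i in top]


def h_tilted_order(probs: np.ndarray, masked: list[int], k: int, h: np.ndarray) -> list[int]:
    """Hypothetical reward-guided ordering: tilt the confidence-based preference by h.

    h: assumed non-negative value estimate per masked position (e.g. from a process
    reward model). This is an illustration, not DPRM's published algorithm.
    """
    conf = probs[masked].max(axis=-1)
    base = conf / conf.sum()                     # base preference over masked positions
    tilted = base * h                            # Doob-style tilt: p(y) * h(y)
    tilted = tilted / tilted.sum()               # renormalize to a distribution
    top = np.argsort(-tilted, kind="stable")[:k]
    return [masked[i] for i in top]
```

With the same host-model probabilities, the two policies can pick different positions. A low `h` can demote a position the model is confident about but that the hypothetical reward judges harmful to unmask early.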
diffusion-models, token-ordering