Inference
DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs
The article introduces Dynamic Sliding Block (DSB), a training-free block scheduling method for diffusion large language models (dLLMs) that adapts block sizes to semantic difficulty, improving both output quality and inference efficiency. It also presents DSB Cache, a KV-cache mechanism designed to work with DSB. The reported experiments show gains in generation quality and efficiency across multiple models and benchmarks.
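To make the idea of difficulty-aware block sizing concrete, here is a minimal toy sketch. It is not the DSB algorithm from the article. It assumes a hypothetical, fixed list of per-position confidence scores as a stand-in for semantic difficulty, and the names `next_block_size` and `schedule_blocks`, the threshold, and the size limits are all illustrative assumptions. In a real dLLM the scores would change as tokens are decoded.

```python
from typing import List, Tuple


def next_block_size(confidence: List[float], start: int,
                    min_size: int = 2, max_size: int = 8,
                    threshold: float = 0.6) -> int:
    """Hypothetical rule: grow the block while the next position looks easy.

    `confidence` is an assumed per-position score in [0, 1]; higher means
    the position is easier to decode. None of this comes from the article.
    """
    remaining = len(confidence) - start
    if remaining <= 0:
        return 0
    size = min(min_size, remaining)    # always take at least min_size positions
    limit = min(max_size, remaining)   # never run past max_size or the sequence end
    while size < limit and confidence[start + size] >= threshold:
        size += 1                      # next position is easy, so extend the block
    return size


def schedule_blocks(confidence: List[float]) -> List[Tuple[int, int]]:
    """Partition positions into consecutive half-open blocks [start, end)."""
    blocks = []
    start = 0
    while start < len(confidence):
        size = next_block_size(confidence, start)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    scores = [0.9, 0.9, 0.9, 0.2, 0.9, 0.3, 0.9, 0.9, 0.9, 0.9]
    print(schedule_blocks(scores))
```

In this toy, a low-confidence position ends the current block early, so hard regions are decoded in small blocks and easy regions in larger ones. A fixed-size scheduler would cut the sequence at the same intervals regardless of the scores.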
diffusion, llm, scheduling