Inference · arXiv cs.CL · 16 d ago

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

The article introduces S2D2, a training-free self-speculative decoding framework for block-diffusion language models. It speeds up decoding without additional training and without significant extra test-time compute. A pretrained block-diffusion model behaves autoregressively when its block size is reduced to one, so the same model can serve as both drafter and verifier; the result is a hybrid decoding method that improves the accuracy-speed tradeoff. In the reported benchmarks, S2D2 achieves up to 4.7× speedup over autoregressive decoding and up to 1.57× over dynamic decoding baselines, while improving accuracy by up to 4.5 points.
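The drafter/verifier split can be illustrated with a generic greedy draft-then-verify loop. This is only a minimal sketch under stated assumptions, not S2D2's actual algorithm or routing policy: a toy arithmetic function stands in for the language model, and `target_next`, `draft_block`, and `speculative_decode` are hypothetical names invented for illustration. In a real system the verifier would score the whole draft in one parallel forward pass; here that is modelled by counting one round per draft.

```python
from typing import List, Tuple

Token = int
VOCAB = 7  # toy vocabulary size (assumption for illustration)


def target_next(prefix: List[Token]) -> Token:
    """Hypothetical verifier: stands in for the model run in autoregressive mode."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_block(prefix: List[Token], k: int) -> List[Token]:
    """Hypothetical drafter: stands in for the same model proposing k tokens at once.

    To imitate imperfect parallel drafts, the token is deliberately corrupted
    whenever the context length is congruent to 2 modulo 3.
    """
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 3 == 2:
            tok = (tok + 1) % VOCAB  # inject a drafting error
        draft.append(tok)
        ctx.append(tok)
    return draft


def autoregressive_decode(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one verifier call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt: List[Token], n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-then-verify: keep draft tokens until the first mismatch,
    then take the verifier's token at that position and discard the rest."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_new:
        draft = draft_block(seq, k)
        rounds += 1  # one verification pass per drafted block
        for tok in draft:
            expected = target_next(seq)
            seq.append(expected)  # equals tok when the draft token is accepted
            if tok != expected:
                break  # mismatch: the remaining draft tokens are dropped
    return seq[: len(prompt) + n_new], rounds


if __name__ == "__main__":
    prompt = [1, 2]
    spec, rounds = speculative_decode(prompt, n_new=12)
    print(spec == autoregressive_decode(prompt, 12), rounds)
```

Because every emitted token is the verifier's own choice, the sketch reproduces the autoregressive output exactly while using fewer verification rounds than generated tokens; any real speedup depends on how often drafts are accepted and on the cost of each pass.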

diffusion-models · decoding · self-speculation
