Sumi: Open Uniform Diffusion Language Model from Scratch
The article introduces Sumi, a 7-billion-parameter uniform diffusion language model (UDLM) pretrained from scratch on 1.5 trillion tokens. On knowledge, reasoning, and coding benchmarks, Sumi is competitive with autoregressive models trained on similar token budgets, although it underperforms on commonsense tasks, a gap attributed to its specific data mixture. The release, which includes model weights and training recipes, is intended as a reference point for future research on the scaling behavior and dynamics of uniform diffusion models.
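"Uniform" refers to the noise process. Masked diffusion models corrupt text by replacing tokens with a [MASK] symbol. A uniform diffusion model instead corrupts each token by resampling it uniformly from the vocabulary. The minimal sketch below illustrates this forward corruption step under the standard formulation q(x_t | x_0) = Cat(α_t · onehot(x_0) + (1 − α_t)/V), where V is the vocabulary size. It is not taken from the Sumi release, and the function name `uniform_corrupt` and its parameters are hypothetical.

```python
import random

def uniform_corrupt(tokens, alpha_t, vocab_size, rng=None):
    """Sample x_t ~ q(x_t | x_0) for a uniform-noise discrete diffusion.

    Hypothetical illustration, not Sumi's code. Each token is kept with
    probability alpha_t. Otherwise it is replaced by a token drawn uniformly
    from the whole vocabulary, which may by chance equal the original, so
    P(x_t == x_0) = alpha_t + (1 - alpha_t) / vocab_size.
    """
    rng = rng or random.Random()
    corrupted = []
    for tok in tokens:
        if rng.random() < alpha_t:
            corrupted.append(tok)  # keep the clean token
        else:
            corrupted.append(rng.randrange(vocab_size))  # uniform resample
    return corrupted
```

At alpha_t = 1 the sequence is returned unchanged. At alpha_t = 0 every position is pure uniform noise. Because a noised token is still an ordinary vocabulary item rather than a dedicated mask symbol, the denoiser has to decide which tokens to revise as well as what to write.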
diffusion, language-model, pretraining