Models
DiffusionGemma is 4x faster, but makes nearly 6x more mistakes!
The new DiffusionGemma model is 4x faster than its autoregressive counterpart, Gemma4. It generates 256 tokens simultaneously and refines them over multiple passes, reaching a throughput of 763 tokens per second. It is significantly less accurate, however: it produced 33 correct facts and 28 mistakes against Gemma4's 45 correct and 5 mistakes, and it struggled most with less popular topics. The results suggest that the model favors fluency over factual accuracy, so practitioners should prefer Gemma4 when factual precision is critical.
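To put the reported counts in perspective, the short sketch below turns them into per-model precision and a mistake ratio. It uses only the numbers quoted above. The helper name `precision` is illustrative, and the baseline throughput is an inference that assumes the 4x speedup applies directly to the 763 tokens per second figure, which the summary does not state explicitly.

```python
# Reported (correct, mistakes) counts, taken from the summary above.
results = {
    "DiffusionGemma": (33, 28),
    "Gemma4": (45, 5),
}


def precision(correct: int, mistakes: int) -> float:
    """Share of stated facts that were correct (hypothetical helper)."""
    return correct / (correct + mistakes)


for name, (correct, mistakes) in results.items():
    print(f"{name}: {precision(correct, mistakes):.1%} of stated facts correct")

# How many times more mistakes DiffusionGemma made than Gemma4.
mistake_ratio = results["DiffusionGemma"][1] / results["Gemma4"][1]
print(f"Mistake ratio: {mistake_ratio:.1f}x")

# Assumption: the 4x speedup is measured against Gemma4's throughput,
# so the implied baseline is the reported 763 tokens/s divided by 4.
implied_gemma4_tps = 763 / 4
print(f"Implied Gemma4 throughput: {implied_gemma4_tps:.2f} tokens/s")
```

By this measure roughly 54% of DiffusionGemma's stated facts were correct versus 90% for Gemma4, and the headline's "6x" corresponds to a 5.6x ratio in raw mistake counts.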
diffusion, gemma, benchmark