Research
Gemma 4 models benchmarked on a triple-GPU setup
Benchmark results for Gemma 4 models on a triple-GPU setup have been released, covering configurations that include gemma-4-31B-it-UD-Q4_K_XL (17.52 GiB) and gemma-4-12B-it-UD-Q8_K_XL (12.69 GiB). The 12B model reached 128.85 tokens per second on pp512 (prompt processing with a 512-token prompt) and 13.47 tokens per second on tg128 (text generation of 128 tokens). The figures give practitioners a reference point for the throughput to expect when running quantized models on multi-GPU setups built from GTX 1070 cards.
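To put the two rates in perspective, the short sketch below converts them into a rough per-request latency. It assumes that pp512 and tg128 denote a 512-token prompt and a 128-token generation, and that throughput stays constant within each phase; the function name `estimate_latency_seconds` is hypothetical and is not part of any benchmark tool.

```python
def estimate_latency_seconds(prompt_tokens, gen_tokens, pp_rate, tg_rate):
    """Rough wall-clock estimate, assuming constant throughput in each phase.

    pp_rate and tg_rate are in tokens per second.
    Returns (prompt_time, generation_time, total_time) in seconds.
    """
    if pp_rate <= 0 or tg_rate <= 0:
        raise ValueError("throughput must be positive")
    prompt_time = prompt_tokens / pp_rate
    gen_time = gen_tokens / tg_rate
    return prompt_time, gen_time, prompt_time + gen_time


# Rates reported for the 12B model (tokens per second).
PP512_RATE = 128.85
TG128_RATE = 13.47

prompt_s, gen_s, total_s = estimate_latency_seconds(512, 128, PP512_RATE, TG128_RATE)
print(f"prompt: {prompt_s:.2f} s, generation: {gen_s:.2f} s, total: {total_s:.2f} s")
```

Under these assumptions, reading the prompt takes about 4 seconds and generating 128 tokens takes about 9.5 seconds, so generation dominates the response time even though it handles a quarter as many tokens.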
gemma, benchmark, models, performance