Inference · Reddit r/LocalLLaMA · 14 d ago

Gemma 4 31B Q6 on Dual 9060 XT

Gemma 4 31B, quantized to Q6, was tested on two 9060 XT GPUs with 16 GB of memory each and reached roughly 8-9 tokens per second. Some users consider this lower than expected, which suggests there may be room for tuning. The result is a useful data point for anyone running LLM inference on this hardware configuration.
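As a rough sanity check on whether 8-9 tokens per second is low, a back-of-envelope estimate can help. The sketch below rests on assumptions that are not in the post: a dense model, about 6.56 bits per weight (typical of a Q6_K-style quantization), about 320 GB/s of memory bandwidth per GPU, and a layer split in which the two cards work one after the other, so each token streams all weights at single-GPU bandwidth. It ignores KV-cache and activation traffic, so it gives an optimistic ceiling and not a prediction.

```python
def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return params * bits_per_weight / 8


def decode_ceiling_tps(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every token must read all weights once."""
    return bandwidth_bytes_per_s / model_bytes


PARAMS = 31e9             # from the post title: 31B parameters
BITS_PER_WEIGHT = 6.5625  # assumption: Q6_K-style quantization
BANDWIDTH = 320e9         # assumption: per-GPU memory bandwidth in bytes/s

size = weight_bytes(PARAMS, BITS_PER_WEIGHT)
ceiling = decode_ceiling_tps(size, BANDWIDTH)

print(f"weights: {size / 1e9:.1f} GB")
print(f"bandwidth-bound ceiling: {ceiling:.1f} tok/s")
```

Under these assumptions the weights fit within the combined 32 GB, and the reported 8-9 tokens per second lands at roughly two thirds of the bandwidth-bound ceiling. A different quantization variant, bandwidth figure, or split mode would shift that estimate.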

gemma · performance · benchmarking
