ai-digest.dev
last updated 2 h ago
Inference · Reddit r/LocalLLaMA · 13 d ago

100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti — switching to tensor split-mode got me from 70 to 100+

The poster reports more than 100 tokens per second (t/s) with Qwen3.6-27B at Q8_0 on a dual-GPU setup: an RTX 5090 paired with an RTX 3090 Ti. Throughput rose from about 70 t/s to 100+ t/s after switching to tensor split-mode, in which both GPUs work on the same tensors simultaneously rather than alternating between layers. For practitioners running mixed-generation cards, the takeaway is that the choice of split mode alone can determine whether both GPUs stay busy during generation.
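The difference between the two split modes can be sketched in a few lines of plain Python. This is a toy illustration only, not the poster's setup or any inference engine's implementation: lists stand in for GPU tensors, and the names `layer_split_forward`, `tensor_split_forward`, `boundary`, and `fraction` are hypothetical. It shows that both schemes compute the same result, but under layer split each device sits idle while the other works, whereas under tensor split both devices contribute to every layer.

```python
# Toy illustration only: plain Python lists stand in for GPU tensors.

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def layer_split_forward(layers, x, boundary):
    """Layer split: device 0 owns layers[:boundary], device 1 owns the rest.
    The devices take turns, so only one is busy at any moment."""
    for weight in layers[:boundary]:   # device 0 works, device 1 idles
        x = matvec(weight, x)
    for weight in layers[boundary:]:   # device 1 works, device 0 idles
        x = matvec(weight, x)
    return x

def tensor_split_forward(layers, x, fraction):
    """Tensor split: each layer's rows are divided between the two devices,
    so both contribute to every layer. `fraction` is device 0's share."""
    for weight in layers:
        cut = round(len(weight) * fraction)
        part0 = matvec(weight[:cut], x)   # rows held by device 0
        part1 = matvec(weight[cut:], x)   # rows held by device 1
        x = part0 + part1                 # gather the two partial outputs
    return x

# Two tiny 2x2 "layers" and an input vector.
layers = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
]
x = [1, 1]

out_layer = layer_split_forward(layers, x, boundary=1)
out_tensor = tensor_split_forward(layers, x, fraction=0.5)
print(out_layer, out_tensor)
```

With two unequal cards, `fraction` would presumably be set away from 0.5 so the faster GPU holds the larger share of each tensor; the right value depends on the hardware and is not given in the post.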

qwen · gpu · performance · relevance 0.00 · engagement 0.00