Inference
Maximizing performance of 2x3090 + NVLink
The article describes a user's setup with two NVIDIA GeForce RTX 3090 GPUs linked by NVLink, running Ubuntu 24.04 on a Ryzen 9 7950X3D processor with 64GB of DDR5 RAM. Running the Qwen 3.6 27B Q8_0 model with MTP (multi-token prediction) and graph splitting, the user reports a peak throughput of roughly 60 tokens per second (TPS) in brief bursts and an average of around 40-45 TPS. The figures show where high-end consumer hardware tops out on AI inference workloads, and they have prompted discussion of optimization strategies among practitioners.
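To put the reported 40-60 TPS in context, here is a rough back-of-the-envelope sketch of the memory-bandwidth ceiling for single-stream decoding. It is not from the original post and rests on several assumptions: the model has exactly 27e9 parameters, every weight is read once per forward pass, Q8_0 costs 34 bytes per 32 weights (32 int8 values plus one fp16 scale), and each RTX 3090 delivers its spec-sheet 936 GB/s. It ignores KV-cache reads, inter-GPU synchronization, and MTP, which can yield more than one token per pass. The function names are hypothetical.

```python
def q8_0_model_bytes(n_params: float) -> float:
    """Approximate weight size in bytes for Q8_0 (34 bytes per block of 32 weights)."""
    return n_params * 34 / 32


def decode_ceiling_tps(model_bytes: float, bandwidth_bytes_per_s: float,
                       parallel_gpus: int = 1) -> float:
    """Upper bound on passes per second if decoding is purely bandwidth-bound.

    parallel_gpus=1 models GPUs working one after another (layer split);
    parallel_gpus=2 models both GPUs reading their half of the weights at once.
    """
    return parallel_gpus * bandwidth_bytes_per_s / model_bytes


N_PARAMS = 27e9        # assumption: nominal parameter count
BANDWIDTH = 936e9      # assumption: RTX 3090 spec memory bandwidth, bytes/s

model_bytes = q8_0_model_bytes(N_PARAMS)
sequential_tps = decode_ceiling_tps(model_bytes, BANDWIDTH, parallel_gpus=1)
parallel_tps = decode_ceiling_tps(model_bytes, BANDWIDTH, parallel_gpus=2)

print(f"weights: {model_bytes / 1e9:.1f} GB")
print(f"ceiling, GPUs used in sequence: {sequential_tps:.1f} passes/s")
print(f"ceiling, GPUs used in parallel: {parallel_tps:.1f} passes/s")
```

Under these assumptions the ceiling is about 33 passes per second when the GPUs take turns and about 65 when both read concurrently, so the reported figures are plausibly already close to what the memory bandwidth allows without accepting multiple tokens per pass.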
performance, 3090, nvlink