Training
Strange pp and tg numbers on RX 7900 XTX with ROCm and Vulkan, Qwen3.6-27B non-MTP and MTP
The article discusses performance benchmarks for the Qwen3.6-27B model (27B parameters) running on AMD's RX 7900 XTX GPU with ROCm 7.2.4 and Vulkan 1.4.330. Without MTP, the reported figures are 238.42 tok/s for prompt processing (pp) and 26.84 tok/s for token generation (tg). Configurations with MTP (Multi-Token Prediction) performed worse, which is the unexpected result the title refers to, since MTP is normally used to speed up generation. These findings illustrate how hard it can be to tune LLM inference performance on this hardware with the ROCm and Vulkan backends.
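To put the two throughput figures in perspective, the following minimal sketch estimates end-to-end latency as prompt processing time plus generation time. The function name `estimate_latency_s` and the example workload (2048 prompt tokens, 256 generated tokens) are hypothetical and not from the article; only the default rates of 238.42 tok/s and 26.84 tok/s come from the reported non-MTP run. It also assumes throughput stays constant across context lengths, which real runs generally do not guarantee.

```python
def estimate_latency_s(prompt_tokens, generated_tokens,
                       pp_tok_s=238.42, tg_tok_s=26.84):
    """Rough request latency in seconds.

    Defaults are the non-MTP pp/tg rates quoted in the summary.
    Assumes constant throughput regardless of context length.
    """
    if pp_tok_s <= 0 or tg_tok_s <= 0:
        raise ValueError("throughput must be positive")
    prompt_time = prompt_tokens / pp_tok_s        # time to ingest the prompt
    generation_time = generated_tokens / tg_tok_s  # time to emit new tokens
    return prompt_time + generation_time


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    total = estimate_latency_s(prompt_tokens=2048, generated_tokens=256)
    print(f"estimated latency: {total:.1f} s")
```

Because generation is roughly nine times slower per token than prompt processing in these numbers, tg dominates latency for any response of meaningful length, which is why a regression under MTP matters.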
qwen training performance