Inference
ROCm vs Vulkan vs vLLM on Dual R9700s
The article benchmarks the Qwen 3.6 models (35B-A3B and 27B) on dual R9700 GPUs across three backends: ROCm, Vulkan, and vLLM. vLLM came out ahead, reaching up to 156 tokens per second (t/s) on the 35B-A3B model when run on ROCm with AITER, versus 106 t/s with ROCm and 87 t/s with Vulkan. This suggests vLLM may be the more efficient choice for practitioners who want higher throughput and better concurrency from this hardware.
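To put the reported 35B-A3B figures in relative terms, the short Python sketch below computes how much faster vLLM (ROCm + AITER) is than each of the other two backends. It only uses the three numbers quoted above. The summary does not say whether they are single-request or aggregate throughput under concurrency, so the ratios compare the figures as reported and nothing more. The names `REPORTED_TPS` and `relative_speedups` are illustrative, not taken from the article.

```python
# Reported throughput for Qwen 3.6 35B-A3B on dual R9700s (tokens per second),
# copied from the summary above. Illustrative data structure, not from the article.
REPORTED_TPS = {
    "vLLM (ROCm + AITER)": 156.0,
    "ROCm": 106.0,
    "Vulkan": 87.0,
}


def relative_speedups(results: dict[str, float], reference: str) -> dict[str, float]:
    """Return the ratio of the reference backend's throughput to every other backend's."""
    ref = results[reference]
    return {name: ref / tps for name, tps in results.items() if name != reference}


if __name__ == "__main__":
    for name, ratio in relative_speedups(REPORTED_TPS, "vLLM (ROCm + AITER)").items():
        # A ratio of 1.47 means 47% more tokens per second than that backend.
        print(f"vLLM vs {name}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% higher throughput)")
```

Run as a script, this prints a speedup of about 1.47x (47%) over ROCm and about 1.79x (79%) over Vulkan.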