Inference
A100 slow Qwen3.6-27B-FP8
Benchmarks of the Qwen3.6-27B-FP8 model on an NVIDIA A100 80GB GPU showed a decode rate of 43 tokens per second (tps) for a single request and 177 tps for eight concurrent requests. The same model configuration on an RTX 6000 PRO reached 130 tps for a single request and 509 tps for concurrent requests, roughly three times the A100's throughput in both cases. The gap suggests that FP8 models may need additional tuning on the A100 and raises the question of how efficiently the hardware is being used for this workload.
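To put the reported figures side by side, the short sketch below computes the gap between the two GPUs and how much each one gains from concurrency. It uses only the four numbers quoted above. The dictionary layout and the helper name `ratio` are illustrative choices, and treating the concurrent figures as aggregate throughput across all requests is an assumption, since the summary does not say how they were measured.

```python
# Decode throughput in tokens per second, as quoted in the summary.
# Assumption: the "concurrent" values are aggregate throughput across requests.
throughput = {
    "A100 80GB": {"single": 43, "concurrent": 177},
    "RTX 6000 PRO": {"single": 130, "concurrent": 509},
}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


a100 = throughput["A100 80GB"]
rtx = throughput["RTX 6000 PRO"]

# How much faster the RTX 6000 PRO is than the A100 in each scenario.
single_gap = ratio(rtx["single"], a100["single"])
concurrent_gap = ratio(rtx["concurrent"], a100["concurrent"])

# How much each GPU gains when moving from one request to concurrent requests.
a100_scaling = ratio(a100["concurrent"], a100["single"])
rtx_scaling = ratio(rtx["concurrent"], rtx["single"])

print(f"RTX vs A100, single request: {single_gap}x")
print(f"RTX vs A100, concurrent:     {concurrent_gap}x")
print(f"A100 concurrency scaling:    {a100_scaling}x")
print(f"RTX concurrency scaling:     {rtx_scaling}x")
```

Both GPUs gain about 4x from concurrency, so the A100's shortfall appears as a roughly constant factor of about 3x rather than a problem that shows up only under load.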
qwen, performance