Inference · Hacker News · 7 d ago · ▲ 291 · 108 comments

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

The article describes a two-GPU setup, an RTX 5080 paired with an RTX 3090, that reaches 80 tokens per second (tok/s) on Qwen 3.6 27B, a 27-billion-parameter model, at 8-bit quantization (Q8). The figure matters to practitioners because it suggests that a pair of high-end consumer GPUs can run a model of this size at interactive speeds, which may inform hardware choices for local LLM inference.
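As a rough illustration of why a model of this size would be split across two cards, the sketch below estimates the weight footprint from the parameter count and bit width in the title. The VRAM capacities (16 GB for the RTX 5080, 24 GB for the RTX 3090) are the cards' published specifications rather than figures from the article, and the estimate assumes exactly 8 bits per weight, ignoring the KV cache, activations, and the extra scale data that real Q8 formats store.

```python
# Back-of-the-envelope weight-memory estimate (a sketch, not a measurement).
PARAMS = 27e9          # 27B parameters, from the title
BITS_PER_WEIGHT = 8    # nominal Q8; assumption: per-block scale overhead is ignored

# Assumption: published VRAM capacities, not taken from the article.
VRAM_GB = {"RTX 5080": 16, "RTX 3090": 24}


def weight_gb(params: float, bits: float) -> float:
    """Return weight storage in decimal gigabytes."""
    return params * bits / 8 / 1e9


weights = weight_gb(PARAMS, BITS_PER_WEIGHT)
total_vram = sum(VRAM_GB.values())

print(f"weights: {weights:.1f} GB")
for name, gb in VRAM_GB.items():
    print(f"{name}: {gb} GB, fits alone: {weights <= gb}")
print(f"combined: {total_vram} GB, headroom: {total_vram - weights:.1f} GB")
```

Under these assumptions the weights alone exceed either card's memory but fit in the combined pool, with the remaining headroom shared by the KV cache and runtime overhead.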

