CodingReddit r/LocalLLaMA — 15 d ago

Best Settings for 48GB VRAM + Qwen 3.6 27B

The article discusses optimal settings for running the Qwen 3.6 27B model on a dual-GPU setup comprising an RTX 4090 and RTX 3090, totaling 48GB of VRAM. Key configurations include using Q8_0 quantization, tensor split mode, 250k context length, and enabling speculative decoding with draft MTP, achieving performance metrics of 75-100t/s token generation and 1500 tokens per request. These settings are significant for practitioners as they maximize resource utilization and performance in high-demand AI applications.

qwenllmvramsettingsrelevance 0.00 · engagement 0.00

Read at source ↗← all news