Coding · Reddit r/LocalLLaMA · 15 d ago

7900XTX 24GB vram, can finally fit Q6K+MTP with Qwen 3.6 27B at 131k context

The post reports running Qwen 3.6 27B (Q6K weights with MTP) at a 131k-token context on a 7900XTX with 24GB of VRAM. Two memory-saving steps make it fit: the system boots with integrated graphics, which presumably keeps the desktop off the 7900XTX and leaves its VRAM for the model, and the KV cache is quantized to Q5_0/Q4_0. The setup reaches approximately 55-60 tokens per second and uses 12% less VRAM compared to Q8. For practitioners, it is a practical recipe for long contexts on a single 24GB card.
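To get a feel for why KV cache quantization matters at this context length, here is a rough sizing sketch. The bytes-per-block figures are the standard ggml layouts for f16, Q8_0, Q5_0 and Q4_0 (32 elements per block). Everything about the model shape (`N_LAYERS`, `N_KV_HEADS`, `HEAD_DIM`) is a hypothetical placeholder, not the real Qwen 3.6 27B configuration, and reading "Q5_0/Q4_0" as Q5_0 for keys and Q4_0 for values is also an assumption. The result covers the KV cache only, so it is not expected to match the post's 12% figure.

```python
# Rough KV cache sizing for quantized cache types.
# Bytes per 32-element block in ggml: f16 = 2 bytes/element,
# q8_0 = 34, q5_0 = 22, q4_0 = 18 (quantized values plus per-block scale).
BLOCK_SIZE = 32
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q5_0": 22, "q4_0": 18}


def bytes_per_element(cache_type):
    """Average storage cost of one cached value for the given type."""
    return BLOCK_BYTES[cache_type] / BLOCK_SIZE


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, type_k, type_v):
    """Total K plus V cache size in bytes for a full context."""
    # Number of elements stored for K (and, separately, for V).
    elements = n_ctx * n_layers * n_kv_heads * head_dim
    return elements * (bytes_per_element(type_k) + bytes_per_element(type_v))


# HYPOTHETICAL model shape, for illustration only.
N_CTX = 131072      # 131k context, as in the post
N_LAYERS = 16       # assumed number of layers that keep a KV cache
N_KV_HEADS = 4      # assumed
HEAD_DIM = 256      # assumed

GIB = 2 ** 30
for type_k, type_v in [("f16", "f16"), ("q8_0", "q8_0"), ("q5_0", "q4_0")]:
    size = kv_cache_bytes(N_CTX, N_LAYERS, N_KV_HEADS, HEAD_DIM, type_k, type_v)
    print(f"K={type_k:5s} V={type_v:5s} -> {size / GIB:.2f} GiB")
```

Swapping in the real layer count, KV head count and head dimension from the model's metadata gives an estimate to compare against the VRAM left over after loading the Q6K weights.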

