Training · Reddit r/LocalLLaMA · 12 d ago

UPDATE: Qwen-27B-IQ4_KS and Qwen-27B-IQ_KS_KT for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

The release adds two new quantized GGUF files, Qwen3.6-27B.i1-IQ4_KS-attn_qkv-IQ4_KS.gguf and Qwen3.6-27B.i1-IQ4_KS_KT-attn_qkv-IQ4_KS.gguf, both targeting NVIDIA GPUs with 16GB of VRAM. The two files reach a perplexity (PPL) of about 7.41, so their quality by this measure is essentially the same. The difference is that the second file also uses trellis quantization, applied only to tensors whose weights follow an approximately Gaussian distribution. The release is aimed at users running a 27B model on a single 16GB GPU, particularly for coding tasks.
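The two numbers in the summary, a PPL of about 7.41 and a 16GB VRAM budget, can be made concrete with a little arithmetic. The Python sketch below is illustrative only: `perplexity` and `weight_gib` are hypothetical helper names, 27e9 is the nominal "27B" parameter count, and 4.25 bits per weight is an assumed average (the actual files mix quantization types, so their true average will differ). Perplexity is computed in the standard way, as the exponential of the mean per-token negative log-likelihood; the size estimate counts quantized weights only.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def weight_gib(n_params, bits_per_weight):
    """Approximate size of quantized weights in GiB.

    Ignores KV cache, compute buffers, and file metadata.
    """
    return n_params * bits_per_weight / 8 / 2**30


# A PPL of ~7.41 corresponds to a mean NLL of ln(7.41) nats per token.
print(round(math.log(7.41), 3))  # 2.003

# Hypothetical: 27e9 parameters at an ASSUMED average of 4.25 bits per weight.
print(round(weight_gib(27e9, 4.25), 2))  # 13.36
```

Under these assumptions the weights alone take about 13.4 GiB, which leaves roughly 2.6 GiB on a card exposing a full 16 GiB for the KV cache and compute buffers. This is why the choice of quantization type matters so much at this model size.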

qwen · quantization · nvidia · relevance 0.00 · engagement 0.00
