Training
I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT
The analysis maps Kullback-Leibler divergence (KLD) across key-value (KV) cache quantization settings for Qwen3.6-35B-A3B and Gemma4-E2B QAT. It finds that q8/q8 is nearly lossless for both models, while q4/q4 works well for Qwen but is detrimental for Gemma. Turbo quantization methods compress the cache significantly, though with performance trade-offs. The results help practitioners balance model quality against resource use when deploying LLMs.
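For readers unfamiliar with the metric, the following is a minimal sketch of how a per-token KLD between a baseline run and a quantized-cache run could be computed. It assumes the baseline is the same model with an unquantized KV cache and that both runs expose raw logits for the same token positions. The summary does not describe the author's actual measurement setup, and the function names here are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(base_logits, quant_logits):
    """KL(P_base || P_quant) in nats for a single token position.

    base_logits: logits from the reference run (assumed unquantized cache).
    quant_logits: logits from the run with a quantized KV cache.
    """
    if len(base_logits) != len(quant_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(base_rows, quant_rows):
    """Average the per-token KLD over all evaluated positions."""
    if len(base_rows) != len(quant_rows) or not base_rows:
        raise ValueError("need the same non-zero number of positions")
    klds = [token_kld(b, q) for b, q in zip(base_rows, quant_rows)]
    return sum(klds) / len(klds)
```

A mean KLD near zero means the quantized run reproduces the baseline's next-token distribution almost exactly, which is the sense in which a setting can be called nearly lossless. Larger values indicate the quantized cache is shifting the model's predictions.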
kv-cache quantization qwen