Training
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
The article introduces Averis, a mean-residual splitting quantization method for FP4 training of large language models. It targets activation outliers that stem from a coherent rank-one mean bias by separating the mean component from the activations before quantization. This makes training of Qwen3 models more robust, reducing the loss gaps to 1.19% and 0.81% and improving on NVIDIA's Hadamard-based method. Averis is also hardware-efficient, adding only 2.20% overhead over vanilla NVFP4, which makes it a practical option for practitioners working on low-bit quantization of LLMs.
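To make the mean-residual idea concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes the rank-one mean bias is the per-channel mean over tokens (rows), keeps that mean in full precision, and quantizes only the residual. FP4 is simulated with the E2M1 value grid and one scale per 16-element block, a simplification of NVFP4 (which stores block scales in FP8 E4M3 plus a per-tensor FP32 scale). The function names `quantize_fp4_blockwise` and `mean_residual_quantize` are hypothetical, and the data is synthetic.

```python
import numpy as np

# FP4 E2M1 representable magnitudes (the element format used by NVFP4).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # one shared scale per 16 consecutive elements along a row


def quantize_fp4_blockwise(x: np.ndarray) -> np.ndarray:
    """Simulated ("fake") FP4 quantization with per-block absmax scaling.

    Simplification (assumption): scales stay in float64 instead of FP8.
    """
    rows, cols = x.shape
    assert cols % BLOCK == 0, "row length must be a multiple of the block size"
    blocks = x.reshape(rows, cols // BLOCK, BLOCK)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # Map each block's largest magnitude onto the largest FP4 value (6.0).
    scale = np.where(amax > 0, amax / FP4_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scale
    # Round every magnitude to the nearest FP4 grid point, then restore sign and scale.
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(blocks) * FP4_GRID[idx] * scale
    return q.reshape(rows, cols)


def mean_residual_quantize(x: np.ndarray) -> np.ndarray:
    """Hypothetical mean-residual split: X ~ 1 mu^T + Q(X - 1 mu^T)."""
    mu = x.mean(axis=0, keepdims=True)  # (1, channels): the rank-one mean term
    residual = x - mu                   # token-varying part, free of the shared bias
    return mu + quantize_fp4_blockwise(residual)


rng = np.random.default_rng(0)
tokens, channels = 256, 64
# Small token-varying signal plus a large mean shared by every token in a few
# channels, i.e. a coherent rank-one bias that shows up as outlier channels.
signal = rng.normal(0.0, 1.0, size=(tokens, channels))
bias = np.zeros(channels)
bias[::BLOCK] = 40.0  # one large-mean channel inside each 16-element block
x = signal + bias

err_plain = np.linalg.norm(quantize_fp4_blockwise(x) - x)
err_split = np.linalg.norm(mean_residual_quantize(x) - x)
print(f"plain FP4 error: {err_plain:.2f}   mean-residual FP4 error: {err_split:.2f}")
```

In the plain case, each block's scale is set by the 40-valued channel, so most of the small signal values round to zero; after the split, block scales follow the residual alone and the error drops sharply. Because the split is exact before quantization, a downstream matmul decomposes as `X @ W = 1·(mu @ W) + R @ W`, so in this sketch the mean path costs one extra vector-matrix product; the summary does not say how Averis itself handles that term.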
quantization, llm, fp4