Training
Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
This study introduces UFP4, a uniform 4-bit pretraining recipe for large language models (LLMs) that addresses the Shrinkage Bias inherent in non-uniform formats such as E2M1. UFP4 applies the Random Hadamard Transform (RHT) to all training GEMMs while limiting stochastic rounding. On Dense 1.5B, MoE 7.9B, and MoE 124B models, it shows less loss degradation relative to BF16 than E2M1-based methods. The authors argue that future hardware should support uniform 4-bit grids, which can improve quantization quality and training stability.
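The building blocks named in the summary can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the UFP4 recipe itself: the E2M1 magnitudes are the standard FP4 values, while `UNIFORM_GRID`, the helper names, and the choice of 8 evenly spaced magnitudes on the same range are hypothetical and chosen only for illustration. It does not attempt to reproduce the Shrinkage Bias analysis.

```python
import math
import random

# Non-negative magnitudes representable in FP4 E2M1 (the sign bit is separate).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Hypothetical uniform 4-bit grid: 8 evenly spaced magnitudes on the same range.
UNIFORM_GRID = [6.0 * i / 7 for i in range(8)]


def round_nearest(x, grid):
    """Round |x| to the closest grid magnitude and restore the sign."""
    mag = min(grid, key=lambda g: abs(g - abs(x)))
    return math.copysign(mag, x)


def round_stochastic(x, grid, rng):
    """Round |x| up or down at random so the result is unbiased in expectation."""
    a = min(abs(x), grid[-1])  # clip to the largest representable magnitude
    lo = max(g for g in grid if g <= a)
    hi = min(g for g in grid if g >= a)
    if hi == lo:
        return math.copysign(lo, x)
    p_up = (a - lo) / (hi - lo)  # closer to hi means more likely to round up
    return math.copysign(hi if rng.random() < p_up else lo, x)


def hadamard_transform(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    out = list(v)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def random_hadamard_transform(v, signs):
    """Flip signs with a random +/-1 vector, then apply the Hadamard transform."""
    return hadamard_transform([s * x for s, x in zip(signs, v)])
```

Because the transform is orthonormal, it preserves the vector norm while spreading a single large entry across all coordinates, which is the usual motivation for applying it before low-bit quantization. Comparing `round_nearest(x, E2M1_GRID)` with `round_nearest(x, UNIFORM_GRID)` over a range of inputs shows how the two grids place their rounding boundaries differently.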
llm, pretraining, shrinkage bias