Inference
LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
The paper introduces LC-QAT, a 2-bit quantization-aware training (QAT) framework for large language models. It uses linear-constrained vector quantization, in which the quantized weights are optimized through a learned affine mapping, so training is end-to-end differentiable and needs no discrete codebook lookups. The approach is also data-efficient: it reaches a strong post-training initialization with only 0.1% to 10% of the training data. The authors report that LC-QAT outperforms existing QAT methods, which makes it a practical candidate for deploying extremely low-bit models.
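To make the idea of an affine-constrained codebook concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the summary does not give the exact parameterization, so this assumes each group of d weights is reconstructed as w = A z + b, where z holds 2-bit integer codes in {0, 1, 2, 3} and A, b are the learned affine parameters. The names lc_quantize, lc_dequantize, A, and b are hypothetical, and code assignment uses simple rounding rather than an exact nearest-codeword search.

```python
import numpy as np

def lc_dequantize(codes, A, b):
    """Reconstruct weights from integer codes: W_hat = Z @ A.T + b (assumed form)."""
    return codes.astype(np.float64) @ A.T + b

def lc_quantize(W, A, b):
    """Assign 2-bit codes by inverting the affine map and rounding to {0,1,2,3}.
    This is a rounding heuristic, not an exact nearest-codeword search."""
    z = (W - b) @ np.linalg.inv(A).T
    return np.clip(np.rint(z), 0, 3).astype(np.int64)

def recon_loss(W, codes, A, b):
    """Squared reconstruction error, 0.5 * ||W_hat - W||^2."""
    return 0.5 * np.sum((lc_dequantize(codes, A, b) - W) ** 2)

rng = np.random.default_rng(0)
n, d = 8, 4                      # 8 weight vectors of dimension 4 (toy sizes)
W = rng.normal(size=(n, d))

# Initialize the affine map as a uniform min-max grid (an assumption for this demo).
scale = (W.max() - W.min()) / 3.0
A = np.eye(d) * scale
b = np.full(d, W.min())

codes = lc_quantize(W, A, b)
W_hat = lc_dequantize(codes, A, b)
loss_before = recon_loss(W, codes, A, b)

# With the codes held fixed, the reconstruction is linear in A and b, so
# gradients are plain matrix products and no codebook lookup is involved.
G = W_hat - W                    # dL/dW_hat
grad_A = G.T @ codes             # dL/dA
grad_b = G.sum(axis=0)           # dL/db
lr = 1e-3
A_new = A - lr * grad_A
b_new = b - lr * grad_b
loss_after = recon_loss(W, codes, A_new, b_new)
```

In this sketch the codes stay fixed during the gradient step, and only A and b move. A full QAT loop would also have to handle code reassignment and the task loss, which the summary does not describe.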
quantization, LLM, training