Training
UniSVQ: 2-bit Unified Scalar-Vector Quantization
UniSVQ is a newly proposed 2-bit quantization framework that unifies scalar and vector quantization by parameterizing codewords as an affine transform of integer lattices. A block-wise fine-tuning strategy minimizes the quantization reconstruction error, and the reported results surpass state-of-the-art scalar quantization methods and match advanced vector quantization techniques across several large language model families. For practitioners, the framework offers low-cost deployment with improved inference throughput, which makes it easier to run LLMs in resource-constrained environments.
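To make the "affine transform of integer lattices" idea concrete, here is a minimal sketch and not the UniSVQ implementation. It assumes groups of two weights, 2-bit integer codes per weight, and codewords of the form c = A z + b. The matrices `A_scalar` and `A_vector`, the offset `b`, the helper names, and the brute-force nearest-codeword search are all hypothetical choices made for illustration. A diagonal `A` reduces to ordinary per-channel scalar quantization, while a full `A` couples the coordinates the way a vector codebook does.

```python
from itertools import product


def make_codebook(A, b, bits=2):
    """Enumerate codewords c = A z + b for every integer point z in {0..2^bits-1}^d.

    A (d x d nested list) and b (length-d list) are illustrative parameters,
    not values taken from the UniSVQ paper.
    """
    d = len(b)
    levels = range(2 ** bits)
    codebook = []
    for z in product(levels, repeat=d):
        c = tuple(sum(A[i][j] * z[j] for j in range(d)) + b[i] for i in range(d))
        codebook.append((z, c))
    return codebook


def quantize(w, codebook):
    """Return the (integer code, codeword) pair closest to w in squared L2 distance."""
    def sq_dist(c):
        return sum((wi - ci) ** 2 for wi, ci in zip(w, c))
    return min(codebook, key=lambda zc: sq_dist(zc[1]))


# Assumed parameters for a 2-dimensional weight group.
b = [-0.75, -0.75]
A_scalar = [[0.5, 0.0], [0.0, 0.5]]   # diagonal: independent uniform grids (scalar case)
A_vector = [[0.5, 0.25], [0.0, 0.5]]  # off-diagonal term: a skewed lattice (vector case)

w = [0.3, -0.6]
z_s, c_s = quantize(w, make_codebook(A_scalar, b))
z_v, c_v = quantize(w, make_codebook(A_vector, b))
print("scalar-style code:", z_s, "->", c_s)
print("vector-style code:", z_v, "->", c_v)
```

In this toy setting only the integer codes `z` plus the small `A` and `b` need to be stored per group. Fitting `A` and `b` to reduce reconstruction error, which the summary attributes to block-wise fine-tuning, is outside the scope of the sketch.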
quantization, llm, inference