Inference
LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models
LLMCodec is a compression method for large language models (LLMs) that repurposes video codecs, specifically VVC/H.266, to compress model weights without fine-tuning or calibration data. The method integrates affine quantization into this codec-based pipeline. On LLaMA-3-8B at 2-bit precision, it reports a more than 1.5x reduction in perplexity and a 21% increase in downstream task accuracy. Because it needs neither retraining nor calibration data, the approach gives practitioners a robust, general way to reduce model storage and ease deployment while preserving model quality.
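The summary names affine quantization without describing it. As a point of reference, here is a minimal sketch of textbook per-tensor affine (asymmetric) quantization: each weight is mapped to an unsigned integer code using a scale and a zero-point, and recovered approximately as `scale * (q - zero_point)`. This is the generic scheme, not LLMCodec's implementation; how the paper combines it with the VVC encoder is not covered here. The function names `affine_quantize` and `affine_dequantize`, and the choice to widen the range to include 0.0, are illustrative assumptions.

```python
from typing import List, Tuple


def affine_quantize(weights: List[float], bits: int) -> Tuple[List[int], float, int]:
    """Generic per-tensor affine quantization (illustrative, not LLMCodec's code).

    Returns integer codes in [0, 2**bits - 1], the scale, and the zero-point.
    """
    qmax = 2 ** bits - 1
    # Assumption: widen the range to include 0.0 so zero stays exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Step size between adjacent quantization levels (guard against an all-zero tensor).
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    # Integer code that represents 0.0; lies in [0, qmax] because 0.0 is in range.
    zero_point = round(-w_min / scale)
    # Round to the nearest level, shift by the zero-point, and clamp to the code range.
    codes = [min(max(round(w / scale) + zero_point, 0), qmax) for w in weights]
    return codes, scale, zero_point


def affine_dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Approximately reconstruct the original weights from their integer codes."""
    return [scale * (q - zero_point) for q in codes]


if __name__ == "__main__":
    w = [-1.0, -0.25, 0.5, 2.0]
    q, s, z = affine_quantize(w, bits=2)
    print(q, s, z)                          # [0, 1, 1, 3] 1.0 1
    print(affine_dequantize(q, s, z))       # [-1.0, 0.0, 0.0, 2.0]
```

At 2 bits every weight collapses onto one of only four levels, so the reconstruction error per weight is bounded by about half a step (`scale / 2`). That coarseness is why 2-bit results are a demanding benchmark for any weight-compression method.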
compression, llm, video-codecs