Inference · arXiv cs.AI · 15 d ago

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

LLMCodec is a compression method for large language models (LLMs) that applies a video codec, specifically VVC/H.266, to model weights, with no fine-tuning or calibration data required. The method integrates affine quantization and, on LLaMA-3-8B at 2-bit precision, is reported to achieve over 1.5x lower perplexity and 21% higher downstream task accuracy. For practitioners constrained by model storage and deployment, it offers a robust and generalizable way to shrink LLMs while maintaining performance.
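For readers unfamiliar with the term, affine quantization maps floating-point values to integer codes using a scale and an offset. The sketch below shows only that generic idea, under the assumption that weights are mapped to an 8-bit range like the integer samples a video codec operates on. It is not LLMCodec's actual pipeline, and the function names `affine_quantize` and `affine_dequantize` are hypothetical.

```python
def affine_quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] using a scale and offset."""
    qmax = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant input, where the value range would be zero.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    # Round to the nearest code and clamp into the valid range.
    codes = [min(qmax, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def affine_dequantize(codes, scale, offset):
    """Invert the mapping: each code becomes code * scale + offset."""
    return [c * scale + offset for c in codes]


# Illustrative values only, not taken from any real model.
weights = [-0.42, -0.10, 0.0, 0.07, 0.31, 0.58]
codes, scale, offset = affine_quantize(weights, bits=8)
restored = affine_dequantize(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With rounding to the nearest code, the reconstruction error per weight is bounded by half the scale, which is why a narrower value range or more bits gives a tighter approximation.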

compression · llm · video-codecs
