LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LLMCodec adapts video codecs to compress the weights of large language models (LLMs), targeting the storage and deployment costs of these models. It combines affine quantization with the VVC/H.266 video codec, relying on two codec properties: they operate natively on matrix-structured data, and their compression level is configurable. The framework uses a learnable affine transformation to mitigate outliers, then maps the transformed weights to YUV420 format for compression via VVenC with an All-Intra profile. Experiments indicate that the method is robust and general, particularly at low-bit precision. For LLaMA-3-8B at 2-bit precision, it lowers perplexity by a factor of more than 1.5 and improves downstream task accuracy by 21% compared to FlatQuant. It also shows consistent gains on LLaMA-2-7B and Qwen-2.5-Instruct-7B, with up to a 36% perplexity reduction on WikiText2.

Key takeaway

If you deploy large language models and are struggling with memory constraints or high inference costs, LLMCodec is worth evaluating. Its reported gains are largest at ultra-low bit-widths: for LLaMA-3-8B at 2-bit precision, it lowers perplexity by a factor of more than 1.5 and raises downstream accuracy by 21% compared to FlatQuant, while LLaMA-2-7B and Qwen-2.5-Instruct-7B see up to a 36% perplexity reduction on WikiText2. Video codec-based compression is a practical route to more efficient and scalable LLM deployment.

Key insights

Video codecs, combined with outlier mitigation, offer superior LLM weight compression, especially at ultra-low bit-widths.

Principles

Method

LLMCodec applies a learnable affine transformation to mitigate weight outliers, then quantizes the FP32 weights to INT8 using round-to-nearest (RTN). The quantized weights are mapped to YUV420 video sequences and compressed with the VVC/H.266 encoder VVenC using an All-Intra profile.
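
The quantize-and-map steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names are hypothetical, the learnable affine transformation is omitted, and the frame layout (weights in the luma plane, chroma planes held at a constant 128) is an assumption made here for illustration, since the summary does not specify how weights are arranged inside a YUV420 frame. The encoder call is left out; the resulting raw bytes could be written to a .yuv file and passed to a VVC encoder.

```python
import numpy as np


def rtn_quantize_uint8(w):
    """Affine round-to-nearest (RTN) quantization of a float matrix to 8 bits.

    Returns the uint8 codes plus the scale and offset needed to dequantize.
    """
    w = np.asarray(w, dtype=np.float32)
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:  # constant matrix: avoid division by zero
        scale = 1.0
    q = np.clip(np.rint((w - w_min) / scale), 0, 255).astype(np.uint8)
    return q, scale, w_min


def dequantize(q, scale, offset):
    """Invert rtn_quantize_uint8 up to rounding error."""
    return q.astype(np.float32) * scale + offset


def to_yuv420_frame(q):
    """Pack a uint8 matrix into one planar YUV420 frame (assumed layout).

    Y holds the quantized weights; U and V are subsampled 2x2 and filled
    with the neutral value 128. Odd dimensions are padded by edge repetition
    because 4:2:0 subsampling needs even width and height.
    """
    h, w = q.shape
    y = np.pad(q, ((0, h % 2), (0, w % 2)), mode="edge")
    chroma_shape = (y.shape[0] // 2, y.shape[1] // 2)
    u = np.full(chroma_shape, 128, dtype=np.uint8)
    v = np.full(chroma_shape, 128, dtype=np.uint8)
    return y.tobytes() + u.tobytes() + v.tobytes(), y.shape


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((5, 7)).astype(np.float32)
    q, scale, offset = rtn_quantize_uint8(weights)
    frame, frame_shape = to_yuv420_frame(q)
    err = np.max(np.abs(dequantize(q, scale, offset) - weights))
    print("frame shape:", frame_shape, "bytes:", len(frame))
    print("max reconstruction error:", err, "scale:", scale)
```

A YUV420 frame stores 1.5 bytes per luma sample, so under this layout the constant chroma planes add a fixed raw-size overhead before the codec is applied. A denser packing that also places weights in the chroma planes is possible, but it is a design choice this sketch does not attempt.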

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.