SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices
Summary
SigmaScale is a method for Large Language Model (LLM) compression that augments truncated Singular Value Decomposition (SVD) with learned auxiliary scaling matrices S. Instead of deriving the scaling analytically, SigmaScale optimizes diagonal row and column scaling matrices against an activation-aware compression loss. The learned scaling reduces the intrinsic rank of the weight matrices, as measured by lower effective-rank entropy, and that reduction correlates strongly with lower compression loss. In experiments on Llama 3.1 8B Instruct and Qwen3-8B, SigmaScale is competitive with existing SVD-based compression methods such as SVD-LLM and ASVD+ on perplexity and zero-shot benchmarks. It is most effective in mild-to-moderate compression regimes (0.90x and 0.75x parameter retention); performance degrades sharply at an aggressive 0.50x retention.
Key takeaway
For Machine Learning Engineers optimizing LLM deployment costs, SigmaScale is a viable option for SVD-based model compression. If your application tolerates mild-to-moderate compression (roughly 0.75x to 0.90x parameter retention), SigmaScale's learned scaling matrices can improve perplexity and zero-shot performance relative to analytical SVD methods. Performance degrades significantly under aggressive compression (0.50x retention), so evaluate the method against your specific compression target before adopting it.
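To make the retention ratios concrete, here is a minimal sketch of how a parameter-retention target translates into a truncation rank for a single weight matrix. It assumes the common convention that a rank-k factorization of an m x n matrix stores k(m + n) parameters; the function name `rank_for_retention` is hypothetical, and the source's sensitivity probing presumably assigns different ranks per layer rather than one uniform value.

```python
def rank_for_retention(m, n, retention):
    """Largest rank k with k * (m + n) <= retention * m * n.

    Assumes the factorization W ~ A @ B with A (m, k) and B (k, n),
    so the compressed layer stores k * (m + n) parameters.
    """
    return int(retention * m * n // (m + n))


# A 4096 x 4096 projection at the three retention levels mentioned above.
for retention in (0.90, 0.75, 0.50):
    print(retention, rank_for_retention(4096, 4096, retention))
# 0.9  -> 1843
# 0.75 -> 1536
# 0.5  -> 1024
```

Under this convention a square matrix only starts saving parameters once the rank drops below half its dimension, so even 0.90x retention already discards more than half of the singular directions.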
Key insights
SigmaScale learns activation-aware scaling matrices to reduce effective intrinsic rank, improving SVD-based LLM compression performance.
Principles
- Learned scaling lowers effective intrinsic rank.
- Lower effective rank correlates with lower compression loss.
- SVD compression benefits from activation-aware transformations.
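The rank principle above can be measured directly. The sketch below computes one standard notion of effective rank, the exponential of the Shannon entropy of the normalized singular values. Whether the paper uses exactly this definition is an assumption; the summary only refers to "effective-rank entropy", and the function names here are illustrative.

```python
import numpy as np


def effective_rank_entropy(W):
    """Shannon entropy (in nats) of the normalized singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so that 0 * log(0) counts as 0
    return float(-(p * np.log(p)).sum())


def effective_rank(W):
    """Effective rank as exp(entropy); ranges from 1 up to min(W.shape)."""
    return float(np.exp(effective_rank_entropy(W)))


# Equal singular values give the maximum; a rank-1 matrix gives about 1.
print(effective_rank(np.eye(4)))
u = np.arange(1.0, 6.0)
print(effective_rank(np.outer(u, u)))
```

A scaling that concentrates the spectrum into fewer directions lowers this value, which is the sense in which learned scaling is said to make a weight matrix easier to truncate.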
Method
SigmaScale combines four steps: sensitivity probing to choose truncation ranks, learning diagonal row and column scaling matrices with an activation-aware loss, applying truncated SVD, and post-compression fine-tuning.
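The scaling and truncation steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the scaled matrix is diag(r) W diag(c), that the scaling is undone after truncation, and that the activation-aware loss is the squared output error on calibration inputs X. The scale vectors are passed in rather than learned; in SigmaScale they would be optimized against the loss, and the activation-RMS column scale in the demo is only a stand-in heuristic.

```python
import numpy as np


def scaled_truncated_svd(W, r, c, k):
    """Rank-k approximation of W computed in a row/column-scaled space.

    W: (m, n) weights; r: (m,) positive row scales; c: (n,) positive
    column scales. Returns A (m, k) and B (k, n) with W_hat = A @ B.
    """
    W_scaled = (r[:, None] * W) * c[None, :]  # diag(r) @ W @ diag(c)
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    A = (U[:, :k] * s[:k]) / r[:, None]  # undo the row scaling
    B = Vt[:k, :] / c[None, :]           # undo the column scaling
    return A, B


def activation_aware_loss(W, A, B, X):
    """Squared Frobenius error of layer outputs on calibration inputs X (N, n)."""
    return float(np.linalg.norm(X @ W.T - X @ (A @ B).T) ** 2)


# Demo on synthetic data with uneven input-feature magnitudes.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 48))
X = rng.normal(size=(256, 48)) * rng.uniform(0.1, 5.0, size=48)

ones_r, ones_c = np.ones(64), np.ones(48)
c_rms = np.sqrt((X ** 2).mean(axis=0))  # stand-in for a learned column scale

A0, B0 = scaled_truncated_svd(W, ones_r, ones_c, k=16)  # plain truncated SVD
A1, B1 = scaled_truncated_svd(W, ones_r, c_rms, k=16)   # scaled variant
print("unscaled loss:", activation_aware_loss(W, A0, B0, X))
print("scaled loss:  ", activation_aware_loss(W, A1, B1, X))
```

Because both scalings are diagonal and invertible, they change which directions the truncation keeps without adding parameters to the stored factors A and B.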
In practice
- Apply SigmaScale for mild-to-moderate LLM compression.
- Consider learned scaling for SVD-based methods.
- Use post-compression fine-tuning for realignment.
Topics
- LLM Compression
- Singular Value Decomposition
- Low-Rank Decomposition
- Scaling Matrices
- Llama 3.1 8B Instruct
- Qwen3-8B
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.