SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

SigmaScale is a method for compressing Large Language Models (LLMs) that augments truncated Singular Value Decomposition (SVD) with learned auxiliary scaling matrices S. Analytical approaches derive their scaling in closed form. SigmaScale instead optimizes diagonal row and column scaling vectors directly against an activation-aware compression loss. The learned scaling lowers the effective intrinsic rank of the weight matrices, measured as lower effective-rank entropy, and this reduction correlates strongly with lower compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with existing SVD-based compression methods such as SVD-LLM and ASVD+ on perplexity and zero-shot benchmarks. It is most effective in mild-to-moderate compression regimes, such as 0.90x and 0.75x parameter retention, but its performance degrades sharply at an aggressive 0.50x retention.
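
The scale-then-truncate pipeline can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The weight follows the y = x W^T convention. The activation-aware loss is assumed to be the relative Frobenius error of layer outputs on calibration activations X. The scales are "learned" with a simple accept-if-better random search on log-scales, which is a swapped-in stand-in for whatever optimizer the paper uses. The helper names scaled_truncated_svd, activation_loss, and learn_scales are hypothetical.

```python
import numpy as np


def scaled_truncated_svd(W, r, c, k):
    """Rank-k factors (A, B) with W ~= A @ B, truncating diag(r) @ W @ diag(c).

    r (shape m) and c (shape n) are positive row and column scales. Their inverses
    are folded back into the factors, so inference needs only A and B.
    """
    Ws = r[:, None] * W * c[None, :]             # diag(r) W diag(c) without dense diagonals
    U, S, Vt = np.linalg.svd(Ws, full_matrices=False)
    A = (U[:, :k] * S[:k]) / r[:, None]          # diag(1/r) U_k Sigma_k, shape (m, k)
    B = Vt[:k, :] / c[None, :]                   # V_k^T diag(1/c), shape (k, n)
    return A, B


def activation_loss(W, X, A, B):
    """Assumed loss: ||X W^T - X (A B)^T||_F^2 / ||X W^T||_F^2 on calibration inputs X."""
    ref = X @ W.T
    err = ref - X @ (A @ B).T
    return float(np.sum(err ** 2) / np.sum(ref ** 2))


def learn_scales(W, X, k, steps=200, step_size=0.05, seed=0):
    """Hypothetical stand-in optimizer: random search on log-scales that keeps a
    perturbation only when it lowers the activation-aware loss."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    log_r, log_c = np.zeros(m), np.zeros(n)       # start from identity scaling
    best = activation_loss(W, X, *scaled_truncated_svd(W, np.exp(log_r), np.exp(log_c), k))
    for _ in range(steps):
        cand_r = log_r + step_size * rng.standard_normal(m)
        cand_c = log_c + step_size * rng.standard_normal(n)
        loss = activation_loss(W, X, *scaled_truncated_svd(W, np.exp(cand_r), np.exp(cand_c), k))
        if loss < best:
            log_r, log_c, best = cand_r, cand_c, loss
    return np.exp(log_r), np.exp(log_c), best


rng = np.random.default_rng(1)
W = rng.standard_normal((32, 48))
X = rng.standard_normal((256, 48)) * np.linspace(0.1, 3.0, 48)  # uneven input channel magnitudes
k = 8
base = activation_loss(W, X, *scaled_truncated_svd(W, np.ones(32), np.ones(48), k))
r, c, learned = learn_scales(W, X, k)
print(f"identity scales: {base:.3f}  learned scales: {learned:.3f}")
```

Folding diag(1/r) and diag(1/c) into the factors means that, in this sketch, the scaling adds no parameters at inference time. Because the search starts from identity scales and accepts only improvements, the learned loss can never exceed the unscaled baseline.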

Key takeaway

For Machine Learning Engineers optimizing LLM deployment costs, SigmaScale is a practical option for SVD-based model compression. If your application tolerates mild-to-moderate compression (0.75x to 0.90x parameter retention), SigmaScale's learned scaling matrices can match or improve on the perplexity and zero-shot accuracy of analytical SVD methods. Performance degrades significantly under aggressive compression (0.50x retention), so evaluate the method against your specific compression targets before adopting it.
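
Here is how the retention ratios translate into ranks. A rank-k factorization of an m × n matrix stores k(m + n) parameters instead of mn, so a target retention ρ allows k ≤ ρ·mn/(m + n). The sketch below assumes the diagonal scales are folded into the factors, so they cost nothing, and it uses an illustrative 4096 × 4096 projection. The function names are hypothetical.

```python
def rank_for_retention(m, n, retention):
    """Largest rank k whose factors (m*k + k*n parameters) fit in retention * m * n.

    Assumes the diagonal scaling vectors are folded into the factors (no extra cost).
    """
    return int(retention * m * n // (m + n))


def retained_fraction(m, n, k):
    """Fraction of the dense parameter count kept by rank-k factors."""
    return k * (m + n) / (m * n)


for rho in (0.90, 0.75, 0.50):
    k = rank_for_retention(4096, 4096, rho)          # ranks 1843, 1536, 1024
    print(rho, k, retained_fraction(4096, 4096, k))
```

For this shape, factoring only saves parameters once k falls below mn/(m + n) = 2048. As a result, even the mild 0.90x target keeps 1843 of 4096 singular directions (about 45%), and 0.50x keeps 1024 (25%).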

Key insights

SigmaScale learns activation-aware scaling matrices to reduce effective intrinsic rank, improving SVD-based LLM compression performance.

Principles

Learned scaling lowers effective intrinsic rank.
Lower intrinsic rank correlates with lower compression loss.
SVD compression benefits from activation-aware transformations.
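
Effective-rank entropy can be computed from the singular values alone. This sketch assumes the common entropy-based definition of Roy and Vetterli. Under that definition, p_i = σ_i / Σ_j σ_j, the entropy is H = -Σ_i p_i log p_i, and the effective rank is exp(H). The paper may normalize differently, for example by using squared singular values. The function name effective_rank is hypothetical.

```python
import numpy as np


def effective_rank(W, eps=1e-12):
    """Return (effective rank, entropy) of matrix W.

    Assumed definition: p_i = sigma_i / sum_j sigma_j, H = -sum_i p_i log p_i,
    effective rank = exp(H).
    """
    s = np.linalg.svd(W, compute_uv=False)
    p = s / max(float(s.sum()), eps)
    p = p[p > eps]                      # drop zeros so that 0 * log 0 counts as 0
    entropy = float(-np.sum(p * np.log(p)))
    return float(np.exp(entropy)), entropy


print(effective_rank(np.eye(4)))                               # flat spectrum: about 4.0
print(effective_rank(np.diag([3.0, 1.0])))                     # skewed spectrum: between 1 and 2
print(effective_rank(np.outer([1.0, 2.0, 3.0], [1.0, 1.0])))   # rank one: about 1.0
```

A lower value means the spectrum is concentrated in fewer directions. Truncating to a fixed rank then tends to discard less of the matrix, which offers one intuitive reading of why lower effective-rank entropy and lower compression loss move together.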

Method

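SigmaScale proceeds in four stages: sensitivity probing to select truncation ranks; learning diagonal row and column scaling matrices by minimizing an activation-aware compression loss; applying truncated SVD to the scaled weights; and post-compression fine-tuning to realign the compressed model.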
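
The source does not detail how sensitivity probing sets truncation ranks, so the following is a hypothetical sketch of one simple approach, not SigmaScale's procedure. First, it probes each matrix's relative Frobenius error at several candidate ranks. That error is exact by the Eckart-Young theorem, and it is measured in weight space rather than with activations to keep the example short. Next, it greedily lowers the rank of whichever matrix loses the least accuracy per parameter saved, stopping once a global retention budget is met. probe_sensitivity and allocate_ranks are illustrative names.

```python
import numpy as np


def probe_sensitivity(W, ranks):
    """Relative Frobenius error of rank-k truncation for each candidate k (Eckart-Young)."""
    s2 = np.linalg.svd(W, compute_uv=False) ** 2
    total = s2.sum()
    return {k: float(s2[k:].sum() / total) for k in ranks}


def allocate_ranks(weights, ranks, retention):
    """Hypothetical greedy allocation: start every matrix at its largest candidate rank,
    then repeatedly step down the matrix with the cheapest error increase per parameter
    saved until the factored parameter count fits within retention * dense count."""
    ranks = sorted(ranks, reverse=True)
    probes = [probe_sensitivity(W, ranks) for W in weights]
    idx = [0] * len(weights)                       # position in `ranks` for each matrix
    dense = sum(W.size for W in weights)

    def params():
        return sum(ranks[i] * sum(W.shape) for i, W in zip(idx, weights))

    while params() > retention * dense:
        best, best_cost = None, np.inf
        for j, W in enumerate(weights):
            if idx[j] + 1 >= len(ranks):
                continue                           # already at the smallest candidate rank
            k_hi, k_lo = ranks[idx[j]], ranks[idx[j] + 1]
            d_err = probes[j][k_lo] - probes[j][k_hi]
            d_params = (k_hi - k_lo) * sum(W.shape)
            if d_err / d_params < best_cost:
                best, best_cost = j, d_err / d_params
        if best is None:
            raise ValueError("retention target unreachable with these candidate ranks")
        idx[best] += 1
    return [ranks[i] for i in idx]


rng = np.random.default_rng(0)
n = 64
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
W_decaying = Q1 @ np.diag(0.5 ** np.arange(n)) @ Q2.T   # fast-decaying spectrum: easy to truncate
W_random = rng.standard_normal((n, n))                   # spread-out spectrum: sensitive to truncation
ks = allocate_ranks([W_decaying, W_random], ranks=[4, 8, 12, 16, 20, 24, 28], retention=0.5)
print(ks)  # the fast-decaying matrix receives the smaller rank
```

The greedy rule spends the parameter budget where truncation hurts most. A real probe would more likely use the activation-aware loss and per-layer quality metrics than weight-space error.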

In practice

Apply SigmaScale for mild-to-moderate LLM compression (0.75x to 0.90x parameter retention).
Consider learned rather than analytical scaling in SVD-based methods.
Use post-compression fine-tuning to realign the compressed model.

Topics

LLM Compression
Singular Value Decomposition
Low-Rank Decomposition
Scaling Matrices
Llama 3.1 8B Instruct
Qwen3-8B

Code references

tatsu-lab/stanford_alpaca

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.