SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

2026-06-05 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

SigmaScale is a method for Large Language Model (LLM) compression that enhances truncated Singular Value Decomposition (SVD) by learning auxiliary scaling matrices. Rather than deriving the scaling analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. The learned scaling lowers the intrinsic rank of the weight matrices, as shown by reductions in effective-rank entropy, a measure that correlates strongly with compression loss. In experiments on Llama 3.1 8B Instruct and Qwen3-8B, SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods on perplexity and zero-shot benchmarks. The approach offers a flexible route to low-rank LLM compression because the scaling adapts to each model's weight structure.

Key takeaway

For machine learning engineers and AI scientists who want to reduce LLM inference compute costs, SigmaScale is a competitive option. It combines learned, activation-aware scaling matrices with SVD-based compression, so the compression adapts to each model's weight structure. Consider evaluating it when deploying in resource-constrained environments, particularly with Llama 3.1 8B Instruct or Qwen3-8B, the two models used in the reported experiments.

Key insights

SigmaScale enhances SVD-based LLM compression by learning activation-aware scaling matrices that lower the intrinsic rank of the weight matrices.

Principles

Learned scaling matrices reduce the effective intrinsic rank of the weights.
Effective-rank entropy correlates strongly with compression loss.
Activation-aware transformations enable flexible low-rank compression.
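
To make the effective-rank idea concrete, here is a minimal sketch that assumes the common entropy-based definition: normalize the singular values so they sum to one, take the Shannon entropy, and exponentiate to get an effective rank. The summary does not spell out the paper's exact formula, so the function names and the definition itself are assumptions for illustration.

```python
import math

def effective_rank_entropy(singular_values):
    """Shannon entropy of the singular values after normalizing them to sum to 1.

    Assumed definition; the paper's exact formula may differ.
    """
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    return -sum(p * math.log(p) for p in probs)

def effective_rank(singular_values):
    """Exponential of the entropy: a continuous stand-in for matrix rank."""
    return math.exp(effective_rank_entropy(singular_values))

# A flat spectrum spreads energy evenly, so all four directions count.
flat = [1.0, 1.0, 1.0, 1.0]
# A peaked spectrum is dominated by one direction and truncates cheaply.
peaked = [10.0, 0.1, 0.1, 0.1]

print(effective_rank(flat))    # close to 4
print(effective_rank(peaked))  # much smaller than 4
```

Under this definition, a scaling that concentrates the spectrum lowers the entropy, which means fewer singular directions carry most of the matrix and a rank-k truncation discards less.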

Method

SigmaScale learns auxiliary scaling matrices for SVD-based LLM compression. It optimizes two sets of vectors, one defining a diagonal row scaling and one defining a diagonal column scaling, under an activation-aware compression loss.
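
The method described above can be sketched in NumPy under several assumptions that are not taken from the paper: the layer computes W @ x, the activation-aware loss is the squared output error on a calibration batch X, the scales are kept positive through an exp parameterization, and a simple accept-if-better random search stands in for whatever optimizer the authors use. All function names (compress, activation_loss, fit_scales) are hypothetical.

```python
import numpy as np

def compress(W, log_r, log_c, k):
    """Rank-k approximation of W computed in a diagonally rescaled space."""
    r, c = np.exp(log_r), np.exp(log_c)        # positive row / column scales
    W_scaled = r[:, None] * W * c[None, :]     # diag(r) @ W @ diag(c)
    U, S, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    low_rank = (U[:, :k] * S[:k]) @ Vt[:k]     # truncated SVD of the scaled matrix
    return low_rank / r[:, None] / c[None, :]  # undo both scalings

def activation_loss(W, W_hat, X):
    """Assumed loss: squared output error on calibration activations X."""
    return float(np.linalg.norm((W - W_hat) @ X) ** 2)

def fit_scales(W, X, k, steps=200, step_size=0.1, seed=0):
    """Random search over the two scale vectors (a stand-in for gradient descent)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    log_r, log_c = np.zeros(m), np.zeros(n)    # zeros reproduce plain truncated SVD
    best = activation_loss(W, compress(W, log_r, log_c, k), X)
    for _ in range(steps):
        cand_r = log_r + step_size * rng.standard_normal(m)
        cand_c = log_c + step_size * rng.standard_normal(n)
        loss = activation_loss(W, compress(W, cand_r, cand_c, k), X)
        if loss < best:                        # keep only improving candidates
            log_r, log_c, best = cand_r, cand_c, loss
    return log_r, log_c, best

# Synthetic weights and calibration activations with uneven channel magnitudes.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 24))
X = rng.standard_normal((24, 64)) * np.linspace(0.1, 5.0, 24)[:, None]
k = 4

plain_loss = activation_loss(W, compress(W, np.zeros(16), np.zeros(24), k), X)
log_r, log_c, learned_loss = fit_scales(W, X, k)
print(plain_loss, learned_loss)
```

Because both scalings are diagonal, their inverses can be folded into the two low-rank factors, so the compressed layer still stores only an m x k and a k x n matrix. A real implementation would most likely differentiate through the SVD in an autodiff framework rather than use random search.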

In practice

Apply learned scaling to SVD-based LLM compression.
Adapt compression to individual model weight structures.
Reduce LLM inference compute cost.
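
As a rough guide to where the cost reduction comes from, the arithmetic below compares a dense m x n weight matrix with its rank-k factorization, which stores k * (m + n) values instead of m * n. The 4096 x 4096 shape is an illustrative assumption, not a figure from the paper, and the helper names are hypothetical.

```python
def low_rank_param_ratio(m, n, k):
    """Fraction of the dense parameter count kept by a rank-k factorization."""
    return k * (m + n) / (m * n)

def max_rank_for_ratio(m, n, ratio):
    """Largest rank whose two factors fit within the given parameter budget."""
    return int(ratio * m * n / (m + n))

# Hypothetical square projection of size 4096 x 4096.
m, n = 4096, 4096
print(low_rank_param_ratio(m, n, 1024))  # 0.5: half the parameters
print(max_rank_for_ratio(m, n, 0.8))     # 1638: rank that keeps at most 80%
```

For a square matrix, the factorization only saves parameters when k is below half the dimension, which is why lowering the effective rank before truncation matters.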

Topics

LLM Compression
Singular Value Decomposition
Low-Rank Decomposition
Scaling Matrices
Llama 3.1
Qwen3

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.