CompreSSM: Compressing State-Space Models During Training with Hankel Singular Values
Summary
CompreSSM, a new approach highlighted by MIT CSAIL News and accepted at ICLR 2026, compresses state-space models while they are being trained. Traditional methods prune or distill a model only after full training. CompreSSM instead integrates pruning directly into the training loop. It uses Hankel singular values (HSVs), a tool from control theory, to identify and remove dispensable model subcomponents on the fly. Each HSV quantifies how much a state contributes to the model's input-output behavior, so low-HSV subspaces can be pruned iteratively. According to early benchmarks, CompreSSM trains state-space models on sequence tasks up to 3x faster while matching the test accuracy of standard training followed by post-hoc pruning. Because the model shrinks partway through training, the remaining training steps need less compute and memory, which lowers both computational cost and hardware requirements.
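To make "each state's contribution to input-output behavior" concrete, here is a minimal sketch that computes Hankel singular values for a small discrete-time linear SSM with NumPy and SciPy. It assumes a real, stable, time-invariant system x[k+1] = A x[k] + B u[k], y[k] = C x[k]. The function name `hankel_singular_values` and the toy matrices are illustrative assumptions, not taken from CompreSSM. The HSVs come from the controllability Gramian P (how strongly inputs excite each state) and the observability Gramian Q (how strongly each state shows up at the output).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def hankel_singular_values(A, B, C):
    """Hankel singular values of the stable discrete-time system
    x[k+1] = A x[k] + B u[k],  y[k] = C x[k], in descending order.
    (Illustrative helper; not CompreSSM's implementation.)"""
    # Controllability Gramian P solves  A P A^T - P + B B^T = 0
    P = solve_discrete_lyapunov(A, B @ B.T)
    # Observability Gramian Q solves  A^T Q A - Q + C^T C = 0
    Q = solve_discrete_lyapunov(A.T, C.T @ C)
    # HSVs are the square roots of the eigenvalues of P Q; clip the tiny
    # negative or imaginary parts that floating-point round-off can introduce
    eig = np.linalg.eigvals(P @ Q)
    hsv = np.sqrt(np.clip(eig.real, 0.0, None))
    return np.sort(hsv)[::-1]


# Hypothetical toy 3-state diagonal SSM: the third state is barely driven by
# the input and barely visible at the output, so its HSV should be tiny.
A = np.diag([0.9, 0.5, 0.1])
B = np.array([[1.0], [1.0], [1e-3]])
C = np.array([[1.0, 1.0, 1e-3]])

hsv = hankel_singular_values(A, B, C)
print(hsv)
```

The third HSV comes out many orders of magnitude below the first two, which is the kind of state the summary describes as dispensable.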
Key takeaway
For machine learning engineers building state-space models for sequential tasks, CompreSSM moves compression from after training to during it. Consider integrating in-training pruning based on Hankel singular values: the reported results show up to 3x faster training and smaller memory footprints without sacrificing model accuracy. This lets teams iterate on efficient models more quickly, which is especially useful for resource-constrained deployments and hyperparameter searches.
Key insights
CompreSSM uses Hankel singular values to prune state-space models during training, producing leaner models in less training time without measurable accuracy loss.
Principles
- Hankel singular values quantify each state's contribution to input-output behavior.
- Small HSVs indicate dispensable model components.
- Pruning during training shrinks the model early, cutting the cost of every remaining training step.
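For reference, the standard control-theoretic definitions behind these principles are written out below for a discrete-time linear SSM. This form is an assumption for illustration; CompreSSM's exact parameterization and pruning criterion may differ. P and Q are the controllability and observability Gramians, the \(\sigma_i\) are the HSVs, and the last line is the classical balanced-truncation error bound for keeping only the first r states. That bound is why small HSVs mark states that can be dropped with little change to input-output behavior.

```latex
\begin{aligned}
x_{k+1} &= A x_k + B u_k, \qquad y_k = C x_k + D u_k \\
A P A^\top - P + B B^\top &= 0, \qquad A^\top Q A - Q + C^\top C = 0 \\
\sigma_i &= \sqrt{\lambda_i(PQ)}, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0 \\
\lVert G - G_r \rVert_{\mathcal{H}_\infty} &\le 2 \sum_{i=r+1}^{n} \sigma_i
\end{aligned}
```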
Method
CompreSSM computes Hankel singular values at intervals during training, prunes unneeded state subspaces on the fly, and then continues training the reduced model so it can adapt to its smaller state.
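The pruning step in this method can be sketched with standard square-root balanced truncation built from SciPy routines. This is a generic technique, not CompreSSM's code. The sketch assumes a real, stable, discrete-time single-input single-output system with dense (A, B, C). The `energy` threshold used to choose how many states to keep, and the names `balanced_truncate` and `markov`, are hypothetical. Real S4 or DSS layers use complex diagonal parameterizations, and the paper's selection rule and schedule may differ.

```python
import numpy as np
from scipy.linalg import cholesky, solve_discrete_lyapunov, svd


def balanced_truncate(A, B, C, energy=0.999):
    """Square-root balanced truncation of x[k+1] = A x[k] + B u[k], y[k] = C x[k].

    Keeps the smallest order r whose leading Hankel singular values account
    for at least `energy` of their total sum (a hypothetical pruning rule).
    Returns the reduced (Ar, Br, Cr) and the full vector of HSVs."""
    # Gramians: A P A^T - P + B B^T = 0 and A^T Q A - Q + C^T C = 0
    P = solve_discrete_lyapunov(A, B @ B.T)
    Q = solve_discrete_lyapunov(A.T, C.T @ C)
    # Cholesky factors P = Lp Lp^T, Q = Lq Lq^T (symmetrized against round-off)
    Lp = cholesky((P + P.T) / 2, lower=True)
    Lq = cholesky((Q + Q.T) / 2, lower=True)
    # Singular values of Lq^T Lp are the HSVs, returned in descending order
    U, hsv, Vt = svd(Lq.T @ Lp)
    # Choose the kept order r from the cumulative HSV "energy"
    r = int(np.searchsorted(np.cumsum(hsv) / hsv.sum(), energy)) + 1
    r = min(r, len(hsv))
    # Balancing projections restricted to the r dominant directions (W^T T = I_r)
    S = np.diag(hsv[:r] ** -0.5)
    T = Lp @ Vt[:r].T @ S
    W = Lq @ U[:, :r] @ S
    return W.T @ A @ T, W.T @ B, C @ T, hsv


def markov(A, B, C, n=50):
    """First n impulse-response coefficients C A^k B (SISO case)."""
    out, x = [], B
    for _ in range(n):
        out.append((C @ x).item())
        x = A @ x
    return np.array(out)


# Toy 3-state system whose third state is nearly uncontrollable and unobservable
A = np.diag([0.9, 0.5, 0.1])
B = np.array([[1.0], [1.0], [1e-3]])
C = np.array([[1.0, 1.0, 1e-3]])

Ar, Br, Cr, hsv = balanced_truncate(A, B, C)
err = np.max(np.abs(markov(A, B, C) - markov(Ar, Br, Cr)))
print(Ar.shape, err)
```

In a training loop, one would presumably call a step like this at fixed intervals on each SSM layer, swap the reduced (Ar, Br, Cr) into the layer, and keep optimizing. Balanced truncation returns a dense Ar even when A started out diagonal, so a diagonal parameterization would need an extra step (for instance, re-diagonalizing Ar) before training resumes.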
In practice
- Apply to state-space architectures such as S4, LSSL, and DSS.
- Speed up training on sequence tasks such as speech or time-series modeling.
- Reduce training costs on limited GPUs.
Topics
- State-Space Models
- Model Compression
- Hankel Singular Values
- In-Training Pruning
- Control Theory
- Sequential Tasks
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.