New technique makes AI models leaner and faster while they’re still learning
Summary
Researchers from MIT's CSAIL, the Max Planck Institute, ELLIS, ETH, and Liquid AI have developed CompreSSM, a technique that compresses artificial intelligence models during training rather than after it. Published on April 9, 2026, the method targets state-space models, which are used in applications such as language processing and robotics. CompreSSM draws on control theory, using Hankel singular values to identify and remove unnecessary model components after only about 10 percent of training. The remaining 90 percent then proceeds with a smaller, faster model without sacrificing performance. On image classification benchmarks, compressed models trained up to 1.5 times faster while maintaining accuracy. A Mamba model compressed from 128 dimensions to about 12 trained approximately 4 times faster with competitive performance.
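To see how a short full-size phase followed by a long compressed phase turns into an overall speedup, here is a back-of-envelope sketch. It is not taken from the article: the function name and the simple cost model (every training step costs the same within a phase, and the compression step itself is free) are assumptions made for illustration.

```python
def overall_speedup(full_fraction: float, relative_step_cost: float) -> float:
    """Estimate end-to-end training speedup under a simplified cost model.

    full_fraction: share of training steps run at full model size (e.g. 0.1).
    relative_step_cost: cost of one compressed-model step divided by the cost
        of one full-model step (hypothetical; must be measured in practice).
    """
    if not 0.0 <= full_fraction <= 1.0:
        raise ValueError("full_fraction must lie in [0, 1]")
    if relative_step_cost <= 0.0:
        raise ValueError("relative_step_cost must be positive")
    # Normalised total time: the full-size phase plus the cheaper compressed phase.
    total_time = full_fraction + (1.0 - full_fraction) * relative_step_cost
    return 1.0 / total_time


if __name__ == "__main__":
    # Illustrative inputs only: 10% of steps at full size, and compressed
    # steps assumed to cost one sixth of a full step.
    print(overall_speedup(0.1, 1.0 / 6.0))
```

Under this model the speedup can never exceed 1 / full_fraction, so with 10 percent of training at full size the ceiling is 10x however small the compressed model becomes. A 4x overall speedup would correspond to compressed steps costing roughly one sixth of a full step. That figure illustrates the arithmetic and is not a number reported in the article.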
Key takeaway
For AI engineers developing state-space models, CompreSSM offers a way to reduce computational cost and training time without compromising model performance. By integrating compression directly into the learning process, you can achieve the performance of larger models with the efficiency of smaller ones. The approach is particularly beneficial for architectures like Mamba and could extend to other transformer alternatives such as linear attention, providing a principled way to build more efficient AI systems from the outset.
Key insights
CompreSSM compresses AI models during training by identifying and removing non-essential components early using control theory.
Principles
- Component importance stabilizes early in training.
- Control theory can guide model compression.
- In-training compression avoids post-training costs.
Method
CompreSSM uses Hankel singular values to rank internal state contributions after ~10% of training, then discards less important dimensions, allowing the remaining training to proceed with a smaller model.
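The ranking step can be illustrated with a textbook computation of Hankel singular values. This is a minimal sketch and not the CompreSSM implementation. It assumes a fixed, stable, discrete-time linear system with a diagonal state matrix, for which the Gramians have a closed form. The function names and the `keep` threshold are hypothetical, and how the method handles input-dependent models such as Mamba during training is not covered by this summary.

```python
import numpy as np


def hankel_singular_values(a_diag, B, C):
    """Hankel singular values of x[k+1] = diag(a) x[k] + B u[k], y[k] = C x[k].

    Assumes real a with |a_i| < 1. B has shape (n, m) and C has shape (p, n).
    """
    a = np.asarray(a_diag, dtype=float)
    B = np.asarray(B, dtype=float)
    C = np.asarray(C, dtype=float)
    if np.any(np.abs(a) >= 1.0):
        raise ValueError("all |a_i| must be < 1 for the Gramians to exist")
    denom = 1.0 - np.outer(a, a)
    # Controllability Gramian solves P = A P A^T + B B^T (closed form for diagonal A).
    P = (B @ B.T) / denom
    # Observability Gramian solves Q = A^T Q A + C^T C.
    Q = (C.T @ C) / denom
    # Hankel singular values are the square roots of the eigenvalues of P Q.
    eigvals = np.linalg.eigvals(P @ Q).real
    hsv = np.sqrt(np.clip(eigvals, 0.0, None))
    return np.sort(hsv)[::-1]


def choose_rank(hsv, keep=0.99):
    """Smallest r whose leading values hold a `keep` share of the total (hypothetical rule)."""
    hsv = np.asarray(hsv, dtype=float)
    total = hsv.sum()
    if total == 0.0:
        return 0
    cumulative = np.cumsum(hsv) / total
    r = int(np.searchsorted(cumulative, keep)) + 1
    return min(r, len(hsv))


if __name__ == "__main__":
    # Toy system: the second state is barely driven by the input or seen at the output.
    hsv = hankel_singular_values([0.9, 0.5], [[1.0], [1e-6]], [[1.0, 1e-6]])
    print(hsv, choose_rank(hsv))
```

Hankel singular values describe the system in balanced coordinates, so a small value means some direction in state space contributes little to the input-output behavior, not necessarily one particular original coordinate. Removing that direction in practice takes a change of basis, as in balanced truncation, which this sketch leaves out.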
In practice
- Apply to state-space models for efficiency.
- Effective for multi-input, multi-output (MIMO) models.
- Consider for Mamba and linear attention architectures.
Topics
- CompreSSM
- State-Space Models
- AI Model Compression
- Control Theory
- AI Training Efficiency
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MIT News - Artificial intelligence.