New technique makes AI models leaner and faster while they’re still learning

2026-04-09 · Source: MIT News - Artificial intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

Researchers from MIT's CSAIL, the Max Planck Institute, ELLIS, ETH, and Liquid AI have developed CompreSSM, a technique that compresses artificial intelligence models during training rather than after it. Reported on April 9, 2026, the method targets state-space models, which are used in applications like language processing and robotics. CompreSSM draws on control theory, using Hankel singular values to identify and remove unnecessary model components after only about 10 percent of training. The remaining 90 percent of training then proceeds with a smaller, faster model without sacrificing performance. On image classification benchmarks, compressed models trained up to 1.5 times faster while maintaining accuracy. A Mamba model saw approximately 4x training speedups when reduced from 128 dimensions to about 12, with competitive performance.
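To make the 10 percent / 90 percent split concrete, here is a back-of-the-envelope sketch in Python. It is not from the original article. It assumes a deliberately simple cost model: every step before compression costs one unit, every step after it costs a fixed fraction of a unit, and the compression step itself is free. The function names `overall_speedup` and `implied_step_cost` are hypothetical.

```python
def overall_speedup(full_fraction, reduced_step_cost):
    """Speedup relative to training the full-size model for every step.

    full_fraction: share of training steps run at full size (e.g. 0.1).
    reduced_step_cost: per-step cost of the compressed model relative to
        the full model, in (0, 1]. Assumed constant (a simplification).
    """
    total_cost = full_fraction + (1.0 - full_fraction) * reduced_step_cost
    return 1.0 / total_cost


def implied_step_cost(speedup, full_fraction):
    """Invert overall_speedup: the relative per-step cost that would
    produce a given overall speedup under the same simple cost model."""
    return (1.0 / speedup - full_fraction) / (1.0 - full_fraction)


if __name__ == "__main__":
    # Relative per-step cost that a 4x overall speedup would imply
    # if compression happens after 10 percent of the steps.
    print(implied_step_cost(4.0, 0.1))
    # Same question for a 1.5x overall speedup.
    print(implied_step_cost(1.5, 0.1))
```

Under this model, a 4x overall speedup with a 10 percent full-size phase corresponds to compressed steps costing roughly one sixth of a full step, and the overall speedup can never exceed 10x, because the first 10 percent always runs at full cost. Real training runs will deviate from this idealization.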

Key takeaway

For AI engineers developing state-space models, CompreSSM offers a way to reduce computational cost and training time without compromising model performance. Because compression is integrated directly into the learning process, you can achieve the performance of larger models with the efficiency of smaller ones. The approach is particularly beneficial for architectures like Mamba and could extend to other transformer alternatives, providing a principled way to build more efficient AI systems from the outset.

Key insights

CompreSSM compresses AI models during training by identifying and removing non-essential components early using control theory.

Principles

Component importance stabilizes early in training.
Control theory can guide model compression.
In-training compression avoids post-training costs.

Method

CompreSSM uses Hankel singular values to rank internal state contributions after ~10% of training, then discards less important dimensions, allowing the remaining training to proceed with a smaller model.
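The ranking step can be illustrated with textbook balanced truncation on a single discrete-time linear state-space system. The sketch below is not CompreSSM's actual implementation, which the article does not detail. It assumes a fixed system (A, B, C) with a stable A, and it picks the reduced size with a cumulative-energy threshold (`energy=0.99`), which is an assumption made here for illustration. All function names are hypothetical, and the training loop is omitted.

```python
import numpy as np


def solve_discrete_lyapunov(A, Q):
    """Solve X = A X A^T + Q by vectorization (only suitable for small n)."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(lhs, Q.reshape(-1)).reshape(n, n)


def psd_factor(M):
    """Return L with L @ L.T approximately equal to symmetric PSD M."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return V * np.sqrt(np.clip(w, 0.0, None))


def balanced_truncation(A, B, C, energy=0.99):
    """Rank states by Hankel singular value and keep the top r."""
    P = solve_discrete_lyapunov(A, B @ B.T)    # controllability Gramian
    Q = solve_discrete_lyapunov(A.T, C.T @ C)  # observability Gramian
    Lp, Lq = psd_factor(P), psd_factor(Q)

    # Singular values of Lq^T Lp are the Hankel singular values,
    # returned in descending order.
    U, hsv, Vt = np.linalg.svd(Lq.T @ Lp)

    # Smallest r whose singular values hold `energy` of the total
    # (a selection rule assumed for this sketch).
    cum = np.cumsum(hsv) / np.sum(hsv)
    r = min(int(np.searchsorted(cum, energy)) + 1, len(hsv))

    # Projection matrices onto the r most important balanced states.
    scale = 1.0 / np.sqrt(hsv[:r])
    T = (Lp @ Vt[:r].T) * scale                # shape (n, r)
    Tinv = (scale[:, None] * U[:, :r].T) @ Lq.T  # shape (r, n)
    return Tinv @ A @ T, Tinv @ B, C @ T, hsv, r


# Toy multi-input, multi-output system with a stable diagonal A.
rng = np.random.default_rng(0)
n = 16
A = np.diag(rng.uniform(0.1, 0.95, size=n))
B = rng.normal(size=(n, 2))
C = rng.normal(size=(3, n))

Ar, Br, Cr, hsv, r = balanced_truncation(A, B, C)
print(f"kept {r} of {n} state dimensions")
```

In an in-training setting, a step like this would run once at the chosen checkpoint (about 10 percent of the way through, per the article), after which optimization continues on the reduced matrices.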

In practice

Apply to state-space models for efficiency.
Effective for multi-input, multi-output (MIMO) models.
Consider for Mamba and linear attention architectures.

Topics

CompreSSM
State-Space Models
AI Model Compression
Control Theory
AI Training Efficiency

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT News - Artificial intelligence.