Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization
Summary
Foundation-Preserving LoRA (FoLoRA) is a forgetting-aware optimization framework for adapting foundation models to specialized tasks without degrading the non-target capabilities they acquired during pretraining. Conventional finetuning often erodes these broader skills. FoLoRA instead regulates the adaptation-preservation trade-off during training. It defines a forgetting penalty from pretraining-proxy activations and a task utility from downstream-task activations. Each update direction is then scored by its task utility per unit forgetting penalty, expressed as a generalized Rayleigh quotient. These scores drive direction-wise gated Adam updates that attenuate directions with a low utility-to-penalty ratio. To estimate the forgetting penalty, FoLoRA builds pretraining-proxy calibration data by sampling from the pretrained model itself. Experiments on math, code, and instruction-following adaptation show that FoLoRA strikes the best balance among the compared methods: it improves target-task performance while best preserving non-target capabilities.
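The scoring step can be illustrated with a small NumPy sketch. It assumes, for illustration only, that the forgetting penalty of a direction v is the quadratic form v^T B v, where B is the ridge-regularized second-moment matrix of pretraining-proxy activations. It likewise assumes that the task utility is v^T A v, where A is the second-moment matrix of task activations. The paper's exact definitions may differ. Under these assumptions the utility-per-penalty score is the generalized Rayleigh quotient R(v) = v^T A v / v^T B v, and the directions that extremize it solve A v = λ B v. The helper names and the random activations are hypothetical.

```python
import numpy as np

def second_moment(acts: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """Uncentered second-moment matrix X^T X / n, optionally ridge-regularized (assumed form)."""
    n, d = acts.shape
    return acts.T @ acts / n + eps * np.eye(d)

def rayleigh_quotient(a: np.ndarray, b: np.ndarray, v: np.ndarray) -> float:
    """Generalized Rayleigh quotient R(v) = (v^T A v) / (v^T B v)."""
    return float(v @ a @ v) / float(v @ b @ v)

def scored_directions(a: np.ndarray, b: np.ndarray):
    """Solve A v = lam B v for symmetric A and positive-definite B.

    Uses the Cholesky reduction B = L L^T, so the problem becomes the ordinary
    symmetric eigenproblem of L^{-1} A L^{-T}. Returns scores in descending
    order and the matching B-orthonormal directions as columns.
    """
    chol = np.linalg.cholesky(b)
    c = np.linalg.solve(chol, np.linalg.solve(chol, a).T)  # L^{-1} A L^{-T}
    c = (c + c.T) / 2  # symmetrize against round-off
    scores, u = np.linalg.eigh(c)
    dirs = np.linalg.solve(chol.T, u)  # map back: v = L^{-T} u
    order = np.argsort(scores)[::-1]
    return scores[order], dirs[:, order]

rng = np.random.default_rng(0)
proxy_acts = rng.normal(size=(512, 8))  # hypothetical pretraining-proxy activations
task_acts = rng.normal(size=(256, 8)) * np.linspace(0.2, 2.0, 8)  # hypothetical task activations
penalty = second_moment(proxy_acts, eps=1e-3)  # forgetting-penalty matrix B (assumed form)
utility = second_moment(task_acts)  # task-utility matrix A (assumed form)

scores, dirs = scored_directions(utility, penalty)
print(np.round(scores, 3))
print(round(rayleigh_quotient(utility, penalty, dirs[:, 0]), 3))
```

Each eigenvalue equals the quotient evaluated at its own eigenvector, and the largest one bounds R(v) from above for every direction. This property is what makes the eigenvalues usable as a ranking of directions from most to least favorable.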
Key takeaway
Machine learning engineers adapting foundation models to specialized downstream tasks should consider FoLoRA to limit the degradation of non-target capabilities. The framework balances task performance against preservation of pretraining knowledge, and in the reported experiments it outperformed the forgetting-aware baselines it was compared against. Integrating FoLoRA can help adapted models retain their broader utility while excelling at specific applications, reducing the risk of catastrophic forgetting.
Key insights
FoLoRA adapts foundation models by optimizing task utility per unit forgetting penalty, balancing specialization with preservation.
Principles
- Regulate the adaptation-preservation trade-off during training.
- Score each update direction by task utility per unit forgetting penalty.
- Build pretraining-proxy calibration data by sampling from the pretrained model itself.
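The third principle, self-sampled calibration data, can be sketched with a toy stand-in for the pretrained model. In this sketch a hypothetical bigram table of next-token logits plays the role of the model. A hypothetical embedding table plays the role of the layer whose activations feed the forgetting penalty. With a real LLM, one would instead generate text from the model and record the relevant layer inputs. The function name, the BOS convention, and the temperature parameter are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sample_proxy_sequences(logits_table, n_seqs, seq_len, bos_id=0, temperature=1.0, seed=0):
    """Draw token sequences autoregressively from a toy bigram 'pretrained model'.

    logits_table[i, j] is the (hypothetical) logit of token j following token i.
    Sampling starts from an assumed BOS token and uses temperature-scaled softmax.
    """
    rng = np.random.default_rng(seed)
    vocab = logits_table.shape[0]
    seqs = np.empty((n_seqs, seq_len), dtype=np.int64)
    for s in range(n_seqs):
        prev = bos_id
        for t in range(seq_len):
            z = logits_table[prev] / temperature
            p = np.exp(z - z.max())  # numerically stable softmax
            p /= p.sum()
            prev = rng.choice(vocab, p=p)
            seqs[s, t] = prev
    return seqs

rng = np.random.default_rng(42)
vocab, d = 16, 8
toy_logits = rng.normal(size=(vocab, vocab))  # hypothetical "pretrained" next-token logits
toy_embed = rng.normal(size=(vocab, d))  # hypothetical layer producing activations
proxy_seqs = sample_proxy_sequences(toy_logits, n_seqs=32, seq_len=20)
proxy_acts = toy_embed[proxy_seqs].reshape(-1, d)  # rows feed the forgetting-penalty estimate
print(proxy_seqs.shape, proxy_acts.shape)
```

Because the samples come from the model's own distribution, they approximate the data the model was shaped by, even though no original pretraining corpus is touched.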
Method
FoLoRA defines a forgetting penalty and a task utility, then scores each update direction by the generalized Rayleigh quotient of utility over penalty. These scores gate the Adam updates, attenuating directions with a low utility-to-penalty ratio.
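The gating step can be sketched as follows. This is a hypothetical realization, not the paper's update rule. The sketch first computes a standard Adam step for a weight matrix. It then expresses each row of that step in the B-orthonormal basis of generalized eigenvectors (A v = λ B v, with B the forgetting-penalty matrix and A the task-utility matrix). Each coordinate is multiplied by an assumed soft gate λ / (λ + τ), and the result is mapped back to weight space. The class `GatedAdam`, the gate form, the threshold `tau`, and the random matrices are assumptions chosen for illustration.

```python
import numpy as np

def generalized_eig(a, b):
    """Solve A v = lam B v (A symmetric, B positive definite); columns of V are B-orthonormal."""
    chol = np.linalg.cholesky(b)
    c = np.linalg.solve(chol, np.linalg.solve(chol, a).T)  # L^{-1} A L^{-T}
    lam, u = np.linalg.eigh((c + c.T) / 2)
    return lam, np.linalg.solve(chol.T, u)

class GatedAdam:
    """Adam whose step is re-weighted per generalized-eigen direction (hypothetical gate)."""

    def __init__(self, shape, lam, dirs, penalty, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, tau=1.0):
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.t = 0
        self.lr, self.b1, self.b2, self.eps = lr, betas[0], betas[1], eps
        self.dirs, self.penalty = dirs, penalty
        lam_pos = np.clip(lam, 0.0, None)
        # Soft gate: near 0 for low utility-to-penalty directions, near 1 for high ones.
        self.gate = lam_pos / (lam_pos + tau)

    def step(self, w, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad**2
        m_hat = self.m / (1 - self.b1**self.t)
        v_hat = self.v / (1 - self.b2**self.t)
        adam_step = self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # plain Adam step, shape (out, d)
        coeffs = adam_step @ self.penalty @ self.dirs  # row coordinates in the B-orthonormal basis
        gated = (coeffs * self.gate) @ self.dirs.T  # attenuate low-score directions, map back
        return w - gated

rng = np.random.default_rng(1)
d, out = 6, 4
proxy = rng.normal(size=(200, d))  # hypothetical pretraining-proxy activations
task = rng.normal(size=(200, d)) * np.linspace(0.1, 3.0, d)  # hypothetical task activations
penalty = proxy.T @ proxy / len(proxy) + 1e-3 * np.eye(d)
utility = task.T @ task / len(task)
lam, dirs = generalized_eig(utility, penalty)

opt = GatedAdam((out, d), lam, dirs, penalty, lr=1e-2, tau=1.0)
w = np.zeros((out, d))
grad = rng.normal(size=(out, d))
w_new = opt.step(w, grad)
```

Because the basis is B-orthonormal, the forgetting penalty of a row update equals the squared norm of its coordinates. Any gate in [0, 1] can therefore only lower the penalty of the step. Setting `tau=0` recovers plain Adam when all scores are positive.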
In practice
- Specialize models for math, code, and instruction-following tasks.
- Adapt models without losing general capabilities.
- Use a generalized Rayleigh quotient to steer optimization.
Topics
- Foundation Models
- Model Adaptation
- Catastrophic Forgetting
- LoRA
- Rayleigh Quotient Optimization
- Machine Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.