Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Foundation-Preserving LoRA (FoLoRA) is a new forgetting-aware optimization framework designed to adapt foundation models to specialized tasks without degrading their pretraining-acquired nontarget capabilities. While traditional finetuning often compromises these broader skills, FoLoRA addresses this by regulating the adaptation-preservation trade-off during training. It defines a forgetting penalty using pretraining-proxy activations and a task utility from downstream task activations. Update directions are then scored by task utility per unit forgetting penalty via a generalized Rayleigh quotient, enabling direction-wise gated Adam updates that attenuate low utility-to-penalty directions. FoLoRA estimates the forgetting penalty by constructing pretraining proxy calibration data through sampling from the pretrained model itself. Experiments across math, code, and instruction following adaptation tasks demonstrate that FoLoRA achieves the strongest balance, improving target task performance while best preserving non-target capabilities compared to baselines.

Key takeaway

For Machine Learning Engineers adapting foundation models to specialized downstream tasks, you should consider FoLoRA to mitigate the degradation of nontarget capabilities. This framework offers a robust approach to balance task performance with the preservation of pretraining knowledge, outperforming existing forgetting-aware methods. Integrating FoLoRA can ensure your adapted models maintain broader utility while excelling in specific applications, avoiding the common pitfall of catastrophic forgetting.

Key insights

FoLoRA adapts foundation models by optimizing task utility per unit forgetting penalty, balancing specialization with preservation.

Principles

Method

FoLoRA defines forgetting penalty and task utility, then scores update directions via a generalized Rayleigh quotient. This guides gated Adam updates, attenuating low utility-to-penalty directions.

In practice

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.