JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Researchers from Bitdefender and the University of Bucharest have introduced JumpLoRA, a framework that improves continual learning (CL) in Large Language Models (LLMs) by adaptively inducing sparsity in Low-Rank Adaptation (LoRA) blocks. The method learns a JumpReLU threshold alongside the LoRA weights and cuts off low-magnitude updates, producing sparse adapters that minimize overlap between tasks. This dynamic parameter isolation mitigates catastrophic forgetting and task interference. JumpLoRA is modular and compatible with existing LoRA-based CL approaches: it delivers significant performance improvements when integrated with IncLoRA and outperforms the state-of-the-art ELLA method on both the Standard CL Benchmark and the Long Sequence Benchmark. Experiments were conducted on T5 models with 770M parameters using 8 Nvidia H200 GPUs and showed consistent gains in Overall Accuracy, Backward Transfer, and Forward Transfer across various task orders.
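
The three reported metrics can be illustrated with a small sketch. The summary does not give the paper's exact formulas, so the definitions below are an assumption: they follow the convention common in the continual learning literature, where acc[i][j] is the accuracy on task j after training through task i. The function names, the baseline vector, and the toy numbers are hypothetical and are not results from the paper.

```python
# Assumed convention: acc[i][j] = accuracy on task j after training on tasks 0..i.
# These definitions are a common choice, not necessarily the paper's exact ones.

def overall_accuracy(acc):
    """Mean accuracy over all tasks after training on the final task."""
    final = acc[-1]
    return sum(final) / len(final)

def backward_transfer(acc):
    """Average change on earlier tasks between just-learned and end-of-sequence.
    Negative values indicate forgetting."""
    t = len(acc)
    return sum(acc[-1][i] - acc[i][i] for i in range(t - 1)) / (t - 1)

def forward_transfer(acc, baseline):
    """Average gain on each task just before it is trained, relative to a
    baseline score for that task (for example, an untrained adapter)."""
    t = len(acc)
    return sum(acc[i - 1][i] - baseline[i] for i in range(1, t)) / (t - 1)

# Toy accuracy matrix for three tasks (illustrative values only).
acc = [
    [0.80, 0.30, 0.20],
    [0.70, 0.90, 0.40],
    [0.60, 0.80, 0.70],
]
baseline = [0.25, 0.25, 0.25]

oa = overall_accuracy(acc)
bwt = backward_transfer(acc)
fwt = forward_transfer(acc, baseline)
```

In this toy matrix the accuracy on the first two tasks drops after later tasks are learned, so the backward transfer is negative. A method that reduces task interference should push that value toward zero.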

Key takeaway

For AI Engineers and Research Scientists working on continual learning for LLMs, JumpLoRA offers a way to mitigate catastrophic forgetting. Consider integrating it into existing PEFT-based CL pipelines, especially those built on IncLoRA or ELLA, since it consistently improves performance by adaptively inducing sparsity and reducing task interference. This allows new knowledge to be acquired without costly full-model retraining.

Key insights

JumpLoRA uses adaptive sparsity via JumpReLU gating to reduce catastrophic forgetting in LLM continual learning.

Principles

Method

JumpLoRA applies a learnable JumpReLU function to LoRA weight updates, gradually introducing sparsification. It initializes a threshold based on parameter magnitudes and interpolates between dense and sparse updates, ensuring stable learning and effective parameter isolation.
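
The gating step described above can be sketched in plain Python. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation: it assumes the gate acts on the magnitude of each entry of the product update B·A, that the threshold is initialized as a magnitude quantile, and that a scalar alpha blends dense and sparse updates. The names init_threshold, keep_fraction, and alpha are hypothetical. Learning the threshold by gradient descent would need an extra mechanism such as a straight-through estimator, which is omitted here.

```python
def matmul(b, a):
    """Multiply a (d_out x r) matrix by an (r x d_in) matrix given as nested lists."""
    return [
        [sum(b_row[k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for b_row in b
    ]

def init_threshold(delta, keep_fraction):
    """Assumed initialization: choose theta from the sorted magnitudes so that
    roughly keep_fraction of the entries lie strictly above it."""
    mags = sorted(abs(x) for row in delta for x in row)
    cut = int(round((1.0 - keep_fraction) * len(mags)))
    return mags[cut - 1] if cut > 0 else 0.0

def jump_relu_gate(delta, theta):
    """JumpReLU-style gate on magnitudes: entries with |x| > theta pass through
    unchanged, all others are set to zero."""
    return [[x if abs(x) > theta else 0.0 for x in row] for row in delta]

def sparse_lora_update(b, a, theta, alpha):
    """Interpolate between the dense update (alpha = 0) and the gated,
    sparse update (alpha = 1)."""
    dense = matmul(b, a)
    sparse = jump_relu_gate(dense, theta)
    return [
        [(1.0 - alpha) * d + alpha * s for d, s in zip(d_row, s_row)]
        for d_row, s_row in zip(dense, sparse)
    ]

# Rank-1 toy adapter: B is 2x1, A is 1x2 (illustrative values only).
B = [[1.0], [0.1]]
A = [[0.5, 0.02]]

dense = matmul(B, A)
theta = init_threshold(dense, keep_fraction=0.5)
gated = jump_relu_gate(dense, theta)   # two of the four entries survive
halfway = sparse_lora_update(B, A, theta, alpha=0.5)
```

Raising alpha from 0 to 1 over training is one way to introduce sparsity gradually: early steps behave like ordinary LoRA, while later steps confine the update to the entries that cleared the threshold, leaving the remaining positions free for other tasks.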

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.