How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
Summary
A new study introduces the Parametric Memory Law, a robust power law that quantifies the exact parametric memory capacity of Large Language Models (LLMs) when finetuned with Low-Rank Adaptation (LoRA). Addressing the gap in qualitative evaluations, researchers employed LoRA as a controlled memory capacity probe within the latent space. Their analysis reveals a deterministic phase transition at the token level, establishing that a prediction probability of p > 0.5 is a sufficient condition for verbatim recall under greedy decoding. Based on these insights, the study proposes MemFT, a threshold-guided optimization strategy. MemFT dynamically redistributes the training budget towards sub-threshold tokens, demonstrating enhanced memory fidelity and efficiency in empirical evaluations. Code for this work will be released at https://github.com/zjunlp/ParametricMemoryLaw.
Key takeaway
For Machine Learning Engineers finetuning LLMs with LoRA for continuous knowledge updates, understanding the Parametric Memory Law is crucial. This law quantifies how LoRA stores information, revealing that tokens with prediction probability p > 0.5 are reliably recalled. You should consider implementing MemFT, the proposed threshold-guided optimization strategy. MemFT dynamically focuses your training budget on less-remembered tokens (p ≤ 0.5). This approach can significantly enhance memory fidelity and training efficiency in your LoRA finetuning workflows.
Key insights
The Parametric Memory Law quantifies LoRA's exact memory capacity in LLMs, linking loss reduction to effective parameters and sequence length.
Principles
- Loss reduction (Delta L) follows a power law with effective parameters and sequence length.
- Prediction probability p > 0.5 enables verbatim recall via greedy decoding.
- Dynamic training budget redistribution improves memory fidelity.
Method
LoRA is used as a controlled memory capacity probe in the latent space to systematically quantify exact parametric memory. MemFT optimizes by dynamically redistributing training budget to sub-threshold tokens.
In practice
- Apply MemFT to improve LoRA finetuning efficiency.
- Target tokens with p ≤ 0.5 for focused memory updates.
Topics
- Large Language Models
- LoRA Finetuning
- Parametric Memory Law
- MemFT Optimization
- Memory Fidelity
- Greedy Decoding
Code references
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.