GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning

2026-04-10 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

GRASS (Gradient-based Adaptive Layer-wise Importance Sampling) is a new framework designed to overcome the memory constraints of full-parameter fine-tuning for large language models (LLMs). Existing low-rank adaptation methods, while memory-efficient, often compromise model expressiveness and performance. Layer-wise fine-tuning methods, which use static importance sampling, fail to adapt to varying layer importance across tasks and training stages. GRASS addresses these issues by employing mean gradient norms as a dynamic, task-aware, and training-stage-aware metric for estimating layer importance. It adaptively adjusts layer sampling probabilities and incorporates a layer-wise optimizer state offloading mechanism to further reduce memory usage. Experiments show GRASS improves accuracy by up to 4.38 points and reduces memory usage by up to 19.97% compared to state-of-the-art methods across multiple models and benchmarks.

Key takeaway

For AI engineers and research scientists constrained by GPU memory during LLM fine-tuning, GRASS is worth evaluating. It estimates layer importance from gradient norms as training progresses and samples layers accordingly, and it reports up to 4.38 points higher accuracy and up to 19.97% lower memory usage than state-of-the-art methods. Consider it for fine-tuning workflows where updating all parameters at once is infeasible.

Key insights

GRASS adaptively samples LLM layers for fine-tuning based on mean gradient norms, reducing memory usage by up to 19.97% while improving accuracy by up to 4.38 points over state-of-the-art methods.

Principles

Layer importance varies by task and training stage.
Mean gradient norms serve as a proxy for layer importance.
Adaptive sampling improves fine-tuning efficiency.

Method

GRASS uses mean gradient norms to estimate layer importance, adaptively adjusts sampling probabilities, and offloads optimizer states to reduce memory during LLM fine-tuning.
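
The sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary does not give the exact formula, so the proportional normalization, the uniform `floor` term, and every function name here are assumptions made for the example. Gradients are plain Python lists so the sketch runs without any ML library.

```python
import math
import random


def mean_gradient_norms(layer_grads):
    """Return the mean L2 gradient norm per layer.

    layer_grads maps a layer name to a list of flat gradient vectors,
    one vector per parameter tensor in that layer (hypothetical layout).
    """
    scores = {}
    for name, grads in layer_grads.items():
        norms = [math.sqrt(sum(g * g for g in vec)) for vec in grads]
        scores[name] = sum(norms) / len(norms)
    return scores


def sampling_probabilities(scores, floor=0.05):
    """Turn importance scores into sampling probabilities.

    Assumption: probabilities are proportional to the scores, mixed with a
    small uniform term so that no layer is starved of updates entirely.
    """
    total = sum(scores.values())
    n = len(scores)
    probs = {}
    for name, score in scores.items():
        proportional = score / total if total > 0 else 1.0 / n
        probs[name] = (1.0 - floor) * proportional + floor / n
    return probs


def sample_layers(probs, k, rng):
    """Draw k distinct layers, weighted by probability, without replacement."""
    remaining = dict(probs)
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Toy gradients for three layers; layer_1 has the largest gradient norm.
layer_grads = {
    "layer_0": [[0.3, 0.4], [0.0, 0.5]],
    "layer_1": [[3.0, 4.0]],
    "layer_2": [[0.6, 0.8], [1.0, 0.0]],
}

scores = mean_gradient_norms(layer_grads)
probs = sampling_probabilities(scores)
active = sample_layers(probs, k=2, rng=random.Random(0))
print(scores, probs, active)
```

Recomputing `scores` periodically during training is what would make the sampling stage-aware in this sketch: layers whose gradients shrink are selected less often, and layers whose gradients grow are selected more often.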

In practice

Apply GRASS for memory-constrained LLM fine-tuning.
Utilize gradient norms to identify critical layers.
Implement optimizer state offloading for memory savings.
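
To make the offloading idea concrete, the sketch below simulates layer-wise optimizer state offloading with two plain dictionaries standing in for GPU memory (`device`) and CPU memory (`host`). The class name, its methods, and the Adam-style moment layout are all hypothetical; the summary does not describe how GRASS implements offloading, so treat this only as an illustration of the bookkeeping involved.

```python
class OffloadingStateStore:
    """Keep optimizer state on the 'device' only for currently active layers.

    Assumption: each layer holds Adam-style first and second moments
    ("m" and "v"), stored here as lists of floats.
    """

    def __init__(self, layer_sizes):
        # All state starts on the host, initialised to zero.
        self.host = {
            name: {"m": [0.0] * size, "v": [0.0] * size}
            for name, size in layer_sizes.items()
        }
        self.device = {}

    def activate(self, active_layers):
        """Move state for active layers to the device and offload the rest."""
        active = set(active_layers)
        for name in list(self.device):
            if name not in active:
                self.host[name] = self.device.pop(name)
        for name in active:
            if name not in self.device:
                self.device[name] = self.host.pop(name)

    def update_moments(self, name, grad, beta1=0.9, beta2=0.999):
        """Update the moments of an active layer (raises KeyError if offloaded)."""
        state = self.device[name]
        state["m"] = [beta1 * m + (1 - beta1) * g for m, g in zip(state["m"], grad)]
        state["v"] = [beta2 * v + (1 - beta2) * g * g for v, g in zip(state["v"], grad)]

    def device_floats(self):
        """Number of optimizer-state values currently held on the device."""
        return sum(len(s["m"]) + len(s["v"]) for s in self.device.values())


store = OffloadingStateStore({"layer_0": 4, "layer_1": 4, "layer_2": 4})

store.activate(["layer_1"])                      # step 1 samples layer_1
store.update_moments("layer_1", [1.0, 1.0, 1.0, 1.0])

store.activate(["layer_2"])                      # step 2 samples layer_2
print(sorted(store.device), sorted(store.host), store.device_floats())
```

In this toy setup only one layer's moments occupy the device at a time, and the state for `layer_1` survives the round trip to the host, so it can resume where it left off the next time that layer is sampled. In a real training loop the dictionaries would be replaced by tensors moved between GPU and CPU memory.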

Topics

GRASS Framework
Large Language Model Fine-tuning
Memory-efficient Training
Gradient-based Sampling
Layer-wise Importance

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.