LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

2026-04-16 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

LongAct is a reinforcement learning (RL) training strategy that improves the long-context reasoning of large language models (LLMs) by exploiting intrinsic activation patterns. The authors observed high-magnitude activations in query and key vectors during long-context processing, an observation that parallels the critical weights identified in model quantization. Hypothesizing that these activations are pivotal for optimization, LongAct replaces uniform weight updates with saliency-guided sparse updates that touch only the weights associated with these activations. The method achieved an improvement of approximately 8% on the LongBench v2 benchmark and better generalization on the RULER benchmark. It also generalized across RL algorithms, consistently improving performance with methods such as GRPO and DAPO, and ablation studies confirmed the importance of focusing on salient features.

Key takeaway

For AI engineers optimizing LLMs for long-context reasoning, saliency-guided sparse weight updates are worth considering. LongAct, which updates only the weights tied to high-magnitude query and key activations, reported an improvement of approximately 8% on LongBench v2 and better generalization on RULER, suggesting that it trains more effectively than uniform updates. This may translate into better model performance and resource utilization in long-context applications.

Key insights

High-magnitude activations in query/key vectors are critical for long-context LLM reasoning and RL optimization.

Principles

Long-context reasoning exhibits sparse structure.
Saliency-guided updates outperform uniform updates.

Method

LongAct selectively updates only weights associated with high-magnitude query and key activations, shifting from uniform to saliency-guided sparse updates for RL optimization in LLMs.
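
The selective update described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that saliency is the mean absolute activation per query or key channel, that a fixed top fraction of channels is kept, and that each salient channel corresponds to one row of the projection weight matrix. The summary does not specify LongAct's actual selection rule, and the names `channel_saliency`, `top_k_mask`, `sparse_update`, and `keep_fraction` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def channel_saliency(activations: Matrix) -> List[float]:
    """Mean absolute activation per channel (column) across all tokens (rows)."""
    n_tokens = len(activations)
    n_channels = len(activations[0])
    return [
        sum(abs(row[c]) for row in activations) / n_tokens
        for c in range(n_channels)
    ]


def top_k_mask(saliency: List[float], keep_fraction: float) -> List[bool]:
    """Flag the keep_fraction of channels with the highest saliency (assumed rule)."""
    k = max(1, math.ceil(keep_fraction * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda c: saliency[c], reverse=True)
    keep = set(ranked[:k])
    return [c in keep for c in range(len(saliency))]


def sparse_update(weights: Matrix, grads: Matrix, mask: List[bool], lr: float) -> Matrix:
    """Gradient step on salient rows only; weights[c] is the row producing channel c."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if keep else list(w_row)
        for w_row, g_row, keep in zip(weights, grads, mask)
    ]


# Toy query activations: 3 tokens x 4 channels; channel 1 carries the outliers.
q_acts = [
    [0.1, 5.0, -0.2, 0.3],
    [-0.2, -6.0, 0.1, 0.2],
    [0.0, 4.0, 0.3, -0.1],
]
mask = top_k_mask(channel_saliency(q_acts), keep_fraction=0.25)
weights = [[1.0, 1.0] for _ in range(4)]
grads = [[0.5, 0.5] for _ in range(4)]
updated = sparse_update(weights, grads, mask, lr=0.1)
print(mask)  # [False, True, False, False]
print(updated)
```

Only the row tied to the outlier channel moves; every other row is left exactly as it was, which is the contrast with a uniform update that the method draws.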

In practice

Apply sparse updates based on activation magnitude.
Integrate with existing RL algorithms like GRPO or DAPO.
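
One way to read the two points above is that the saliency mask acts as a post-processing step on whatever gradient the RL objective produces. The sketch below pairs a GRPO-style group-relative advantage (rewards standardized within one prompt's sample group) with a masked gradient. It is an illustration under assumptions, not LongAct's implementation: the mask is taken as given, the population standard deviation is used (implementations differ on this), clipping and KL terms are omitted, and `group_relative_advantages` and `masked_policy_gradient` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_policy_gradient(
    per_sample_grads: List[List[float]],
    advantages: List[float],
    mask: List[bool],
) -> List[float]:
    """Advantage-weighted mean of per-sample log-prob gradients (ascent direction),
    zeroed for every parameter group that the saliency mask excludes."""
    n = len(per_sample_grads)
    averaged = [
        sum(adv * g[c] for adv, g in zip(advantages, per_sample_grads)) / n
        for c in range(len(mask))
    ]
    return [g if keep else 0.0 for g, keep in zip(averaged, mask)]


# Toy group: 4 sampled responses, 3 parameter groups, middle group not salient.
rewards = [1.0, 0.0, 1.0, 0.0]
per_sample_grads = [
    [0.2, 0.4, -0.1],
    [0.1, -0.3, 0.2],
    [0.3, 0.1, 0.0],
    [0.0, 0.2, 0.1],
]
mask = [True, False, True]

advantages = group_relative_advantages(rewards)
step = masked_policy_gradient(per_sample_grads, advantages, mask)
print(advantages)
print(step)
```

Because the mask is applied after the advantage-weighted gradient is formed, the same step could in principle sit behind a different objective, such as DAPO's, without changing how the advantages themselves are computed.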

Topics

LongAct
Long-Context Reinforcement Learning
Large Language Models
Saliency-Guided Updates
Activation Patterns

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.