SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

SSM Adapters via Hankel Reduced-order Modeling (HRM adapters) are a parameter-efficient fine-tuning (PEFT) method designed for tasks that require sequential state accumulation. Each adapter is an SSM-based residual module initialized by Balanced Truncation of empirical Hankel Gramians. Because its system matrix $\bar{A}$ is time-invariant, the recurrence can be computed exactly with an FFT-based parallel scan, which is reported to give computational parity with LoRA across all context lengths. In evaluations on Mistral-7B with 8.4M trainable parameters, HRM consistently outperformed LoRA variants on LongBench tasks, with a +34.8% relative accuracy improvement on QuALITY and a +71.6% relative ROUGE-1 improvement on QMSum. HRM also showed superior performance across 18 configurations of synthetic state-tracking (DFA, Parity) and character-level language modeling (enwik8), and gate analysis indicates that the adapter modulates its recurrence effectively.
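To make the Hankel-based initialization more concrete, here is a minimal sketch of order reduction for a single-input, single-output system described by its impulse response. It uses the Ho-Kalman (eigensystem realization) construction, which truncates the SVD of a Hankel matrix and yields an approximately balanced realization. The source does not say how the empirical Hankel Gramians are formed, so this is a stand-in technique and not the adapter's actual initialization. The function name `hankel_reduce` and its arguments are hypothetical.

```python
import numpy as np

def hankel_reduce(h, r):
    """Rank-r state-space realization from impulse response h (Ho-Kalman / ERA).

    Assumption: h[k] = C A^k B for a SISO system, with len(h) >= 2.
    Returns (A, B, C, s), where s holds the singular values of the Hankel matrix.
    """
    h = np.asarray(h, dtype=float)
    m = len(h) // 2
    # H[i, j] = h[i + j]; H_shift[i, j] = h[i + j + 1]
    idx = np.arange(m)[:, None] + np.arange(m)[None, :]
    H, H_shift = h[idx], h[idx + 1]

    U, s, Vt = np.linalg.svd(H)
    U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]   # keep the r dominant directions
    sqrt_s = np.sqrt(s_r)

    # A = S^{-1/2} U^T H_shift V S^{-1/2}
    A = (U_r.T @ H_shift @ Vt_r.T) / np.outer(sqrt_s, sqrt_s)
    B = sqrt_s * Vt_r[:, 0]    # first column of S^{1/2} V^T
    C = U_r[0, :] * sqrt_s     # first row of U S^{1/2}
    return A, B, C, s

# Illustrative system (made-up values): three stable modes.
lam = np.array([0.9, 0.5, -0.3])
c_true = np.array([1.0, 2.0, 3.0])
h = np.array([np.sum(c_true * lam**k) for k in range(12)])

A, B, C, s = hankel_reduce(h, r=3)
h_hat = np.array([C @ np.linalg.matrix_power(A, k) @ B for k in range(12)])
```

In this toy case the Hankel matrix has rank 3, so `h_hat` reproduces `h` up to floating-point error. With measured data, a sharp drop in `s` after index `r` suggests that an `r`-dimensional state captures most of the input-output behavior.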

Key takeaway

Machine learning engineers fine-tuning large language models for long-context sequential tasks should consider SSM Adapters via Hankel Reduced-order Modeling (HRM) as an alternative to LoRA. In the reported results, HRM adapters substantially improve performance on LongBench tasks, including QuALITY and QMSum, while matching LoRA's computational cost. Teams should evaluate HRM for applications that depend on sequential state accumulation, where it may deliver substantial accuracy gains over low-rank adaptation.

Key insights

HRM adapters, an SSM-based PEFT method initialized via Hankel reduced-order modeling, outperform LoRA on long-context sequential tasks, with gate analysis showing that they modulate recurrence effectively.

Principles

Method

The HRM adapter is an SSM-based residual module initialized via Balanced Truncation of empirical Hankel Gramians. Because its system matrix $\bar{A}$ is time-invariant, the adapter admits an exact FFT-based parallel scan.
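The following NumPy sketch shows why time-invariance matters. It assumes a diagonal $\bar{A}$ and a single scalar input channel, both of which are assumptions and not details from the source. With a fixed $\bar{A}$, unrolling the recurrence gives a causal convolution with kernel $k_j = C \bar{A}^j \bar{B}$, which an FFT can evaluate for every position at once. The names `ssm_recurrent` and `ssm_fft` are illustrative.

```python
import numpy as np

def ssm_recurrent(a_bar, b_bar, c, u):
    """Sequential reference: x_t = a_bar * x_{t-1} + b_bar * u_t, y_t = c . x_t."""
    x = np.zeros_like(a_bar)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = a_bar * x + b_bar * u_t
        y[t] = np.dot(c, x)
    return y

def ssm_fft(a_bar, b_bar, c, u):
    """Same outputs via a causal convolution computed with the FFT."""
    L = len(u)
    # powers[j, n] = a_bar[n] ** j, so kernel[j] = sum_n c[n] * a_bar[n]**j * b_bar[n]
    powers = a_bar[None, :] ** np.arange(L)[:, None]
    kernel = powers @ (c * b_bar)
    # Zero-pad to 2L so the circular convolution equals the linear one on [0, L).
    n_fft = 2 * L
    y = np.fft.irfft(np.fft.rfft(kernel, n_fft) * np.fft.rfft(u, n_fft), n_fft)
    return y[:L]

# Illustrative inputs (made-up sizes): 8 state dimensions, 64 time steps.
rng = np.random.default_rng(0)
a_bar = rng.uniform(0.5, 0.99, size=8)   # stable diagonal entries
b_bar = rng.normal(size=8)
c = rng.normal(size=8)
u = rng.normal(size=64)

y_seq = ssm_recurrent(a_bar, b_bar, c, u)
y_fft = ssm_fft(a_bar, b_bar, c, u)
```

Both paths return the same sequence up to floating-point error. The sequential loop needs L dependent steps, while the FFT step costs O(L log L) and parallelizes across positions. This equivalence holds only because $\bar{A}$ does not change with the input or the time step.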

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.