IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization
Summary
The IDEAL framework introduces an efficient, query-aware method for adapting Large Language Models (LLMs) to Query-focused Summarization (QFS). It targets two challenges, processing lengthy documents and aligning the LLM with the query at a fine-grained level, with one module for each. The Query-aware HyperExpert uses a HyperNetwork to generate query-specific LLM parameter shifts on top of parameter-efficient fine-tuning (PEFT) methods such as LoRA, Prompt-tuning, and Parallel Adapter, which improves query alignment. The Query-focused Infini-attention module processes long documents by combining a novel query-focused compressive memory with long-term linear attention. This enables models such as LLaMA2-7B to handle 13,000 input tokens on a single 24GB Nvidia GeForce RTX 3090. Experiments on the CovidET, QMSum, and SQuALITY datasets show that IDEAL outperforms the baselines, with IDEALLoRA surpassing them by up to 1.64 ROUGE-L points.
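A minimal sketch of the HyperExpert idea in its LoRA form follows. It assumes a single linear HyperNetwork that maps a query embedding to the two low-rank factors A and B of one frozen layer, so that the effective weight becomes W + BA for that query. The names `HyperLoRA` and `adapted_forward`, the zero initialization of the B projection, and the single-layer scope are illustrative assumptions, not the paper's implementation. The actual HyperExpert architecture may differ.

```python
import random


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


class HyperLoRA:
    """Hypothetical HyperNetwork: query embedding -> LoRA factors A (rank x d_in), B (d_out x rank)."""

    def __init__(self, d_query, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Linear map from the query embedding to the flattened A factor.
        self.proj_a = [[rng.gauss(0.0, 0.02) for _ in range(d_query)]
                       for _ in range(rank * d_in)]
        # B projection starts at zero, so the generated shift B @ A is zero before training.
        self.proj_b = [[0.0] * d_query for _ in range(d_out * rank)]

    def generate(self, q):
        """Return query-specific (A, B) as lists of rows."""
        flat_a = matvec(self.proj_a, q)
        flat_b = matvec(self.proj_b, q)
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def adapted_forward(w, x, a, b, scale=1.0):
    """y = W x + scale * B (A x): frozen weight plus a query-conditioned low-rank shift."""
    base = matvec(w, x)
    shift = matvec(b, matvec(a, x))
    return [yb + scale * ys for yb, ys in zip(base, shift)]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2 x 3 weight
x = [1.0, 2.0, 3.0]
hyper = HyperLoRA(d_query=4, d_in=3, d_out=2, rank=2)
a, b = hyper.generate([0.1, -0.3, 0.5, 0.2])
print(adapted_forward(w, x, a, b))  # [1.0, 2.0] while proj_b is still zero
```

Because A and B are regenerated for every query, two queries over the same document produce different effective weights. In this sketch, only W and the HyperNetwork projections are stored.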
Key takeaway
If you are a machine learning engineer building query-focused summarization systems, especially for long documents, consider the IDEAL framework. Its Query-aware HyperExpert conditions the LLM's parameters on each query for precise query alignment. Its Query-focused Infini-attention handles long inputs within limited GPU memory, such as 13,000 tokens on a single 24GB Nvidia GeForce RTX 3090. The authors report gains of up to 1.64 ROUGE-L points over baselines on CovidET, QMSum, and SQuALITY.
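Some back-of-the-envelope arithmetic shows why a fixed-size memory matters at this input length. The sketch assumes the public LLaMA2-7B shape (32 layers, hidden size 4096, 32 heads of dimension 128) and fp16 storage. It compares the key/value cache of standard attention against one compressive memory per head, where each memory is a d×d matrix plus a d-dimensional normalizer, as in Infini-attention. Model weights (roughly 13.5 GB in fp16), activations, optimizer state, and the local attention over the current segment are deliberately left out. The function names are hypothetical.

```python
def kv_cache_bytes(n_tokens, n_layers, hidden, bytes_per_value=2):
    """Keys plus values (factor 2) for every token in every layer."""
    return 2 * n_layers * hidden * n_tokens * bytes_per_value


def compressive_memory_bytes(n_layers, n_heads, head_dim, bytes_per_value=2):
    """One head_dim x head_dim matrix M plus a head_dim vector z per head and layer."""
    return n_layers * n_heads * (head_dim * head_dim + head_dim) * bytes_per_value


kv = kv_cache_bytes(13_000, n_layers=32, hidden=4096)
mem = compressive_memory_bytes(n_layers=32, n_heads=32, head_dim=128)
print(f"full KV cache for 13,000 tokens: {kv / 1e9:.2f} GB")  # 6.82 GB
print(f"compressive memory (any length): {mem / 1e6:.1f} MB")  # 33.8 MB
```

The KV cache grows linearly with input length, whereas the compressive memory does not depend on input length. This is part of what lets long inputs fit next to the weights on a 24GB card.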
Key insights
LLMs can achieve efficient, query-focused summarization of long documents by dynamically adapting parameters and using query-aware memory.
Principles
- HyperNetworks can generate query-specific LLM parameter shifts for fine-grained query alignment.
- A query-focused compressive memory lets long documents be summarized within a fixed memory budget.
- Repeating the query instruction at the end of the input better guides summary generation.
Method
IDEAL integrates two modules into decoder-only LLMs such as LLaMA: a Query-aware HyperExpert (HyperNetwork-driven PEFT that generates query-conditioned parameter shifts) and Query-focused Infini-attention (compressive memory extended with an additional query-focused block).
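The compressive memory that Query-focused Infini-attention builds on can be sketched using the linear-attention form of the original Infini-attention. Each segment's keys K and values V are folded into a fixed matrix M ← M + σ(K)ᵀV and a normalizer z ← z + Σσ(K). A query q reads back σ(q)M / (σ(q)z), with σ = ELU + 1. A learned gate sigmoid(β) then mixes the memory readout with ordinary local attention. The extra query-focused block that IDEAL adds is not reproduced here. The names `CompressiveMemory` and `gated_combine` are hypothetical.

```python
import math


def elu_plus_one(x):
    """sigma(x) = ELU(x) + 1, a positive feature map used by linear attention."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size associative memory: a d_key x d_value matrix M and a d_key vector z."""

    def __init__(self, d_key, d_value):
        self.m = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def retrieve(self, q, eps=1e-6):
        """Read sigma(q) M / (sigma(q) z); eps avoids 0/0 while the memory is empty."""
        sq = [elu_plus_one(x) for x in q]
        denom = sum(s * zi for s, zi in zip(sq, self.z)) + eps
        return [sum(sq[i] * self.m[i][j] for i in range(len(sq))) / denom
                for j in range(len(self.m[0]))]

    def update(self, keys, values):
        """Fold one segment in: M += sigma(K)^T V and z += sum_t sigma(k_t)."""
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.m[i][j] += ski * vj


def gated_combine(a_mem, a_local, beta):
    """Mix the memory readout and the local attention output with gate sigmoid(beta)."""
    g = 1.0 / (1.0 + math.exp(-beta))
    return [g * m + (1.0 - g) * l for m, l in zip(a_mem, a_local)]


mem = CompressiveMemory(d_key=2, d_value=3)
mem.update(keys=[[0.5, -1.0]], values=[[1.0, 2.0, 3.0]])
print([round(x, 3) for x in mem.retrieve([0.5, -1.0])])  # [1.0, 2.0, 3.0]
```

Each new segment only adds into M and z, so their size stays fixed however many segments of the document have been processed.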
In practice
- Apply HyperNetwork-based PEFT (e.g., IDEALLoRA) for query-specific LLM adaptation.
- Use Query-focused Infini-attention to process long documents on a single 24GB GPU.
- Repeat the query instruction at the end of the document to improve summary relevance.
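The query-repetition practice can be sketched as a prompt builder that states the query instruction before the document and again after it. This places the instruction next to the position where generation begins. The template wording and the function name `build_qfs_prompt` are hypothetical, because the source does not give the exact format IDEAL uses.

```python
def build_qfs_prompt(query, document,
                     instruction="Summarize the document with respect to the query."):
    """Hypothetical template: the query instruction appears before and after the document."""
    header = f"{instruction}\nQuery: {query}\n\nDocument:\n"
    # Repeating the instruction at the end keeps it adjacent to the summary the model writes.
    footer = f"\n\n{instruction}\nQuery: {query}\nSummary:"
    return header + document + footer


prompt = build_qfs_prompt("What did the committee decide about the budget?",
                          "Meeting transcript ...")
print(prompt.count("Query: "))  # 2
```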
Topics
- Query-focused Summarization
- Large Language Models
- Parameter-Efficient Fine-Tuning
- HyperNetworks
- Infini-attention
- Long-context Transformers
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.