Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

Attention Expansion is a mechanism for improving keyphrase extraction (KPE) from long documents by augmenting the token representations of a pre-trained language model (PLM). It brings in information from out-of-context document chunks through a cross-attention layer over lightweight pre-trained word embeddings (PWE). This widens the effective contextual scope of PLM-based KPE without the computational cost of full long-context attention or large language model (LLM) inference. Evaluated across five PLM backbones, two training regimes, and five benchmark corpora, Attention Expansion consistently improved F1 scores. The gains held for specialized models such as SciBERT and KBIR and for long-context encoders such as ModernBERT (8,192 tokens), which suggests that the out-of-context signal is complementary to what these models already capture. The mechanism adds an average forward-pass overhead of 3.6% and increases the parameter count by 0.05-0.21%.

Key takeaway

Machine learning engineers building keyphrase extraction systems for long documents should consider integrating Attention Expansion: in the reported experiments it improved performance at a small computational cost (about 3.6% average forward-pass overhead). The mechanism broadens the context available to PLM-based taggers, including specialized and long-context models. The multi-head variant, which consistently outperformed baselines across datasets and training regimes, is the natural starting point for high-throughput KPE pipelines.

Key insights

Attention Expansion broadens the context available to a PLM for keyphrase extraction at low cost by integrating lightweight word embeddings of out-of-context text.

Method

Augment PLM hidden states with cross-attention to pre-trained word embeddings (PWE) of surrounding out-of-context chunks. This enriches token representations for BIO sequence tagging without full PLM re-encoding.
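
The cross-attention step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a single head (the summary reports a multi-head variant), adds the attended context to the hidden states through a simple residual sum, and picks projection shapes so that the PWE dimension can differ from the PLM dimension. The names `attention_expand`, `w_q`, `w_k`, and `w_v` are hypothetical, and the matrices are random rather than learned.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_expand(hidden, pwe, w_q, w_k, w_v):
    """Single-head cross-attention with a residual sum (an assumed design).

    hidden: (n_tokens x d_model) PLM hidden states for the in-context chunk.
    pwe:    (n_ctx x d_pwe) static word embeddings of out-of-context tokens.
    w_q:    (d_model x d_attn); w_k: (d_pwe x d_attn); w_v: (d_pwe x d_model).
    Returns the expanded hidden states and the attention weights.
    """
    q = matmul(hidden, w_q)                     # queries come from the PLM
    k = matmul(pwe, w_k)                        # keys come from the word embeddings
    v = matmul(pwe, w_v)                        # values are projected to d_model
    scale = math.sqrt(len(w_q[0]))              # sqrt(d_attn), as in scaled dot-product attention
    k_t = [list(col) for col in zip(*k)]        # transpose keys to (d_attn x n_ctx)
    scores = matmul(q, k_t)                     # (n_tokens x n_ctx)
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)                # (n_tokens x d_model)
    expanded = [[h + c for h, c in zip(h_row, c_row)]
                for h_row, c_row in zip(hidden, context)]
    return expanded, weights

def random_matrix(rows, cols, rng):
    """Toy stand-in for learned parameters or real embeddings."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_pwe, d_attn = 8, 4, 4                # toy sizes chosen for illustration
hidden = random_matrix(3, d_model, rng)         # 3 in-context tokens
pwe = random_matrix(5, d_pwe, rng)              # 5 out-of-context tokens
w_q = random_matrix(d_model, d_attn, rng)
w_k = random_matrix(d_pwe, d_attn, rng)
w_v = random_matrix(d_pwe, d_model, rng)

expanded, weights = attention_expand(hidden, pwe, w_q, w_k, w_v)
```

In a tagger, `expanded` would replace the original hidden states as input to the BIO classification layer. Only the three projection matrices are new parameters in this sketch, and the out-of-context text is looked up in a static embedding table instead of being passed through the PLM.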

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.