From Global to Local: Learning Context-Aware Graph Representations for Document Classification and Summarization
Summary
The paper proposes a new data-driven method for constructing graph-based document representations, extending previous work by Bugueño and de Melo (2025). The approach replaces full attention with a dynamic sliding-window attention module that efficiently captures local and mid-range semantic dependencies and structural relations within documents. Graph Attention Networks (GATs) trained on these learned graphs achieve competitive document classification results while requiring fewer computational resources than prior methods. The authors systematically evaluate multiple model configurations on three document classification benchmarks: BBC News, Hyperpartisan News Detection (HND), and arXiv classification (AX). An exploratory evaluation of extractive document summarization on the GovReport (GR) dataset highlights both the method's potential and its current limitations, particularly the trade-off between computational cost and performance and the impact of statistical filtering on graph sparsity and efficiency.
Key takeaway
For AI scientists and research scientists building NLP systems, data-driven graph construction with sliding-window attention offers a more computationally efficient way to obtain document representations that remain competitive. Prioritize max-bound filtering for long documents to balance sparsity against information retention, and experiment with unfiltered graphs if you can accept higher resource usage in exchange for a possible performance gain. Because each sentence attends only to the neighbors inside its window, the approach avoids the quadratic cost of full attention.
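As a rough illustration of the cost difference between full and windowed attention, the count below compares how many sentence pairs each one scores. It assumes a document of n sentences and a fixed window half-width w. Both symbols and the example numbers are illustrative assumptions, not values from the paper, whose window is described as dynamic.

```latex
\begin{align*}
\text{full attention:} \quad & n^2 \ \text{sentence pairs} \\
\text{sliding window:} \quad & n(2w+1) - w(w+1) \;\le\; n(2w+1) \ \text{sentence pairs} \\
n = 1000,\ w = 10: \quad & 10^6 \quad \text{vs.} \quad 1000 \cdot 21 - 110 = 20{,}890
\end{align*}
```

For a fixed w the windowed count grows linearly in n, which is where the saving on long documents comes from.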
Key insights
Building document graphs with sliding-window attention makes the representation cheaper to compute while keeping document classification results competitive.
Principles
- Local context modeling is effective for long documents.
- Statistical filtering enhances graph structural coherence.
- Sparsity improves efficiency and generalization.
Method
The method encodes sentences using Sentence Transformers, applies sliding-window multi-head attention with non-linear activation functions (ReLU, Sigmoid, Softmax), and constructs document graphs by statistically filtering attention weights (mean-bound, max-bound) to define edges.
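The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for Sentence Transformer embeddings, a single attention head replaces the multi-head module, the window is a fixed half-width, and the function names and the mean-bound and max-bound thresholds used here (row mean, and row maximum minus one standard deviation) are hypothetical choices made for the example.

```python
import numpy as np


def sliding_window_attention(x, w_q, w_k, window, activation="softmax"):
    """Single-head attention where sentence i scores only sentences j with |i - j| <= window.

    Returns a dense (n, n) matrix for readability; entries outside the window stay 0.
    """
    n = x.shape[0]
    q, k = x @ w_q, x @ w_k
    weights = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Only in-window scores are computed, so the cost is O(n * window).
        scores = k[lo:hi] @ q[i] / np.sqrt(q.shape[1])
        if activation == "softmax":
            e = np.exp(scores - scores.max())
            row = e / e.sum()
        elif activation == "sigmoid":
            row = 1.0 / (1.0 + np.exp(-scores))
        elif activation == "relu":
            row = np.maximum(scores, 0.0)
        else:
            raise ValueError(f"unknown activation: {activation}")
        weights[i, lo:hi] = row
    return weights


def filter_edges(weights, window, mode="mean"):
    """Turn in-window attention weights into weighted edges (i, j, weight).

    The thresholds are ASSUMPTIONS for illustration, not the paper's definitions:
      "none": keep every in-window pair (unfiltered graph)
      "mean": keep weights >= the row's in-window mean
      "max":  keep weights >= the row's in-window max minus one standard deviation
    """
    n = weights.shape[0]
    edges = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        row = weights[i, lo:hi]
        if mode == "none":
            threshold = -np.inf
        elif mode == "mean":
            threshold = row.mean()
        elif mode == "max":
            threshold = row.max() - row.std()
        else:
            raise ValueError(f"unknown mode: {mode}")
        for j in range(lo, hi):
            if j != i and weights[i, j] >= threshold:  # skip self-loops
                edges.append((i, j, float(weights[i, j])))
    return edges


# Toy document: 12 "sentences" with random 16-dimensional stand-in embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(12, 16))
w_q = rng.normal(size=(16, 8))
w_k = rng.normal(size=(16, 8))

weights = sliding_window_attention(x, w_q, w_k, window=2)
unfiltered = filter_edges(weights, window=2, mode="none")
mean_edges = filter_edges(weights, window=2, mode="mean")
max_edges = filter_edges(weights, window=2, mode="max")
print(len(unfiltered), len(mean_edges), len(max_edges))
```

The resulting edge lists can be passed to any graph library as a weighted edge list. Comparing the three counts gives a quick feel for how much sparsity each filtering mode introduces on a given document.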
In practice
- Use sliding-window attention for efficient graph construction.
- Apply max-bound filtering for longer documents.
- Consider unfiltered graphs for potential performance gains.
Topics
- Graph-based Document Representations
- Sliding-Window Attention
- Graph Attention Networks
- Document Classification
- Extractive Summarization
Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.