From Global to Local: Learning Context-Aware Graph Representations for Document Classification and Summarization

2026-03-03 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

The paper proposes a data-driven method for constructing graph-based document representations, extending previous work by Bugueño and de Melo (2025). It replaces full attention with a dynamic sliding-window attention module that efficiently captures local and mid-range semantic dependencies and structural relations within documents. Graph Attention Networks (GATs) trained on the learned graphs achieve competitive document classification results while requiring fewer computational resources than prior methods. The authors systematically evaluate multiple model configurations on three document classification benchmarks: BBC News, Hyperpartisan News Detection (HND), and arXiv classification (AX). An exploratory evaluation of extractive summarization on the GovReport (GR) dataset shows both the method's potential and its current limitations, in particular the trade-off between computational cost and performance and the effect of statistical filtering on graph sparsity and efficiency.

Key takeaway

For AI Scientists and Research Scientists developing NLP systems, data-driven graph construction with sliding-window attention offers a path to document representations that are cheaper to compute while remaining competitive in accuracy. Prioritize max-bound filtering for long documents to balance sparsity and information retention, and consider unfiltered graphs when a possible performance gain justifies the higher resource usage. Restricting attention to a sliding window reduces the cost relative to full attention, whose cost grows quadratically with document length.
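To make the cost argument concrete, the sketch below counts how many attention scores are computed per head under full attention and under a sliding window. It assumes a fixed window half-width `w`, a hypothetical parameter chosen for illustration. The paper's dynamic window may be sized differently, so treat the numbers as an order-of-magnitude illustration only.

```python
def full_attention_pairs(n: int) -> int:
    """Number of (query, key) score pairs under full attention over n sentences."""
    return n * n


def window_attention_pairs(n: int, w: int) -> int:
    """Number of score pairs when sentence i only attends to j with |i - j| <= w."""
    # The window is clipped at the start and end of the document.
    return sum(min(n - 1, i + w) - max(0, i - w) + 1 for i in range(n))


if __name__ == "__main__":
    n, w = 1000, 5  # hypothetical document length (sentences) and half-width
    print(full_attention_pairs(n))       # 1000000
    print(window_attention_pairs(n, w))  # 10970
```

For a fixed `w`, the windowed count grows linearly with the number of sentences, whereas the full-attention count grows quadratically.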

Key insights

Building document graphs with sliding-window attention makes document representations cheaper to compute while keeping downstream results competitive.

Principles

Local context modeling is effective for long documents.
Statistical filtering enhances graph structural coherence.
Sparsity improves efficiency and generalization.

Method

The method encodes sentences with Sentence Transformers, applies sliding-window multi-head attention followed by a non-linear activation (ReLU, Sigmoid, or Softmax), and builds a document graph by statistically filtering the attention weights (mean-bound or max-bound) to define edges.
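The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, not the authors' implementation (see the repository listed under Code references for that). It uses a single head with no learned projections, a fixed half-width `w`, and guessed definitions of the filters: "mean-bound" keeps weights at or above the row mean, and "max-bound" keeps weights at or above `tau` times the row maximum. The names `sliding_window_attention`, `filter_edges`, `w`, and `tau` are hypothetical, and the random `X` stands in for sentence embeddings.

```python
import numpy as np


def sliding_window_attention(X, w, activation="softmax"):
    """Single-head dot-product attention restricted to |i - j| <= w.

    X: (n, d) array of sentence embeddings (assumed precomputed).
    Returns (weights, mask), both of shape (n, n).
    """
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= w  # True inside the window
    if activation == "softmax":
        masked = np.where(mask, scores, -np.inf)
        masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(masked)  # exp(-inf) == 0 outside the window
        weights = weights / weights.sum(axis=1, keepdims=True)
    elif activation == "sigmoid":
        weights = np.where(mask, 1.0 / (1.0 + np.exp(-scores)), 0.0)
    elif activation == "relu":
        weights = np.where(mask, np.maximum(scores, 0.0), 0.0)
    else:
        raise ValueError(f"unknown activation: {activation}")
    return weights, mask


def filter_edges(weights, mask, mode="mean", tau=0.5):
    """Turn attention weights into a boolean adjacency matrix.

    ASSUMED filter definitions (the paper's may differ):
      "mean": keep weights >= mean of the in-window weights of that row
      "max":  keep weights >= tau * max of the in-window weights of that row
      "none": keep every in-window pair (unfiltered graph)
    """
    in_window = np.where(mask, weights, np.nan)
    if mode == "mean":
        keep = mask & (weights >= np.nanmean(in_window, axis=1, keepdims=True))
    elif mode == "max":
        keep = mask & (weights >= tau * np.nanmax(in_window, axis=1, keepdims=True))
    elif mode == "none":
        keep = mask.copy()
    else:
        raise ValueError(f"unknown mode: {mode}")
    np.fill_diagonal(keep, False)  # drop self-loops in this sketch
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 16))  # stand-in for 8 sentence embeddings
    W, mask = sliding_window_attention(X, w=2)
    for mode in ("none", "mean", "max"):
        print(mode, int(filter_edges(W, mask, mode=mode).sum()), "edges")
```

In this sketch, filtering can only remove edges from the unfiltered window graph, which is the sparsity-versus-information trade-off the summary refers to.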

In practice

Use sliding-window attention for efficient graph construction.
Apply max-bound filtering for longer documents.
Consider unfiltered graphs for potential performance gains.
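When comparing filtered and unfiltered graphs in practice, it helps to measure how sparse each graph is and to convert it into the edge-list form that graph libraries commonly expect. The sketch below assumes the graph is available as a boolean adjacency matrix. `to_edge_index` and `edge_density` are hypothetical helper names, and the 2-by-E layout follows the convention used by libraries such as PyTorch Geometric.

```python
import numpy as np


def to_edge_index(adj):
    """Convert an (n, n) boolean adjacency matrix to a (2, E) array of (source, target) indices."""
    src, dst = np.nonzero(adj)
    return np.stack([src, dst])


def edge_density(adj):
    """Fraction of possible directed edges (excluding self-loops) that are present."""
    n = adj.shape[0]
    if n < 2:
        return 0.0
    off_diagonal = adj & ~np.eye(n, dtype=bool)
    return float(off_diagonal.sum()) / (n * (n - 1))


if __name__ == "__main__":
    # Toy 3-sentence chain graph: 0 <-> 1 <-> 2
    adj = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]], dtype=bool)
    print(to_edge_index(adj))
    print(edge_density(adj))
```

A lower density means fewer edges for the GAT to process, so reporting it alongside accuracy makes the cost of switching to unfiltered graphs visible.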

Topics

Graph-based Document Representations
Sliding-Window Attention
Graph Attention Networks
Document Classification
Extractive Summarization

Code references

idalr/SlidingWindowAttnGraphs

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.