From Global to Local: Learning Context-Aware Graph Representations for Document Classification and Summarization

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

This paper proposes a data-driven method for constructing graph-based document representations, extending previous work by Bugueño and de Melo (2025). The approach replaces full attention with a dynamic sliding-window attention module that efficiently captures local and mid-range semantic dependencies and structural relations within documents. Graph Attention Networks (GATs) trained on these learned graphs achieve competitive document classification results while requiring fewer computational resources than prior methods. The study systematically evaluates multiple model configurations on three document classification benchmarks: BBC News, Hyperpartisan News Detection (HND), and arXiv classification (AX). An exploratory evaluation of extractive document summarization on the GovReport (GR) dataset highlights both the method's potential and its current limitations, particularly the trade-off between computational cost and performance and the effect of statistical filtering on graph sparsity and efficiency.
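
For readers less familiar with GATs, the following sketch shows the forward pass of one single-head GAT layer over a sentence graph given as a directed edge list, followed by a simple classification readout. It uses NumPy with random weights. The function names, the self-loops, and the mean-pooling readout are illustrative assumptions, not the paper's training setup; only the attention rule (LeakyReLU scores, softmax over each node's neighborhood) follows the standard GAT formulation.

```python
import numpy as np

def gat_layer(H, edges, W, a, negative_slope=0.2):
    """Single-head GAT layer (illustrative sketch).

    H: (n, f_in) node features, e.g. sentence embeddings.
    edges: iterable of (src, dst) pairs; dst aggregates information from src.
    W: (f_in, f_out) shared projection; a: (2 * f_out,) attention vector.
    """
    n = H.shape[0]
    Z = H @ W
    f_out = Z.shape[1]
    neighbors = {i: {i} for i in range(n)}  # every node also attends to itself
    for src, dst in edges:
        neighbors[dst].add(src)
    out = np.zeros_like(Z)
    for i in range(n):
        nbrs = sorted(neighbors[i])
        # e_ij = LeakyReLU(a^T [z_i || z_j]), computed via the two halves of a
        e = a[:f_out] @ Z[i] + Z[nbrs] @ a[f_out:]
        e = np.where(e > 0, e, negative_slope * e)
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()  # softmax over the neighborhood of node i
        out[i] = alpha @ Z[nbrs]
    return out

def classify_document(H, edges, W, a, W_cls):
    """Hypothetical readout: ELU, mean-pool nodes, linear layer -> class logits."""
    Z = gat_layer(H, edges, W, a)
    Z = np.where(Z > 0, Z, np.expm1(np.minimum(Z, 0.0)))  # ELU
    return Z.mean(axis=0) @ W_cls

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))       # 5 sentences, 8-dim stand-in embeddings
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 0)]
W = rng.standard_normal((8, 4))
a = rng.standard_normal(8)
W_cls = rng.standard_normal((4, 3))   # 3 hypothetical classes
logits = classify_document(H, edges, W, a, W_cls)
```

Stacking layers or adding heads follows the same pattern; in the paper's setting the edge list would come from the learned, filtered attention graph rather than being fixed by hand.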

Key takeaway

For AI scientists and research scientists developing NLP systems, data-driven graph construction with sliding-window attention offers a path to document representations that are both computationally efficient and performant. Prioritize max-bound filtering for long documents to balance sparsity and information retention, and consider experimenting with unfiltered graphs for potential performance gains if the increased resource usage is acceptable. Because each sentence attends only to neighbors within a fixed window, the approach avoids the quadratic cost of full attention: for a fixed window size, the number of attention scores grows linearly with document length.
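
As a back-of-the-envelope illustration of that cost difference, the sketch below counts the sentence pairs that receive an attention score under full attention and under a sliding window of radius w, clipped at the document boundaries. The document length and radius are illustrative values, not settings reported in the paper.

```python
def full_attention_pairs(n: int) -> int:
    # Every sentence scores every sentence (including itself): n^2 pairs.
    return n * n

def sliding_window_pairs(n: int, w: int) -> int:
    # Sentence i scores sentences j with |i - j| <= w, clipped to [0, n).
    return sum(min(n, i + w + 1) - max(0, i - w) for i in range(n))

n, w = 1000, 5
print(full_attention_pairs(n), sliding_window_pairs(n, w))  # 1000000 10970
```

For n >= w the windowed count equals n(2w + 1) - w(w + 1), which grows linearly in n, whereas the full count grows quadratically.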

Key insights

Sliding-window attention for graph construction improves document representation efficiency and performance in NLP tasks.

Principles

Method

The method encodes sentences using Sentence Transformers, applies sliding-window multi-head attention with non-linear activation functions (ReLU, sigmoid, softmax), and constructs the document graph by statistically filtering the resulting attention weights (mean-bound or max-bound) to decide which sentence pairs become edges.
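
The pipeline can be sketched in NumPy under several stated assumptions: random vectors stand in for Sentence Transformers embeddings, random projections replace learned query and key weights, and heads are averaged. The summary does not define the mean-bound and max-bound filters precisely, so only a hypothetical mean-bound (keep in-window weights strictly above their mean, drop self-loops) is shown. All function names are illustrative, not the authors' code.

```python
import numpy as np

def sliding_window_attention(X, window, num_heads=2, activation="softmax", seed=0):
    """Return an (n, n) attention matrix that is zero outside |i - j| <= window.

    X: (n, d) sentence embeddings. Random projections are a hypothetical
    stand-in for learned query/key weights.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(num_heads):
        Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        Q, K = X @ Wq, X @ Wk
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            s = Q[i] @ K[lo:hi].T / np.sqrt(d)  # scores only inside the window
            if activation == "softmax":
                e = np.exp(s - s.max())
                wts = e / e.sum()
            elif activation == "sigmoid":
                wts = 1.0 / (1.0 + np.exp(-s))
            elif activation == "relu":
                wts = np.maximum(s, 0.0)
            else:
                raise ValueError(f"unknown activation: {activation}")
            A[i, lo:hi] += wts / num_heads  # average over heads
    return A

def mean_bound_edges(A, window):
    """Hypothetical mean-bound filter: keep in-window weights above their mean."""
    n = A.shape[0]
    idx = np.arange(n)
    in_window = np.abs(idx[:, None] - idx[None, :]) <= window
    threshold = A[in_window].mean()
    keep = in_window & (A > threshold)
    np.fill_diagonal(keep, False)  # drop self-loops (assumption)
    return [(int(i), int(j), float(A[i, j])) for i, j in zip(*np.nonzero(keep))]

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 16))    # stand-in for 8 sentence embeddings
A = sliding_window_attention(X, window=2, activation="softmax")
edges = mean_bound_edges(A, window=2)
density = len(edges) / (8 * 7)      # fraction of possible directed edges kept
```

Only the 2w + 1 in-window scores per sentence are ever computed, which is where the linear cost comes from; the dense n × n matrix is kept here purely for readability, and a sparser threshold yields a sparser, cheaper graph.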

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.