Extending an Ensemble Baseline with Corpus-Based Graph Features for Portuguese Pun Detection

Source: Paper Index on ACL Anthology · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, quick

Summary

A study presented at PROPOR 2026 investigates whether corpus-based graph features can improve Portuguese pun detection when added to an existing TF-IDF ensemble baseline. The researchers constructed three graph representations from the Puntuguese corpus: a Co-occurrence graph, a PPMI-weighted graph, and a Pun-Context graph. Each graph was converted into low-dimensional node embeddings using TruncatedSVD, aggregated into document-level features, and concatenated with the TF-IDF representations within a soft-voting ensemble. Results on the test set indicate that graph-based enrichment does not consistently improve performance. Among the augmented configurations, the Pun-Context and PPMI graphs yielded the strongest results, while combining all graph types degraded overall performance. These findings suggest that the value of graph-based information depends heavily on how lexical relations are encoded and aggregated at the document level.

Key takeaway

Research scientists building natural language processing models for nuanced lexical tasks such as pun detection can consider graph-based features as a way to capture contextual interactions between words, but should not expect consistent gains. Evaluate each graph representation and aggregation strategy separately, since not every combination improves on the TF-IDF baseline. In this study the Pun-Context and PPMI-weighted graphs were the strongest augmentations, while naively combining all graph types degraded performance.

Key insights

Graph-based features can augment pun detection, but the gains are not consistent: their usefulness depends on how lexical relations are encoded and aggregated.

Principles

Method

Construct Co-occurrence, PPMI-weighted, and Pun-Context graphs. Convert graphs to low-dimensional node embeddings via TruncatedSVD. Aggregate embeddings into document-level features. Concatenate with TF-IDF in a soft-voting ensemble.
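The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy sentences, the sentence-level co-occurrence window, mean pooling as the aggregation step, the embedding size `K`, and the choice of logistic regression and random forest as ensemble members are all hypothetical, since the summary does not specify them. Only the PPMI-weighted graph is shown.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up toy sentences (1 = pun, 0 = non-pun); NOT taken from the Puntuguese corpus.
texts = [
    "o padeiro perdeu o emprego porque não tinha massa",
    "o padeiro abriu a loja cedo hoje",
    "o músico foi preso porque perdeu a nota",
    "o músico tocou piano no concerto",
    "o jardineiro foi demitido porque plantou discórdia",
    "o jardineiro regou as plantas do jardim",
]
labels = [1, 0, 1, 0, 1, 0]
K = 4  # assumed node-embedding size; must be smaller than the vocabulary

# TF-IDF baseline features; the TF-IDF vocabulary doubles as the graph's node set.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(texts)
analyzer = tfidf.build_analyzer()
idx = tfidf.vocabulary_
n = len(idx)

# Co-occurrence graph: two words are linked each time they share a sentence
# (the window size is an assumption).
counts = np.zeros((n, n))
for doc in texts:
    nodes = sorted({idx[t] for t in analyzer(doc) if t in idx})
    for a in nodes:
        for b in nodes:
            if a != b:
                counts[a, b] += 1

# PPMI weighting: max(0, log2(p(a, b) / (p(a) * p(b)))).
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(counts * total / (row * col))
pmi[~np.isfinite(pmi)] = 0.0
ppmi = np.maximum(pmi, 0.0)

# Low-dimensional node embeddings from the weighted adjacency matrix.
svd = TruncatedSVD(n_components=K, random_state=0)
node_emb = svd.fit_transform(csr_matrix(ppmi))

# Document-level features: mean of the node embeddings of the document's words
# (mean pooling is an assumed aggregation choice).
graph_feats = np.zeros((len(texts), K))
for i, doc in enumerate(texts):
    rows = [idx[t] for t in analyzer(doc) if t in idx]
    if rows:
        graph_feats[i] = node_emb[rows].mean(axis=0)

# Concatenate TF-IDF and graph features, then fit a soft-voting ensemble.
X = hstack([X_tfidf, csr_matrix(graph_feats)]).tocsr()
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",  # average the members' predicted class probabilities
)
ensemble.fit(X, labels)
proba = ensemble.predict_proba(X)
print(X.shape, proba.shape)
```

In a real experiment the graph, the SVD, and the vectorizer would be fit on the training split only and the ensemble scored on held-out data; the sketch predicts on its own training sentences purely to show how the shapes fit together.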

In practice

Topics

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.