Portho: A Corpus-Based Resource of Orthographic Neighbors in European Portuguese

Source: Paper Index on ACL Anthology · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Data Science & Analytics, Social Sciences & Behavioral Studies · Depth: Expert, medium

Summary

Portho is a new corpus-based lexical resource for European Portuguese, introduced at the PROPOR 2026 conference, that provides detailed orthographic neighbor (ON) information for over 43,000 word forms. It includes multiple ON metrics computed under several ON definitions, classical neighborhood size measures, frequency-based statistics, and graded orthographic distance (OD) features. The authors analyzed Portho's statistical properties and evaluated its usefulness for automatic text complexity assessment on the iRead4Skills corpus. ON features alone were not sufficient to predict readability, but they offered complementary information and performed well compared to existing Portuguese resources. Portho is publicly available to support research in psycholinguistics, readability modeling, and Natural Language Processing (NLP) for Portuguese.

Key takeaway

For NLP engineers and psycholinguists working with European Portuguese, Portho is a publicly available resource for orthographic neighbor analysis. Its ON metrics and frequency-based statistics can support work on visual word recognition, lexical access, and text complexity assessment. Because ON features alone were not sufficient to predict readability in the reported evaluation, treat them as a complement to other features rather than a standalone predictor.

Key insights

Portho provides comprehensive orthographic neighbor metrics for European Portuguese, aiding psycholinguistics and NLP.

Principles

Method

Portho was built as a corpus-based lexical resource that provides ON metrics, frequency statistics, and graded OD features for over 43,000 word forms. It was then evaluated on automatic text complexity assessment using the iRead4Skills corpus.
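To make the kinds of measures named above more concrete, here is a minimal sketch of how orthographic neighbors, a classical neighborhood size, simple frequency-based counts, and a graded orthographic distance could be computed. It assumes the textbook definitions (a neighbor is a same-length word differing by one substituted letter; distance is plain Levenshtein edit distance), which may differ from the definitions Portho actually uses. The function names, the toy lexicon, and its frequency counts are all hypothetical and are not taken from Portho.

```python
from collections import defaultdict


def substitution_neighbors(lexicon):
    """Map each word to the same-length words that differ in exactly one letter.

    Assumption: this is the classical single-substitution definition,
    not necessarily any of the ON definitions used in Portho.
    """
    words = set(lexicon)
    buckets = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # Key = the word with position i masked out.
            buckets[(i, word[:i], word[i + 1:])].add(word)
    neighbors = {word: set() for word in words}
    for group in buckets.values():
        for word in group:
            neighbors[word] |= group - {word}
    return neighbors


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def neighborhood_stats(word, neighbors, frequency):
    """Neighborhood size plus two simple frequency-based counts."""
    own = frequency[word]
    neighbor_freqs = [frequency[n] for n in neighbors[word]]
    return {
        "size": len(neighbor_freqs),
        "higher_frequency": sum(f > own for f in neighbor_freqs),
        "mean_neighbor_frequency": (
            sum(neighbor_freqs) / len(neighbor_freqs) if neighbor_freqs else 0.0
        ),
    }


# Hypothetical toy lexicon with made-up frequency counts (not Portho data).
toy_frequency = {
    "gato": 120, "pato": 45, "rato": 60, "mato": 25,
    "gata": 30, "casa": 300, "cama": 90,
}

neighbors = substitution_neighbors(toy_frequency)
gato_stats = neighborhood_stats("gato", neighbors, toy_frequency)
pato_stats = neighborhood_stats("pato", neighbors, toy_frequency)
```

In this toy lexicon, "gato" has four substitution neighbors ("pato", "rato", "mato", "gata"), none of them more frequent, whereas "pato" has three neighbors, two of which are more frequent. Bucketing words by masked position avoids comparing every pair of words, which matters for a lexicon of tens of thousands of forms.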

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.