Portho: A Corpus-Based Resource of Orthographic Neighbors in European Portuguese
Summary
Portho is a new corpus-based lexical resource for European Portuguese, introduced at the PROPOR 2026 conference, that provides detailed orthographic neighbor (ON) information for over 43,000 word forms. It includes multiple ON metrics, several ON definitions, classical neighborhood size measures, frequency-based statistics, and graded orthographic distance (OD) features. The authors analyzed Portho's statistical properties and evaluated its usefulness for automatic text complexity assessment on the iRead4Skills corpus. ON features alone were not sufficient to predict readability, but they contributed complementary information and compared favorably with existing Portuguese resources. Portho is publicly available to support research in psycholinguistics, readability modeling, and Natural Language Processing (NLP) for Portuguese.
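To make the classical measures concrete, here is a minimal Python sketch on a tiny hypothetical lexicon (the words and frequencies are invented for illustration and are not Portho data). It contrasts two common ON definitions: substitution-only neighbors (Coltheart's N: same length, exactly one letter differs) and a broader definition that also admits a single-letter insertion or deletion. It also computes one frequency-based statistic, the number of higher-frequency neighbors. All names (`LEXICON`, `neighbors`, `broad_rule`, `higher_freq_neighbors`, and so on) are hypothetical, and whether Portho uses exactly these definitions is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon: word -> corpus frequency (invented values, not Portho data).
LEXICON: Dict[str, int] = {
    "casa": 900, "cama": 400, "caso": 700, "asa": 150,
    "casar": 120, "mesa": 600, "massa": 80,
}


def is_substitution_neighbor(a: str, b: str) -> bool:
    """Coltheart-style neighbor: same length, exactly one position differs."""
    if len(a) != len(b) or a == b:
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def is_add_delete_neighbor(a: str, b: str) -> bool:
    """True if b is obtained from a by inserting or deleting exactly one letter."""
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Removing some single position of the longer word must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def broad_rule(a: str, b: str) -> bool:
    """Broader (assumed) ON definition: substitution OR one insertion/deletion."""
    return is_substitution_neighbor(a, b) or is_add_delete_neighbor(a, b)


def neighbors(word: str, lexicon: Dict[str, int],
              rule: Callable[[str, str], bool]) -> List[str]:
    """Sorted lexicon entries that count as neighbors of `word` under `rule`."""
    return sorted(w for w in lexicon if rule(word, w))


def higher_freq_neighbors(word: str, lexicon: Dict[str, int],
                          rule: Callable[[str, str], bool]) -> int:
    """Frequency-based statistic: neighbors more frequent than `word` (which must be in `lexicon`)."""
    return sum(lexicon[n] > lexicon[word] for n in neighbors(word, lexicon, rule))
```

In this toy lexicon, "casa" has two substitution neighbors (cama, caso), while the broader rule also admits asa and casar, so the choice of ON definition alone changes neighborhood size. Neither function normalizes case or diacritics; how Portho handles accented letters is not stated in this summary.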
Key takeaway
For NLP engineers and psycholinguists working with European Portuguese, Portho is a publicly available resource for orthographic neighbor analysis. Its ON metrics and frequency-based statistics can support research on visual word recognition and lexical access, and can serve as additional features in text complexity models, with the caveat that ON features did not predict readability on their own in the authors' evaluation.
Key insights
Portho provides comprehensive orthographic neighbor metrics for European Portuguese, aiding psycholinguistics and NLP.
Principles
- ONs influence reading speed and lexical access.
- ON features complement readability prediction.
Method
Portho was developed as a corpus-based lexical resource, providing ON metrics, frequency statistics, and graded OD features for over 43,000 word forms, then evaluated for text complexity assessment.
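Graded OD features move beyond a binary neighbor/non-neighbor split. A widely used example is OLD20, the mean Levenshtein distance from a word to its 20 closest words in the lexicon. The sketch below implements Levenshtein distance by dynamic programming and a generic OLD-n, with `n` exposed as a parameter because a toy word list is far smaller than 20 words. Treating Portho's OD features as OLD-style measures is an assumption (the summary only says they are graded), and the function names are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest other lexicon entries."""
    dists = sorted(levenshtein(word, w) for w in lexicon if w != word)
    if not dists:
        raise ValueError("lexicon must contain at least one other word")
    closest = dists[:n]
    return sum(closest) / len(closest)
```

For example, with the word list casa, cama, caso, mesa, asa, casar, massa, "casa" has four words at distance 1, so OLD-3 is 1.0. Unlike a neighbor count, the measure stays informative for words that have no one-edit neighbors at all.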
In practice
- Use Portho in psycholinguistic studies.
- Integrate ON features into readability models.
- Apply Portho to Portuguese NLP tasks.
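For readability modeling, word-level ON values must be aggregated into text-level features before they can sit alongside other predictors. The sketch below assumes a word-to-neighborhood-size lookup has already been loaded from Portho; the dictionary, its values, and the function name `text_on_features` are hypothetical, since Portho's file format and column names are not described here. It tokenizes naively and computes lexicon coverage, mean neighborhood size, and the proportion of hermit words (words with no neighbors).

```python
import re
from statistics import mean
from typing import Dict


def text_on_features(text: str, on_size: Dict[str, int]) -> Dict[str, float]:
    """Aggregate per-word neighborhood sizes into text-level features.

    `on_size` is a hypothetical word -> neighborhood-size lookup; tokens
    missing from it lower coverage but are otherwise skipped.
    """
    # Naive tokenizer: lowercase runs of Unicode letters (no digits/underscore),
    # so Portuguese letters such as "ç" and "ã" are kept inside tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    known = [on_size[t] for t in tokens if t in on_size]
    if not known:
        return {"coverage": 0.0, "mean_n": 0.0, "hermit_ratio": 0.0}
    return {
        "coverage": len(known) / len(tokens),
        "mean_n": float(mean(known)),
        "hermit_ratio": sum(n == 0 for n in known) / len(known),
    }
```

Which aggregates to use (means, proportions, percentiles) is a modeling choice; the summary reports only that ON features added complementary information, not how the authors aggregated them.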
Topics
- Portho
- Orthographic Neighbors
- European Portuguese
- Lexical Resource
- Text Complexity Assessment
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.