Enhanced Universal Dependencies in the Wild: Evaluating Portuguese EUD Parsing in Realistic Scenarios
Summary
A study evaluated the robustness of Enhanced Universal Dependencies (EUD) conversion rules for Portuguese across diverse text genres and domains. Previous evaluations were limited to journalistic text and gold-standard basic syntactic trees, so this study went further and assessed Portuguese-specific EUD rules within realistic parsing pipelines, where the basic syntax is generated automatically. The Portuguese-specific rules consistently outperform the universal rules. However, EUD accuracy degrades significantly when the rules operate on automatically generated basic syntax, especially when the input text comes from a different domain than the basic parser's training data. The study also includes a detailed error analysis that pinpoints challenging linguistic phenomena and scenarios.
Key takeaway
If you are an NLP engineer developing Portuguese dependency parsers, prioritize Portuguese-specific EUD conversion rules over universal ones, as they consistently yield better results. Be aware that relying on automatically generated basic syntax significantly degrades EUD performance, especially under domain shift. Check how closely your basic parser's training data matches your target domain, and consider fine-tuning the parser on in-domain data.
Key insights
Portuguese-specific EUD rules outperform universal rules, but automatic basic syntax severely impacts performance.
Principles
- Domain mismatch degrades parser performance.
- Automatically generated basic syntax lowers EUD accuracy.
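Rule-based EUD conversion is applied on top of a basic tree, so every error in that tree is carried into the enhanced graph, and some errors are multiplied. The toy sketch below assumes a hypothetical converter, `enhance`, that implements just one UD enhancement (a `conj` dependent also receives the first conjunct's head and relation); it is not the study's rule set. A single wrong attachment in the basic tree costs two enhanced edges.

```python
# Toy illustration (hypothetical converter, not the study's rule set).
BasicTree = dict[int, tuple[int, str]]  # token id -> (head id, deprel); 0 = root
Edge = tuple[int, int, str]              # (dependent, head, label)


def enhance(tree: BasicTree) -> set[Edge]:
    """Copy the basic edges, then apply one UD enhancement: each conj
    dependent also receives the first conjunct's head and relation."""
    edges = {(dep, head, rel) for dep, (head, rel) in tree.items()}
    for dep, (head, rel) in tree.items():
        if rel == "conj":
            first_head, first_rel = tree[head]
            edges.add((dep, first_head, first_rel))
    return edges


# "Ana comprou pão e leite" (Ana bought bread and milk)
gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj"), 4: (5, "cc"), 5: (3, "conj")}
pred = dict(gold)
pred[5] = (2, "conj")  # one basic-parser error: "leite" attached to the verb

gold_edges, pred_edges = enhance(gold), enhance(pred)
missed = gold_edges - pred_edges
print(sorted(missed))  # [(5, 2, 'obj'), (5, 3, 'conj')]
```

Because the rules are deterministic, they cannot repair an upstream mistake; they can only propagate it, which is one way automatic basic syntax drags EUD scores down.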
Method
Evaluated Portuguese-specific and universal EUD conversion rules on diverse text genres and domains, comparing performance when the rules operate on automatically generated basic syntax versus gold-standard basic trees, to assess robustness "in the wild."
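The summary does not name the evaluation metric. A common choice for EUD is ELAS, the labeled enhanced-edge score used in the IWPT 2020/2021 shared tasks. As an assumption, the sketch below computes a micro-averaged F1 over (dependent, head, label) triples in that spirit; `edge_f1` is a hypothetical helper and a simplification, not the official scorer. It compares the enhanced graph produced from a gold basic tree with the one produced from an automatic basic tree containing a single attachment error (toy data).

```python
from collections.abc import Iterable

Edge = tuple[int, int, str]  # (dependent, head, label)


def edge_f1(pairs: Iterable[tuple[set[Edge], set[Edge]]]) -> float:
    """Micro-averaged F1 over labeled enhanced edges, pooled over sentences.
    Each pair is (gold_edges, predicted_edges) for one sentence."""
    correct = n_gold = n_pred = 0
    for gold, pred in pairs:
        correct += len(gold & pred)
        n_gold += len(gold)
        n_pred += len(pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / n_pred, correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy sentence: gold enhanced graph vs. outputs of the same rules applied
# to a gold basic tree and to an automatic basic tree with one error.
gold = {(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj"),
        (4, 5, "cc"), (5, 3, "conj"), (5, 2, "obj")}
from_gold_basic = set(gold)
from_auto_basic = {(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj"),
                   (4, 5, "cc"), (5, 2, "conj"), (5, 0, "root")}

print(round(edge_f1([(gold, from_gold_basic)]), 3))  # 1.0
print(round(edge_f1([(gold, from_auto_basic)]), 3))  # 0.667
```

Grouping sentences by genre or domain and calling `edge_f1` once per group would show how the gap between the two settings changes as the text moves away from the basic parser's training domain.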
In practice
- Prioritize domain-matched training data.
- Use Portuguese-specific EUD rules.
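The summary does not describe the Portuguese-specific rules themselves. One place where language knowledge plausibly matters is case-marker label augmentation: the UD enhanced guidelines add the lemma of the case marker to `obl` and `nmod` labels (e.g. `obl:em`), and a Portuguese converter must cope with contractions (UD Portuguese splits "no" into "em" + "o") and multiword prepositions annotated with `fixed`. The sketch below is a hypothetical rule under those assumptions; `Token`, `case_label`, and `AUGMENTED` are illustrative names, not the study's code.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position of the syntactic word
    lemma: str
    head: int     # 0 = root
    deprel: str


AUGMENTED = {"obl", "nmod"}  # assumption: relations that get a case suffix


def case_label(tok: Token, sent: list[Token]) -> str:
    """Hypothetical rule: append the case marker's lemma(s), e.g. 'obl:em'.
    Multiword prepositions (case + fixed) are joined with underscores."""
    if tok.deprel not in AUGMENTED:
        return tok.deprel
    markers = [t for t in sent if t.head == tok.idx and t.deprel == "case"]
    if not markers:
        return tok.deprel
    marker = markers[0]
    parts = [marker] + [t for t in sent if t.head == marker.idx and t.deprel == "fixed"]
    parts.sort(key=lambda t: t.idx)
    return tok.deprel + ":" + "_".join(t.lemma.lower() for t in parts)


# "Ana mora no Porto": the contraction "no" is split into "em" + "o".
s1 = [Token(1, "Ana", 2, "nsubj"), Token(2, "morar", 0, "root"),
      Token(3, "em", 5, "case"), Token(4, "o", 5, "det"), Token(5, "Porto", 2, "obl")]
# "Agiu de acordo com a lei": multiword preposition "de acordo com".
s2 = [Token(1, "agir", 0, "root"), Token(2, "de", 6, "case"), Token(3, "acordo", 2, "fixed"),
      Token(4, "com", 2, "fixed"), Token(5, "o", 6, "det"), Token(6, "lei", 1, "obl")]

print(case_label(s1[4], s1))  # obl:em
print(case_label(s2[5], s2))  # obl:de_acordo_com
```

Working on split syntactic words rather than surface tokens is what lets "no" yield `obl:em` instead of an unusable `obl:no`.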
Topics
- Enhanced Universal Dependencies
- Universal Dependencies
- Syntactic Parsing
- Portuguese Language Processing
- Domain Mismatch
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.