Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection
Summary
A new method for automatically expanding WordNet-style lexical resources to new languages has been developed, focusing on generating senses by associating target-language lemmas with existing lexical concepts through semantic projection. The approach utilizes a sense-tagged English corpus and its translation, projecting English synsets onto aligned target-language tokens and assigning the lemmas of those tokens to the projected synsets. To enhance alignment quality, a pre-trained base aligner is augmented with a bilingual dictionary, which also serves to filter out incorrect sense projections. This project-and-filter strategy has been evaluated across multiple languages, demonstrating improved precision compared to prior methods, dictionary-based baselines, and large language model baselines, while maintaining interpretability and requiring minimal external resources. The code, documentation, and generated sense inventories are planned for public release.
Key takeaway
For research scientists working on multilingual natural language processing or lexical resource creation, this dictionary-based cross-lingual sense projection method offers a precise and interpretable way to expand WordNet-style resources. You should consider integrating this project-and-filter strategy to improve the quality of your generated sense inventories, especially given its low external resource requirements.
Key insights
A dictionary-augmented projection method improves cross-lingual WordNet expansion precision with minimal resources.
Principles
- Semantic projection links lemmas to synsets.
- Bilingual dictionaries enhance alignment quality.
- Filtering incorrect projections improves precision.
Method
The method projects English synsets onto aligned target-language tokens from a translated corpus and assigns the lemmas of those tokens to the projected synsets. A bilingual dictionary both augments the base aligner and filters out incorrect projections.
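The project-and-filter step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `project_and_filter`, the data layout, the synset identifiers, and the filter rule (keep a projection only when the dictionary lists the target lemma as a translation of the English lemma) are all hypothetical, and the word alignment is taken as given rather than produced by a real aligner.

```python
from collections import defaultdict


def project_and_filter(src_tokens, tgt_lemmas, alignment, dictionary):
    """Project English synsets onto aligned target lemmas, then filter.

    src_tokens: list of (english_lemma, synset_id or None), one per token
    tgt_lemmas: list of target-language lemmas, one per token
    alignment:  iterable of (src_index, tgt_index) pairs
    dictionary: dict mapping an English lemma to a set of target lemmas
    All of these structures are assumptions made for this sketch.
    """
    senses = defaultdict(set)
    for src_idx, tgt_idx in alignment:
        en_lemma, synset = src_tokens[src_idx]
        if synset is None:
            continue  # untagged token: nothing to project
        tgt_lemma = tgt_lemmas[tgt_idx]
        # Assumed filter rule: keep the projection only if the bilingual
        # dictionary confirms the English/target lemma pair.
        if tgt_lemma in dictionary.get(en_lemma, set()):
            senses[synset].add(tgt_lemma)
    return dict(senses)


# Toy sense-tagged English sentence and its Spanish translation.
# Synset identifiers are illustrative placeholders.
src_tokens = [
    ("the", None),
    ("bank", "bank.n.02"),
    ("approve", "approve.v.01"),
    ("the", None),
    ("loan", "loan.n.01"),
]
tgt_lemmas = ["el", "banco", "aprobar", "el", "préstamo"]

# (4, 1) is a deliberately wrong link: "loan" aligned to "banco".
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (4, 1)]

dictionary = {
    "bank": {"banco", "orilla"},
    "approve": {"aprobar"},
    "loan": {"préstamo"},
}

senses = project_and_filter(src_tokens, tgt_lemmas, alignment, dictionary)
```

In this toy example the faulty link from "loan" to "banco" is discarded because the dictionary does not support that pair, so "banco" is attached only to the synset projected from "bank". A stricter or looser filter would trade recall against precision differently.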
In practice
- Expand WordNet-style resources.
- Generate cross-lingual sense inventories.
Topics
- WordNet
- Lexical Resources
- Sense Generation
- Cross-Lingual Projection
- Bilingual Dictionary
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.