Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new method automatically expands WordNet-style lexical resources to new languages by generating senses, that is, by associating target-language lemmas with existing lexical concepts through semantic projection. The approach takes a sense-tagged English corpus and its translation, projects English synsets onto aligned target-language tokens, and assigns the corresponding lemmas to those synsets. To improve alignment quality, a pre-trained base aligner is augmented with a bilingual dictionary, which also serves to filter out incorrect sense projections. This project-and-filter strategy has been evaluated across multiple languages, showing improved precision over prior methods as well as dictionary-based and large language model baselines, while remaining interpretable and requiring few external resources. The code, documentation, and generated sense inventories are planned for public release.

Key takeaway

For research scientists working on multilingual natural language processing or lexical resource creation, this dictionary-based cross-lingual sense projection method offers a precise and interpretable way to expand WordNet-style resources. Consider integrating the project-and-filter strategy to improve the quality of generated sense inventories, particularly when external resources are scarce.

Key insights

A dictionary-augmented projection method improves cross-lingual WordNet expansion precision with minimal resources.

Principles

Method

The method projects English synsets onto aligned target-language tokens in a translated, sense-tagged corpus and assigns the corresponding target-language lemmas to those synsets. A bilingual dictionary both augments the pre-trained base aligner and filters out incorrect projections.
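The project-and-filter idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `project_and_filter`, the data layout, the synset identifiers, and the toy English-Italian example are all assumptions made for illustration. The word alignment is taken as given, so the sketch covers only the dictionary filter and not the dictionary-augmented aligner.

```python
from collections import defaultdict


def project_and_filter(en_tokens, tgt_lemmas, alignment, dictionary):
    """Project English synsets onto aligned target lemmas, then filter.

    en_tokens  : list of (english_lemma, synset_id or None)
    tgt_lemmas : list of target-language lemmas
    alignment  : iterable of (english_index, target_index) pairs
    dictionary : dict mapping an English lemma to a set of target lemmas
    All of these structures are hypothetical, chosen for illustration.
    """
    senses = defaultdict(set)
    for en_idx, tgt_idx in alignment:
        en_lemma, synset = en_tokens[en_idx]
        if synset is None:
            continue  # untagged token: nothing to project
        tgt_lemma = tgt_lemmas[tgt_idx]
        # Filter step: keep the projection only if the bilingual
        # dictionary lists the target lemma as a translation.
        if tgt_lemma in dictionary.get(en_lemma, set()):
            senses[synset].add(tgt_lemma)
    return dict(senses)


# Toy sentence pair (lemmatized); synset IDs are made up.
en_tokens = [
    ("the", None),
    ("bank", "bank.financial"),
    ("approve", "approve.v"),
    ("the", None),
    ("loan", "loan.n"),
]
tgt_lemmas = ["la", "banca", "approvare", "il", "prestito"]

# The pair (4, 1) is a deliberately wrong alignment link (loan -> banca).
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (4, 1)]

dictionary = {
    "bank": {"banca", "riva"},
    "approve": {"approvare"},
    "loan": {"prestito"},
}

senses = project_and_filter(en_tokens, tgt_lemmas, alignment, dictionary)
print(senses)
```

In this toy run the noisy link from "loan" to "banca" is discarded because the dictionary does not list "banca" as a translation of "loan", which is the kind of error the filter is meant to catch. The filter can also reject correct projections when the dictionary lacks an entry, so the sketch favors precision over coverage.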

Topics

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.