Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A novel pipeline addresses coreference resolution in low-resource languages, a task crucial for applications such as machine translation, question answering, and document summarization. The method uses machine translation (MT) from English into a target low-resource language to generate or expand the training data. To control data quality, the pipeline back-translates the generated samples into English and compares each one with its original English sample, using cosine similarity in a BERT model's latent space. These similarity scores are then integrated into the loss function, so that each training sample is weighted by its MT cycle consistency. Experiments across four low-resource languages demonstrate significant performance gains in coreference resolution, and the method enables accurate resolution even in languages that previously lacked any dedicated corpora.

Key takeaway

For NLP engineers extending coreference resolution to low-resource languages, this pipeline offers a way to overcome data scarcity. Consider using cycle-consistent machine translation to generate training data: translate English samples, back-translate them, and score each sample with BERT-based similarity so that poorly translated samples carry less weight. According to the summary above, this approach enables accurate coreference resolution even in languages where no prior corpora exist.

Key insights

Cycle-consistent machine translation effectively generates training data for low-resource multilingual coreference resolution, improving performance where corpora are scarce.

Method

Translate English coreference data into the target language, back-translate it into English, then score each sample by the cosine similarity between the original and the back-translated English in a BERT model's latent space. Integrate these scores into the loss function so that they weight the training samples.
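
The weighting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are taken as precomputed vectors (in the pipeline they would come from a BERT model), the function names cosine_similarity, cycle_weights, and weighted_loss are hypothetical, and both the clipping of negative similarities to zero and the normalisation by total weight are choices made here for illustration. The summary does not specify how the scores enter the loss.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector carries no signal
    return dot / (norm_u * norm_v)


def cycle_weights(
    orig_embs: Sequence[Sequence[float]],
    back_embs: Sequence[Sequence[float]],
) -> List[float]:
    """One weight per sample: similarity of original vs. back-translated text.

    Assumption: negative similarities are clipped to 0 so that a weight
    never flips the sign of a loss term.
    """
    return [
        max(0.0, cosine_similarity(o, b))
        for o, b in zip(orig_embs, back_embs)
    ]


def weighted_loss(
    per_sample_losses: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-sample losses, normalised by total weight."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # assumption: no trusted samples means no training signal
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total


# Toy example: sample 0 survives the translation cycle intact,
# sample 1 comes back orthogonal to its original.
orig = [[1.0, 0.0], [1.0, 1.0]]
back = [[1.0, 0.0], [1.0, -1.0]]
weights = cycle_weights(orig, back)
loss = weighted_loss([0.5, 2.0], weights)
```

In this toy example the second sample receives weight 0, so only the first sample's loss contributes. In a real training loop the same weights would multiply the per-sample coreference loss before reduction.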

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.