A Comparison of Methods to Bias Translation Toward Portuguese Variants

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

This study compares four methods for biasing Machine Translation (MT) systems toward European Portuguese (EP) when translating into Portuguese, a setting where the resource imbalance typically favors Brazilian Portuguese (BP). Each method targets a different stage of the MT lifecycle: reranking n-best MT outputs with a variant classifier, biasing hypothesis generation during inference, fine-tuning existing models specifically for EP, and using a Large Language Model (LLM)-based approach. The authors found that all four methods biased translation outputs toward EP to some degree. The LLM-based approach achieved the numerically highest results, although how much memorization contributes to its performance still requires investigation.
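The summary does not say how the variant classifier's output is combined with the MT scores during reranking. A minimal sketch, assuming a log-linear combination: each n-best hypothesis is rescored as its MT log-probability plus a weight times the log-probability that it is EP. The classifier below is a hypothetical toy that counts a few EP and BP lexical markers (for example autocarro vs. ônibus) in place of a trained model; the names `Hypothesis`, `p_european`, `rerank`, and the `weight` parameter are illustrative and are not taken from the paper.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical marker lexicons standing in for a trained EP/BP variant classifier.
EP_MARKERS = {"autocarro", "comboio", "telemóvel", "ecrã"}
BP_MARKERS = {"ônibus", "trem", "celular", "tela"}


@dataclass
class Hypothesis:
    text: str
    mt_logprob: float  # score assigned by the MT system in its n-best list


def p_european(text: str) -> float:
    """Toy P(EP | text): Laplace-smoothed ratio of EP markers to all markers found."""
    tokens = re.findall(r"\w+", text.lower())
    ep = sum(tok in EP_MARKERS for tok in tokens)
    bp = sum(tok in BP_MARKERS for tok in tokens)
    return (ep + 1) / (ep + bp + 2)


def rerank(nbest, weight=1.0):
    """Sort hypotheses by MT log-prob plus weight * log P(EP | hypothesis), best first."""
    def combined(h):
        return h.mt_logprob + weight * math.log(p_european(h.text))
    return sorted(nbest, key=combined, reverse=True)


if __name__ == "__main__":
    nbest = [
        Hypothesis("Peguei o ônibus para o trabalho.", -1.0),
        Hypothesis("Apanhei o autocarro para o trabalho.", -1.2),
    ]
    for h in rerank(nbest):
        print(h.text)  # the EP hypothesis is printed first with weight=1.0
```

With `weight=0.0` the ordering falls back to the plain MT scores, so the BP hypothesis wins; raising the weight trades translation score for variant preference, which is the kind of knob one would tune on a development set.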

Key takeaway

Research scientists building MT systems for multilingual settings should investigate methods that bias outputs toward a specific language variant, especially for resource-imbalanced languages such as Portuguese. Reranking, inference-time biasing, and fine-tuning are all worth testing. LLM-based approaches are also worth exploring, but their results should be checked for memorization effects before they are relied on for robust variant control.

Key insights

Biasing MT toward a lower-resource language variant is achievable through interventions at several stages of the MT lifecycle.

Principles

Method

Methods include reranking n-best outputs, biasing hypothesis generation during inference, fine-tuning, and using LLM-based translation to favor a target language variant.
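Biasing hypothesis generation at inference time can be done in several ways, and the summary does not say which one the authors used. One common pattern, shown here purely as an assumption, is to adjust next-token logits before a token is selected: add a bonus to tokens associated with EP and subtract it from BP-associated tokens. The token sets, the `delta` value, and the helper names `bias_logits` and `greedy_step` are hypothetical; a real system would work on the model's subword IDs inside beam search rather than on whole words in a single greedy step.

```python
from typing import Dict, Iterable

# Hypothetical variant-associated tokens; a real system would use subword IDs
# from the MT model's vocabulary.
EP_TOKENS = {"autocarro", "comboio", "telemóvel"}
BP_TOKENS = {"ônibus", "trem", "celular"}


def bias_logits(
    logits: Dict[str, float],
    boost: Iterable[str],
    penalize: Iterable[str],
    delta: float = 2.0,
) -> Dict[str, float]:
    """Return a copy of next-token logits with +delta on boosted and -delta on penalized tokens."""
    biased = dict(logits)  # leave the caller's logits untouched
    for tok in boost:
        if tok in biased:
            biased[tok] += delta
    for tok in penalize:
        if tok in biased:
            biased[tok] -= delta
    return biased


def greedy_step(logits: Dict[str, float]) -> str:
    """One step of greedy decoding: pick the highest-scoring next token."""
    return max(logits, key=logits.get)


if __name__ == "__main__":
    toy_logits = {"ônibus": 3.0, "autocarro": 2.0, "carro": 1.0}
    print(greedy_step(toy_logits))  # ônibus
    print(greedy_step(bias_logits(toy_logits, EP_TOKENS, BP_TOKENS)))  # autocarro
```

Because the bias only shifts scores, a strong enough MT preference can still win: with `delta=0.4`, `ônibus` stays ahead in `toy_logits`. This soft behavior is one reason such biasing preserves fluency better than hard vocabulary bans.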

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.