A Comparison of Methods to Bias Translation Toward Portuguese Variants
Summary
This study compares four methods for biasing Machine Translation (MT) systems toward European Portuguese (EP) when translating into Portuguese, addressing the resource imbalance that typically favors Brazilian Portuguese (BP). The methods evaluated target different stages of the MT lifecycle: reranking n-best MT outputs using a variant classifier, biasing hypothesis generation during inference, fine-tuning existing models specifically for EP, and employing a Large Language Model (LLM)-based approach. Researchers found that all methods successfully biased translation outputs to some degree. The LLM-based approach achieved the numerically highest results, though the influence of memorization on its performance requires further investigation.
Key takeaway
Research scientists developing MT systems for multilingual contexts should investigate methods to bias translation outputs toward specific language variants, especially for resource-imbalanced languages like Portuguese. Experiment with reranking, inference-time biasing, and fine-tuning, and explore LLM-based approaches while carefully assessing potential memorization effects to ensure robust variant control.
Key insights
Biasing MT toward lower-resource language variants is achievable through interventions at several stages of the MT lifecycle.
Principles
- Resource imbalance favors dominant language variants.
- Multiple MT lifecycle stages allow for variant biasing.
Method
The four methods are reranking n-best outputs with a variant classifier, biasing hypothesis generation during inference, fine-tuning existing models for EP, and using LLM-based translation, each intended to favor the target language variant.
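As an illustration of the reranking idea only, the sketch below reorders an n-best list by combining each hypothesis's MT score with a variant score. The `Hypothesis` class, the `ep_score` function, the marker word lists, and the `weight` parameter are all hypothetical; in particular, `ep_score` is a toy lexical stand-in for a trained variant classifier and is not the classifier used in the study.

```python
from dataclasses import dataclass

# Toy marker lexicons (illustrative assumption, not taken from the study).
EP_MARKERS = {"autocarro", "comboio", "telemóvel"}
BP_MARKERS = {"ônibus", "trem", "celular"}


@dataclass
class Hypothesis:
    text: str
    mt_score: float  # e.g. a model log-probability; higher is better


def ep_score(text: str) -> float:
    """Toy stand-in for a variant classifier: share of EP markers, in [0, 1]."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    ep = sum(t in EP_MARKERS for t in tokens)
    bp = sum(t in BP_MARKERS for t in tokens)
    if ep + bp == 0:
        return 0.5  # no evidence either way
    return ep / (ep + bp)


def rerank(nbest: list[Hypothesis], weight: float = 1.0) -> list[Hypothesis]:
    """Sort hypotheses by MT score plus a weighted EP variant score."""
    return sorted(
        nbest,
        key=lambda h: h.mt_score + weight * ep_score(h.text),
        reverse=True,
    )


nbest = [
    Hypothesis("O ônibus chegou.", -0.9),
    Hypothesis("O autocarro chegou.", -1.1),
]
best = rerank(nbest, weight=1.0)[0]
print(best.text)
```

With `weight=0.0` the original MT ranking is preserved, so the weight controls how strongly the variant signal can override the model's own preference.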
In practice
- Rerank MT outputs with a variant classifier.
- Fine-tune models for specific language variants.
- Consider LLM-based translation for variant control.
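For the LLM-based option, variant control is commonly attempted by naming the target variant in the prompt. The sketch below only builds such a prompt; the wording, the `build_prompt` helper, the `VARIANT_NAMES` mapping, and the default English source language are assumptions for illustration, since the summary does not describe the prompt used in the study.

```python
# Hypothetical mapping from locale codes to variant names.
VARIANT_NAMES = {
    "pt-PT": "European Portuguese",
    "pt-BR": "Brazilian Portuguese",
}


def build_prompt(source_text: str, variant: str = "pt-PT",
                 source_lang: str = "English") -> str:
    """Build a translation prompt that names the target variant explicitly."""
    name = VARIANT_NAMES[variant]
    return (
        f"Translate the following {source_lang} text into {name}. "
        f"Use vocabulary, spelling, and grammar typical of {name}.\n\n"
        f"Text: {source_text}\n"
        "Translation:"
    )


print(build_prompt("The bus arrived."))
```

The returned string would be passed to whichever LLM interface is in use. Outputs should still be checked for variant fidelity, given the memorization concern noted in the summary.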
Topics
- Machine Translation
- Portuguese Variants
- European Portuguese Bias
- LLM-based Translation
- N-best Reranking
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.