Gender bias in machine translation: an evaluation in the English-Portuguese language pair
Summary
A study evaluated gender bias in machine translation (MT) from English to Portuguese across three commercial translators (Google Translate, Microsoft Translator, Amazon Translate) and three general-purpose language models (GPT-3.5 Turbo, GPT-4o-mini, Llama-3 8B-Instruct). Using the WinoMT test corpus, the quantitative analysis measured translation accuracy and quantified bias with the ΔG and ΔS metrics. All systems showed bias: they performed better when the target entity was masculine (positive ΔG) and when its gender matched occupational stereotypes (positive ΔS). A qualitative analysis, applying Systemic-Functional Theory to translations of the professions "nurse" and "physician", showed how bias alters meaning and compromises referential cohesion. The research validated an adapted evaluation algorithm for Portuguese and framed gender bias as an ongoing socio-technical issue that calls for continuous evaluation and context-specific mitigation strategies.
Key takeaway
If you are an AI Product Manager developing or deploying MT systems, prioritize continuous evaluation of gender bias, especially in critical domains. Have your teams integrate evaluation algorithms adapted to the target language, like the one validated for Portuguese, to measure and mitigate harm from biased translations. This proactive approach helps preserve referential cohesion and prevent unintended meaning shifts in translated content.
Key insights
Machine translation systems exhibit gender bias when translating from English to Portuguese, favoring masculine forms and occupation translations that match gender stereotypes.
Principles
- Gender bias persists across commercial MT and large language models.
- Bias manifests as better performance for masculine and stereotyped entities.
- Qualitative analysis reveals how bias alters meaning and cohesion.
Method
The study used the WinoMT corpus, adapted an evaluation algorithm for Portuguese, and applied quantitative metrics (accuracy, ΔG, ΔS) alongside qualitative Systemic-Functional Theory analysis.
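This summary names the three quantitative metrics but does not define them. The sketch below assumes the definitions used by the original WinoMT benchmark: accuracy is the share of sentences whose translation keeps the gold gender, ΔG is the F1 score on masculine entities minus the F1 score on feminine entities, and ΔS is accuracy on pro-stereotypical sentences minus accuracy on anti-stereotypical ones. The `Example` record, its field names, and the toy data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One evaluated sentence (hypothetical record layout)."""
    gold: str                 # gender in the English source: "male" or "female"
    predicted: Optional[str]  # gender found in the translation, None if undetermined
    stereotype: str           # "pro" or "anti" (relative to the occupational stereotype)


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose translated gender matches the gold gender."""
    if not examples:
        return 0.0
    return sum(e.predicted == e.gold for e in examples) / len(examples)


def f1_for(examples: List[Example], gender: str) -> float:
    """F1 score for one gender class, treating it as the positive class."""
    tp = sum(e.gold == gender and e.predicted == gender for e in examples)
    fp = sum(e.gold != gender and e.predicted == gender for e in examples)
    fn = sum(e.gold == gender and e.predicted != gender for e in examples)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def delta_g(examples: List[Example]) -> float:
    """Assumed definition: F1(masculine) minus F1(feminine)."""
    return f1_for(examples, "male") - f1_for(examples, "female")


def delta_s(examples: List[Example]) -> float:
    """Assumed definition: accuracy(pro-stereotypical) minus accuracy(anti-stereotypical)."""
    pro = [e for e in examples if e.stereotype == "pro"]
    anti = [e for e in examples if e.stereotype == "anti"]
    return accuracy(pro) - accuracy(anti)


# Toy data for illustration only; these are not results from the study.
examples = [
    Example(gold="male", predicted="male", stereotype="pro"),
    Example(gold="male", predicted="male", stereotype="anti"),
    Example(gold="female", predicted="female", stereotype="pro"),
    Example(gold="female", predicted="male", stereotype="anti"),
]
```

The functions return fractions between 0 and 1, so multiply by 100 to report percentages. Under these assumed definitions, a positive ΔG means the system does better on masculine entities, and a positive ΔS means it does better when the gender matches the stereotype.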
In practice
- Use the WinoMT corpus for gender bias evaluation.
- Apply ΔG and ΔS metrics to quantify bias.
- Consider Systemic-Functional Theory for qualitative analysis.
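Running a WinoMT-style evaluation requires deciding which gender the translated entity received in Portuguese. The summary does not describe how the adapted algorithm does this, so the sketch below is not the study's method. It is a toy heuristic, assuming the gender can be read from the article or contracted preposition immediately before the entity noun. The function name `guess_gender` and both lookup sets are hypothetical and intentionally incomplete.

```python
from typing import Optional

# Hypothetical, non-exhaustive lookup sets of Portuguese articles and
# preposition+article contractions, grouped by grammatical gender.
MASCULINE_DETERMINERS = {"o", "os", "um", "uns", "do", "dos", "no", "nos",
                         "pelo", "pelos", "ao", "aos"}
FEMININE_DETERMINERS = {"a", "as", "uma", "umas", "da", "das", "na", "nas",
                        "pela", "pelas", "à", "às"}


def guess_gender(translation: str, entity_noun: str) -> Optional[str]:
    """Guess the entity's gender from the word right before its noun.

    Returns "male", "female", or None when no known determiner precedes
    the noun (for example, when the noun opens the sentence).
    """
    # Crude tokenization: lowercase, drop commas and periods, split on spaces.
    cleaned = translation.lower().replace(",", " ").replace(".", " ")
    tokens = cleaned.split()
    noun = entity_noun.lower()
    for i, token in enumerate(tokens):
        if token == noun and i > 0:
            previous = tokens[i - 1]
            if previous in MASCULINE_DETERMINERS:
                return "male"
            if previous in FEMININE_DETERMINERS:
                return "female"
    return None


# Illustrative sentence: "The nurse helped the physician."
sentence = "A enfermeira ajudou o médico."
nurse_gender = guess_gender(sentence, "enfermeira")
physician_gender = guess_gender(sentence, "médico")
```

A heuristic like this breaks down when an adjective sits between the article and the noun, when "a" is a preposition and not an article, or when the noun appears without a determiner. A real evaluation would more plausibly rely on word alignment and morphological analysis.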
Topics
- Gender Bias
- Machine Translation
- English-Portuguese MT
- WinoMT Corpus
- Language Model Evaluation
Best for: Research Scientist, AI Product Manager, AI Scientist, NLP Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.