ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation
Summary
The ConGA (Contextual Gender Annotation) framework provides linguistically grounded guidelines for word-level gender annotation. It targets a known difficulty for Machine Translation (MT) systems and Large Language Models (LLMs): translating between gender-neutral and morphologically gendered languages. English is largely gender-neutral, so when it is translated into a language such as Italian, which requires explicit grammatical gender agreement, MT systems often default to masculine forms. ConGA annotates semantic gender in English with Masculine (M), Feminine (F), and Ambiguous (A) tags and grammatical gender in Italian with Masculine (M) and Feminine (F) tags, and adds entity-level identifiers for cross-sentence tracking. Applying ConGA to the gENder-IT dataset produced a gold-standard resource and revealed systematic masculine overuse and inconsistent feminine realization in current MT systems. The framework thus offers both a methodology and a benchmark for more gender-aware multilingual NLP.
Key takeaway
For AI scientists and research scientists developing or evaluating Machine Translation systems and Large Language Models, the ConGA framework offers a way to identify and mitigate gender bias. Its fine-grained annotation can be used to build gold-standard datasets and benchmark gender handling, especially when translating between gender-neutral and morphologically gendered languages, which supports more equitable and accurate multilingual NLP systems.
Key insights
ConGA provides a framework for fine-grained gender annotation to mitigate bias in machine translation.
Principles
- Distinguish semantic and grammatical gender.
- Track entities across sentences for gender consistency.
Method
ConGA uses M/F/A tags for English semantic gender and M/F tags for Italian grammatical gender, combined with entity-level identifiers for cross-sentence tracking.
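As a rough illustration of the tag scheme described above, the sketch below models a word-level annotation in Python. Only the tag inventories (M/F/A for English, M/F for Italian) and the idea of an entity-level identifier come from the summary; the class name `GenderAnnotation`, its fields, the helper `group_by_entity`, and the example tokens are hypothetical and do not reflect the official ConGA file format.

```python
from dataclasses import dataclass

# Tag inventories as described in the summary; everything else is hypothetical.
TAGS = {
    "en": {"M", "F", "A"},  # semantic gender: Masculine, Feminine, Ambiguous
    "it": {"M", "F"},       # grammatical gender: Masculine, Feminine
}


@dataclass(frozen=True)
class GenderAnnotation:
    """One word-level annotation (assumed layout, not the official format)."""
    lang: str       # "en" or "it"
    sentence: int   # index of the sentence within the document
    token: str      # the annotated word
    gender: str     # a tag from the inventory for `lang`
    entity_id: int  # identifier shared by all mentions of the same entity

    def __post_init__(self):
        if self.lang not in TAGS:
            raise ValueError(f"unknown language {self.lang!r}")
        if self.gender not in TAGS[self.lang]:
            raise ValueError(f"tag {self.gender!r} is not valid for {self.lang!r}")


def group_by_entity(annotations):
    """Collect annotations per entity id so mentions can be compared across sentences."""
    groups = {}
    for ann in annotations:
        groups.setdefault(ann.entity_id, []).append(ann)
    return groups


# Made-up example data: two entities, one of them annotated on both language sides.
annotations = [
    GenderAnnotation("en", 0, "She", "F", 1),
    GenderAnnotation("it", 0, "contenta", "F", 1),
    GenderAnnotation("en", 1, "friend", "A", 2),
]
groups = group_by_entity(annotations)
```

Rejecting the A tag on the Italian side at construction time mirrors the asymmetry in the scheme: ambiguity is a property of the English source, while Italian forms must commit to a grammatical gender.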
In practice
- Annotate datasets with ConGA for gender evaluation.
- Use ConGA to identify masculine overuse in MT.
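One way to check for masculine overuse, sketched under stated assumptions: given pairs of (English semantic tag, Italian grammatical tag observed in the MT output) for aligned entities, compute how often each source tag is realized as masculine. The function name `masculine_rate`, the pair representation, and the sample data are hypothetical; the summary does not specify the metric the authors used.

```python
def masculine_rate(pairs, source_tag):
    """Share of entities with the given source tag that the MT output renders as M.

    `pairs` is an iterable of (source_tag, output_tag) tuples, where source_tag
    is one of "M", "F", "A" and output_tag is "M" or "F".
    Returns None when no entity carries `source_tag`.
    """
    outputs = [out for src, out in pairs if src == source_tag]
    if not outputs:
        return None
    return sum(1 for out in outputs if out == "M") / len(outputs)


# Made-up pairs for illustration only; these are not results from the paper.
pairs = [
    ("F", "F"),
    ("F", "M"),  # feminine referent rendered with a masculine form
    ("A", "M"),
    ("A", "M"),
    ("A", "F"),
    ("M", "M"),
]

f_rate = masculine_rate(pairs, "F")  # 1 of 2 feminine referents rendered as M
a_rate = masculine_rate(pairs, "A")  # 2 of 3 ambiguous referents rendered as M
```

Under this reading, a high rate for F-tagged entities points to outright errors, while a high rate for A-tagged entities points to a masculine default in cases where the source leaves gender open.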
Topics
- Machine Translation
- Gender Bias
- Linguistic Annotation
- Natural Language Processing
- Large Language Models
Best for: AI Scientist, Research Scientist, AI Researcher, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.