MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

MORPHOGEN is a new large-scale multilingual benchmark for evaluating how well Large Language Models (LLMs) handle grammatical gender and morphological agreement in three typologically diverse languages: French, Arabic, and Hindi. Its core task, GENFORM, asks a model to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. The authors built a high-quality synthetic dataset for these languages and benchmarked 15 popular multilingual LLMs ranging from 2 billion to 70 billion parameters. The evaluation revealed significant gaps in how current models handle morphological gender. Larger models generally outperformed smaller ones, particularly in languages with complex morphology such as Arabic. The benchmark also surfaced notable gender biases: on French and Arabic, models often defaulted to masculine forms, while some models showed a feminine skew on Hindi.

Key takeaway

For AI engineers developing or deploying multilingual LLMs, correct gender-aware morphological generation is a prerequisite for inclusive applications. Evaluate your models with benchmarks such as MORPHOGEN, paying close attention to performance gaps in morphologically rich languages and to any systematic masculine or feminine bias. For morphologically complex languages, favor models with higher parameter counts, and apply targeted debiasing strategies so that outputs are both accurate and equitable.
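One simple way to check for a masculine or feminine default is to compare how reliably a model produces each target gender. The sketch below is a hypothetical metric, not the one used in MORPHOGEN. It assumes each model output has already been labeled with the grammatical gender it expresses (by a human or a classifier); the `Judgement` record and `gender_skew` function are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """Hypothetical record for one GENFORM item.

    target:    gender the rewrite should express ("m" or "f")
    predicted: gender a judge assigned to the model output
               ("m", "f", or any other label for unclear output)
    """
    target: str
    predicted: str


def gender_skew(judgements):
    """Return per-target-gender accuracy and the masculine-minus-feminine gap.

    A positive gap means masculine targets are produced more reliably than
    feminine ones (consistent with a masculine default); a negative gap
    points to a feminine skew.
    """
    totals = Counter(j.target for j in judgements)
    correct = Counter(j.target for j in judgements if j.predicted == j.target)
    if totals["m"] == 0 or totals["f"] == 0:
        raise ValueError("need at least one item for each target gender")
    acc = {g: correct[g] / totals[g] for g in ("m", "f")}
    return acc, acc["m"] - acc["f"]


# Toy example: every masculine target is right, half of the feminine ones are not.
sample = [
    Judgement("m", "m"),
    Judgement("m", "m"),
    Judgement("f", "m"),  # model left the form masculine instead of flipping it
    Judgement("f", "f"),
]
acc, gap = gender_skew(sample)
print(acc, gap)  # {'m': 1.0, 'f': 0.5} 0.5
```

A single accuracy gap is deliberately coarse: it shows the direction of a bias but not its cause, so in practice it would be read alongside per-language and per-construction breakdowns.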

Key insights

MORPHOGEN evaluates LLM gender-aware morphological generation across French, Arabic, and Hindi, revealing performance gaps and biases.

Principles

Method

The GENFORM task prompts an LLM to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. The evaluation data is produced by synthetic generation guided by language-specific morphological rules.
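The task setup can be sketched in a few lines. The prompt wording, the function names (`build_genform_prompt`, `exact_match`, `accuracy`), and the exact-match scoring rule are assumptions for illustration, not MORPHOGEN's published prompt or metric. The example flips the French sentence "Je suis content de te voir." ("I am happy to see you.") from a masculine to a feminine speaker, where adjective agreement changes "content" to "contente".

```python
import unicodedata

# Hypothetical prompt template; the benchmark's actual wording may differ.
PROMPT_TEMPLATE = (
    "Rewrite the following {language} sentence so that the first-person "
    "speaker is {target_gender} instead of {source_gender}. Keep the meaning "
    "and sentence structure unchanged and return only the rewritten sentence.\n\n"
    "Sentence: {sentence}"
)

FLIP = {"masculine": "feminine", "feminine": "masculine"}


def build_genform_prompt(sentence, language, source_gender):
    """Build a gender-flip prompt; source_gender is 'masculine' or 'feminine'."""
    if source_gender not in FLIP:
        raise ValueError(f"unknown gender: {source_gender!r}")
    return PROMPT_TEMPLATE.format(
        language=language,
        source_gender=source_gender,
        target_gender=FLIP[source_gender],
        sentence=sentence,
    )


def normalize(text):
    """NFC-normalize and collapse whitespace so trivially different strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(prediction, reference):
    """Strict exact match after normalization (one simple scoring choice)."""
    return normalize(prediction) == normalize(reference)


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference rewrite."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


prompt = build_genform_prompt("Je suis content de te voir.", "French", "masculine")
print(prompt)
print(accuracy(["Je suis contente de te voir. "], ["Je suis contente de te voir."]))  # 1.0
```

Strict exact match penalizes valid alternative rewrites, so a real evaluation would likely need a more forgiving comparison or human judgment. The normalization step only removes Unicode and whitespace differences, which still matters for Arabic and Devanagari text that can be encoded in more than one way.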

Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.