AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, quick

Summary

AfriScience-MT is a new parallel corpus for translating scientific text into six African languages (Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu) across 11 scientific domains. It targets the dominance of colonial languages in African scientific communication by supplying scientific terminology in languages where established terms are lacking. Professional translators and expert science communicators built the corpus by translating plain-language summaries of scientific papers, coining new terms where needed. In benchmarks of machine translation systems and large language models, the closed-source GPT-5.4 and Gemini-3.1-Flash-Lite achieved the highest average sentence-level COMET scores, 68.3 and 68.0 respectively, and tied at 48.3 on document-level COMET. Among open-source systems, fine-tuned NLLB-1.3B reached 67.3 at the sentence level, and TranslateGemma-12B reached 44.0 at the document level with 1-shot in-context learning. The corpus was published on 2026-05-28 to support further research in scientific machine translation for African languages.
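To make the size of the closed-source versus open-source gap concrete, the reported averages can be tabulated and compared. The sketch below uses only the figures quoted in the summary; the dictionary layout, the `CLOSED` set, and the helper name `closed_open_gap` are illustrative assumptions, not part of the corpus release.

```python
# Average COMET scores exactly as quoted in the summary.
# The data layout and helper name are illustrative assumptions.
REPORTED = {
    "sentence": {
        "GPT-5.4": 68.3,
        "Gemini-3.1-Flash-Lite": 68.0,
        "NLLB-1.3B (fine-tuned)": 67.3,
    },
    "document": {
        "GPT-5.4": 48.3,
        "Gemini-3.1-Flash-Lite": 48.3,
        "TranslateGemma-12B (1-shot)": 44.0,
    },
}

# Systems the summary describes as closed-source.
CLOSED = {"GPT-5.4", "Gemini-3.1-Flash-Lite"}


def closed_open_gap(level):
    """Best closed-source score minus best open-source score at one level."""
    scores = REPORTED[level]
    best_closed = max(v for name, v in scores.items() if name in CLOSED)
    best_open = max(v for name, v in scores.items() if name not in CLOSED)
    # Round to one decimal to match the precision of the reported scores.
    return round(best_closed - best_open, 1)


for level in REPORTED:
    print(f"{level}-level gap: {closed_open_gap(level)} COMET points")
```

On these figures the gap is about 1.0 point at the sentence level and 4.3 points at the document level, so the open-source systems highlighted in the summary trail more clearly on document-level translation.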

Key takeaway

For NLP engineers and research scientists building scientific machine translation for African languages, the AfriScience-MT corpus helps close terminology gaps. Evaluate closed-source models such as GPT-5.4 or Gemini-3.1-Flash-Lite first, since they scored highest, and consider fine-tuning open-source options such as NLLB-1.3B, which came within 1.0 COMET point at the sentence level. Use the corpus to benchmark your systems and to contribute to decolonizing access to scientific knowledge.

Key insights

The AfriScience-MT corpus and benchmarks advance scientific machine translation for African languages, addressing linguistic barriers.

Method

Professional translators and expert science communicators translated plain-language scientific summaries into six African languages, creating new terms as needed. MT systems and LLMs were then benchmarked on the resulting corpus.
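The summary reports that at least one LLM was evaluated with 1-shot in-context learning, where a single worked translation precedes the text to translate. A minimal sketch of how such a prompt might be assembled follows; the function name `build_one_shot_prompt`, the prompt wording, and the default source language of English are all assumptions for illustration, and the actual prompts used in the benchmark are not given in this summary.

```python
def build_one_shot_prompt(source_text, target_language,
                          example_source, example_target,
                          source_language="English"):
    """Assemble a 1-shot translation prompt (hypothetical template).

    One source/target pair is shown as a demonstration, followed by the
    new source text with the target side left open for the model.
    The default source language is an assumption, not stated in the summary.
    """
    return (
        f"Translate the following scientific summary from {source_language} "
        f"into {target_language}.\n\n"
        f"{source_language}: {example_source}\n"
        f"{target_language}: {example_target}\n\n"
        f"{source_language}: {source_text}\n"
        f"{target_language}:"
    )


# Placeholder strings stand in for real corpus entries.
prompt = build_one_shot_prompt(
    source_text="<summary to translate>",
    target_language="isiZulu",
    example_source="<example source summary>",
    example_target="<example reference translation>",
)
print(prompt)
```

Drawing the demonstration pair from the same scientific domain as the input is one plausible way to expose the model to newly coined terminology, though whether the benchmark did so is not stated here.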

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.