AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation
Summary
AfriScience-MT is a new parallel corpus for translating scientific text into six African languages: Amharic, Hausa, Luganda, Northern Sotho, Yorùbá, and isiZulu. It spans 11 scientific domains. The initiative targets the dominance of colonial languages in African scientific communication by supplying scientific terminology where none is yet established. Professional translators and expert science communicators built the corpus by translating plain-language summaries of scientific papers and creating new terms as needed. In benchmarks of machine translation systems and large language models, the closed-source models GPT-5.4 and Gemini-3.1-Flash-Lite achieved the highest average sentence-level COMET scores, 68.3 and 68.0 respectively, and tied at 48.3 on document-level COMET. Among open-source systems, fine-tuned NLLB-1.3B reached 67.3 at the sentence level, and TranslateGemma-12B reached 44.0 at the document level with 1-shot in-context learning. The AfriScience-MT corpus was published on 2026-05-28 to support further research in scientific machine translation for African languages.
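The "average" scores above are reported across the six target languages. As a minimal sketch, assuming the average is an unweighted mean over languages (this summary does not state the exact aggregation), per-language scores could be combined as follows. The names `LANGUAGES` and `macro_average` are hypothetical and chosen for illustration only.

```python
from statistics import fmean

# The six target languages named in the summary.
LANGUAGES = ("Amharic", "Hausa", "Luganda", "Northern Sotho", "Yorùbá", "isiZulu")


def macro_average(scores_by_language):
    """Return the unweighted mean of per-language scores.

    Assumption: every language counts equally, regardless of how many
    sentences or documents it contributes to the test set.
    """
    missing = [lang for lang in LANGUAGES if lang not in scores_by_language]
    if missing:
        # Refuse to average over an incomplete set so results stay comparable.
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return fmean([scores_by_language[lang] for lang in LANGUAGES])
```

Requiring all six languages keeps a system that was only evaluated on a subset from being compared against full averages by mistake.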
Key takeaway
For NLP engineers and research scientists developing scientific machine translation for African languages, the AfriScience-MT corpus is a resource for closing terminology gaps. The closed-source models GPT-5.4 and Gemini-3.1-Flash-Lite scored highest in the benchmark, so evaluate them first when translation quality is the priority. Fine-tuned NLLB-1.3B trails the best closed-source model by one point at the sentence level (67.3 versus 68.3), which makes it the open-source option to consider. Use the corpus to benchmark your own systems and to help decolonize access to scientific knowledge.
Key insights
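The AfriScience-MT corpus and its benchmarks give scientific machine translation for six African languages a shared dataset and baseline scores, and they target the terminology gap that keeps scientific communication in colonial languages.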
Principles
- Colonial languages limit scientific access.
- Lack of terminology is a core obstacle.
- Professional translation creates new terms.
Method
Professional translators and expert science communicators translated plain-language summaries of scientific papers into six African languages, creating new terms as needed. MT systems and LLMs were then benchmarked on the resulting corpus.
In practice
- Use AfriScience-MT for MT benchmarking.
- Fine-tune NLLB-1.3B for open-source MT.
- Explore GPT-5.4/Gemini-3.1-Flash-Lite for high performance.
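As a starting point for the NLLB item above, the sketch below runs zero-shot inference with an NLLB checkpoint through the Hugging Face `transformers` library. It is a baseline to measure before any fine-tuning, not the fine-tuning setup behind the reported 67.3 score. Several things here are assumptions: the checkpoint name `facebook/nllb-200-1.3B` (this summary does not say which 1.3B variant was benchmarked), English as the source language, and the helper name `translate`. The language codes are the FLORES-200 codes that NLLB uses for the six target languages.

```python
# FLORES-200 codes used by NLLB for the six AfriScience-MT target languages.
NLLB_CODES = {
    "Amharic": "amh_Ethi",
    "Hausa": "hau_Latn",
    "Luganda": "lug_Latn",
    "Northern Sotho": "nso_Latn",
    "Yorùbá": "yor_Latn",
    "isiZulu": "zul_Latn",
}

# Assumption: a public 1.3B checkpoint; the benchmarked variant may differ.
MODEL_NAME = "facebook/nllb-200-1.3B"


def translate(sentences, target_language, source_code="eng_Latn", max_new_tokens=256):
    """Translate a list of sentences into one of the six target languages."""
    # Imported lazily so the code table is usable without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    target_code = NLLB_CODES[target_language]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=source_code)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(
        **batch,
        # NLLB selects the output language through the first generated token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_code),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["Vaccines train the immune system."], "isiZulu")` downloads the checkpoint on first use and returns a list with one translated string. The same tokenizer and model objects can be reused as the starting point for fine-tuning on the corpus.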
Topics
- African Languages
- Machine Translation
- Scientific Terminology
- Parallel Corpus
- Low-Resource NLP
- LLM Benchmarking
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.