WAXAL: A large-scale open resource for African language speech technology

· Source: The latest research from Google · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

WAXAL is a large-scale, open-access speech dataset that Google Research released on March 6, 2026 to support speech technology for African languages. It covers 27 Sub-Saharan African languages spoken by over 100 million people across more than 26 countries. The dataset includes approximately 1,846 hours of transcribed natural speech for Automatic Speech Recognition (ASR) and over 565 hours of high-fidelity recordings for Text-to-Speech (TTS). Released under the Creative Commons CC-BY-4.0 license, WAXAL aims to narrow the digital divide by supplying training data for low-resource languages, so that voice-enabled technologies can be built for Africa's linguistic diversity. The project was a multi-year collaboration with African academic and community organizations.

Key takeaway

For AI and NLP engineers building speech technology for linguistically diverse populations, WAXAL is a permissively licensed resource that directly addresses data scarcity. Teams can use it to train and evaluate ASR and TTS systems for 27 Sub-Saharan African languages, expanding their models' language coverage and improving performance in low-resource settings.

Key insights

WAXAL provides a large, open-access dataset for 27 African languages, fostering inclusive speech technology development.

Method

WAXAL-ASR uses image-prompted elicitation for natural, unscripted speech. WAXAL-TTS employs collaborative script drafting and studio recordings for high-fidelity audio.
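The difference between the two collection methods can be pictured as two record shapes. The following is a minimal sketch only: the class names, field names, and sample values are hypothetical and do not reflect the released dataset's actual schema. It shows an ASR clip tied to the image prompt that elicited it and a TTS clip tied to a pre-drafted script, plus a helper that totals recorded hours per language.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrClip:
    """Unscripted speech elicited by an image prompt, transcribed afterwards."""
    language: str
    prompt_image: str  # hypothetical field: ID of the image shown to the speaker
    transcript: str
    seconds: float


@dataclass(frozen=True)
class TtsClip:
    """Studio recording of a script that was drafted before the session."""
    language: str
    script: str
    seconds: float


def hours_by_language(clips):
    """Sum clip durations per language and convert seconds to hours."""
    totals = defaultdict(float)
    for clip in clips:
        totals[clip.language] += clip.seconds
    return {lang: secs / 3600 for lang, secs in totals.items()}


# Made-up records purely for illustration; not taken from the dataset.
asr = [
    AsrClip("lang_a", "img_001", "placeholder transcript", 1800.0),
    AsrClip("lang_a", "img_002", "placeholder transcript", 1800.0),
    AsrClip("lang_b", "img_003", "placeholder transcript", 900.0),
]
tts = [TtsClip("lang_a", "placeholder script", 7200.0)]

print(hours_by_language(asr))
print(hours_by_language(tts))
```

A per-language tally like this is one way to check how evenly the reported hours are spread before choosing which languages to train on first.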

Best for: AI Engineer, NLP Engineer, AI Scientist, AI Researcher, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The latest research from Google.