Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, quick

Summary

Researchers developed a unified pipeline for high-quality text-to-speech (TTS) synthesis of the Peruvian Constitution in Quechua and Spanish. The system uses three TTS architectures: XTTS v2, F5-TTS, and DiFlow-TTS. The models were trained on separate Spanish and Quechua speech datasets that differ in size and recording quality. Cross-lingual transfer in these bilingual and multilingual models compensates for the scarcity of Quechua data while preserving naturalness in Spanish. The project releases trained model checkpoints, inference code, and synthesized audio for each constitutional article, providing a resource for speech technology in indigenous and multilingual settings.

Key takeaway

For research scientists building inclusive TTS systems for low-resource languages: consider a unified pipeline built on current architectures such as XTTS v2, F5-TTS, or DiFlow-TTS. Bilingual and multilingual models, through cross-lingual transfer, can improve synthesis quality and naturalness even when the training datasets are heterogeneous. The released checkpoints and code offer a starting point for similar indigenous-language projects.

Key insights

Cross-lingual transfer mitigates data scarcity, making high-quality TTS feasible for low-resource languages.

Method

A unified pipeline trains XTTS v2, F5-TTS, and DiFlow-TTS on independent, heterogeneous Spanish and Quechua speech datasets, then uses the models' bilingual capabilities to synthesize the constitutional text in both languages.
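
The per-article synthesis loop implied by this method can be sketched as follows. This is a minimal illustration and not the authors' released inference code: the names Article, SynthFn, silent_backend, and synthesize_corpus, the output directory layout, and the 22050 Hz sample rate are all assumptions made for the example. The stand-in backend writes silence so the sketch runs without any TTS model; a real backend would wrap one of the trained checkpoints behind the same signature.

```python
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical backend signature: (text, language code) -> 16-bit mono PCM bytes.
SynthFn = Callable[[str, str], bytes]

SAMPLE_RATE = 22050  # assumed output rate, not taken from the source


@dataclass(frozen=True)
class Article:
    """One constitutional article with its text in each language."""
    number: int
    text: Dict[str, str]  # language code -> text, e.g. {"es": ..., "qu": ...}


def silent_backend(text: str, language: str) -> bytes:
    """Stand-in for a real model: emits 10 ms of silence per character."""
    n_samples = len(text) * SAMPLE_RATE // 100
    return b"\x00\x00" * n_samples


def synthesize_corpus(
    articles: List[Article],
    backends: Dict[str, SynthFn],
    out_dir: str,
) -> List[Path]:
    """Write one WAV per (model, language, article) and return the paths."""
    written: List[Path] = []
    for model_name, synth in backends.items():
        for article in articles:
            for lang, text in sorted(article.text.items()):
                path = (
                    Path(out_dir) / model_name / lang
                    / f"article_{article.number:03d}.wav"
                )
                path.parent.mkdir(parents=True, exist_ok=True)
                with wave.open(str(path), "wb") as wav:
                    wav.setnchannels(1)   # mono
                    wav.setsampwidth(2)   # 16-bit samples
                    wav.setframerate(SAMPLE_RATE)
                    wav.writeframes(synth(text, lang))
                written.append(path)
    return written
```

Calling synthesize_corpus with one backend entry per architecture (for example keys such as "xtts_v2", "f5_tts", and "diflow_tts", each mapped to a wrapper around the corresponding checkpoint) would produce a parallel audio set per model, which makes side-by-side comparison of the three systems straightforward.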

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.