Open Research Project

Scaling Voice Data for Africa.

Most African languages have less than 50 hours of speech data. We use state-of-the-art voice cloning to scale that to hundreds of hours, with no recording studios required.

6+ Target Languages
10x Data Augmentation
Open Source (MIT License)

The Challenge

The Low-Resource Gap

African languages are severely underrepresented in speech technology. Robust automatic speech recognition (ASR) and text-to-speech (TTS) systems typically require thousands of hours of speech data.

Traditional data collection is expensive and logistically difficult. As a result, millions of speakers are left behind by modern AI tools.

English (high resource): 10,000+ hours
Luganda (low resource): under 50 hours

Data disparity ratio: more than 200:1
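The disparity ratio is simply the English figure divided by the Luganda figure. Because both numbers are bounds (10,000+ hours versus under 50 hours), 200:1 is a floor rather than an exact value. The sketch below also shows a hypothetical: what the ratio would become if the 10x augmentation factor were applied to all 50 hours and synthetic hours counted the same as real ones. That equivalence is an optimistic assumption for illustration only, and `disparity_ratio` is a hypothetical helper.

```python
def disparity_ratio(high_resource_hours: float, low_resource_hours: float) -> float:
    """Hours of high-resource speech data per hour of low-resource data."""
    if low_resource_hours <= 0:
        raise ValueError("low_resource_hours must be positive")
    return high_resource_hours / low_resource_hours


# Figures from the comparison above (both are bounds, so the ratio is a floor).
english_hours = 10_000
luganda_hours = 50
print(f"{disparity_ratio(english_hours, luganda_hours):.0f}:1")  # 200:1

# Hypothetical: apply the 10x augmentation factor to all 50 hours and treat
# synthetic hours as equivalent to real ones (an optimistic simplification).
augmented_hours = luganda_hours * 10
print(f"{disparity_ratio(english_hours, augmented_hours):.0f}:1")  # 20:1
```

Even under that generous assumption a 20:1 gap remains, which is why augmentation narrows the gap rather than closing it.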

Methodology

Synthetic Augmentation

We use voice cloning to generate diverse synthetic speakers from limited source recordings, scaling datasets while preserving linguistic accuracy.
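One way to picture how voice cloning scales a dataset while keeping transcripts accurate: every corpus sentence is read by several distinct synthetic voices, so the audio grows while the text stays fixed. The sketch below plans such synthesis jobs. It does not call any cloning model; `SynthesisJob`, `plan_synthesis_jobs`, `estimate_hours`, the speaker IDs, and the 5-second average utterance length are all hypothetical, and 10 readings per sentence is chosen only to mirror the 10x figure above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisJob:
    """One utterance to synthesize: a corpus sentence read by one cloned voice."""
    text: str
    speaker_id: str


def plan_synthesis_jobs(sentences, speaker_ids, readings_per_sentence, seed=0):
    """Assign each sentence to `readings_per_sentence` distinct synthetic speakers.

    The text is never altered, so every synthetic utterance keeps the exact
    transcript it was generated from; only the voice changes.
    """
    if readings_per_sentence > len(speaker_ids):
        raise ValueError("need at least as many speakers as readings per sentence")
    rng = random.Random(seed)  # fixed seed makes the plan reproducible
    jobs = []
    for text in sentences:
        # sample() draws without replacement, so no voice reads a sentence twice
        for speaker_id in rng.sample(speaker_ids, readings_per_sentence):
            jobs.append(SynthesisJob(text=text, speaker_id=speaker_id))
    return jobs


def estimate_hours(num_utterances, avg_utterance_seconds):
    """Rough audio total, assuming every utterance has the same average length."""
    return num_utterances * avg_utterance_seconds / 3600


# Hypothetical example: 3,600 corpus sentences, 300 cloned voices, 10 readings each.
speakers = [f"synth_spk_{i:03d}" for i in range(300)]
corpus = [f"sentence {i}" for i in range(3600)]
jobs = plan_synthesis_jobs(corpus, speakers, readings_per_sentence=10)
print(len(jobs))                         # 36000
print(estimate_hours(len(corpus), 5.0))  # 5.0 (hours, one reading per sentence)
print(estimate_hours(len(jobs), 5.0))    # 50.0 (hours, ten readings per sentence)
```

Sampling voices per sentence, rather than giving each voice a fixed slice of the corpus, spreads every sentence across many timbres; a real pipeline might instead balance the number of utterances per speaker.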

1. Collect Baseline

Gather high-quality, single-speaker recordings (about 10 hours) to serve as ground-truth data.

2. Clone Voices

Generate hundreds of synthetic speaker identities that read text from the target-language corpus.

3. Train Models

Train robust ASR models on the combined real and synthetic dataset so that they generalize beyond the single baseline speaker.
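Combining real and synthetic audio for step 3 can be as simple as merging two lists of utterance records. The sketch below assumes NVIDIA NeMo's ASR manifest convention (JSON Lines, one object per line with `audio_filepath`, `duration` in seconds, and `text`); the project's actual format and mixing strategy may differ. The helper names, file paths, and placeholder transcripts are hypothetical, and the 1:9 real-to-synthetic split is chosen only to mirror the 10x figure.

```python
import json
import random


def merge_manifests(real_entries, synthetic_entries, seed=0):
    """Combine real and synthetic utterances into one shuffled training list.

    Entries are dicts using NeMo ASR manifest keys:
    "audio_filepath", "duration" (seconds) and "text".
    """
    combined = list(real_entries) + list(synthetic_entries)
    random.Random(seed).shuffle(combined)  # interleave real and synthetic audio
    return combined


def manifest_lines(entries):
    """Serialize entries as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps diacritics (e.g. Yoruba tone marks) human-readable
    return [json.dumps(entry, ensure_ascii=False) for entry in entries]


def write_manifest(entries, path):
    """Write the merged manifest to disk as UTF-8 JSON Lines."""
    with open(path, "w", encoding="utf-8") as f:
        for line in manifest_lines(entries):
            f.write(line + "\n")


def total_seconds(entries):
    return sum(entry["duration"] for entry in entries)


def synthetic_share(real_entries, synthetic_entries):
    """Fraction of total audio time that is synthetic; worth tracking when tuning the mix."""
    total = total_seconds(real_entries) + total_seconds(synthetic_entries)
    return total_seconds(synthetic_entries) / total if total else 0.0


# Hypothetical entries: one real utterance plus nine synthetic readings of it.
real = [{"audio_filepath": "real/utt_0001.wav", "duration": 4.0, "text": "placeholder"}]
synthetic = [
    {"audio_filepath": f"synthetic/spk_{i:03d}_utt_0001.wav", "duration": 4.0, "text": "placeholder"}
    for i in range(9)
]
merged = merge_manifests(real, synthetic)
print(len(merged))                       # 10
print(synthetic_share(real, synthetic))  # 0.9
```

Shuffling is the simplest mixing choice; capping the synthetic share or weighting real audio more heavily are common alternatives when synthetic speech starts to dominate the training set.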

Target Languages

Current focus languages for dataset creation

Luganda (Uganda)
Swahili (East Africa)
Yoruba (Nigeria)
Igbo (Nigeria)
Hausa (West Africa)
Kinyarwanda (Rwanda)
Also tracking: Somali, Zulu, Amharic

Built With
NVIDIA NeMo
Orpheus
Sesame
Spark TTS

The Team

Lwanga Caleb / Atuhaire Collins / Bronson Bakunga / Engombe Lokanga / OJ Onyeagwu / Ismail Tijjani / Isadru Santos / Shashank / Sa'ad Nasir Bashir

Contribute to the Project

We're looking for native speakers to help with evaluation, collaborators who hold African-language speech data, and researchers interested in low-resource speech technology.