
# 1.2 TWB Voice: CLEAR Global’s platform for voice data collection

TWB Voice is a platform for collecting voice data[^1]. It was set up by CLEAR Global in 2025 to collect and validate voice recordings from many contributors. These recordings help build voice technologies like Automatic Speech Recognition (ASR)[^2] and Text-to-Speech (TTS)[^3] for low-resource languages[^4].

You can access TWB Voice at [twbvoice.org](http://twbvoice.org). To take part, you need to be a member of the Translators without Borders Community or sign up for an account. You can then contribute to active TWB Voice projects in your language.

The aim of TWB Voice is to:

* support ethical, community-led voice data collection
* create high-quality, diverse datasets[^5] that will help to develop voice models[^6] that reflect how people really speak
* help people to access tools and information in their own language

On the platform, users can:

* record voice clips
* rate the voice recordings of others
* transcribe[^7] audio and rate the transcriptions of others

TWB Voice builds on more than 10 years of experience working with the Translators without Borders Community, a global network of more than 100,000 volunteer linguists. It helps this community contribute their voices to the development of voice technology[^8] in their own languages.

| <mark style="color:blue;">**Interested in working with us or finding out more about TWB Voice?**</mark>                                                                                                                                                                      |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p>We are looking for partners and voice data contributors to help TWB Voice to grow. We want to add more languages and improve the platform.</p><p></p><p><a href="mailto:tech@clearglobal.org"><strong>Get in touch</strong></a> to find out how we can work together.</p> |


### Data collection pilot in Hausa, Kanuri, and Shuwa Arabic

In TWB Voice’s first data collection pilot, we collected conversational audio in Hausa, Kanuri, and Shuwa Arabic. CLEAR Global already works with linguists and local partners in northeastern Nigeria, where these languages are spoken, to provide language services, do research, and develop language technology[^9]. The pilot aimed to create 50–100 hours of speech data per language. In practice, we collected:

* \~68 hours of speech data in Hausa: \~58 hours for Automatic Speech Recognition and 10 hours for Text-to-Speech
* \~62 hours of speech data in Kanuri: \~52 hours for Automatic Speech Recognition and 10 hours for Text-to-Speech
* \~15 hours of speech data in Shuwa Arabic, all for Automatic Speech Recognition

The Automatic Speech Recognition data is available as an open-source dataset, and the Text-to-Speech data is available as samples.

CLEAR Global and partners also used this data to develop:

1. Automatic Speech Recognition models for Hausa and Kanuri
2. Text-to-Speech models for Hausa and Kanuri

You can find the datasets and models published as part of this pilot in [this repository on CLEAR Global’s Hugging Face page](https://huggingface.co/collections/CLEAR-Global/twb-voice-10-688797ff02e04524a6f41edc).

This pilot is a key case study and we refer to it throughout the playbook.


[^1]: **Voice data:** Audio recordings of human speech. These recordings capture the acoustic features of spoken language, such as pronunciation, speaking patterns, and rhythm.

[^2]: **Automatic Speech Recognition (ASR) or Speech-to-Text (STT):** ASR converts spoken language into text. It is used, for example, in voice assistants and transcription services. You may hear both terms, ASR and STT, and they mean almost the same thing, but STT can be a semi-manual process, while ASR is fully automated. ASR is like a smart listener that turns spoken words into written text on your device. Although it creates the text automatically, it can make mistakes, especially in languages that are less used in the digital space, so its output often needs human review to make sure it is correct.

[^3]: **Text-to-Speech (TTS):** Technology that converts written text into spoken audio. You can find TTS in accessibility tools like screen readers and in virtual assistants. If your phone reads out the messages you receive, or an app reads a story aloud to you in a synthetic voice, that is TTS technology at work.

[^4]: **Low-resource language:** A language that has limited written or recorded materials and is rarely supported in digital tools, data, or technology. As a result, technologies like speech recognition or machine translation are often unavailable for the language and are difficult to build. This may be due to a lack of written content, digital resources (like websites or videos), or support from organizations that develop language technology.

[^5]: **Dataset:** A collection of information that has been organized for use. A voice dataset is a collection of voice recordings, paired with transcriptions, plus additional information (metadata) such as the gender and age of each speaker. This metadata shows how the dataset is constructed and helps to avoid bias. Voice datasets are used in research and for training or improving voice models.

[^6]: **Model:** A computer-based system that has learned patterns from data and uses them to make predictions or generate language. In speech technology, models use voice data to learn to recognize speech (converting audio to text) or to create speech (converting text to audio).

[^7]: **Transcribe:** To convert spoken words into written text.

[^8]: **Voice technology:** Language technology that processes or generates spoken words.

[^9]: **Language Technology (LT):** Technologies that focus on human language, both spoken and written. They can process, understand, and generate language. Examples are the tools on your phone or computer that understand and generate words, like translation apps or voice assistants. They let us communicate and interact with our devices using language. When you record a message and your device turns it into text, or your phone suggests the next word in a message, that is language technology at work. It makes digital tools more accessible, more interactive, and sometimes more efficient.