1. Introduction

Collecting voice data for low-resource languages: a playbook on how to engage communities and build voice datasets for language AI solutions

Welcome to the playbook for voice data collection for low-resource languages!

This playbook will help you plan and manage projects to collect voice data for low-resource languages. It is aimed at both new and experienced teams and covers the full process, from setting up the project to publishing your dataset. We draw on CLEAR Global’s experience with our data collection platform, “TWB Voice”. The playbook also covers aspects of data collection that apply to organizations and communities who want to collect voice data through other platforms or initiatives.

Around four billion people lack access to voice technologies like speech recognition and conversational AI. This is because their languages don’t have the data available to build these tools. This playbook aims to help address this gap by outlining the key steps, challenges, and best practices for collecting voice data in low-resource languages in an effective and ethical way.

In this chapter, we introduce TWB Voice, CLEAR Global’s new platform for collecting voice data. We refer to TWB Voice throughout this playbook. We also explain key terms and concepts, and show you how to use the playbook.

How CLEAR Global can help

CLEAR Global’s mission is to help people get vital information and be heard, whatever language they speak. We help our partner organizations to listen to the communities they work with and communicate with them effectively.

Our tech-focused work helps organizations to find use cases where language technology could help users get more actively engaged and scale up communications efforts. We develop language AI solutions such as chatbots, machine translation, and speech solutions for low-resource languages. These are languages that don’t have enough data to create such language solutions. We also work with partners to help them collect voice data to build the resources for these technologies.

CLEAR Global’s user experience (UX) team can help with user research and UX design, and advise on applying human-centered design to tech interventions.

Our Language Services team can translate messages and documents into local languages, help with audio translations and pictures, train staff and volunteers, and give advice on two-way communication. We also work with partners to field test materials and make them easier to understand so they will have more impact. This work is backed up by research and language mapping to assess the communication needs of target populations.

For more information, go to our website or contact us at [email protected].
