1.4 Chapter overviews

Click on any chapter title in Table 1 to jump to the chapter.

Chapter

Content

We introduce CLEAR Global, explain the goals of this playbook, and describe the different ways you can use it. There is also a glossary of common terms.

We explain what voice data is and why some languages don’t have functional voice technologies. We also look at the difference between read speech and spontaneous speech.

We explain how to define the scope of a project in a new language or domain. We also show you how to make a clear work plan and set up the right team.

In this chapter, you will learn to identify and create prompts for voice recording. We explain how to find content sources for your language and domain. We also look at licensing and show you how to process text using automated and manual methods, and how to write prompts that elicit natural speech.

We explain how to record clear and natural voice data. We also show you how to check your recordings consistently (validation) and use quality control based on different roles.

We explain how to find the right people to work on a voice data project, onboard them effectively, and sustain their participation. We also cover ethical ways of motivating people and explain how to use feedback to keep them engaged.

We explain how CLEAR Global manages the collection of voice data. We also look at the steps it takes to make sure users know how their data is used and what their rights are, so that they stay in control of it.

We explain how to use Hugging Face as the main platform for sharing data, how to understand different types of licensing, how to access open and gated datasets, and how to work with pre-trained speech AI models.

We invite you to share your feedback. Your input will help us improve future versions of the playbook and support effective voice data collection.
