# 5. Recording and validation

#### <mark style="color:blue;">Chapter 5 overview:</mark>

{% hint style="info" %}
This chapter is for anyone working on voice data projects, whether with TWB Voice or any other platform. We explain the key principles for creating high-quality datasets.

We cover topics such as:

* How to collect high-quality recordings
* How to rate (validate) recordings in a fair and consistent way
* Checklists so you can be more efficient and get good-quality recordings
* Making sure your recordings are in line with TTS or ASR goals

Note: This section is all about action. You don’t need to be a technical expert. We focus on being clear, consistent and fair.
{% endhint %}

Recording and validation are the two key tasks in [voice data collection](#user-content-fn-1)[^1]. To get high-quality voice datasets[^2], you need clear, varied, natural recordings, and you need to validate them accurately. This makes sure they can be used to train voice technologies like [Automatic Speech Recognition (ASR)](#user-content-fn-3)[^3] and [Text-to-Speech (TTS)](#user-content-fn-4)[^4] systems. Experts may also use the datasets you create for research or to develop other models[^5].

In this section, we go through the guidelines, processes, roles, and checks needed during a [TWB Voice](#user-content-fn-6)[^6] project. They ensure that the data is consistent and of good quality. We also look at the key standards and approaches for collecting data and creating datasets. You can use these on any data collection project, whether on TWB Voice or another platform.


[^1]: **Voice data collection:** Gathering recordings of speech with their transcriptions in a systematic and ethical way. It also involves collecting demographic data (age, gender, accent). For Automatic Speech Recognition, the data should include a range of speakers. The voice data is used in research and for training or developing voice language models.

[^2]: **Dataset:** A collection of information that has been organized for use. A **voice dataset** is a collection of voice recordings (paired with transcriptions) with additional information (metadata), such as the gender and age of the person recording. The metadata gives more information on how the dataset is constructed and helps to avoid bias. The dataset is for use in research and for training or improving voice models.

[^3]: **Automatic Speech Recognition (ASR) or Speech-to-Text (STT):** ASR converts spoken language into text. It can be used in voice assistants and transcription services, for example. You may hear both terms, ASR and STT, and they are almost the same. But STT can be a semi-manual process, while ASR is fully automated. ASR is like a smart listener that turns spoken words into written text on your device. While it does create the text automatically, it often needs some human input to make sure everything is correct. It can make mistakes, especially in languages that are less used in the digital space.

[^4]: **Text-to-Speech (TTS):** Technology that converts written text into spoken language. It can also generate audio speech. You can find TTS in accessibility tools like screen readers and in virtual assistants. It brings written texts to life through spoken words. If your phone reads out the messages you get, or an audiobook tells you your favorite story, this is because of TTS technology.

[^5]: **Model:** A computer-based system that has made use of data to learn patterns. It can make predictions or generate language. In speech technology, models use voice data to learn to recognize speech (converting audio to text) or to create speech (converting text to audio).

[^6]: **TWB Voice:** A platform for collecting voice data. It was developed by CLEAR Global, who also own it. Users can make voice recordings to help with active data collection projects in TWB Voice by [signing up to the TWB Community](https://translatorswithoutborders.org/join-the-twb-community/). The main goal of TWB Voice is to help to develop voice technology for speakers of marginalized languages. For example, by creating the voice datasets that are needed to build language models for TTS and ASR.

