# 4. Guidelines for sentence and prompt collection

#### <mark style="color:blue;">Chapter 4 overview:</mark>

{% hint style="info" %}
This chapter is for linguists, content creators, technical developers, and project managers who are preparing prompts for voice data collection. We'll cover:

* finding content sources for your language and domain
* handling licensing issues when you use existing content
* processing text with automated and manual methods
* creating effective prompts that result in natural speech

Most sections are suitable for non-technical team members, but section 4.3 is aimed at people with programming expertise in text processing pipelines.
{% endhint %}

Prompts are the sentences or instructions that guide users when they record in [TWB Voice](#user-content-fn-1)[^1]. Well-prepared prompts result in high-quality recordings that are suitable for both [Text-to-Speech (TTS)](#user-content-fn-2)[^2] and [Automatic Speech Recognition (ASR)](#user-content-fn-3)[^3] applications.

### Types of prompt

There are two main types of prompt:

* **Read prompts:** specific sentences that users read aloud exactly as written
* **Free-form prompts:** open-ended questions or descriptions of images that encourage people to speak freely

Each type serves a specific purpose in developing [voice technology](#user-content-fn-4)[^4]. Read prompts give you controlled, predictable speech patterns. Free-form prompts elicit more natural speech patterns.

To collect and prepare high-quality text prompts, you need to know:

* where to find the right kind of sentences for your language and domain[^5]
* how to handle licensing issues if you use existing content
* methods of processing text automatically to prepare it for recording
* guidelines for checking sentences for quality and to make sure they are usable
* special things to consider when you collect data with images
* best practices for creating free-form prompts to generate natural speech

These guidelines will help you build an effective collection of prompts for your [voice data](#user-content-fn-6)[^6] project. They are helpful whether you're working with a widely spoken language or one with limited digital resources.

<br>

[^1]: **TWB Voice:** A platform for collecting voice data. CLEAR Global developed it and owns it. Users can make voice recordings to help with active data collection projects in TWB Voice by [signing up to the TWB Community](https://translatorswithoutborders.org/join-the-twb-community/). The main goal of TWB Voice is to help develop voice technology for speakers of marginalized languages, for example by creating the voice datasets that are needed to build language models for TTS and ASR.

[^2]: **Text-to-Speech (TTS):** Technology that converts written text into spoken language. It can also generate audio speech. You can find TTS in accessibility tools like screen readers and in virtual assistants. It brings written texts to life through spoken words. If your phone reads out the messages you get, or an audiobook tells you your favorite story, this is because of TTS technology.

[^3]: **Automatic Speech Recognition (ASR) or Speech-to-Text (STT):** ASR converts spoken language into text. It can be used in voice assistants and transcription services, for example. You may hear both terms, ASR and STT, and they are almost the same. But STT can be a semi-manual process, while ASR is fully automated. ASR is like a smart listener that turns spoken words into written text on your device. Although it creates the text automatically, it often needs some human input to make sure everything is correct. It can make mistakes, especially in languages that are less used in the digital space.

[^4]: **Voice technology:** Language technology that processes or generates spoken words.

[^5]: **Domain:** The subject area or setting in which we use language. Examples are healthcare, farming, or education. Each domain has its own special terms and language patterns.

[^6]: **Voice data:** Audio recordings of human speech. These recordings capture the acoustic features of spoken language, such as pronunciation, speaking patterns, and rhythm.

