# 2.3 Read voice data versus spontaneous voice data

Two types of voice data are needed to develop effective language technology[^1]. **Read data**[^2] provides the foundation. **Spontaneous data**[^3] helps to make the system more robust and sound more natural.

### Read voice data

Read voice data consists of recordings of speakers who read out a prepared text. This type of data:

* is highly controlled and structured
* has clear pronunciation and a steady pace
* follows standardized text formats

When building speech technologies for a new language, read voice data is often the first type of data that you would collect. It provides clean, predictable input for developing speech technology.

It's important to understand the difference between reading a text aloud and more natural communication. When they read aloud, people usually speak:

* more formally
* with fewer grammar errors
* with more complete sentences
* in a more monotone voice
* without using words like "um" or "uh"
* without restarting sentences
* following the rules of written language

This means that read speech can sometimes sound a bit artificial. Technologies that only use read speech may not work well in real-life situations.

### Spontaneous voice data

Spontaneous voice data consists of natural speech without a script. Speakers may not finish their sentences and they use regional expressions. They make constant changes to their speech, responding to feedback from listeners. This type of data:

* contains natural speech patterns, such as hesitations and words like “um” or “uh”
* includes varied speaking styles and speeds
* may include dialects and everyday phrases that you wouldn’t find in written language
* often contains overlapping speech, background noise, and interruptions
* gives a better picture of real-world communication

You must include spontaneous voice data if you want to develop robust speech technologies that can handle real-world situations. However, spontaneous data is more difficult to process than read data.
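To make the contrast concrete, here is a minimal Python sketch that counts two of the markers described above (filler words and immediate word repetitions) in a transcript. The filler list, the function name `disfluency_stats`, and both example sentences are illustrative assumptions, not part of any particular toolkit. Each language needs its own filler list, and an immediate repetition is only a rough proxy for a restarted sentence.

```python
import re

# Hypothetical English filler list; every language needs its own.
FILLERS = {"um", "uh", "er", "erm"}

def disfluency_stats(transcript: str) -> dict:
    """Count filler words and immediate word repetitions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for w in words if w in FILLERS)
    # An immediate repetition ("the the") is a rough proxy for a restart.
    repeats = sum(
        1 for a, b in zip(words, words[1:]) if a == b and a not in FILLERS
    )
    total = len(words)
    return {
        "words": total,
        "fillers": fillers,
        "repeats": repeats,
        "filler_rate": fillers / total if total else 0.0,
    }

# Made-up example transcripts, for illustration only.
read_example = "The weather today is sunny with a light breeze."
spontaneous_example = "Um, the the weather is, uh, kind of sunny I guess."

print(disfluency_stats(read_example))
print(disfluency_stats(spontaneous_example))
```

A count like this could help you check whether a recording labeled as spontaneous actually contains natural speech patterns. It is not a substitute for listening to the recording.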


[^1]: **Language Technology (LT):** Technologies that focus on human language, including both spoken and written language. They can process, understand, and generate language. Examples are the tools on your phone or computer that understand and generate words, like translation apps or voice assistants. They allow us to communicate and interact with our devices with language. When you record a message and the device transforms it into text, or your phone suggests the next word in a message, that’s because of language technology. It makes digital tools more accessible, interactive and sometimes more efficient.

[^2]: **Read data:** Recordings of speakers reading a prepared text aloud.

[^3]: **Spontaneous voice data:** Recordings of natural speech with no script. These will contain hesitations, restarts, varied speaking patterns, and everyday phrases. They sound more like the way people actually talk.

