2. What is voice data?

Chapter 2 overview:

You don’t need to be a technical expert for this chapter. It’s for anyone who wants to understand the basics of voice technology. It includes:

  • an introduction to voice data—the basis for all speech technologies

  • a look at the main types of speech technology (TTS and ASR), and why some languages don’t have voice technology support

  • the difference between read speech and spontaneous speech.

Voice data consists of audio recordings of human speech, ideally paired with transcriptions of what the person said and relevant metadata. We need it to develop language technologies that can process, understand, or generate spoken language.

Each recording in a dataset "trains" machines to process human speech, teaching them to recognize or generate it. We collect voice recordings and their transcriptions as examples that help algorithms build models. These models are computer-based systems that can recognize patterns in human speech. They analyze thousands or millions of speech samples to find patterns between speech sounds and words or phrases. Voice data from a wide range of speakers helps the models get better at recognizing or generating speech across different accents, speaking styles, and acoustic settings.
