
4. Guidelines for sentence and prompt collection

Chapter 4 overview:

This chapter is for linguists, content creators, technical developers, and project managers who are preparing prompts for voice data collection. We'll cover:

  • finding content sources for your language and domain

  • handling licensing issues when you use existing content

  • processing text with automated and manual methods

  • creating effective prompts that result in natural speech

Most sections are suitable for non-technical team members, but section 4.3 is aimed at people with the programming expertise to build text processing pipelines.

Prompts are the sentences or instructions that guide users when they record in . Prompts that are well prepared will result in recordings that are of high quality. They will be suitable for both and applications.

Types of prompt

There are two main types of prompt:

  • Read prompts: specific sentences that users read aloud exactly as written

  • Free-form prompts: open-ended questions or descriptions of images that encourage people to speak freely

Each type has a specific purpose when developing . Read prompts give you controlled, predictable speech patterns. Free-form prompts create more natural speech patterns.

What you need to know to collect and prepare high-quality text prompts:

  • where to find the right kind of sentences for your language and domain

  • how to handle licensing issues if you use existing content

  • methods of processing text automatically to prepare it for recording

  • guidelines for checking that sentences are high quality and usable

  • special things to consider when you collect data with images

  • best practices for creating free-form prompts to generate natural speech

These guidelines will help you build an effective collection of prompts for your project. They are helpful whether you're working with a widely spoken language or one with limited digital resources.
