# 4.5 Guidelines for collecting image data

Image-based prompts can be a very helpful tool for generating spontaneous speech in projects that collect [voice data](#user-content-fn-1)[^1]. Unlike [read prompts](#user-content-fn-2)[^2], where speakers read out prepared text, image prompts produce natural, varied responses as people describe what they see.

### Choosing suitable images

When you are collecting images for your voice data project, look for pictures that are familiar and relevant to the local culture. The images should show everyday scenes and objects, common activities, and local settings that people will recognize and can describe. Google Maps Street View and photos of the local area uploaded by the community can be helpful sources. Local contributors will be able to relate and respond to such images.

| <div><figure><img src="/files/1ot8mECo9MA8fIgu5zIP" alt=""><figcaption></figcaption></figure></div>                                                                                                                     | <div><figure><img src="/files/MABTqg79x6I6P7fQq35a" alt=""><figcaption></figcaption></figure></div>                                                                                                                       |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXflSLESxsl_tm2K2dmjkPoeOaapRuAhSS3MJLEk3hELjb6DEzU1Bnrl8koieZCVKkQCObzwnftaBUFxEWpu8bKRmO0tnTXjkfJBa-12H766jTgxowes3WBCvHT-bQccs5dbeRNY?key=LjOaNqlneHjM8MYR-1Jh9w) | ![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXfOP3B7VKbfyi9FogXBmIUtW-pkJMtSXvHTgqzqtwW6Dha0OxduZjFu3lbLTNW2FV4GF_3bYn5RHPffOmmdJPLA7RxpuyZkWIOfGngdjYd08e1O1vdX4FJzNHSUvPufOzYPX-b4LA?key=LjOaNqlneHjM8MYR-1Jh9w) |

Asking users to describe generic images like these can help you collect voice data.

### Type of content to choose

Always choose images with clear, specific content and details that contributors can describe. Don’t use sensitive content related to conflict, disasters, or illness, as it may trigger negative feelings and interrupt the natural flow of speech. Also avoid images that show the faces of people contributors may recognize (apart from public figures such as the president, actors, or singers).

### Using image prompts

When showing image prompts to contributors, give them simple, open-ended instructions. For example, "Describe what you see in this image" or "Tell me what's happening in this picture." Aim for responses that last between 10 and 20 seconds. This gives you enough voice data without being too long for contributors.

Remember that copyright issues also apply to images. Always keep a note of the source of your images and make sure you have the right permissions to use them in your project.
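One way to keep track of image sources and permissions is a simple record per image. The sketch below is only an illustration: the `ImageRecord` fields, the `missing_provenance` helper, and the sample entries are all hypothetical names chosen for this example, not part of any required format.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record for one image prompt (field names are assumptions)."""
    image_id: str
    source: str                        # where the image came from
    licence: str                       # licence or permission terms, as free text
    permission_confirmed: bool = False


def missing_provenance(records):
    """Return the IDs of images that lack a source, a licence, or confirmed permission."""
    return [
        r.image_id
        for r in records
        if not r.source.strip() or not r.licence.strip() or not r.permission_confirmed
    ]


# Made-up sample entries for illustration only.
records = [
    ImageRecord("market_01", "Photo by community contributor", "CC BY 4.0", True),
    ImageRecord("street_02", "", "", False),
]

print(missing_provenance(records))  # ['street_02']
```

Running a check like this before each collection round can help make sure no image is shown to contributors until its source and permissions are recorded.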

### Quality assurance

Review the responses to your image prompts at regular intervals. If an image consistently produces very short, confused, or awkward responses, replace it with an alternative that gets better results.
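Part of this review can be supported with a small script. The following is a minimal sketch, assuming you have a list of `(image_id, duration_seconds)` pairs for your recordings. The function name, the 10-second threshold (taken from the lower end of the 10 to 20 second target), and the minimum of three responses are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

MIN_SECONDS = 10.0  # assumed threshold: lower end of the 10-20 second target


def flag_short_images(responses, min_seconds=MIN_SECONDS, min_responses=3):
    """Return IDs of images whose median response duration is below min_seconds.

    responses: iterable of (image_id, duration_seconds) pairs.
    Images with fewer than min_responses recordings are skipped,
    because there is too little data to judge them.
    """
    durations = defaultdict(list)
    for image_id, seconds in responses:
        durations[image_id].append(seconds)
    return sorted(
        image_id
        for image_id, values in durations.items()
        if len(values) >= min_responses and median(values) < min_seconds
    )


# Made-up durations for illustration only.
responses = [
    ("market_01", 14.2), ("market_01", 17.5), ("market_01", 11.0),
    ("street_02", 3.1), ("street_02", 4.8), ("street_02", 6.0),
]

print(flag_short_images(responses))  # ['street_02']
```

Duration only catches responses that are very short. Confused or awkward responses still need someone to listen to the recordings.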

<br>

[^1]: **Voice data:** Audio recordings of human speech. These recordings capture the acoustic features of spoken language, such as pronunciation, speaking patterns, and rhythm.

[^2]: **Read prompt:** A sentence that contributors read aloud from a written prompt. These sentences will give you controlled samples of speech.

