4.5 Guidelines for collecting image data
Image-based prompts are a useful way to elicit spontaneous speech in voice data collection projects. Unlike prompts where speakers read out prepared text, image prompts produce natural, varied responses as people describe what they see.
Choosing suitable images
When you are collecting images for your voice data project, look for pictures that are familiar and relevant to the local culture. The images should show everyday scenes and objects, common activities, and local settings that people will recognize and can describe. Google Maps Street View and photos of the local area uploaded by the community can be helpful sources, because local contributors can relate to and respond to such images.


Figure: Asking users to describe generic images like these can help you collect voice data.
Type of content to choose
Always choose images with clear, specific content and details that contributors can describe. Don’t use sensitive content related to conflict, disasters, or illness, as it may trigger negative feelings and interrupt the natural flow of speech. Also avoid images showing faces or people that contributors may recognize (apart from public figures such as the president, actors, or singers).
Using image prompts
When showing image prompts to contributors, give them simple, open-ended instructions. For example, "Describe what you see in this image" or "Tell me what's happening in this picture." Aim for responses that last between 10 and 20 seconds. This gives you enough voice data without being too long for contributors.
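If your recordings are stored as WAV files, the 10 to 20 second target can be checked automatically. The sketch below is only an illustration: the function names are hypothetical, it assumes uncompressed WAV input, and it uses Python's standard `wave` module. The `make_silent_wav` helper exists only to produce demonstration audio.

```python
import io
import wave

MIN_SECONDS = 10.0  # lower bound of the suggested response length
MAX_SECONDS = 20.0  # upper bound of the suggested response length


def wav_duration_seconds(source):
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """Classify a response length as 'too_short', 'ok', or 'too_long'."""
    if seconds < low:
        return "too_short"
    if seconds > high:
        return "too_long"
    return "ok"


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for length in (3, 15, 25):
        duration = wav_duration_seconds(make_silent_wav(length))
        print(length, check_duration(duration))
```

In a real project you would pass the path of each recorded response to `wav_duration_seconds` instead of generated silence. The thresholds are defaults that you can adjust to suit your contributors.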
Remember that copyright issues also apply to images. Always keep a note of the source of your images and make sure you have the right permissions to use them in your project.
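One simple way to keep a note of image sources is a small manifest with one record per image. The sketch below is a suggestion, not a required format: the field names (`image_id`, `source`, `license`, `permission_confirmed`) and the sample records are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ImageRecord:
    """One manifest row per image (field names are illustrative)."""
    image_id: str
    source: str                 # where the image came from
    license: str                # licence or permission terms, as recorded by you
    permission_confirmed: bool  # True once you have checked you may use it


def missing_permissions(records):
    """Return IDs of images lacking a source, a licence, or confirmed permission."""
    return [
        r.image_id
        for r in records
        if not r.source or not r.license or not r.permission_confirmed
    ]


def to_csv(records):
    """Serialize the manifest to CSV text so it can be stored with the project."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ImageRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical sample data for illustration only.
SAMPLE = [
    ImageRecord("market_01", "photo taken by project team", "project-owned", True),
    ImageRecord("street_02", "community upload", "", False),
]

if __name__ == "__main__":
    print(to_csv(SAMPLE))
    print("Needs follow-up:", missing_permissions(SAMPLE))
```

Running `missing_permissions` before each collection round gives you a list of images to follow up on or remove.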
Quality assurance
Review the responses to your image prompts at regular intervals. If some images consistently produce very short, confused, or awkward responses, replace them with alternatives that get better results.
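Part of this review can be automated. The sketch below flags images whose responses are consistently short, using the median response length per image. It only covers the "very short" case, since confused or awkward responses still need a human listener. The function name, the thresholds, and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_short_response_images(responses, min_seconds=10.0, min_responses=3):
    """Return sorted IDs of images whose median response is shorter than min_seconds.

    responses: iterable of (image_id, duration_in_seconds) pairs.
    Images with fewer than min_responses recordings are skipped, because
    there is too little evidence to judge them yet.
    """
    durations_by_image = defaultdict(list)
    for image_id, seconds in responses:
        durations_by_image[image_id].append(seconds)

    flagged = []
    for image_id, durations in durations_by_image.items():
        if len(durations) >= min_responses and median(durations) < min_seconds:
            flagged.append(image_id)
    return sorted(flagged)


# Hypothetical sample data for illustration only.
SAMPLE_RESPONSES = [
    ("market_01", 14.2), ("market_01", 18.0), ("market_01", 11.5),
    ("sign_07", 3.1), ("sign_07", 4.8), ("sign_07", 12.0),
    ("river_03", 2.0),  # only one response so far, so not judged yet
]

if __name__ == "__main__":
    print(flag_short_response_images(SAMPLE_RESPONSES))
```

Using the median rather than the mean means that one unusually long or short recording does not decide whether an image is flagged.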