{"version":1,"pages":[{"id":"Ild1asfF55blalvWrMel","title":"1. Introduction","pathname":"/","siteSpaceId":"sitesp_x382i","description":"Collecting voice data for low-resource languages: a playbook on how to engage communities and build voice datasets for language AI solutions"},{"id":"NlhasYaJbrqDv7EMOMkR","title":"1.1 How to use this playbook","pathname":"/1.-introduction/1.1-how-to-use-this-playbook","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"1. Introduction"}]},{"id":"rUwPmzE5io3LEXamTkDi","title":"1.2 TWB Voice: CLEAR Global’s platform for voice data collection","pathname":"/1.-introduction/1.2-twb-voice-clear-globals-platform-for-voice-data-collection","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"1. Introduction"}]},{"id":"LNZSmekqz6SocJOW1Goj","title":"1.3 Glossary of key terms","pathname":"/1.-introduction/1.3-glossary-of-key-terms","siteSpaceId":"sitesp_x382i","description":"In this section, we explain the most common terms and concepts that we use in voice data collection and voice technology, to help you navigate and understand this playbook.","breadcrumbs":[{"label":"1. Introduction"}]},{"id":"74rcfpaRXNDaPVIstF0B","title":"1.4 Chapter overviews","pathname":"/1.-introduction/1.4-chapter-overviews","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"1. Introduction"}]},{"id":"0IJShqgxMoef0B68I2qE","title":"1.5 Acknowledgements","pathname":"/1.-introduction/1.5-acknowledgements","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"1. Introduction"}]},{"id":"r43YSJkeKa6JlgRyaHVI","title":"2. What is voice data?","pathname":"/2.-what-is-voice-data","siteSpaceId":"sitesp_x382i"},{"id":"RDzKRULXYNd6PH6ISkUA","title":"2.1 Technologies that use voice data","pathname":"/2.-what-is-voice-data/2.1-technologies-that-use-voice-data","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"2. What is voice data?"}]},{"id":"0sIX23f3HlWnNzGqXw3w","title":"2.2 Data imbalance & low-resource languages","pathname":"/2.-what-is-voice-data/2.2-data-imbalance-and-low-resource-languages","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"2. What is voice data?"}]},{"id":"gpbkGRpOG0JYzKOs4MoO","title":"2.3 Read voice data versus spontaneous voice data","pathname":"/2.-what-is-voice-data/2.3-read-voice-data-versus-spontaneous-voice-data","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"2. What is voice data?"}]},{"id":"Vf9uJS1IHQefe9kYyqFY","title":"3. Setting up a project to collect voice data","pathname":"/3.-setting-up-a-project-to-collect-voice-data","siteSpaceId":"sitesp_x382i"},{"id":"2v7BelHqXMFd952QztGw","title":"3.1 Designing your project","pathname":"/3.-setting-up-a-project-to-collect-voice-data/3.1-designing-your-project","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"3. Setting up a project to collect voice data"}]},{"id":"xKC5PHP9ADVSNaj0uyJD","title":"3.2 Creating a work plan for voice data collection","pathname":"/3.-setting-up-a-project-to-collect-voice-data/3.2-creating-a-work-plan-for-voice-data-collection","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"3. Setting up a project to collect voice data"}]},{"id":"J3fbYdZigHT0mpKNwVOA","title":"3.3 Building your team","pathname":"/3.-setting-up-a-project-to-collect-voice-data/3.3-building-your-team","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"3. Setting up a project to collect voice data"}]},{"id":"6oyx83DBWf4BhSpsywhH","title":"4. Guidelines for sentence and prompt collection","pathname":"/4.-guidelines-for-sentence-and-prompt-collection","siteSpaceId":"sitesp_x382i"},{"id":"OmGCQCwBzlOPqO7qEBDx","title":"4.1 Where can I find sentences for my language/domain?","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.1-where-can-i-find-sentences-for-my-language-domain","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"3aNTknRGyoifitvyDPo7","title":"4.2 License issues","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.2-license-issues","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"wNtdG1HpKZRpLdWcXlE5","title":"4.3 Automatic and manual processing of sentences","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.3-automatic-and-manual-processing-of-sentences","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"K2rnBrEdWcZdxdecleF8","title":"4.4 Reviewing the sentences","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.4-reviewing-the-sentences","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"dg7V50ZzjdbsJ8IogRM6","title":"4.5 Guidelines for collecting image data","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.5-guidelines-for-collecting-image-data","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"XsLnb9ybgHdarj277uli","title":"4.6 Guidelines for creating free-form prompts","pathname":"/4.-guidelines-for-sentence-and-prompt-collection/4.6-guidelines-for-creating-free-form-prompts","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"4. Guidelines for sentence and prompt collection"}]},{"id":"gRaflKBX1t3ZrrA9ooTz","title":"5. Recording and validation","pathname":"/5.-recording-and-validation","siteSpaceId":"sitesp_x382i"},{"id":"0qiy9suNXwzgslJQgG0C","title":"5.1 Guidelines for recording","pathname":"/5.-recording-and-validation/5.1-guidelines-for-recording","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"5. Recording and validation"}]},{"id":"VozGJvBTAg58obc9FtEi","title":"5.2 Guidelines for validation","pathname":"/5.-recording-and-validation/5.2-guidelines-for-validation","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"5. Recording and validation"}]},{"id":"5cEaNCU39GJMnwhXYujg","title":"5.3 Language Leads and spot checking","pathname":"/5.-recording-and-validation/5.3-language-leads-and-spot-checking","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"5. Recording and validation"}]},{"id":"ZjJZMeR2egeLG3c2OJGw","title":"5.4 Checklists before recording","pathname":"/5.-recording-and-validation/5.4-checklists-before-recording","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"5. Recording and validation"}]},{"id":"1wID7aejcyLDeB5ynfRI","title":"5.5 TTS versus ASR-oriented recording","pathname":"/5.-recording-and-validation/5.5-tts-versus-asr-oriented-recording","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"5. Recording and validation"}]},{"id":"WgvWOEF4Qk2pzlHm3r0P","title":"6. Community engagement","pathname":"/6.-community-engagement","siteSpaceId":"sitesp_x382i"},{"id":"IT5m7NeYGDD7hmhzefqG","title":"6.1 Identifying contributor profiles","pathname":"/6.-community-engagement/6.1-identifying-contributor-profiles","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"5zT5pR8CGDABeNCOKmFI","title":"6.2 Guidelines for onboarding","pathname":"/6.-community-engagement/6.2-guidelines-for-onboarding","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"GdzcyHQkMTw8s4Gz5w1T","title":"6.3 Approaches to community engagement","pathname":"/6.-community-engagement/6.3-approaches-to-community-engagement","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"lXIMLQxrNnCCx7JQ0bNu","title":"6.4 Strategies to keep people involved","pathname":"/6.-community-engagement/6.4-strategies-to-keep-people-involved","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"ngW1E4nG8IQkGUFE0llF","title":"6.5 Frameworks for engagement and feedback","pathname":"/6.-community-engagement/6.5-frameworks-for-engagement-and-feedback","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"qLwkpKRx4SGy3KrY6SmE","title":"6.6 Recognition","pathname":"/6.-community-engagement/6.6-recognition","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"6. Community engagement"}]},{"id":"QJEm2MAPBtditZQ4RiTU","title":"7. Data storage (compliance and ethical issues)","pathname":"/7.-data-storage-compliance-and-ethical-issues","siteSpaceId":"sitesp_x382i"},{"id":"bT0jV2YRbUM45S1jqwzk","title":"7.1 Data storage","pathname":"/7.-data-storage-compliance-and-ethical-issues/7.1-data-storage","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"7. Data storage (compliance and ethical issues)"}]},{"id":"y8sQWwVazNi6LPE8xLqj","title":"7.2 Legal issues: contracts, data protection, intellectual property","pathname":"/7.-data-storage-compliance-and-ethical-issues/7.2-legal-issues-contracts-data-protection-intellectual-property","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"7. Data storage (compliance and ethical issues)"}]},{"id":"JbQCCVqwEB74Tr9rkRpe","title":"7.3 Informed consent","pathname":"/7.-data-storage-compliance-and-ethical-issues/7.3-informed-consent","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"7. Data storage (compliance and ethical issues)"}]},{"id":"SGqzBYShzb1rRhYbuN12","title":"7.4 Downloading data for publishing: criteria for recordings and fields included","pathname":"/7.-data-storage-compliance-and-ethical-issues/7.4-downloading-data-for-publishing-criteria-for-recordings-and-fields-included","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"7. Data storage (compliance and ethical issues)"}]},{"id":"yOWq5cuyme8OLyRh9FNZ","title":"8. Access to datasets and models","pathname":"/8.-access-to-datasets-and-models","siteSpaceId":"sitesp_x382i"},{"id":"W0SNf6YdclLNtCM51G6R","title":"8.1 Sharing data on Hugging Face and platform access","pathname":"/8.-access-to-datasets-and-models/8.1-sharing-data-on-hugging-face-and-platform-access","siteSpaceId":"sitesp_x382i","breadcrumbs":[{"label":"8. Access to datasets and models"}]},{"id":"4kVWj7Slt5plVBIPM50F","title":"9. Share your feedback","pathname":"/9.-share-your-feedback","siteSpaceId":"sitesp_x382i","description":"Thank you for reading the TWB Voice Program Playbook."}]}