7.1 Data storage

Data flow

Currently, all TWB Voice users must first sign up to TWB Platform, a platform that TWB developed for its community of linguists. Linguists use it to take on translation jobs and other language services. When signing up, users must accept the TWB Platform terms and conditions and privacy policy.

Once they have signed up to TWB Platform and accepted its terms, contributors can also log into TWB Voice using the same login credentials (email and password).

When they log in to TWB Voice for the first time, contributors must accept additional terms that apply to the collection of voice data.

New users must also provide further information that is relevant to the collection of voice data: gender, year of birth, education level, and language variant.

All of the data we collect is stored securely across multiple CLEAR Global databases.

Once they have accepted these terms and provided this information, users can start doing TWB Voice tasks. The tasks available to each user depend on their level of access.
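The onboarding steps above act as a gate: no tasks become available until every step is complete. A minimal Python sketch of that gate, assuming a hypothetical `Contributor` record and a hypothetical `TASKS_BY_ACCESS` mapping from access level to task types. Neither name, nor the task names `record` and `review`, comes from TWB Voice itself:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Metadata every new TWB Voice user provides before starting tasks.
REQUIRED_METADATA = ("gender", "year_of_birth", "education_level", "language_variant")

# Hypothetical mapping from access level to the task types it unlocks.
TASKS_BY_ACCESS = {
    "contributor": {"record"},
    "reviewer": {"record", "review"},
}


@dataclass
class Contributor:
    # Hypothetical record; field names are illustrative only.
    has_platform_account: bool = False
    accepted_voice_terms: bool = False
    gender: Optional[str] = None
    year_of_birth: Optional[int] = None
    education_level: Optional[str] = None
    language_variant: Optional[str] = None
    access_level: str = "contributor"


def available_tasks(c: Contributor) -> set[str]:
    """Return the task types a contributor may do; empty until onboarding is complete."""
    if not (c.has_platform_account and c.accepted_voice_terms):
        return set()
    if any(getattr(c, name) is None for name in REQUIRED_METADATA):
        return set()
    # Unknown access levels get no tasks (deny by default).
    return set(TASKS_BY_ACCESS.get(c.access_level, set()))
```

Returning an empty set rather than raising keeps the check simple for a UI that only needs to know which tasks to show.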

Data segregation

We store users’ personal information (name, email, gender, year of birth, education level, and language variant) separately from their recordings. This separation reduces the risk that anyone could identify the speakers from the recordings.
We attach the user metadata (gender, year of birth, education level, and language variant) to a recording only when we export the data. We never include the user’s name or email in the published dataset.
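The export step can be sketched as a join that copies only an allow-list of user fields onto each recording. This is a minimal Python sketch, assuming recordings link to users through a hypothetical internal `user_id` key and that both stores can be read as plain dictionaries; the real schema is not described here:

```python
from __future__ import annotations

from typing import Any

# Only these user fields may be attached to exported recordings.
EXPORT_METADATA_FIELDS = ("gender", "year_of_birth", "education_level", "language_variant")


def export_recordings(
    recordings: list[dict[str, Any]],
    users: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Join recordings with allow-listed speaker metadata at export time."""
    exported = []
    for rec in recordings:
        profile = users[rec["user_id"]]
        # Copy every recording field except the internal link to the user table.
        row = {key: value for key, value in rec.items() if key != "user_id"}
        # Name and email are never read, because only allow-listed fields are copied.
        row.update({field: profile.get(field) for field in EXPORT_METADATA_FIELDS})
        exported.append(row)
    return exported
```

An allow-list is safer than removing name and email explicitly: a personal field added to the user table later is excluded from exports unless someone deliberately adds it.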
We collect recordings for Automatic Speech Recognition (ASR) models through specific workflows and store them in datasets separate from the recordings for Text-to-Speech (TTS) models. We do this because we do not publish TTS datasets in full; we use them internally to train models. We publish only a partial set of TTS recordings, within the ASR dataset, so that speakers remain anonymous.
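The separation of ASR and TTS recordings, and the partial publication of TTS data within the ASR dataset, can be sketched as follows. This minimal Python sketch assumes each recording carries a hypothetical `workflow` tag and `id`, and that the TTS recordings to publish are chosen elsewhere and passed in as `tts_ids_to_publish`; the actual selection criteria are not described here:

```python
from __future__ import annotations

from typing import Any, Iterable


def route_by_workflow(
    recordings: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split recordings into separate ASR and TTS datasets by their workflow tag."""
    asr, tts = [], []
    for rec in recordings:
        if rec["workflow"] == "asr":
            asr.append(rec)
        elif rec["workflow"] == "tts":
            tts.append(rec)
        else:
            # Refuse to guess: an untagged recording should not land in either dataset.
            raise ValueError(f"unknown workflow: {rec['workflow']!r}")
    return asr, tts


def build_public_asr_dataset(
    asr: list[dict[str, Any]],
    tts: list[dict[str, Any]],
    tts_ids_to_publish: set[Any],
) -> list[dict[str, Any]]:
    """Publish all ASR recordings plus only the explicitly selected TTS recordings."""
    selected = [rec for rec in tts if rec["id"] in tts_ids_to_publish]
    # Unselected TTS recordings stay internal for model training.
    return asr + selected
```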

Systems for storing and securing data

We store all voice recordings and metadata on secure servers that CLEAR Global manages and approves. Our infrastructure provides:

role-based access control, so access depends on the user’s role (e.g. reviewers, admins)
strong password and device policies across all accounts
data encryption so that unauthorized persons cannot access or change the data
automated and encrypted backups on a regular schedule
server hardening and monitoring (firewall, operating system patches, minimal access configurations)
event logging to detect any unusual activity or misuse
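Access control based on the user’s role, combined with event logging, can be illustrated with a small deny-by-default check that logs refused attempts. This is a minimal Python sketch under stated assumptions: only reviewers and admins are named above, so the `CONTRIBUTOR` role and all permission names are hypothetical, and the real permission model is not described here:

```python
from __future__ import annotations

import logging
from enum import Enum

logger = logging.getLogger("access")


class Role(Enum):
    CONTRIBUTOR = "contributor"  # hypothetical
    REVIEWER = "reviewer"
    ADMIN = "admin"


# Hypothetical permission sets, for illustration only.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.CONTRIBUTOR: frozenset({"submit_recording"}),
    Role.REVIEWER: frozenset({"submit_recording", "review_recording"}),
    Role.ADMIN: frozenset({"submit_recording", "review_recording", "export_dataset", "manage_users"}),
}


def is_allowed(user_id: str, role: Role, action: str) -> bool:
    """Deny by default; log every denied attempt so unusual activity can be reviewed."""
    allowed = action in PERMISSIONS.get(role, frozenset())
    if not allowed:
        logger.warning("denied: user=%s role=%s action=%s", user_id, role.value, action)
    return allowed
```

Logging only denials keeps the log small; a production system would more likely log all sensitive actions and ship the log somewhere tamper-resistant.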
