# 7.2 Legal issues: contracts, data protection, intellectual property

### **Data protection framework**

[CLEAR Global](#user-content-fn-1)[^1] complies with the EU [General Data Protection Regulation](#user-content-fn-2)[^2] (GDPR).

We act as a [data controller](#user-content-fn-3)[^3] for our internal systems, and we carefully vet every third-party processor that we use to store and analyze data.

Voice recordings count as personal data, even if they are anonymized and don’t include any other information. This is because a voice may reveal personal details about the speaker, such as gender, age, or where they come from.

Under the GDPR, the following key points apply:

* Personal data includes voice recordings and metadata[^4]
* Anyone who processes the data acts as an independent data controller
* Wherever the users live, they have the right to:
  * know what personal information is stored about them
  * object to how their personal information is being processed and ask for correction or deletion of their data
  * ask for a copy of their personal data in a format that allows them to transfer it to another organization
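
The access and portability rights listed above could be served by a small export routine. The following is a minimal Python sketch, not CLEAR Global's actual tooling: the `Recording` fields and the `export_contributor_data` function are hypothetical names chosen for illustration, and JSON is assumed as the transferable format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Recording:
    """Hypothetical record layout; real systems will store different fields."""
    recording_id: str
    contributor_id: str
    language: str
    transcription: str


def export_contributor_data(recordings, contributor_id):
    """Return a JSON string with every record stored about one contributor."""
    own = [asdict(r) for r in recordings if r.contributor_id == contributor_id]
    payload = {"contributor_id": contributor_id, "recordings": own}
    # ensure_ascii=False keeps non-Latin transcriptions readable in the export
    return json.dumps(payload, indent=2, ensure_ascii=False)
```

A machine-readable format such as JSON lets the contributor pass the export to another organization without manual conversion.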

### **Dataset sharing agreements**

Individuals or organizations that want access to the datasets[^5] must:

* sign the **Dataset Access Terms** which state that they are **independent data controllers**
* agree that they will:
  * comply with the GDPR and other relevant laws. This means they must secure the data and respect the rights of contributors.
  * only process the data for the purpose of training[^6] ASR models.
  * respond to any requests for access to data or deletion.
  * notify CLEAR Global of any breach or misuse of data.

### **Intellectual property**

Contributors keep moral rights over their voice recordings. Moral rights are the contributor's personal rights to be credited for their work and to prevent harmful changes to it. These rights are separate from copyright. Contributors cannot give them up or pass them on to someone else, even though they can do so for economic rights (e.g. through licensing). Moral rights include:

* the right to be credited as the author of the work (right of attribution)
* the right to say no to harmful changes or misuse of the work that could damage the creator's reputation (right of integrity)
* the right to remove or take back a work (right of withdrawal)

### **Data retention and deletion**

* We only keep data for as long as we need it to review, improve, or publish it.
* If a contributor **takes back their consent**, we:
  * remove their contributions from our internal systems
  * mark their data as withdrawn and leave it out of future dataset releases
  * contact known dataset downloaders and tell them that the person has taken back (withdrawn) their consent
* We do **not keep archived versions** of public datasets that contain withdrawn data.
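
The withdrawal steps above can be sketched as a short routine. This is an illustrative Python sketch under stated assumptions, not a description of CLEAR Global's actual systems: the in-memory `store` (recording ID mapped to contributor ID), the `exclusions` set, the downloader email list, and both function names are hypothetical.

```python
def withdraw_consent(store, exclusions, downloaders, contributor_id):
    """Apply the three withdrawal steps for one contributor.

    store       -- dict mapping recording ID to contributor ID (assumed layout)
    exclusions  -- set of recording IDs that future releases must leave out
    downloaders -- contact addresses of known dataset downloaders
    Returns the list of notices to send.
    """
    withdrawn_ids = sorted(
        rid for rid, owner in store.items() if owner == contributor_id
    )
    if not withdrawn_ids:
        return []

    # Step 1: remove the contributions from the internal store.
    for rid in withdrawn_ids:
        del store[rid]

    # Step 2: record the IDs so that future releases exclude them.
    exclusions.update(withdrawn_ids)

    # Step 3: prepare a notice for every known downloader.
    id_list = ", ".join(withdrawn_ids)
    return [
        f"To {address}: consent has been withdrawn for recordings {id_list}."
        for address in downloaders
    ]


def build_release(candidate_ids, exclusions):
    """Return the recording IDs that may go into a new dataset release."""
    return [rid for rid in candidate_ids if rid not in exclusions]
```

Keeping only the withdrawn IDs in `exclusions`, rather than the recordings themselves, lets a release builder filter out withdrawn data without the internal store retaining the audio.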

[^1]: **CLEAR Global:** A non-profit organization that helps people to get information and be heard, whatever language they speak. [Translators without Borders](https://clearglobal.org/translators-without-borders/) is a part of CLEAR Global.

[^2]: **The General Data Protection Regulation (GDPR):** A European Union law that deals with personal data. It tells us how to collect, process, store, and share data. It makes sure people have control over their personal data. It also ensures that organizations handle such data in a transparent, secure, and lawful way.

[^3]: **Data controller:** An individual or organization that is responsible for processing personal data. They decide why and how they will process the data. The General Data Protection Regulation states that data controllers have to make sure that data processing is lawful, transparent, and secure. They also have to respect the rights of the people whose data they process.

[^4]: **Metadata:** Extra information that gives some background to a dataset or parts of a dataset. Examples are demographics of the speakers (age, gender, accent), recording conditions (microphone type, level of background noise), or language-related details (dialect, speed of speaking, emotional tone).

[^5]: **Dataset:** A collection of information that has been organized for use. A **voice dataset** is a collection of voice recordings (paired with transcriptions) with additional information (metadata), such as the gender and age of the person recording. The metadata gives more information on how the dataset is constructed and helps to avoid bias. A voice dataset is for use in research and for training or improving voice models.

[^6]: **Training:** Teaching a computer-based model to recognize patterns. To do this, you need to show it large amounts of data as examples.

