8.1 Sharing data on Hugging Face and platform access

What is Hugging Face?

Hugging Face is a popular online platform where researchers and developers share AI models, datasets, and applications. It works like a public library of machine learning resources: anyone can find, download, and use models and datasets created by organizations and individuals worldwide.

Accessing TWB Voice datasets

You can find our datasets, models and demos on the CLEAR Global organization page: huggingface.co/CLEAR-Global

We publish each dataset with detailed accompanying information, including the number of people taking part, total hours of recordings, and license information. We usually release our datasets under licenses that allow broad use, require that people are credited properly, and provide guidelines on use.

We publish models with information on their architecture and on how they were evaluated on a subset of the data we publish.

You can download the data through the web interface, or use the Hugging Face datasets Python library to load it directly in your projects.
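As a rough illustration of the second option, the sketch below loads one split of a dataset with the datasets library. The dataset name "example-dataset" and the split name "train" are placeholders invented for this example, not real identifiers; check the organization page for the actual names. Streaming is used here only so that nothing large is downloaded up front.

```python
# Minimal sketch, assuming `pip install datasets` has been run.
ORG = "CLEAR-Global"              # organization name from the page above
DATASET_NAME = "example-dataset"  # hypothetical placeholder, not a real dataset


def repo_id(org: str, name: str) -> str:
    """Build the "<organization>/<dataset>" identifier the Hub expects."""
    return f"{org}/{name}"


def load_split(repo: str, split: str = "train", streaming: bool = True):
    """Load one split of a Hub dataset.

    The split name "train" is an assumption; each dataset card lists
    the splits that actually exist.
    """
    # Imported here so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo, split=split, streaming=streaming)
```

Calling load_split(repo_id(ORG, DATASET_NAME)) with a real dataset name should return an iterable of records; next(iter(...)) on the result gives the first one, which is a quick way to inspect the available fields.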

Important limitations and things to consider

The models we share on Hugging Face are mostly research demonstrations and evaluation tools. They have been tested on specific datasets and may not work well in all real-world settings. These models are not guaranteed for production use. You would need to test them thoroughly before using them in any active humanitarian technology system.

When using our models, consider the following:

  • Performance may vary a lot with different accents, recording conditions, or speaking styles.

  • Models are trained on limited data so they may not represent the full range of speakers in a language community.

  • For any practical application, they will need regular testing and adjustment.

We suggest that users see these models as starting points for further development. They are not complete solutions that could be put to immediate use in critical humanitarian settings.
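One way to approach the testing described above is to measure error rates separately for each recording condition or speaker group, so that weak spots are not hidden by an overall average. The sketch below assumes the model is a speech recognition model and uses word error rate as the metric; both are assumptions made for illustration, and the group labels are invented. It is not the evaluation method used for the published models.

```python
from collections import defaultdict


def word_edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Compute word error rate per group.

    `samples` is an iterable of (group, reference_text, model_output)
    tuples. The group labels are whatever you choose to compare, for
    example accent or recording condition.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref)
    # Skip groups with no reference words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}
```

A large gap between groups is a signal that the model needs more data or adjustment for the weaker group before it is relied on.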

Gated datasets

Some datasets are configured as "gated" on Hugging Face. This means you need to log in to the platform and request access before you can download them. We gate some datasets so we can keep track of who is using our data and make sure they use it responsibly. This helps us understand the impact of our work and stay accountable to the communities who record their voices for us. When you request access to a gated dataset, you'll need to give some basic information about your intended use. Access is then usually granted automatically.
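Once access has been granted on the website, downloads from code also need to be authenticated. The sketch below reads a Hugging Face access token from the HF_TOKEN environment variable and passes it to the datasets library. The helper names and the "train" split are assumptions for illustration, and the token parameter requires a reasonably recent version of the library.

```python
import os


def get_hf_token(env=os.environ):
    """Return the access token from HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def load_gated_split(repo: str, split: str = "train"):
    """Load one split of a gated dataset using a personal access token.

    `repo` is the "<organization>/<dataset>" identifier. Access must
    already have been requested and granted on the dataset page.
    """
    # Imported here so get_hf_token works without the library installed.
    from datasets import load_dataset

    token = get_hf_token()
    if token is None:
        raise RuntimeError(
            "Set the HF_TOKEN environment variable to a Hugging Face access token."
        )
    return load_dataset(repo, split=split, token=token, streaming=True)
```

Keeping the token in an environment variable rather than in the script avoids committing it to version control by accident.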
