SE&R 2022 Challenge - SER Track

Introduction
Dependencies
Datasets
Training
Pre-Trained Model
Author
Contact

Introduction

The Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022) Workshop is the first edition of a new series of shared tasks for the Portuguese language. It introduces two versions of a new dataset called CORAA (Corpus of Annotated Audios), built in the TaRSila project, an effort of the Center for Artificial Intelligence (C4AI).

The Speech Emotion Recognition track aims to encourage SER research in our community, mainly by discussing theoretical and practical aspects of SER, pre-processing and feature extraction, and machine learning models for Portuguese. In this task, participants must train their own models on acoustic audio features using the provided dataset, CORAA SER version 1.0, which consists of approximately 40 minutes of audio segments labeled in three classes: neutral, non-neutral female, and non-neutral male. While the neutral class represents audio segments with no well-defined emotional state, the non-neutral classes represent segments associated with one of the primary emotional states in the speaker's speech.

This repository contains the code used by our team, which won first place in the SER track.

Three key strategies make up the solution:

A multilingual pre-trained model (wav2vec 2.0 XLS-R)
A mixture of speech emotion recognition datasets in several languages
Dataset normalization
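As a rough illustration of how a pre-trained encoder such as XLS-R can be used for utterance-level emotion classification, the sketch below mean-pools frame-level embeddings over time and applies a linear layer to get probabilities for the three CORAA SER classes. This is a hypothetical NumPy sketch, not this repository's model: the embeddings are random stand-ins for XLS-R hidden states, and mean pooling, the hidden size, and the names `pool_and_classify`, `W` and `b` are assumptions.

```python
import numpy as np

# The three CORAA SER v1.0 labels described in the Introduction.
CLASSES = ["neutral", "non-neutral female", "non-neutral male"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def pool_and_classify(hidden_states: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical head: mean-pool frame embeddings, then apply a linear layer.

    hidden_states: (num_frames, hidden_dim) frame-level encoder outputs
    W: (hidden_dim, num_classes) weights; b: (num_classes,) bias
    Returns a (num_classes,) probability vector.
    """
    utterance_vector = hidden_states.mean(axis=0)  # collapse the time axis
    return softmax(utterance_vector @ W + b)


rng = np.random.default_rng(0)
hidden_dim = 1024  # assumption: hidden size of the 300M-parameter XLS-R variant
frames = rng.standard_normal((150, hidden_dim))  # random stand-in for ~3 s of encoder output
W = rng.standard_normal((hidden_dim, len(CLASSES))) * 0.01  # untrained, illustrative weights
b = np.zeros(len(CLASSES))

probs = pool_and_classify(frames, W, b)
print(CLASSES[int(probs.argmax())], probs.round(3))
```

Mean pooling is just one simple way to turn a variable-length sequence into a fixed-size vector; attention pooling or statistics pooling are common alternatives.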

Dependencies

It is important to install the dependencies before launching the application.

Run the following command to install the required dependencies using pip:

sudo pip install -r requirements.txt

Datasets

Three more speech emotion recognition datasets were used in addition to the CORAA SER dataset:

Run the script prepare_datasets.sh to download all of the datasets used in the experiments, then run the script get_metadata.py to prepare the metadata used to train, validate and test the models.
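The exact format produced by get_metadata.py is not documented here, so the following is only a minimal sketch, assuming the metadata is a list of (audio path, label) pairs split into train, validation and test sets while preserving label proportions. The helper names, split fractions, seed and CSV columns are all hypothetical.

```python
import csv
import random
from collections import defaultdict


def stratified_split(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Split (path, label) rows into train/val/test, keeping label proportions.

    Hypothetical helper: fractions, seed and row format are assumptions.
    """
    by_label = defaultdict(list)
    for path, label in rows:
        by_label[label].append((path, label))

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):  # sorted so the result is reproducible
        items = by_label[label]
        rng.shuffle(items)
        n_test = int(len(items) * test_frac)
        n_val = int(len(items) * val_frac)
        splits["test"].extend(items[:n_test])
        splits["val"].extend(items[n_test:n_test + n_val])
        splits["train"].extend(items[n_test + n_val:])
    return splits


def write_csv(path, rows):
    """Write rows as a two-column CSV with a (file, label) header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "label"])
        writer.writerows(rows)


# Toy example: 20 made-up files per class.
labels = ["neutral", "non-neutral-female", "non-neutral-male"]
rows = [(f"audio/{lab}_{i:03d}.wav", lab) for lab in labels for i in range(20)]
splits = stratified_split(rows)
for name, subset in splits.items():
    print(name, len(subset))  # train 48, val 6, test 6
```

Each split could then be saved with, for example, `write_csv("train.csv", splits["train"])`.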

Training

Use the default.yaml file in the config directory to set the essential configuration options for training a model.

To apply gain normalization during training or testing, first calculate the mean dBFS level by running the get_mean_dbfs.py script in the utils directory, then set the target_dbfs parameter to the result.
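The idea behind this step can be sketched as follows, assuming dBFS means the RMS level of float audio relative to a full-scale amplitude of 1.0. The repository's get_mean_dbfs.py may compute the level differently (for example through an audio library), so `rms_dbfs`, `mean_dbfs` and `normalize_gain` are hypothetical helpers, not the project's code.

```python
import numpy as np


def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level in dBFS for float audio scaled to [-1.0, 1.0] (full scale = 1.0)."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return float(20.0 * np.log10(rms))


def mean_dbfs(signals) -> float:
    """Average dBFS over a collection of clips, ignoring silent ones (hypothetical)."""
    levels = [rms_dbfs(s) for s in signals]
    finite = [level for level in levels if np.isfinite(level)]
    return float(np.mean(finite))


def normalize_gain(samples: np.ndarray, target_dbfs: float) -> np.ndarray:
    """Scale a clip so that its RMS level equals target_dbfs."""
    current = rms_dbfs(samples)
    if np.isinf(current):
        return samples  # silent clip: nothing to scale
    gain_db = target_dbfs - current
    return samples * (10.0 ** (gain_db / 20.0))  # convert dB gain to a linear factor


# Two synthetic 1-second sine clips at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)
loud = 0.5 * np.sin(2 * np.pi * 440 * t)

target = mean_dbfs([quiet, loud])  # plays the role of target_dbfs
print(round(target, 2))  # -19.03
print(round(rms_dbfs(normalize_gain(quiet, target)), 2))  # -19.03
```

Note that boosting a quiet clip can push its peaks above 1.0; clipping the result (e.g. with `np.clip`) avoids overflow at the cost of slightly changing its RMS level.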

Pre-Trained Model

The model weights with the best results are available on the Hugging Face Hub and can easily be fine-tuned on more data with this application.

Author

Alef Iury Siqueira Ferreira

Contact

e-mail: [email protected]
