
SAMSEMO: New dataset for multilingual and multimodal emotion recognition

CC BY 4.0

The repository contains the data collection and code for our article presented at the Interspeech 2024 conference: https://interspeech2024.org/.

We present training code for multilingual and multimodal emotion recognition. This code is based mainly on code from https://github.com/wenliangdai/Multimodal-End2end-Sparse, released under the CC BY 4.0 license. Training consists of two phases. In the preprocessing phase, we extract faces from video frames, spectrograms from audio, and text features from the transcripts. In the end-to-end approach proposed in https://github.com/wenliangdai/Multimodal-End2end-Sparse, this preprocessing was performed every time a data item appeared in a training batch; our approach performs it once up front, which considerably speeds up training.
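To illustrate the idea of precomputing audio features once rather than per batch, here is a minimal, framework-free sketch of a magnitude spectrogram via a framed FFT. The frame and hop sizes are illustrative assumptions, not the repository's actual parameters; the real code lives in `preprocessing/prep_raw.py`.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram via a simple framed FFT (Hann window)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1))

# Precompute once and cache to disk, instead of recomputing per training batch
audio = np.random.randn(16000)  # one second of dummy 16 kHz audio
spec = stft_spectrogram(audio)
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

Caching such arrays (e.g. in pickle files, as this repository does) turns the per-batch feature extraction cost into a one-time offline step.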

[Here] you can find both the raw data and the already preprocessed data as pickle files.

Training without preprocessing

To run our code directly, you can download the processed data from [here]. Make sure to set the "data_path" and "dataset_name" parameters appropriately in the run.py file or on the command line.
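As a quick sanity check before training, you can inspect a downloaded pickle file. The sample layout below (a list of dicts with `text` and `labels` keys) is a hypothetical illustration; the actual structure of `samsemo_en_article.pkl` may differ.

```python
import pickle

# Hypothetical item layout -- the real keys in samsemo_en_article.pkl may differ
sample = {"text": "I am so happy today!", "labels": ["happiness"]}
with open("samsemo_demo.pkl", "wb") as f:
    pickle.dump([sample], f)

# Inspect what the training code would load via --data_path / --dataset_name
with open("samsemo_demo.pkl", "rb") as f:
    data = pickle.load(f)
print(len(data), data[0]["labels"])
```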

Training with preprocessing

To run the code with the preprocessing phase, download the raw data from the .zip files [here] and update the parameters in the run.py file appropriately. Below you will find an example command for running training with the preprocessing phase. You may also perform only the preprocessing phase, producing the pickle files, by running (with appropriate parameters):

python3 preprocessing/prep_raw.py

Command examples for running

Example command for running from preprocessed data:

python3 run.py --train_from_raw_data "" \
    --data_path [dir_with_pickle_file] \
    --dataset_name samsemo_en_article.pkl

Example command for running from raw data:

python3 lightning_train.py --train_from_raw_data True \
    --base_dir [directory_with_raw_files] \
    --utt_names_path [dir_with_utterances_(text)] \
    --utt_file _split_EN.txt \
    --meta_path [dir_with_meta_file] \
    --files_path [dir_with_audio_and_frames] \
    --dataset_name samsemo_en_article.pkl

Results

During training, you will see tables with results for the training and validation sets. We provide several metrics: accuracy, F1, recall, ROC AUC, and precision. After training, we evaluate the best model on the test set and print the most important metric: the average F1 score (averaged over the F1 scores for all emotions).
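The headline metric, F1 averaged over emotions, can be sketched without any dependencies as follows. The emotion labels here are illustrative examples, not necessarily the full SAMSEMO label set.

```python
def f1_per_class(y_true, y_pred, labels):
    """Per-emotion F1, plus the macro average used as the headline metric."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores, sum(scores.values()) / len(scores)

# Toy predictions over three illustrative emotion classes
y_true = ["anger", "joy", "joy", "sadness"]
y_pred = ["anger", "joy", "sadness", "sadness"]
per_class, macro_f1 = f1_per_class(y_true, y_pred, ["anger", "joy", "sadness"])
print(round(macro_f1, 3))  # prints 0.778
```

Averaging per-class F1 (rather than overall accuracy) keeps rare emotions from being drowned out by frequent ones.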

Citation

If you use our code or data, please cite our article as follows:

@inproceedings{samsemo24_interspeech,
  title     = {SAMSEMO: New dataset for multilingual and multimodal emotion recognition},
  author    = {Pawel Bujnowski and Bartlomiej Kuzma and Bartlomiej Paziewski and Jacek Rutkowski and Joanna Marhula and Zuzanna Bordzicka and Piotr Andruszkiewicz},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {2925--2929},
  doi       = {10.21437/Interspeech.2024-212},
}

About

Multimodal experiments on the SAMSEMO dataset featuring cultural analysis, BERT for text emotion recognition, and Wav2Vec2 audio embeddings.
