This repository contains the code used to train a noise-robust voice activity detector (VAD) using adversarial multi-task learning, as presented in [1]. The work builds on the framework in [2].
The code is split into the following modules:
- main.py: The main file of the code. All the other modules are run from this file.
- training.py: The module responsible for training the model.
- testing.py: The module responsible for validation and testing of the model.
- dataloaders.py: The module responsible for loading the data.
- file_management.py: The module responsible for loading and saving the models and results.
- model_file.py: The module in which the model is defined.
- config.py: The module in which global variables are initialised. The learning rate, kernel sizes etc. can be changed here.
Additionally, the ground truth VAD labels for the TIMIT database are generated using the .WRD files and are provided in this repository.
Python modules:
- PyTorch
- pickle
- os
- numpy
- matplotlib.pyplot
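Before running the code, it can help to confirm that the modules listed above are importable. The following is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and the only assumption about the packages is that PyTorch is imported under the name `torch`.

```python
import importlib.util

# Import names for the modules listed above; PyTorch is imported as "torch".
REQUIRED = ["torch", "pickle", "os", "numpy", "matplotlib"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

pickle and os are part of the Python standard library, so only PyTorch, numpy and matplotlib should ever need to be installed.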
The AURORA2 database.
To run the code, execute: python main.py
Before executing the program, change the paths to the AURORA2 database in config.py. The VAD labels can be downloaded from https://github.com/zhenghuatan/rVAD
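As an illustration of the path setup described above, the sketch below shows how the entries might look and how to verify them before starting a run. The variable names `AURORA2_ROOT` and `VAD_LABEL_ROOT`, the placeholder paths and the helper `check_paths` are all hypothetical; the actual names used in config.py may differ.

```python
import os

# Hypothetical variable names; the actual names in config.py may differ.
AURORA2_ROOT = "/path/to/AURORA2"        # placeholder: root of the AURORA2 database
VAD_LABEL_ROOT = "/path/to/vad_labels"   # placeholder: directory holding the VAD labels


def check_paths(paths):
    """Return the subset of paths that do not exist as directories."""
    return [p for p in paths if not os.path.isdir(p)]


if __name__ == "__main__":
    for path in check_paths([AURORA2_ROOT, VAD_LABEL_ROOT]):
        print("Path not found, update config.py:", path)
```

Checking the paths up front reports a misconfigured directory immediately rather than once data loading begins.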
[1] C.M. Larsen, P. Koch, Z.-H. Tan. Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay, Manuscript, 2022.
[2] C. Yu, K.-H. Hung, I-F. Lin, S.-W. Fu, Y. Tsao, J.-W. Hung. Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders, arXiv preprint arXiv:2006.11139, 2020.