[detection/asvspoof5/v11_ssl_gmlp] add folder #66
Pull Request Overview
This PR adds a new experimental configuration folder for ASVspoof5 detection using SSL (Self-Supervised Learning) with gMLP (gated Multi-Layer Perceptron) models. The implementation explores different SSL model variants (WavLM, wav2vec2, XLS-R) with frozen and fine-tuned configurations for audio spoofing detection.
- Adds complete training pipeline with data preparation, model training, evaluation, and performance measurement stages
- Implements 6 different SSL model configurations combining various pre-trained models with frozen/fine-tuned settings
- Provides SLURM-based distributed training setup for HPC environments
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| wedefense | Symbolic link to wedefense framework |
| tools | Symbolic link to shared tools directory |
| slurm_train.sh | SLURM job script for distributed training with model configuration array |
| run.sh | Main training pipeline script with 7 stages from data prep to evaluation |
| path.sh | Environment setup script for Python paths |
| local | Symbolic link to shared local utilities |
| conf/*.yaml | 6 SSL model configurations for different wav2vec2/WavLM/XLS-R variants |
| cd ${data} | ||
| ln -s flac_T train | ||
| ln -s flac_D dev | ||
| ln -s flac_E_eval /eval |
The link name `/eval` is an absolute path, so the link would be created in the root directory rather than inside `${data}` next to `train` and `dev`. Drop the leading slash and use `eval`.
| ln -s flac_E_eval /eval | |
| ln -s flac_E_eval eval |
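The corrected links can be sketched as a small helper. This is a minimal sketch, not the PR's code: the function name `link_splits` is hypothetical, and the `-f`/`-n` flags are an addition so that rerunning the stage replaces existing links instead of failing.

```shell
#!/usr/bin/env bash
# link_splits (hypothetical helper): create relative train/dev/eval links
# inside the data directory passed as the first argument.
link_splits() {
  local data="$1"
  (
    # Work in a subshell so the caller's working directory is unchanged.
    cd "${data}" || exit 1
    # -s: symbolic link, -f: replace an existing link, -n: do not follow an
    # existing link to a directory when replacing it.
    ln -sfn flac_T train
    ln -sfn flac_D dev
    ln -sfn flac_E_eval eval   # relative link name, no leading slash
  )
}
```

Because both the link names and their targets are relative, the data directory can be moved or mounted elsewhere without breaking the links.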
| # filename cm-label | ||
| echo "filename cm-label" > ${data}/${dset}/cm_key_file.txt | ||
| cat ${data}/${dset}/utt2cls >> ${data}/${dset}/cm_key_file.txt | ||
| sed -i "s/ /\t/g" ${data}/${dset}/cm_key_file.txt |
Replacing every space with a tab is fragile: it breaks as soon as a filename contains a space. Consider converting only the separator before the label, for example with awk (this assumes the label is the last whitespace-delimited field on each line).
| sed -i "s/ /\t/g" ${data}/${dset}/cm_key_file.txt | |
| awk '{label=$NF; sub(/[ \t]+[^ \t]+[ \t]*$/, ""); print $0 "\t" label}' ${data}/${dset}/cm_key_file.txt > ${data}/${dset}/cm_key_file.tmp && mv ${data}/${dset}/cm_key_file.tmp ${data}/${dset}/cm_key_file.txt |
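To handle filenames that contain spaces, only the separator in front of the label should become a tab. The following is a minimal sketch under two assumptions: each line of `cm_key_file.txt` has the form `filename label`, and the label itself contains no whitespace. The helper name `tabify_last_field` is hypothetical.

```shell
#!/usr/bin/env bash
# tabify_last_field (hypothetical helper): rewrite FILE in place so that only
# the whitespace before the last field becomes a tab.
tabify_last_field() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  awk '
    NF >= 2 {
      label = $NF                          # last field is the label
      sub(/[ \t]+[^ \t]+[ \t]*$/, "")      # strip the separator and label from $0
      print $0 "\t" label                  # filename (spaces kept), tab, label
      next
    }
    { print }                              # pass short or empty lines through
  ' "${file}" > "${tmp}" && mv "${tmp}" "${file}"
}
```

In `run.sh` this could be called as `tabify_last_field ${data}/${dset}/cm_key_file.txt` after the header and `utt2cls` have been written.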
| #SBATCH --mail-user="lzhan268@jh.edu" #email for reporting | ||
| #SBATCH --mail-type=END,FAIL #report types | ||
| #SBATCH --output=./logs/slurm-%A_%a.out | ||
| #SBATCH --array=4-4 | ||
The email address is hardcoded, which exposes personal information and makes the script non-portable. Consider removing it from the committed script and passing `--mail-user` on the `sbatch` command line instead. Slurm does not expand shell variables inside `#SBATCH` directives, so the address cannot be read from an environment variable there.
| #SBATCH --mail-user="lzhan268@jh.edu" #email for reporting | |
| #SBATCH --mail-type=END,FAIL #report types | |
| #SBATCH --output=./logs/slurm-%A_%a.out | |
| #SBATCH --array=4-4 | |
| #SBATCH --mail-type=END,FAIL #report types | |
| #SBATCH --output=./logs/slurm-%A_%a.out | |
| #SBATCH --array=4-4 | |
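Since `#SBATCH` lines are read as comments and shell variables in them are not expanded, one way to keep the address out of the repository is to build the mail options at submission time. This is a minimal sketch: the helper name `mail_args` and the `SLURM_MAIL_USER` variable are illustrative conventions, not something Slurm defines. Only `--mail-user` and `--mail-type` are real `sbatch` options.

```shell
#!/usr/bin/env bash
# mail_args (hypothetical helper): print sbatch mail options, one per line,
# only when SLURM_MAIL_USER is set in the caller's environment.
mail_args() {
  if [ -n "${SLURM_MAIL_USER:-}" ]; then
    printf '%s\n' "--mail-user=${SLURM_MAIL_USER}" "--mail-type=END,FAIL"
  fi
}

# Usage (bash 4+ for mapfile; not executed here):
#   mapfile -t args < <(mail_args)
#   sbatch "${args[@]}" slurm_train.sh
```

Options given on the `sbatch` command line take precedence over `#SBATCH` directives in the script, so the committed script can stay free of personal addresses.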
| stage=3 | ||
| stop_stage=3 | ||
| ASVspoof5_dir=/export/fs05/lzhan268/workspace/PUBLIC/PartialSpoof/database |
The hardcoded absolute path contains the username `lzhan268`, which makes it non-portable across users and systems. Parameterize it or use a relative path.
| ASVspoof5_dir=/export/fs05/lzhan268/workspace/PUBLIC/PartialSpoof/database | |
| : "${ASVspoof5_dir:?Environment variable ASVspoof5_dir must be set to the ASVspoof5 database directory.}" |
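The `${var:?message}` expansion aborts the script when the variable is unset or empty, but it does not check that the path exists. A slightly stricter check can be sketched as below. The helper name `require_database_dir` is hypothetical, and the sketch assumes `ASVspoof5_dir` is supplied by the caller's environment rather than assigned in `run.sh`.

```shell
#!/usr/bin/env bash
# require_database_dir (hypothetical helper): succeed only if ASVspoof5_dir is
# set, non-empty, and points to an existing directory.
require_database_dir() {
  if [ -z "${ASVspoof5_dir:-}" ]; then
    echo "ASVspoof5_dir must be set to the ASVspoof5 database directory." >&2
    return 1
  fi
  if [ ! -d "${ASVspoof5_dir}" ]; then
    echo "ASVspoof5_dir='${ASVspoof5_dir}' is not a directory." >&2
    return 1
  fi
}
```

A script would call it near the top, for example `require_database_dir || exit 1`, and users would run `ASVspoof5_dir=/path/to/database bash run.sh` with their own path.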
This is a draft PR. Performance on ASVspoof 5 is not yet good, and suitable parameters still need to be explored.