ROSE-CD

This repository provides the official PyTorch implementation of the following paper:

Robust One-step Speech Enhancement via Consistency Distillation (IEEE WASPAA 2025, Oral Presentation)
Liang Xu, Longfei Felix Yan, W. Bastiaan Kleijn
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2025

🔗 Project Website | 📄 arXiv Preprint | 📄 IEEE Xplore

📖 Highlights

Real-Time Efficiency: Proposes a one-step consistency training (CT) framework for highly efficient, real-time speech enhancement.
Improved Robustness: Mitigates the accumulation of teacher-induced biases via randomized trajectory training and auxiliary time-domain constraints.
Superior Performance: Accelerates inference speed by a factor of 54× while surpassing the performance of the foundational 30-step teacher model.
Strong Generalization: Demonstrates robust generalization capabilities across out-of-domain and dynamic real-world acoustic scenarios.

📊 Performance Benchmark

The table below presents a comparative analysis between the proposed 1-step Consistency Training (CT) model and the baseline 30-step Teacher model, evaluated on the VoiceBank-DEMAND test corpus. The CT framework not only accelerates inference by an order of magnitude but also yields statistically significant improvements across all established objective metrics.

Model	Steps	PESQ (↑)	ESTOI (↑)	SI-SDR (↑)	SI-SIR (↑)	SI-SAR (↑)
Teacher	30	2.89 ± 0.67	0.86 ± 0.10	16.7 ± 3.7	26.7 ± 5.8	17.6 ± 3.4
CT (Ours)	1	3.47 ± 0.67	0.87 ± 0.10	19.2 ± 3.6	29.2 ± 5.4	20.0 ± 3.7

🔗 Pre-trained Models & Audio Samples

We release the pre-trained checkpoints alongside corresponding enhanced audio outputs for both the 30-step teacher model and our 1-step CT model.

Checkpoints: Download via Google Drive
Enhanced Audio Outputs: Download via Google Drive

Usage instructions: Extract and place the downloaded checkpoints into the designated logs/ directory (e.g., ./logs/). Ensure that the checkpoint paths within the evaluation scripts (scripts/eval_CT.sh or scripts/eval_teacher.sh) are correctly updated to replicate the reported benchmark results.

⚙️ Installation & Setup

We recommend utilizing an isolated virtual environment with Python 3.11. To initialize the environment and install dependencies, execute:

# Clone the repository
git clone https://github.com/liangxu123/rosecd.git
cd rosecd

# Install required packages
pip install -r requirements.txt

Note: For experiment tracking via Weights & Biases (W&B), please configure your environment using wandb login prior to initiating training.

🗄️ Dataset Preparation

Our data preprocessing pipeline is adapted from the established SGMSE+ framework. To configure the dataset directories, please update the corresponding paths in path_config.sh. By default, the configuration points to the VoiceBank-DEMAND corpus paths.

🚀 Training Framework: Consistency Training (CT) and Consistency Distillation (CD)

Consistency Models inherently support two distinct training paradigms:

Consistency Distillation (CD): Distilling knowledge from a pre-trained teacher diffusion model.
Consistency Training (CT): Direct training on the empirical data distribution without the necessity of a teacher model.

While our published paper primarily formalizes and evaluates the method utilizing Consistency Distillation (CD), subsequent empirical analyses revealed that applying Consistency Training (CT) within the exact same codebase yields identical performance. Because CT completely bypasses the need to rely on a pre-trained teacher model, it significantly streamlines the training procedure and circumvents teacher-induced approximation errors.

Therefore, in this repository, we officially release both the Teacher model training scripts and the CT model training scripts, as CT achieves the same state-of-the-art one-step enhancement performance as CD, but with a much simpler pipeline.

To train the one-step model from scratch using CT, execute:

bash ./scripts/train_CT.sh <GPU_ID>  # e.g., bash ./scripts/train_CT.sh 0

To train the multi-step Teacher model (if you wish to reproduce the baseline or teacher pipeline), execute:

bash ./scripts/train_teacher.sh <GPU_ID>  # e.g., bash ./scripts/train_teacher.sh 0

📈 Evaluation Protocol

To benchmark the one-step consistency model on the test corpus, utilize the evaluation scripts provided in the scripts/ directory.

Prior to execution, verify that the checkpoint path inside scripts/eval_CT.sh correctly points to your trained model weights.

bash ./scripts/eval_CT.sh <GPU_ID>  # e.g., bash ./scripts/eval_CT.sh 0

This procedure generates the enhanced audio files within out/onestep_pesq5e-4_CT/N_1 and subsequently computes standard objective metrics (e.g., PESQ, ESTOI, SI-SDR).

Evaluating the Baseline Teacher Model:
If a baseline multi-step Teacher model was trained (via scripts/train_teacher.sh), it can be evaluated by updating the checkpoint path within scripts/eval_teacher.sh and running:

bash ./scripts/eval_teacher.sh <GPU_ID>  # e.g., bash ./scripts/eval_teacher.sh 0

Enhanced outputs will be saved to out/teacher/N_30 and evaluated automatically.

📝 Citation

If this codebase or methodology proves useful in your research, please consider citing our work:

@inproceedings{xu2025robust,
  title={Robust One-step Speech Enhancement via Consistency Distillation},
  author={Xu, Liang and Yan, Longfei Felix and Kleijn, W Bastiaan},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  pages={1--5},
  year={2025},
  organization={IEEE},
  keywords={Noise;Speech enhancement;Robustness;Real-time systems;Trajectory;Recording;Noise measurement;Iterative methods;Time-domain analysis;Optimization},
  doi={10.1109/WASPAA66052.2025.11230988}
}

🙏 Acknowledgments

We express our gratitude to the authors of the SGMSE+ repository, upon whose exemplary foundational work this codebase is built.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
preprocessing		preprocessing
scripts		scripts
sgmse		sgmse
.gitignore		.gitignore
LICENSE		LICENSE
README-SGMSE+.md		README-SGMSE+.md
README.md		README.md
cal_metrics_VB.sh		cal_metrics_VB.sh
calc_metrics.py		calc_metrics.py
enhancement.py		enhancement.py
onestep_train.py		onestep_train.py
path_config.sh		path_config.sh
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ROSE-CD

📖 Highlights

📊 Performance Benchmark

🔗 Pre-trained Models & Audio Samples

⚙️ Installation & Setup

🗄️ Dataset Preparation

🚀 Training Framework: Consistency Training (CT) and Consistency Distillation (CD)

📈 Evaluation Protocol

📝 Citation

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ROSE-CD

📖 Highlights

📊 Performance Benchmark

🔗 Pre-trained Models & Audio Samples

⚙️ Installation & Setup

🗄️ Dataset Preparation

🚀 Training Framework: Consistency Training (CT) and Consistency Distillation (CD)

📈 Evaluation Protocol

📝 Citation

🙏 Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages