Add mls_english and multi_ja_en recipes #2

Open · kinanmartin wants to merge 146 commits into master from multi_ja_en_mls_english_clean


kinanmartin (Collaborator) commented Jul 28, 2025

This PR contains the new code for two icefall recipes:

  1. mls_english
  2. multi_ja_en

To get the big picture of how each recipe works, please first look at the prepare.sh script in each recipe: the mls_english prepare script and the multi_ja_en prepare script.

The mls_english prepare script downloads the parler-tts/mls_eng dataset from Hugging Face (HF), computes features, creates lhotse manifests, and trains a BPE tokenizer. The mls_english training code (zipformer/train.py) trains a model in a way similar to the reazonspeech recipe.
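For readers unfamiliar with the last prepare step, the sketch below shows what "training a BPE tokenizer" means conceptually: repeatedly merging the most frequent adjacent symbol pair in the training text. It is a toy, pure-Python illustration only. icefall recipes commonly delegate this step to SentencePiece, and nothing here is the recipe's actual code; the function name `learn_bpe_merges` and the `</w>` end-of-word marker are assumptions made for illustration.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Toy BPE learner (hypothetical helper, not the recipe's tokenizer code).

    Each word starts as a tuple of characters plus an end-of-word marker;
    each iteration merges the most frequent adjacent pair of symbols.
    """
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Highest count wins; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

For example, `learn_bpe_merges(["low", "low", "lower", "lowest"], 2)` first fuses `o`+`w` and then `l`+`ow`, so the frequent subword `low` becomes a single token. A production tokenizer adds many refinements (vocabulary size targets, character coverage, special symbols) that this sketch omits.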

The multi_ja_en prepare script depends on the objects created by both the mls_english prepare script and the reazonspeech prepare script. It creates symlinks to the features computed by each of those recipes, then creates new lhotse manifests so that training can use those features correctly.
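The symlink-then-rewrite approach described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the directory names, the helper names `link_feature_dirs` and `rewrite_storage_path`, and the simplified cut record (a dict whose `features.storage_path` field is loosely modeled on lhotse's JSON cut manifests) are all hypothetical, and the recipe itself may do this differently.

```python
import json
import tempfile
from pathlib import Path


def link_feature_dirs(sources, dest_root):
    """Symlink each existing feature directory into dest_root/<name>.

    `sources` maps a hypothetical sub-directory name (e.g. "mls_english")
    to a feature directory produced by another recipe.
    """
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    for name, src in sources.items():
        link = dest_root / name
        # Skip links that already exist so the step can be re-run safely.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(Path(src).resolve(), target_is_directory=True)
    return dest_root


def rewrite_storage_path(cut, old_prefix, new_prefix):
    """Return a copy of a cut record whose feature path uses the new layout."""
    cut = json.loads(json.dumps(cut))  # deep copy of a JSON-like dict
    feats = cut.get("features")
    if feats and feats.get("storage_path", "").startswith(old_prefix):
        feats["storage_path"] = new_prefix + feats["storage_path"][len(old_prefix):]
    return cut


if __name__ == "__main__":
    # Hypothetical layout: one recipe's features linked into the combined recipe.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "mls_english_fbank"
        src.mkdir()
        root = link_feature_dirs({"mls_english": src}, Path(tmp) / "multi_ja_en" / "fbank")
        print(sorted(p.name for p in root.iterdir()))
    cut = {"id": "utt1", "features": {"storage_path": "old/fbank/train", "storage_key": "utt1"}}
    print(rewrite_storage_path(cut, "old/fbank", "data/fbank/mls_english"))
```

Symlinking avoids copying large feature archives, while rewriting the manifest paths is what lets the combined recipe's dataloader find those features through the new directory layout.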

kinanmartin and others added 30 commits July 28, 2025 17:49
This reverts commit ba603e0.
…tructure, added script to update cutset paths. WIP
…the multilingual training recipe directory structure
…o make dev and test splits have matching sizes to reazonspeech
csukuangfj and others added 30 commits August 5, 2025 19:16
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
I found some inf values in tot_score that prevent the model from converging; adding an inf mask makes training more stable.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
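One of the commits above reports inf values in tot_score that stop the model from converging, and adds an inf mask. The idea can be sketched as: build a boolean mask of the finite scores, then sum (and count) only those entries. This is a minimal sketch assuming per-utterance scores in a plain Python list; the actual change lives in training code not shown on this page and likely operates on tensors, and the helper name `inf_masked_sum` is hypothetical.

```python
import math


def inf_masked_sum(tot_scores):
    """Sum per-utterance scores while masking out +inf/-inf entries.

    Returns (total, num_kept) so a caller could normalize the loss by the
    number of utterances that actually contributed.
    """
    mask = [not math.isinf(s) for s in tot_scores]
    total = sum(s for s, keep in zip(tot_scores, mask) if keep)
    return total, sum(mask)
```

Without the mask, a single infinite score makes the summed loss infinite and its gradients unusable for that batch; with it, the offending utterances are simply dropped from the batch total.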