Implement Multiturn support for EasyMagpie #15764

Open
Edresson wants to merge 109 commits into NVIDIA-NeMo:main from Edresson:nemotron_tts_multiturn_user_audio_latest


@Edresson Edresson commented Jun 8, 2026

Member

What does this PR do?

Implements multiturn support for EasyMagpieTTS, including multiturn Lhotse training/evaluation data handling, user-audio-conditioned inference, multiturn metrics/export support, and related tests.

Collection: TTS

Changelog

  • Added a new EasyMagpie multiturn Lhotse config: examples/tts/conf/magpietts/easy_magpietts_lhotse_multiturn.yaml.
  • Added support for restoring EasyMagpie models from a pretrained checkpoint before normal initialization.
  • Added multiturn Lhotse TTS dataset support for duplex / multi-turn conversational samples.
  • Added EasyMagpie multiturn user-audio inference mode via --easy_magpie_inference_mode multiturn_user_audio.
  • Added support for user-audio conditioning, user silence handling, user turn end-token handling, and max evaluation turns during multiturn inference.
  • Added distributed / multi-GPU inference support for multiturn EasyMagpie evaluation by sharding generation across ranks and merging rank-local outputs for evaluation.
  • Added grouped turn-level and sample-level metrics export for multiturn evaluation, including JSON/CSV outputs.
  • Added phoneme prediction export fields to generated .json / .csv evaluation outputs.
  • Added optional predicted-phoneme inference controls, including phoneme input type, phoneme sampling method, tokenizer override, and text-input dropout.
  • Added additional evaluation options for ASR / EOU batch sizes, emotion cosine similarity, emotion match rate, and text-annotation stripping for metrics.
  • Added Lhotse dataloader speaker filtering support.
  • Added / updated Nemotron-H decoder support needed for prefill in EasyMagpie multiturn inference.
  • Added unit tests for Lhotse TTS filters, MagpieTTS Lhotse dataset behavior, and EasyMagpieTTS model behavior.
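
The multi-GPU changelog entry describes sharding generation across ranks and merging rank-local outputs before evaluation. A minimal sketch of that pattern follows, assuming round-robin striding over sample indices. The helper names `shard_indices` and `merge_rank_outputs` are hypothetical and are not taken from this PR, and the ranks are simulated in a single process rather than launched with a distributed backend.

```python
from typing import Dict, List, Sequence


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Return the sample indices assigned to `rank` (round-robin striding)."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return list(range(rank, num_samples, world_size))


def merge_rank_outputs(rank_outputs: Sequence[Dict[int, str]]) -> List[str]:
    """Merge rank-local {sample_index: output} maps back into dataset order."""
    merged: Dict[int, str] = {}
    for outputs in rank_outputs:
        # Each sample must be generated by exactly one rank.
        overlap = merged.keys() & outputs.keys()
        if overlap:
            raise ValueError(f"samples generated by more than one rank: {sorted(overlap)}")
        merged.update(outputs)
    return [merged[i] for i in sorted(merged)]


# Simulate three ranks working on a seven-sample evaluation set.
world_size, num_samples = 3, 7
rank_outputs = [
    {i: f"sample_{i}.wav" for i in shard_indices(num_samples, rank, world_size)}
    for rank in range(world_size)
]
merged = merge_rank_outputs(rank_outputs)
print(merged)
```

Striding keeps the per-rank workload balanced to within one sample, and keying outputs by the original sample index lets the merged list be evaluated in dataset order regardless of which rank finishes first.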

Usage

Example EasyMagpie multiturn inference with a .nemo checkpoint:

# Run from the NeMo repository root.
# Replace paths with your local model, codec, evalset config, and output directory.

python examples/tts/magpietts_inference.py \
  --model_type easy_magpie \
  --easy_magpie_inference_mode multiturn_user_audio \
  --nemo_files /path/to/easy_magpie_model.nemo \
  --datasets_json_path /path/to/evalset_config.json \
  --out_dir /path/to/output \
  --codecmodel_path /path/to/codec_model.nemo \
  --batch_size 1 \
  --max_eval_turns 6 \
  --run_evaluation
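
With `--run_evaluation`, the changelog says metrics are exported at both turn level and sample level as JSON/CSV. The sketch below illustrates one way such grouping could work; it is not the PR's implementation. The row fields (`sample_id`, `turn_index`, `cer`), the helper names, and the interpretation of the turn limit as "keep turns with a zero-based index below the limit" are all assumptions made for illustration.

```python
import csv
import io
import json
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence


def group_turn_metrics(
    turn_rows: Sequence[Dict],
    metric_keys: Sequence[str],
    max_eval_turns: Optional[int] = None,
) -> List[Dict]:
    """Average turn-level metrics into one row per sample (hypothetical schema)."""
    by_sample = defaultdict(list)
    for row in turn_rows:
        # Assumption: turns are zero-indexed and turns past the limit are skipped.
        if max_eval_turns is not None and row["turn_index"] >= max_eval_turns:
            continue
        by_sample[row["sample_id"]].append(row)

    sample_rows = []
    for sample_id in sorted(by_sample):
        rows = by_sample[sample_id]
        summary = {"sample_id": sample_id, "num_turns": len(rows)}
        for key in metric_keys:
            summary[key] = mean(r[key] for r in rows)
        sample_rows.append(summary)
    return sample_rows


def rows_to_csv(rows: Sequence[Dict]) -> str:
    """Serialize homogeneous dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up turn-level rows, used only to exercise the helpers.
turn_rows = [
    {"sample_id": "conv_a", "turn_index": 0, "cer": 0.25},
    {"sample_id": "conv_a", "turn_index": 1, "cer": 0.5},
    {"sample_id": "conv_a", "turn_index": 2, "cer": 1.0},
    {"sample_id": "conv_b", "turn_index": 0, "cer": 0.125},
]
sample_rows = group_turn_metrics(turn_rows, metric_keys=["cer"], max_eval_turns=2)
json_text = json.dumps({"turns": turn_rows, "samples": sample_rows}, indent=2)
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Keeping the turn-level rows alongside the per-sample summary makes it possible to trace a poor sample score back to the specific turn that caused it.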

copy-pr-bot Bot commented Jun 8, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread examples/tts/easy_magpietts_inference_multiturn.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn.py Fixed
Comment thread nemo/collections/tts/models/easy_magpietts_inference.py Fixed
Comment thread nemo/collections/tts/models/easy_magpietts.py Fixed
Comment thread nemo/collections/common/data/lhotse/cutset.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn_multigpu.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn_runner.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn_runner.py Fixed
Comment thread examples/tts/easy_magpietts_inference_multiturn_runner.py Fixed
@Edresson Edresson force-pushed the nemotron_tts_multiturn_user_audio_latest branch from 2600fb8 to 9cb8c01 Compare June 8, 2026 19:08
Comment thread examples/tts/magpietts_inference.py Fixed
Comment thread examples/tts/magpietts_inference.py Fixed
Comment thread examples/tts/magpietts_inference.py Fixed
@Edresson Edresson force-pushed the nemotron_tts_multiturn_user_audio_latest branch from 9cb8c01 to a73132d Compare June 8, 2026 19:44
@github-actions github-actions Bot removed the common label Jun 9, 2026

Member Author

FYI, these changes were made to support single-step prefill in the nemotron_h class.

Comment thread nemo/collections/tts/metrics/emotion_encoder.py Fixed
return cfg_cls(**kwargs)


def _select_runner_cls(args):

Collaborator

@Edresson Edresson force-pushed the nemotron_tts_multiturn_user_audio_latest branch 2 times, most recently from c564930 to 1a43737 Compare June 23, 2026 15:36
Comment thread nemo/collections/tts/modules/magpietts_inference/inference.py Fixed
Comment thread nemo/collections/tts/modules/magpietts_inference/inference.py Fixed
Comment thread nemo/collections/tts/modules/magpietts_inference/inference.py Fixed
@Edresson Edresson force-pushed the nemotron_tts_multiturn_user_audio_latest branch 2 times, most recently from b18c9ea to ba6c288 Compare June 23, 2026 18:58
paarthneekhara and others added 11 commits June 23, 2026 12:02
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: paarthneekhara <paarthneekhara@users.noreply.github.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
@Edresson Edresson force-pushed the nemotron_tts_multiturn_user_audio_latest branch from 930147e to dc1d48b Compare June 24, 2026 12:38
@Edresson

Member Author

/ok to test dc1d48b

@github-actions

Contributor

[🤖]: Hi @Edresson 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

@Edresson Edresson requested a review from shehzeen June 24, 2026 17:33
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
@Edresson

Member Author

/ok to test eb6b252

@github-actions

Contributor

[🤖]: Hi @Edresson 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

shehzeen and others added 2 commits June 24, 2026 15:23
Signed-off-by: Shehzeen Hussain <shehzeenh@nvidia.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
@Edresson

Member Author

/ok to test 3b5077f

@github-actions

Contributor

[🤖]: Hi @Edresson 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

4 participants