Draft: merge Q2 submission support into main without recipes by jasonlizhengjian · Pull Request #191 · NVIDIA/srt-slurm

jasonlizhengjian · 2026-05-29T18:08:33Z

Summary

Draft merge-back of Q2 submission branch support into main, excluding recipe changes.
Keeps recipes/ at the shared merge-base state on this branch, so the PR has no recipe-path changes and a merge should preserve the current main recipe tree.
Carries the non-recipe Q2 support changes: benchmark/runtime config updates, SA-Bench and lm-eval support, worker placement/topology behavior, TRTLLM setup-script support, and related tests.

What is intentionally excluded

All recipe additions, removals, and rewrites from sa-submission-q2-2026.

Notes

This replaces the direct submission-branch draft PR #190 ("Draft: merge Q2 submission branch back into main"), which included recipe changes.
PR #184 ("Cherry-pick Dynamo wheel install support to Q2") is still not included unless it lands in sa-submission-q2-2026 or is separately cherry-picked here.
The median interactivity CSV rollup from current main is not introduced by this branch; this branch preserves main unless the Q2 changes touch the same files.

Validation

Verified with git diff --name-only nvidia/main...HEAD -- recipes | wc -l that there are 0 recipe-path changes in this PR-style diff.
Full test suite not run for this draft PR.
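
The recipe-path check above could be scripted roughly as follows. This is only a sketch: the helper names recipe_paths and changed_recipe_paths are hypothetical and not part of the repository, and it assumes a remote named nvidia, as in the command above. The three-dot form compares HEAD against its merge base with nvidia/main, which matches what a PR-style diff shows.

```python
import subprocess


def recipe_paths(diff_output: str, prefix: str = "recipes/") -> list[str]:
    """Filter `git diff --name-only` output down to paths under `prefix`."""
    return [line for line in diff_output.splitlines() if line.startswith(prefix)]


def changed_recipe_paths(base: str = "nvidia/main", head: str = "HEAD") -> list[str]:
    """Run the three-dot (merge-base) diff and keep only recipe paths.

    Hypothetical helper; requires a git checkout with the `base` ref available.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return recipe_paths(out)
```

An empty list from changed_recipe_paths() would correspond to the 0 reported by wc -l.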

* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host
  - Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
  - Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to node's routable IP in get_process_environment() instead of leaving it as 0.0.0.0/localhost which caused transfer handshake failures
  - Update test_vllm_get_process_environment to cover NIXL host env var
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: run checks on PRs targeting sa-submission-q2-2026
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
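
As an illustration of the NIXL side channel fix described above, the idea can be sketched as below. This is not the repository's get_process_environment(); the function names are hypothetical, and the sketch assumes the caller already knows the node's IP address. Only the VLLM_NIXL_SIDE_CHANNEL_HOST variable name comes from the commit message.

```python
import ipaddress


def is_routable(host: str) -> bool:
    """True for a concrete IP that is neither the wildcard nor loopback."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames such as "localhost" are not IP literals and are rejected.
        return False
    return not (addr.is_unspecified or addr.is_loopback)


def with_nixl_side_channel_host(env: dict[str, str], node_ip: str) -> dict[str, str]:
    """Return a copy of env with the side channel host pinned to node_ip.

    Hypothetical helper: refuses 0.0.0.0 and loopback addresses, which the
    commit message names as the cause of the handshake failures.
    """
    if not is_routable(node_ip):
        raise ValueError(f"not a routable address: {node_ip!r}")
    return {**env, "VLLM_NIXL_SIDE_CHANNEL_HOST": node_ip}
```

A peer on another node cannot reach 0.0.0.0 or localhost on the advertising node, which is why a concrete address is needed here.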

NVIDIA#24)
* Add Kimi K2.5 disagg STP and MTP recipes for GB200 NVfp4 (ISL8K_OSL1K and ISL1K_OSL1K)
  Add optimized disaggregated inference recipes for Kimi K2.5 model with NVfp4 precision on GB200 GPUs. Includes both STP and MTP configurations for ISL8K_OSL1K and ISL1K_OSL1K workloads covering concurrency points from 5 to 2253, with Eagle speculative decoding for MTP variants.
* Update Kimi K2.5 recipes: container, model path, concurrency format, and env cleanup
  - Update container to tensorrtllm-runtime-1.1.0-dev.2.sqsh
  - Point model path to shared /mnt/lustre01/models/kimi-k2.5-nvfp4
  - Update Eagle model mount path for MTP configs
  - Remove HF_HOME (defaults to ~/.cache/huggingface)
  - Fix concurrency separator from space to 'x' for sa-bench compatibility
  - Enable multiple frontends for ctx1dep4_gen1dep32_batch64
* Use generic model path and container aliases for cluster portability
  Replace cluster-specific paths with generic alias names that are resolved via srtslurm.yaml model_paths and containers mappings, as per upstream convention.
* Add extra_mount alias resolution and use generic Eagle model path
  Add model_paths alias resolution for extra_mount host paths in config.py, enabling MTP recipes to use generic name "kimi-k2.5-eagle3" instead of cluster-specific path for the Eagle speculative decoding model.
* Use HuggingFace model names and full NVCR container paths
  Per review feedback, update model paths to HuggingFace format (nvidia/Kimi-K2.5-NVFP4) and container to full NVCR registry path (nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2) so recipes are portable and work without pre-built sqsh files.
---------
Co-authored-by: nlevin-ui <nlevin@nvidia.com>
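
The extra_mount alias resolution mentioned above might look something like the following. This is a sketch under assumptions, not the code in config.py: it assumes each mount is a "host:container" string and that model_paths is a plain alias-to-path mapping. The function name and the example paths are hypothetical.

```python
def resolve_extra_mount(mount: str, model_paths: dict[str, str]) -> str:
    """Replace the host side of a "host:container" mount if it is a known alias.

    Hypothetical helper. Host paths that are not aliases pass through unchanged.
    """
    host, sep, container = mount.partition(":")
    resolved_host = model_paths.get(host, host)
    return f"{resolved_host}{sep}{container}"


# Example mapping; the path on the right is made up for illustration.
aliases = {"kimi-k2.5-eagle3": "/cluster/models/eagle3"}
resolved = resolve_extra_mount("kimi-k2.5-eagle3:/eagle", aliases)
```

Keeping the alias in the recipe and the real path in the cluster config is what lets the same recipe run on clusters with different storage layouts.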

* Add GLM5 GB200 NVFP4 Apr-09 disagg recipes.
  Include the updated 1K/1K and 8K/1K STP and MTP TensorRT-LLM Dynamo configs so submission testing can run on the latest GB200 parameter set.
* Keep only Apr-09 GB200 configs and align YAML quoting.
  Remove legacy GB200 trtllm_dynamo recipes inherited from the submission base branch, and normalize concurrencies/custom_tokenizer fields to double-quoted style for consistency with existing GB300 recipes.
* fix: enable chat template and 16x rounds for GB200 GLM5 configs
  Update GB200 GLM5 trtllm_dynamo recipes to set use_chat_template=true and num_prompts_mult=16 so sa-bench runs align with current submission benchmarking methodology.

* Add spread_workers option to ResourceConfig
  Allow placing each partial-node worker on its own node instead of packing multiple onto the same node. Useful when colocating workers on a single node causes resource contention (port collisions, etc.). Caller must reserve enough nodes (e.g. set decode_nodes=decode_workers when gpus_per_decode<gpus_per_node).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* try fix
* allow multiple DEP2 workers per node
* multi worker fix
* Allow vLLM one-node prefill decode colocation
* Avoid same-node worker port collisions
* Fix spread workers tests and lint
* Cover vLLM colocation guard
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: hjjq <50634613+hjjq@users.noreply.github.com>
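
To make the packing versus spreading distinction concrete, here is a minimal placement sketch. It is not the ResourceConfig implementation: the function name and signature are hypothetical, and it covers only partial-node workers (fewer GPUs per worker than per node). Only the spread_workers flag name comes from the commit message.

```python
def place_workers(
    num_workers: int,
    gpus_per_worker: int,
    gpus_per_node: int,
    num_nodes: int,
    spread_workers: bool = False,
) -> list[int]:
    """Return the node index assigned to each partial-node worker.

    Hypothetical sketch: packs as many workers per node as fit by default,
    or exactly one worker per node when spread_workers is set.
    """
    if not 0 < gpus_per_worker < gpus_per_node:
        raise ValueError("sketch only handles partial-node workers")
    per_node = 1 if spread_workers else gpus_per_node // gpus_per_worker
    nodes_needed = -(-num_workers // per_node)  # ceiling division
    if nodes_needed > num_nodes:
        raise ValueError(f"need {nodes_needed} nodes, only {num_nodes} reserved")
    return [i // per_node for i in range(num_workers)]


packed = place_workers(4, gpus_per_worker=2, gpus_per_node=4, num_nodes=2)
spread = place_workers(4, gpus_per_worker=2, gpus_per_node=4, num_nodes=4, spread_workers=True)
```

In this sketch, spreading four 2-GPU workers needs four nodes rather than two, which mirrors the commit's note that the caller must reserve one node per worker.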

codecov-commenter · 2026-05-29T18:38:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@babf250). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #191   +/-   ##
=======================================
  Coverage        ?   65.78%           
=======================================
  Files           ?       67           
  Lines           ?     8401           
  Branches        ?        0           
=======================================
  Hits            ?     5527           
  Misses          ?     2874           
  Partials        ?        0

Albert Cheng (Engrg-Hardware 1) and others added 15 commits April 2, 2026 14:17

Make Dynamo source install container-agnostic (vLLM, SGLang, etc.)

9cc6d50

Auto-detect container type at runtime: if /sgl-workspace exists (SGLang), use original install path unchanged; otherwise use portable /tmp build path with conditional dependency installation for non-SGLang containers.
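
The runtime detection described in this commit can be pictured with a small sketch. The real logic presumably lives in an install script; the Python below is only an illustration, with a hypothetical function name and hypothetical mode labels. The /sgl-workspace marker path is the only detail taken from the commit message.

```python
from pathlib import Path


def detect_install_mode(marker: Path = Path("/sgl-workspace")) -> str:
    """Pick an install mode from the presence of the SGLang workspace marker.

    Hypothetical sketch: "sglang-original" stands for the unchanged original
    install path, "portable-tmp" for the /tmp build with conditional
    dependency installation used in non-SGLang containers.
    """
    return "sglang-original" if marker.exists() else "portable-tmp"
```

Keying on a path that only the SGLang image provides leaves existing SGLang jobs on their current path while other containers fall through to the portable one.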

Add Minimax M2.5 NVFP4 agg B200 single-node configs (NVIDIA#36)

b0f5b83

* recipes for minimax m2.5 fp4 b200 agg vllm
* commit for signature

Add lm-eval benchmark runner for InferenceX evals (NVIDIA#12)

f61dbba

* Add lm-eval benchmark runner for InferenceX evals
  Adds support for running lm-eval accuracy evaluations as a post-benchmark step, leveraging the InferenceX benchmark_lib.sh harness.

fix: add glm5 dynamo trtllm benchmark support to sa submission branch (…

10f4ac9

…NVIDIA#47)
* fix tokenizer for glm5 (NVIDIA#20)
  fix
* add nvidia pre-release url (NVIDIA#22)

Add GLM5 disaggregated recipes for SA submission (NVIDIA#48)

a10acd3

Add 66 GLM5 NVFP4 disaggregated recipe configs for GB200 and GB300 on the sa-submission branch; standardize model path and container values across the recipe set for consistency.

fix: add chat template to the glm5 tokenizer

b85ec1d

fix: align glm5 gb300 sa-bench rounds with submission baselines (NVID…

c88c68e

…IA#113) Set GLM5 GB300 trtllm_dynamo recipes to use chat template and num_prompts_mult=16 so throughput runs match TRTLLM multi-round methodology, while keeping warmup fixed at 2x.

fix: using a setup script to install pip in trtllm venv # (NVIDIA#117)

95b0a33

fix: add trtllm venv pip bootstrap to GB300 GLM5 recipes (NVIDIA#120)

0f0aa60

Add setup_script install-trtllm-pip.sh to all GB300 GLM5 trtllm_dynamo recipes so eval-only jobs can install lm-eval even when pip is missing in the runtime container venv.

run setup script before post eval (NVIDIA#123)

fdb1be7

Drop recipe changes from Q2 merge-back

4ef3c08

jasonlizhengjian mentioned this pull request May 29, 2026

Draft: merge Q2 submission branch back into main #190

Closed

Merge main into Q2 merge-back branch

01d6e2b

jasonlizhengjian wants to merge 16 commits into NVIDIA:main from jasonlizhengjian:lijas/q2-mergeback-no-recipes

7 participants
