Draft: merge Q2 submission support into main without recipes #191

Draft
jasonlizhengjian wants to merge 16 commits into NVIDIA:main from jasonlizhengjian:lijas/q2-mergeback-no-recipes

@jasonlizhengjian
Contributor

Summary

  • Draft merge-back of Q2 submission branch support into main, excluding recipe changes.
  • Keeps recipes/ at the shared merge-base state on this branch, so the PR has no recipe-path changes and a merge should preserve the current main recipe tree.
  • Carries the non-recipe Q2 support changes: benchmark/runtime config updates, SA-Bench and lm-eval support, worker placement/topology behavior, TRTLLM setup-script support, and related tests.

What is intentionally excluded

  • All recipe additions, removals, and rewrites from sa-submission-q2-2026.

Notes

Validation

  • Verified with git diff --name-only nvidia/main...HEAD -- recipes | wc -l that there are 0 recipe-path changes in this PR-style diff.
  • Full test suite not run for this draft PR.
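
The recipe-path check above can also be scripted. The following is a minimal sketch, not part of this PR: the helper names `changed_files` and `recipe_paths` are hypothetical, and it assumes a remote named `nvidia` exists, as in the command quoted above. The three-dot form compares `HEAD` against its merge base with `nvidia/main`, which is what a PR-style diff shows.

```python
import subprocess


def recipe_paths(changed_paths, prefix="recipes/"):
    """Return only the changed paths that live under the recipes tree."""
    return [p for p in changed_paths if p.startswith(prefix)]


def changed_files(base="nvidia/main", head="HEAD"):
    """List files changed on `head` since its merge base with `base`.

    Hypothetical helper: wraps `git diff --name-only base...head`.
    Must be run inside the repository checkout.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


# Usage inside a checkout (not executed here):
#   assert recipe_paths(changed_files()) == []
```

Filtering in Python rather than passing `-- recipes` to git keeps the path check testable without a repository.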

Albert Cheng (Engrg-Hardware 1) and others added 15 commits April 2, 2026 14:17
Auto-detect container type at runtime: if /sgl-workspace exists (SGLang),
use original install path unchanged; otherwise use portable /tmp build path
with conditional dependency installation for non-SGLang containers.
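
For illustration, the detection rule in this commit message could look roughly like the sketch below. It is written in Python for readability and is not the actual setup script; `InstallPlan`, `plan_install`, and the exact build directory names are assumptions, since the commit message only names `/sgl-workspace` and a path under `/tmp`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InstallPlan:
    build_dir: Path     # where the build happens
    install_deps: bool  # whether extra dependencies must be installed first


def plan_install(root: Path = Path("/")) -> InstallPlan:
    """Pick a build location based on the container type.

    Assumption: an SGLang container is recognized solely by the presence
    of <root>/sgl-workspace, as described in the commit message.
    """
    sgl_workspace = root / "sgl-workspace"
    if sgl_workspace.is_dir():
        # SGLang container: keep the original location, nothing extra to install.
        return InstallPlan(build_dir=sgl_workspace, install_deps=False)
    # Any other container: portable build under /tmp, dependencies installed as needed.
    # The "build" subdirectory name is a placeholder.
    return InstallPlan(build_dir=root / "tmp" / "build", install_deps=True)
```

The `root` parameter exists only so the rule can be exercised against a temporary directory instead of the real filesystem root.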
* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host

- Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
- Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to
  node's routable IP in get_process_environment() instead of leaving it
  as 0.0.0.0/localhost which caused transfer handshake failures
- Update test_vllm_get_process_environment to cover NIXL host env var
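
A rough sketch of the NIXL fix described above, under stated assumptions: the real change lives in get_process_environment(), whose signature is not shown here, so `with_nixl_side_channel_host` and `is_routable` are hypothetical helpers. Only the environment variable name comes from the commit message.

```python
import ipaddress


def is_routable(host: str) -> bool:
    """Reject the values the commit message calls out: 0.0.0.0 and localhost."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (for example the hostname "localhost").
        return False
    return not (addr.is_unspecified or addr.is_loopback)


def with_nixl_side_channel_host(env: dict, node_ip: str) -> dict:
    """Return a copy of `env` with the side channel host set to the node's IP."""
    if not is_routable(node_ip):
        raise ValueError(f"node IP {node_ip!r} is not routable from other nodes")
    updated = dict(env)  # leave the caller's mapping untouched
    updated["VLLM_NIXL_SIDE_CHANNEL_HOST"] = node_ip
    return updated
```

Validating the address up front surfaces a bad value at launch time rather than as a transfer handshake failure later.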

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: run checks on PRs targeting sa-submission-q2-2026

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
NVIDIA#24)

* Add Kimi K2.5 disagg STP and MTP recipes for GB200 NVfp4 (ISL8K_OSL1K and ISL1K_OSL1K)

Add optimized disaggregated inference recipes for Kimi K2.5 model with NVfp4
precision on GB200 GPUs. Includes both STP and MTP configurations for
ISL8K_OSL1K and ISL1K_OSL1K workloads covering concurrency points from 5
to 2253, with Eagle speculative decoding for MTP variants.

* Update Kimi K2.5 recipes: container, model path, concurrency format, and env cleanup

- Update container to tensorrtllm-runtime-1.1.0-dev.2.sqsh
- Point model path to shared /mnt/lustre01/models/kimi-k2.5-nvfp4
- Update Eagle model mount path for MTP configs
- Remove HF_HOME (defaults to ~/.cache/huggingface)
- Fix concurrency separator from space to 'x' for sa-bench compatibility
- Enable multiple frontends for ctx1dep4_gen1dep32_batch64

* Use generic model path and container aliases for cluster portability

Replace cluster-specific paths with generic alias names that are resolved
via srtslurm.yaml model_paths and containers mappings, as per upstream convention.

* Add extra_mount alias resolution and use generic Eagle model path

Add model_paths alias resolution for extra_mount host paths in config.py,
enabling MTP recipes to use generic name "kimi-k2.5-eagle3" instead of
cluster-specific path for the Eagle speculative decoding model.
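
The alias resolution described here might be sketched as follows. This is an illustration only: it assumes extra_mount entries are "host:container" strings and that model_paths is a plain alias-to-path mapping loaded from srtslurm.yaml. The function name `resolve_extra_mount` and the example paths are hypothetical, not taken from config.py.

```python
def resolve_extra_mount(mount: str, model_paths: dict[str, str]) -> str:
    """Replace the host side of a mount with its model_paths entry, if any.

    Assumption: `mount` has the form "host_path_or_alias:container_path".
    Host paths that are not aliases pass through unchanged.
    """
    host, sep, container = mount.partition(":")
    resolved_host = model_paths.get(host, host)
    return f"{resolved_host}{sep}{container}"


# Example with made-up paths:
model_paths = {"kimi-k2.5-eagle3": "/shared/models/kimi-k2.5-eagle3"}
print(resolve_extra_mount("kimi-k2.5-eagle3:/eagle", model_paths))
print(resolve_extra_mount("/data/cache:/cache", model_paths))
```

Falling back to the literal host path keeps recipes that still use absolute paths working alongside alias-based ones.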

* Use HuggingFace model names and full NVCR container paths

Per review feedback, update model paths to HuggingFace format
(nvidia/Kimi-K2.5-NVFP4) and container to full NVCR registry path
(nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2) so recipes
are portable and work without pre-built sqsh files.

---------

Co-authored-by: nlevin-ui <nlevin@nvidia.com>
* recipes for minimax m2.5 fp4 b200 agg vllm

* commit for signature
* Add lm-eval benchmark runner for InferenceX evals

Adds support for running lm-eval accuracy evaluations as a post-benchmark
step, leveraging the InferenceX benchmark_lib.sh harness.
Add 66 GLM5 NVFP4 disaggregated recipe configs for GB200 and GB300 on the sa-submission branch; standardize model path and container values across the recipe set for consistency.
* Add GLM5 GB200 NVFP4 Apr-09 disagg recipes.

Include the updated 1K/1K and 8K/1K STP and MTP TensorRT-LLM Dynamo configs so submission testing can run on the latest GB200 parameter set.

* Keep only Apr-09 GB200 configs and align YAML quoting.

Remove legacy GB200 trtllm_dynamo recipes inherited from the submission base branch, and normalize concurrencies/custom_tokenizer fields to double-quoted style for consistency with existing GB300 recipes.

* fix: enable chat template and 16x rounds for GB200 GLM5 configs

Update GB200 GLM5 trtllm_dynamo recipes to set use_chat_template=true and num_prompts_mult=16 so sa-bench runs align with current submission benchmarking methodology.
…IA#113)

Set GLM5 GB300 trtllm_dynamo recipes to use chat template and num_prompts_mult=16 so throughput runs match TRTLLM multi-round methodology, while keeping warmup fixed at 2x.
Add setup_script install-trtllm-pip.sh to all GB300 GLM5 trtllm_dynamo recipes so eval-only jobs can install lm-eval even when pip is missing in the runtime container venv.
* Add spread_workers option to ResourceConfig

Allow placing each partial-node worker on its own node instead of
packing multiple onto the same node. Useful when colocating workers
on a single node causes resource contention (port collisions, etc.).

Caller must reserve enough nodes (e.g. set decode_nodes=decode_workers
when gpus_per_decode<gpus_per_node).
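
To make the placement difference concrete, here is a small sketch of packed versus spread assignment for partial-node workers. `assign_nodes` is a hypothetical function and not the ResourceConfig implementation; it only models which node index each worker lands on.

```python
def assign_nodes(num_workers: int, gpus_per_worker: int, gpus_per_node: int,
                 spread_workers: bool = False) -> list[int]:
    """Return the node index for each worker.

    Only covers partial-node workers (gpus_per_worker < gpus_per_node),
    which is the case the spread_workers option targets.
    """
    if not 0 < gpus_per_worker < gpus_per_node:
        raise ValueError("sketch only handles partial-node workers")
    # Packed: fill each node with as many workers as fit. Spread: one per node.
    workers_per_node = 1 if spread_workers else gpus_per_node // gpus_per_worker
    return [i // workers_per_node for i in range(num_workers)]


packed = assign_nodes(4, gpus_per_worker=2, gpus_per_node=4)
spread = assign_nodes(4, gpus_per_worker=2, gpus_per_node=4, spread_workers=True)
print(packed, "needs", max(packed) + 1, "nodes")
print(spread, "needs", max(spread) + 1, "nodes")
```

In the spread case the number of nodes equals the number of workers, which is why the caller has to reserve that many nodes.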

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* try fix

* allow multiple DEP2 workers per node

* multi worker fix

* Allow vLLM one-node prefill decode colocation

* Avoid same-node worker port collisions

* Fix spread workers tests and lint

* Cover vLLM colocation guard

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: hjjq <50634613+hjjq@users.noreply.github.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@babf250).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #191   +/-   ##
=======================================
  Coverage        ?   65.78%           
=======================================
  Files           ?       67           
  Lines           ?     8401           
  Branches        ?        0           
=======================================
  Hits            ?     5527           
  Misses          ?     2874           
  Partials        ?        0           

7 participants