Draft: merge Q2 submission support into main without recipes #191
jasonlizhengjian wants to merge 16 commits into
Auto-detect container type at runtime: if /sgl-workspace exists (SGLang), use original install path unchanged; otherwise use portable /tmp build path with conditional dependency installation for non-SGLang containers.
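A minimal sketch of the detection logic described above, in Python for illustration only (the actual change may well live in a shell setup script). The function name, the `original_path` argument, and the `/tmp/build` subdirectory are hypothetical; only the `/sgl-workspace` marker and the "/tmp build path" idea come from the description.

```python
import os

SGLANG_MARKER = "/sgl-workspace"  # marker directory named in the description


def select_install_plan(original_path, marker=SGLANG_MARKER,
                        portable_path="/tmp/build", exists=os.path.isdir):
    """Pick a build location based on the container type.

    original_path: the install path already used for SGLang containers
                   (hypothetical parameter; its real value is not shown here).
    portable_path: assumed location under /tmp for non-SGLang containers.
    exists:        injectable directory check so the logic can be tested.
    """
    if exists(marker):
        # SGLang container: keep the original path and skip extra installs.
        return {"container": "sglang", "build_dir": original_path,
                "install_deps": False}
    # Any other container: build under /tmp and install what is missing.
    return {"container": "other", "build_dir": portable_path,
            "install_deps": True}
```

Passing `exists` as a parameter keeps the branch testable without touching the real filesystem.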
* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host
  - Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
  - Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to node's routable IP in get_process_environment() instead of leaving it as 0.0.0.0/localhost which caused transfer handshake failures
  - Update test_vllm_get_process_environment to cover NIXL host env var
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: run checks on PRs targeting sa-submission-q2-2026
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
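For readers unfamiliar with the NIXL host fix, here is a rough sketch of the idea, not the code from this PR. `routable_ip` and `with_nixl_host` are hypothetical helpers, the probe address is an arbitrary assumption, and overriding only non-routable values is a design choice of this sketch; the commit message only says the variable is set to the node's routable IP inside `get_process_environment()`.

```python
import socket

# Values that other nodes cannot use to reach this one.
NON_ROUTABLE = {None, "", "0.0.0.0", "localhost", "127.0.0.1"}


def routable_ip(probe_addr=("8.8.8.8", 80)):
    """Best-effort lookup of the IP the OS would use for outbound traffic.

    probe_addr is an assumed, arbitrary external address. Connecting a UDP
    socket only selects a route; it does not send any packets.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(probe_addr)
            return s.getsockname()[0]
        except OSError:
            # No route available: fall back to resolving the hostname.
            return socket.gethostbyname(socket.gethostname())


def with_nixl_host(env, node_ip):
    """Return a copy of env with VLLM_NIXL_SIDE_CHANNEL_HOST made routable."""
    out = dict(env)
    if out.get("VLLM_NIXL_SIDE_CHANNEL_HOST") in NON_ROUTABLE:
        out["VLLM_NIXL_SIDE_CHANNEL_HOST"] = node_ip
    return out
```

A caller would typically combine the two as `with_nixl_host(os.environ, routable_ip())` when building the worker environment.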
NVIDIA#24)
* Add Kimi K2.5 disagg STP and MTP recipes for GB200 NVfp4 (ISL8K_OSL1K and ISL1K_OSL1K)
  Add optimized disaggregated inference recipes for Kimi K2.5 model with NVfp4 precision on GB200 GPUs. Includes both STP and MTP configurations for ISL8K_OSL1K and ISL1K_OSL1K workloads covering concurrency points from 5 to 2253, with Eagle speculative decoding for MTP variants.
* Update Kimi K2.5 recipes: container, model path, concurrency format, and env cleanup
  - Update container to tensorrtllm-runtime-1.1.0-dev.2.sqsh
  - Point model path to shared /mnt/lustre01/models/kimi-k2.5-nvfp4
  - Update Eagle model mount path for MTP configs
  - Remove HF_HOME (defaults to ~/.cache/huggingface)
  - Fix concurrency separator from space to 'x' for sa-bench compatibility
  - Enable multiple frontends for ctx1dep4_gen1dep32_batch64
* Use generic model path and container aliases for cluster portability
  Replace cluster-specific paths with generic alias names that are resolved via srtslurm.yaml model_paths and containers mappings, as per upstream convention.
* Add extra_mount alias resolution and use generic Eagle model path
  Add model_paths alias resolution for extra_mount host paths in config.py, enabling MTP recipes to use generic name "kimi-k2.5-eagle3" instead of cluster-specific path for the Eagle speculative decoding model.
* Use HuggingFace model names and full NVCR container paths
  Per review feedback, update model paths to HuggingFace format (nvidia/Kimi-K2.5-NVFP4) and container to full NVCR registry path (nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2) so recipes are portable and work without pre-built sqsh files.
---------
Co-authored-by: nlevin-ui <nlevin@nvidia.com>
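The extra_mount alias resolution mentioned above could look roughly like the sketch below. This is an illustration under assumptions, not the code in config.py: the `resolve_extra_mount` name, the `host:container` mount string format, and the example paths are all hypothetical; only the idea of looking up the host side in a `model_paths` mapping comes from the commit message.

```python
def resolve_extra_mount(mount, model_paths):
    """Replace an alias on the host side of a mount with its real path.

    mount:       assumed to be "host_path_or_alias:container_path".
    model_paths: alias -> cluster-specific path, as in srtslurm.yaml.
    Host paths that are not known aliases are returned unchanged.
    """
    host, sep, container = mount.partition(":")
    return model_paths.get(host, host) + sep + container


# Hypothetical mapping; a real cluster would define its own paths.
example_paths = {"kimi-k2.5-eagle3": "/models/kimi-k2.5-eagle3"}
resolved = resolve_extra_mount("kimi-k2.5-eagle3:/eagle", example_paths)
```

Leaving unknown host paths untouched means recipes that still use absolute paths keep working alongside alias-based ones.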
* recipes for minimax m2.5 fp4 b200 agg vllm
* commit for signature
* Add lm-eval benchmark runner for InferenceX evals
  Adds support for running lm-eval accuracy evaluations as a post-benchmark step, leveraging the InferenceX benchmark_lib.sh harness.
…NVIDIA#47)
* fix tokenizer for glm5 (NVIDIA#20)
  fix
* add nvidia pre-release url (NVIDIA#22)
Add 66 GLM5 NVFP4 disaggregated recipe configs for GB200 and GB300 on the sa-submission branch; standardize model path and container values across the recipe set for consistency.
* Add GLM5 GB200 NVFP4 Apr-09 disagg recipes.
  Include the updated 1K/1K and 8K/1K STP and MTP TensorRT-LLM Dynamo configs so submission testing can run on the latest GB200 parameter set.
* Keep only Apr-09 GB200 configs and align YAML quoting.
  Remove legacy GB200 trtllm_dynamo recipes inherited from the submission base branch, and normalize concurrencies/custom_tokenizer fields to double-quoted style for consistency with existing GB300 recipes.
* fix: enable chat template and 16x rounds for GB200 GLM5 configs
  Update GB200 GLM5 trtllm_dynamo recipes to set use_chat_template=true and num_prompts_mult=16 so sa-bench runs align with current submission benchmarking methodology.
…IA#113) Set GLM5 GB300 trtllm_dynamo recipes to use chat template and num_prompts_mult=16 so throughput runs match TRTLLM multi-round methodology, while keeping warmup fixed at 2x.
Add setup_script install-trtllm-pip.sh to all GB300 GLM5 trtllm_dynamo recipes so eval-only jobs can install lm-eval even when pip is missing in the runtime container venv.
* Add spread_workers option to ResourceConfig
  Allow placing each partial-node worker on its own node instead of packing multiple onto the same node. Useful when colocating workers on a single node causes resource contention (port collisions, etc.). Caller must reserve enough nodes (e.g. set decode_nodes=decode_workers when gpus_per_decode<gpus_per_node).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* try fix
* allow multiple DEP2 workers per node
* multi worker fix
* Allow vLLM one-node prefill decode colocation
* Avoid same-node worker port collisions
* Fix spread workers tests and lint
* Cover vLLM colocation guard
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: hjjq <50634613+hjjq@users.noreply.github.com>
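To make the packing versus spreading distinction concrete, here is a small sketch of how partial-node workers might be mapped to nodes. It is not the ResourceConfig implementation; `assign_nodes` and its parameters are hypothetical names, and it assumes each worker needs no more GPUs than one node provides.

```python
def assign_nodes(num_workers, gpus_per_worker, gpus_per_node, num_nodes,
                 spread_workers=False):
    """Return the node index for each worker.

    Packing (default) fits as many workers as possible on each node.
    Spreading places exactly one worker per node, which avoids same-node
    contention such as port collisions at the cost of more nodes.
    """
    if gpus_per_worker > gpus_per_node:
        raise ValueError("sketch only covers partial-node or full-node workers")
    per_node = 1 if spread_workers else gpus_per_node // gpus_per_worker
    needed = -(-num_workers // per_node)  # ceiling division
    if needed > num_nodes:
        # Mirrors the note that the caller must reserve enough nodes.
        raise ValueError(f"need {needed} nodes, only {num_nodes} reserved")
    return [worker // per_node for worker in range(num_workers)]
```

With four 2-GPU workers on 4-GPU nodes, packing uses two nodes while spreading needs four, which is why the commit message asks callers to set the node count equal to the worker count when spreading.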
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #191 +/- ##
=======================================
Coverage ? 65.78%
=======================================
Files ? 67
Lines ? 8401
Branches ? 0
=======================================
Hits ? 5527
Misses ? 2874
Partials ? 0
☔ View full report in Codecov by Sentry.
Summary
- … main, excluding recipe changes.
- … recipes/ at the shared merge-base state on this branch, so the PR has no recipe-path changes and a merge should preserve the current main recipe tree.

What is intentionally excluded
- … sa-submission-q2-2026.

Notes
- … sa-submission-q2-2026 or is separately cherry-picked here.
- … main is not introduced by this branch; this branch preserves main unless the Q2 changes touch the same files.

Validation
- … git diff --name-only nvidia/main...HEAD -- recipes | wc -l … that there are 0 recipe-path changes in this PR-style diff.
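The validation step above can be scripted if it needs to run repeatedly. The sketch below wraps the same `git diff --name-only nvidia/main...HEAD -- recipes` command; the function names are hypothetical, and it assumes a remote named `nvidia` exists in the local clone, as the command in the description implies.

```python
import subprocess


def count_changed_paths(diff_output):
    """Count non-empty lines, matching what `wc -l` reports for name-only output."""
    return sum(1 for line in diff_output.splitlines() if line.strip())


def recipe_path_changes(base="nvidia/main", path="recipes"):
    """Run the PR-style (three-dot) diff limited to one path and count the files."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD", "--", path],
        check=True, capture_output=True, text=True,
    )
    return count_changed_paths(result.stdout)
```

Calling `recipe_path_changes()` from the repository root should return 0 if the branch matches the claim in the description.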