@@ -6,12 +6,12 @@ frontend:
num_additional_frontends: 8

dynamo:
hash: "34d55a596fb8d3d44daefe425ec1e303131f4d2c"
hash: "81d0555ee23519cea80a42b4fe824e30368b7300"
install: true

model:
path: "deepseek-v4-pro"
container: "lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e"
container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
🔴 The six recipe YAMLs are bumped to lmsysorg/sglang:nightly-dev-20260522-c9153da5, but the matching image: field on the dsv4-fp4-gb300-dynamo-sglang-mtp block in .github/configs/nvidia-master.yaml (line 9073) is left at the stale lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034. Per AGENTS.md the two must be bumped in lockstep: the launcher uses image: as the container-alias key, so without this update CI will still import the old image (and may run against it), which makes the perf-changelog claim untrue. Fix: bump line 9073 of nvidia-master.yaml to the same nightly-dev-20260522-c9153da5 tag.

Extended reasoning

What's wrong

This PR bumps model.container in all six benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/*-mtp.yaml files from lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e to lmsysorg/sglang:nightly-dev-20260522-c9153da5, and adds a perf-changelog entry that explicitly claims the image was updated for dsv4-fp4-gb300-dynamo-sglang-mtp. However, line 9073 of .github/configs/nvidia-master.yaml (the image: field on the dsv4-fp4-gb300-dynamo-sglang-mtp block) still reads lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034, an even older 20260509 tag that predates the previous recipe bump.

Why this matters

Line 115 of AGENTS.md states the invariant explicitly: multi-node srt-slurm changes must edit the recipe YAML and nvidia-master.yaml together, and for image bumps model.container must equal image:, because the launcher uses the latter as the container-alias key. Concretely, .github/workflows/profile.yml reads matrix.config.image from nvidia-master.yaml into the IMAGE env var, and runners/launch_gb300-cw.sh uses that value twice: to import the enroot squash file (enroot import -o ... docker://$image) and to register the alias in the containers map of the generated srtslurm.yaml (${IMAGE}: ${SQUASH_FILE}). srtctl then matches the recipe's container: value against that alias.

Precedent

The sibling non-MTP PR #1528 (commit 59980fe) for dsv4-fp4-gb300-dynamo-sglang updated both .github/configs/nvidia-master.yaml and the recipe YAMLs in lockstep. After that PR, the non-MTP block at line 8760 sits at nightly-dev-cu13-20260520-425dffbd, matching its recipe. The MTP block, by contrast, now lags its recipe by 13 days (20260509 vs. 20260522), and the new recipe tag also drops the cu13 prefix, so the two tags differ in the CUDA prefix as well as the date.
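The 13-day figure follows directly from the build dates embedded in the two tags. A minimal sketch that parses them, assuming the tag layout nightly-dev[-cuXX]-YYYYMMDD-<sha> inferred from the tags quoted in this review (not a documented SGLang naming scheme); parse_tag and tag_drift are hypothetical helpers:

```python
import re
from datetime import date

# Assumed tag layout, inferred from the tags quoted in this review:
#   nightly-dev[-cuXX]-YYYYMMDD-<short-sha>
TAG_RE = re.compile(
    r"^nightly-dev-(?:(?P<cuda>cu\d+)-)?(?P<date>\d{8})-(?P<sha>[0-9a-f]+)$"
)


def parse_tag(image: str) -> dict:
    """Split an 'lmsysorg/sglang:<tag>' reference into CUDA prefix, build date and sha."""
    _, _, tag = image.partition(":")
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    d = m.group("date")
    return {
        "cuda": m.group("cuda"),  # None when the tag carries no cuXX prefix
        "date": date(int(d[:4]), int(d[4:6]), int(d[6:])),
        "sha": m.group("sha"),
    }


def tag_drift(master_image: str, recipe_image: str) -> tuple:
    """Return (days between the two builds, whether the CUDA prefix differs)."""
    a, b = parse_tag(master_image), parse_tag(recipe_image)
    return (b["date"] - a["date"]).days, a["cuda"] != b["cuda"]


master = "lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034"
recipe = "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
print(tag_drift(master, recipe))  # (13, True)
```

A check like this only quantifies the drift; the invariant itself is plain string equality between the two fields.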

Step-by-step proof of impact

  1. CI launches the sweep; profile.yml reads matrix.config.image from nvidia-master.yaml, giving IMAGE=lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034.
  2. runners/launch_gb300-cw.sh runs enroot import -o $SQUASH_FILE docker://$IMAGE, squashing the 20260509 image.
  3. The generated srtslurm.yaml registers containers: { "${IMAGE}": ${SQUASH_FILE} }, keyed by the 20260509 tag.
  4. srtctl loads the recipe YAML and sees model.container: lmsysorg/sglang:nightly-dev-20260522-c9153da5, which does not match the alias.
  5. The result is one of: (a) srtctl falls back to a fresh docker pull of the 20260522 image at runtime, defeating the pre-staging; (b) the alias mismatch causes a launch failure; or (c) the benchmark actually runs against the 20260509 squash file, invalidating the perf-changelog claim. All three are bad outcomes.
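Steps 1 to 4 reduce to a dictionary lookup keyed by the image string. The sketch below models that lookup only; build_plan, resolve and the squash-file naming are hypothetical stand-ins for runners/launch_gb300-cw.sh and srtctl, not their real code, and it makes no attempt to predict which of outcomes (a) to (c) srtctl actually takes on a miss.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LaunchPlan:
    image: str                  # IMAGE, read from nvidia-master.yaml image:
    containers: Dict[str, str]  # containers map written to srtslurm.yaml


def build_plan(master_image: str, squash_dir: str = "/squash") -> LaunchPlan:
    """Steps 1-3: import the master image and register it under its own name as the alias."""
    # Hypothetical squash-file naming; the real launcher script may differ.
    safe = master_image.replace("/", "_").replace(":", "+")
    squash_file = f"{squash_dir}/{safe}.sqsh"
    return LaunchPlan(image=master_image, containers={master_image: squash_file})


def resolve(plan: LaunchPlan, recipe_container: str) -> Optional[str]:
    """Step 4: look the recipe's model.container up in the alias map.

    Returns the pre-staged squash file, or None when no alias matches.
    What srtctl really does on a miss is the open question in step 5.
    """
    return plan.containers.get(recipe_container)


stale = build_plan("lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034")
print(resolve(stale, "lmsysorg/sglang:nightly-dev-20260522-c9153da5"))  # None

fixed = build_plan("lmsysorg/sglang:nightly-dev-20260522-c9153da5")
print(resolve(fixed, "lmsysorg/sglang:nightly-dev-20260522-c9153da5") is not None)  # True
```

Because the alias is the full image reference, any difference (date, sha or the cu13 prefix) produces a miss; there is no partial matching to fall back on in this model.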

Fix

Bump .github/configs/nvidia-master.yaml line 9073 from lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034 to lmsysorg/sglang:nightly-dev-20260522-c9153da5 in this PR, matching the recipe container: values and the lockstep pattern established by PR #1528.
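To keep this from recurring, a pre-merge guard could compare the two fields directly. A minimal sketch, assuming each value sits on its own container: / image: line (optionally double-quoted) and that the caller has already sliced out the single dsv4-fp4-gb300-dynamo-sglang-mtp block from nvidia-master.yaml; lockstep_errors and the sample strings are hypothetical, and a real check would more likely use a YAML parser:

```python
import re
from typing import Dict, List

# Assumed layout: one "key: value" per line, value optionally double-quoted.
CONTAINER_RE = re.compile(r'^\s*container:\s*"?([^"\s]+)"?\s*$', re.MULTILINE)
IMAGE_RE = re.compile(r'^\s*image:\s*"?([^"\s]+)"?\s*$', re.MULTILINE)


def lockstep_errors(master_block: str, recipes: Dict[str, str]) -> List[str]:
    """Compare every recipe's container: value with the master block's image:.

    master_block: text of one config block from nvidia-master.yaml
    recipes: mapping of recipe filename -> recipe YAML text
    """
    images = IMAGE_RE.findall(master_block)
    if len(images) != 1:
        return [f"expected exactly one image: line, found {len(images)}"]
    image = images[0]
    errors = []
    for name, text in sorted(recipes.items()):
        for container in CONTAINER_RE.findall(text):
            if container != image:
                errors.append(f"{name}: container {container} != image {image}")
    return errors


stale_master = "dsv4-fp4-gb300-dynamo-sglang-mtp:\n  image: lmsysorg/sglang:nightly-dev-cu13-20260509-9ee83034\n"
fixed_master = "dsv4-fp4-gb300-dynamo-sglang-mtp:\n  image: lmsysorg/sglang:nightly-dev-20260522-c9153da5\n"
recipe = 'model:\n  path: "deepseek-v4-pro"\n  container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"\n'

print(lockstep_errors(stale_master, {"recipe-mtp.yaml": recipe}))  # one mismatch reported
print(lockstep_errors(fixed_master, {"recipe-mtp.yaml": recipe}))  # []
```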

precision: "mxfp4"

sbatch_directives:
@@ -31,14 +31,10 @@ backend:

prefill_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -49,14 +45,10 @@ backend:

decode_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -6,12 +6,12 @@ frontend:
num_additional_frontends: 8

dynamo:
hash: "34d55a596fb8d3d44daefe425ec1e303131f4d2c"
hash: "81d0555ee23519cea80a42b4fe824e30368b7300"
install: true

model:
path: "deepseek-v4-pro"
container: "lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e"
container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
precision: "mxfp4"

sbatch_directives:
@@ -31,24 +31,14 @@ backend:

prefill_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "9216"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -60,14 +50,10 @@ backend:

decode_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -95,8 +81,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

mem-fraction-static: 0.9
max-running-requests: 128
@@ -6,12 +6,12 @@ frontend:
num_additional_frontends: 8

dynamo:
hash: "34d55a596fb8d3d44daefe425ec1e303131f4d2c"
hash: "81d0555ee23519cea80a42b4fe824e30368b7300"
install: true

model:
path: "deepseek-v4-pro"
container: "lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e"
container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
precision: "mxfp4"

sbatch_directives:
@@ -33,24 +33,14 @@ backend:

prefill_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "9216"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -62,23 +52,13 @@ backend:

decode_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "2048"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -106,8 +86,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

mem-fraction-static: 0.9
max-running-requests: 256
@@ -131,8 +110,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

speculative-algo: "EAGLE"
speculative-num-steps: 3
@@ -6,12 +6,12 @@ frontend:
num_additional_frontends: 8

dynamo:
hash: "34d55a596fb8d3d44daefe425ec1e303131f4d2c"
hash: "81d0555ee23519cea80a42b4fe824e30368b7300"
install: true

model:
path: "deepseek-v4-pro"
container: "lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e"
container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
precision: "mxfp4"

sbatch_directives:
@@ -33,24 +33,14 @@ backend:

prefill_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "9216"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -62,23 +52,13 @@ backend:

decode_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "2048"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -106,8 +86,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

mem-fraction-static: 0.9
max-running-requests: 256
@@ -131,8 +110,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

speculative-algo: "EAGLE"
speculative-num-steps: 3
@@ -6,12 +6,12 @@ frontend:
num_additional_frontends: 8

dynamo:
hash: "34d55a596fb8d3d44daefe425ec1e303131f4d2c"
hash: "81d0555ee23519cea80a42b4fe824e30368b7300"
install: true

model:
path: "deepseek-v4-pro"
container: "lmsysorg/sglang:nightly-dev-cu13-20260510-2473659e"
container: "lmsysorg/sglang:nightly-dev-20260522-c9153da5"
precision: "mxfp4"

sbatch_directives:
@@ -33,24 +33,14 @@ backend:

prefill_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "9216"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -62,23 +52,13 @@ backend:

decode_environment:
PYTHONUNBUFFERED: "1"
SGLANG_RADIX_DISABLE_REUSE: "1"
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: "1"
SGLANG_DEFAULT_THINKING: "1"
SGLANG_DSV4_REASONING_EFFORT: "max"
SGLANG_OPT_SWA_SPLIT_LEAF_ON_INSERT: "1"
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"

SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK: "4096"
SGLANG_OPT_FIX_MEGA_MOE_MEMORY: "1"
SGLANG_OPT_FIX_NEXTN_MEGA_MOE: "1"
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK: "0"

NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
@@ -106,8 +86,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

mem-fraction-static: 0.9
max-running-requests: 512
@@ -131,8 +110,7 @@ backend:
enable-dp-attention: true
enable-dp-lm-head: true

moe-a2a-backend: "deepep"
deepep-config: '{"normal_dispatch":{"num_sms":96},"normal_combine":{"num_sms":96}}'
moe-a2a-backend: "megamoe"

speculative-algo: "EAGLE"
speculative-num-steps: 3