Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560
Conversation
Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.
Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.
| # B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node). | ||
| # Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node | ||
| # GPU count makes available. Only the smallest concurrencies (1,4,8) — | ||
| # this topology is decode-latency focused, not throughput. |
There was a problem hiding this comment.
🟡 The tp8-1p1d.yaml header comments claim 'Only the smallest concurrencies (1,4,8)' (1k1k) / 'Smallest concurrencies only (1,4,8)' (8k1k), but both files set concurrencies: "4" — only 4, not 1/4/8. The nvidia-master.yaml entries for tp8-1p1d list conc-list [4] for both ISLs, confirming the recipes are authoritative and the comments are stale. Either update the comments to say 'only conc 4' or expand the concurrencies field to actually include 1 and 8.
Extended reasoning...
What the bug is\n\nTwo new B300-only recipe files document a sweep over three concurrencies in their header comments but only run a single concurrency in the actual benchmark block.\n\nIn benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml (lines 3–6):\n\nyaml\n# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).\n# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node\n# GPU count makes available. Only the smallest concurrencies (1,4,8) —\n# this topology is decode-latency focused, not throughput.\n\n\nBut the benchmark block at the bottom is:\n\nyaml\nbenchmark:\n type: "sa-bench"\n isl: 1024\n osl: 1024\n concurrencies: "4"\n random_range_ratio: 0.8\n\n\nThe 8k1k counterpart (benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml) has the same drift — its header says "Smallest concurrencies only (1,4,8)" while concurrencies: "4".\n\nWhich side is authoritative\n\nCross-referencing .github/configs/nvidia-master.yaml, the tp8-1p1d search-space entries for both ISLs only list conc-list: [4]:\n\nyaml\n# 1k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/1k1k/tp8-1p1d.yaml"\n# 8k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/8k1k/tp8-1p1d.yaml"\n\n\nSo the recipe concurrencies: "4" and the master conc-list: [4] agree, and the header comments are the outlier. The driver only iterates over what's in those two fields; the comment has no runtime effect.\n\nStep-by-step proof\n\n1. Open 1k1k/tp8-1p1d.yaml. Comment block (lines 3–6) says "Only the smallest concurrencies (1,4,8)".\n2. Scroll to line 75 — concurrencies: "4". The driver parses this "x"-delimited string, yielding the single value [4].\n3. .github/configs/nvidia-master.yaml has conc-list: [4] for this recipe — matches the recipe, not the comment.\n4. Same pattern repeats in 8k1k/tp8-1p1d.yaml.\n\nWhy existing code doesn't prevent it\n\nThere is no validator that checks free-form comments against the actual concurrencies field, so the drift slipped through.\n\nImpact\n\nNone at runtime — benchmarks run exactly the conc set defined in concurrencies and conc-list. The only consequence is that a future reader (or copy-paste donor) of these recipes will be misled into believing tp8-1p1d covers conc 1 and 8 in addition to 4. That's a minor doc-hygiene issue, not a correctness bug.\n\nHow to fix\n\nPick one of:\n\n- Update the header comments to # Only conc 4 — this topology is decode-latency focused, not throughput. (and the equivalent on the 8k1k file), which matches the current concurrencies field and the master conc-list, or\n- Expand the recipes to concurrencies: "1x4x8" and the master conc-list: [1, 4, 8] if covering 1 and 8 was the actual intent.
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX | ||
|
|
||
| - config-keys: | ||
| - minimaxm2.5-fp4-gb300-dynamo-vllm | ||
| description: | ||
| - "Reactivate GB300 minimax disagg vLLM sweep with the pruned pareto-only search space from internal PR57" | ||
| - "Add minimax model routing and minimax srt-slurm fork path in runners/launch_gb300-nv.sh while keeping SA upstream defaults (SLURM_PARTITION=batch_1, SLURM_ACCOUNT=benchmark, SQUASH_FILE under /home/sa-shared/gharunners/squash/)" | ||
| - "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX |
There was a problem hiding this comment.
🟡 Both new perf-changelog.yaml entries (lines 3125 and 3133) ship with placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX URLs that will 404 once merged. The glm5 entry directly above (line 3115) was correctly populated with pull/1514 — these two should likewise be updated to reference this PR (#1560) before merging.
Extended reasoning...
What the bug is
perf-changelog.yaml gains two new changelog entries in this PR, one for minimaxm2.5-fp4-b300-dynamo-vllm and one for minimaxm2.5-fp4-gb300-dynamo-vllm. Both entries terminate with a literal placeholder URL:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXXThe XXXX is meant to be replaced with the actual PR number prior to merging. Per the PR metadata, this is PR #1560, so both placeholders should be replaced with pull/1560.
How it manifests
Once merged, anyone consuming the changelog (UI surfaces that render pr-link, scripts that crawl the file to attribute regressions, humans clicking through history) will follow these URLs to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is a non-numeric pull number and GitHub will return 404. The two new entries effectively lose their provenance back to the PR that introduced them.
Why existing code doesn't prevent it
The changelog is a hand-edited YAML file with no schema validation in this PR or CI that I can see — placeholders like XXXX parse as valid strings, so YAML lint passes and the entry merges cleanly. The convention is enforced by reviewer attention only, and the glm5 entry immediately above (line 3115, pull/1514) demonstrates the expected post-fix state.
Proof / step-by-step
- Open
perf-changelog.yamlat line 3115 — the priorglm5-fp4-gb300-dynamo-sglangentry haspr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514, the real PR for that change. - Continue to line 3125 — the new
minimaxm2.5-fp4-b300-dynamo-vllmentry haspr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. - Continue to line 3133 — the new
minimaxm2.5-fp4-gb300-dynamo-vllmentry also haspr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. - The PR header reports this is PR Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg #1560.
- Once merged, fetching
https://github.com/SemiAnalysisAI/InferenceX/pull/XXXXwill not resolve to a valid pull request (404), so the changelog ships with two dead links.
Fix
Replace both XXXX occurrences with 1560 before merging:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1560This is a documentation/metadata issue only — no runtime impact — but worth fixing so the changelog stays a reliable index back to the introducing PRs (the same way the glm5 entry just above already does).
Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933 |
2 similar comments
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933 |
Drop the confirmed unstable conc=512 datapoint from the b300 minimax 8k1k dep4-4p1d recipe and matching nvidia-master search-space to keep curve data consistent.
14c4dcf to
6d9c522
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26423637181 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26423684433 |
Switch minimaxm2.5-fp4-gb300-dynamo-vllm from generic gb300 label to gb300-nv so jobs use launch_gb300-nv.sh and avoid unsupported gb300-cw routing.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a135a6a. Configure here.
Use /scratch/models/MiniMax-M2.5-NVFP4 in launch_gb300-nv.sh so gb300 srtslurm model path resolution matches the cluster's staged model layout.
Remove the login-node fast-path skip in import_squash so every existing squash is validated on compute nodes via unsquashfs before reuse.
9d18e00 to
bea3877
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26467787733 |

Summary
.github/configs/nvidia-master.yamlfor:minimaxm2.5-fp4-gb300-dynamo-vllmminimaxm2.5-fp4-b300-dynamo-vllmbenchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/runners/launch_gb300-nv.shrunners/launch_b300-nv.shperf-changelog.yamlwith entries for both new configs.Note
Low Risk
Changes are benchmark/CI YAML and launch scripts only; no production serving or auth paths. Runner depends on an external srt-slurm fork branch for new schema fields.
Overview
Adds disaggregated multinode Dynamo + vLLM benchmark coverage for MiniMax-M2.5 NVFP4 on GB300 and B300, using
vllm/vllm-openai:v0.20.1andnvidia/MiniMax-M2.5-NVFP4..github/configs/nvidia-master.yamlgainsminimaxm2.5-fp4-gb300-dynamo-vllmandminimaxm2.5-fp4-b300-dynamo-vllm: multinode, disagg prefill/decode search spaces for 1k/1k and 8k/1k ISL/OSL, each point pointing at aCONFIG_FILEunderrecipes/vllm/minimax-m2.5/.... B300 largely mirrors GB300 but addstp8-1p1dat low concurrency for both sequence lengths.New Slurm recipe trees under
benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/(GB300) andminimax-m2.5-b300/(B300): Dynamo frontend, Nixl KV transfer, TP/EP/DP decode variants, and sa-bench concurrency sweeps.runners/launch_gb300-nv.shandlaunch_b300-nv.shrouteminimaxm2.5+fp4+dynamo-vllmto the model path/prefix, clonejasonlizhengjian/srt-slurm@lijas/spread-workers, and copy the matching recipe dir intorecipes/vllm/minimax-m2.5. GB300 also hardensimport_squashwhen enroot leaves symlink locks/squash files (ELOOP).perf-changelog.yamldocuments both new config keys.Reviewed by Cursor Bugbot for commit bea3877. Bugbot is set up for automated code reviews on this repo. Configure here.