Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg by yeswanthk-26 · Pull Request #1560 · SemiAnalysisAI/InferenceX

yeswanthk-26 · 2026-05-23T01:15:40Z

Summary

Add config entries in .github/configs/nvidia-master.yaml for:
- minimaxm2.5-fp4-gb300-dynamo-vllm
- minimaxm2.5-fp4-b300-dynamo-vllm
Add recipe sets under:
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/
Wire runner logic in:
- runners/launch_gb300-nv.sh
- runners/launch_b300-nv.sh
Update perf-changelog.yaml with entries for both new configs.

Note

Low Risk
Changes are benchmark/CI YAML and launch scripts only; no production serving or auth paths. Runner depends on an external srt-slurm fork branch for new schema fields.

Overview
Adds disaggregated multinode Dynamo + vLLM benchmark coverage for MiniMax-M2.5 NVFP4 on GB300 and B300, using vllm/vllm-openai:v0.20.1 and nvidia/MiniMax-M2.5-NVFP4.

.github/configs/nvidia-master.yaml gains minimaxm2.5-fp4-gb300-dynamo-vllm and minimaxm2.5-fp4-b300-dynamo-vllm: multinode, disagg prefill/decode search spaces for 1k/1k and 8k/1k ISL/OSL, each point pointing at a CONFIG_FILE under recipes/vllm/minimax-m2.5/.... B300 largely mirrors GB300 but adds tp8-1p1d at low concurrency for both sequence lengths.

New Slurm recipe trees under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/ (GB300) and minimax-m2.5-b300/ (B300): Dynamo frontend, Nixl KV transfer, TP/EP/DP decode variants, and sa-bench concurrency sweeps.

runners/launch_gb300-nv.sh and launch_b300-nv.sh route minimaxm2.5 + fp4 + dynamo-vllm to the model path/prefix, clone jasonlizhengjian/srt-slurm @ lijas/spread-workers, and copy the matching recipe dir into recipes/vllm/minimax-m2.5. GB300 also hardens import_squash when enroot leaves symlink locks/squash files (ELOOP).

perf-changelog.yaml documents both new config keys.

^{Reviewed by Cursor Bugbot for commit bea3877. Bugbot is set up for automated code reviews on this repo. Configure here.}

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.

Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.

claude · 2026-05-23T01:23:07Z

+# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).
+# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node
+# GPU count makes available. Only the smallest concurrencies (1,4,8) —
+# this topology is decode-latency focused, not throughput.


🟡 The tp8-1p1d.yaml header comments claim 'Only the smallest concurrencies (1,4,8)' (1k1k) / 'Smallest concurrencies only (1,4,8)' (8k1k), but both files set concurrencies: "4" — only 4, not 1/4/8. The nvidia-master.yaml entries for tp8-1p1d list conc-list [4] for both ISLs, confirming the recipes are authoritative and the comments are stale. Either update the comments to say 'only conc 4' or expand the concurrencies field to actually include 1 and 8.

Extended reasoning...

What the bug is\n\nTwo new B300-only recipe files document a sweep over three concurrencies in their header comments but only run a single concurrency in the actual benchmark block.\n\nIn benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml (lines 3–6):\n\nyaml\n# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).\n# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node\n# GPU count makes available. Only the smallest concurrencies (1,4,8) —\n# this topology is decode-latency focused, not throughput.\n\n\nBut the benchmark block at the bottom is:\n\nyaml\nbenchmark:\n type: "sa-bench"\n isl: 1024\n osl: 1024\n concurrencies: "4"\n random_range_ratio: 0.8\n\n\nThe 8k1k counterpart (benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml) has the same drift — its header says "Smallest concurrencies only (1,4,8)" while concurrencies: "4".\n\nWhich side is authoritative\n\nCross-referencing .github/configs/nvidia-master.yaml, the tp8-1p1d search-space entries for both ISLs only list conc-list: [4]:\n\nyaml\n# 1k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/1k1k/tp8-1p1d.yaml"\n# 8k1k\n- conc-list: [4]\n prefill: ...\n additional-settings:\n - "CONFIG_FILE=recipes/vllm/minimax-m2.5/8k1k/tp8-1p1d.yaml"\n\n\nSo the recipe concurrencies: "4" and the master conc-list: [4] agree, and the header comments are the outlier. The driver only iterates over what's in those two fields; the comment has no runtime effect.\n\nStep-by-step proof\n\n1. Open 1k1k/tp8-1p1d.yaml. Comment block (lines 3–6) says "Only the smallest concurrencies (1,4,8)".\n2. Scroll to line 75 — concurrencies: "4". The driver parses this "x"-delimited string, yielding the single value [4].\n3. .github/configs/nvidia-master.yaml has conc-list: [4] for this recipe — matches the recipe, not the comment.\n4. Same pattern repeats in 8k1k/tp8-1p1d.yaml.\n\nWhy existing code doesn't prevent it\n\nThere is no validator that checks free-form comments against the actual concurrencies field, so the drift slipped through.\n\nImpact\n\nNone at runtime — benchmarks run exactly the conc set defined in concurrencies and conc-list. The only consequence is that a future reader (or copy-paste donor) of these recipes will be misled into believing tp8-1p1d covers conc 1 and 8 in addition to 4. That's a minor doc-hygiene issue, not a correctness bug.\n\nHow to fix\n\nPick one of:\n\n- Update the header comments to # Only conc 4 — this topology is decode-latency focused, not throughput. (and the equivalent on the 8k1k file), which matches the current concurrencies field and the master conc-list, or\n- Expand the recipes to concurrencies: "1x4x8" and the master conc-list: [1, 4, 8] if covering 1 and 8 was the actual intent.

claude · 2026-05-23T01:23:07Z

+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
+
+- config-keys:
+    - minimaxm2.5-fp4-gb300-dynamo-vllm
+  description:
+    - "Reactivate GB300 minimax disagg vLLM sweep with the pruned pareto-only search space from internal PR57"
+    - "Add minimax model routing and minimax srt-slurm fork path in runners/launch_gb300-nv.sh while keeping SA upstream defaults (SLURM_PARTITION=batch_1, SLURM_ACCOUNT=benchmark, SQUASH_FILE under /home/sa-shared/gharunners/squash/)"
+    - "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🟡 Both new perf-changelog.yaml entries (lines 3125 and 3133) ship with placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX URLs that will 404 once merged. The glm5 entry directly above (line 3115) was correctly populated with pull/1514 — these two should likewise be updated to reference this PR (#1560) before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml gains two new changelog entries in this PR, one for minimaxm2.5-fp4-b300-dynamo-vllm and one for minimaxm2.5-fp4-gb300-dynamo-vllm. Both entries terminate with a literal placeholder URL:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The XXXX is meant to be replaced with the actual PR number prior to merging. Per the PR metadata, this is PR #1560, so both placeholders should be replaced with pull/1560.

How it manifests

Once merged, anyone consuming the changelog (UI surfaces that render pr-link, scripts that crawl the file to attribute regressions, humans clicking through history) will follow these URLs to https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is a non-numeric pull number and GitHub will return 404. The two new entries effectively lose their provenance back to the PR that introduced them.

Why existing code doesn't prevent it

The changelog is a hand-edited YAML file with no schema validation in this PR or CI that I can see — placeholders like XXXX parse as valid strings, so YAML lint passes and the entry merges cleanly. The convention is enforced by reviewer attention only, and the glm5 entry immediately above (line 3115, pull/1514) demonstrates the expected post-fix state.

Proof / step-by-step

Open perf-changelog.yaml at line 3115 — the prior glm5-fp4-gb300-dynamo-sglang entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514, the real PR for that change.

Continue to line 3125 — the new minimaxm2.5-fp4-b300-dynamo-vllm entry has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

Continue to line 3133 — the new minimaxm2.5-fp4-gb300-dynamo-vllm entry also has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.

The PR header reports this is PR Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg #1560.

Once merged, fetching https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX will not resolve to a valid pull request (404), so the changelog ships with two dead links.

Fix

Replace both XXXX occurrences with 1560 before merging:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1560

This is a documentation/metadata issue only — no runtime impact — but worth fixing so the changelog stays a reliable index back to the introducing PRs (the same way the glm5 entry just above already does).

…disagg

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.

github-actions · 2026-05-23T04:19:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

github-actions · 2026-05-23T05:11:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

github-actions · 2026-05-23T06:46:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26321728933
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26321728933

Drop the confirmed unstable conc=512 datapoint from the b300 minimax 8k1k dep4-4p1d recipe and matching nvidia-master search-space to keep curve data consistent.

github-actions · 2026-05-25T23:22:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26423637181
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26423637181

github-actions · 2026-05-26T00:45:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26423684433
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26423684433

Switch minimaxm2.5-fp4-gb300-dynamo-vllm from generic gb300 label to gb300-nv so jobs use launch_gb300-nv.sh and avoid unsupported gb300-cw routing.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a135a6a. Configure here.}

Use /scratch/models/MiniMax-M2.5-NVFP4 in launch_gb300-nv.sh so gb300 srtslurm model path resolution matches the cluster's staged model layout.

Remove the login-node fast-path skip in import_squash so every existing squash is validated on compute nodes via unsquashfs before reuse.

github-actions · 2026-05-26T21:33:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26467787733
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26467787733

yeswanthk-26 added 2 commits May 22, 2026 13:33

[GB300][B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs

5aa4520

Port PR57 MiniMax-M2.5 FP4 disaggregated multi-node vLLM recipes for GB300 and B300 to SA upstream, including runner wiring and changelog entries while keeping SA-default GB300 account/partition/sqsh paths.

Use SA default account for b300 minimax run

b887433

Remove the minimax-specific SLURM account override in launch_b300-nv.sh so SA upstream runs with batch_1/benchmark defaults, per reviewer guidance.

yeswanthk-26 requested a review from a team May 23, 2026 01:15

yeswanthk-26 requested review from jgangani and kedarpotdar-nv as code owners May 23, 2026 01:15

github-project-automation Bot added this to InferenceMAX Board May 23, 2026

claude Bot reviewed May 23, 2026

View reviewed changes

Ankur-singh added 2 commits May 22, 2026 19:57

Merge branch 'main' into yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-…

6d14c33

…disagg

Simplify descriptions for MiniMax benchmarks

2f1c47c

Removed redundant details from MiniMax-M2.5 NVFP4 B300 and GB300 descriptions in the changelog.

Ankur-singh added the full-sweep-enabled label May 23, 2026

Remove b300 8k1k conc512 outlier point

6d9c522

Drop the confirmed unstable conc=512 datapoint from the b300 minimax 8k1k dep4-4p1d recipe and matching nvidia-master search-space to keep curve data consistent.

yeswanthk-26 force-pushed the yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg branch from 14c4dcf to 6d9c522 Compare May 25, 2026 23:21

Pin minimax gb300 runs to gb300-nv runner

a135a6a

Switch minimaxm2.5-fp4-gb300-dynamo-vllm from generic gb300 label to gb300-nv so jobs use launch_gb300-nv.sh and avoid unsupported gb300-cw routing.

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread runners/launch_gb300-nv.sh

Comment thread runners/launch_gb300-nv.sh Outdated

yeswanthk-26 added 3 commits May 26, 2026 11:47

Fix gb300 minimax model path to scratch

0f6e7bc

Use /scratch/models/MiniMax-M2.5-NVFP4 in launch_gb300-nv.sh so gb300 srtslurm model path resolution matches the cluster's staged model layout.

Restore squash validation path on gb300-nv

00b1ca0

Remove the login-node fast-path skip in import_squash so every existing squash is validated on compute nodes via unsquashfs before reuse.

Merge main into minimax submission branch

bea3877

yeswanthk-26 force-pushed the yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg branch from 9d18e00 to bea3877 Compare May 26, 2026 21:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560

Yeswanth/minimax fp4 gb300 b300 dynamo vllm disagg#1560
yeswanthk-26 wants to merge 9 commits into
mainfrom
yeswanth/minimax-fp4-gb300-b300-dynamo-vllm-disagg

yeswanthk-26 commented May 23, 2026 •

edited by cursor Bot

Loading

Uh oh!

claude Bot May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yeswanthk-26 commented May 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

What the bug is

How it manifests

Why existing code doesn't prevent it

Proof / step-by-step

Fix

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 23, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yeswanthk-26 commented May 23, 2026 •

edited by cursor Bot

Loading