Merged
8 changes: 3 additions & 5 deletions .github/configs/nvidia-master.yaml
@@ -4515,7 +4515,7 @@ gptoss-fp4-h100-vllm:
- { tp: 8, conc-start: 4, conc-end: 16 }

minimaxm2.5-fp8-h100-vllm:
- image: vllm/vllm-openai:v0.21.0
+ image: vllm/vllm-openai:v0.19.1-cu130
model: MiniMaxAI/MiniMax-M2.5
model-prefix: minimaxm2.5
runner: h100
@@ -4527,13 +4527,11 @@ minimaxm2.5-fp8-h100-vllm:
- isl: 1024
osl: 1024
search-space:
- # - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 }
- - { tp: 4, ep: 4, conc-start: 4, conc-end: 64 }
+ - { tp: 8, ep: 8, conc-start: 4, conc-end: 128 }
- isl: 8192
osl: 1024
search-space:
- # - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 }
- - { tp: 4, ep: 4, conc-start: 4, conc-end: 64 }
+ - { tp: 8, ep: 8, conc-start: 4, conc-end: 128 }
Comment on lines 4527 to +4534
Contributor


🟡 The comment block at lines 4498-4502 (just below this entry) explicitly claims that the original minimaxm2.5-fp8-h100-vllm entry is byte-identical to origin/main and that the agentic sibling's metadata is identical to origin/main's version of that entry. This PR breaks both invariants by changing the image (v0.21.0 → v0.19.1-cu130) and rewriting the search-space (tp:4 ep:4 conc-end 64 → tp:8 ep:8 conc-end 128), so the comment will mislead future readers. Please update or delete those five comment lines in the same commit.

Extended reasoning...

What the stale comment says (lines 4498-4502 post-PR):

```yaml
# Diverged from minimaxm2.5-fp8-h100-vllm (agentic-coding sibling). Metadata is
# identical to origin/main's minimaxm2.5-fp8-h100-vllm; the split exists because this
# PR adds an agentic-coding scenarios block that differs from main
# (either main had none or had a different conc/offload sweep).
# The original minimaxm2.5-fp8-h100-vllm entry stays byte-identical to origin/main.
```

This block was authored when the -agentic sibling was split off, as a promise from that earlier PR that it would not touch the original minimaxm2.5-fp8-h100-vllm entry. The promise was true at the moment it was written.

Why this PR invalidates the comment:

This PR is precisely the one that modifies the original entry, so both present-tense claims in the comment go stale the moment it lands:

1. "Metadata is identical to origin/main's minimaxm2.5-fp8-h100-vllm" is false against post-merge main. After merge, the agentic sibling pins vllm/vllm-openai:v0.20.2 and uses tp:4 ep:4 conc-end 64, while the new original pins vllm/vllm-openai:v0.19.1-cu130 and uses tp:8 ep:8 conc-end 128. The two entries now diverge in both image and parallelism strategy, not just in the scenarios block.
2. "The original minimaxm2.5-fp8-h100-vllm entry stays byte-identical to origin/main" is directly contradicted by the very hunk above the comment, which changes the image line and rewrites two search-space lists.

Step-by-step proof:

1. Read line 4480 of the post-PR file: image: vllm/vllm-openai:v0.19.1-cu130.
2. Read line 4504 of the post-PR file: image: vllm/vllm-openai:v0.20.2 (the agentic sibling).
3. The agentic sibling's image differs from the original's image → claim #1 ("identical to origin/main's minimaxm2.5-fp8-h100-vllm") no longer holds; the agentic sibling is no longer a metadata twin of the original.
4. The PR diff for minimaxm2.5-fp8-h100-vllm shows non-trivial edits (image change + search-space rewrite at lines 4480, 4492, 4496) → the original entry is no longer byte-identical to origin/main, refuting claim #2.

Impact:

Documentation-only, with no runtime effect, hence nit severity. But a maintainer reading the file post-merge will see two clearly diverged sibling entries next to a comment asserting that they are metadata twins and that one of them is unchanged from main. That is actively misleading and the kind of thing that wastes future reviewers' time tracing history.

Suggested fix:

Replace the five comment lines with a brief note describing the current relationship (e.g., "-agentic sibling pins v0.20.2 with tp:4/ep:4 for cpu-offload sweeps; the primary entry tracks the latest minimax recipe"), or simply delete the now-obsolete annotation. Either is a small change in the same commit.
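A stale annotation like this could be caught mechanically before merge. The sketch below is hypothetical and not part of this PR: the function name check_stale_note and the idea of running it as a pre-merge check are assumptions, and only the phrase it searches for comes from the comment block quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of this PR): report whether a config file still
# carries the "byte-identical to origin/main" promise.
check_stale_note() {
  local file="$1"
  # -n prints matching line numbers, -F treats the phrase as a fixed string.
  if grep -nF 'byte-identical to origin/main' "$file"; then
    echo "stale annotation found in $file" >&2
    return 1
  fi
  return 0
}

# Usage (path taken from the diff header above):
# check_stale_note .github/configs/nvidia-master.yaml
```

The function returns non-zero when the phrase is still present, so it can gate a commit that edits the original entry without also updating the note.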


# Diverged from minimaxm2.5-fp8-h100-vllm (agentic-coding sibling). Metadata is
# identical to origin/main's minimaxm2.5-fp8-h100-vllm; the split exists because this
13 changes: 6 additions & 7 deletions benchmarks/single_node/minimaxm2.5_fp8_h100.sh
@@ -9,7 +9,6 @@ check_env_vars \
CONC \
ISL \
OSL \
- MAX_MODEL_LEN \
RANDOM_RANGE_RATIO \
RESULT_FILENAME

@@ -28,7 +27,6 @@ PORT=${PORT:-8888}

if [ "${EVAL_ONLY}" = "true" ]; then
setup_eval_context
- MAX_MODEL_LEN="$EVAL_MAX_MODEL_LEN"
fi

if [ "$EP_SIZE" -gt 1 ]; then
@@ -44,12 +42,13 @@ set -x
vllm serve $MODEL --host 0.0.0.0 --port $PORT \
--tensor-parallel-size=$TP \
$EP \
- --gpu-memory-utilization 0.90 \
- --max-model-len $MAX_MODEL_LEN \
- --max-num-seqs 256 \
- --no-enable-prefix-caching \
--trust-remote-code \
- --compilation-config '{"cudagraph_mode":"PIECEWISE"}' > $SERVER_LOG 2>&1 &
+ --enable-auto-tool-choice \
+ --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
+ --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
+ --gpu-memory-utilization 0.9 \
+ > $SERVER_LOG 2>&1 &

SERVER_PID=$!

10 changes: 10 additions & 0 deletions perf-changelog.yaml
@@ -3113,3 +3113,13 @@
- "1k1k and 8k1k STP low-latency and max-throughput srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/gb300-fp4/ (ported from upstream srt-slurm PR #152)"
- "Wire glm5/fp4 model + dynamo-sglang framework branches into runners/launch_gb300-nv.sh"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1514

+ - config-keys:
+ - minimaxm2.5-fp8-h100-vllm
+ description:
+ - "Update minimaxm2.5-fp8-h100-vllm recipe (v0.19.1)"
+ - "Image: vllm/vllm-openai:v0.21.0 -> v0.19.1-cu130"
+ - "Replace recipe flags: drop PIECEWISE/0.90 mem util/256 max-num-seqs/no-prefix-caching/explicit max-model-len; add --enable-auto-tool-choice, --tool-call-parser minimax_m2, --reasoning-parser minimax_m2_append_think, --compilation-config mode:3+fuse_minimax_qk_norm"
+ - "Search-space: tp:8 ep:8 (TEP=8), conc-end 128 chosen at saturation per local sweep"
+ - "Local bench: TEP=8 peaks at C=128 with 26923 tot tps (+178% vs TEP=4 peak at C=32 in May 6 j11600242 sweep)"
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1516
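
The changelog entry gives the TEP=8 peak and the relative gain, but not the TEP=4 baseline. As a rough cross-check, and assuming "+178%" means the TEP=8 peak equals the TEP=4 peak times (1 + 178/100), the implied baseline can be back-calculated. The variable names below are illustrative, and the result is a derived figure, not a measured one.

```shell
# Back-of-envelope check: derive the implied TEP=4 peak from the two numbers
# in the changelog entry. Assumes "+178%" means new = old * (1 + 178/100).
tep8_peak=26923   # tot tps at C=128, from the changelog entry
gain_pct=178      # relative gain vs. the TEP=4 peak, from the changelog entry
implied_tep4_peak=$(awk -v new="$tep8_peak" -v pct="$gain_pct" \
  'BEGIN { printf "%.0f", new / (1 + pct / 100) }')
echo "implied TEP=4 peak: ${implied_tep4_peak} tot tps"
```

Under that reading, the TEP=4 peak at C=32 comes out at roughly 9.7k tot tps, which can be compared against the sweep logs if they are available.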