9 changes: 7 additions & 2 deletions .github/configs/amd-master.yaml
@@ -1617,6 +1617,11 @@ dsv4-fp4-mi355x-sglang:
  # at runtime by benchmarks/single_node/dsv4_fp8_mi355x_vllm.sh at a
  # pinned SHA. Once both PRs merge into a release, switch to a vLLM ROCm
  # MI355X image and remove the build step.
+ #
+ # Serving flags follow vllm-project/recipes#433: AITER+AITER_LINEAR,
+ # mp executor, triton_unfused MoE, async scheduling, max-num-seqs=128,
+ # max-num-batched-tokens=8192, gpu-mem-util=0.6. Sweep matches the
+ # sister sglang config (conc 4-64) so vLLM↔SGLang are comparable.
  dsv4-fp8-mi355x-vllm:
  image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
  model: deepseek-ai/DeepSeek-V4-Pro
@@ -1630,11 +1635,11 @@ dsv4-fp8-mi355x-vllm:
  - isl: 1024
  osl: 1024
  search-space:
- - { tp: 8, conc-start: 1, conc-end: 1 }
+ - { tp: 8, conc-start: 4, conc-end: 64 }
  - isl: 8192
  osl: 1024
  search-space:
- - { tp: 8, conc-start: 1, conc-end: 1 }
+ - { tp: 8, conc-start: 4, conc-end: 64 }

  # Day-0 single-sequence marker for DeepSeek-V4 on ATOM (ROCm/ATOM#650).
  # PR1 of the ATOM DSv4 series still uses torch sparse-attention fallbacks
17 changes: 13 additions & 4 deletions benchmarks/single_node/dsv4_fp8_mi355x_vllm.sh
@@ -5,6 +5,13 @@ set -eo pipefail
  # Based on vllm-project/vllm#40889 (AITER-accelerated sparse MLA decode,
  # stacked on #40871 which adds base DSv4 ROCm support).
  #
+ # Serving flags follow the validated MI355X recipe from
+ # vllm-project/recipes#433 (DeepSeek-V4-Pro, TP=8): AITER + AITER_LINEAR,
+ # triton_unfused MoE, mp executor, async scheduling, max-num-seqs=128,
+ # max-num-batched-tokens=8192, gpu-mem-util=0.6. Tool-call flags from the
+ # previous revision are dropped — the recipe omits them and throughput
+ # benchmarks here do not exercise tool calling.
+ #
  # Uses the ATOM MI355X image as the base (ROCm 7.2.2, PyTorch 2.10,
  # aiter with MLA decode, MI355X GPU detection). vLLM is rebuilt from
  # the PR branch on top. Once both PRs merge into a release, switch to
@@ -33,6 +40,7 @@ if [ -n "$ROCR_VISIBLE_DEVICES" ]; then
  fi

  export VLLM_ROCM_USE_AITER=1
+ export VLLM_ROCM_USE_AITER_LINEAR=1
  export VLLM_TARGET_DEVICE=rocm
  export VLLM_ENGINE_READY_TIMEOUT_S=3600
  export VLLM_PLUGINS=""
@@ -487,17 +495,18 @@ start_gpu_monitor
  set -x
  vllm serve $MODEL --port $PORT \
  --tensor-parallel-size $TP \
- --gpu-memory-utilization 0.90 \
+ --distributed-executor-backend mp \
+ --gpu-memory-utilization 0.6 \
  --max-model-len $MAX_MODEL_LEN \
+ --max-num-seqs 128 \
+ --max-num-batched-tokens 8192 \
  --kv-cache-dtype fp8 \
  --trust-remote-code \
  --enforce-eager \
+ --async-scheduling \
  --moe-backend "triton_unfused" \
  --no-enable-prefix-caching \
- --max-num-seqs 32 \
  --tokenizer-mode deepseek_v4 \
- --tool-call-parser deepseek_v4 \
- --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 > $SERVER_LOG 2>&1 &

  SERVER_PID=$!
11 changes: 11 additions & 0 deletions perf-changelog.yaml
@@ -2454,3 +2454,14 @@
  description:
  - "Update SGLang image from v0.5.10.post1-cu130 to v0.5.11-cu130"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1329
+
+ - config-keys:
+ - dsv4-fp8-mi355x-vllm
+ description:
+ - "Adopt validated MI355X serving recipe from vllm-project/recipes#433 (DeepSeek-V4-Pro, TP=8)"
+ - "Add env: VLLM_ROCM_USE_AITER_LINEAR=1 (alongside existing VLLM_ROCM_USE_AITER=1)"
+ - "Add server flags: --distributed-executor-backend mp, --max-num-batched-tokens 8192, --async-scheduling"
+ - "Tune: --gpu-memory-utilization 0.90 -> 0.6, --max-num-seqs 32 -> 128"
+ - "Drop --tool-call-parser deepseek_v4 / --enable-auto-tool-choice (not in recipe; benchmark doesn't exercise tool calling)"
+ - "Expand search space from conc=1 to conc 4-64 to match dsv4-fp8-mi355x-sglang for vLLM<->SGLang comparability now that max-num-seqs=128 supports it"
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1373
🟡 The new perf-changelog.yaml entry at line 2455 has an unfilled placeholder, pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX, instead of pull/1373. Every other entry in the file uses a real PR number, so the link as committed returns a 404 and breaks the changelog's audit trail. Trivial one-line fix: replace XXXX with 1373.

Extended reasoning...

What the bug is

The new entry appended to perf-changelog.yaml for the dsv4-fp8-mi355x-vllm recipe change ends with an unfilled template placeholder:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

This is on line 2455 of the committed file. The placeholder XXXX was never substituted with the actual PR number (1373).

How it manifests

Following the link in the changelog yields a GitHub 404 (no PR numbered XXXX exists in the repo). Any tooling that consumes perf-changelog.yaml to build a changelog, navigate from a recipe key back to its motivating PR, or audit which PR introduced a given configuration change will either error out (URL validation) or silently emit a dead link.

Why existing code doesn't prevent it

perf-changelog.yaml is hand-maintained — there is no automated check that pr-link resolves to a real PR. The PR description itself documents the intended link as pull/1373, and every other entry in the file uses a real numeric PR number (e.g. lines 2426, 2432, 2438, 2444 reference /pull/1329, /pull/1343, /pull/1344, /pull/1346). This is clearly an unfinished template — the author left XXXX as a fill-me-in marker and forgot to update it before pushing.
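
The missing check could be approximated with a small lint step. The sketch below is hypothetical: the function name check_pr_links is invented here, and it assumes every pr-link value is expected to end in /pull/<digits>. It validates only the shape of the URL and does not confirm that the PR exists.

```shell
#!/usr/bin/env bash
# Hypothetical lint, not part of the repository: flag any pr-link line whose
# URL does not end in /pull/<digits>. It checks the shape of the URL only.
check_pr_links() {
  local file="$1"
  local bad
  # Keep the pr-link lines, then drop those that end in a numeric PR id.
  bad=$(grep -nE '^[[:space:]]*pr-link:' "$file" \
        | grep -vE '/pull/[0-9]+[[:space:]]*$' || true)
  if [ -n "$bad" ]; then
    printf 'malformed or placeholder pr-link:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

Run as check_pr_links perf-changelog.yaml in CI, a non-zero status would have blocked this entry before merge.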

Note on what reviewers see

The PR-diff viewer renders the bottom of the file with the real PR number 1373 filled in (since GitHub knows the PR number when rendering), but git show dacf068 -- perf-changelog.yaml and reading the on-disk file directly both confirm the committed content is literally XXXX. Multiple independent verifiers confirmed this by reading the file and the git object directly.
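
Because the rendered diff and the committed blob can disagree, the placeholder is easiest to confirm against the git object itself. A minimal sketch, assuming a local clone of the branch; the helper name placeholder_in_commit is hypothetical, and dacf068 is the commit cited above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search a file as stored in a given commit (not the
# working tree and not the web diff view) for a literal pull/XXXX placeholder.
# Prints matching lines with their line numbers. The exit status is non-zero
# when no placeholder is found or the object cannot be read.
placeholder_in_commit() {
  local rev="$1" path="$2"
  git show "${rev}:${path}" | grep -nF 'pull/XXXX'
}
```

For this PR the call would be placeholder_in_commit dacf068 perf-changelog.yaml.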

Step-by-step proof

  1. The PR is #1373 ("Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe"), and the PR description's link references pull/1373.
  2. The diff for perf-changelog.yaml appends a new entry whose final line is pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  3. On the committed branch (git show dacf068:perf-changelog.yaml → line 2455), the value is the literal string XXXX, not a numeric PR id.
  4. Visiting https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX returns a 404 (GitHub rejects non-numeric PR identifiers).
  5. Every other pr-link entry in the file is a real numeric PR (e.g. /pull/1329, /pull/1344, /pull/1346), so this is the only entry that 404s.

Impact

No runtime effect — this is documentation/metadata only. But perf-changelog.yaml is the audit trail tying recipe-key changes back to the PRs that introduced them; a dead link defeats that purpose for this entry.

Fix

Replace XXXX with 1373 on line 2455 of perf-changelog.yaml.
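
The substitution can also be scripted. This is a sketch under the assumption that the placeholder appears only as a trailing /pull/XXXX on the affected line; the function name fill_pr_number is hypothetical. It writes to standard output rather than editing in place, since in-place flags differ between sed implementations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the file with a trailing /pull/XXXX rewritten to
# /pull/<number>. Lines without that trailing placeholder pass through unchanged.
fill_pr_number() {
  local file="$1" number="$2"
  sed "s#/pull/XXXX[[:space:]]*\$#/pull/${number}#" "$file"
}
```

A possible invocation is fill_pr_number perf-changelog.yaml 1373 > perf-changelog.yaml.new, followed by moving the new file over the original once the output has been reviewed.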
