[AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark#1570
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Claude finished @ChangLiu0709's task in 2m 57s —— View job Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark
Findings1 blocking issue found. See inline comment. 🔴 What looks good
|
| if [[ "${KEEP_LOGS:-0}" == "1" ]]; then | ||
| trap '' EXIT | ||
| else | ||
| trap 'sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true' EXIT | ||
| fi |
There was a problem hiding this comment.
🔴 The new KEEP_LOGS=1 path only disarms the EXIT trap (lines 56-60), but line 178's unconditional sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true (and the $BENCHMARK_LOGS_DIR/logs rm at line 167) still fires on the happy path. So a successful run with KEEP_LOGS=1 and no GITHUB_ACTIONS set wipes slurm_job-${JOB_ID}.{out,err} anyway — exactly the local-debugging use case the PR description claims to support. Fix: gate the rms at lines 167 and 178 on "${KEEP_LOGS:-0}" != "1" as well.
Extended reasoning...
What the bug is
The PR documents KEEP_LOGS=1 as the local-debugging knob that "skips EXIT trap log cleanup" so users can inspect Slurm logs under benchmark_logs/ after a run. The implementation (runners/launch_mi355x-amds.sh:56-60) replaces the EXIT trap with trap '' EXIT when KEEP_LOGS=1:
if [[ "${KEEP_LOGS:-0}" == "1" ]]; then
trap '' EXIT
else
trap 'sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true' EXIT
fiThat correctly preserves logs when the script exits early (e.g. before slurm finishes). But the script ALSO has two inline sudo rm -rf calls at the bottom of the IS_MULTINODE=true branch that are unrelated to the trap:
- Line 167:
sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true - Line 178:
sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true
Neither is gated on KEEP_LOGS. On the happy path, both fire after the result-collection phase, so KEEP_LOGS=1 does not actually preserve logs after a successful run — only after an early/failed exit. The PR's own test-plan item ("re-run with KEEP_LOGS=1 and confirm Slurm logs are retained under benchmark_logs/") fails.
Step-by-step proof
Walk through the local-debug case: KEEP_LOGS=1, no GITHUB_ACTIONS set, IS_MULTINODE=true, successful run.
- Line 56-60:
KEEP_LOGS=1→trap '' EXITdisarms the EXIT trap. ✓ - Lines 62-86: Slurm job submitted, log file
$BENCHMARK_LOGS_DIR/slurm_job-${JOB_ID}.out(and.err) gets populated as the job runs. ✓ - Lines 88-158: Job completes, result JSONs are copied to
$GITHUB_WORKSPACE— but theslurm_job-*.{out,err}files themselves are still in$BENCHMARK_LOGS_DIR. ✓ - Line 167:
sudo rm -rf "$BENCHMARK_LOGS_DIR/logs"— removes the per-job subdir. Unconditional. - Lines 170-175:
if [[ -n "${GITHUB_ACTIONS:-}" ]]block is skipped because we're running locally — so thecpofslurm_job-*.{out,err}to$ARTIFACT_DIRnever happens. - Line 178:
sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true— wipes the entire directory, including the slurm out/err files the user wanted to inspect. This is the bug. - Script exits cleanly. EXIT trap is no-op (good — but irrelevant, the damage is already done by step 6).
User looks under benchmark_logs/ — empty. Feature does not work as advertised in its primary use case.
Why existing code doesn't prevent it
The trap disarmament only matters for paths where control exits before reaching line 178 (e.g. the exit 1 at line 81 when the job dies before creating the log file). On any successful run, control always reaches line 178 because there's no early return. The KEEP_LOGS check was added in only one place when it needed to be added in three.
The GH Actions branch at lines 170-175 partially masks the issue in CI: when GITHUB_ACTIONS is set, the slurm out/err files are first copied to $ARTIFACT_DIR ($GITHUB_WORKSPACE/benchmark_artifacts) before the final rm wipes $BENCHMARK_LOGS_DIR. So CI still gets logs uploaded as artifacts — but local users (the entire target audience for KEEP_LOGS=1) get nothing.
Impact
Documented user-facing feature (KEEP_LOGS=1 for local debugging) does not work on the happy path. The PR's own test plan checkbox cannot be satisfied as written. Severity is normal because the feature is new in this PR and ships broken — not a pre-existing issue.
Fix
Gate both inline rms on KEEP_LOGS, mirroring the trap setup:
if [[ "${KEEP_LOGS:-0}" != "1" ]]; then
sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true
fi…and similarly at line 178. (Or factor into a helper.) The line-167 rm is arguably also safe to skip on KEEP_LOGS=1 since $BENCHMARK_LOGS_DIR/logs is the per-job subdir with the in-container result tree, which is exactly what a local debugger would want to inspect alongside the slurm out/err files.
a562e2a to
60bd801
Compare
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>
20e2974 to
b5e21e8
Compare
| - { tp: 4, ep: 1, conc-start: 4, conc-end: 256, spec-decoding: mtp } | ||
| - { tp: 8, ep: 1, conc-start: 4, conc-end: 256, spec-decoding: mtp } | ||
|
|
||
| qwen3.5-fp8-mi355x-sglang-disagg: |
There was a problem hiding this comment.
Chang, we will need to rework this PR to keep the git commit history
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>
b5e21e8 to
b564db2
Compare
|
Claude finished @functionstackx's task in 2m 15s —— View job Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark
Findings1 blocking issue found. See inline comment. 🔴 What looks good
|
| if [[ "${KEEP_LOGS:-0}" == "1" ]]; then | ||
| trap '' EXIT | ||
| else | ||
| trap cleanup_and_save_logs EXIT | ||
| fi |
There was a problem hiding this comment.
🔴 BLOCKING: KEEP_LOGS=1 only disarms the EXIT trap (line 73-74) but the inline sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" at line 191 still fires unconditionally on the happy path, wiping the per-job result tree that a local debugger would want to inspect.
Why it matters: The feature's stated purpose (retain logs under benchmark_logs/ for local debugging) doesn't work for successful runs — only for early/failed exits. The PR's own test plan item ("re-run with KEEP_LOGS=1 and confirm Slurm logs are retained") cannot be satisfied.
Fix: Gate line 191 on KEEP_LOGS as well:
| if [[ "${KEEP_LOGS:-0}" == "1" ]]; then | |
| trap '' EXIT | |
| else | |
| trap cleanup_and_save_logs EXIT | |
| fi | |
| + if [[ "${KEEP_LOGS:-0}" != "1" ]]; then | |
| + sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true | |
| + fi |
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com>
b564db2 to
715d44f
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26540487421 |
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com>
715d44f to
fd83f40
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26543322403 |
functionstackx
left a comment
There was a problem hiding this comment.
hi @ChangLiu0709 @chunfangamd @HaiShaw thanks for the SGLang qwen3.5 mi355 disagg PR but unfortunately it doesn't work yet, can u look into fixing this?
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26543322403/job/78198590470?pr=1570
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26543322403/job/78198590469?pr=1570
026-05-28 00:11:09.816 DP0 TP0 EP0] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3985, in run_scheduler_process
scheduler = Scheduler(
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 434, in __init__
self.init_model_worker()
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 705, in init_model_worker
self.init_tp_model_worker()
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 660, in init_tp_model_worker
self.tp_worker = TpModelWorker(**worker_kwargs)
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 262, in __init__
self._init_model_runner()
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 347, in _init_model_runner
self._model_runner = ModelRunner(
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 532, in __init__
self.initialize(pre_model_load_memory)
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 652, in initialize
self.load_model()
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1445, in load_model
self.model = self.loader.load_model(
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 716, in load_model
model = _initialize_model(
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 290, in _initialize_model
return model_class(**kwargs)
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1608, in __init__
super().__init__(config, quant_config, prefix, language_model_cls)
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_vl.py", line 1124, in __init__
self.model = language_model_cls(
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1237, in __init__
super().__init__(config=config, quant_config=quant_config, prefix=prefix)
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1046, in __init__
self.layers, self._start_layer, self._end_layer = make_layers(
File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 675, in make_layers
+ get_offloader().wrap_modules(
File "/sgl-workspace/sglang/python/sglang/srt/utils/offloader.py", line 36, in wrap_modules
return list(all_modules_generator)
File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 677, in <genexpr>
layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1037, in get_layer
return layer_class(
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 554, in __init__
self.mlp = Qwen2MoeSparseMoeBlock(
File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2_moe.py", line 253, in __init__
self.experts = get_moe_impl_class(quant_config)(
File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/ep_moe/layer.py", line 613, in __init__
super().__init__(
File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/ep_moe/layer.py", line 93, in __init__
super().__init__(
File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 212, in __init__
assert (num_experts - num_shared_slots) % self.moe_ep_size == 0
AssertionError
Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 1279, in _wait_for_scheduler_ready
data = scheduler_pipe_readers[i].recv()
File "/usr/lib/python3.10/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 69, in <module>
run_server(server_args)
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 50, in run_server
launch_server(server_args)
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 2346, in launch_server
) = Engine._launch_subprocesses(
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 800, in _launch_subprocesses
scheduler_init_result.wait_for_ready()
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 651, in wait_for_ready
infos = _wait_for_scheduler_ready(scheduler_pipe_readers, scheduler_procs)
File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 1281, in _wait_for_scheduler_ready
raise _scheduler_died_error(i, scheduler_procs[i])
RuntimeError: Rank 0 scheduler died during initialization (exit code: -3). If exit code is -9 (SIGKILL), a common cause is the OS OOM killer. Run `dmesg -T | grep -i oom` to check.|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26543322403 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26548042416 |
Introduce CI config, model server flags, multinode launch script, and amd_utils plumbing (sudo auto-detect, optional disagg decode TP/DP flags) for qwen3.5-fp8-mi355x-sglang-disagg smoke sweeps on MI355X. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>
Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>
The search-space uses 1P+1D TP8/EP1 with dp-attn, not TP2/EP2 from the aggregated qwen3.5-fp8-mi355x-sglang recipe. Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>
With --enable-dp-attention + --moe-a2a-backend mori, sglang auto-promotes moe_ep_size=tp_size=8, but is_deepep_class_backend() excludes MoRI, so num_shared_slots stays at the global value (1) and the (num_experts - num_shared_slots) % moe_ep_size assertion in fused_moe_triton/layer.py fires for Qwen3.5 (512 routed + 1 shared). Track upstream sglang for a fix; flip back to dp-attn=true once MoRI is added to is_deepep_class_backend() or shared-slot accounting is reconciled. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>
d3fa9ff to
99fc3fb
Compare
Matches sister sglang-disagg scripts (dsr1_fp8, dsr1_fp4) and the GLM-5 disagg launch script. submit.sh requires FRAMEWORK; surfacing the missing-var failure at the top of the launch script gives a cleaner error than letting it fail deep inside submit.sh. Addresses Cursor Bugbot review comment on PR #1570. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26550028638 |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 224d444. Configure here.
| - "1P+1D TP8/EP1 smoke sweep for 1k1k and 8k1k (conc 8-512); MoRI transfer backend" | ||
| - "Add models.yaml server flags and multinode launch script qwen3.5_fp8_mi355x_sglang-disagg.sh" | ||
| - "8k1k row uses dp-attn=false (matches 1k1k): with --enable-dp-attention + --moe-a2a-backend mori, sglang auto-promotes moe_ep_size=tp_size=8, but is_deepep_class_backend() excludes MoRI, so num_shared_slots stays at the global value (1) and the (num_experts - num_shared_slots) % moe_ep_size assertion in fused_moe_triton/layer.py fires for Qwen3.5 (512 routed + 1 shared). Track upstream sglang; flip back to dp-attn=true once MoRI is added to is_deepep_class_backend() or shared-slot accounting is reconciled." | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1570 |
There was a problem hiding this comment.
Changelog entry indentation inconsistent with existing entries
Medium Severity
The new perf-changelog.yaml entry starts at 0-space indentation (- config-keys:) while all surrounding entries use 2-space indentation ( - config-keys:). In YAML, list items at different indentation levels belong to different structural contexts. This means process_changelog.py may not parse the new entry as part of the same changelog list, potentially causing the CI sweep to skip the qwen3.5-fp8-mi355x-sglang-disagg config entirely.
Reviewed by Cursor Bugbot for commit 224d444. Configure here.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26550085478 |
|
/reuse-sweep-run |


Summary
qwen3.5-fp8-mi355x-sglang-disagginamd-master.yaml: 1P+1D TP8/EP1 smoke sweep for ISL 1K1K and 8K1K, concurrency 8–512, imagelmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511Qwen3.5-397B-A17B-FP8entry inbenchmarks/multi_node/amd_utils/models.yaml(MoRI transfer, FP8 KV cache, aiter attention, DP / no-DP / EP-only prefill and decode profiles)benchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.sh(maps CI sweep params → sharedamd_utils/submit.sh; optionalNODE_LISTfor local smoke on fixed nodes)runners/launch_mi355x-amds.sh:KEEP_LOGS=1skips EXIT trap log cleanup for local debuggingjob.slurm/submit.sh/server.sh(uses existingmaindisagg plumbing)Test plan
amd-master.yamlloads and validates the newqwen3.5-fp8-mi355x-sglang-disaggentry (scenarios.fixed-seq-lenschema)amd_utilsscripts)qwen3.5-fp8-mi355x-sglang-disagg1P+1D, ISL/OSL 1024/1024, conc=64 (target: requests complete, results JSON produced)KEEP_LOGS=1and confirm Slurm logs are retained underbenchmark_logs/Out of scope (follow-up PRs)
utils/bench_serving/benchmark_serving.pyQwen3.5 tokenizer fallbacksetup_deps.shimage patches, v0.5.12 + MoRIconn.pyoverlayenable-dp-attention(is_deepep_class_backend/ shared-expert slot accounting)Co-authors
@chunfangamd
@ChangLiu0709
Note
Risk
Low: only adds new config, model block, and launch script; reuses unmodified
amd_utilsfrommain.Overview
First InferenceX recipe for Qwen3.5-397B-A17B-FP8 PD disaggregation on MI355X using the existing multinode SGLang + MoRI stack. Scope is intentionally narrow (PR-1): CI config, model flags, launch script, and
KEEP_LOGSrunner tweak—no sharedamd_utilsedits, no in-container patches, nobenchmark_servingchanges.Note
Low Risk
Additive benchmark config, models.yaml block, and launch script only; reuses existing multinode disagg plumbing with no shared amd_utils core changes.
Overview
Adds Qwen/Qwen3.5-397B-A17B-FP8 prefill–decode disaggregation on MI355X via a new CI key
qwen3.5-fp8-mi355x-sglang-disagg(sglang-disagg,mi355x-disagg, imagelmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511). Sweeps 1P+1D, TP8/EP1, no MTP, conc 8–512 for 1k/1k and 8k/1k; both rows setdp-attn: falsebecause MoRI + DP-attention currently trips a Qwen3.5 MoE expert-count assertion in upstream SGLang.Adds
Qwen3.5-397B-A17B-FP8inbenchmarks/multi_node/amd_utils/models.yaml(MoRI transfer, FP8 KV, aiter attention, prefill/decode tuning profiles) andbenchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.shto map matrix env vars into existingamd_utils/submit.sh. Documents the change inperf-changelog.yaml.runners/launch_mi355x-amds.sh:KEEP_LOGS=1skips the EXIT trap that deletes Slurm logs (local debug only). Sharedsubmit.sh/job.slurmare unchanged.Reviewed by Cursor Bugbot for commit 224d444. Bugbot is set up for automated code reviews on this repo. Configure here.