[AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark by ChangLiu0709 · Pull Request #1570 · SemiAnalysisAI/InferenceX

ChangLiu0709 · 2026-05-27T13:43:19Z

Summary

Add SGLang disaggregated prefill–decode (PD) benchmark recipe for Qwen/Qwen3.5-397B-A17B-FP8 on MI355X (MoRI backend), mirroring the existing DeepSeek-R1 disagg structure with model-specific tuning (no MTP, no DeepEP dispatch)
New CI config qwen3.5-fp8-mi355x-sglang-disagg in amd-master.yaml: 1P+1D TP8/EP1 smoke sweep for ISL 1K1K and 8K1K, concurrency 8–512, image lmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511
New Qwen3.5-397B-A17B-FP8 entry in benchmarks/multi_node/amd_utils/models.yaml (MoRI transfer, FP8 KV cache, aiter attention, DP / no-DP / EP-only prefill and decode profiles)
New multinode entry script benchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.sh (maps CI sweep params → shared amd_utils/submit.sh; optional NODE_LIST for local smoke on fixed nodes)
runners/launch_mi355x-amds.sh: KEEP_LOGS=1 skips EXIT trap log cleanup for local debugging
No changes to shared job.slurm / submit.sh / server.sh (uses existing main disagg plumbing)

Test plan

Verify CI passes: amd-master.yaml loads and validates the new qwen3.5-fp8-mi355x-sglang-disagg entry (scenarios.fixed-seq-len schema)
Confirm existing SGLang disagg benchmarks are unaffected (this PR does not modify shared amd_utils scripts)
Smoke on mia1: qwen3.5-fp8-mi355x-sglang-disagg 1P+1D, ISL/OSL 1024/1024, conc=64 (target: requests complete, results JSON produced)
Optional local: re-run with KEEP_LOGS=1 and confirm Slurm logs are retained under benchmark_logs/

Out of scope (follow-up PRs)

utils/bench_serving/benchmark_serving.py Qwen3.5 tokenizer fallback
GLM-5 disagg config, setup_deps.sh image patches, v0.5.12 + MoRI conn.py overlay
Upstream sglang fix for Qwen3.5 + MoRI + enable-dp-attention (is_deepep_class_backend / shared-expert slot accounting)

Co-authors

@chunfangamd
@ChangLiu0709

Note

Risk
Low: only adds new config, model block, and launch script; reuses unmodified amd_utils from main.

Overview
First InferenceX recipe for Qwen3.5-397B-A17B-FP8 PD disaggregation on MI355X using the existing multinode SGLang + MoRI stack. Scope is intentionally narrow (PR-1): CI config, model flags, launch script, and KEEP_LOGS runner tweak—no shared amd_utils edits, no in-container patches, no benchmark_serving changes.

Note

Low Risk
Additive benchmark config, models.yaml block, and launch script only; reuses existing multinode disagg plumbing with no shared amd_utils core changes.

Overview
Adds Qwen/Qwen3.5-397B-A17B-FP8 prefill–decode disaggregation on MI355X via a new CI key qwen3.5-fp8-mi355x-sglang-disagg (sglang-disagg, mi355x-disagg, image lmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511). Sweeps 1P+1D, TP8/EP1, no MTP, conc 8–512 for 1k/1k and 8k/1k; both rows set dp-attn: false because MoRI + DP-attention currently trips a Qwen3.5 MoE expert-count assertion in upstream SGLang.

Adds Qwen3.5-397B-A17B-FP8 in benchmarks/multi_node/amd_utils/models.yaml (MoRI transfer, FP8 KV, aiter attention, prefill/decode tuning profiles) and benchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.sh to map matrix env vars into existing amd_utils/submit.sh. Documents the change in perf-changelog.yaml.

runners/launch_mi355x-amds.sh: KEEP_LOGS=1 skips the EXIT trap that deletes Slurm logs (local debug only). Shared submit.sh / job.slurm are unchanged.

^{Reviewed by Cursor Bugbot for commit 224d444. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-27T13:43:41Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-27T14:07:20Z

Claude finished @ChangLiu0709's task in 2m 57s —— View job

Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark

Read CLAUDE.md and repository guidelines
Read all changed files
Validate master config & perf-changelog
Validate model prefix in config keys
Validate container image accessibility
Validate benchmark script code style
Check shared amd_utils changes for correctness
Post review findings

Findings

1 blocking issue found. See inline comment.

🔴 perf-changelog.yaml not updated — .github/configs/amd-master.yaml was modified (new qwen3.5-fp8-mi355x-sglang-disagg entry added) but perf-changelog.yaml has no corresponding entry. A new entry must be appended to the end of perf-changelog.yaml.

What looks good

New config (amd-master.yaml): follows the existing disagg schema (scenarios.fixed-seq-len), valid public Docker Hub image (lmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511), model-prefix: qwen3.5 is consistent with existing Qwen entries.
New benchmark script (qwen3.5_fp8_mi355x_sglang-disagg.sh): mirrors the existing dsr1_fp8_mi355x_sglang-disagg.sh structure exactly, with the addition of ${NODE_LIST:-} for local smoke testing. Clean and minimal.
Sudo auto-detection (job.slurm): well-implemented dual-path — detects passwordless sudo on both the submit node ($SUDO) and compute nodes ($_SUDO), falls back gracefully. Backward-compatible since CI runners have passwordless sudo.
Opt-in decode parallelism flags (server.sh): ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS defaults to false, so older ROCm images aren't affected. The is_truthy() helper is thorough.
Model config (models.yaml): Qwen3.5-397B-A17B-FP8 entry with mtp_flags: "" correctly disables MTP, --moe-dense-tp-size 1 and --disaggregation-transfer-backend mori are Qwen-specific tuning. DP/no-DP/EP-only profiles are well-structured.
KEEP_LOGS=1 (launch_mi355x-amds.sh): simple, clean opt-in for retaining Slurm logs during local debugging.

claude · 2026-05-27T14:16:02Z

+    if [[ "${KEEP_LOGS:-0}" == "1" ]]; then
+        trap '' EXIT
+    else
+        trap 'sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true' EXIT
+    fi


🔴 The new KEEP_LOGS=1 path only disarms the EXIT trap (lines 56-60), but line 178's unconditional sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true (and the $BENCHMARK_LOGS_DIR/logs rm at line 167) still fires on the happy path. So a successful run with KEEP_LOGS=1 and no GITHUB_ACTIONS set wipes slurm_job-${JOB_ID}.{out,err} anyway — exactly the local-debugging use case the PR description claims to support. Fix: gate the rms at lines 167 and 178 on "${KEEP_LOGS:-0}" != "1" as well.

Extended reasoning...

What the bug is

The PR documents KEEP_LOGS=1 as the local-debugging knob that "skips EXIT trap log cleanup" so users can inspect Slurm logs under benchmark_logs/ after a run. The implementation (runners/launch_mi355x-amds.sh:56-60) replaces the EXIT trap with trap '' EXIT when KEEP_LOGS=1:

if [[ "${KEEP_LOGS:-0}" == "1" ]]; then trap '' EXIT else trap 'sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true' EXIT fi

That correctly preserves logs when the script exits early (e.g. before slurm finishes). But the script ALSO has two inline sudo rm -rf calls at the bottom of the IS_MULTINODE=true branch that are unrelated to the trap:

Line 167: sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true

Line 178: sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true

Neither is gated on KEEP_LOGS. On the happy path, both fire after the result-collection phase, so KEEP_LOGS=1 does not actually preserve logs after a successful run — only after an early/failed exit. The PR's own test-plan item ("re-run with KEEP_LOGS=1 and confirm Slurm logs are retained under benchmark_logs/") fails.

Step-by-step proof

Walk through the local-debug case: KEEP_LOGS=1, no GITHUB_ACTIONS set, IS_MULTINODE=true, successful run.

Line 56-60: KEEP_LOGS=1 → trap '' EXIT disarms the EXIT trap. ✓

Lines 62-86: Slurm job submitted, log file $BENCHMARK_LOGS_DIR/slurm_job-${JOB_ID}.out (and .err) gets populated as the job runs. ✓

Lines 88-158: Job completes, result JSONs are copied to $GITHUB_WORKSPACE — but the slurm_job-*.{out,err} files themselves are still in $BENCHMARK_LOGS_DIR. ✓

Line 167: sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" — removes the per-job subdir. Unconditional.

Lines 170-175: if [[ -n "${GITHUB_ACTIONS:-}" ]] block is skipped because we're running locally — so the cp of slurm_job-*.{out,err} to $ARTIFACT_DIR never happens.

Line 178: sudo rm -rf "$BENCHMARK_LOGS_DIR" 2>/dev/null || true — wipes the entire directory, including the slurm out/err files the user wanted to inspect. This is the bug.

Script exits cleanly. EXIT trap is no-op (good — but irrelevant, the damage is already done by step 6).

User looks under benchmark_logs/ — empty. Feature does not work as advertised in its primary use case.

Why existing code doesn't prevent it

The trap disarmament only matters for paths where control exits before reaching line 178 (e.g. the exit 1 at line 81 when the job dies before creating the log file). On any successful run, control always reaches line 178 because there's no early return. The KEEP_LOGS check was added in only one place when it needed to be added in three.

The GH Actions branch at lines 170-175 partially masks the issue in CI: when GITHUB_ACTIONS is set, the slurm out/err files are first copied to $ARTIFACT_DIR ($GITHUB_WORKSPACE/benchmark_artifacts) before the final rm wipes $BENCHMARK_LOGS_DIR. So CI still gets logs uploaded as artifacts — but local users (the entire target audience for KEEP_LOGS=1) get nothing.

Impact

Documented user-facing feature (KEEP_LOGS=1 for local debugging) does not work on the happy path. The PR's own test plan checkbox cannot be satisfied as written. Severity is normal because the feature is new in this PR and ships broken — not a pre-existing issue.

Fix

Gate both inline rms on KEEP_LOGS, mirroring the trap setup:

if [[ "${KEEP_LOGS:-0}" != "1" ]]; then sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true fi

…and similarly at line 178. (Or factor into a helper.) The line-167 rm is arguably also safe to skip on KEEP_LOGS=1 since $BENCHMARK_LOGS_DIR/logs is the per-job subdir with the in-container result tree, which is exactly what a local debugger would want to inspect alongside the slurm out/err files.

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>

chunfangamd · 2026-05-27T16:43:14Z

      - { tp: 4, ep: 1, conc-start: 4, conc-end: 256, spec-decoding: mtp }
      - { tp: 8, ep: 1, conc-start: 4, conc-end: 256, spec-decoding: mtp }

+qwen3.5-fp8-mi355x-sglang-disagg:


Chang, we will need to rework this PR to keep the git commit history

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>

claude · 2026-05-27T21:43:14Z

Claude finished @functionstackx's task in 2m 15s —— View job

Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark

Read CLAUDE.md and repository guidelines
Read all changed files and validate against review checklist
Validate master config & perf-changelog
Validate model prefix in config keys
Validate container image accessibility
Validate benchmark script code style
Check KEEP_LOGS implementation correctness
Post review findings

Findings

1 blocking issue found. See inline comment.

🔴 KEEP_LOGS=1 doesn't guard inline cleanup — The EXIT trap is correctly gated (runners/launch_mi355x-amds.sh:73-74), but the inline sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" at line 191 still fires unconditionally on successful runs. This wipes the per-job result tree even with KEEP_LOGS=1, making the feature ineffective for its primary use case (local debugging of successful runs). Gate line 191 with if [[ "${KEEP_LOGS:-0}" != "1" ]] — see inline comment for the suggested fix.

What looks good

Master config (amd-master.yaml): Valid schema, public Docker Hub image (lmsysorg/sglang-rocm:v0.5.11-rocm700-mi35x-20260511), model-prefix: qwen3.5 consistent with existing entries.
perf-changelog.yaml: Entry correctly appended at the end of the file with proper config-key and PR link.
Benchmark script (qwen3.5_fp8_mi355x_sglang-disagg.sh): Mirrors the existing dsr1_fp8_mi355x_sglang-disagg.sh structure cleanly, with the useful ${NODE_LIST:-} addition for local smoke testing.
Model config (models.yaml): Qwen3.5-397B-A17B-FP8 with mtp_flags: "" correctly disables MTP; MoRI transfer, FP8 KV, aiter attention flags look correct. DP/no-DP/EP-only profiles are well-structured.
8k1k comment was fixed in b564db2 — now accurately says TP8/EP1 smoke sweep with dp-attn.

claude · 2026-05-27T21:45:21Z

+    if [[ "${KEEP_LOGS:-0}" == "1" ]]; then
+        trap '' EXIT
+    else
+        trap cleanup_and_save_logs EXIT
+    fi


🔴 BLOCKING: KEEP_LOGS=1 only disarms the EXIT trap (line 73-74) but the inline sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" at line 191 still fires unconditionally on the happy path, wiping the per-job result tree that a local debugger would want to inspect.

Why it matters: The feature's stated purpose (retain logs under benchmark_logs/ for local debugging) doesn't work for successful runs — only for early/failed exits. The PR's own test plan item ("re-run with KEEP_LOGS=1 and confirm Slurm logs are retained") cannot be satisfied.

Fix: Gate line 191 on KEEP_LOGS as well:

Suggested change

if [[ "${KEEP_LOGS:-0}" == "1" ]]; then

trap '' EXIT

else

trap cleanup_and_save_logs EXIT

fi

+ if [[ "${KEEP_LOGS:-0}" != "1" ]]; then

+ sudo rm -rf "$BENCHMARK_LOGS_DIR/logs" 2>/dev/null || true

+ fi

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com>

github-actions · 2026-05-27T22:47:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26540487421
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26540487421

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com>

github-actions · 2026-05-27T22:53:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26543322403
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26543322403

functionstackx

hi @ChangLiu0709 @chunfangamd @HaiShaw thanks for the SGLang qwen3.5 mi355 disagg PR but unfortunately it doesn't work yet, can u look into fixing this?

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26543322403/job/78198590470?pr=1570

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26543322403/job/78198590469?pr=1570

026-05-28 00:11:09.816 DP0 TP0 EP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3985, in run_scheduler_process
    scheduler = Scheduler(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 434, in __init__
    self.init_model_worker()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 705, in init_model_worker
    self.init_tp_model_worker()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 660, in init_tp_model_worker
    self.tp_worker = TpModelWorker(**worker_kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 262, in __init__
    self._init_model_runner()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 347, in _init_model_runner
    self._model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 532, in __init__
    self.initialize(pre_model_load_memory)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 652, in initialize
    self.load_model()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1445, in load_model
    self.model = self.loader.load_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 716, in load_model
    model = _initialize_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 290, in _initialize_model
    return model_class(**kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1608, in __init__
    super().__init__(config, quant_config, prefix, language_model_cls)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_vl.py", line 1124, in __init__
    self.model = language_model_cls(
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1237, in __init__
    super().__init__(config=config, quant_config=quant_config, prefix=prefix)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1046, in __init__
    self.layers, self._start_layer, self._end_layer = make_layers(
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 675, in make_layers
    + get_offloader().wrap_modules(
  File "/sgl-workspace/sglang/python/sglang/srt/utils/offloader.py", line 36, in wrap_modules
    return list(all_modules_generator)
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 677, in <genexpr>
    layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 1037, in get_layer
    return layer_class(
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen3_5.py", line 554, in __init__
    self.mlp = Qwen2MoeSparseMoeBlock(
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2_moe.py", line 253, in __init__
    self.experts = get_moe_impl_class(quant_config)(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/ep_moe/layer.py", line 613, in __init__
    super().__init__(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/ep_moe/layer.py", line 93, in __init__
    super().__init__(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 212, in __init__
    assert (num_experts - num_shared_slots) % self.moe_ep_size == 0
AssertionError
Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 1279, in _wait_for_scheduler_ready
    data = scheduler_pipe_readers[i].recv()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 69, in <module>
    run_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 50, in run_server
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 2346, in launch_server
    ) = Engine._launch_subprocesses(
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 800, in _launch_subprocesses
    scheduler_init_result.wait_for_ready()
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 651, in wait_for_ready
    infos = _wait_for_scheduler_ready(scheduler_pipe_readers, scheduler_procs)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 1281, in _wait_for_scheduler_ready
    raise _scheduler_died_error(i, scheduler_procs[i])
RuntimeError: Rank 0 scheduler died during initialization (exit code: -3). If exit code is -9 (SIGKILL), a common cause is the OS OOM killer. Run `dmesg -T | grep -i oom` to check.

github-actions · 2026-05-28T00:53:07Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26543322403
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26543322403

github-actions · 2026-05-28T01:56:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26548042416
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26548042416

Introduce CI config, model server flags, multinode launch script, and amd_utils plumbing (sudo auto-detect, optional disagg decode TP/DP flags) for qwen3.5-fp8-mi355x-sglang-disagg smoke sweeps on MI355X. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

The search-space uses 1P+1D TP8/EP1 with dp-attn, not TP2/EP2 from the aggregated qwen3.5-fp8-mi355x-sglang recipe. Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

With --enable-dp-attention + --moe-a2a-backend mori, sglang auto-promotes moe_ep_size=tp_size=8, but is_deepep_class_backend() excludes MoRI, so num_shared_slots stays at the global value (1) and the (num_experts - num_shared_slots) % moe_ep_size assertion in fused_moe_triton/layer.py fires for Qwen3.5 (512 routed + 1 shared). Track upstream sglang for a fix; flip back to dp-attn=true once MoRI is added to is_deepep_class_backend() or shared-slot accounting is reconciled. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

Matches sister sglang-disagg scripts (dsr1_fp8, dsr1_fp4) and the GLM-5 disagg launch script. submit.sh requires FRAMEWORK; surfacing the missing-var failure at the top of the launch script gives a cleaner error than letting it fail deep inside submit.sh. Addresses Cursor Bugbot review comment on PR #1570. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

github-actions · 2026-05-28T02:01:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26550028638
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26550028638

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 224d444. Configure here.}

cursor · 2026-05-28T02:07:35Z

+    - "1P+1D TP8/EP1 smoke sweep for 1k1k and 8k1k (conc 8-512); MoRI transfer backend"
+    - "Add models.yaml server flags and multinode launch script qwen3.5_fp8_mi355x_sglang-disagg.sh"
+    - "8k1k row uses dp-attn=false (matches 1k1k): with --enable-dp-attention + --moe-a2a-backend mori, sglang auto-promotes moe_ep_size=tp_size=8, but is_deepep_class_backend() excludes MoRI, so num_shared_slots stays at the global value (1) and the (num_experts - num_shared_slots) % moe_ep_size assertion in fused_moe_triton/layer.py fires for Qwen3.5 (512 routed + 1 shared). Track upstream sglang; flip back to dp-attn=true once MoRI is added to is_deepep_class_backend() or shared-slot accounting is reconciled."
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1570


Changelog entry indentation inconsistent with existing entries

Medium Severity

The new perf-changelog.yaml entry starts at 0-space indentation (- config-keys:) while all surrounding entries use 2-space indentation ( - config-keys:). In YAML, list items at different indentation levels belong to different structural contexts. This means process_changelog.py may not parse the new entry as part of the same changelog list, potentially causing the CI sweep to skip the qwen3.5-fp8-mi355x-sglang-disagg config entirely.

^{Reviewed by Cursor Bugbot for commit 224d444. Configure here.}

github-actions · 2026-05-28T04:42:12Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26550085478
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26550085478

functionstackx · 2026-05-28T04:46:27Z

/reuse-sweep-run

github-project-automation Bot added this to InferenceMAX Board May 27, 2026

ChangLiu0709 marked this pull request as ready for review May 27, 2026 14:06

ChangLiu0709 requested a review from a team May 27, 2026 14:06

ChangLiu0709 requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 27, 2026 14:06

claude Bot reviewed May 27, 2026

View reviewed changes

Comment thread .github/configs/amd-master.yaml

claude Bot reviewed May 27, 2026

View reviewed changes

ChangLiu0709 force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from a562e2a to 60bd801 Compare May 27, 2026 14:35

ChangLiu0709 pushed a commit that referenced this pull request May 27, 2026

Add perf-changelog entry for qwen3.5-fp8-mi355x-sglang-disagg.

ecaf39f

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>

ChangLiu0709 mentioned this pull request May 27, 2026

[MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated #1572

Merged

4 tasks

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread runners/launch_mi355x-amds.sh

Comment thread benchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.sh

ChangLiu0709 mentioned this pull request May 27, 2026

bench_serving: fallback when vLLM get_tokenizer hits removed API #1573

Merged

3 tasks

functionstackx pushed a commit that referenced this pull request May 27, 2026

Add perf-changelog entry for qwen3.5-fp8-mi355x-sglang-disagg.

a614fe6

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>

functionstackx force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from 20e2974 to b5e21e8 Compare May 27, 2026 16:08

chunfangamd requested changes May 27, 2026

View reviewed changes

chunfangamd marked this pull request as draft May 27, 2026 16:45

functionstackx pushed a commit that referenced this pull request May 27, 2026

Add perf-changelog entry for qwen3.5-fp8-mi355x-sglang-disagg.

a6fa7dc

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com>

functionstackx force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from b5e21e8 to b564db2 Compare May 27, 2026 21:37

functionstackx marked this pull request as ready for review May 27, 2026 21:42

functionstackx added the sweep-enabled label May 27, 2026

functionstackx changed the title ~~[AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark~~ [AMD][WIP- small changes suggested by chunfang] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark May 27, 2026

claude Bot reviewed May 27, 2026

View reviewed changes

functionstackx changed the title ~~[AMD][WIP- small changes suggested by chunfang] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark~~ [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark May 27, 2026

functionstackx removed the sweep-enabled label May 27, 2026

functionstackx force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from b564db2 to 715d44f Compare May 27, 2026 22:47

functionstackx force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from 715d44f to fd83f40 Compare May 27, 2026 22:48

functionstackx added the sweep-enabled label May 27, 2026

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.sh Outdated

functionstackx requested changes May 28, 2026

View reviewed changes

functionstackx removed the sweep-enabled label May 28, 2026

cliu1004@amd.com and others added 4 commits May 27, 2026 21:58

Add perf-changelog entry for qwen3.5-fp8-mi355x-sglang-disagg.

a5944df

Required when adding new amd-master.yaml benchmark configs (PR #1570). Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

functionstackx force-pushed the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch from d3fa9ff to 99fc3fb Compare May 28, 2026 01:59

functionstackx added the non-canary-full-sweep-enabled Run the full sweep without the canary gate (full search space, no trim) label May 28, 2026

cursor Bot reviewed May 28, 2026

View reviewed changes

functionstackx merged commit 7ae613c into main May 28, 2026
27 checks passed

functionstackx deleted the chang/qwen35-fp8-mi355x-sglang-disagg-pr-1 branch May 28, 2026 04:46

github-project-automation Bot moved this to Done in InferenceMAX Board May 28, 2026

Conversation

ChangLiu0709 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Out of scope (follow-up PRs)

Co-authors

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

claude Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark

Findings

What looks good

Uh oh!

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

What the bug is

Step-by-step proof

Why existing code doesn't prevent it

Impact

Fix

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chunfangamd May 27, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark

Findings

What looks good

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

functionstackx left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 28, 2026

Choose a reason for hiding this comment

Changelog entry indentation inconsistent with existing entries

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

functionstackx commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ChangLiu0709 commented May 27, 2026 •

edited

Loading

claude Bot commented May 27, 2026 •

edited

Loading

claude Bot commented May 27, 2026 •

edited

Loading