[MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated by ChangLiu0709 · Pull Request #1572 · SemiAnalysisAI/InferenceX

ChangLiu0709 · 2026-05-27T15:01:44Z

Summary

Add SGLang disaggregated prefill–decode (PD) benchmark recipe for zai-org/GLM-5-FP8 on MI355X (MoRI backend, NSA attention)
New CI config glm5-fp8-mi355x-sglang-disagg in amd-master.yaml: 1P+1D TP8/EP1 smoke sweep for ISL 1K1K and 8K1K, concurrency 8–512, image lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517 (same as aggregated glm5-fp8-mi355x-sglang)
New GLM-5-FP8 entry in benchmarks/multi_node/amd_utils/models.yaml (MoRI transfer, glm tool/reasoning parsers, multithread load)
New multinode entry script benchmarks/multi_node/glm5_fp8_mi355x_sglang-disagg.sh
setup_deps.sh: idempotent in-container patches required for GLM-5 on ROCm images
- aiter gluon pa_mqa_logits 3D instr_shape fix (NSA / Triton ≥ 3.5)
- install transformers with glm_moe_dsa when MODEL_NAME=GLM-5-FP8
env.sh: GLM-5-specific exports (SGLANG_ROCM_FUSED_DECODE_MLA=0, quick reduce, safetensors fast GPU)
server.sh: source setup_deps.sh before env.sh (patches run once per container start)

Test plan

Verify CI passes: amd-master.yaml validates glm5-fp8-mi355x-sglang-disagg (scenarios.fixed-seq-len)
Confirm setup_deps.sh is a no-op or fast on non–GLM-5 disagg jobs (transformers install gated on MODEL_NAME)
Smoke on mia1: glm5-fp8-mi355x-sglang-disagg 1P+1D, ISL/OSL 1024/1024, conc=64
Verify model weights resolve at /it-share/data/GLM-5-FP8 (runner sets MODEL_NAME from zai-org/GLM-5-FP8)

Dependencies

Builds on shared MI355X disagg plumbing from main (and optionally after PR [AMD] Add Qwen3.5 FP8 MI355X SGLang disaggregated benchmark #1570 for Qwen3.5 disagg)
Does not include Qwen3.5 config, benchmark_serving tokenizer fallback, or MoRI conn.py overlay (follow-up PRs)

Co-authors

@ChangLiu0709
@chunfangamd
@Duyi-Wang
@functionstackx

Note

Risk
Medium: server.sh now sources setup_deps.sh for all SGLang disagg runs; gluon patch is global but idempotent; transformers install only runs for GLM-5-FP8.

Overview
Enables GLM-5 PD disaggregation in InferenceX CI. Image patches address gaps not yet in upstream aiter/transformers/docker images; runtime env tweaks reflect GLM-5 NSA architecture vs MLA-based models like DeepSeek-R1.

Note

Medium Risk
Touches all SGLang disagg container starts via setup_deps.sh and can auto-overlay upstream mori/conn.py for v0.5.12.post1 images; changes affect PD state transfer correctness, not auth or data stores.

Overview
Adds GLM-5-FP8 MI355X SGLang prefill–decode disaggregation to CI: new glm5-fp8-mi355x-sglang-disagg in amd-master.yaml (image v0.5.12.post1-rocm720-mi35x-20260523, 1P+1D TP8/EP1, dp-attn: false, conc 8–512 for 1k1k and 8k1k), GLM-5-FP8 in models.yaml, and glm5_fp8_mi355x_sglang-disagg.sh.

Runtime fixes so GLM-5 (NSA / hybrid state) can run on that image: in-tree patches/mori_conn.py bind-mounted from job.slurm when the image tag contains v0.5.12.post1 (opt-out MORI_CONN_PATCH=skip); env.sh GLM-5 exports; setup_deps.sh gated for sglang-disagg (aiter gluon 3D instr_shape, transformers glm_moe_dsa only for GLM-5-FP8); server_sglang.sh sources setup_deps.sh before env.sh.

^{Reviewed by Cursor Bugbot for commit a8722db. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-27T15:01:58Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-27T15:27:27Z

Claude finished @ChangLiu0709's task in 1m 45s —— View job

PR Review: [AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark

Read CLAUDE.md and repo guidelines
Review .github/configs/amd-master.yaml changes
Review perf-changelog.yaml changes
Review models.yaml changes
Review setup_deps.sh (new file)
Review glm5_fp8_mi355x_sglang-disagg.sh (new file)
Review env.sh and server.sh changes
Validate model prefix, image accessibility, perf-changelog position
Post review findings

Summary

One blocking issue: perf-changelog.yaml has pr-link: .../pull/TBD instead of the actual PR number 1572. One minor bug in setup_deps.sh where the "already applied" log line prints the epoch timestamp instead of elapsed wallclock seconds. Otherwise the PR is well-structured — patches are idempotent, the transformers install is properly gated on MODEL_NAME, the disagg config aligns with the existing glm5-fp8-mi355x-sglang aggregated config, and the perf-changelog entry is correctly appended at the end of the file.

claude · 2026-05-27T15:38:59Z

+# Summary
+# =============================================================================
+
+_SETUP_END=$(date +%s)
+if [[ ${#_SETUP_INSTALLED[@]} -eq 0 ]]; then
+    echo "[SETUP] All patches already applied (${_SETUP_END}s wallclock)"
+else
+    echo "[SETUP] Applied: ${_SETUP_INSTALLED[*]} in $(( _SETUP_END - _SETUP_START ))s"
+fi


🟡 Three related logging/observability nits in the new setup_deps.sh:

L96: _SETUP_INSTALLED+=("gluon-instr-shape-fix") runs unconditionally even when the Python heredoc takes one of its three sys.exit(0) early-return paths (target file missing, sentinel already present, pattern not found). The final "Applied: gluon-instr-shape-fix" summary therefore lies on every restart after the first run.

L135: prints the absolute Unix epoch (e.g. 1779895831s wallclock) instead of the elapsed duration — should be $(( _SETUP_END - _SETUP_START ))s to match the else branch on L137. Currently latent because of [NVIDIA] Add TRT-LLM 70B FP8 via slurm #1.

L117-119: pip install exit status is never checked, so a network/dep/SHA-GC failure still appends transformers-glm5 to _SETUP_INSTALLED and the script reports success; the job then crashes much later at sglang model load with a confusing unknown model type: glm_moe_dsa. install_transformers_glm5 would do well to follow the explicit-return-code pattern it almost already uses.

Extended reasoning...

What the bug is

This is a small bundle of observability defects in the brand-new benchmarks/multi_node/amd_utils/setup_deps.sh (sourced from server.sh on every disagg server start). None of them break the happy path, but together they degrade the diagnostic value of the [SETUP] log — which is the only thing an operator has when an image bump or pinned-SHA drift breaks the patches.

(1) _SETUP_INSTALLED appended unconditionally for the gluon patch (L96)

patch_gluon_pa_mqa_logits_instr_shape runs a Python heredoc that has three sys.exit(0) early returns:

L38 target file missing (/sgl-workspace/aiter/aiter/ops/triton/gluon/pa_mqa_logits.py not found)

L44 sentinel [PATCHED] 3D instr_shape for base gluon variant already in source — patch already applied

L88 old block not in source — aiter version drift, prints WARN

All three exit with code 0, so bash sees no failure. The very next line in bash — _SETUP_INSTALLED+=("gluon-instr-shape-fix") at L96 — runs regardless. After the first server start on a given container, the sentinel is in place, so case (2) is the common case, and yet every subsequent restart will log:

[SETUP] Applied: gluon-instr-shape-fix in 0s

…even though the function was a pure no-op. Compare with install_transformers_glm5 (L106) which only appends after pip install actually runs (its two early return 0 paths happen before the array append).

(2) Epoch printed as wallclock (L135)

_SETUP_END=$(date +%s) if [[ ${#_SETUP_INSTALLED[@]} -eq 0 ]]; then echo "[SETUP] All patches already applied (${_SETUP_END}s wallclock)" # ← BUG else echo "[SETUP] Applied: ${_SETUP_INSTALLED[*]} in $(( _SETUP_END - _SETUP_START ))s" fi

${_SETUP_END} is the absolute Unix epoch — when this branch fires it will print something like 1779895831s wallclock, which is gibberish. The else branch already does the subtraction correctly. Currently latent thanks to (1) — the array is always non-empty — but the moment (1) is fixed this branch starts firing and the log line is wrong. Should also be $(( _SETUP_END - _SETUP_START ))s.

(3) pip install exit status not checked (L117-119)

pip install --quiet -U --no-cache-dir \ "git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2" _SETUP_INSTALLED+=("transformers-glm5")

The function header is bare #!/bin/bash (no set -e/set -o pipefail), so a non-zero exit from pip install just continues. Plausible failure modes: cloning github.com over a flaky link, dep resolution conflict with packages already baked into the SGLang container, the pinned commit SHA being rewritten/GC'd upstream, or git missing from the image. In all of these the function returns success, [SETUP] Applied: ... transformers-glm5 is printed, server.sh launches sglang, and the job dies at model load with AutoConfig: unknown model type: glm_moe_dsa — several minutes later, in a completely different layer of the stack.

Step-by-step proof

Imagine a clean container start where the aiter base image already contains the gluon fix (e.g. a future image bump that picks it up upstream) and we restart the server:

patch_gluon_pa_mqa_logits_instr_shape runs.

Python heredoc reads the file, sees the sentinel [PATCHED] 3D instr_shape for base gluon variant is not there (upstream uses different wording) — actually, let me pick the more common scenario: this is the second server start on the same container. The sentinel is in the source (we wrote it on the first start). Python prints gluon pa_mqa_logits 3D instr_shape patch already applied and sys.exit(0) (L44 of the heredoc).

Bash returns from python3 -c '...' with $?=0.

_SETUP_INSTALLED+=("gluon-instr-shape-fix") executes (L96).

install_transformers_glm5 runs. If MODEL_NAME != GLM-5-FP8 it returns immediately. Otherwise the AutoConfig probe succeeds (transformers was installed on first start), echo transformers already supports GLM-5 is printed, and the function returns without appending.

Final summary: _SETUP_INSTALLED=("gluon-instr-shape-fix") — length 1, not 0, so the else branch fires:
[SETUP] Applied: gluon-instr-shape-fix in 0s — misleading: nothing was applied.

If we ever did manage to hit the if-branch (e.g. someone reorders or removes patches), we would get [SETUP] All patches already applied (1779895831s wallclock) — gibberish.

For (3): on a hypothetical first-start where pip install fails (DNS hiccup, GC'd SHA, …), pip writes its error to stderr, returns non-zero, bash unconditionally appends transformers-glm5, and the summary logs success. Then server.sh continues, sglang launches, model load fails with unknown model_type: glm_moe_dsa.

Why the existing pattern doesn't protect us

The Python sentinel check makes the patch itself idempotent, but doesn't constrain the bash-side array.

--quiet on pip install only suppresses progress, not errors, so a refutation argument might say "the error is visible in stderr". That's fair, but the summary line lies, and the lie comes after the error in the log — operators triaging from the end of the log (Applied: ... transformers-glm5) reasonably assume the install succeeded. Plus, the downstream sglang error (unknown model type: glm_moe_dsa) does eventually point an experienced operator at transformers compatibility, but conflating "pip failed" with "transformers shipped doesn't have glm_moe_dsa" wastes time on the wrong hypothesis (image bump? upstream model code change? need a newer pin?). Failing fast at setup_deps.sh with a clear ERROR: transformers install failed is strictly more informative.

How to fix

All three are one-line nits and worth fixing together since the file is brand new:

Gate the gluon append on a real success signal. Easiest: have the Python script print a known token (e.g. APPLIED) only on success, capture it in bash, and append conditionally. Or use distinct non-zero exits for the no-op cases. (sys.exit(0) everywhere is the wrong abstraction here.)

L135: change ${_SETUP_END}s wallclock → $(( _SETUP_END - _SETUP_START ))s wallclock.

L117-119: pip install ... || { echo "ERROR: transformers install failed" >&2; return 1; } — and have server.sh propagate the failure (e.g. source setup_deps.sh || exit 1).

Addressing the refutation

A verifier noted that pip errors aren't fully silent (true — they go to stderr), that the downstream sglang failure is itself somewhat diagnostic (true — for someone already familiar with the stack), and that style-consistency favors leaving things as-is (the gluon patch also doesn't check its sub-process). I agree (3) on its own is nit severity — pip will scream into stderr and the job will eventually die. But: (a) the same file already has a working explicit-return-code pattern in install_transformers_glm5's AutoConfig probe, so adding a one-line guard after the pip call is consistent with this function's own style, not a foreign import; (b) the three issues compound — the summary line currently lies on the happy path (#1) and would print gibberish in the dead-code branch (#2) and would lie on the pip-failure path (#3) — together they undermine the only signal an operator has when an image bump breaks the patches. Worth bundling into one cleanup pass on a brand-new file.

Sub-issue #2 (epoch → elapsed at L766) was picked up in 8315abd, thanks. The other two are still latent:

[NVIDIA] Add TRT-LLM 70B FP8 via slurm #1: _SETUP_INSTALLED+=("gluon-instr-shape-fix") at L712 still runs after every sys.exit(0) path in the heredoc — so [SETUP] Installed: gluon-instr-shape-fix will be logged on every container restart even when the patch was a no-op (sentinel already present, the common case). One-line fix: append only when the heredoc actually wrote the file (e.g. exit non-zero on no-op paths, or print a sentinel token and grep for it).

[NVIDIA] update vllm b200 image. TODO: add logic for docker runner. #3: pip install … at L733-734 still has its exit status discarded; a network/SHA-GC failure appends transformers-glm5 and the summary line lies. pip install … || { echo "ERROR: transformers install failed" >&2; return 1; } matches the explicit-return-code style this function already uses for the AutoConfig probe.

Both are still nit severity individually, but together with #2 fixed the file is one cleanup pass away from a clean observability story.

claude · 2026-05-27T21:43:26Z

Claude finished @functionstackx's task in 2m 56s —— View job

PR Review: [AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark

Read CLAUDE.md and repo guidelines
Review .github/configs/amd-master.yaml changes
Review perf-changelog.yaml changes
Review models.yaml changes
Review setup_deps.sh changes
Review glm5_fp8_mi355x_sglang-disagg.sh (new file)
Review env.sh and server_sglang.sh changes
Validate model prefix, image accessibility, perf-changelog position
Post review findings

Summary

Previous blocking issues from the first review have been fixed: perf-changelog.yaml now has the correct PR link (#1572 instead of TBD), and the epoch timestamp bug on the "already present" log line has been corrected.

Two remaining warnings posted as inline comments on setup_deps.sh:

Gluon patch tracking (L712) — _SETUP_INSTALLED+=("gluon-instr-shape-fix") appends unconditionally even when the Python heredoc takes an early-exit (no-op) path, so the summary lies on every restart after the first run. This matches the pre-existing vLLM pattern (L518, L629) so not blocking, but worth fixing since the file is being reworked.
Unchecked pip install (L733-735) — if the transformers install fails (network, GC'd SHA, dep conflict), the script still reports success and the job crashes minutes later at model load with a confusing unknown model type error. Adding || { echo "ERROR" >&2; return 1; } would fail fast with a clear message.

Everything else looks correct: config aligns with existing glm5-fp8-mi355x-sglang aggregated config, image is publicly accessible on Docker Hub, perf-changelog entry is appended at the end, model-prefix: glm5 is consistent with existing configs, ENGINE dispatch logic is sound, GLM-5 env tuning (disable fused-decode-MLA for NSA) is appropriate, and the launch script follows the established disagg pattern.

claude · 2026-05-27T21:46:07Z

+open(target, "w").write(new_src)
+print("[SETUP] Patched: gluon pa_mqa_logits 3D instr_shape for base variant")
+'
+    _SETUP_INSTALLED+=("gluon-instr-shape-fix")


🟡 WARNING: _SETUP_INSTALLED+=("gluon-instr-shape-fix") runs unconditionally — even when the Python heredoc exits early (file not found, sentinel present, pattern missing). All three early-return paths use sys.exit(0), so bash always reaches this line.

Why it matters: On every container restart after the first run, the summary falsely reports Applied: gluon-instr-shape-fix when nothing was actually patched. This also means the "All dependencies already present" branch on L765 can never fire for SGLang runs.

Fix: Gate the append on actual patch application. Easiest approach — check Python's stdout:

Suggested change

_SETUP_INSTALLED+=("gluon-instr-shape-fix")

if python3 -c '...' 2>&1 | grep -q "^\\[SETUP\\] Patched:"; then

_SETUP_INSTALLED+=("gluon-instr-shape-fix")

fi

Or use distinct exit codes (e.g., sys.exit(0) for success, sys.exit(2) for no-op).

Note: This matches a pre-existing pattern from the vLLM patches (L518, L629), so not blocking, but worth cleaning up since this file is being substantially reworked in this PR anyway.

claude · 2026-05-27T21:46:15Z

+    pip install --quiet -U --no-cache-dir \
+        "git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2"
+    _SETUP_INSTALLED+=("transformers-glm5")


🟡 WARNING: pip install exit status is not checked. If it fails (network issue, GC'd commit SHA, dep conflict), transformers-glm5 is still appended to _SETUP_INSTALLED and the summary reports success. The job then crashes at model load with unknown model type: glm_moe_dsa — minutes later, in a different layer.

Why it matters: Operators triaging from the end of the log see Applied: transformers-glm5 and reasonably assume the install succeeded, wasting time on the wrong hypothesis.

Fix:

Suggested change

pip install --quiet -U --no-cache-dir \

"git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2"

_SETUP_INSTALLED+=("transformers-glm5")

pip install --quiet -U --no-cache-dir \

"git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2" \

|| { echo "[SETUP] ERROR: transformers install failed" >&2; return 1; }

_SETUP_INSTALLED+=("transformers-glm5")

github-actions · 2026-05-27T21:48:13Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26540490749
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26540490749

claude · 2026-05-27T21:53:34Z

+    $PREFILL_NUM_WORKERS \
+    $DECODE_NODES \
+    $DECODE_NUM_WORKERS \
+    $ISL $OSL "${CONC_LIST// /x}" inf \
+    ${PREFILL_ENABLE_EP} ${PREFILL_ENABLE_DP} \
+    ${DECODE_ENABLE_EP} ${DECODE_ENABLE_DP} \
+    ${PREFILL_TP} ${DECODE_TP} \
+    ${RANDOM_RANGE_RATIO} \
+    ${NODE_LIST:-})


🟡 Two copy-paste inconsistencies vs sister disagg launch scripts: (1) line 76 passes ${NODE_LIST:-} (unquoted, with underscore) while sister kimik2.5/minimaxm2.5 vllm-disagg scripts use quoted "${NODELIST:-}" — a caller that exports the documented NODELIST env var will have it silently dropped here because this script reads the (undefined) $NODE_LIST; (2) the check_env_vars list at lines 5-22 omits FRAMEWORK, which both dsr1_fp8_mi355x_sglang-disagg.sh:23 and dsr1_fp4_mi355x_sglang-disagg.sh:23 include as the 18th required var. Fix: rename to "${NODELIST:-}" (quoted) and append FRAMEWORK to the check_env_vars list to match the byte-near-identical siblings.

Extended reasoning...

What the bug is

The new glm5_fp8_mi355x_sglang-disagg.sh is essentially copy-pasted from the sister disagg launch scripts but drifts in two small ways at the file boundaries:

Issue 1 — NODE_LIST vs NODELIST (L76). This file passes the optional 16th positional arg to submit.sh as ${NODE_LIST:-} (unquoted, underscored). The two existing vllm-disagg sisters in the same directory both pass it as "${NODELIST:-}" (quoted, no underscore):

benchmarks/multi_node/kimik2.5_fp4_mi355x_vllm-disagg.sh:73 → "${NODELIST:-}"

benchmarks/multi_node/minimaxm2.5_fp8_mi355x_vllm-disagg.sh:71 → "${NODELIST:-}"

The dsr1 sglang-disagg sisters don't pass arg 16 at all, so the only established convention for arg-16 in this directory is NODELIST. submit.sh:91 reads positional arg 16 into a local var also called NODE_LIST, which is why this is purely a launch-script-level mismatch — the caller-facing env var is what differs.

Issue 2 — FRAMEWORK omitted from check_env_vars (L5-L22). Both sglang-disagg sisters include FRAMEWORK as the 18th required var:

dsr1_fp8_mi355x_sglang-disagg.sh:23 → FRAMEWORK

dsr1_fp4_mi355x_sglang-disagg.sh:23 → FRAMEWORK

submit.sh:70 enforces check_env FRAMEWORK and submit.sh:97 uses it to drive export ENGINE="${FRAMEWORK:-sglang}", so a missing FRAMEWORK will fail anyway — just deeper in the stack, defeating the early-validation purpose of check_env_vars.

Step-by-step proof of impact

For issue 1, imagine a CI runner or operator following the documented sister-script pattern and exporting NODELIST=mia1-01,mia1-02 before invoking the recipe:

runner exports NODELIST=mia1-01,mia1-02.

Launch script kimik2.5/minimaxm2.5 → expands to "mia1-01,mia1-02" as positional arg 16 → submit.sh reads it into its internal NODE_LIST → sbatch gets the node pinning.

Launch script glm5 → expands ${NODE_LIST:-} → empty string (because NODE_LIST was never exported, only NODELIST was) → arg 16 is empty → sbatch never gets the pinning.

The user's intent silently disappears. The unquoted form would also word-split a whitespace-containing value into multiple positional args, vs the quoted sister form which preserves it as a single arg.

For issue 2, if FRAMEWORK is unset for this new recipe, the failure mode is the same (job fails) but the error message originates from amd_utils/submit.sh:70 instead of the launch script. The launch script's whole purpose of doing early environment validation is bypassed for this variable.

Why existing code doesn't prevent it

For (1), submit.sh happens to use the same internal name NODE_LIST, which is why this is easy to miss in code review — both names "work" at the submit boundary. The caller-side mismatch only surfaces when someone actually exports the env var.

For (2), submit.sh:70 does enforce FRAMEWORK downstream, so the job will fail closed, not silently corrupt output. But the launch script's purpose is to surface missing config at the earliest, most-targeted location.

Addressing the verifier note on current latency

A verifier observed that no code in the repo currently exports either NODELIST or NODE_LIST, so today both kimi/minimax and glm5 disagg jobs run without explicit node pinning — the inconsistency is dormant. That's fair: this is a latent bug, not a current-CI breakage. But the file is brand new in this PR, the sister-script pattern is right there to copy, and the fix is a one-character rename plus quotes plus one line in check_env_vars. Best fixed at introduction time so future runners that adopt node-pinning don't get a silent feature-loss bug on this particular recipe.

How to fix

check_env_vars \\ ... - RANDOM_RANGE_RATIO + RANDOM_RANGE_RATIO \\ + FRAMEWORK ... ${RANDOM_RANGE_RATIO} \\ - ${NODE_LIST:-}) + "${NODELIST:-}")

github-actions · 2026-05-27T23:14:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26543294366
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26543294366

github-actions · 2026-05-28T00:05:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26544255929
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26544255929

functionstackx

hi @ChangLiu0709 @chunfangamd @HaiShaw thanks for the SGLang GLM5 mi355 disagg PR but unfortunately it doesn't work yet, can u look into fixing this?

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26544255929/job/78192785656?pr=1572

[2026-05-27 23:38:59.096 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 4049, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1547, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3932, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1606, in event_loop_overlap_disagg_decode
    self.process_decode_queue()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1773, in process_decode_queue
    req_conns, _ = self.disagg_decode_prealloc_queue.pop_preallocated()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 735, in pop_preallocated
    self._resolve_pending_reqs()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 729, in _resolve_pending_reqs
    decode_req.kv_receiver.init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1407, in init
    super().init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 783, in init
    self._setup_bootstrap_infos()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 835, in _setup_bootstrap_infos
    self._register_kv_args()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1423, in _register_kv_args
    packed_state_item_lens = b"".join(
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1424, in <genexpr>
    struct.pack("I", item_len)
struct.error: required argument is not an integer
[2026-05-27 23:38:59.097 TP7] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 4049, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1547, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3932, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1606, in event_loop_overlap_disagg_decode
    self.process_decode_queue()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1773, in process_decode_queue
    req_conns, _ = self.disagg_decode_prealloc_queue.pop_preallocated()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 735, in pop_preallocated
    self._resolve_pending_reqs()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 729, in _resolve_pending_reqs
    decode_req.kv_receiver.init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1407, in init
    super().init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 783, in init
    self._setup_bootstrap_infos()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 835, in _setup_bootstrap_infos
    self._register_kv_args()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1423, in _register_kv_args
    packed_state_item_lens = b"".join(
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1424, in <genexpr>
    struct.pack("I", item_len)
struct.error: required argument is not an integer

cursor · 2026-05-28T02:05:41Z

+    echo "[SETUP] Installing transformers with GLM-5 (glm_moe_dsa) support..."
+    pip install --quiet -U --no-cache-dir \
+        "git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2"
+    _SETUP_INSTALLED+=("transformers-glm5")


Missing post-install verification for transformers pip install

Medium Severity

install_transformers_glm5 does not verify that the pip install from the pinned git commit actually succeeded before recording it in _SETUP_INSTALLED. The analogous install_amd_quark function in the same file includes a post-install check (python3 -c "import quark"), but this function blindly proceeds. If the git clone or pip install fails (network issue, transient error), the GLM-5 model will fail to load later with a cryptic error that's hard to debug in CI.

^{Reviewed by Cursor Bugbot for commit c3bef8f. Configure here.}

…p crash (#1578) sglang v0.5.12.post1 ships an unmigrated MoRI PD-disagg backend (legacy singular `state_type` + flat-int wire format) that crashes hybrid-attention models at PD-disagg startup. PR #1572's CI run (actions/runs/26544255929/job/78192785656) hit this exact failure: File ".../disaggregation/mori/conn.py", line 1424, in <genexpr> struct.pack("I", item_len) struct.error: required argument is not an integer Root cause: KVArgs.state_item_lens is List[List[int]] for any model with state_types: List[StateType] (Qwen3.5-MoE, GLM-5 NSA, etc.), but MoRI's `_register_kv_args` iterates and packs each element with naked `struct.pack("I", item_len)`, expecting flat List[int]. Two further sites in `send_state` and `_send_swa_dsa_state` have the same legacy-API assumption. This PR ports the validated overlay from Chun Fang's working branch (chun-chang/sglang-disagg-qwen3.5, commits 48e459b + c4e397d) and stacks it on top of PR #1572: - benchmarks/multi_node/amd_utils/patches/mori_conn.py (new, 1665 lines) Drop-in replacement for sglang v0.5.12.post1's disaggregation/mori/conn.py with four conservative patches: 1. Sender flatten — handle nested state_item_lens 2. state_type plural-API fallback (matches Mooncake/NIXL) 3. Consumer normalize state_item_lens at send_state entry 4. SWA/DSA rank+length normalize before group_concurrent_contiguous (fixes GLM-5 DSA single-component np.diff broadcast crash) - benchmarks/multi_node/amd_utils/patches/README.md (new) Bug analysis, when-to-use table, opt-out knob documentation. - benchmarks/multi_node/amd_utils/job.slurm (+25) Auto-bind-mount the overlay when DOCKER_IMAGE_NAME contains "v0.5.12.post1". Opt-out via MORI_CONN_PATCH=skip. Appends to ${EXTRA_DOCKER_MOUNTS:-} so callers can still inject other mounts. - .github/configs/amd-master.yaml (+1/-1) Image bump v0.5.12-...-20260517 → v0.5.12.post1-...-20260523 to unlock the auto-apply gate (matches chun-chang lineage). - perf-changelog.yaml (+2/-1) Document the image bump and overlay rationale. Validated on chun-chang at the same image tag: glm5-fp8-mi355x-sglang-disagg GSM8K strict-match = 0.9712 ± 0.0046 glm5-fp8-mi355x-sglang-disagg GSM8K flexible-extract = 0.9704 ± 0.0047 Stop-gap until sglang migrates MoRI to the plural state_types API that Mooncake (mooncake/conn.py:912) and NIXL (nixl/conn.py:1381) already use. Tracks sgl-project/sglang#21886 and #22665. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

Introduce glm5-fp8-mi355x-sglang-disagg CI config, model server flags, launch script, setup_deps.sh image patches, and GLM-5 env tuning for MoRI PD disaggregation on MI355X. Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

- Add missing FRAMEWORK to check_env_vars list to match sister sglang-disagg scripts (dsr1_fp8, dsr1_fp4) - Rename NODE_LIST to NODELIST (quoted) to match the convention used by kimik2.5/minimaxm2.5 vllm-disagg sisters Co-authored-by: functionstackx <functionstackx@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

…p crash (#1578) sglang v0.5.12.post1 ships an unmigrated MoRI PD-disagg backend (legacy singular `state_type` + flat-int wire format) that crashes hybrid-attention models at PD-disagg startup. PR #1572's CI run (actions/runs/26544255929/job/78192785656) hit this exact failure: File ".../disaggregation/mori/conn.py", line 1424, in <genexpr> struct.pack("I", item_len) struct.error: required argument is not an integer Root cause: KVArgs.state_item_lens is List[List[int]] for any model with state_types: List[StateType] (Qwen3.5-MoE, GLM-5 NSA, etc.), but MoRI's `_register_kv_args` iterates and packs each element with naked `struct.pack("I", item_len)`, expecting flat List[int]. Two further sites in `send_state` and `_send_swa_dsa_state` have the same legacy-API assumption. This PR ports the validated overlay from Chun Fang's working branch (chun-chang/sglang-disagg-qwen3.5, commits 48e459b + c4e397d) and stacks it on top of PR #1572: - benchmarks/multi_node/amd_utils/patches/mori_conn.py (new, 1665 lines) Drop-in replacement for sglang v0.5.12.post1's disaggregation/mori/conn.py with four conservative patches: 1. Sender flatten — handle nested state_item_lens 2. state_type plural-API fallback (matches Mooncake/NIXL) 3. Consumer normalize state_item_lens at send_state entry 4. SWA/DSA rank+length normalize before group_concurrent_contiguous (fixes GLM-5 DSA single-component np.diff broadcast crash) - benchmarks/multi_node/amd_utils/patches/README.md (new) Bug analysis, when-to-use table, opt-out knob documentation. - benchmarks/multi_node/amd_utils/job.slurm (+25) Auto-bind-mount the overlay when DOCKER_IMAGE_NAME contains "v0.5.12.post1". Opt-out via MORI_CONN_PATCH=skip. Appends to ${EXTRA_DOCKER_MOUNTS:-} so callers can still inject other mounts. - .github/configs/amd-master.yaml (+1/-1) Image bump v0.5.12-...-20260517 → v0.5.12.post1-...-20260523 to unlock the auto-apply gate (matches chun-chang lineage). - perf-changelog.yaml (+2/-1) Document the image bump and overlay rationale. Validated on chun-chang at the same image tag: glm5-fp8-mi355x-sglang-disagg GSM8K strict-match = 0.9712 ± 0.0046 glm5-fp8-mi355x-sglang-disagg GSM8K flexible-extract = 0.9704 ± 0.0047 Stop-gap until sglang migrates MoRI to the plural state_types API that Mooncake (mooncake/conn.py:912) and NIXL (nixl/conn.py:1381) already use. Tracks sgl-project/sglang#21886 and #22665. Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

functionstackx · 2026-05-28T05:03:09Z

hi @ChangLiu0709 @chunfangamd @HaiShaw thanks for the SGLang GLM5 mi355 disagg PR but unfortunately it doesn't work yet, can u look into fixing this?

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26544255929/job/78192785656?pr=1572

[2026-05-27 23:38:59.096 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 4049, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1547, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3932, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1606, in event_loop_overlap_disagg_decode
    self.process_decode_queue()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1773, in process_decode_queue
    req_conns, _ = self.disagg_decode_prealloc_queue.pop_preallocated()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 735, in pop_preallocated
    self._resolve_pending_reqs()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 729, in _resolve_pending_reqs
    decode_req.kv_receiver.init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1407, in init
    super().init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 783, in init
    self._setup_bootstrap_infos()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 835, in _setup_bootstrap_infos
    self._register_kv_args()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1423, in _register_kv_args
    packed_state_item_lens = b"".join(
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1424, in <genexpr>
    struct.pack("I", item_len)
struct.error: required argument is not an integer
[2026-05-27 23:38:59.097 TP7] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 4049, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1547, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3932, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_decode()
  File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1606, in event_loop_overlap_disagg_decode
    self.process_decode_queue()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 1773, in process_decode_queue
    req_conns, _ = self.disagg_decode_prealloc_queue.pop_preallocated()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 735, in pop_preallocated
    self._resolve_pending_reqs()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 729, in _resolve_pending_reqs
    decode_req.kv_receiver.init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1407, in init
    super().init(prefill_dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 783, in init
    self._setup_bootstrap_infos()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/common/conn.py", line 835, in _setup_bootstrap_infos
    self._register_kv_args()
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1423, in _register_kv_args
    packed_state_item_lens = b"".join(
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py", line 1424, in <genexpr>
    struct.pack("I", item_len)
struct.error: required argument is not an integer

merged short term patch via chun's patch #1578

+                            len(self.state_mem_descs),
+                            inferred,
+                        )
+                    except Exception:


+                        len(dst_state_indices),
+                        new_common,
+                    )
+                except Exception:


+                    len(_raw_state_dim_per_tensor),
+                    _raw_state_dim_per_tensor[:3],
+                )
+            except Exception:


cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a8722db. Configure here.}

cursor · 2026-05-28T05:08:12Z

+                    pass
+                self.__class__._logged_dsa_index_flatten = True
+            src_state_indices = src_state_indices[:new_common]
+            dst_state_indices = dst_state_indices[:new_common]


Ravel after truncation causes silent state data loss

High Severity

PATCH #4's .ravel() normalization is placed after the initial length-based truncation at lines 1135–1149, but that truncation uses len() which returns the outer dimension of a 2D array. When src_state_indices is 2D shape (1, N) (from _normalize_state_indices) and dst_state_indices is 1D shape (N,) (from np.frombuffer in TransferInfo.from_zmq), the pre-truncation computes common_len = min(1, N) = 1 and truncates dst_state_indices to a single element. The subsequent ravel then finds mismatched lengths and re-truncates both to 1 element, silently discarding N−1 state entries. The ravel needs to happen before the initial length comparison, not after it.

Additional Locations (1)

benchmarks/multi_node/amd_utils/patches/mori_conn.py#L1134-L1149

^{Reviewed by Cursor Bugbot for commit a8722db. Configure here.}

cursor · 2026-05-28T05:08:12Z

+                            out.append(int(x))
+                else:
+                    out.append(int(entry))
+            return out


Duplicated flatten helpers across two methods

Low Severity

_flatten_state_seq (nested inside send_state) and _flatten_skip_none (nested inside _register_kv_args) implement identical flattening logic for List[List[int]] → List[int]. Having two copies means a bug fix in one won't automatically propagate to the other. A single module-level helper would eliminate this duplication.

Additional Locations (1)

benchmarks/multi_node/amd_utils/patches/mori_conn.py#L1526-L1551

^{Reviewed by Cursor Bugbot for commit a8722db. Configure here.}

github-actions · 2026-05-28T07:36:47Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26555773349
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26555773349

Oren resolved the issue.

chunfangamd

LGTM

functionstackx · 2026-05-28T16:28:25Z

/reuse-sweep-run

github-project-automation Bot added this to InferenceMAX Board May 27, 2026

ChangLiu0709 marked this pull request as ready for review May 27, 2026 15:26

ChangLiu0709 requested a review from a team May 27, 2026 15:26

ChangLiu0709 requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 27, 2026 15:26

claude Bot reviewed May 27, 2026

View reviewed changes

Comment thread perf-changelog.yaml Outdated

claude Bot reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/multi_node/amd_utils/setup_deps.sh Outdated

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/multi_node/amd_utils/setup_deps.sh

claude Bot reviewed May 27, 2026

View reviewed changes

chunfangamd previously requested changes May 27, 2026

View reviewed changes

Comment thread .github/configs/amd-master.yaml

chunfangamd marked this pull request as draft May 27, 2026 16:44

functionstackx force-pushed the chang/glm5-fp8-mi355x-sglang-disagg-pr-2 branch from 98f1645 to 8315abd Compare May 27, 2026 21:40

functionstackx marked this pull request as ready for review May 27, 2026 21:43

functionstackx added the sweep-enabled label May 27, 2026

functionstackx changed the title ~~[AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark~~ [AMD][WIP- small changes suggested by chunfang] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark May 27, 2026

claude Bot reviewed May 27, 2026

View reviewed changes

functionstackx removed the sweep-enabled label May 27, 2026

functionstackx force-pushed the chang/glm5-fp8-mi355x-sglang-disagg-pr-2 branch from 8315abd to 0a39b9e Compare May 27, 2026 21:48

functionstackx changed the title ~~[AMD][WIP- small changes suggested by chunfang] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark~~ [AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark May 27, 2026

functionstackx added the sweep-enabled label May 27, 2026

claude Bot reviewed May 27, 2026

View reviewed changes

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/multi_node/glm5_fp8_mi355x_sglang-disagg.sh Outdated

functionstackx removed the sweep-enabled label May 28, 2026

functionstackx reviewed May 28, 2026

View reviewed changes

functionstackx force-pushed the chang/glm5-fp8-mi355x-sglang-disagg-pr-2 branch from 02e5a2b to 70ee756 Compare May 28, 2026 01:59

functionstackx changed the title ~~[AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark~~ [AMD][MoRI broken] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark May 28, 2026

cursor Bot reviewed May 28, 2026

View reviewed changes

This was referenced May 28, 2026

short term patch: GLM-5 disagg: port MoRI conn.py overlay to fix PD startup crash #1578

Merged

[Bug] AMD glm5 & dsv4 MoRI disagg connector error: struct.error: required argument is not an integer sgl-project/sglang#26525

Open

cliu1004@amd.com and others added 3 commits May 28, 2026 01:01

Update benchmarks/multi_node/amd_utils/setup_deps.sh

f7bac55

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: chunfangamd <chun.fang@amd.com> Co-authored-by: ChangLiu0709 <cliu1004@amd.com>

functionstackx force-pushed the chang/glm5-fp8-mi355x-sglang-disagg-pr-2 branch from ca5efd9 to f3dcff0 Compare May 28, 2026 05:01

functionstackx added the non-canary-full-sweep-enabled Run the full sweep without the canary gate (full search space, no trim) label May 28, 2026

github-code-quality Bot found potential problems May 28, 2026

View reviewed changes

functionstackx changed the title ~~[AMD][MoRI broken] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark~~ [MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated May 28, 2026

cursor Bot reviewed May 28, 2026

View reviewed changes

chunfangamd approved these changes May 28, 2026

View reviewed changes

chunfangamd enabled auto-merge (squash) May 28, 2026 08:10

ChangLiu0709 mentioned this pull request May 28, 2026

[ROCm] Hotpatch aiter gluon pa_mqa_logits 3D instr_shape for GLM-5 (Triton 3.5+) sgl-project/sglang#26572

Open

4 tasks

functionstackx disabled auto-merge May 28, 2026 16:27

functionstackx merged commit 466be35 into main May 28, 2026
54 checks passed

functionstackx deleted the chang/glm5-fp8-mi355x-sglang-disagg-pr-2 branch May 28, 2026 16:28

github-project-automation Bot moved this to Done in InferenceMAX Board May 28, 2026

Conversation

ChangLiu0709 commented May 27, 2026 • edited by chunfangamd Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Dependencies

Co-authors

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

claude Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: [AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

What the bug is

(1) _SETUP_INSTALLED appended unconditionally for the gluon patch (L96)

(2) Epoch printed as wallclock (L135)

(3) pip install exit status not checked (L117-119)

Step-by-step proof

Why the existing pattern doesn't protect us

How to fix

Addressing the refutation

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: [AMD] Add GLM-5 FP8 MI355X SGLang disaggregated benchmark

Summary

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

claude Bot May 27, 2026

Choose a reason for hiding this comment

What the bug is

Step-by-step proof of impact

Why existing code doesn't prevent it

Addressing the verifier note on current latency

How to fix

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

functionstackx left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 28, 2026

Choose a reason for hiding this comment

Missing post-install verification for transformers pip install

Uh oh!

functionstackx commented May 28, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 28, 2026

Choose a reason for hiding this comment

Ravel after truncation causes silent state data loss

Uh oh!

cursor Bot May 28, 2026

Choose a reason for hiding this comment

Duplicated flatten helpers across two methods

Uh oh!

github-actions Bot commented May 28, 2026

ChangLiu0709 commented May 27, 2026 •

edited by chunfangamd

Loading

claude Bot commented May 27, 2026 •

edited

Loading

(1) `_SETUP_INSTALLED` appended unconditionally for the gluon patch (L96)

claude Bot commented May 27, 2026 •

edited

Loading