[ROCm] Hotpatch aiter gluon pa_mqa_logits 3D instr_shape for GLM-5 (Triton 3.5+) by ChangLiu0709 · Pull Request #26572 · sgl-project/sglang

ChangLiu0709 · 2026-05-28T11:44:42Z

Background

While adding GLM-5 FP8 on MI355X with SGLang in InferenceX (#1572), serving fails due to a bug in the NSA attention gluon pa_mqa_logits kernel (#26533). See Summary below for root cause and this PR's fix.

Summary

GLM-5 FP8 on MI355X (DSA / NSA attention) can fail when the vendored aiter in ROCm images predates ROCm/aiter#2575: the base _gluon_deepgemm_fp8_paged_mqa_logits kernel hardcoded 2D instr_shape=[16, 16] while Triton ≥ 3.5 requires 3D [16, 16, 32] when _Use_2d_instr_shape_mfma_layout is false (same conditional already used in the preshuffle variants).

This PR ports the idempotent hotpatch used in InferenceX disagg benchmarks into the SGLang repo:

scripts/ci/amd/patch_aiter_gluon_pa_mqa_logits.py — shared patch script (no-op when aiter already includes ROCm/aiter#2575)
docker/rocm.Dockerfile — run patch after aiter checkout (before setup.py build)
scripts/ci/amd/amd_ci_install_dependency.sh — run patch on /sgl-workspace/aiter in CI (covers pre-installed and rebuilt aiter)
GLM-5 docs — document ROCm env vars used in production (SGLANG_ROCM_FUSED_DECODE_MLA=0, ROCM_QUICK_REDUCE_QUANTIZATION=INT4, SAFETENSORS_FAST_GPU=1)
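As a rough illustration of what "idempotent" means for this hotpatch, the sketch below performs the substitution on a source string. The `OLD` and `NEW` strings and the `patch_source` helper are hypothetical: they are inferred from the description above and are not copied from `patch_aiter_gluon_pa_mqa_logits.py` or from aiter, whose exact text may differ.

```python
# Minimal sketch of an idempotent text hotpatch (assumed names throughout).
# Assumption: the unpatched kernel contains the literal 2D shape below.
OLD = "instr_shape=[16, 16],"
# Assumption: the fixed form mirrors the conditional described in the summary.
NEW = (
    "instr_shape=([16, 16] if _Use_2d_instr_shape_mfma_layout "
    "else [16, 16, 32]),"
)


def patch_source(src: str) -> tuple[str, bool]:
    """Return (possibly patched source, whether anything changed)."""
    if OLD not in src:
        # Pattern absent: already fixed upstream or already patched.
        return src, False
    return src.replace(OLD, NEW), True


if __name__ == "__main__":
    sample = "layout = make_layout(\n    instr_shape=[16, 16],\n)\n"
    once, changed_first = patch_source(sample)
    twice, changed_second = patch_source(once)
    print("first run changed:", changed_first)
    print("second run changed:", changed_second)
```

Because `NEW` does not contain `OLD` as a substring, a second run finds nothing to replace, which is the no-op behavior the test plan expects on an aiter checkout that already carries the fix.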

Not in scope

AITER_COMMIT bump in this PR: The Dockerfile default (46e6c92) already includes ROCm/aiter#2575. We only add an idempotent hotpatch for older vendored aiter.
When the hotpatch is unnecessary: Once lmsysorg/sglang-rocm images are rebuilt with aiter that includes ROCm/aiter#2575, the runtime/build-time patch is a no-op and InferenceX can drop the equivalent setup_deps.sh gluon patch (#1572).
Transformers glm_moe_dsa pip install: Handled in InferenceX for Mori images; SGLang uses in-tree GlmMoeDsaForCausalLM and transformers==5.8.1.

Co-authors

@ChangLiu0709
@chunfangamd

Test plan

python3 scripts/ci/amd/patch_aiter_gluon_pa_mqa_logits.py on an aiter checkout that predates ROCm/aiter#2575 → patch applies once; second run is a no-op
python3 scripts/ci/amd/patch_aiter_gluon_pa_mqa_logits.py on current aiter 46e6c92 → warns or no-op (pattern absent)
GLM-5 MI35x perf/accuracy CI (test_glm5_perf_mi35x.py) with rebuilt ROCm image
Manual: zai-org/GLM-5-FP8 serve on MI355X with documented ROCm env vars

Add idempotent patch script (ROCm/aiter#2575) for older vendored aiter in ROCm images: base _gluon_deepgemm_fp8_paged_mqa_logits used 2D MFMA instr_shape; GLM-5 needs 3D when _Use_2d_instr_shape_mfma_layout is false. Apply at docker build and AMD CI. Document GLM-5 ROCm env vars (SGLANG_ROCM_FUSED_DECODE_MLA=0, quick reduce, safetensors fast GPU). Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist

Code Review

This pull request introduces a hotpatch script (patch_aiter_gluon_pa_mqa_logits.py) to update aiter's gluon/pa_mqa_logits.py for Triton 3.5+ compatibility on ROCm, applying it in both the Dockerfile and CI dependency installation scripts. It also updates the GLM-5 deployment documentation and command generator to include AMD-specific environment variables. A review comment suggests using a with statement when reading the target file in the hotpatch script to ensure proper resource management.

gemini-code-assist · 2026-05-28T11:45:28Z

+        print(f"[aiter-hotpatch] {target} not found, skipping")
+        return False
+
+    src = open(target, encoding="utf-8").read()


It is recommended to use a with statement when opening files to ensure that file descriptors are closed properly and promptly, rather than relying on garbage collection.

Suggested change

-    src = open(target, encoding="utf-8").read()
+    with open(target, encoding="utf-8") as f:
+        src = f.read()
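For context, here is one way the file-handling part of such a script could look with the `with` suggestion applied. This is a sketch only: the `patch_file` name and its `old`/`new` parameters are hypothetical and are not taken from the PR's script; only the "not found, skipping" message mirrors the diff above.

```python
import os


def patch_file(target: str, old: str, new: str) -> bool:
    """Replace `old` with `new` in `target`; return True only if the file changed."""
    if not os.path.isfile(target):
        print(f"[aiter-hotpatch] {target} not found, skipping")
        return False

    # Context managers close the descriptor as soon as the block exits,
    # instead of waiting for garbage collection.
    with open(target, encoding="utf-8") as f:
        src = f.read()

    if old not in src:
        # Nothing to do: pattern absent, so repeated runs are harmless.
        print(f"[aiter-hotpatch] pattern absent in {target}, no-op")
        return False

    with open(target, "w", encoding="utf-8") as f:
        f.write(src.replace(old, new))
    return True
```

Returning a boolean lets a caller such as a CI install script distinguish "patched" from "skipped" without parsing log output.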


1am9trash · 2026-05-29T01:54:33Z

Hi @ChangLiu0709,
Thanks for the patch. After tracing through the timeline, the dependency order on main looks like it already guarantees the bug can't happen:

aiter #2575 (2026-04-03) — fixes the instr_shape
sglang #22264 (2026-04-11) — bumps aiter to v0.1.12.post1 (includes PR#2575)
sglang #22657 (2026-04-13) — only then removes the if False: guard and enables the gluon kernel, explicitly citing PR#2575

So the gluon path may only become reachable after the fix is in aiter.
Could you share under what conditions this bug actually triggers?

ChangLiu0709 requested review from Fridge003, HaiShaw, JustinTong0323, ishandhanani, ispobock, wisclmy0611, yctseng0211 and zijiexia as code owners May 28, 2026 11:44

github-actions Bot added the documentation and amd labels May 28, 2026

gemini-code-assist Bot reviewed May 28, 2026


ChangLiu0709 marked this pull request as draft May 28, 2026 11:46

ChangLiu0709 marked this pull request as ready for review May 28, 2026 15:12

ChangLiu0709 wants to merge 1 commit into sgl-project:main from ChangLiu0709:chang/glm5-rocm-gluon-pa-mqa-instr-shape
