gpt-oss-fp4-mi355x: pin to v0.19 + switch to AITER-env-based recipe by xiaohuguo2023 · Pull Request #1531 · SemiAnalysisAI/InferenceX

xiaohuguo2023 · 2026-05-20T14:59:55Z

Pinning back to vllm 0.19.0 (it was bumped to 0.21.0 in #1406, "Update gptoss-fp4-mi355x-vllm vLLM ROCm image to v0.21.0"). vllm 0.21 hits
a ~10% regression on MI355x gpt-oss-fp4, bisected to ~4% from the new ROCm
fuse_allreduce_rms pass (recoverable via --compilation-config) and ~6%
from the ROCm 7.0201 → 7.0202 patch in the v0.21 image (not launcher-fixable).
Staying on 0.19 while we sort out both. Heads-up: Klaud-Cold will try to re-bump
on its next cron run.

Also reworked the launcher: dropped AMDGCN_USE_BUFFER_OPS,
VLLM_ROCM_USE_AITER_TRITON_ROPE, the explicit --attention-backend,
fuse_rope_kvcache, and use_inductor_graph_partition. Added the AITER env
vars (VLLM_ROCM_USE_AITER_MOE/RMSNORM/UNIFIED_ATTENTION/FUSED_MOE_A16W4 +
HSA_NO_SCRATCH_RECLAIM=1), plus --max-num-seqs 256 and --async-scheduling.
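
For readers who want the launcher change in one place, here is a minimal Python sketch of how the environment and `vllm serve` command line described above might be assembled. It is not the actual gptoss_fp4_mi355x.sh script. Assumptions: the AITER toggles are set to "1" (the description only gives the value for HSA_NO_SCRATCH_RECLAIM), the shorthand variable list expands to the four full names shown, the model id is taken from the recipes URL quoted later in the thread, and `build_launch` is a hypothetical helper name.

```python
import os
import shlex

# Env vars the PR description says were added. The value "1" for the four
# VLLM_ROCM_USE_AITER_* toggles is an assumption; only
# HSA_NO_SCRATCH_RECLAIM=1 is stated explicitly in the description.
AITER_ENV = {
    "VLLM_ROCM_USE_AITER_MOE": "1",
    "VLLM_ROCM_USE_AITER_RMSNORM": "1",
    "VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION": "1",
    "VLLM_ROCM_USE_AITER_FUSED_MOE_A16W4": "1",
    "HSA_NO_SCRATCH_RECLAIM": "1",
}

# Env vars the PR description says were dropped from the launcher.
DROPPED_ENV = ("AMDGCN_USE_BUFFER_OPS", "VLLM_ROCM_USE_AITER_TRITON_ROPE")


def build_launch(model, tp, base_env=None):
    """Return (env, argv) for a `vllm serve` launch (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    for name in DROPPED_ENV:
        env.pop(name, None)  # make sure the old knobs are not inherited
    env.update(AITER_ENV)
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--max-num-seqs", "256",
        "--async-scheduling",
    ]
    return env, argv


env, argv = build_launch(
    "openai/gpt-oss-120b",  # assumed model id, not confirmed by the PR
    tp=1,
    base_env={"AMDGCN_USE_BUFFER_OPS": "0"},
)
print(shlex.join(argv))
```

Passing the result to `subprocess.Popen(argv, env=env)` would start the server; that step is left out so the sketch runs without vLLM installed.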

Local numbers (single MI355x, v0.19.0 image, 3 runs each, median):

TP	ISL/OSL	conc	dashboard	this PR	Δ
1	1k/1k	4	1653	1662	+0.5%
1	1k/1k	8	2668	2787	+4.5%
1	1k/1k	16	4189	4549	+8.6%
1	1k/1k	32	6275	6785	+8.1%
1	1k/1k	64	8812	10573	+20.0%
1	1k/1k	128	13186	16067	+21.8%
4	1k/1k	4	1203	2498	+107.6%
4	1k/1k	8	2100	4253	+102.5%
8	1k/1k	4	2152	2505	+16.4%
8	1k/1k	8	3950	4678	+18.4%
8	1k/1k	16	6668	8571	+28.5%
1	8k/1k	4	5974	7222	+20.9%
1	8k/1k	8	9210	11154	+21.1%
1	8k/1k	16	14499	16663	+14.9%
1	8k/1k	32	20528	23445	+14.2%
1	8k/1k	64	29448	32071	+8.9%
1	8k/1k	128	38532	39333	+2.1%
4	8k/1k	4	4826	10520	+118.0%
8	8k/1k	4	7966	10602	+33.1%
8	8k/1k	8	14198	19205	+35.3%

Avg +30.3% across all 20 combos vs the 2026-03-09 dashboard. TP=4 sees the
biggest wins (~100%+).
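
The Δ column and the +30.3% average can be re-derived from the two throughput columns. The short Python check below does that, assuming Δ is computed as (this PR / dashboard − 1) × 100 and that the average is a plain arithmetic mean over the 20 rows. The numbers are copied from the table; the helper name `pct_gain` is illustrative.

```python
from statistics import mean

# (TP, ISL/OSL, concurrency, dashboard, this PR), copied from the table.
ROWS = [
    (1, "1k/1k", 4, 1653, 1662), (1, "1k/1k", 8, 2668, 2787),
    (1, "1k/1k", 16, 4189, 4549), (1, "1k/1k", 32, 6275, 6785),
    (1, "1k/1k", 64, 8812, 10573), (1, "1k/1k", 128, 13186, 16067),
    (4, "1k/1k", 4, 1203, 2498), (4, "1k/1k", 8, 2100, 4253),
    (8, "1k/1k", 4, 2152, 2505), (8, "1k/1k", 8, 3950, 4678),
    (8, "1k/1k", 16, 6668, 8571),
    (1, "8k/1k", 4, 5974, 7222), (1, "8k/1k", 8, 9210, 11154),
    (1, "8k/1k", 16, 14499, 16663), (1, "8k/1k", 32, 20528, 23445),
    (1, "8k/1k", 64, 29448, 32071), (1, "8k/1k", 128, 38532, 39333),
    (4, "8k/1k", 4, 4826, 10520),
    (8, "8k/1k", 4, 7966, 10602), (8, "8k/1k", 8, 14198, 19205),
]


def pct_gain(base, new):
    """Relative change of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


gains = [pct_gain(dash, pr) for _tp, _seq, _conc, dash, pr in ROWS]
avg_gain = mean(gains)
tp4_gains = [g for row, g in zip(ROWS, gains) if row[0] == 4]

print(f"rows={len(ROWS)} avg={avg_gain:.1f}% min TP=4 gain={min(tp4_gains):.1f}%")
```

Note that the mean is pulled up by the three TP=4 rows; the other 17 rows each gain between roughly 0.5% and 35%.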

Note

Low Risk
Benchmark-only image, env, sweep, and launcher changes for one MI355X vLLM config; no production app or security surface.

Overview
Pins gptoss-fp4-mi355x-vllm back to vllm/vllm-openai-rocm:v0.19.0 (reverting the v0.21.0 bump) and widens the fixed-seq-len sweep with TP=2 and higher TP=4 concurrency ranges on 1k/1k and 8k/1k.

The MI355X launcher (gptoss_fp4_mi355x.sh) moves to an AITER env-var recipe (MOE, RMSNorm, unified attention, fused MoE A16W4, HSA_NO_SCRATCH_RECLAIM), drops the old MEC-firmware gate, buffer-ops, explicit attention backend, and fuse_rope_kvcache compile flags, and adds --max-num-seqs 256 and --async-scheduling.

A discarded pre-flight warmup (CONC×5 prompts at full concurrency) runs before the measured benchmark so AITER Triton autotune does not under-report cold-cache CI throughput. perf-changelog.yaml documents the change.
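
The warmup pattern described above can be sketched as follows. This is an illustration in Python, not the launcher's shell code: `run_benchmark` stands in for whatever benchmark client the script calls, and `measured_prompts` and `warmup_factor` are hypothetical parameter names. Only the CONC×5 sizing and the fact that the warmup result is discarded come from the overview.

```python
def run_with_warmup(run_benchmark, conc, measured_prompts, warmup_factor=5):
    """Run a discarded warmup pass, then return the measured pass result."""
    # Warmup: CONC x 5 prompts at full concurrency. The return value is
    # dropped on purpose so cold-cache autotune time is never reported.
    run_benchmark(num_prompts=conc * warmup_factor, max_concurrency=conc)
    # Measured pass: only this result is kept.
    return run_benchmark(num_prompts=measured_prompts, max_concurrency=conc)


# Stand-in benchmark that just records how it was called.
calls = []


def fake_benchmark(num_prompts, max_concurrency):
    calls.append((num_prompts, max_concurrency))
    return {"num_prompts": num_prompts, "max_concurrency": max_concurrency}


result = run_with_warmup(fake_benchmark, conc=16, measured_prompts=160)
print(calls, result["num_prompts"])
```

Running the warmup at the same concurrency as the measured pass matters because the kernels that get autotuned depend on the batch shapes the server actually sees.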

Reviewed by Cursor Bugbot for commit e658541.

github-actions · 2026-05-20T15:00:08Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

chunfangamd · 2026-05-20T15:46:41Z

@claude /sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

Klaud-Cold · 2026-05-20T15:47:15Z

Claude finished @chunfangamd's task in 0s.

I'll analyze this and get back to you.

chunfangamd

LGTM

github-actions · 2026-05-20T16:06:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26173755971
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26173755971

functionstackx

can u instead fix it in vllm upstream & switch to using nightly and then switch to that? what are the in progress PRs to fix it in vllm & rocm?

regression from ROCm 7.0201 → 7.0202
regression from v0.19 -> v0.21.0

fuse_allreduce_rms pass (recoverable via --compilation-config)

add it to recipes? https://recipes.vllm.ai/openai/gpt-oss-120b?hardware=mi355x

chunfangamd · 2026-05-20T19:50:23Z

@functionstackx There seems to be a misunderstanding. The script changes show that this PR is not intended to address the regression; that fix is separate and still in progress. @xiaohuguo2023 based his work on version 0.19.0 and has also applied it to the latest image. However, the performance gain on the latest version is lower than on version 0.19.0, which was anticipated.

Absolutely, we should update the recipe page

chunfangamd · 2026-05-20T19:56:04Z

@xiaohuguo2023, could you please double-check the performance gain? It does not seem to align with the local test.

functionstackx · 2026-05-20T20:10:14Z

and currently ongoing. @xiaohuguo2023 based his work on version 0.19.0 and has also applied it to the latest image. However, the performance gain in the latest version is lower than that of version 0.19.0, which was anticipated.

what are the vLLM PRs that fix this in upstream HEAD?

github-actions · 2026-05-20T20:10:19Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26186157722
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26186157722

chunfangamd · 2026-05-21T12:54:05Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

github-actions · 2026-05-21T12:54:19Z

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26227187638
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm
Pinned ref: 1b98226
Approval: not required (trusted collaborator).

github-actions · 2026-05-21T13:22:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26227015172
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26227015172

xiaohuguo2023 · 2026-05-24T10:07:41Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

github-actions · 2026-05-24T10:07:50Z

@xiaohuguo2023 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26358364428
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm
Pinned ref: 1b98226
Approval: not required (trusted collaborator).

chunfangamd · 2026-05-25T06:37:15Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

claude · 2026-05-25T06:37:19Z

Claude finished @chunfangamd's task in 1m 33s.

PR Review: gpt-oss-fp4-mi355x pin to v0.19 + AITER recipe

Gather context and read changed files
Validate master config changes
Validate benchmark script style and correctness
Validate perf-changelog entry
Post review findings

Summary

One blocking issue found: the new perf-changelog.yaml entry landed in the middle of the file (line ~3086 of 3147) instead of at the end, because the Merge branch 'main' commit brought in newer entries that now sit after it. Move the block to the end of the file to restore chronological order.

Everything else looks good:

Config: Image pin to v0.19.0 is valid, model prefix gptoss is correct, search-space extensions (TP=2 added, TP=4 conc-end widened) are reasonable.
Benchmark script: vllm serve args are properly formatted on separate lines. The pre-flight warmup pass is well-motivated (AITER Triton JIT cold-cache bias) and correctly discards results to /tmp/. HSA_NO_SCRATCH_RECLAIM=1 is now unconditional (previously gated on MEC FW version) — intentional per the AITER recipe.
Image accessibility: vllm/vllm-openai-rocm:v0.19.0 is a public Docker Hub image.
PR link: perf-changelog.yaml correctly references pull/1531.
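
The blocking finding in the summary (the new changelog entry must be the last one in the file) and the earlier placeholder issue with `pr-link` are both easy to check mechanically. Below is a minimal sketch, assuming the changelog has already been parsed into a list of dicts and that each entry carries a `pr-link` field. The rest of the schema is unknown here, `check_changelog` is a hypothetical helper, and the entry order in the demo is illustrative only.

```python
def check_changelog(entries, pr_number):
    """Return a list of problems found for the entry of `pr_number`."""
    problems = []
    target = f"pull/{pr_number}"
    links = [str(entry.get("pr-link", "")) for entry in entries]

    hits = [i for i, link in enumerate(links) if target in link]
    if not hits:
        problems.append(f"no entry references {target}")
    elif hits[-1] != len(entries) - 1:
        # Newest entry must sit at the end to keep chronological order.
        problems.append(f"entry for {target} is not the last entry")

    if any("XXX" in link for link in links):
        problems.append("a pr-link still contains the XXX placeholder")
    return problems


BASE = "https://github.com/SemiAnalysisAI/InferenceX/pull/"
ordered = [{"pr-link": BASE + "1406"}, {"pr-link": BASE + "1531"}]
misplaced = list(reversed(ordered))  # new entry ended up before an older one

print(check_changelog(ordered, 1531))
print(check_changelog(misplaced, 1531))
```

A check like this could run in CI after a merge from main, which is exactly the situation that pushed the entry into the middle of the file.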

github-actions · 2026-05-25T06:37:23Z

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26387067776
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm
Pinned ref: 6b98c26
Approval: not required (trusted collaborator).

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 6b98c26.

claude

LGTM — benchmark recipe update (image pin + AITER env vars + sweep-space expansion); only finding is a nit on the warmup-comment wording.

Extended reasoning...

Overview

This PR is scoped to the gpt-oss-fp4-mi355x-vllm benchmark recipe: it pins the vLLM image back from v0.21.0 to v0.19.0 (regression hedge), switches benchmarks/single_node/gptoss_fp4_mi355x.sh to an AITER-env-based recipe (MOE/RMSNORM/UnifiedAttn/A16W4 + HSA_NO_SCRATCH_RECLAIM=1), drops legacy launcher flags, adds --max-num-seqs 256 and --async-scheduling, adds a pre-flight warmup pass, and expands the search-space in .github/configs/amd-master.yaml (adds tp=2 and widens tp=4). A matching perf-changelog.yaml entry is included, and pr-link was correctly substituted to #1531 after my prior comment.

Security risks

None. Changes are confined to benchmark configuration, a launcher shell script, and a human-readable changelog. No auth, crypto, permissions, network, or user-input handling code is touched.

Level of scrutiny

Low. This is benchmark/CI infrastructure (the launcher is invoked by the sweep system, not production serving), and the substantive choices (image pin rationale, AITER flag set, sweep grid) are domain-specific tuning decisions that the AMD recipe owners have already iterated on in-thread — chunfangamd and xiaohuguo2023 have run multiple /sweep test-config runs against this branch.

Other factors

The only finding from the bug-hunting pass is a [Nit] inline comment noting the warmup-pass rationale mis-describes the underlying tool — purely a maintainer-readability concern, no functional impact. My earlier review's blocker (literal XXX placeholder in the pr-link) has been resolved in the current diff. Sweep runs have been kicked off by the recipe owners, which is the appropriate validation for this kind of change.

github-actions · 2026-05-25T06:58:13Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26387052649
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26387052649

github-actions · 2026-05-25T10:14:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26387052649
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26387052649

Pins the image back to vllm/vllm-openai-rocm:v0.19.0 (was bumped to v0.21.0 in #1406). v0.21 introduces a ROCm/AITER perf regression on MI355x for gpt-oss that we're still tracking down; staying on v0.19 in the meantime. Also rewrites the launcher to enable the AITER kernel paths via env vars (AITER MOE/RMSNorm/UnifiedAttn/A16W4 + HSA_NO_SCRATCH_RECLAIM=1) and drops the now-obsolete TRITON_ROPE/BUFFER_OPS/--attention-backend/fuse_rope_kvcache/use_inductor_graph_partition bits. Also adds --max-num-seqs 256 and --async-scheduling.

github-actions · 2026-05-25T11:18:54Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26396409260
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26396409260

xiaohuguo2023 · 2026-05-25T16:37:32Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

functionstackx · 2026-05-25T19:04:25Z

can u instead fix it in vllm upstream & switch to using nightly and then switch to that? what are the in progress PRs to fix it in vllm & rocm?

regression from ROCm 7.0201 → 7.0202
regression from v0.19 -> v0.21.0

fuse_allreduce_rms pass (recoverable via --compilation-config)

add it to recipes? https://recipes.vllm.ai/openai/gpt-oss-120b?hardware=mi355x

xiaohuguo2023 requested a review from a team May 20, 2026 14:59

xiaohuguo2023 requested review from billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 20, 2026 14:59

github-project-automation Bot added this to InferenceMAX Board May 20, 2026

xiaohuguo2023 requested a review from 1am9trash as a code owner May 20, 2026 14:59

claude Bot reviewed May 20, 2026

Comment thread perf-changelog.yaml Outdated

chunfangamd added the full-sweep-enabled label May 20, 2026

chunfangamd approved these changes May 20, 2026

functionstackx requested changes May 20, 2026

xiaohuguo2023 force-pushed the gptoss-mi355x-pin-v019-aiter-recipe branch from 88e13ec to 1b98226 Compare May 21, 2026 12:50

chunfangamd marked this pull request as draft May 23, 2026 17:45

chunfangamd marked this pull request as ready for review May 25, 2026 06:36

claude Bot reviewed May 25, 2026

Comment thread perf-changelog.yaml Outdated

cursor Bot reviewed May 25, 2026

Comment thread benchmarks/single_node/gptoss_fp4_mi355x.sh Outdated

Comment thread benchmarks/single_node/gptoss_fp4_mi355x.sh Outdated

claude Bot reviewed May 25, 2026

Comment thread benchmarks/single_node/gptoss_fp4_mi355x.sh Outdated

xiaohuguo2023 added 5 commits May 25, 2026 05:37

update PR number

e1d96b8

adds a pre-flight warmup pass before the measured benchmark

addd9d8

extend sweep search-space to match B200 coverage

efd9616

move PR 1531 entry to bottom (addresses bot review)

e658541

xiaohuguo2023 force-pushed the gptoss-mi355x-pin-v019-aiter-recipe branch from 6b98c26 to e658541 Compare May 25, 2026 10:42

chunfangamd marked this pull request as draft May 25, 2026 10:59

revert pre-flight warmup pass

8b8ff70

functionstackx closed this May 25, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 25, 2026
