[AMD] Fix eval for dsr1 fp4 by billishyahao · Pull Request #1566 · SemiAnalysisAI/InferenceX

billishyahao · 2026-05-26T15:52:00Z

This patch:

- Fixes the eval result of dsr1 fp4 with fp8 blockwise combine
- Bumps the image to May 19
- Adds new conc 512 sweep points

Note

Medium Risk
Benchmark and serving env changes affect published perf/eval numbers; decode launch line may use a literal {MORI_COMBINE_DTYPE_DECODE} if the missing $ in the diff is not fixed before merge.

Overview
Updates DeepSeek R1 FP4 disaggregated SGLang benchmarks on MI355X to a May 19 v0.5.12 ROCm image and adjusts MoRI FP8 combine settings so evals pass: prefill uses fp8_direct_cast, decode uses fp8, with combine dtype passed into launch_server (replacing the old SGLANG_MORI_FP8_COMB flag). SGLANG_ENABLE_OVERLAP_PLAN_STREAM is turned off.

Sweep changes add 1×DEP8 + 1×DEP8 points (including conc 512 on the non-MTP 8192 scenario) and retune MTP layouts from smaller TP4 setups toward DEP8 with updated concurrency lists and DECODE_MTP_SIZE. server.sh now derives max bench concurrency from the sweep list and applies DP+EP overrides for max running requests and MoRI dispatch/MoE token limits. perf-changelog.yaml records the config bump.
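As a rough illustration of deriving a maximum bench concurrency from a sweep list, here is a minimal sketch. It is not the code in server.sh: the helper name `max_concurrency`, the variable names `CONC_LIST` and `MAX_BENCH_CONC`, the whitespace-separated list format, and the sample values are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the largest integer among its arguments.
max_concurrency() {
    local max=0 c
    for c in "$@"; do
        if (( c > max )); then
            max=$c
        fi
    done
    printf '%s\n' "$max"
}

# Illustrative sweep list; the format and values are assumptions.
CONC_LIST="64 128 256 512"

# Word splitting on $CONC_LIST is intentional here.
# shellcheck disable=SC2086
MAX_BENCH_CONC=$(max_concurrency $CONC_LIST)
echo "max bench concurrency: ${MAX_BENCH_CONC}"
```

Deriving the limit from the list keeps server-side caps in step with the sweep, so adding a larger point such as conc 512 does not need a second manual edit.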

^{Reviewed by Cursor Bugbot for commit d327d7b. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-26T15:52:11Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit 7b95bd6. Configure here.}

cursor · 2026-05-26T15:52:23Z

    fi
    set +x
-    DECODE_CMD="${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \
+    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \


Decode combine dtype not expanded

High Severity

On the decode launch command, SGLANG_MORI_COMBINE_DTYPE is set to the literal string {MORI_COMBINE_DTYPE_DECODE} because the $ is missing before the braces. Prefill uses ${MORI_COMBINE_DTYPE_PREFILL} correctly, so decode never gets fp8 from env.sh and blockwise combine settings stay wrong for decode.

@claude fix this

claude

Additional findings (outside current diff — PR may have been updated during review):

🔴 benchmarks/multi_node/amd_utils/server.sh:725 — On line 725, the decode launch command sets SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} — the leading $ is missing before the brace, so bash passes the literal string {MORI_COMBINE_DTYPE_DECODE} as the env var value instead of expanding to fp8. The two prefill counterparts at lines 425 and 657 correctly use ${MORI_COMBINE_DTYPE_PREFILL}. This defeats the stated PR purpose (fix dsr1 fp4 eval with fp8 blockwise combine for decode), since the decode worker never receives the fp8 combine_dtype.
Extended reasoning...

What the bug is

On benchmarks/multi_node/amd_utils/server.sh:725, the decode DECODE_CMD string is built with:
```
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ..."
```
The leading $ is missing before {MORI_COMBINE_DTYPE_DECODE}. In bash, {NAME} without a leading dollar sign is not variable expansion — it is a literal sequence of curly braces and characters (brace expansion does not apply here either, since there is no comma or .. inside). So when this string is later passed to eval, sglang's decode process is exec'd with the env var literally set to the string {MORI_COMBINE_DTYPE_DECODE} instead of fp8 (the value that env.sh exports for MORI_COMBINE_DTYPE_DECODE).

Why the existing pattern doesn't catch it

The same patch correctly uses ${MORI_COMBINE_DTYPE_PREFILL} (with the $) on the two prefill command-build sites at server.sh:425 and server.sh:657. That confirms the author knew the proper expansion syntax — this is a typo on the decode-only line. Because set +x is in effect when DECODE_CMD is assembled, the rendered command is not echoed, so the bad value is not visible at a glance in logs.
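Since the rendered command is not visible in logs, a static check is one way to catch this class of typo before it runs. The sketch below is only a heuristic and not part of the PR: the function name `find_unexpanded_braces` and the sample file contents are hypothetical, and the pattern will also flag legitimate `={NAME}` text such as template placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical lint: report lines where '=' is directly followed by {NAME},
# which usually means a '$' was dropped from ${NAME}.
find_unexpanded_braces() {
    # Prints matches as "lineno:text"; returns non-zero when nothing matches.
    grep -nE '=[{][A-Za-z_][A-Za-z0-9_]*[}]' "$@"
}

# Sample input modelled on the two forms discussed above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
PREFILL_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} python3 -m sglang.launch_server"
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} python3 -m sglang.launch_server"
EOF

hits=$(find_unexpanded_braces "$tmp" || true)
echo "$hits"
rm -f "$tmp"
```

Only the second sample line is reported, because the prefill form has a `$` between `=` and `{`.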

Impact

The whole point of the PR ("Fix the eval result of dsr1 fp4 with fp8 blockwise combine") relies on the decode worker actually receiving SGLANG_MORI_COMBINE_DTYPE=fp8. With this typo, decode receives an invalid literal value and the blockwise combine setting for decode is never applied — either sglang errors out parsing the value, or it silently falls back to a default, producing exactly the broken behavior the PR claims to fix. The two prefill sites are unaffected, so prefill will get fp8_direct_cast correctly, but the eval regression on the decode side will remain.
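To avoid depending on how sglang handles an unrecognized value, the launch script could reject unexpected values before starting the server. This is a minimal sketch, not code from the PR: `validate_combine_dtype` is a hypothetical helper, and the accepted set contains only the two values named in this PR (`fp8` and `fp8_direct_cast`); whether sglang accepts other values is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical guard: accept only the combine dtypes mentioned in this PR.
validate_combine_dtype() {
    case "$1" in
        fp8|fp8_direct_cast)
            return 0
            ;;
        *)
            echo "unexpected SGLANG_MORI_COMBINE_DTYPE value: '$1'" >&2
            return 1
            ;;
    esac
}

# The intended decode value passes.
good_status=0
validate_combine_dtype "fp8" || good_status=$?

# The unexpanded literal produced by the typo is rejected.
bad_status=0
validate_combine_dtype "{MORI_COMBINE_DTYPE_DECODE}" 2>/dev/null || bad_status=$?

echo "fp8 -> ${good_status}, literal braces -> ${bad_status}"
```

Failing at launch time turns a silent eval regression into an immediate, readable error.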

Step-by-step proof
1. env.sh exports: MORI_COMBINE_DTYPE_DECODE=fp8 (env.sh:44 in the patch).
2. server.sh sources env.sh near the top.
3. At line 725, bash assigns DECODE_CMD from a double-quoted string. Inside double quotes, only $VAR and ${VAR} trigger parameter expansion; {VAR} does not. So DECODE_CMD ends up containing the literal substring SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}.
4. Around line 756, eval "$DECODE_CMD" runs that string. Bash parses SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} as a command-prefix env assignment of value {MORI_COMBINE_DTYPE_DECODE} (a 27-character literal, including the braces).
5. The sglang decode server therefore sees os.environ['SGLANG_MORI_COMBINE_DTYPE'] == '{MORI_COMBINE_DTYPE_DECODE}', not 'fp8'.
You can reproduce this in any shell:
```
$ MORI_COMBINE_DTYPE_DECODE=fp8
$ CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} env | grep SGLANG_MORI_COMBINE_DTYPE"
$ eval "$CMD"
SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}
```
Compare with the prefill form (correct):
```
$ MORI_COMBINE_DTYPE_PREFILL=fp8_direct_cast
$ CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} env | grep SGLANG_MORI_COMBINE_DTYPE"
$ eval "$CMD"
SGLANG_MORI_COMBINE_DTYPE=fp8_direct_cast
```
Fix

Add the missing $ on line 725:
```
-    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
+    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
```
This was also independently flagged by Cursor Bugbot at the same location with High severity.
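A related hardening step, separate from the one-character fix, is to make the expansion fail loudly when the variable is unset or empty, for example if env.sh was not sourced. The sketch below uses the standard `${VAR:?message}` expansion; the error message text and the use of `env` as a stand-in for the real launch command are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Demonstrate the guard in a subshell so a failure does not end this script.
guard_status=0
(
    unset MORI_COMBINE_DTYPE_DECODE
    : "${MORI_COMBINE_DTYPE_DECODE:?must be set by env.sh}"
) 2>/dev/null || guard_status=$?
echo "guard status when unset: ${guard_status}"

# With the variable set, the guarded expansion behaves like a plain ${VAR}.
MORI_COMBINE_DTYPE_DECODE=fp8
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE:?must be set by env.sh} env"
echo "$DECODE_CMD"
```

This guard does not detect a missing `$`; it only covers the case where the variable itself has no value.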

github-actions · 2026-05-27T00:21:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26459653823
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26459653823

functionstackx · 2026-05-27T02:48:09Z

@billishyahao is this PR ready for review?

github-actions · 2026-05-27T02:58:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488008644
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26488008644

billishyahao · 2026-05-27T03:22:43Z

> @billishyahao is this PR ready for review?

yes, please

functionstackx · 2026-05-27T03:23:36Z

> > @billishyahao is this PR ready for review?
>
> yes, please

@Oseltamivir or @cquil11 can u review this PR?

billishyahao added 6 commits May 13, 2026 15:14

bump image

cd3a2cb

change env accordingly

ffb9b82

fix

60df23c

only conc 512

f5b7263

fix

5dddb4e

fix

7b95bd6

billishyahao requested a review from a team May 26, 2026 15:52

billishyahao requested review from 1am9trash, chunfangamd, seungrokj and yctseng0211 as code owners May 26, 2026 15:52

github-project-automation Bot added this to InferenceMAX Board May 26, 2026

cursor Bot reviewed May 26, 2026

fix

4816729

billishyahao added AMD full-sweep-enabled labels May 26, 2026

Merge remote-tracking branch 'inf/main' into amd/mi355x-dsfp4-may12

feffffc

claude Bot reviewed May 26, 2026

billishyahao requested review from cquil11 and functionstackx May 26, 2026 16:00

billishyahao mentioned this pull request May 26, 2026

[AMD] add mori blog lm-sys/lm-sys.github.io#336

Open

fix

130d359

Merge remote-tracking branch 'inf/main' into amd/mi355x-dsfp4-may12

d327d7b

billishyahao wants to merge 10 commits into main from amd/mi355x-dsfp4-may12
