
[AMD] Fix eval for dsr1 fp4 #1566

Open
billishyahao wants to merge 10 commits into main from amd/mi355x-dsfp4-may12



@billishyahao
Collaborator

@billishyahao billishyahao commented May 26, 2026

This patch:

  1. Fixes the eval result of dsr1 fp4 with fp8 blockwise combine
  2. Bumps the image to May 19
  3. Adds new sweep points at conc 512

Note

Medium Risk
Benchmark and serving env changes affect published perf/eval numbers; decode launch line may use a literal {MORI_COMBINE_DTYPE_DECODE} if the missing $ in the diff is not fixed before merge.

Overview
Updates DeepSeek R1 FP4 disaggregated SGLang benchmarks on MI355X to a May 19 v0.5.12 ROCm image and adjusts MoRI FP8 combine settings so evals pass: prefill uses fp8_direct_cast, decode uses fp8, with combine dtype passed into launch_server (replacing the old SGLANG_MORI_FP8_COMB flag). SGLANG_ENABLE_OVERLAP_PLAN_STREAM is turned off.

Sweep changes add 1×DEP8 + 1×DEP8 points (including conc 512 on the non-MTP 8192 scenario) and retune MTP layouts from smaller TP4 setups toward DEP8 with updated concurrency lists and DECODE_MTP_SIZE. server.sh now derives max bench concurrency from the sweep list and applies DP+EP overrides for max running requests and MoRI dispatch/MoE token limits. perf-changelog.yaml records the config bump.
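As a rough illustration of deriving the maximum bench concurrency from a sweep list, here is a minimal sketch. The function name `max_concurrency`, the variable names, and the whitespace-separated list format are all assumptions made for this example; the actual logic in server.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest value in a whitespace-separated
# concurrency list. Name and list format are assumptions, not taken from server.sh.
max_concurrency() {
    local max=0 c
    for c in $1; do
        if (( c > max )); then
            max=$c
        fi
    done
    echo "$max"
}

CONC_LIST="32 64 128 256 512"   # illustrative sweep points only
MAX_BENCH_CONC=$(max_concurrency "$CONC_LIST")
echo "max bench concurrency: ${MAX_BENCH_CONC}"
```

Deriving the limit from the list keeps the server-side cap in step with the sweep points, so adding a conc 512 point does not require a second manual edit.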

Reviewed by Cursor Bugbot for commit d327d7b. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 7b95bd6. Configure here.

 fi
 set +x
-DECODE_CMD="${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \
+DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \

Decode combine dtype not expanded

High Severity

On the decode launch command, SGLANG_MORI_COMBINE_DTYPE is set to the literal string {MORI_COMBINE_DTYPE_DECODE} because the $ is missing before the braces. Prefill uses ${MORI_COMBINE_DTYPE_PREFILL} correctly, so decode never gets fp8 from env.sh and blockwise combine settings stay wrong for decode.


Collaborator


@claude fix this

Contributor

@claude claude Bot left a comment


Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 benchmarks/multi_node/amd_utils/server.sh:725 — On line 725, the decode launch command sets SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} — the leading $ is missing before the brace, so bash passes the literal string {MORI_COMBINE_DTYPE_DECODE} as the env var value instead of expanding to fp8. The two prefill counterparts at lines 425 and 657 correctly use ${MORI_COMBINE_DTYPE_PREFILL}. This defeats the stated PR purpose (fix dsr1 fp4 eval with fp8 blockwise combine for decode), since the decode worker never receives the fp8 combine_dtype.

    Extended reasoning...

    What the bug is

    On benchmarks/multi_node/amd_utils/server.sh:725, the decode DECODE_CMD string is built with:

    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ..."

    The leading $ is missing before {MORI_COMBINE_DTYPE_DECODE}. In bash, {NAME} without a leading dollar sign is not variable expansion — it is a literal sequence of curly braces and characters (brace expansion does not apply here either, since there is no comma or .. inside). So when this string is later passed to eval, sglang's decode process is exec'd with the env var literally set to the string {MORI_COMBINE_DTYPE_DECODE} instead of fp8 (the value that env.sh exports for MORI_COMBINE_DTYPE_DECODE).

    Why the existing pattern doesn't catch it

    The same patch correctly uses ${MORI_COMBINE_DTYPE_PREFILL} (with the $) on the two prefill command-build sites at server.sh:425 and server.sh:657. That confirms the author knew the proper expansion syntax — this is a typo on the decode-only line. Because set +x is in effect when DECODE_CMD is assembled, the rendered command is not echoed, so the bad value is not visible at a glance in logs.
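One way to make such a typo visible even while tracing is disabled is to print the assembled string before handing it to eval. The sketch below is illustrative only: it reuses the variable names from the review, but the launch command is replaced by the harmless stand-in `true`, and the log message wording is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative value; in the PR this is exported by env.sh.
MORI_COMBINE_DTYPE_DECODE=fp8

# Same missing-$ typo as the flagged line, with `true` standing in for the
# real sglang launch command.
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} true"

# printf shows the string exactly as eval will receive it, regardless of
# whether set -x or set +x is in effect.
printf 'decode launch command: %s\n' "$DECODE_CMD"
eval "$DECODE_CMD"
```

With the typo present, the printed line still contains the braces, which is easy to spot in a job log.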

    Impact

    The whole point of the PR ("Fix the eval result of dsr1 fp4 with fp8 blockwise combine") relies on the decode worker actually receiving SGLANG_MORI_COMBINE_DTYPE=fp8. With this typo, decode receives an invalid literal value and the blockwise combine setting for decode is never applied — either sglang errors out parsing the value, or it silently falls back to a default, producing exactly the broken behavior the PR claims to fix. The two prefill sites are unaffected, so prefill will get fp8_direct_cast correctly, but the eval regression on the decode side will remain.
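A fail-fast guard could turn this silent misconfiguration into an immediate error. The following is a minimal sketch, assuming a hypothetical helper named `check_expanded` that is not part of server.sh. It only checks for leftover placeholder characters and does not validate against the set of dtypes sglang accepts, which is not stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical guard: reject a value that is empty or still contains a brace
# or dollar sign, which suggests a placeholder was never expanded.
check_expanded() {
    local name=$1 value=$2
    case "$value" in
        ''|*'{'*|*'}'*|*'$'*)
            echo "error: ${name} looks unexpanded: '${value}'" >&2
            return 1
            ;;
    esac
    return 0
}

check_expanded SGLANG_MORI_COMBINE_DTYPE "fp8" && echo "fp8 accepted"
check_expanded SGLANG_MORI_COMBINE_DTYPE "{MORI_COMBINE_DTYPE_DECODE}" || echo "placeholder rejected"
```

Calling such a check just before the launch command is built would stop the run before any benchmark or eval numbers are produced from a misconfigured decode worker.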

    Step-by-step proof

    1. env.sh exports: MORI_COMBINE_DTYPE_DECODE=fp8 (env.sh:44 in the patch).
    2. server.sh sources env.sh near the top.
    3. At line 725, bash assigns DECODE_CMD from a double-quoted string. Inside double quotes, only $VAR and ${VAR} trigger parameter expansion; {VAR} does not. So DECODE_CMD ends up containing the literal substring SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}.
    4. Around line 756, eval "$DECODE_CMD" runs that string. Bash parses SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} as a command-prefix env assignment of value {MORI_COMBINE_DTYPE_DECODE} (a 27-character literal, including the braces).
    5. The sglang decode server therefore sees os.environ['SGLANG_MORI_COMBINE_DTYPE'] == '{MORI_COMBINE_DTYPE_DECODE}', not 'fp8'.

    You can reproduce this in any shell:

    $ MORI_COMBINE_DTYPE_DECODE=fp8
    $ CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} env | grep SGLANG_MORI_COMBINE_DTYPE"
    $ eval "$CMD"
    SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}

    Compare with the prefill form (correct):

    $ MORI_COMBINE_DTYPE_PREFILL=fp8_direct_cast
    $ CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} env | grep SGLANG_MORI_COMBINE_DTYPE"
    $ eval "$CMD"
    SGLANG_MORI_COMBINE_DTYPE=fp8_direct_cast

    Fix

    Add the missing $ on line 725:

    -    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
    +    DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...

    This was also independently flagged by Cursor Bugbot at the same location with High severity.
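To catch the same class of typo elsewhere in a script, a simple grep-based lint may help. This is a sketch under assumptions: the function name `find_unexpanded_braces` is hypothetical, and the pattern only flags `={NAME}` with an uppercase identifier directly after the equals sign, so it can miss other forms of the mistake.

```shell
#!/usr/bin/env bash
# Hypothetical lint: print numbered lines where "=" is followed directly by
# {NAME} (uppercase identifier) with no dollar sign in between.
find_unexpanded_braces() {
    grep -nE '=[{][A-Z_][A-Z0-9_]*[}]' "$1"
}

# Demo on a throwaway file holding one correct and one broken assignment.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
PREFILL_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} ..."
DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ..."
EOF
find_unexpanded_braces "$tmp"
rm -f "$tmp"
```

Only the DECODE_CMD line is reported, because in the prefill line the equals sign is followed by `$` rather than `{`.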

    🔬 also observed by cursor


@functionstackx
Collaborator

@billishyahao is this PR ready for review?


@billishyahao
Collaborator Author

> @billishyahao is this PR ready for review?

yes, please

@functionstackx
Collaborator

> > @billishyahao is this PR ready for review?
>
> yes, please

@Oseltamivir or @cquil11 can u review this PR?
