[AMD] Fix eval for dsr1 fp4 #1566
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
2 similar comments
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7b95bd6. Configure here.
| fi | ||
| set +x | ||
| DECODE_CMD="${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \ | ||
| DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK=${MORI_MAX_DISPATCH_TOKENS_DECODE} python3 -m sglang.launch_server \ |
There was a problem hiding this comment.
Decode combine dtype not expanded
High Severity
On the decode launch command, SGLANG_MORI_COMBINE_DTYPE is set to the literal string {MORI_COMBINE_DTYPE_DECODE} because the $ is missing before the braces. Prefill uses ${MORI_COMBINE_DTYPE_PREFILL} correctly, so decode never gets fp8 from env.sh and blockwise combine settings stay wrong for decode.
Reviewed by Cursor Bugbot for commit 7b95bd6. Configure here.
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
benchmarks/multi_node/amd_utils/server.sh:725— On line 725, the decode launch command setsSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}— the leading$is missing before the brace, so bash passes the literal string{MORI_COMBINE_DTYPE_DECODE}as the env var value instead of expanding tofp8. The two prefill counterparts at lines 425 and 657 correctly use${MORI_COMBINE_DTYPE_PREFILL}. This defeats the stated PR purpose (fix dsr1 fp4 eval with fp8 blockwise combine for decode), since the decode worker never receives the fp8 combine_dtype.Extended reasoning...
What the bug is
On
benchmarks/multi_node/amd_utils/server.sh:725, the decodeDECODE_CMDstring is built with:DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ..."The leading
$is missing before{MORI_COMBINE_DTYPE_DECODE}. In bash,{NAME}without a leading dollar sign is not variable expansion — it is a literal sequence of curly braces and characters (brace expansion does not apply here either, since there is no comma or..inside). So when this string is later passed toeval, sglang's decode process is exec'd with the env var literally set to the string{MORI_COMBINE_DTYPE_DECODE}instead offp8(the value thatenv.shexports forMORI_COMBINE_DTYPE_DECODE).Why the existing pattern doesn't catch it
The same patch correctly uses
${MORI_COMBINE_DTYPE_PREFILL}(with the$) on the two prefill command-build sites atserver.sh:425andserver.sh:657. That confirms the author knew the proper expansion syntax — this is a typo on the decode-only line. Becauseset +xis in effect whenDECODE_CMDis assembled, the rendered command is not echoed, so the bad value is not visible at a glance in logs.Impact
The whole point of the PR ("Fix the eval result of dsr1 fp4 with fp8 blockwise combine") relies on the decode worker actually receiving
SGLANG_MORI_COMBINE_DTYPE=fp8. With this typo, decode receives an invalid literal value and the blockwise combine setting for decode is never applied — either sglang errors out parsing the value, or it silently falls back to a default, producing exactly the broken behavior the PR claims to fix. The two prefill sites are unaffected, so prefill will getfp8_direct_castcorrectly, but the eval regression on the decode side will remain.Step-by-step proof
env.shexports:MORI_COMBINE_DTYPE_DECODE=fp8(env.sh:44 in the patch).server.shsourcesenv.shnear the top.- At line 725, bash assigns
DECODE_CMDfrom a double-quoted string. Inside double quotes, only$VARand${VAR}trigger parameter expansion;{VAR}does not. SoDECODE_CMDends up containing the literal substringSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}. - Around line 756,
eval "$DECODE_CMD"runs that string. Bash parsesSGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}as a command-prefix env assignment of value{MORI_COMBINE_DTYPE_DECODE}(literal 26 chars, including the braces). - The sglang decode server therefore sees
os.environ['SGLANG_MORI_COMBINE_DTYPE'] == '{MORI_COMBINE_DTYPE_DECODE}', not'fp8'.
You can reproduce this in any shell:
$ MORI_COMBINE_DTYPE_DECODE=fp8 $ CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} env | grep SGLANG_MORI_COMBINE_DTYPE" $ eval "$CMD" SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE}
Compare with the prefill form (correct):
$ MORI_COMBINE_DTYPE_PREFILL=fp8_direct_cast $ CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_PREFILL} env | grep SGLANG_MORI_COMBINE_DTYPE" $ eval "$CMD" SGLANG_MORI_COMBINE_DTYPE=fp8_direct_cast
Fix
Add the missing
$on line 725:- DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE={MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ... + DECODE_CMD="SGLANG_MORI_COMBINE_DTYPE=${MORI_COMBINE_DTYPE_DECODE} ${DECODE_MORI_MOE_ENV} ...
This was also independently flagged by Cursor Bugbot at the same location with High severity.
🔬 also observed by cursor
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26459653823 |
|
@billishyahao is this PR ready for review? |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26488008644 |
yes, please |
@Oseltamivir or @cquil11 can u review this PR? |


This patch is to
Note
Medium Risk
Benchmark and serving env changes affect published perf/eval numbers; decode launch line may use a literal
{MORI_COMBINE_DTYPE_DECODE}if the missing$in the diff is not fixed before merge.Overview
Updates DeepSeek R1 FP4 disaggregated SGLang benchmarks on MI355X to a May 19 v0.5.12 ROCm image and adjusts MoRI FP8 combine settings so evals pass: prefill uses
fp8_direct_cast, decode usesfp8, with combine dtype passed intolaunch_server(replacing the oldSGLANG_MORI_FP8_COMBflag).SGLANG_ENABLE_OVERLAP_PLAN_STREAMis turned off.Sweep changes add 1×DEP8 + 1×DEP8 points (including conc 512 on the non-MTP 8192 scenario) and retune MTP layouts from smaller TP4 setups toward DEP8 with updated concurrency lists and
DECODE_MTP_SIZE.server.shnow derives max bench concurrency from the sweep list and applies DP+EP overrides for max running requests and MoRI dispatch/MoE token limits.perf-changelog.yamlrecords the config bump.Reviewed by Cursor Bugbot for commit d327d7b. Bugbot is set up for automated code reviews on this repo. Configure here.