
[AMD] add mori blog #336

Open
billishyahao wants to merge 3 commits into lm-sys:main from billishyahao:add_mori_blog

Conversation

@billishyahao

Add mori sglang blog
cc @HaiShaw @Duyi-Wang

@HaiShaw (Contributor) left a comment


LGTM

@HaiShaw (Contributor) commented May 26, 2026

@merrymercy and @wisclmy0611, please help review.

@functionstackx

Nice blog, though one thing missing from your blog is that it fails simple grade-school math (GSM8K). As @billishyahao & @HaiShaw have root-caused, this is because y'all are using FP8 direct-cast EP combine instead of quant-aware FP8 EP combine. You probably want to fix that before you publish.

(two screenshots attached)

@functionstackx commented May 26, 2026

I would recommend shipping the fix into the production InferenceX repo first, then redoing the screenshots before publishing.

@billishyahao (Author)

Hi @functionstackx, thanks for the reminder. We have fixed it by adding FP8 blockwise support to mori (ROCm/mori#311), with the corresponding SGLang-side change in sgl-project/sglang#24879, and we use this new function to resolve the accuracy issue in SemiAnalysisAI/InferenceX#1566.
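The accuracy gap between direct-cast and blockwise FP8 combine comes down to how values are mapped into FP8's narrow range. The following is a minimal pure-Python sketch, not mori's or SGLang's implementation: it simulates round-to-nearest casting to FP8 E4M3 (assumed format, saturating at ±448) and compares a direct cast with a blockwise scheme that rescales each block by its absolute maximum before casting. The block size of 128 and the input data are hypothetical, chosen only for illustration.

```python
import math
from typing import List

# Assumptions for illustration: OCP FP8 E4M3 ("e4m3fn") with saturation,
# round-to-nearest, and a hypothetical block size of 128.
FP8_E4M3_MAX = 448.0   # largest finite E4M3 value
MIN_NORMAL_EXP = -6    # smallest normal exponent; subnormals reuse its spacing
MANTISSA_BITS = 3
BLOCK_SIZE = 128


def round_to_e4m3(x: float) -> float:
    """Simulate casting one float to FP8 E4M3 and back (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # clamp instead of overflowing
    exp = max(math.floor(math.log2(mag)), MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def direct_cast(values: List[float]) -> List[float]:
    """Direct cast: every value goes into FP8 with no scaling."""
    return [round_to_e4m3(v) for v in values]


def blockwise_cast(values: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Blockwise: map each block's abs-max to FP8_E4M3_MAX, cast, then rescale."""
    out: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        out.extend(round_to_e4m3(v / scale) * scale for v in block)
    return out


def mean_rel_error(ref: List[float], approx: List[float]) -> float:
    """Mean relative error over the nonzero reference entries."""
    pairs = [(r, a) for r, a in zip(ref, approx) if r != 0.0]
    return sum(abs(a - r) / abs(r) for r, a in pairs) / len(pairs)


# Synthetic small-magnitude values (hypothetical data, not measurements from this PR).
x = [1e-3 * (1 + i / 256) for i in range(256)]
print(f"direct cast: {mean_rel_error(x, direct_cast(x)):.3f}")
print(f"blockwise:   {mean_rel_error(x, blockwise_cast(x)):.3f}")
```

With the direct cast, values around 1e-3 fall into E4M3's subnormal range, where the spacing is 2^-9, so they all collapse onto the same few codes. Per-block scaling moves each block into the normal range first, which bounds the relative rounding error at 2^-4 at the cost of storing one scale per block.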


Comment thread blog/2026-05-28-mori.md
Comment on lines +32 to +35
<img src="/images/blog/mori/curve1.png"
style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;">

*Figure 2: Full pareto curve — throughput vs interactivity for AMD Instinct™ MI355X and B200 configurations*

These are the curves using direct-cast FP8 EP combine, where GSM8K accuracy is not good. The FP8 blockwise EP combine probably has slightly worse performance than direct-cast FP8.

I would recommend shipping the InferenceX change that fixes accuracy and updating the curve.
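One contributor to the expected performance cost is that a blockwise combine has to move per-block scales in addition to the FP8 payload. The back-of-the-envelope sketch below counts bytes per token under hypothetical assumptions that do not come from this PR: one FP32 scale per 128-element block and a hidden size of 7168. It ignores the extra amax and scale computation, which also costs time.

```python
from typing import Optional


def combine_bytes_per_token(hidden_size: int, block_size: Optional[int] = None) -> int:
    """Bytes sent per token in an FP8 combine (hypothetical layout).

    block_size=None models a direct cast (no scales); otherwise one FP32
    scale (4 bytes) is assumed per block of `block_size` elements.
    """
    payload = hidden_size  # 1 byte per FP8 element
    if block_size is None:
        return payload
    num_blocks = -(-hidden_size // block_size)  # ceiling division
    return payload + num_blocks * 4


direct = combine_bytes_per_token(7168)
blockwise = combine_bytes_per_token(7168, block_size=128)
print(direct, blockwise, (blockwise - direct) / direct)
```

Under these assumed values, the scales add 224 bytes per token (56 blocks times 4 bytes), roughly 3% on top of the 7,168-byte payload, which is consistent with a "slightly worse" rather than a dramatically worse throughput.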


3 participants