[AMD] add mori blog #336
@merrymercy and @wisclmy0611 please help review.

Nice blog, though one thing missing from it is that the setup fails simple grade-school math. @billishyahao and @HaiShaw have root-caused this to your use of FP8 direct-cast EP combine instead of quant-aware FP8 EP combine. You probably want to fix that before you publish.

I would recommend shipping the fix into the production InferenceX repo first and then redoing the screenshots before shipping.

Hi @functionstackx, thanks for the reminders. We have fixed it by adding FP8 blockwise support to mori (ROCm/mori#311), with the sglang-side change in sgl-project/sglang#24879, and we use this new function to resolve the accuracy issue in SemiAnalysisAI/InferenceX#1566.
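For anyone following the thread, here is a rough illustration of why a direct cast to FP8 can lose accuracy where a scaled, blockwise cast does not. It is a pure-Python simulation, not the mori or sglang kernel code. The E4M3 format, the block size of 128, the synthetic input data, and all function names are assumptions made for illustration only; none of them are confirmed in this thread.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 (fn) format (assumed format)


def round_to_e4m3(x: float) -> float:
    """Simulate rounding a float to the nearest E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) == ex - 1.
    exponent = max(math.frexp(a)[1] - 1, -6)  # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(a / step) * step, E4M3_MAX)


def direct_cast(values):
    """Cast each value to FP8 with no scaling (hypothetical stand-in for direct cast)."""
    return [round_to_e4m3(v) for v in values]


def blockwise_quant_dequant(values, block_size=128):
    """Scale each block so its max maps to 448, cast to FP8, then rescale back."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        out.extend(round_to_e4m3(v / scale) * scale for v in block)
    return out


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Synthetic small-magnitude data, chosen only to make the effect visible.
data = [1e-3 * math.sin(i + 1) for i in range(256)]
direct_err = mean_abs_error(data, direct_cast(data))
blockwise_err = mean_abs_error(data, blockwise_quant_dequant(data))
print(f"direct cast error: {direct_err:.2e}, blockwise error: {blockwise_err:.2e}")
```

In this toy setup the unscaled cast pushes small values into the subnormal range of FP8, where they collapse to zero or to a single coarse step, while the per-block scale keeps them in the range where FP8 has its full 3 bits of mantissa. The extra scale computation and the scale values that travel with each block are also the reason a blockwise path can cost some performance relative to a direct cast.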
<img src="/images/blog/mori/curve1.png"
style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;">

*Figure 2: Full pareto curve — throughput vs interactivity for AMD Instinct™ MI355X and B200 configurations*
These are the curves using direct-cast FP8 EP combine, where GSM8K accuracy is not good. The FP8 blockwise EP combine probably has slightly worse perf than direct-cast FP8.
I would recommend shipping the InferenceX change that fixes accuracy and then updating the curve.




Add mori sglang blog
cc @HaiShaw @Duyi-Wang