[Fix] Remove MoRI-IO patches from vLLM Disagg benchmarks #1585
simondanielsson wants to merge 8 commits into
…m image Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
908939c to
89b9243
Compare
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
image: vllm/vllm-openai-rocm:nightly-bf610c2f56764e1b30bc6065f4ceace3d6e59036
# TODO(simondanielsson): change to pinned version once https://github.com/vllm-project/vllm/pull/40344
# is part of official release, likely 0.22.0.
image: vllm/vllm-openai-rocm:nightly
Can you pin a specific nightly hash here instead of just the generic "nightly"?
We can! The caveat (and why I didn't pin it here) was that the pinned nightlies are pruned from docker hub after 14 days so the config will then cease working in 2 weeks. WDYT?
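The trade-off described above can be sketched as a small date check. This is only an illustration: the 14-day retention window is the figure quoted in this thread rather than a documented guarantee, and `pin_expiry` and `is_pin_stale` are hypothetical helper names, not part of this repo or of any Docker tooling.

```python
from datetime import date, timedelta

# Retention window quoted in this thread; treat it as an assumption.
RETENTION_DAYS = 14


def pin_expiry(published: date, retention_days: int = RETENTION_DAYS) -> date:
    """Return the date on which a pinned nightly tag is expected to be pruned."""
    return published + timedelta(days=retention_days)


def is_pin_stale(published: date, today: date,
                 retention_days: int = RETENTION_DAYS) -> bool:
    """True once the pinned tag has reached its expected pruning date."""
    return today >= pin_expiry(published, retention_days)


if __name__ == "__main__":
    published = date(2025, 1, 1)  # hypothetical publish date of a pinned nightly
    print("expected pruning date:", pin_expiry(published))
    print("stale today?", is_pin_stale(published, date.today()))
```

A check along these lines could run in CI to flag a pinned nightly before it disappears, which keeps the reproducibility of a pin without the config silently breaking.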
minimaxm2.5-fp8-mi355x-vllm-disagg:
image: vllm/vllm-openai-rocm:nightly-a6682d1d259cca69a9ae737ea5608fbbe7520031
# TODO(simondanielsson): change to pinned version once https://github.com/vllm-project/vllm/pull/40344
Can you pin a specific nightly hash here instead of just the generic "nightly"?
Claude finished @simondanielsson's task in 2m 25s. PR Review: Remove MoRI-IO patches from vLLM Disagg benchmarks
Summary: Clean removal of ~555 lines of MoRI-IO monkey patches now that they've been upstreamed in vllm#40344. The three changes (backend rename … One blocking issue: The unpinned …
Run is succeeding, and perf is very similar to the existing Kimi vllm-disagg results (with some noise, of course).
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3c94a6f.
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

These patches were upstreamed in vllm-project/vllm#40344 so we can use the nightly image instead.
Switching to nightly also requires us to:
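- Switch the MoE all2all backend from mori to mori_low_latency.
- Set the read mode through the read_mode=1 flag instead of the VLLM_MORIIO_CONNECTOR_READ_MODE environment variable.
Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26813329592
Results from this run are very similar to the existing Kimi vllm disagg results, as expected.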
Note
Low Risk
Benchmark and container bootstrap only; no application auth or production serving paths, with behavior intended to match prior patched runs via upstream vLLM.
Overview
Moves MI355X vLLM disaggregated Kimi K2.5 (FP4) and MiniMax M2.5 (FP8) benchmarks onto a newer vLLM ROCm nightly that includes upstream MoRI-IO fixes (vllm#40344), so the large runtime Python patches in setup_deps.sh are removed.
Config and launch: amd-master.yaml drops per-scenario VLLM_MORIIO_CONNECTOR_READ_MODE settings; models_vllm.yaml switches MoE all2all from mori to mori_low_latency; prefill/decode/consumer kv-transfer-config now sets read_mode: true in kv_connector_extra_config. job.slurm/submit.sh no longer pass the old read-mode env var; default vllm-router image is bumped. perf-changelog.yaml documents the change.
Reviewed by Cursor Bugbot for commit f3b4132. Bugbot is set up for automated code reviews on this repo.
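For readers unfamiliar with the kv-transfer-config shape, the read-mode change summarized above might look roughly like the sketch below. It is a minimal illustration, not the contents of this PR: `kv_connector`, `kv_role`, and `kv_connector_extra_config` are fields of vLLM's KV transfer config, while the connector name `MoRIIOConnector` and the helper `build_kv_transfer_config` are assumptions made for this example.

```python
import json

# Roles used for disaggregated prefill/decode in vLLM's KV transfer config.
ALLOWED_ROLES = {"kv_producer", "kv_consumer"}


def build_kv_transfer_config(role: str,
                             connector: str = "MoRIIOConnector") -> str:
    """Build a JSON string suitable for a --kv-transfer-config argument.

    The connector name is an assumption for illustration; check the
    benchmark configs in this repo for the value actually used.
    """
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown kv_role: {role!r}")
    config = {
        "kv_connector": connector,
        "kv_role": role,
        # read_mode now lives in the connector's extra config rather than
        # in the VLLM_MORIIO_CONNECTOR_READ_MODE environment variable.
        "kv_connector_extra_config": {"read_mode": True},
    }
    return json.dumps(config)


if __name__ == "__main__":
    print(build_kv_transfer_config("kv_producer"))  # prefill side
    print(build_kv_transfer_config("kv_consumer"))  # decode side
```

Keeping the setting inside the JSON config means prefill and decode launch commands carry their read mode explicitly, so the launch scripts no longer need to export a separate environment variable.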