Update Minimax2.5 H100 #484

Open
faradawn wants to merge 2 commits into vllm-project:main from faradawn:minimax-m25-h100-fp8

@faradawn
Collaborator

Update the MiniMax-M2.5 recipe to match the new H100 FP8 launch flags validated in SemiAnalysisAI/InferenceX#1516: drop the explicit cudagraph_mode from the compilation-config, switch the reasoning parser to minimax_m2_append_think, and document TEP=8 (tensor-parallel-size 8 with --enable-expert-parallel) as the recommended single-node H100 strategy in place of TP4+EP.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
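
For readers who want to see the described flags in one place, here is a minimal sketch that assembles a `vllm serve` command line from the changes listed above. The flag names and the compilation-config contents are taken from this PR; the model identifier `MiniMaxAI/MiniMax-M2.5` and the helper name `build_serve_command` are assumptions made for illustration, so substitute whatever the recipe actually uses.

```python
import json
import shlex

# Hypothetical model identifier; the PR text does not spell out the exact name.
MODEL = "MiniMaxAI/MiniMax-M2.5"


def build_serve_command(model: str = MODEL, tensor_parallel_size: int = 8) -> list[str]:
    """Assemble a `vllm serve` argument list using the flags named in this PR."""
    compilation_config = {
        "mode": 3,
        # No "cudagraph_mode" key: the PR drops the explicit PIECEWISE setting.
        "pass_config": {"fuse_minimax_qk_norm": True},
    }
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--enable-expert-parallel",  # TP=8 plus this flag is what the PR calls TEP=8
        "--reasoning-parser", "minimax_m2_append_think",
        "--trust-remote-code",
        "--compilation-config", json.dumps(compilation_config, separators=(",", ":")),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_serve_command()))
```

Building the JSON with `json.dumps` rather than hand-writing the string avoids quoting mistakes when the config is edited later.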
@vercel
Contributor

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vllm-recipes | Ready | Preview, Comment | May 21, 2026 11:42pm |

@faradawn changed the title from "update Minimax2.5 fp8 h100" to "Update Minimax2.5 H100" on May 21, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the MiniMax-M2.5 model configuration by adopting the minimax_m2_append_think reasoning parser, removing the cudagraph_mode: PIECEWISE parameter, and increasing the tensor parallel size for H100 configurations. Review feedback highlights several inconsistencies where these updates were not applied to all relevant examples in the guide, specifically the NVIDIA H200 and AMD ROCm sections. Additionally, a correction was suggested for the H100 heading to align with the established naming convention.

  - "--trust-remote-code"
  - "--compilation-config"
- - '{"mode":3,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}'
+ - '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'

medium

The removal of cudagraph_mode: PIECEWISE should be applied consistently across all examples in the guide section. While it has been removed here and in the Docker example (line 155), the NVIDIA H200 example at line 179 still includes this parameter.

  args:
  - "--reasoning-parser"
- - "minimax_m2"
+ - "minimax_m2_append_think"

medium

The update to the minimax_m2_append_think reasoning parser should be applied consistently across all examples in the guide section. Currently, the NVIDIA H200 example (line 178) and the AMD ROCm example (line 204) still reference the old minimax_m2 parser.
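
The consistency concerns raised in this review could be checked mechanically. The following is a rough sketch, not part of the PR, that scans recipe text for lines still using either retired setting; the function name `find_stale_settings` and the pattern labels are invented for illustration.

```python
import re

# Patterns for the two settings this PR retires. The trailing \b keeps the old
# parser name from matching inside "minimax_m2_append_think", because the
# underscore that follows "m2" there is a word character.
STALE_PATTERNS = {
    "old reasoning parser": re.compile(r"\bminimax_m2\b"),
    "explicit cudagraph_mode": re.compile(r"cudagraph_mode"),
}


def find_stale_settings(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines that still use a retired setting."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "--reasoning-parser minimax_m2",
        "--reasoning-parser minimax_m2_append_think",
        "--compilation-config cudagraph_mode=PIECEWISE",
    ])
    for number, label in find_stale_settings(sample):
        print(f"line {number}: {label}")
```

Running something like this over the whole recipe file would surface the H200 and ROCm examples mentioned above without relying on hand-counted line numbers.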

  Pure TP8 is not supported. For >4 GPUs use DP+EP or TP+EP.

- ### TP4+EP (recommended for H100)
+ ### TEP=8 (recommended for H100)

medium

The heading TEP=8 is inconsistent with the previous heading style (TP4+EP) and the terminology used in the preceding text (line 184: TP+EP). Consider using TP8+EP to maintain consistency with the rest of the document.

  ### TP8+EP (recommended for H100)
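
The guide line quoted in this thread ("Pure TP8 is not supported. For >4 GPUs use DP+EP or TP+EP.") can be read as a simple rule: any layout spanning more than four GPUs needs expert parallelism enabled. The sketch below encodes that reading only. It is an interpretation of the quoted sentence, not vLLM's own validation logic, and `is_supported_layout` is a hypothetical helper.

```python
def is_supported_layout(num_gpus: int, expert_parallel: bool) -> bool:
    """Apply one reading of the guide's note: above 4 GPUs, expert parallelism is required.

    Assumption: layouts with 4 or fewer GPUs are acceptable with or without
    expert parallelism, since the note only restricts the >4 GPU case.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_gpus <= 4 or expert_parallel


if __name__ == "__main__":
    # TP8 with --enable-expert-parallel (the PR's TEP=8) passes the rule;
    # pure TP8 does not.
    print(is_supported_layout(8, expert_parallel=True))
    print(is_supported_layout(8, expert_parallel=False))
```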

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
@faradawn
Collaborator Author

@esmeetu
Member

esmeetu commented May 22, 2026

Could you clarify why we need to change the reasoning parser?

@faradawn
Collaborator Author

I think it's to add support for thinking.
