Update Minimax2.5 H100 #484
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Code Review
This pull request updates the MiniMax-M2.5 model configuration by adopting the minimax_m2_append_think reasoning parser, removing the cudagraph_mode: PIECEWISE parameter, and increasing the tensor parallel size for H100 configurations. Review feedback highlights several inconsistencies where these updates were not applied to all relevant examples in the guide, specifically the NVIDIA H200 and AMD ROCm sections. Additionally, a correction was suggested for the H100 heading to align with the established naming convention.
  - "--trust-remote-code"
  - "--compilation-config"
- - '{"mode":3,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}'
+ - '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'

  args:
  - "--reasoning-parser"
- - "minimax_m2"
+ - "minimax_m2_append_think"

  Pure TP8 is not supported. For >4 GPUs use DP+EP or TP+EP.

- ### TP4+EP (recommended for H100)
+ ### TEP=8 (recommended for H100)
NVFP4 H100 configs from https://github.com/SemiAnalysisAI/InferenceX/pull/1517/changes

Could you clarify why we need to change the reasoning parser?

I think it's to add support for thinking.
Update the MiniMax-M2.5 recipe to match the new H100 FP8 launch flags validated in SemiAnalysisAI/InferenceX#1516: drop the explicit cudagraph_mode from the compilation-config, switch the reasoning parser to minimax_m2_append_think, and document TEP=8 (tensor-parallel-size 8 with --enable-expert-parallel) as the recommended single-node H100 strategy in place of TP4+EP.
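
For reference, a minimal sketch of how the flags described above might combine into one launch command. The Hugging Face model ID `MiniMaxAI/MiniMax-M2.5` and the helper name `build_tep8_command` are assumptions for illustration; the thread only names the model as MiniMax-M2.5, and the recipe itself passes these values as YAML `args`.

```python
import json
import shlex

# Assumed Hugging Face model ID; the PR only refers to "MiniMax-M2.5".
MODEL = "MiniMaxAI/MiniMax-M2.5"

# compilation-config after this PR: cudagraph_mode is no longer set explicitly.
COMPILATION_CONFIG = {"mode": 3, "pass_config": {"fuse_minimax_qk_norm": True}}


def build_tep8_command(model: str = MODEL) -> list[str]:
    """Return the argv for a TEP=8 launch (TP size 8 plus expert parallelism)."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", "8",
        "--enable-expert-parallel",
        "--trust-remote-code",
        "--reasoning-parser", "minimax_m2_append_think",
        # Compact separators reproduce the JSON string used in the recipe.
        "--compilation-config",
        json.dumps(COMPILATION_CONFIG, separators=(",", ":")),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(build_tep8_command()))
```

Building the JSON from a dict rather than a hand-written string makes it easy to confirm that `cudagraph_mode` is absent before launching.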