27 changes: 19 additions & 8 deletions models/MiniMaxAI/MiniMax-M2.5.yaml
@@ -3,7 +3,7 @@ meta:
slug: "minimax-m2.5"
provider: "MiniMax"
description: "MiniMax M2.5 MoE language model (230B total / 10B active) for coding, agent toolchains, and long-context reasoning — native FP8 checkpoint"
- date_updated: 2026-05-18
+ date_updated: 2026-05-21
difficulty: intermediate
tasks:
- text
@@ -27,7 +27,7 @@ model:
base_args:
- "--trust-remote-code"
- "--compilation-config"
- - '{"mode":3,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}'
+ - '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'
Review comment (medium):

The removal of cudagraph_mode: PIECEWISE should be applied consistently across all examples in the guide section. While it has been removed here and in the Docker example (line 155), the NVIDIA H200 example at line 179 still includes this parameter.
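
A minimal sketch of how this kind of leftover could be caught automatically. It assumes the guide examples pass the config as a single-quoted JSON string on the same line as `--compilation-config`; the `GUIDE_EXCERPT` text and the `find_stale_cudagraph_mode` helper are hypothetical and only stand in for the real guide section.

```python
import json
import re

# Hypothetical excerpt standing in for the guide text; not the real file contents.
GUIDE_EXCERPT = """
vllm serve MiniMaxAI/MiniMax-M2.5 \\
  --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \\
  --trust-remote-code

vllm serve MiniMaxAI/MiniMax-M2.5 \\
  --compilation-config '{"mode":3,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}' \\
  --enable-auto-tool-choice
"""

# Matches the single-quoted JSON argument that follows --compilation-config.
COMPILATION_CONFIG = re.compile(r"--compilation-config\s+'([^']+)'")


def find_stale_cudagraph_mode(text: str) -> list[int]:
    """Return 1-based line numbers whose compilation config still sets cudagraph_mode."""
    stale = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = COMPILATION_CONFIG.search(line)
        if match and "cudagraph_mode" in json.loads(match.group(1)):
            stale.append(lineno)
    return stale


if __name__ == "__main__":
    print(find_stale_cudagraph_mode(GUIDE_EXCERPT))
```

The check parses the JSON rather than grepping for the key, so a config that merely mentions the string elsewhere would not be flagged. It does not cover the `base_args` list form, where the flag and its value sit on separate YAML lines.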

base_env: {}

features:
@@ -41,7 +41,7 @@ features:
description: "MiniMax M2 reasoning parser for chain-of-thought extraction"
args:
- "--reasoning-parser"
- - "minimax_m2"
+ - "minimax_m2_append_think"
Review comment (medium):

The update to the minimax_m2_append_think reasoning parser should be applied consistently across all examples in the guide section. Currently, the NVIDIA H200 example (line 178) and the AMD ROCm example (line 204) still reference the old minimax_m2 parser.
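
A rough sketch of a consistency check for this, assuming every example passes the parser as `--reasoning-parser <name>` on one line. The excerpt and the `parser_mismatches` helper are hypothetical; the section names only mirror the ones mentioned above.

```python
import re

EXPECTED_PARSER = "minimax_m2_append_think"

# Hypothetical excerpt standing in for the guide text; not the real file contents.
GUIDE_EXCERPT = """\
### Docker
  --reasoning-parser minimax_m2_append_think \\
### NVIDIA H200
  --reasoning-parser minimax_m2 \\
### AMD ROCm
  --reasoning-parser minimax_m2 \\
"""

# Only the reasoning parser is matched; --tool-call-parser keeps its own value.
REASONING_PARSER = re.compile(r"--reasoning-parser\s+(\S+)")


def parser_mismatches(text: str, expected: str) -> list[tuple[int, str]]:
    """Return (line_number, parser_name) pairs whose parser differs from `expected`."""
    mismatches = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = REASONING_PARSER.search(line)
        if match and match.group(1) != expected:
            mismatches.append((lineno, match.group(1)))
    return mismatches


if __name__ == "__main__":
    for lineno, name in parser_mismatches(GUIDE_EXCERPT, EXPECTED_PARSER):
        print(f"line {lineno}: found {name}, expected {EXPECTED_PARSER}")
```

Matching on the flag name keeps `--tool-call-parser minimax_m2` out of the results, since that value is unchanged in this PR.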


opt_in_features: []

@@ -150,9 +150,9 @@ guide: |
--attention-backend FLASHINFER \
--enable-flashinfer-autotune \
--tool-call-parser minimax_m2 \
- --reasoning-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
--enable-auto-tool-choice \
- --compilation-config '{"mode":3,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}' \
+ --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
--trust-remote-code
```

@@ -183,18 +183,29 @@ guide: |

Pure TP8 is not supported. For >4 GPUs use DP+EP or TP+EP.

- ### TP4+EP (recommended for H100)
+ ### TEP=8 (recommended for H100)
Review comment (medium):

The heading TEP=8 is inconsistent with the previous heading style (TP4+EP) and the terminology used in the preceding text (line 184: TP+EP). Consider using TP8+EP to maintain consistency with the rest of the document.

  ### TP8+EP (recommended for H100)
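
One way to keep headings in step with the flags is to derive the label from the values used in the example. The helper below is a hypothetical illustration of the `TP<n>+EP` naming suggested here, not something that exists in the repository.

```python
def parallelism_label(tensor_parallel_size: int, expert_parallel: bool) -> str:
    """Build a heading label such as 'TP8+EP' from the serve flags.

    tensor_parallel_size mirrors --tensor-parallel-size;
    expert_parallel mirrors the presence of --enable-expert-parallel.
    """
    label = f"TP{tensor_parallel_size}"
    return f"{label}+EP" if expert_parallel else label


if __name__ == "__main__":
    print(parallelism_label(8, True))
```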


```bash
vllm serve MiniMaxAI/MiniMax-M2.5 \
- --tensor-parallel-size 4 \
+ --tensor-parallel-size 8 \
--enable-expert-parallel \
--tool-call-parser minimax_m2 \
- --reasoning-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
+ --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
--enable-auto-tool-choice
```

+ ### H100 NVFP4 (TP=8)
+
+ ```bash
+ vllm serve nvidia/MiniMax-M2.5-NVFP4 \
+ --tensor-parallel-size 8 \
+ --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
+ --enable-auto-tool-choice \
+ --trust-remote-code
+ ```

### AMD ROCm

```bash