Update MiniMax M2.5 H200 recipe#474

Merged
esmeetu merged 1 commit into vllm-project:main from anish-shanbhag:codex/minimax-m25-h200-recipe
May 20, 2026

anish-shanbhag commented May 18, 2026

Summary: Mark MiniMax-M2.5 verified on H200 and align the recipe with SemiAnalysisAI/InferenceX#1354. Pin vLLM image/min version to v0.20.2 and add the FP8 KV cache, FlashInfer attention/autotune, Triton MoE, and MiniMax QK norm fusion settings.

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vllm-recipes | Ready | Preview, Comment | May 18, 2026 7:47pm |

gemini-code-assist Bot left a comment

Code Review

This pull request updates the MiniMax-M2.5 model configuration, bumping the vLLM version to 0.20.2, adding H200 hardware support, and introducing performance-optimizing environment variables and hardware overrides for Hopper. Feedback indicates that the Docker usage example in the guide should be updated to include the new environment variables via '-e' flags to ensure consistency with the manual execution instructions.
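
To illustrate the review feedback, here is a minimal sketch of forwarding a set of environment variables into a `docker run` command with one `-e` flag each. The variable names are placeholders, not the ones added by this PR (those are in `models/MiniMaxAI/MiniMax-M2.5.yaml`). The image name `vllm/vllm-openai`, the `--gpus all` option, and the helper `docker_run_command` are assumptions made for the example.

```python
import shlex


def docker_run_command(image, env, server_args):
    """Build a `docker run` argv that passes each env var with its own -e flag."""
    argv = ["docker", "run", "--gpus", "all"]
    # Sort for a stable, reviewable command line.
    for name, value in sorted(env.items()):
        argv += ["-e", f"{name}={value}"]
    argv.append(image)          # everything after the image goes to the container
    argv += server_args
    return argv


# Placeholder names only; substitute the variables listed in the recipe YAML.
example_env = {"EXAMPLE_RECIPE_VAR_A": "1", "EXAMPLE_RECIPE_VAR_B": "fp8"}

cmd = docker_run_command(
    "vllm/vllm-openai:v0.20.2",            # assumed image name; tag from the PR summary
    example_env,
    ["--model", "MiniMaxAI/MiniMax-M2.5"],
)
print(shlex.join(cmd))
```

Generating the Docker command from the same mapping used for manual execution is one way to keep the two sets of instructions from drifting apart.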

Comment thread models/MiniMaxAI/MiniMax-M2.5.yaml
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
esmeetu commented May 20, 2026

LGTM. Thanks!

@esmeetu esmeetu merged commit b3f013c into vllm-project:main May 20, 2026
4 checks passed

2 participants