[Do not merge] Add the Deepseek-V4-Pro supported on MI355x by wuhuikx · Pull Request #433 · vllm-project/recipes

wuhuikx · 2026-05-01T10:18:31Z

No description provided.

vercel · 2026-05-01T10:18:36Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	May 7, 2026 2:51am

gemini-code-assist

Code Review

This pull request introduces a comprehensive usage guide for running DeepSeek-V4 on AMD ROCm hardware (specifically MI355X) and updates the DeepSeek-V4-Pro model configuration to include verified AMD support and optimized hardware overrides. The review feedback focuses on improving the portability and consistency of the documentation, including updating installation commands to modern standards, ensuring path consistency for profilers and caches, and correcting model name mismatches in example code.

gemini-code-assist · 2026-05-01T10:19:55Z

+cd vllm
+git fetch origin pull/40871/head:pr_dsv4
+git checkout pr_dsv4
+python3 setup.py develop


Using python3 setup.py develop is deprecated in favor of pip install -e .. It is recommended to use the modern approach for installing the package in editable mode.

Suggested change

python3 setup.py develop

pip install -e .

gemini-code-assist · 2026-05-01T10:19:56Z

+  --max-num-batched-tokens ${max_num_batched_tokens} \
+  --distributed-executor-backend mp \
+  --trust-remote-code \
+  --profiler-config '{"profiler":"torch","torch_profiler_dir":"./vllm_profile"}' \


There is a mismatch between the environment variable VLLM_TORCH_PROFILER_DIR (set to /app/vllm_profile on line 45) and the path provided in --profiler-config (./vllm_profile). It is better to use a consistent path to avoid confusion and ensure profiles are stored in the expected location.

Suggested change

--profiler-config '{"profiler":"torch","torch_profiler_dir":"./vllm_profile"}' \

--profiler-config '{"profiler":"torch","torch_profiler_dir":"/app/vllm_profile"}' \

gemini-code-assist · 2026-05-01T10:19:56Z

+```bash
+MODEL=/home/models/DeepSeek-V4-Flash
+lm_eval --model local-completions \
+  --model_args model=$MODEL,base_url=http://0.0.0.0:8001/v1/completions,num_concurrent=4,max_retries=10,max_gen_toks=2048,timeout=60000 \


Using 0.0.0.0 as a connection address is non-standard and may not work as expected on all operating systems. It is safer to use 127.0.0.1 or localhost for connecting to a service running on the same machine.

Suggested change

--model_args model=$MODEL,base_url=http://0.0.0.0:8001/v1/completions,num_concurrent=4,max_retries=10,max_gen_toks=2048,timeout=60000 \

--model_args model=$MODEL,base_url=http://127.0.0.1:8001/v1/completions,num_concurrent=4,max_retries=10,max_gen_toks=2048,timeout=60000 \

gemini-code-assist · 2026-05-01T10:19:56Z

+export HF_HOME=/data/huggingface-cache
+export VLLM_ROCM_USE_AITER=1
+export VLLM_ROCM_USE_AITER_LINEAR=1
+rm -rf /root/.cache/vllm/torch_compile_cache


Hardcoding the /root/ directory is not portable and will fail if the user is not running as the root user. Using ~/.cache or $HOME/.cache is a more robust approach.

Suggested change

rm -rf /root/.cache/vllm/torch_compile_cache

rm -rf ~/.cache/vllm/torch_compile_cache

gemini-code-assist · 2026-05-01T10:19:56Z

+```bash
+MODEL=/home/models/DeepSeek-V4-Pro
+lm_eval --model local-completions \
+  --model_args model=$MODEL,base_url=http://0.0.0.0:8001/v1/completions,num_concurrent=2,max_retries=10,max_gen_toks=2048,timeout=60000 \


Using 0.0.0.0 as a connection address is non-standard. It is recommended to use 127.0.0.1 or localhost for local connections.

Suggested change

--model_args model=$MODEL,base_url=http://0.0.0.0:8001/v1/completions,num_concurrent=2,max_retries=10,max_gen_toks=2048,timeout=60000 \

--model_args model=$MODEL,base_url=http://127.0.0.1:8001/v1/completions,num_concurrent=2,max_retries=10,max_gen_toks=2048,timeout=60000 \

gemini-code-assist · 2026-05-01T10:19:56Z

+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
+model = "deepseek-ai/DeepSeek-V4-Pro"


The model name used in the OpenAI client example (deepseek-ai/DeepSeek-V4-Pro) does not match the model path used to launch the server (/home/models/DeepSeek-V4-Pro on line 139). vLLM requires the model name in the request to match the name or path provided at startup unless --served-model-name is used.

Suggested change

model = "deepseek-ai/DeepSeek-V4-Pro"

model = "/home/models/DeepSeek-V4-Pro"

Signed-off-by: wuhuikx <hattie.wu@amd.com>

The recipe (vllm-project/recipes#433) specifies --moe-backend triton_unfused, but that choice was never accepted into vLLM main — likely it lived on the #40871 PR branch and was renamed/removed before merge. In vllm/vllm-openai-rocm:nightly (which the recipe itself uses), the legal choices are: aiter, auto, cutlass, deep_gemm, emulation, flashinfer_cutedsl, flashinfer_cutlass, flashinfer_trtllm, marlin, triton. Drop the flag entirely and let vLLM's `auto` selector pick the backend. With VLLM_ROCM_USE_AITER=1 set, that resolves to the AITER MoE path on ROCm — the same kernel family the recipe was steering toward. All other remaining flags and env vars verified valid in vLLM 0.20.2.

I dropped --moe-backend triton_unfused based on a stale error message ("invalid choice ... choose from aiter, auto, ...") from the previous run, but that error came from the cached squashfs of an April 25 build that pre-dated #40871. The pinned nightly-dcacdf9a8860a8640 DOES have triton_unfused in MoEBackend — verified by reading vllm/config/kernel.py at that exact commit on GitHub. Without --moe-backend triton_unfused, vLLM's auto selector picks a backend that doesn't register w13_weight_scale / w2_weight_scale on the FP4 expert layers, so safetensors loading throws: KeyError: 'layers.0.ffn.experts.w13_weight_scale' at vllm/model_executor/models/deepseek_v4.py:1492 This matches the recipe (vllm-project/recipes#433) line-for-line now, with the only intentional deviations being InferenceX conventions: * --max-model-len $MAX_MODEL_LEN (sized to ISL+OSL+256) * --no-enable-prefix-caching (fair benchmark comparisons) * VLLM_ENGINE_READY_TIMEOUT_S=3600 (cold HF-cache tolerance) None of those interact with weight loading; they were not implicated in either failure.

vercel Bot deployed to Preview May 1, 2026 10:19 View deployment

gemini-code-assist Bot reviewed May 1, 2026

View reviewed changes

vercel Bot deployed to Preview May 1, 2026 10:25 View deployment

wuhuikx force-pushed the hattiw/deepseek-v4 branch from df89987 to ff6cd47 Compare May 1, 2026 10:26

vercel Bot deployed to Preview May 1, 2026 10:27 View deployment

vercel Bot deployed to Preview May 1, 2026 10:34 View deployment

vercel Bot deployed to Preview May 1, 2026 10:40 View deployment

vercel Bot deployed to Preview May 1, 2026 12:42 View deployment

vercel Bot deployed to Preview May 1, 2026 13:13 View deployment

vercel Bot deployed to Preview May 1, 2026 13:20 View deployment

vercel Bot deployed to Preview May 1, 2026 16:37 View deployment

wuhuikx mentioned this pull request May 6, 2026

[Performance]: Deepseek-V4 Support and Optimization on ROCm Backend vllm-project/vllm#41820

Open

22 tasks

vercel Bot deployed to Preview May 6, 2026 15:11 View deployment

wuhuikx added 9 commits May 6, 2026 10:11

Add the Deepseek-V4-Pro supported on MI355x

c887b5e

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the Deepseek-V4-Flash support

fcd1e7e

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the feature matrix

2c8b04b

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the recipe

038809e

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the smoking test result

3d41f2e

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the gms8k result

9271152

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Remove the smoke result

0e0ada6

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Add docker info

a6f74b8

Signed-off-by: wuhuikx <hattie.wu@amd.com>

Update the docker to vllm/vllm-open-rocm:nightly

13db197

Signed-off-by: wuhuikx <hattie.wu@amd.com>

wuhuikx force-pushed the hattiw/deepseek-v4 branch from cebf4c1 to 13db197 Compare May 6, 2026 15:12

vercel Bot deployed to Preview May 6, 2026 15:13 View deployment

Update the docker to vllm/vllm-openai-rocm:nightly

d6dc5cc

Signed-off-by: wuhuikx <hattie.wu@amd.com>

vercel Bot deployed to Preview May 7, 2026 02:51 View deployment

This was referenced May 13, 2026

Improve dsv4-fp8-mi355x-vllm with vllm-project/recipes#433 MI355X recipe SemiAnalysisAI/InferenceX#1373

Closed

dsv4-fp4-mi355x-vllm and adopt recipes#433 SemiAnalysisAI/InferenceX#1374

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Do not merge] Add the Deepseek-V4-Pro supported on MI355x#433

[Do not merge] Add the Deepseek-V4-Pro supported on MI355x#433
wuhuikx wants to merge 10 commits into
vllm-project:mainfrom
wuhuikx:hattiw/deepseek-v4

wuhuikx commented May 1, 2026

Uh oh!

vercel Bot commented May 1, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	--profiler-config '{"profiler":"torch","torch_profiler_dir":"./vllm_profile"}' \
	--profiler-config '{"profiler":"torch","torch_profiler_dir":"/app/vllm_profile"}' \

	--model_args model=$MODEL,base_url=http://0.0.0.0:8001/v1/completions,num_concurrent=4,max_retries=10,max_gen_toks=2048,timeout=60000 \
	--model_args model=$MODEL,base_url=http://127.0.0.1:8001/v1/completions,num_concurrent=4,max_retries=10,max_gen_toks=2048,timeout=60000 \

	rm -rf /root/.cache/vllm/torch_compile_cache
	rm -rf ~/.cache/vllm/torch_compile_cache

	model = "deepseek-ai/DeepSeek-V4-Pro"
	model = "/home/models/DeepSeek-V4-Pro"

Conversation

wuhuikx commented May 1, 2026

Uh oh!

vercel Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 1, 2026 •

edited

Loading