DeepSeek-V4-Flash on H20-141G*8 single node error #429

@cxljkb1994


(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299]
(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.20.0
(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299] █▄█▀ █ █ █ █ model /export/model
(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=7) INFO 04-30 08:57:19 [utils.py:299]

(Worker_TP0_EP0 pid=794) INFO 04-30 08:58:47 [flashinfer_all_reduce.py:149] Initialized FlashInfer Allreduce norm fusion workspace with backend=trtllm
(Worker_TP0_EP0 pid=794) INFO 04-30 08:58:51 [backends.py:376] Cache the graph of compile range (1, 64) for later use
(Worker_TP0_EP0 pid=794) INFO 04-30 08:58:51 [backends.py:376] Cache the graph of compile range (65, 2048) for later use
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] WorkerProc hit an exception.
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.model_runner.profile_run()
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] outputs = self.model(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self.runnable(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1474, in forward
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] hidden_states = self.model(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 623, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return aot_compile_fullgraph(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_fn = backend(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwds)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1194, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] PiecewiseCompileInterpreter(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 721, in run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return super().run(*args)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.env[node] = self.run_node(node)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 748, in call_module
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] piecewise_backend = PiecewiseBackend(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.compile_all_ranges()
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 351, in compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_graph, handle = self.compiler.compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 372, in compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return standalone_compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_fn = compile_fx(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return compile_fx(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return _maybe_wrap_and_compile_fx_main(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return _compile_fx_main(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] raise InductorError(e, currentframe()).with_traceback(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] post_grad_passes(gm, is_inference)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 357, in post_grad_passes
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ).apply_graph_pass(decompose_triton_kernel_wrapper_functional)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return pass_fn(self.gm.graph)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1255, in decompose_triton_kernel_wrapper_functional
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] graph_pass.apply(graph)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 2063, in apply
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] entry.apply(m, graph, node)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1132, in apply
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.handler(match, *match.args, **match.kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1253, in _
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] match.replace_by_example(decomp, flat_args, run_functional_passes=False)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 316, in replace_by_example
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] assert len(graph_with_eager_vals.graph.nodes) == len(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError:
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.model_runner.profile_run()
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] outputs = self.model(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self.runnable(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1474, in forward
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] hidden_states = self.model(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 623, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return aot_compile_fullgraph(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_fn = backend(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwds)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1194, in __call__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] PiecewiseCompileInterpreter(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 721, in run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return super().run(*args)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.env[node] = self.run_node(node)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 748, in call_module
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] piecewise_backend = PiecewiseBackend(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] self.compile_all_ranges()
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 351, in compile
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] compiled_graph, handle = self.compiler.compile(
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 372, in compile
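
Every line of the log above carries the same `(process pid=N) LEVEL date time [file:line]` prefix, which makes the traceback hard to read. As a rough sketch (the helper names `strip_prefix` and `worker_errors` are made up for illustration and are not part of vLLM), the prefix can be stripped and the ERROR lines of a single worker pulled out like this:

```python
import re
from typing import Iterable, List, Optional, Tuple

# Prefix as it appears in the log above, e.g.
# "(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 [multiproc_executor.py:962] "
PREFIX = re.compile(
    r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\) "
    r"(?P<level>[A-Z]+) \d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[[^\]]+\] ?"
)


def strip_prefix(line: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (process name, log level, message); names are None if no prefix."""
    m = PREFIX.match(line)
    if m is None:
        return None, None, line
    return m.group("proc"), m.group("level"), line[m.end():]


def worker_errors(lines: Iterable[str], worker: str) -> List[str]:
    """Collect the message part of every ERROR line logged by one worker."""
    out = []
    for raw in lines:
        proc, level, msg = strip_prefix(raw.rstrip("\n"))
        if proc == worker and level == "ERROR":
            out.append(msg)
    return out


if __name__ == "__main__":
    sample = [
        "(Worker_TP0_EP0 pid=794) INFO 04-30 08:58:51 [backends.py:376] "
        "Cache the graph of compile range (1, 64) for later use",
        "(Worker_TP6_EP6 pid=800) ERROR 04-30 08:58:52 "
        "[multiproc_executor.py:962] WorkerProc hit an exception.",
    ]
    for msg in worker_errors(sample, "Worker_TP6_EP6"):
        print(msg)
```

Run over the full log, this would leave just the Python traceback for `Worker_TP6_EP6`, which is easier to compare against other reports.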

Below is the deployment command I used:

vllm serve /export/model \
  --host 0.0.0.0 \
  --port 30000 \
  --served-model-name DeepSeek-v4-Flash \
  --download-dir /export/model \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --tensor-parallel-size 8 \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 \
  --speculative_config '{"method":"mtp","num_speculative_tokens":1}'

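If I add --enforce-eager to the launch arguments, the server starts and runs normally, but token generation is extremely slow.
I also tried export VLLM_DISABLE_COMPILE=1, but it had no effect: the server still fails to start.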
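
For anyone trying to reproduce this, here is a small sketch that assembles the same `vllm serve` argument list with `--enforce-eager` as a toggle, so the failing and working launches differ by exactly one flag. The function name `build_serve_command` is hypothetical, and the flags are copied from the command in this report rather than checked against the vLLM CLI reference.

```python
import json
import shlex


def build_serve_command(model_path="/export/model", enforce_eager=False):
    """Build the argv for the launch command quoted in this issue."""
    # Speculative decoding settings exactly as given in the report.
    spec = {"method": "mtp", "num_speculative_tokens": 1}
    argv = [
        "vllm", "serve", model_path,
        "--host", "0.0.0.0",
        "--port", "30000",
        "--served-model-name", "DeepSeek-v4-Flash",
        "--download-dir", model_path,
        "--trust-remote-code",
        "--kv-cache-dtype", "fp8",
        "--block-size", "256",
        "--enable-expert-parallel",
        "--tensor-parallel-size", "8",
        "--tokenizer-mode", "deepseek_v4",
        "--tool-call-parser", "deepseek_v4",
        "--enable-auto-tool-choice",
        "--reasoning-parser", "deepseek_v4",
        "--speculative_config", json.dumps(spec, separators=(",", ":")),
    ]
    if enforce_eager:
        # Reported above to start successfully, at the cost of very slow decoding.
        argv.append("--enforce-eager")
    return argv


if __name__ == "__main__":
    # Print both variants as copy-pasteable shell commands.
    print(shlex.join(build_serve_command()))
    print(shlex.join(build_serve_command(enforce_eager=True)))
```

Passing the returned list to `subprocess.run` would launch the server directly, with no shell quoting to get wrong in the JSON argument.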
