[model] feat: Qwen3.5 is compatible with NPU #600
wang-hua-2019 wants to merge 1 commit into ByteDance-Seed:main
Conversation
Code Review
This pull request introduces NPU compatibility for the Qwen3.5 and Qwen3.5 MoE models. Key changes include conditional imports for NPU-specific patched models, integration of mojo_opset for optimized RMSNorm and causal convolution, and extensive modifications to support Ulysses Sequence Parallelism (SP) and FSDP-safe multimodal processing. The Qwen3_5GatedDeltaNet and Qwen3_5MoeGatedDeltaNet forward passes have been updated to handle variable-length sequences and SP-aware weight sharding. Additionally, the MoE expert dispatch now supports a fused implementation, and several vision model methods have been optimized for performance and distributed training compatibility. The explicit NotImplementedErrors indicate areas where NPU support is still under development for specific execution paths, and the ValueError for multimodal inputs in the MoE version clarifies current limitations.
```diff
+ # Modification: use out-of-place add instead of `expert_output += shared_expert_output`
+ # to avoid "Output of MergedFc1TritonFusedMoeExpertFunctionBackward is a view and is
+ # being modified inplace" RuntimeError from PyTorch autograd.
+ expert_output = expert_output + shared_expert_output
```
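The failure mode the diff comment describes can be reproduced outside the model: PyTorch autograd forbids in-place modification of a view produced by a function that returns multiple views (e.g. `unbind`). A minimal sketch, with illustrative tensor names rather than the PR's actual ones:

```python
import torch

x = torch.randn(2, 4, 8, requires_grad=True)
# unbind returns views of x; autograd forbids modifying such views in place
expert_output, _ = x.unbind(0)
shared_expert_output = torch.randn(4, 8)

# expert_output += shared_expert_output
#   -> RuntimeError: "Output of UnbindBackward0 is a view and is being
#      modified inplace. This view is the output of a function that
#      returns multiple views. ..."

# Out-of-place add allocates a new tensor, so autograd stays consistent
expert_output = expert_output + shared_expert_output
expert_output.sum().backward()
```

The same out-of-place rewrite is what the diff applies to the fused-MoE expert output.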
```diff
+ # Modification: keep this disabled until FLA causal_conv1d_update decode path is validated.
+ raise NotImplementedError("use_precomputed_states=True is not supported yet for causal_conv1d_update now.")
```
This `NotImplementedError` indicates that the `use_precomputed_states=True` path of `causal_conv1d_update` is not yet supported on NPU. It will cause a runtime failure whenever this decoding path is triggered in an NPU environment. Consider prioritizing the implementation of this path, or documenting the limitation clearly.
```diff
- mixed_qkv = mixed_qkv.transpose(1, 2)
+ raise NotImplementedError("This path is not supported yet because it can't process varlen now.")
```
This `NotImplementedError` means the fallback path (taken when `self.causal_conv1d_fn` is `None`) does not support variable-length sequences on NPU: if `mojo_causal_conv1d` is unavailable and this path is hit, varlen processing will fail. Either ensure `mojo_causal_conv1d` is always available, or implement a robust varlen fallback.
```diff
+ cu_seq_lens_q = kwargs.get("cu_seq_lens_q", None)
+ assert cu_seq_lens_q is not None, (
+     "cu_seq_lens_q must be provided to support varlen Flash Linear Attention, varlen Conv1D, "
+     "and to remove the full Flash Attention CPU-NPU sync."
+ )
```
The assertion cu_seq_lens_q is not None makes cu_seq_lens_q a mandatory argument for Qwen3_5DecoderLayer.forward when using varlen Flash Linear Attention or Conv1D. While this enforces correct usage, it's important to ensure that all upstream callers consistently provide this argument to prevent runtime crashes. If there are scenarios where cu_seq_lens_q might legitimately be None, a more graceful handling (e.g., falling back to a non-varlen path if possible) might be considered.
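For upstream callers that must now supply it, `cu_seq_lens_q` is the standard flash-attention-style cumulative-lengths vector: entry `i` is the offset of sequence `i` in the packed token buffer. A minimal sketch of how a caller could derive it from per-sample lengths (the helper name is illustrative, not from the PR):

```python
def build_cu_seq_lens(seq_lens):
    """Cumulative sequence lengths [0, n0, n0+n1, ...] for a packed varlen batch."""
    cu = [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
    return cu

# Three packed sequences of lengths 5, 3, 7 -> offsets into the 15-token buffer
offsets = build_cu_seq_lens([5, 3, 7])  # [0, 5, 8, 15]
```

Sequence `i` then occupies `tokens[offsets[i]:offsets[i + 1]]`, which is what the varlen conv and attention kernels consume.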
```diff
+ # Modification: keep this disabled until FLA causal_conv1d_update decode path is validated.
+ raise NotImplementedError("use_precomputed_states=True is not supported yet for causal_conv1d_update now.")
```
```diff
+     )[0]
  else:
```
This `NotImplementedError` in `Qwen3_5MoeGatedDeltaNet.forward` indicates that the MoE model's `causal_conv1d_fn` fallback path does not support variable-length sequences either. This is high severity: if the optimized NPU `mojo_causal_conv1d` is unavailable and varlen inputs arrive, the model crashes.
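One way to provide the robust fallback suggested above is to run the causal depthwise conv per sequence, splitting the packed buffer at the cumulative offsets so no convolution state leaks across sequence boundaries. A sketch under assumed shapes (`x`: `(total_tokens, dim)`, `weight`: `(dim, kernel_size)`); this is illustrative, not the PR's implementation:

```python
import torch
import torch.nn.functional as F

def varlen_causal_conv1d_fallback(x, weight, cu_seq_lens):
    """Per-sequence causal depthwise conv1d over a packed varlen batch.

    x: (total_tokens, dim) packed input; weight: (dim, kernel_size);
    cu_seq_lens: [0, n0, n0+n1, ...] offsets into the token dimension.
    """
    dim, k = weight.shape
    outs = []
    for start, end in zip(cu_seq_lens[:-1], cu_seq_lens[1:]):
        seq = x[start:end].t().unsqueeze(0)   # (1, dim, seq_len)
        seq = F.pad(seq, (k - 1, 0))          # left-pad so the conv stays causal
        out = F.conv1d(seq, weight.unsqueeze(1), groups=dim)  # depthwise conv
        outs.append(out.squeeze(0).t())       # back to (seq_len, dim)
    return torch.cat(outs, dim=0)

# Kernel [0, 0, 1] per channel is the identity for a causal conv
x = torch.randn(10, 4)
w = torch.zeros(4, 3)
w[:, -1] = 1.0
out = varlen_causal_conv1d_fallback(x, w, [0, 6, 10])
```

The Python loop over sequences is slow, but it would keep non-`mojo_causal_conv1d` configurations functional instead of raising.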
```diff
+ cu_seq_lens_q = kwargs.get("cu_seq_lens_q", None)
+ assert cu_seq_lens_q is not None, (
+     "cu_seq_lens_q must be provided to support varlen Flash Linear Attention, varlen Conv1D, "
+     "and to remove the full Flash Attention CPU-GPU sync."
+ )
```
```diff
+ if pixel_values is not None or pixel_values_videos is not None:
+     raise ValueError(
+         "Qwen3_5MoeForConditionalGeneration currently supports text-only inputs in VeOmni; "
+         "`pixel_values` and `pixel_values_videos` are not supported yet."
+     )
```
The `ValueError` makes explicit that `Qwen3_5MoeForConditionalGeneration` currently supports text-only inputs in VeOmni and rejects `pixel_values` and `pixel_values_videos`. This is a clear, important limitation: it prevents silently incorrect usage, while marking multimodal MoE as an area for future development.
What does this PR do?
Checklist Before Starting
- Follow the `[{modules}] {type}: {description}` format
  - `{modules}`: `misc`, `ci`, `config`, `docs`, `data`, `dist`, `omni`, `logging`, `model`, `optim`, `ckpt`, `release`, `task`, `perf`, `ops`, `parallel`, `trainer`
  - `{type}`: `feat`, `fix`, `refactor`, `chore`, `test`
  - Prefix `[BREAKING]` for breaking changes — e.g. `[BREAKING][parallel, model] feat: dynamic batching`

Test
API and Usage Example
Design & Code Changes
Checklist Before Submitting
If tasks/training scripts were moved or renamed: updated `docs`/`examples` and verified that `python3 scripts/ci/check_doc_task_paths.py` passes (also enforced by the "Check doc task paths" CI workflow)