[model] feat: Qwen3.5 is compatible with NPU #600
wang-hua-2019 wants to merge 1 commit into ByteDance-Seed:main
Conversation
Code Review
This pull request introduces NPU compatibility for the Qwen3.5 and Qwen3.5 MoE models. Key changes include conditional imports for NPU-specific patched models, integration of mojo_opset for optimized RMSNorm and causal convolution, and extensive modifications to support Ulysses Sequence Parallelism (SP) and FSDP-safe multimodal processing. The Qwen3_5GatedDeltaNet and Qwen3_5MoeGatedDeltaNet forward passes have been updated to handle variable-length sequences and SP-aware weight sharding. Additionally, the MoE expert dispatch now supports a fused implementation, and several vision model methods have been optimized for performance and distributed training compatibility. The explicit NotImplementedErrors indicate areas where NPU support is still under development for specific execution paths, and the ValueError for multimodal inputs in the MoE version clarifies current limitations.
```diff
+ # Modification: use out-of-place add instead of `expert_output += shared_expert_output`
+ # to avoid "Output of MergedFc1TritonFusedMoeExpertFunctionBackward is a view and is
+ # being modified inplace" RuntimeError from PyTorch autograd.
+ expert_output = expert_output + shared_expert_output
```
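The failure mode the diff comment describes can be reproduced outside the model: PyTorch autograd forbids in-place modification of a view produced by a function that returns multiple views (e.g. `unbind`). A minimal sketch, with illustrative tensor names rather than the PR's actual ones:

```python
import torch

x = torch.randn(2, 4, 8, requires_grad=True)
# unbind returns views of x; autograd forbids modifying such views in place
expert_output, _ = x.unbind(0)
shared_expert_output = torch.randn(4, 8)

# expert_output += shared_expert_output
#   -> RuntimeError: "Output of UnbindBackward0 is a view and is being
#      modified inplace. This view is the output of a function that
#      returns multiple views. ..."

# Out-of-place add allocates a new tensor, so autograd stays consistent
expert_output = expert_output + shared_expert_output
expert_output.sum().backward()
```

The same out-of-place rewrite is what the diff applies to the fused-MoE expert output.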
```diff
+ # Modification: keep this disabled until FLA causal_conv1d_update decode path is validated.
+ raise NotImplementedError("use_precomputed_states=True is not supported yet for causal_conv1d_update now.")
```
This `NotImplementedError` indicates that the `use_precomputed_states=True` path of `causal_conv1d_update` is not yet supported on NPU. It will cause a runtime failure whenever this decoding path is triggered in an NPU environment. Consider prioritizing the implementation of this path, or documenting the limitation clearly.
```diff
- mixed_qkv = mixed_qkv.transpose(1, 2)
+ raise NotImplementedError("This path is not supported yet because it can't process varlen now.")
```
This `NotImplementedError` means the fallback path (taken when `self.causal_conv1d_fn` is `None`) does not support variable-length sequences on NPU: if `mojo_causal_conv1d` is unavailable and this path is hit, varlen processing will fail. Either ensure `mojo_causal_conv1d` is always available, or implement a robust varlen fallback.
```diff
+ cu_seq_lens_q = kwargs.get("cu_seq_lens_q", None)
+ assert cu_seq_lens_q is not None, (
+     "cu_seq_lens_q must be provided to support varlen Flash Linear Attention, varlen Conv1D, "
+     "and to remove the full Flash Attention CPU-NPU sync."
+ )
```
The assertion cu_seq_lens_q is not None makes cu_seq_lens_q a mandatory argument for Qwen3_5DecoderLayer.forward when using varlen Flash Linear Attention or Conv1D. While this enforces correct usage, it's important to ensure that all upstream callers consistently provide this argument to prevent runtime crashes. If there are scenarios where cu_seq_lens_q might legitimately be None, a more graceful handling (e.g., falling back to a non-varlen path if possible) might be considered.
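For upstream callers that must now supply it, `cu_seq_lens_q` is the standard flash-attention-style cumulative-lengths vector: entry `i` is the offset of sequence `i` in the packed token buffer. A minimal sketch of how a caller could derive it from per-sample lengths (the helper name is illustrative, not from the PR):

```python
def build_cu_seq_lens(seq_lens):
    """Cumulative sequence lengths [0, n0, n0+n1, ...] for a packed varlen batch."""
    cu = [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
    return cu

# Three packed sequences of lengths 5, 3, 7 -> offsets into the 15-token buffer
offsets = build_cu_seq_lens([5, 3, 7])  # [0, 5, 8, 15]
```

Sequence `i` then occupies `tokens[offsets[i]:offsets[i + 1]]`, which is what the varlen conv and attention kernels consume.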
```diff
+ # Modification: keep this disabled until FLA causal_conv1d_update decode path is validated.
+ raise NotImplementedError("use_precomputed_states=True is not supported yet for causal_conv1d_update now.")
```
```diff
+     )[0]
  else:
```
This `NotImplementedError` in `Qwen3_5MoeGatedDeltaNet.forward` indicates that the MoE model's `causal_conv1d_fn` fallback path does not support variable-length sequences either. This is high severity: if the optimized NPU `mojo_causal_conv1d` is unavailable and varlen inputs arrive, the model crashes.
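One way to provide the robust fallback suggested above is to run the causal depthwise conv per sequence, splitting the packed buffer at the cumulative offsets so no convolution state leaks across sequence boundaries. A sketch under assumed shapes (`x`: `(total_tokens, dim)`, `weight`: `(dim, kernel_size)`); this is illustrative, not the PR's implementation:

```python
import torch
import torch.nn.functional as F

def varlen_causal_conv1d_fallback(x, weight, cu_seq_lens):
    """Per-sequence causal depthwise conv1d over a packed varlen batch.

    x: (total_tokens, dim) packed input; weight: (dim, kernel_size);
    cu_seq_lens: [0, n0, n0+n1, ...] offsets into the token dimension.
    """
    dim, k = weight.shape
    outs = []
    for start, end in zip(cu_seq_lens[:-1], cu_seq_lens[1:]):
        seq = x[start:end].t().unsqueeze(0)   # (1, dim, seq_len)
        seq = F.pad(seq, (k - 1, 0))          # left-pad so the conv stays causal
        out = F.conv1d(seq, weight.unsqueeze(1), groups=dim)  # depthwise conv
        outs.append(out.squeeze(0).t())       # back to (seq_len, dim)
    return torch.cat(outs, dim=0)

# Kernel [0, 0, 1] per channel is the identity for a causal conv
x = torch.randn(10, 4)
w = torch.zeros(4, 3)
w[:, -1] = 1.0
out = varlen_causal_conv1d_fallback(x, w, [0, 6, 10])
```

The Python loop over sequences is slow, but it would keep non-`mojo_causal_conv1d` configurations functional instead of raising.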
```diff
+ cu_seq_lens_q = kwargs.get("cu_seq_lens_q", None)
+ assert cu_seq_lens_q is not None, (
+     "cu_seq_lens_q must be provided to support varlen Flash Linear Attention, varlen Conv1D, "
+     "and to remove the full Flash Attention CPU-GPU sync."
+ )
```
```diff
+ if pixel_values is not None or pixel_values_videos is not None:
+     raise ValueError(
+         "Qwen3_5MoeForConditionalGeneration currently supports text-only inputs in VeOmni; "
+         "`pixel_values` and `pixel_values_videos` are not supported yet."
+     )
```
The `ValueError` makes explicit that `Qwen3_5MoeForConditionalGeneration` currently supports text-only inputs in VeOmni and rejects `pixel_values` and `pixel_values_videos`. This is a clear, important limitation: it prevents silently incorrect usage, while marking multimodal MoE as an area for future development.
What does this PR do?
Checklist Before Starting
- Follow the `[{modules}] {type}: {description}` format
  - `{modules}`: `misc`, `ci`, `config`, `docs`, `data`, `dist`, `omni`, `logging`, `model`, `optim`, `ckpt`, `release`, `task`, `perf`, `ops`, `parallel`, `trainer`
  - `{type}`: `feat`, `fix`, `refactor`, `chore`, `test`
  - Prefix `[BREAKING]` for breaking changes — e.g. `[BREAKING][parallel, model] feat: dynamic batching`

Test
API and Usage Example
Design & Code Changes
Checklist Before Submitting
If tasks/training scripts were moved or renamed: updated `docs`/`examples` and verified that `python3 scripts/ci/check_doc_task_paths.py` passes (also enforced by the "Check doc task paths" CI workflow)