feat: Add npu megatron support #380

Open
UsernameFull wants to merge 4 commits into alibaba:main from UsernameFull:npu_megatron

Conversation

Contributor

@UsernameFull UsernameFull commented Mar 16, 2026

Summary

This PR adds support for Huawei Ascend NPU devices with the Megatron-Core backend, enabling the ROLL framework to run reinforcement learning training on NPU hardware.

Key Changes

1. Platform Detection Priority

File: roll/platforms/__init__.py

Changes: Reordered platform detection to check NPU before CUDA.

Reason: NPU devices were incorrectly falling back to CUDA platform. Prioritizing NPU detection ensures NpuPlatform is properly initialized when torch_npu is available.
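The reordering can be sketched as follows. This is an illustrative assumption, not the actual `roll/platforms/__init__.py` code: the function name `detect_platform` and the probing logic are stand-ins for how the real platform selection behaves.

```python
import importlib.util


def detect_platform() -> str:
    """Return the active platform name, probing NPU before CUDA.

    Hypothetical sketch: checking for torch_npu first prevents NPU
    hosts from falling back to the CUDA platform (the bug this
    change fixes).
    """
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu"  # the real code would initialize NpuPlatform here
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"
```

With CUDA checked first, an NPU host that also ships a stub CUDA runtime could be misclassified; probing `torch_npu` first makes the selection unambiguous.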

2. Device-Agnostic Operations

File: roll/pipeline/base_worker.py

Changes:

  • Replaced "cuda" with current_platform.device_type
  • Replaced torch.cuda.memory_allocated() with current_platform.memory_allocated()
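The pattern behind these two replacements can be sketched with a stand-in platform object; the real `current_platform` lives in roll's platform layer, and the field names below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Platform:
    # Stand-in for roll's current_platform; attribute names are assumptions.
    device_type: str
    memory_allocated: Callable[[], int]


# On a CUDA host roll would bind torch.cuda.memory_allocated here; on an
# NPU host, the torch_npu equivalent. Stubbed with a constant for the sketch.
current_platform = Platform(device_type="npu", memory_allocated=lambda: 0)


def device_string(rank: int) -> str:
    # Was hard-coded as f"cuda:{rank}"; now derived from the platform,
    # so the same worker code runs on both GPU and NPU hosts.
    return f"{current_platform.device_type}:{rank}"
```

The key point is that worker code never names a backend directly: swapping hardware means swapping the platform object, not editing call sites.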

3. MindSpeed Integration

File: mcore_adapter/src/mcore_adapter/training_args.py

Changes: Added an optional import of mindspeed.megatron_adaptor.

Reason: MindSpeed is Huawei's library of NPU-specific Megatron optimizations. Importing the adaptor patches Megatron-Core for NPU compatibility; wrapping the import in try-except keeps GPU-only environments, where MindSpeed is absent, working unchanged.
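The guarded import likely follows the standard optional-dependency idiom; this is a sketch of the shape, not the exact lines from training_args.py (the `HAS_MINDSPEED` flag is an assumption).

```python
# Guarded import: on GPU-only hosts MindSpeed is not installed, so the
# except branch leaves Megatron-Core unpatched and GPU behavior unchanged.
try:
    import mindspeed.megatron_adaptor  # noqa: F401  # patches Megatron-Core for NPU
    HAS_MINDSPEED = True
except ImportError:
    HAS_MINDSPEED = False
```

Note that `mindspeed.megatron_adaptor` patches Megatron-Core as a side effect of the import itself, which is why a bare import (rather than any explicit call) is sufficient.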

4. NPU Attention Mask Format

File: roll/distributed/strategy/megatron_strategy.py

Changes: Added NPU-specific attention mask transformation to 4D format.

Reason: NPU requires 4D attention masks [B, 1, S, S] instead of the standard 2D [B, S]. This hardware-specific transformation ensures correct attention computation on NPU.

if hasattr(torch, "npu") and torch.npu.is_available() and attention_mask is not None:
    # NPU attention kernels expect a 4D boolean mask [B, 1, S, S]
    attention_mask = attention_mask.bool()
    B, S = attention_mask.shape[0], attention_mask.shape[-1]
    attention_mask = attention_mask[:, None, None, :].expand(B, 1, S, S)

5. Optimizer Compatibility

File: roll/third_party/megatron/optimizer.py

Changes: Added support for the no_weight_decay_cond, scale_lr_cond, and lr_mult parameters.
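These parameters are predicate-style hooks in Megatron-Core's optimizer builder. The sketch below shows typical condition functions; the policies and the commented call shape are assumptions for illustration, not the PR's exact code.

```python
def no_weight_decay_cond(name: str, param) -> bool:
    # Typical policy: exempt biases and LayerNorm weights from weight decay.
    return name.endswith(".bias") or "layernorm" in name.lower()


def scale_lr_cond(name: str, param) -> bool:
    # Parameters matching this predicate train at lr * lr_mult.
    return "embedding" in name.lower()


# In roll's patched optimizer builder these would be forwarded along the
# lines of (hypothetical call, shown for shape only):
#
# optimizer = get_megatron_optimizer(
#     config, model_chunks,
#     no_weight_decay_cond=no_weight_decay_cond,
#     scale_lr_cond=scale_lr_cond,
#     lr_mult=0.1,
# )
```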

6. Example Configurations

Files:

- examples/ascend_examples/qwen3_4B_dpo_megatron.yaml
- examples/ascend_examples/qwen3_8b_rlvr_deepspeed.yaml
- examples/ascend_examples/run_dpo_pipeline.sh

Reason: Provides ready-to-use NPU training examples demonstrating proper device mapping and strategy configuration for both DPO and RLVR pipelines.
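The shape of such an example might resemble the fragment below. Every key and value here is an illustrative assumption, not copied from the PR's YAML files; the actual examples live under examples/ascend_examples/.

```yaml
# Hypothetical DPO-on-NPU fragment, for illustration only.
actor_train:
  strategy_args:
    strategy_name: megatron_train    # Megatron-Core backend
  device_mapping: list(range(0, 8))  # place the worker on NPUs 0-7
```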

Impact

Benefits:

  • Enables Megatron-Core training on Huawei Ascend NPU hardware
  • Maintains full backward compatibility with GPU systems
  • Follows existing platform abstraction patterns

Requirements

  • Huawei Ascend NPU with torch_npu installed
  • MindSpeed (v0.15.3) library for NPU Megatron support

@UsernameFull UsernameFull force-pushed the npu_megatron branch 2 times, most recently from fb2e7dc to acfad89 Compare March 17, 2026 06:54
@UsernameFull UsernameFull changed the title from [WIP] Add npu megatron support to feat: Add npu megatron support Apr 2, 2026
UsernameFull and others added 2 commits April 2, 2026 14:40
# Conflicts:
#	roll/pipeline/sft/sft_pipeline.py
# Conflicts:
#	roll/configs/worker_config.py

feat: fix ascend example

fix: ascend rlvr yaml fix

fix: megatron fix
@UsernameFull UsernameFull force-pushed the npu_megatron branch 2 times, most recently from e6d042f to df7d186 Compare April 2, 2026 07:30