[hardware] feat: add Intel XPU device support#695

Draft
kahlun wants to merge 4 commits into ByteDance-Seed:main from kahlun:xpu/device-support

Conversation

@kahlun (Contributor) commented on Apr 27, 2026

Summary

Add Intel XPU (GPU) device detection and backend support to VeOmni, enabling training on Intel Arc/Data Center GPUs.

Changes (4 files, +13 / -3 lines)

| File | Change | Bug ID |
| --- | --- | --- |
| veomni/utils/device.py | Add IS_XPU_AVAILABLE flag; add an XPU branch to get_device_type(), get_dist_comm_backend() (xccl), and stream_synchronize() (sketched below) | B1, B2 |
| veomni/ops/kernels/moe/_kernels/utils/device.py | Guard torch.cuda.get_device_capability() for non-CUDA devices | B3 |
| veomni/distributed/moe/moe_layer.py | Guard group_gemm import (Triton CUDA kernels) so it only runs on CUDA | B3 |
| veomni/distributed/torch_parallelize.py | Accept "xpu" as a valid init_device in the non-FSDP path | B5 |
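
For concreteness, here is a minimal sketch of what the device.py additions could look like. The names (IS_XPU_AVAILABLE, get_device_type, get_dist_comm_backend, stream_synchronize) come from the table above, but the bodies are simplified assumptions, and it presumes a PyTorch build that exposes the torch.xpu namespace (2.4+):

```python
# Hedged sketch of the device.py additions; VeOmni's real helpers differ
# in detail. Assumes PyTorch >= 2.4, which exposes the torch.xpu namespace.
import torch

IS_XPU_AVAILABLE = hasattr(torch, "xpu") and torch.xpu.is_available()

def get_device_type() -> str:
    """Accelerator type string used for torch.device / init_device."""
    if torch.cuda.is_available():
        return "cuda"
    if IS_XPU_AVAILABLE:
        return "xpu"
    return "cpu"

def get_dist_comm_backend() -> str:
    """Map device type to a torch.distributed backend; XPU uses oneCCL ("xccl")."""
    return {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}[get_device_type()]

def stream_synchronize() -> None:
    """Block until queued kernels on the current device's stream complete."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elif IS_XPU_AVAILABLE:
        torch.xpu.synchronize()
```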

Testing

Tested on Intel Arc Pro B70 (Battlemage BMG-G31, 32 GB VRAM):

  • 1-GPU: 7/7 pass (import, model build, FSDP2 parallelize, optimizer, fwd+bwd, 3-step train loop, CPU offload round-trip)
  • 2-GPU: 8/8 pass (with CCL_ATL_SHM=1)
  • Model: Qwen2.5-0.5B (494M params, bf16)

Context

Follow-up to #648 (the src_data_rank=None fix for a scatter hang). This PR closes the remaining XPU device-detection gaps that block VeOmni from running on Intel GPUs.

The corresponding veRL integration PR (a 1-line EngineRegistry registration) is staged separately.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces support for XPU devices across various modules, including device detection, communication backends, and synchronization utilities. However, the new guard in get_device_key incorrectly returns "unknown" for XPU devices, bypassing device-specific configurations, and the conditional import of group_gemm kernels in moe_layer.py is likely to cause a NameError on XPU systems when those kernels are referenced at runtime.

Comment on lines +24 to +25:

```python
if not torch.cuda.is_available():
    return "unknown"
```

Severity: high

This guard returns "unknown" for XPU devices, which prevents the function from reaching the get_device_name() call at the end of the function. This will cause MoE kernels to use fallback configurations instead of device-specific ones. Since this PR aims to add XPU support, the logic should be updated to allow XPU devices to fall through to the device name detection while still guarding the CUDA-specific get_device_capability() calls to avoid crashes.
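A minimal sketch of the fix being described, assuming the reviewer's intent; get_device_name() is stubbed here because its real body is not shown in this PR, and the capability cutoff is purely illustrative:

```python
# Hedged sketch of the suggested get_device_key fix: guard only the
# CUDA-specific capability call so XPU still reaches name-based lookup.
import torch

def get_device_name() -> str:
    # Stub; the real helper is defined elsewhere in VeOmni.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.xpu.get_device_name(0)
    return "unknown"

def get_device_key() -> str:
    if torch.cuda.is_available():
        # CUDA-only API; calling it unguarded crashes on XPU.
        major, _minor = torch.cuda.get_device_capability()
        if major < 7:  # illustrative CUDA-specific gate, not VeOmni's rule
            return "unknown"
    elif not (hasattr(torch, "xpu") and torch.xpu.is_available()):
        return "unknown"
    # CUDA and XPU both fall through to device-name detection.
    return get_device_name()
```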



Suggested change:

```diff
-if not is_torch_npu_available():
+if not is_torch_npu_available() and torch.cuda.is_available():
```

Severity: high

Guarding the import of group_gemm kernels with torch.cuda.is_available() will cause a NameError at runtime on XPU devices when EPGroupGemm or EPMergedFc1GroupGemm are used, as these classes refer to the imported functions in their methods. If MoE is not yet supported on XPU, it would be better to provide a clear error message (e.g., a RuntimeError in the forward method) or ensure the functions are defined as None and checked before use, rather than allowing a NameError to occur.
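A sketch of the pattern the review suggests (and which a later commit in this PR adopts): None defaults for the guarded names plus an actionable RuntimeError. The import path and helper function are hypothetical, not VeOmni's exact code:

```python
# Illustrative guarded-import pattern; the real import path and call sites
# in moe_layer.py differ. group_gemm kernels are Triton/CUDA-only.
import torch
from transformers.utils import is_torch_npu_available

ep_group_gemm = None  # stays None on non-CUDA devices instead of being undefined
if not is_torch_npu_available() and torch.cuda.is_available():
    from veomni.ops.kernels.moe import ep_group_gemm  # hypothetical path

def require_group_gemm():
    """Fail with guidance instead of a NameError when the kernels are absent."""
    if ep_group_gemm is None:
        raise RuntimeError(
            "Fused MoE group_gemm kernels are CUDA-only and unavailable on this "
            "device (e.g. XPU); set moe_implementation='eager' instead."
        )
    return ep_group_gemm
```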

- Add IS_XPU_AVAILABLE flag, XPU branch in get_device_type(),
  get_dist_comm_backend() (xccl), and stream_synchronize()
- Guard torch.cuda.get_device_capability() for non-CUDA devices;
  let XPU fall through to get_device_name() instead of returning 'unknown'
- Guard group_gemm import (CUDA-only Triton kernels) with None defaults
  to avoid NameError on non-CUDA devices
- Accept 'xpu' as valid init_device in non-FSDP path

Tested on Intel Arc Pro B60 (Battlemage BMG-G21, 24 GB VRAM):
- 1-GPU standalone: 7/7 pass
- 2-GPU FSDP2: 8/8 pass (with CCL_ATL_SHM=1)
- veRL e2e GRPO (VeOmni engine + vLLM rollout): PASS
- Model: Qwen2.5-0.5B-Instruct (494M params, bf16)
@kahlun force-pushed the xpu/device-support branch from 36b72a1 to b212058 on April 28, 2026 at 08:28
- device.py: keep explicit get_device_key fallback for non-CUDA devices
- moe_layer.py: raise actionable RuntimeError when fused MoE group_gemm is unavailable (e.g., XPU), guiding users to moe_implementation=eager
- tests/special_xpu/test_fsdp2_simple_xpu.py: rewritten to true FSDP2 via fully_shard (no FSDPv1 API; see the sketch below)
- tests/special_xpu/test_fsdp2_simple_xpu.py: removed dead train_step helper
- tests/special_xpu/run_veomni_e2e_sft_xpu.sh: clarify this trainer path is fsdp1 smoke; FSDP2 coverage comes from test_fsdp2_simple_xpu.py

Validated:
- 2-GPU XPU FSDP2 smoke passes with fully_shard (loss decreases)
- FSDP2 wrappers present at runtime (FSDPModule count > 0)
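
For reference, a minimal sketch of the "true FSDP2 via fully_shard" pattern that commit describes, assuming PyTorch >= 2.6 (where fully_shard and FSDPModule live under torch.distributed.fsdp) and a process group already initialized with the xccl backend; the model and mesh details are assumptions, not the test file's contents:

```python
# Hedged sketch of an FSDP2 (fully_shard) setup on XPU. Assumes PyTorch >= 2.6
# and torch.distributed already initialized with the "xccl" backend.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FSDPModule, fully_shard

def shard_model_fsdp2(model: torch.nn.Module) -> torch.nn.Module:
    mesh = init_device_mesh("xpu", (dist.get_world_size(),))
    for block in model.children():
        fully_shard(block, mesh=mesh)  # shard submodules first...
    fully_shard(model, mesh=mesh)      # ...then the root module
    # Mirrors the validation note above: FSDP2 wrappers present at runtime.
    assert sum(isinstance(m, FSDPModule) for m in model.modules()) > 0
    return model
```
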
@kahlun force-pushed the xpu/device-support branch from 2c68ee9 to f893cad on April 30, 2026 at 04:11
kahlun added a commit to kahlun/verl that referenced this pull request on Apr 30, 2026:
- Add 'xpu' to EngineRegistry.register device list for VeOmniEngineWithLMHead (illustrated below)
- Add GRPO VeOmni XPU e2e test script (tests/special_xpu/run_grpo_veomni_xpu.sh)

Depends on upstream VeOmni XPU patches: ByteDance-Seed/VeOmni#695
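
To illustrate the shape of that one-line registration change, here is a toy device-gated registry; the real EngineRegistry in veRL differs, and only the idea of adding "xpu" to the decorator's device list comes from the commit message:

```python
# Toy reconstruction of a device-gated engine registry. Only the addition of
# "xpu" to the device list is from the commit message; everything else is
# illustrative scaffolding, not veRL's actual EngineRegistry.
from typing import Callable, Dict, List, Type

class EngineRegistry:
    _engines: Dict[str, Type] = {}

    @classmethod
    def register(cls, device: List[str]) -> Callable[[Type], Type]:
        def wrap(engine_cls: Type) -> Type:
            for dev in device:
                cls._engines[dev] = engine_cls  # engine selectable per device
            return engine_cls
        return wrap

@EngineRegistry.register(device=["cuda", "npu", "xpu"])  # "xpu" is the new entry
class VeOmniEngineWithLMHead:
    ...
```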