
feat(xpu): VeOmni engine support for Intel XPU #20

Draft

kahlun wants to merge 3 commits into xpu/e2e-clean from xpu/pr-g-veomni-xpu

Conversation


kahlun (Owner) commented Apr 28, 2026

Summary

Register the VeOmni training engine for the XPU device in veRL.

Changes

  • Add xpu to the EngineRegistry.register device list for VeOmniEngineWithLMHead (see the registry sketch below)
  • Add a GRPO VeOmni XPU e2e test script
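
A minimal, self-contained sketch of the registry pattern this change touches. The EngineRegistry shown here is illustrative (the devices keyword, the get lookup, and the empty class body are assumptions, not veRL's actual API); the substance of the PR is exactly the device-list addition in the decorator:

```python
from typing import Callable, Dict, List, Type

class EngineRegistry:
    """Illustrative device-keyed registry; not veRL's real implementation."""
    _engines: Dict[str, Type] = {}

    @classmethod
    def register(cls, devices: List[str]) -> Callable[[Type], Type]:
        def decorator(engine_cls: Type) -> Type:
            for device in devices:
                cls._engines[device] = engine_cls
            return engine_cls
        return decorator

    @classmethod
    def get(cls, device: str) -> Type:
        if device not in cls._engines:
            raise KeyError(f"no engine registered for device {device!r}")
        return cls._engines[device]

# The gist of the change: "xpu" joins the device list for the engine.
@EngineRegistry.register(devices=["cuda", "xpu"])
class VeOmniEngineWithLMHead:
    pass

assert EngineRegistry.get("xpu") is VeOmniEngineWithLMHead
```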

Dependencies

  • Upstream VeOmni XPU patches: ByteDance-Seed/VeOmni#695

Test

bash tests/special_xpu/run_grpo_veomni_xpu.sh

kahlun force-pushed the xpu/pr-g-veomni-xpu branch from d3f5166 to 12c2e11 on April 30, 2026 04:07
kahlun force-pushed the xpu/pr-g-veomni-xpu branch from 12c2e11 to 2b67bc1 on April 30, 2026 08:15
kahlun added 3 commits April 30, 2026 01:17
- Add 'xpu' to EngineRegistry.register device list for VeOmniEngineWithLMHead
- Add GRPO VeOmni XPU e2e test script (tests/special_xpu/run_grpo_veomni_xpu.sh)

Depends on upstream VeOmni XPU patches: ByteDance-Seed/VeOmni#695
…dules

oneCCL (xccl) doesn't support ReduceOp.AVG in reduce_scatter.
The FSDP engine already calls set_force_sum_reduction_for_comms(True)
on the root module, but VeOmni wraps each layer with fully_shard
independently, so the flag must be set on ALL FSDPModule submodules.
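
A minimal sketch of that fix, assuming model is the module tree already sharded by VeOmni's per-layer fully_shard and that FSDPModule.set_force_sum_reduction_for_comms is available (PyTorch 2.5+, per the next commit); the helper name is illustrative, not VeOmni's or veRL's:

```python
from torch.distributed.fsdp import FSDPModule

def force_sum_reduction_on_all_fsdp_modules(model) -> None:
    # VeOmni shards each layer independently, so every layer is its own
    # FSDPModule; flipping the flag only on the root leaves the per-layer
    # reduce_scatter calls on ReduceOp.AVG, which oneCCL (xccl) rejects.
    for module in model.modules():  # modules() includes the root itself
        if isinstance(module, FSDPModule):
            module.set_force_sum_reduction_for_comms(True)
```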
… on XPU

- Changed the hasattr soft check to an explicit guard that raises RuntimeError
- Prevents silent gradient corruption from the unsupported ReduceOp.AVG on oneCCL
- Users now see a clear error citing the PyTorch 2.5+ requirement instead of silently wrong gradients
- Pattern matches the hard-fail approach in fsdp_utils.py

This is critical for correctness: missing the ReduceOp fix silently degrades into wrong gradients, which is worse than a hard crash.
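
A sketch of the hardened guard described above, contrasting the old soft check with the new hard fail (the function name and error wording are illustrative):

```python
from torch.distributed.fsdp import FSDPModule

def enable_force_sum_reduction(module: FSDPModule) -> None:
    # Old soft check: `if hasattr(...)` silently skipped the fix, letting
    # reduce_scatter fall back to ReduceOp.AVG and corrupt gradients.
    # New behavior: refuse to train rather than train on wrong gradients.
    if not hasattr(module, "set_force_sum_reduction_for_comms"):
        raise RuntimeError(
            "FSDPModule.set_force_sum_reduction_for_comms is unavailable; "
            "PyTorch 2.5+ is required for correct gradients on XPU/oneCCL"
        )
    module.set_force_sum_reduction_for_comms(True)
```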
kahlun force-pushed the xpu/pr-g-veomni-xpu branch from 2b67bc1 to 7b47958 on April 30, 2026 08:19
