
[hardware] feat: add XCCL backend workarounds for Intel XPU #13

Open

kahlun wants to merge 3 commits into xpu/pr-a0-device-detection from xpu/pr-a1-xccl-workarounds

Conversation

@kahlun (Owner) commented Apr 7, 2026

Summary

Temporary XCCL workarounds — will be removed when oneCCL adds native AVG support.

  • Composite backend cpu:gloo,xpu:xccl for mixed CPU+XPU tensors
  • all_reduce_avg(): XCCL lacks ReduceOp.AVG → SUM + manual division (sketched below)
  • FSDP2 set_force_sum_reduction_for_comms(True) for reduce_scatter
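
A minimal sketch of the AVG workaround, assuming a torch.distributed process group is already initialized; the helper name follows the summary bullet above, but the exact signature in this PR may differ:

```python
import torch
import torch.distributed as dist

def all_reduce_avg(tensor: torch.Tensor, group=None) -> torch.Tensor:
    """Emulate AVG as SUM + division for backends without ReduceOp.AVG (e.g. XCCL)."""
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group)
    tensor.div_(dist.get_world_size(group=group))
    return tensor
```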

Depends on: device detection PR

Test plan

  • No change when XPU not available (workarounds gated by is_xpu_available)

kahlun added 2 commits April 7, 2026 03:56
- Add is_torch_xpu_available() and is_xpu_available flag
- Extend get_device_name() to return "xpu" when XPU is available
- Extend get_nccl_backend() to return "xccl" for XPU
- Extend get_resource_name() to return "xpu" for Ray resources
- Add get_default_attention_implementation() → "eager" for XPU
  (flash_attn package is CUDA-only; XPU uses PyTorch SDPA instead)
- Extend get_torch_device() to return torch.xpu namespace
- Extend is_support_ipc() to return False for XPU (no SYCL IPC yet)

No behavioral change for existing CUDA/NPU paths.
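
A minimal sketch of the shared detection gate named in this commit message; the function name matches the commit, the body is illustrative:

```python
import torch

def is_torch_xpu_available() -> bool:
    # torch.xpu is only present in PyTorch builds with Intel XPU support,
    # so guard the attribute before querying the runtime.
    return hasattr(torch, "xpu") and torch.xpu.is_available()

is_xpu_available = is_torch_xpu_available()
```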
- Composite backend cpu:gloo,xpu:xccl for mixed tensor support
- all_reduce_avg(): XCCL lacks ReduceOp.AVG, use SUM + divide
- FSDP2 set_force_sum_reduction_for_comms(True) for reduce_scatter
- Temporary workarounds — removed when oneCCL adds native AVG
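
A hedged sketch of how the composite backend string is typically passed to process-group init (PyTorch accepts `device:backend` pairs here); the surrounding wiring in this PR may differ:

```python
import torch
import torch.distributed as dist

# Route CPU tensors through gloo and XPU tensors through XCCL in one group.
# Assumes MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are set (env:// init).
use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
dist.init_process_group(backend="cpu:gloo,xpu:xccl" if use_xpu else "nccl")
```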
@kahlun force-pushed the xpu/pr-a0-device-detection branch from 07d47d8 to f381a03 (April 7, 2026 11:08)
@kahlun force-pushed the xpu/pr-a0-device-detection branch from f381a03 to 99b68bf (April 7, 2026 11:09)
@kahlun force-pushed the xpu/pr-a1-xccl-workarounds branch from 2e3513a to 201cd50 (April 7, 2026 11:09)
F.sdpa dispatches to Intel SYCL-TLA Flash kernel on XPU (10-22x faster).
Benchmarked on Arc Pro B60, PyTorch 2.11, bf16.
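
For reference, a small usage sketch of the SDPA path this commit describes (F.scaled_dot_product_attention); the shapes and the "xpu" device string are illustrative:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 1024, 64, device="xpu", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)
# On XPU, SDPA dispatches to a fused flash-attention kernel under the hood,
# so no CUDA-only flash_attn package is needed.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```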
@kahlun force-pushed the xpu/pr-a0-device-detection branch 6 times, most recently from 0097030 to fa549d1 (April 10, 2026 04:41)
kahlun added a commit that referenced this pull request Apr 16, 2026
Hard blocks fixed:
- #1: Replace shell heredoc source-patching with verl/utils/vllm/xpu_patches.py
  that does import-level monkey-patching; no source modification, version-stable
- #2: Remove hardcoded ONEAPI_DEVICE_SELECTOR='level_zero:0,1' from module-level
  PPO_RAY_RUNTIME_ENV dict; gate propagation to XPU hosts only in get_ppo_ray_runtime_env()
- #3: Gate torch.xpu.synchronize() behind VERL_XPU_SYNC_MICROBATCH env var;
  document the actual root cause (oneCCL non-re-entrancy during FSDP collectives)
- #4: Document torch-xpu-ops#3020 in all_gather workaround; add warning at world_size>8

Strong objections fixed:
- #6: Add logger.warning in is_support_ipc() when XPU falls back to shared memory
- #7: Fix Dockerfile sitecustomize path — use site.getsitepackages() not hardcoded
  /usr/local/lib/python3.12/ so it works regardless of Python prefix
- #8: Add hasattr guard for set_force_sum_reduction_for_comms with fallback warning
- #13 (fix #2 side-effect): Restore blank lines in constants_ppo.py

Moderate concerns fixed:
- #11: Improve list() comment — explain oneCCL non-re-entrancy as actual root cause
- #12: Remove numpy<2.0.0 pin (dpctl 0.21.1 does not require it)
- #14: Change 'from None' to 'from e' in create_engine_config exception chain
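
A sketch of the #8 guard, assuming `fsdp_module` is an FSDP2-wrapped module; the helper name and warning text are illustrative, not the exact code in this commit:

```python
import logging

logger = logging.getLogger(__name__)

def force_sum_reduction(fsdp_module) -> None:
    # Older torch builds may not expose this FSDP2 method; guard and warn
    # instead of crashing, per fix #8.
    if hasattr(fsdp_module, "set_force_sum_reduction_for_comms"):
        fsdp_module.set_force_sum_reduction_for_comms(True)
    else:
        logger.warning(
            "FSDP2 has no set_force_sum_reduction_for_comms; "
            "reduce_scatter on XCCL may not emulate AVG correctly"
        )
```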