[ray, single_controller] feat: add XPU resource mapping for Ray #14
Open
kahlun wants to merge 3 commits into xpu/pr-a0-device-detection
Conversation
- Add is_torch_xpu_available() and an is_xpu_available flag
- Extend get_device_name() to return "xpu" when XPU is available
- Extend get_nccl_backend() to return "xccl" for XPU
- Extend get_resource_name() to return "xpu" for Ray resources
- Add get_default_attention_implementation() → "eager" for XPU (the flash_attn package is CUDA-only; XPU uses PyTorch SDPA instead)
- Extend get_torch_device() to return the torch.xpu namespace
- Extend is_support_ipc() to return False for XPU (no SYCL IPC yet)

No behavioral change for existing CUDA/NPU paths. A sketch of these helpers follows.
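A hedged sketch of the helpers above, assuming a PyTorch build that ships the torch.xpu namespace. Bodies are illustrative (the NPU branch is omitted), not verl's actual implementation:

```python
import torch


def is_torch_xpu_available() -> bool:
    # Recent PyTorch builds expose torch.xpu when compiled with XPU support.
    return hasattr(torch, "xpu") and torch.xpu.is_available()


is_xpu_available = is_torch_xpu_available()


def get_device_name() -> str:
    if torch.cuda.is_available():
        return "cuda"
    if is_xpu_available:
        return "xpu"
    return "cpu"


def get_nccl_backend() -> str:
    # XCCL is the oneCCL-backed torch.distributed backend for XPU.
    return "xccl" if is_xpu_available else "nccl"


def get_default_attention_implementation() -> str:
    # flash_attn is CUDA-only; on XPU, fall back to eager / PyTorch SDPA.
    return "eager" if is_xpu_available else "flash_attention_2"


def get_torch_device():
    # Return the per-backend module namespace (torch.cuda or torch.xpu).
    return torch.xpu if is_xpu_available else torch.cuda


def is_support_ipc() -> bool:
    # No SYCL IPC yet, so XPU cannot share device memory via IPC handles.
    return not is_xpu_available
```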
- Resource detection: check xpu custom resource in node info
- Placement group: {xpu: num_gpus} for XPU workers
- Worker local_rank from RANK % LOCAL_WORLD_SIZE for XPU
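An illustrative sketch of this mapping, assuming an initialized Ray cluster (node-info dicts as returned by ray.nodes()) and torchrun-style RANK / LOCAL_WORLD_SIZE environment variables; the helper names are hypothetical:

```python
import os

import ray


def node_has_xpu(node_info: dict) -> bool:
    # XPU shows up as a custom Ray resource, not under the built-in "GPU".
    return node_info.get("Resources", {}).get("xpu", 0) > 0


def make_bundle(num_gpus: int) -> dict:
    # Schedule XPU workers against the custom resource instead of "GPU".
    if any(node_has_xpu(n) for n in ray.nodes()):
        return {"xpu": num_gpus, "CPU": num_gpus}
    return {"GPU": num_gpus, "CPU": num_gpus}


def get_local_rank() -> int:
    # Ray doesn't assign accelerator IDs for custom resources, so derive
    # local_rank from the launcher-provided environment instead.
    return int(os.environ["RANK"]) % int(os.environ["LOCAL_WORLD_SIZE"])
```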
F.sdpa (torch.nn.functional.scaled_dot_product_attention) dispatches to Intel's SYCL-TLA Flash Attention kernel on XPU (10-22x faster). Benchmarked on Arc Pro B60, PyTorch 2.11, bf16.
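A minimal timing sketch for this kind of measurement, assuming an XPU-enabled PyTorch build; the shapes and iteration counts are illustrative, not the benchmark configuration used above:

```python
import time

import torch
import torch.nn.functional as F

device = "xpu"  # requires a PyTorch build with XPU support
shape = (4, 16, 2048, 64)  # (batch, heads, seq_len, head_dim) -- illustrative
q = torch.randn(shape, device=device, dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Warm up so one-time kernel compilation doesn't skew the timing.
for _ in range(3):
    F.scaled_dot_product_attention(q, k, v)
torch.xpu.synchronize()

start = time.perf_counter()
for _ in range(20):
    F.scaled_dot_product_attention(q, k, v)
torch.xpu.synchronize()  # flush queued kernels before reading the clock
print(f"avg {1e3 * (time.perf_counter() - start) / 20:.2f} ms per call")
```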
kahlun added a commit that referenced this pull request on Apr 16, 2026:
Hard blocks fixed:
- #1: Replace shell heredoc source-patching with verl/utils/vllm/xpu_patches.py, which does import-level monkey-patching; no source modification, version-stable
- #2: Remove hardcoded ONEAPI_DEVICE_SELECTOR='level_zero:0,1' from the module-level PPO_RAY_RUNTIME_ENV dict; gate propagation to XPU hosts only in get_ppo_ray_runtime_env()
- #3: Gate torch.xpu.synchronize() behind the VERL_XPU_SYNC_MICROBATCH env var; document the actual root cause (oneCCL non-re-entrancy during FSDP collectives)
- #4: Document torch-xpu-ops#3020 in the all_gather workaround; add a warning at world_size > 8

Strong objections fixed:
- #6: Add logger.warning in is_support_ipc() when XPU falls back to shared memory
- #7: Fix the Dockerfile sitecustomize path: use site.getsitepackages() rather than hardcoded /usr/local/lib/python3.12/ so it works regardless of Python prefix
- #8: Add hasattr guard for set_force_sum_reduction_for_comms with a fallback warning
- #13 (side effect of fix #2): Restore blank lines in constants_ppo.py

Moderate concerns fixed:
- #11: Improve the list() comment; explain oneCCL non-re-entrancy as the actual root cause
- #12: Remove the numpy<2.0.0 pin (dpctl 0.21.1 does not require it)
- #14: Change 'from None' to 'from e' in the create_engine_config exception chain
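A sketch of two of the patterns above (fixes #3 and #8): the env-var gate around torch.xpu.synchronize() and the hasattr guard with fallback warning. The wrapper names and the `group` object are hypothetical; only VERL_XPU_SYNC_MICROBATCH and set_force_sum_reduction_for_comms come from the commit message:

```python
import logging
import os

import torch

logger = logging.getLogger(__name__)


def maybe_sync_microbatch() -> None:
    # Fix #3: opt-in workaround for oneCCL non-re-entrancy during FSDP
    # collectives; disabled by default so CUDA/NPU paths are unaffected.
    if os.environ.get("VERL_XPU_SYNC_MICROBATCH", "0") == "1":
        torch.xpu.synchronize()


def force_sum_reduction(group) -> None:
    # Fix #8: the hook may not exist on every build, so guard it and warn.
    if hasattr(group, "set_force_sum_reduction_for_comms"):
        group.set_force_sum_reduction_for_comms(True)
    else:
        logger.warning(
            "set_force_sum_reduction_for_comms unavailable; "
            "using default reduction behavior"
        )
```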
Summary
Teach Ray's resource system about Intel XPU devices.
node_info.get("xpu", 0)alongside GPU/NPU{"xpu": num_gpus}for XPU worker schedulingRANK % LOCAL_WORLD_SIZEwhen Ray doesn't recognize XPUDepends on: device detection PR (parallel with XCCL workarounds)
Test plan