ray-ascend is a community-maintained hardware plugin that supports advanced
Ray features on Ascend NPU accelerators.
Ray natively supports the Ascend NPU as a predefined resource type for binding
actors and tasks (see
Ray Accelerator Support).
ray-ascend builds on this with Ascend-native features for Ray, such as
collective communication via the
Huawei Collective Communication Library (HCCL),
Ray Direct Transport (RDT),
and more.
For performance benchmarks, see the Performance Benchmark Report.
- Architecture: aarch64, x86
- OS Kernel: Linux
- Python Dependencies:
  - python >= 3.10, <= 3.11
  - CANN >= 8.2.rc1
  - torch >= 2.7.1; torch-npu >= 2.7.1.post2
    - torch and torch-npu versions must be compatible with each other.
  - ray >= 2.55.0
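The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`python_supported`, `installed_version`, `environment_report`) are hypothetical and not part of ray-ascend. It assumes the Python range means any 3.10.x or 3.11.x release, and it only reports installed versions next to the documented minimums rather than comparing them. CANN is left out because it is typically installed outside pip.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
# CANN is omitted: it is assumed to be installed outside pip.
MINIMUM_VERSIONS = {
    "torch": "2.7.1",
    "torch-npu": "2.7.1.post2",
    "ray": "2.55.0",
}


def python_supported(version_info=sys.version_info):
    """Return True for Python 3.10.x or 3.11.x (assumed reading of the range)."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 11)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Map each listed package to (installed version or None, documented minimum)."""
    return {
        name: (installed_version(name), minimum)
        for name, minimum in MINIMUM_VERSIONS.items()
    }


if __name__ == "__main__":
    print("Python supported:", python_supported())
    for name, (found, minimum) in environment_report().items():
        print(f"{name}: installed={found}, required>={minimum}")
```

Running the script prints one line per package, so a missing or outdated dependency is visible at a glance.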
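```bash
pip install "ray-ascend[yr]"
```

```python
import ray
from ray.util import collective
from ray_ascend import register_hccl_collective_backend

# Register the HCCL backend in the driver process.
register_hccl_collective_backend()


@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        # Register again inside each actor process.
        register_hccl_collective_backend()


# Illustrative: two actors, one NPU each.
actors = [RayActor.remote() for _ in range(2)]

collective.create_collective_group(
    actors,
    len(actors),                  # world size
    list(range(0, len(actors))),  # one rank per actor
    backend="HCCL",
    group_name="my_group",
)

# Each actor broadcasts in SPMD manner: every actor in the group runs this
# call (for example, from an actor method) with its own NPU tensor `tensor`.
collective.broadcast(tensor, src_rank=0, group_name="my_group")
```

Transport Ascend NPU Tensors via HCCS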
```python
import ray
import torch
from ray.util.collective import create_collective_group
from ray_ascend import register_hccl_tensor_transport

# Register the HCCL tensor transport in the driver process.
register_hccl_tensor_transport()


@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        # Register again inside each actor process.
        register_hccl_tensor_transport()

    @ray.method(tensor_transport="HCCL")
    def random_tensor(self):
        return torch.zeros(1024, device="npu")

    def sum(self, tensor: torch.Tensor):
        return torch.sum(tensor)


sender, receiver = RayActor.remote(), RayActor.remote()
group = create_collective_group([sender, receiver], backend="HCCL")

# The tensor reference is passed from sender to receiver.
tensor = sender.random_tensor.remote()
result = receiver.sum.remote(tensor)
ray.get(result)
```

Transport Ascend NPU Tensors via HCCS and CPU Tensors via RDMA
OpenYuanrong DataSystem
(YR) allows users to transport NPU tensors (via HCCS) and CPU tensors (via RDMA, when
available) using Ray objects.
```python
import ray
import torch
from ray_ascend import register_yr_tensor_transport

# Register the YR tensor transport for both device types in the driver process.
register_yr_tensor_transport(["npu", "cpu"])


@ray.remote(resources={"NPU": 1})
class RayActor:
    def __init__(self):
        # Register again inside each actor process.
        register_yr_tensor_transport(["npu", "cpu"])

    @ray.method(tensor_transport="YR")
    def transfer_npu_tensor_via_hccs(self):
        return torch.zeros(1024, device="npu")

    @ray.method(tensor_transport="YR")
    def transfer_cpu_tensor_via_rdma(self):
        return torch.zeros(1024)


sender = RayActor.remote()
npu_tensor = ray.get(sender.transfer_npu_tensor_via_hccs.remote())
cpu_tensor = ray.get(sender.transfer_cpu_tensor_via_rdma.remote())
```

| Ray Version | YR Transport | HCCL Collective | HCCL Tensor Transport (RDT) |
|---|---|---|---|
| >= 2.55, < 2.56 | ✅ | ❌ | ❌ |
| >= 2.56 | ✅ | ✅ | ✅ |
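The compatibility table above can be encoded as a small lookup that an application might use to decide which features to enable. This is a sketch only: `parse_major_minor` and `supported_features` are hypothetical helpers, not part of ray-ascend, and the version parsing assumes a plain `major.minor[.patch]` string.

```python
def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.56.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_features(ray_version):
    """Return feature availability for a Ray version, following the table above."""
    release = parse_major_minor(ray_version)
    return {
        # YR transport is listed from Ray 2.55 onward.
        "yr_transport": release >= (2, 55),
        # HCCL collective and HCCL tensor transport (RDT) are listed from Ray 2.56 onward.
        "hccl_collective": release >= (2, 56),
        "hccl_tensor_transport": release >= (2, 56),
    }
```

With Ray installed, `supported_features(ray.__version__)` gives the row that applies to the current environment.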
See CONTRIBUTING and the developer guide for a step-by-step walkthrough of setting up your development environment, building, and testing. If you find a bug or want to request a feature, please file an issue.
Apache License 2.0. See LICENSE file.