Add: SDMA workspace overlay + async completion demo on a5 onboard by jvjhfhg · Pull Request #1179 · hw-native-sys/simpler

jvjhfhg · 2026-06-27T08:10:50Z

Layers the host-side SDMA workspace allocation on top of the comm backend from the previous commit. Until CANN exposes the missing SDMA primitives on a5, this overlay is the only piece of comm work that fails on real a5 silicon: aclnnShmemSdmaStarsQuery raises an AICPU exception (InnerCode=0x715002a) that aborts the entire ACL thread context. Dropping this commit therefore unblocks the non-SDMA comm demos (async_notify_demo etc.) without touching the deferred-completion runtime, which is already SDMA-aware on the kernel side (dormant until a kernel registers an SDMA condition).

- Wire SdmaWorkspaceManager into comm_alloc_windows under SIMPLER_ENABLE_PTO_SDMA_WORKSPACE: pre-allocates the per-rank workspace via aclnnShmemSdmaStarsQuery and overlays the result into CommContext.workSpace/.workSpaceSize. On CANN 8.5 the dlsym fails by design and we demote to "no workspace" rather than failing comm_init.
- a5 onboard CMakeLists forces SIMPLER_ENABLE_PTO_SDMA_WORKSPACE ON, requires PTO_ISA_ROOT (with FATAL_ERROR message pointing to the workspace coupling), adds pto-isa headers to the include path, and links libnnopbase.
- runtime_compiler._init_a5 enforces the same PTO_ISA_ROOT env contract as _init_a2a3.
- Migrate sdma_async_completion_demo to examples/a5/ (kernels + orch byte-identical with the a2a3 version; test.py platform-renamed).
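The CANN 8.5 fallback described above (a failed `dlsym` demotes to "no workspace" instead of failing `comm_init`) can be illustrated with a small Python sketch. It relies on the documented `ctypes` behavior that looking up a symbol a loaded library does not export raises `AttributeError`. This is not the C++ `SdmaWorkspaceManager`: `WorkspaceOverlay`, `overlay_sdma_workspace`, and the `allocate` callback are hypothetical, and the real `aclnnShmemSdmaStarsQuery` call signature is deliberately not modeled.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Callable, Tuple


@dataclass
class WorkspaceOverlay:
    # Hypothetical mirror of CommContext.workSpace / .workSpaceSize.
    work_space: int = 0
    work_space_size: int = 0


def resolve_optional_symbol(lib, name: str):
    """Return the exported symbol, or None when the lookup fails.

    For a ctypes.CDLL handle, a symbol the library does not export raises
    AttributeError, which is how a failed dlsym surfaces in Python.
    """
    try:
        return getattr(lib, name)
    except AttributeError:
        return None


def overlay_sdma_workspace(
    lib,
    allocate: Callable[[object], Tuple[int, int]],
    symbol: str = "aclnnShmemSdmaStarsQuery",
) -> WorkspaceOverlay:
    """Pre-allocate the workspace if the query symbol exists, else demote."""
    query = resolve_optional_symbol(lib, symbol)
    if query is None:
        # Older-CANN path: no symbol, so report "no workspace" and keep going
        # instead of failing communicator initialization.
        return WorkspaceOverlay()
    # The caller-supplied callback stands in for the real query/allocation.
    ptr, size = allocate(query)
    return WorkspaceOverlay(work_space=ptr, work_space_size=size)


# SimpleNamespace() stands in for a library handle without the export.
print(overlay_sdma_workspace(SimpleNamespace(), allocate=lambda q: (0x1000, 4096)))
# WorkspaceOverlay(work_space=0, work_space_size=0)
```

Keeping the demotion at symbol-resolution time means callers only ever see an empty overlay, never a partially initialized one.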

coderabbitai · 2026-06-27T08:11:05Z


📝 Walkthrough


Adds an A5 SDMA async-completion demo: new device kernels and orchestration code, host-side workspace support, updated A5 build/runtime checks for PTO_ISA_ROOT, and a smoke test that builds, runs, and validates the two-device flow.

Changes

SDMA async completion demo

| Layer | File(s) | Summary |
|---|---|---|
| A5 build and runtime contract | `simpler_setup/runtime_compiler.py`, `src/a5/platform/onboard/host/CMakeLists.txt` | A5 host setup requires `PTO_ISA_ROOT`, adds its include path, and forces `SIMPLER_ENABLE_PTO_SDMA_WORKSPACE` into `host_runtime` compile and link settings. |
| HCCL workspace ownership | `src/a5/platform/onboard/host/comm_hccl.cpp` | The HCCL host handle conditionally includes and owns `SdmaWorkspaceManager` under `SIMPLER_ENABLE_PTO_SDMA_WORKSPACE`, and the nearby window-allocation comment is updated. |
| Producer and consumer kernels | `examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/aiv/kernel_sdma_tget_async.cpp`, `examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/aiv/kernel_consumer.cpp` | Adds the peer-window SDMA `kernel_entry` and the tile-processing consumer kernel entrypoint used by the demo. |
| Orchestration and smoke test | `examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/orchestration/sdma_async_completion_orch.cpp`, `examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/test_sdma_async_completion_demo.py` | Adds orchestration entrypoints with four-argument validation and producer/consumer task submission, plus the Python smoke test that builds the chip callable, runs on two devices, and checks `out` and `result`. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hw-native-sys/simpler#823 — Adds the a5 comm port and deferred-completion/SDMA backend that this demo and workspace plumbing build on.
hw-native-sys/simpler#1166 — Updates SDMA/PTO-ISA header and include-path handling that matches the new workspace-gated host and kernel code here.

Poem

A bunny hopped through tiles so neat,
With SDMA drums beneath my feet.
One peer-window carrot, bright and plain,
Made peer_input + 1 shine again.
Thump-thump — the test passed cleanly! 🐇

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the SDMA workspace overlay and a5 async completion demo migration. |
| Description check | ✅ Passed | The description is on-topic and matches the comm backend overlay, a5 build updates, and demo migration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


gemini-code-assist

Code Review

This pull request introduces an SDMA deferred completion demo for the onboard a5 platform, adding consumer and async SDMA TGET kernels, orchestration logic, and a Python smoke test. It also enables the PTO-ISA async SDMA workspace pre-allocation by default, making PTO_ISA_ROOT a hard requirement for the a5 onboard host runtime. The review feedback suggests tightening argument validation in the orchestration code to prevent potential out-of-bounds access and robustly checking for empty PTO_ISA_ROOT environment variables in CMake.


gemini-code-assist · 2026-06-27T08:12:21Z

```diff
+    if (orch_args.tensor_count() + orch_args.scalar_count() != 4) {
+        LOG_ERROR("sdma_async_completion_demo: expected 4 args");
+        return;
+    }
```


The current check only verifies that the sum of tensor_count() and scalar_count() is 4. If the orchestrator is invoked with an unexpected combination of arguments (e.g., 2 tensors and 2 scalars), accessing orch_args.tensor(2) or other indices will result in an out-of-bounds access and potentially crash. It is safer to explicitly validate that tensor_count() is exactly 3 and scalar_count() is exactly 1.

Suggested change

```diff
-    if (orch_args.tensor_count() + orch_args.scalar_count() != 4) {
-        LOG_ERROR("sdma_async_completion_demo: expected 4 args");
+    if (orch_args.tensor_count() != 3 || orch_args.scalar_count() != 1) {
+        LOG_ERROR("sdma_async_completion_demo: expected 3 tensors and 1 scalar");
         return;
     }
```

References

Ensure that index-based accessors perform bounds checks to prevent undefined behavior or out-of-bounds memory access.
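To see concretely why the sum-only guard is weaker, the following Python model (hypothetical helpers, not code from the PR) enumerates every four-argument tensor/scalar split and reports those that pass the original check yet would make the body's `tensor(2)` or `scalar(0)` access out of range. It assumes the body reads exactly `tensor(0..2)` and `scalar(0)`, as in the quoted snippet.

```python
from typing import List, Tuple

EXPECTED_TENSORS = 3  # input, out, result
EXPECTED_SCALARS = 1  # comm_ctx pointer
TOTAL_ARGS = EXPECTED_TENSORS + EXPECTED_SCALARS


def sum_guard(tensor_count: int, scalar_count: int) -> bool:
    # Original check: only the total number of arguments is validated.
    return tensor_count + scalar_count == TOTAL_ARGS


def split_guard(tensor_count: int, scalar_count: int) -> bool:
    # Suggested check: tensors and scalars are validated separately.
    return tensor_count == EXPECTED_TENSORS and scalar_count == EXPECTED_SCALARS


def body_in_bounds(tensor_count: int, scalar_count: int) -> bool:
    # The body reads tensor(0), tensor(1), tensor(2) and scalar(0).
    return tensor_count >= 3 and scalar_count >= 1


def sum_guard_gaps() -> List[Tuple[int, int]]:
    """Four-argument splits that pass sum_guard but index out of range."""
    gaps = []
    for tensors in range(TOTAL_ARGS + 1):
        scalars = TOTAL_ARGS - tensors
        if sum_guard(tensors, scalars) and not body_in_bounds(tensors, scalars):
            gaps.append((tensors, scalars))
    return gaps


print(sum_guard_gaps())  # [(0, 4), (1, 3), (2, 2), (4, 0)]
print(any(split_guard(t, s) for t, s in sum_guard_gaps()))  # False
```

Four of the five splits slip through the sum check, while the split check rejects all of them and accepts only `(3, 1)`.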

gemini-code-assist · 2026-06-27T08:12:21Z

```diff
+if(NOT DEFINED ENV{PTO_ISA_ROOT})
+    message(FATAL_ERROR
+        "a5 onboard host_runtime requires PTO_ISA_ROOT "
+        "(SIMPLER_ENABLE_PTO_SDMA_WORKSPACE is forced ON; needs pto-isa headers + CANN 9.0+)")
+endif()
```


Checking NOT DEFINED ENV{PTO_ISA_ROOT} does not protect against cases where the environment variable is defined but empty. If PTO_ISA_ROOT is empty, appending it to CMAKE_CUSTOM_INCLUDE_DIRS will result in /include, which is incorrect. Checking against an empty string using STREQUAL "" and quoting the variable expansion handles both undefined and empty cases robustly.

```cmake
if("$ENV{PTO_ISA_ROOT}" STREQUAL "")
    message(FATAL_ERROR
        "a5 onboard host_runtime requires PTO_ISA_ROOT "
        "(SIMPLER_ENABLE_PTO_SDMA_WORKSPACE is forced ON; needs pto-isa headers + CANN 9.0+)")
endif()
```

References

In CMake, when checking if a string variable is empty or non-empty, explicitly check against an empty string using STREQUAL "" and quote the variable expansion.
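The difference between the two CMake conditions can be modeled in a few lines of Python, assuming CMake's usual behavior that an unset `$ENV{...}` expands to an empty string. The function names are illustrative only; they simply show which guard fires for an empty value and what include path the unguarded append would produce.

```python
from typing import Mapping


def not_defined_guard_fires(env: Mapping[str, str], name: str) -> bool:
    # Models `if(NOT DEFINED ENV{name})`: fires only when the variable is unset.
    return name not in env


def streq_empty_guard_fires(env: Mapping[str, str], name: str) -> bool:
    # Models `if("$ENV{name}" STREQUAL "")`: an unset variable expands to "",
    # so this fires for both unset and empty values.
    return env.get(name, "") == ""


def appended_include_dir(env: Mapping[str, str], name: str) -> str:
    # Models the string produced by "$ENV{name}/include".
    return env.get(name, "") + "/include"


empty_env = {"PTO_ISA_ROOT": ""}
print(not_defined_guard_fires(empty_env, "PTO_ISA_ROOT"))  # False: guard is bypassed
print(streq_empty_guard_fires(empty_env, "PTO_ISA_ROOT"))  # True: configuration stops
print(appended_include_dir(empty_env, "PTO_ISA_ROOT"))     # /include
```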

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/orchestration/sdma_async_completion_orch.cpp`:
- Around line 31-39: The current guard in sdma_async_completion_orch.cpp only
checks the total argument count, so a bad tensor/scalar mix can still reach
tensor(2) and scalar(0). Update the validation around the orchestration argument
parsing to verify the exact split expected by Tensor accessors and the comm_ctx
scalar, not just orch_args.tensor_count() + orch_args.scalar_count(). Keep the
existing error handling in the same flow so invalid inputs are rejected before
from_tensor_arg() and reinterpret_cast<CommContext *> are used.

In
`@examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/test_sdma_async_completion_demo.py`:
- Around line 27-37: The test is still importing and using the deprecated
task-interface alias ContinuousTensor instead of the renamed Tensor type, so
update the import list in the sdma_async_completion demo test to use Tensor and
replace any ContinuousTensor references in the test setup with Tensor. Keep the
rest of the task-interface imports unchanged and ensure the test only depends on
the current public symbol from simpler.task_interface.
- Around line 66-86: The child callables built in the loop around
CoreCallable.build are advertising the wrong ABI because both entries reuse the
parent’s 4-arg signature. Update each child metadata entry to match the actual
kernel interface for kernel_sdma_tget_async.cpp and kernel_consumer.cpp, so the
producer and consumer callables each expose their real argument
directions/counts instead of the parent signature.

In `@src/a5/platform/onboard/host/CMakeLists.txt`:
- Around line 44-49: The current PTO_ISA_ROOT check in the host CMake logic only
verifies that the environment variable is defined, so an empty or nonexistent
path still reaches the include path append and fails later. Update the
validation near the CMakeLists.txt guard around the `PTO_ISA_ROOT` handling to
also reject empty values and paths that do not exist before `list(APPEND
CMAKE_CUSTOM_INCLUDE_DIRS ...)`, and keep the fatal error in the same
`host_runtime` setup path so configuration fails immediately with a clear
message.



📥 Commits

Reviewing files that changed from the base of the PR and between 47a411c and 781c4e2.

📒 Files selected for processing (7)

examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/aiv/kernel_consumer.cpp
examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/aiv/kernel_sdma_tget_async.cpp
examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/kernels/orchestration/sdma_async_completion_orch.cpp
examples/a5/tensormap_and_ringbuffer/sdma_async_completion_demo/test_sdma_async_completion_demo.py
simpler_setup/runtime_compiler.py
src/a5/platform/onboard/host/CMakeLists.txt
src/a5/platform/onboard/host/comm_hccl.cpp

coderabbitai · 2026-06-27T08:17:21Z

```diff
+    if (orch_args.tensor_count() + orch_args.scalar_count() != 4) {
+        LOG_ERROR("sdma_async_completion_demo: expected 4 args");
+        return;
+    }
+
+    Tensor input = from_tensor_arg(orch_args.tensor(0));
+    Tensor out = from_tensor_arg(orch_args.tensor(1));
+    Tensor result = from_tensor_arg(orch_args.tensor(2));
+    auto *comm_ctx = reinterpret_cast<CommContext *>(static_cast<uintptr_t>(orch_args.scalar(0)));
```


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Validate the tensor/scalar split, not just total arg count.

A call with 4 args but the wrong mix can pass this guard and still fail when accessing tensor(2) or scalar(0).

Proposed fix

```diff
-    if (orch_args.tensor_count() + orch_args.scalar_count() != 4) {
-        LOG_ERROR("sdma_async_completion_demo: expected 4 args");
+    if (orch_args.tensor_count() != 3 || orch_args.scalar_count() != 1) {
+        LOG_ERROR("sdma_async_completion_demo: expected 3 tensor args and 1 scalar arg");
         return;
     }
```


coderabbitai · 2026-06-27T08:17:21Z

```diff
+from simpler.task_interface import (
+    ArgDirection,
+    CallConfig,
+    ChipCallable,
+    CommBufferSpec,
+    ContinuousTensor,
+    CoreCallable,
+    DataType,
+    TaskArgs,
+    TensorArgType,
+)
```


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Use the renamed Tensor task-interface type.

This new test reintroduces ContinuousTensor; switch to the current hard-renamed symbol to avoid depending on an old alias. Based on learnings, renamed public Python types/classes are hard renames and old names such as ContinuousTensor should be removed across the repo.

Proposed fix

```diff
 from simpler.task_interface import (
     ArgDirection,
     CallConfig,
     ChipCallable,
     CommBufferSpec,
-    ContinuousTensor,
     CoreCallable,
     DataType,
     TaskArgs,
+    Tensor,
     TensorArgType,
 )
@@
-        ContinuousTensor.make(
+        Tensor.make(
             data=domain.buffer_ptrs["input_window"],
             shapes=(N,),
             dtype=DataType.FLOAT32,
```

Also applies to: 162-168


Source: Learnings
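Since the learning treats `ContinuousTensor` as a hard rename that should disappear across the repo, a quick way to find leftovers is to scan Python sources with the standard-library `ast` module rather than a plain text search, so comments and strings are not flagged. `find_stale_names` below is a hypothetical helper, not part of the repository; it reports imports of the old name, bare uses, and attribute accesses such as `ti.ContinuousTensor`.

```python
import ast
from typing import List, Tuple


def find_stale_names(source: str, stale: str = "ContinuousTensor") -> List[Tuple[int, str]]:
    """Return sorted (line, kind) pairs where `stale` is still referenced."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # `from simpler.task_interface import (..., ContinuousTensor, ...)`
            for alias in node.names:
                if alias.name == stale:
                    hits.append((node.lineno, "import"))
        elif isinstance(node, ast.Name) and node.id == stale:
            # Bare use, e.g. `ContinuousTensor.make(...)`
            hits.append((node.lineno, "use"))
        elif isinstance(node, ast.Attribute) and node.attr == stale:
            # Qualified use, e.g. `ti.ContinuousTensor`
            hits.append((node.lineno, "attribute"))
    return sorted(hits)


sample = '''from simpler.task_interface import (
    ContinuousTensor,
    TaskArgs,
)
t = ContinuousTensor.make(data=0)
'''
print(find_stale_names(sample))  # [(1, 'import'), (5, 'use')]
```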

coderabbitai · 2026-06-27T08:17:21Z

```diff
+    children = []
+    for func_id, rel in [
+        (0, "kernels/aiv/kernel_sdma_tget_async.cpp"),
+        (1, "kernels/aiv/kernel_consumer.cpp"),
+    ]:
+        kernel = kc.compile_incore(
+            source_path=os.path.join(HERE, rel),
+            core_type="aiv",
+            pto_isa_root=pto_isa_root,
+            extra_include_dirs=extra_includes,
+        )
+        if not platform.endswith("sim"):
+            kernel = extract_text_section(kernel)
+        children.append(
+            (
+                func_id,
+                CoreCallable.build(
+                    signature=[ArgDirection.IN, ArgDirection.OUT, ArgDirection.OUT, ArgDirection.IN],
+                    binary=kernel,
+                ),
+            )
```


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Give each child callable its actual ABI signature.

The producer kernel is submitted with 3 args and the consumer with 2 args, but both child metadata entries advertise the parent’s 4-arg signature. If runtime validation uses this metadata, child dispatch can reject valid submissions or misdescribe the callable ABI.

Proposed fix

```diff
-    for func_id, rel in [
-        (0, "kernels/aiv/kernel_sdma_tget_async.cpp"),
-        (1, "kernels/aiv/kernel_consumer.cpp"),
+    for func_id, rel, signature in [
+        (0, "kernels/aiv/kernel_sdma_tget_async.cpp", [ArgDirection.IN, ArgDirection.OUT, ArgDirection.IN]),
+        (1, "kernels/aiv/kernel_consumer.cpp", [ArgDirection.IN, ArgDirection.OUT]),
     ]:
@@
                 func_id,
                 CoreCallable.build(
-                    signature=[ArgDirection.IN, ArgDirection.OUT, ArgDirection.OUT, ArgDirection.IN],
+                    signature=signature,
                     binary=kernel,
                 ),
```
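The concern is that any arity check driven by the advertised signature would disagree with the real 3-arg producer and 2-arg consumer submissions. The Python sketch below models that under stated assumptions: `ArgDirection` is a local stand-in for the `simpler.task_interface` enum, `arity_matches` is a hypothetical check (the review only says runtime validation *may* use this metadata), and the per-kernel directions are copied from the proposed fix.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class ArgDirection(Enum):
    # Local stand-in for simpler.task_interface.ArgDirection.
    IN = "in"
    OUT = "out"


# Per-kernel table in the shape of the proposed fix: func_id -> (source, signature).
CHILD_KERNELS: Dict[int, Tuple[str, List[ArgDirection]]] = {
    0: ("kernels/aiv/kernel_sdma_tget_async.cpp", [ArgDirection.IN, ArgDirection.OUT, ArgDirection.IN]),
    1: ("kernels/aiv/kernel_consumer.cpp", [ArgDirection.IN, ArgDirection.OUT]),
}

# The parent orchestration signature that both children currently reuse.
PARENT_SIGNATURE = [ArgDirection.IN, ArgDirection.OUT, ArgDirection.OUT, ArgDirection.IN]


def arity_matches(signature: Sequence[ArgDirection], submitted_args: Sequence[object]) -> bool:
    # Hypothetical dispatch-time check: one submitted argument per declared direction.
    return len(signature) == len(submitted_args)


producer_args = ("a0", "a1", "a2")  # placeholder values for a 3-arg submission
print(arity_matches(PARENT_SIGNATURE, producer_args))     # False
print(arity_matches(CHILD_KERNELS[0][1], producer_args))  # True
```

Keeping the signature in the same tuple as the source path makes it harder for the two to drift apart when a kernel's argument list changes.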


coderabbitai · 2026-06-27T08:17:21Z

```diff
+if(NOT DEFINED ENV{PTO_ISA_ROOT})
+    message(FATAL_ERROR
+        "a5 onboard host_runtime requires PTO_ISA_ROOT "
+        "(SIMPLER_ENABLE_PTO_SDMA_WORKSPACE is forced ON; needs pto-isa headers + CANN 9.0+)")
+endif()
+list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "$ENV{PTO_ISA_ROOT}/include")
```


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Fail fast when PTO_ISA_ROOT is empty or invalid.

DEFINED ENV{PTO_ISA_ROOT} still passes for an empty or nonexistent path, then line 49 appends a broken include directory and defers the failure to compilation.

Proposed fix

```diff
-if(NOT DEFINED ENV{PTO_ISA_ROOT})
+if(NOT DEFINED ENV{PTO_ISA_ROOT}
+   OR "$ENV{PTO_ISA_ROOT}" STREQUAL ""
+   OR NOT EXISTS "$ENV{PTO_ISA_ROOT}/include")
     message(FATAL_ERROR
-        "a5 onboard host_runtime requires PTO_ISA_ROOT "
+        "a5 onboard host_runtime requires PTO_ISA_ROOT to point to a valid pto-isa checkout "
         "(SIMPLER_ENABLE_PTO_SDMA_WORKSPACE is forced ON; needs pto-isa headers + CANN 9.0+)")
 endif()
```
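The PR description notes that `runtime_compiler._init_a5` enforces the same `PTO_ISA_ROOT` contract on the Python side. A Python preflight that mirrors the stricter CMake guard could look like the sketch below. `require_pto_isa_root` and `PtoIsaRootError` are hypothetical names, not the actual `runtime_compiler` code, and `os.path.isdir` is slightly stricter than CMake's `EXISTS`, which also accepts a plain file.

```python
import os
from typing import Mapping, Optional


class PtoIsaRootError(RuntimeError):
    """Raised when PTO_ISA_ROOT does not point at a usable pto-isa checkout."""


def require_pto_isa_root(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "$PTO_ISA_ROOT/include", or raise if the contract is not met."""
    env = os.environ if env is None else env
    root = env.get("PTO_ISA_ROOT", "")
    if root == "":
        # Covers both the unset and the empty case, like the STREQUAL "" test.
        raise PtoIsaRootError("PTO_ISA_ROOT is unset or empty")
    include = os.path.join(root, "include")
    if not os.path.isdir(include):
        # Mirrors NOT EXISTS "$ENV{PTO_ISA_ROOT}/include": fail before compiling.
        raise PtoIsaRootError(f"{include} does not exist")
    return include


try:
    require_pto_isa_root({"PTO_ISA_ROOT": ""})
except PtoIsaRootError as exc:
    print(exc)  # PTO_ISA_ROOT is unset or empty
```

Failing in the same place for the same three conditions on both the CMake and Python paths keeps the error message, not a later missing-header compile error, as the first thing a user sees.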



jvjhfhg changed the title from "[WIP] Add: SDMA workspace overlay + async completion demo on a5 onboard" to "Add: SDMA workspace overlay + async completion demo on a5 onboard" on Jun 27, 2026.

gemini-code-assist Bot reviewed Jun 27, 2026

coderabbitai Bot reviewed Jun 27, 2026

jvjhfhg force-pushed the feat/comm-a5-sdma branch 2 times, most recently from 28fceb4 to c0829e5 on June 29, 2026 03:34

jvjhfhg force-pushed the feat/comm-a5-sdma branch from c0829e5 to 901b2e3 on June 30, 2026 01:36

jvjhfhg wants to merge 1 commit into hw-native-sys:main from jvjhfhg:feat/comm-a5-sdma
