Refactor: split PTO2RuntimeArenaLayout into sizing + offsets #1219

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/split-arena-layout-sizing-offsets
Jul 1, 2026
@ChaoWao ChaoWao commented Jun 30, 2026

Collaborator

Summary

PTO2RuntimeArenaLayout mixed two semantics in one flat struct: the layout-defining capacities (input) and the computed sub-region offsets (output). This splits it into two named halves so each side reads as what it is:

  • ArenaSizingKey: task_window_sizes / heap_sizes / dep_pool_capacities + scheduler_timeout_ms. The input to runtime_reserve_layout, re-read at AICPU boot.
  • ArenaOffsets: off_sm_handle / orch / sched / off_runtime / off_mailbox + arena_size. The output, consumed by init_data_from_layout + wire_arena_pointers (the AICPU re-wires arena-internal pointers from these after rtMemcpy).

PTO2RuntimeArenaLayout now composes the two as .sizing / .offsets. All field-access sites are updated across both arches: pto_runtime2.h, pto_runtime2_init.cpp, aicpu_executor.cpp, scheduler_dispatch.cpp, runtime_maker.cpp.
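For readers without the diff open, the composition can be sketched roughly as follows. The field names come from the summary above; the field types, the ring count `kNumRings`, the region sizes, and the helper `reserve_layout_sketch` are assumptions made for illustration and are not the code in pto_runtime2.h (in particular, `orch` and `sched` may be nested layout structs rather than plain offsets).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ring count; the real constant is not shown in this PR.
constexpr int kNumRings = 4;
// Hypothetical alignment and per-slot size, chosen only for the sketch.
constexpr std::uint64_t kAlign = 64;
constexpr std::uint64_t kBytesPerSlot = 8;

// Input half: the capacities that define the layout.
struct ArenaSizingKey {
    std::int32_t  task_window_sizes[kNumRings];
    std::uint64_t heap_sizes[kNumRings];
    std::int32_t  dep_pool_capacities[kNumRings];
    std::int32_t  scheduler_timeout_ms;
};

// Output half: computed sub-region offsets plus the total size.
// Assumption: orch and sched are modelled here as plain byte offsets.
struct ArenaOffsets {
    std::uint64_t off_sm_handle;
    std::uint64_t orch;
    std::uint64_t sched;
    std::uint64_t off_runtime;
    std::uint64_t off_mailbox;
    std::uint64_t arena_size;
};

// The layout composes the two halves as .sizing / .offsets.
struct PTO2RuntimeArenaLayout {
    ArenaSizingKey sizing;
    ArenaOffsets   offsets;
};

constexpr std::uint64_t align_up(std::uint64_t v, std::uint64_t a) {
    return (v + a - 1) / a * a;
}

// Toy stand-in for runtime_reserve_layout: sizing goes in, offsets come out.
// The region sizes below are invented for illustration only.
inline PTO2RuntimeArenaLayout reserve_layout_sketch(const ArenaSizingKey& key) {
    PTO2RuntimeArenaLayout layout{};
    layout.sizing = key;

    std::uint64_t heap_total = 0;
    std::uint64_t window_total = 0;
    for (int r = 0; r < kNumRings; ++r) {
        assert(key.task_window_sizes[r] >= 0);
        heap_total += key.heap_sizes[r];
        window_total += static_cast<std::uint64_t>(key.task_window_sizes[r]);
    }

    // Bump allocator: each region starts at the next aligned offset.
    std::uint64_t cursor = 0;
    auto reserve = [&cursor](std::uint64_t bytes) {
        const std::uint64_t off = align_up(cursor, kAlign);
        cursor = off + bytes;
        return off;
    };

    layout.offsets.off_sm_handle = reserve(64);
    layout.offsets.orch          = reserve(heap_total);
    layout.offsets.sched         = reserve(window_total * kBytesPerSlot);
    layout.offsets.off_runtime   = reserve(256);
    layout.offsets.off_mailbox   = reserve(128);
    layout.offsets.arena_size    = align_up(cursor, kAlign);
    return layout;
}
```

With this shape, a consumer that used to read a flat field reads `layout.sizing.scheduler_timeout_ms` or `layout.offsets.arena_size` instead, which makes the input/output direction visible at each call site.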

The PTO2SchedulerLayout / PTO2OrchestratorLayout sibling structs have same-named fields (task_window_sizes, dep_pool_capacities) and are deliberately left untouched — only the three PTO2RuntimeArenaLayout functions were migrated.

⚠️ ABI change

The layout is embedded in PTO2Runtime::prebuilt_layout, rtMemcpy'd to device, and re-read at AICPU boot — so this field reorder is a host↔device ABI change. Sim cannot catch an offset drift (host + device share one address space); only onboard validates it.
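One way to make an offset drift fail at build time rather than on hardware is to pin the layout with `static_assert`s in the shared header, so that a host or device compiler that disagrees refuses to build. The sketch below is not part of this PR. The struct shapes, `kNumRings`, and the numeric offsets are assumptions (they hold for ABIs where `uint64_t` is 8-byte aligned), and `copy_like_device` is a hypothetical stand-in for the rtMemcpy round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr int kNumRings = 4;  // hypothetical ring count

struct ArenaSizingKey {
    std::int32_t  task_window_sizes[kNumRings];
    std::uint64_t heap_sizes[kNumRings];
    std::int32_t  dep_pool_capacities[kNumRings];
    std::int32_t  scheduler_timeout_ms;
};

struct ArenaOffsets {
    std::uint64_t off_sm_handle, orch, sched, off_runtime, off_mailbox, arena_size;
};

struct PTO2RuntimeArenaLayout {
    ArenaSizingKey sizing;
    ArenaOffsets   offsets;
};

// A byte-wise copy is only meaningful for trivially copyable,
// standard-layout types.
static_assert(std::is_trivially_copyable<PTO2RuntimeArenaLayout>::value,
              "layout must survive a raw byte copy");
static_assert(std::is_standard_layout<PTO2RuntimeArenaLayout>::value,
              "offsetof requires standard layout");

// Pin concrete numbers: if either side's compiler lays the struct out
// differently, that side stops compiling.
static_assert(offsetof(ArenaSizingKey, scheduler_timeout_ms) == 64, "sizing drift");
static_assert(sizeof(ArenaSizingKey) == 72, "sizing size drift");
static_assert(offsetof(PTO2RuntimeArenaLayout, offsets) == 72, "offsets drift");
static_assert(sizeof(PTO2RuntimeArenaLayout) == 120, "layout size drift");

// Hypothetical stand-in for the host-to-device copy: serialize to raw
// bytes, then reinterpret on the "other side".
inline PTO2RuntimeArenaLayout copy_like_device(const PTO2RuntimeArenaLayout& host) {
    unsigned char image[sizeof(PTO2RuntimeArenaLayout)];
    std::memcpy(image, &host, sizeof host);
    PTO2RuntimeArenaLayout device{};
    std::memcpy(&device, image, sizeof device);
    return device;
}
```

Within a single process the round trip always succeeds, which is the same reason sim cannot catch a drift; the compile-time checks are the part that would differ between two toolchains.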

Stacked on #1215

Depends on #1215 (B1: bind_callable lifecycle split) — B2's runtime_maker.cpp edits land inside the helpers #1215 introduces. The diff will narrow to this commit once #1215 merges.

Testing

  • a2a3sim trb ST — 30 passed, 1 skipped
  • a5sim trb ST — 20 passed
  • a2a3 onboard trb ST — 33 passed, 1 skipped (the decisive ABI test; reorganized layout rtMemcpy'd to device, re-read at boot, correct values, no hangs)
  • a5 onboard — NOT run (this box is a2a3 silicon; a5 onboard refused). Required pre-merge on a5 hardware / CI. a5 changes are structurally identical to a2a3, build clean, and pass a5 sim.

coderabbitai Bot commented Jun 30, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6736685-7010-446b-898c-a6eecec1b1f1

📥 Commits

Reviewing files that changed from the base of the PR and between f4d14bf and 46431c5.

📒 Files selected for processing (12)
  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/pto_runtime2_init.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/pto_runtime2_init.cpp
  • tests/ut/cpp/a2a3/test_shared_memory.cpp
  • tests/ut/cpp/a5/test_shared_memory.cpp

Walkthrough

PTO2RuntimeArenaLayout is refactored in both a2a3 and a5 runtimes by extracting two new sub-structs: ArenaSizingKey (capacity arrays and scheduler timeout) and ArenaOffsets (computed sub-region offsets and arena size). All consumers—init, host maker, executor, scheduler, and tests—are updated to use the nested sizing.* and offsets.* fields.

Changes

PTO2RuntimeArenaLayout struct split and consumer updates

| Layer | File(s) | Summary |
| --- | --- | --- |
| ArenaSizingKey and ArenaOffsets struct definitions | src/a5/runtime/.../pto_runtime2.h, src/a2a3/runtime/.../pto_runtime2.h | Introduces ArenaSizingKey (capacity arrays + timeout) and ArenaOffsets (sub-region offsets + arena size); PTO2RuntimeArenaLayout now nests both instead of holding flat fields. |
| runtime_reserve_layout and runtime_init/wire updates | src/a5/runtime/.../shared/pto_runtime2_init.cpp, src/a2a3/runtime/.../shared/pto_runtime2_init.cpp | runtime_reserve_layout writes through layout.sizing.* and layout.offsets.*; runtime_init_data_from_layout and runtime_wire_arena_pointers read pointers and wiring offsets from the same nested fields. |
| Host runtime_maker arena setup and launch-state binding | src/a5/runtime/.../host/runtime_maker.cpp, src/a2a3/runtime/.../host/runtime_maker.cpp | ensure_static_arenas, build_runtime_image, and bind_launch_state updated to use layout.offsets.arena_size, layout.offsets.off_runtime, and layout.sizing.scheduler_timeout_ms. |
| AicpuExecutor and SchedulerContext consumer updates | src/a5/.../aicpu/aicpu_executor.cpp, src/a2a3/.../aicpu/aicpu_executor.cpp, src/a5/.../scheduler/scheduler_dispatch.cpp, src/a2a3/.../scheduler/scheduler_dispatch.cpp | SM size computation, init_per_ring arguments, profiling capacity calls, and scheduler timeout derivation all updated to use prebuilt_layout.sizing.*. |
| Unit test updates | tests/ut/cpp/a5/test_shared_memory.cpp, tests/ut/cpp/a2a3/test_shared_memory.cpp | Assertions and init_data_from_layout call arguments updated to reflect the new layout.sizing.* and layout.offsets.orch paths. |
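
The re-wiring step mentioned for runtime_wire_arena_pointers can be illustrated with a small sketch: pointers into the arena are never copied, only offsets are, and each side rebuilds its pointers from its own base address. The types `ArenaOffsetsSketch` and `RuntimeViewSketch` and the function `wire_arena_pointers_sketch` are hypothetical and much simpler than the real runtime structures.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, reduced version of the offsets half.
struct ArenaOffsetsSketch {
    std::uint64_t off_runtime;
    std::uint64_t off_mailbox;
    std::uint64_t arena_size;
};

// Hypothetical view holding arena-internal pointers.
struct RuntimeViewSketch {
    unsigned char* runtime;
    unsigned char* mailbox;
};

// Rebuild arena-internal pointers from a base address plus offsets.
// Host pointers are meaningless after a copy to another address range,
// so the receiving side calls this again with its own base.
inline RuntimeViewSketch wire_arena_pointers_sketch(unsigned char* base,
                                                    const ArenaOffsetsSketch& off) {
    assert(off.off_runtime < off.arena_size);
    assert(off.off_mailbox < off.arena_size);
    return RuntimeViewSketch{base + off.off_runtime, base + off.off_mailbox};
}
```

A typical use would wire a view over the host buffer, populate it, copy the whole arena, and then wire a second view over the destination buffer using the same offsets.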

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • hw-native-sys/simpler#846: Introduced runtime_reserve_layout/init/wire/finalize and the same core prebuilt_layout wiring callsites that this PR now updates.
  • hw-native-sys/simpler#1099: Modified per-ring runtime arena sizing data flow in PTO2RuntimeArenaLayout and the same AicpuExecutor::run / SchedulerContext callsites.

Poem

🐇 A layout once flat and wide,
Now nests its fields deep inside.
sizing holds what the rings require,
offsets tells where arenas conspire.
With two tidy structs, the wiring's clean—
The tidiest refactor a bunny's seen! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.74% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: splitting PTO2RuntimeArenaLayout into sizing and offsets. |
| Description check | ✅ Passed | The description is directly aligned with the refactor and accurately explains the layout split and affected files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the runtime-arena layout and initialization logic across the a2a3 and a5 platforms, splitting PTO2RuntimeArenaLayout into separate sizing and offsets structures and modularizing bind_callable_to_runtime_impl into distinct helper functions. Feedback on these changes highlights the need to prevent signed integer overflow by accumulating window sizes using a 64-bit integer before casting to int32_t in pto_runtime2_init.cpp. Additionally, it is recommended to validate that the signed integer sig_count is non-negative in stage_device_args to avoid potential out-of-bounds array indexing.
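
The two pieces of feedback above can be sketched as follows. This is a hedged illustration of the general technique, not the code in pto_runtime2_init.cpp or stage_device_args; the function names `sum_window_sizes_checked` and `sig_count_is_valid` and their parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Accumulate in 64 bits so the running sum cannot overflow a signed
// 32-bit integer, then range-check before narrowing.
// Returns false if any size is negative or the total does not fit in int32_t.
inline bool sum_window_sizes_checked(const std::int32_t* sizes, std::size_t count,
                                     std::int32_t* out_total) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (sizes[i] < 0) {
            return false;
        }
        total += sizes[i];
    }
    if (total > std::numeric_limits<std::int32_t>::max()) {
        return false;
    }
    *out_total = static_cast<std::int32_t>(total);
    return true;
}

// Reject a negative or oversized signed count before it is used as an
// array index or loop bound.
inline bool sig_count_is_valid(int sig_count, int capacity) {
    return sig_count >= 0 && sig_count <= capacity;
}
```

Returning a status instead of asserting keeps the decision about how to report the failure with the caller, which may matter for code that also runs on the device side.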


Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
Comment thread src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
@ChaoWao ChaoWao force-pushed the refactor/split-arena-layout-sizing-offsets branch from 6d84f7b to 7c3060d Compare June 30, 2026 14:49
PTO2RuntimeArenaLayout mixed two semantics in one flat struct: the
layout-defining capacities (input) and the computed sub-region offsets
(output). Split into two named halves so each side reads as what it is:

- ArenaSizingKey: task_window/heap/dep_pool sizes + scheduler_timeout_ms
  — the input to runtime_reserve_layout, re-read at AICPU boot
- ArenaOffsets:   off_sm_handle/orch/sched/off_runtime/off_mailbox +
  arena_size — the output, consumed by init_data + wire (the AICPU
  re-wires arena-internal pointers from these after rtMemcpy)

PTO2RuntimeArenaLayout now composes the two as `.sizing` / `.offsets`.
All field-access sites are updated across both arches: pto_runtime2.h,
pto_runtime2_init.cpp (the three RuntimeArenaLayout functions only — the
PTO2SchedulerLayout/PTO2OrchestratorLayout siblings have same-named
fields and are deliberately left untouched), aicpu_executor.cpp,
scheduler_dispatch.cpp, runtime_maker.cpp, and the cpp unit test
tests/ut/cpp/{a2a3,a5}/test_shared_memory.cpp.

This is a host<->device ABI change: the layout is embedded in
PTO2Runtime::prebuilt_layout, rtMemcpy'd to device, and re-read at AICPU
boot, so the field reorder must be validated on hardware (sim shares one
address space and cannot catch an offset drift).

Verified: a2a3sim 30 passed/1 skipped, a5sim 20 passed, a2a3 onboard
33 passed/1 skipped, cpp UT test_shared_memory compiles clean. a5 onboard
covered by CI (this box is a2a3 silicon).
@ChaoWao ChaoWao force-pushed the refactor/split-arena-layout-sizing-offsets branch from 7c3060d to 46431c5 Compare June 30, 2026 15:31
@ChaoWao ChaoWao merged commit 62adb13 into hw-native-sys:main Jul 1, 2026
16 checks passed
@ChaoWao ChaoWao deleted the refactor/split-arena-layout-sizing-offsets branch July 1, 2026 00:45