Refactor: split trb bind_callable into lifecycle helpers #1215

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/split-bind-callable-lifecycle-helpers
Jun 30, 2026
@ChaoWao ChaoWao commented Jun 30, 2026

Collaborator

Summary

The ~200-line bind_callable_to_runtime_impl (trb host side) folded three distinct lifecycles into one function. This splits it into named steps so the entry point reads as the lifecycles it orchestrates:

  • resolve_arena_sizing (per-config) — ring sizing + derived heap/SM sizes + scheduler timeout (the layout input half, pure host arithmetic)
  • stage_device_args (per-run) — the only signature-aware step: H2D copy / pure-OUTPUT zeroing / copy-back recording
  • apply_orch_sched_env_flags (per-run) — latch the orch→sched env gates
  • ensure_static_arenas (per-config) — reserve + acquire the static pools
  • build_runtime_image (per-config) — pure host image build, no device touch (the hook a later image-cache stage can memoize)
  • bind_launch_state (per-run) — publish args + rtMemcpy + record device base
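The stage_device_args bullet names three per-argument actions. A minimal sketch of how such a signature-aware pass could look follows. Everything in it is hypothetical: the `ArgDir` enum, the `HostArg` and `StagedArgs` types, and the rule mapping directions to actions are assumptions for illustration, and device memory is simulated with host vectors rather than real H2D copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical argument direction; the real signature encoding is not shown in this PR.
enum class ArgDir { Input, InOut, Output };

// Hypothetical host-side argument: a direction plus its host bytes.
struct HostArg {
    ArgDir dir;
    std::vector<std::uint8_t> host;
};

// Result of staging: simulated device buffers plus the indices to copy back after the run.
struct StagedArgs {
    std::vector<std::vector<std::uint8_t>> device;
    std::vector<std::size_t> copy_back;
};

// Assumed rule: Input and InOut are copied to the device, pure Output is zeroed,
// and anything the kernel may write (InOut, Output) is recorded for copy-back.
StagedArgs stage_device_args_sketch(const std::vector<HostArg>& args) {
    StagedArgs out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const HostArg& a = args[i];
        if (a.dir == ArgDir::Output) {
            // Pure output: no host data is needed, so the buffer starts zeroed.
            out.device.push_back(std::vector<std::uint8_t>(a.host.size(), std::uint8_t{0}));
        } else {
            // Stand-in for the H2D copy.
            out.device.push_back(a.host);
        }
        if (a.dir != ArgDir::Input) {
            out.copy_back.push_back(i);
        }
    }
    return out;
}
```

Keeping this logic in one function means the other helpers never need to inspect argument directions, which matches the description of this step as the only signature-aware one.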

bind_callable_to_runtime_impl collapses to a ~45-line orchestrator.
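For readers without the diff open, the shape of such an orchestrator can be sketched roughly as below. The helper names come from the list above; their signatures, return types, the `BindTrace` recorder, and the call order (assumed to follow the list order) are illustrative assumptions, not the code in runtime_maker.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins: the real helpers take runtime and arena state, not a trace.
struct ArenaSizing {
    std::size_t heap_bytes = 0;
    std::size_t sm_bytes = 0;
};
struct BindTrace {
    std::vector<std::string> steps;  // records which step ran, in order
};

// per-config: pure host arithmetic (the sizes here are placeholders)
ArenaSizing resolve_arena_sizing(BindTrace& t) {
    t.steps.push_back("resolve_arena_sizing");
    ArenaSizing s;
    s.heap_bytes = 1024;
    s.sm_bytes = 256;
    return s;
}
// per-run
bool stage_device_args(BindTrace& t) {
    t.steps.push_back("stage_device_args");
    return true;
}
// per-run
void apply_orch_sched_env_flags(BindTrace& t) {
    t.steps.push_back("apply_orch_sched_env_flags");
}
// per-config
bool ensure_static_arenas(BindTrace& t, const ArenaSizing& s) {
    t.steps.push_back("ensure_static_arenas");
    return s.heap_bytes > 0;
}
// per-config: no device touch
void build_runtime_image(BindTrace& t, const ArenaSizing&) {
    t.steps.push_back("build_runtime_image");
}
// per-run
bool bind_launch_state(BindTrace& t) {
    t.steps.push_back("bind_launch_state");
    return true;
}

// The entry point reduced to a sequence of named steps with early exits on failure.
int bind_callable_sketch(BindTrace& t) {
    const ArenaSizing sizing = resolve_arena_sizing(t);
    if (!stage_device_args(t)) return -1;
    apply_orch_sched_env_flags(t);
    if (!ensure_static_arenas(t, sizing)) return -1;
    build_runtime_image(t, sizing);
    return bind_launch_state(t) ? 0 : -1;
}
```

Calling `bind_callable_sketch` on a fresh `BindTrace` returns 0 and leaves the six step names in `steps` in call order.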

Behavior is byte-identical: all TIMING: logs, the simpler_run.bind.{args,prebuilt} STRACE spans (consumed by pypto-serving), log ordering, and error paths are preserved. The host DeviceArena stays a caller-owned local passed by reference (it is non-copyable/non-movable), so the image outlives the call until upload.
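The ownership constraint on the arena can be illustrated with a small sketch. `HostArena` below is a hypothetical stand-in for the host DeviceArena, and `build_image_into` and `upload_image` are invented names; only the pattern is the point: a non-copyable, non-movable object declared in the caller and handed to helpers by reference, so the built image lives until the upload step reads it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the host DeviceArena: copying and moving are disabled.
class HostArena {
public:
    HostArena() = default;
    HostArena(const HostArena&) = delete;
    HostArena& operator=(const HostArena&) = delete;
    HostArena(HostArena&&) = delete;
    HostArena& operator=(HostArena&&) = delete;

    // Reserve n bytes and return their offset (offsets stay valid across growth).
    std::size_t append(std::size_t n) {
        const std::size_t off = bytes_.size();
        bytes_.resize(off + n);
        return off;
    }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
};

static_assert(!std::is_copy_constructible<HostArena>::value, "arena must not be copyable");
static_assert(!std::is_move_constructible<HostArena>::value, "arena must not be movable");

// A helper cannot return the arena by value, so it fills one owned by the caller.
void build_image_into(HostArena& arena, std::size_t image_bytes) {
    arena.append(image_bytes);
}

// Stand-in for the upload step: reports how many bytes it would copy to the device.
std::size_t upload_image(const HostArena& arena) {
    return arena.size();
}

// The caller owns the arena as a local; it outlives both helper calls.
std::size_t bind_once(std::size_t image_bytes) {
    HostArena arena;
    build_image_into(arena, image_bytes);
    return upload_image(arena);
}
```

Because the type can be neither copied nor moved, returning it from a builder helper is not an option, which is why passing it by reference is the natural fit here.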

Also re-syncs the drifted a2a3/a5 copies: a5 adopts the STRACE markers, common/strace.h include, and pto2_-prefixed naming that were pure drift — the two runtime_maker.cpp files are now byte-identical.

This is the host-side function-split groundwork; the device-ABI layout split, static host_api relocation, and register-time image caching are separate follow-up stages.

Testing

  • Simulation tests pass — tests/st/{a2a3,a5}/tensormap_and_ringbuffer: a2a3sim 30 passed / 1 skipped, a5sim 20 passed
  • Hardware tests (not required: host-only change, behavior byte-identical, sim-verified)

coderabbitai Bot commented Jun 30, 2026

Warning

Review limit reached

@ChaoWao, you've reached your PR review limit, so we couldn't start this review.

📥 Commits

Reviewing files that changed from the base of the PR and between b0b1976 and f4ba6c9.

📒 Files selected for processing (2)
  • src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors bind_callable_to_runtime_impl in both the a2a3 and a5 runtime makers by extracting helper functions for arena sizing, device-argument staging, environment flags, static arenas, runtime image building, and launch-state binding. The feedback highlights a potential state leak in both implementations: the orch_to_sched flag is set to true when the PTO2_ORCH_TO_SCHED environment variable is truthy, but it is never reset to false when the variable is unset or falsy, so a stale setting could persist across runs.
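The leak described in this review can be reproduced in a few lines. The sketch below is illustrative only: `RuntimeFlags`, `env_truthy`, and both `apply_*` functions are hypothetical, and the truthiness rule (set, non-empty, not "0") is an assumption rather than the rule the runtime uses. It contrasts a set-only latch with an unconditional assignment, which is one possible way to avoid the stale state.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a flag that persists across runs.
struct RuntimeFlags {
    bool orch_to_sched = false;
};

// Assumed truthiness rule: the variable is set, non-empty, and not "0".
bool env_truthy(const char* value) {
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Leaky variant: only ever writes true, so an earlier run's setting sticks.
void apply_flag_leaky(RuntimeFlags& flags, const char* value) {
    if (env_truthy(value)) {
        flags.orch_to_sched = true;
    }
}

// Resetting variant: assigns on every run, so unset or falsy clears the flag.
void apply_flag_reset(RuntimeFlags& flags, const char* value) {
    flags.orch_to_sched = env_truthy(value);
}

// Per-run entry point reading the real environment variable named in the review.
void apply_flag_from_env(RuntimeFlags& flags) {
    apply_flag_reset(flags, std::getenv("PTO2_ORCH_TO_SCHED"));
}
```

With the leaky variant, applying "1" and then an unset value leaves the flag true. With the resetting variant the same sequence ends with the flag false.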

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp Outdated
Comment thread src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp Outdated
The ~200-line bind_callable_to_runtime_impl folded three distinct
lifecycles into one function. Split it into named steps so the entry
point reads as the lifecycles it orchestrates:

- resolve_arena_sizing  (per-config): ring sizing + derived heap/SM
  sizes + scheduler timeout — the layout input half, host arithmetic
- stage_device_args     (per-run): the only signature-aware step —
  H2D copy / pure-OUTPUT zeroing / copy-back recording
- apply_orch_sched_env_flags (per-run): latch the orch->sched env gates
- ensure_static_arenas  (per-config): reserve + acquire the static pools
- build_runtime_image   (per-config): pure host image build, no device
  touch — the hook a later image-cache stage can memoize
- bind_launch_state     (per-run): publish args + rtMemcpy + record base

Behavior is byte-identical: TIMING logs, the simpler_run.bind.{args,
prebuilt} STRACE spans, log ordering, and error paths are preserved.
The host DeviceArena stays a caller-owned local passed by reference
(it is non-copyable/non-movable), so the image outlives the call until
upload.

Also re-syncs the drifted a2a3/a5 runtime_maker copies: a5 adopts the
STRACE markers, common/strace.h include, and pto2_-prefixed naming that
were pure drift, leaving the two files byte-identical.

Verified on sim (behavior unchanged): a2a3sim trb ST 30 passed/1
skipped, a5sim trb ST 20 passed.
@ChaoWao ChaoWao force-pushed the refactor/split-bind-callable-lifecycle-helpers branch from aa8fd1e to f4ba6c9 Compare June 30, 2026 13:13
@ChaoWao ChaoWao merged commit f4d14bf into hw-native-sys:main Jun 30, 2026
16 checks passed
@ChaoWao ChaoWao deleted the refactor/split-bind-callable-lifecycle-helpers branch June 30, 2026 13:46