Refactor: remove PTO2_ORCH_TO_SCHED feature and dead transition branches by ChaoWao · Pull Request #1217 · hw-native-sys/simpler

ChaoWao · 2026-06-30T12:48:34Z

Summary

PTO2_ORCH_TO_SCHED was a default-off env flag in the
tensormap_and_ringbuffer runtime. When set, orchestrator AICPU threads
transitioned into scheduler threads after the task graph was built, driving
a core-reassignment handshake. The flag was read by no test, CI job, or
build script, so the entire transition path was permanent dead weight under
the default — pure configuration debt per the repo's env-macro-gating rule.

This removes the flag and the core-transition machinery that was
reachable only when it was set. Applied symmetrically to both a2a3 and
a5 runtimes.
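For readers unfamiliar with the pattern, a default-off env flag of this kind can be sketched roughly as below. The helper names and the parsing rule (only the value "1" enables the flag) are assumptions for illustration; the PR does not show how runtime_maker.cpp interpreted the value.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interpret an env value as a default-off boolean.
// Assumption: the flag counts as set only when the value is exactly "1".
inline bool parse_default_off_flag(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: read a named env variable through std::getenv.
// An unset variable yields nullptr, which parses as "off".
inline bool read_default_off_flag(const char* name) {
    return parse_default_off_flag(std::getenv(name));
}
```

Because nothing in tests, CI, or build scripts ever set the variable, every branch guarded by such a flag stays on its "off" arm, which is the dead weight the summary describes.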

Removed

- Flag plumbing: the getenv("PTO2_ORCH_TO_SCHED") read in
  runtime_maker.cpp, the Runtime::orch_to_sched member, its init reset,
  and the orch_to_sched_ executor member + dispatch gate
  (thread_idx < sched_thread_num_ || orch_to_sched_ →
  thread_idx < sched_thread_num_).
- Core-transition machinery (dead once the flag is gone):
  handle_core_transition(), reassign_cores_for_all_threads(), the
  transition_requested_ / wait_reassign_ / reassigned_ atomics, the
  cores_released dispatch-loop branch, and the transition block in
  on_orchestration_done(). The two orch_to_sched_ ternaries collapse to
  their default arms.
- Stale docs/comments: profiling_levels.md, RUNTIME_LOGIC.md,
  SUBMIT_BY_CLUSTER.md, dynamic-linking.md, and the swimlane-collector
  comment clauses.
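The dispatch-gate change can be sketched as a pair of predicates. This is a minimal illustration, not the executor code: GateState and both function names are hypothetical, and only the two boolean expressions come from the PR description.

```cpp
#include <cassert>

// Hypothetical stand-in for the executor fields named in the PR.
struct GateState {
    int sched_thread_num;  // number of scheduler threads
    bool orch_to_sched;    // the removed flag (consulted by the old gate only)
};

// Old gate: with the flag set, any thread could enter the dispatch loop.
inline bool may_dispatch_old(const GateState& s, int thread_idx) {
    return thread_idx < s.sched_thread_num || s.orch_to_sched;
}

// New gate: only scheduler threads enter the dispatch loop.
inline bool may_dispatch_new(const GateState& s, int thread_idx) {
    return thread_idx < s.sched_thread_num;
}
```

With the flag at its default (false), both predicates agree for every thread index, which is why removing the flag leaves default behavior unchanged.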

Out of scope

The separate serial_orch_sched / PTO2_SERIAL_ORCH_SCHED flag is left
untouched.

Net: 25 files, +62 / −426.

Testing

- Both runtimes build clean on a2a3 and a5 (AICore/AICPU/Host)
- Simulation tests pass: dummy_task + batch_paged_attention on
  a2a3sim, dummy_task on a5sim
- Hardware tests (CI)

coderabbitai · 2026-06-30T12:48:43Z

📝 Walkthrough


Removes the orch_to_sched_ flag and associated core-transition machinery (handle_core_transition, reassign_cores_for_all_threads, transition_requested_, wait_reassign_, reassigned_) from SchedulerContext and AicpuExecutor across both a2a3 and a5 targets. The PTO2_ORCH_TO_SCHED env-var parsing is removed, dispatcher eligibility is narrowed to scheduler threads only, and documentation/comments are updated throughout.

Changes

Remove orch_to_sched scheduling mode

Layer / File(s)	Summary
SchedulerContext public contract `src/a2a3/runtime/.../scheduler/scheduler_context.h`, `src/a5/runtime/.../scheduler/scheduler_context.h`	Removes `bool orch_to_sched` from `init(...)`, removes member fields `transition_requested_`, `wait_reassign_`, `reassigned_`, `orch_to_sched_`, removes `reassign_cores_for_all_threads()` and `handle_core_transition` declarations, and adds `handle_orchestrator_exit`/`check_idle_fatal_error` (a5).
Runtime constructor and env-flag cleanup `src/a2a3/runtime/.../runtime/runtime.h`, `src/a2a3/runtime/.../shared/runtime.cpp`, `src/a2a3/runtime/.../host/runtime_maker.cpp`, `src/a5/runtime/.../runtime/runtime.h`, `src/a5/runtime/.../shared/runtime.cpp`, `src/a5/runtime/.../host/runtime_maker.cpp`	Removes `dev.orch_to_sched = false` from Runtime constructor and `PTO2_ORCH_TO_SCHED` env-var parsing from `apply_orch_sched_env_flags` in both targets.
AicpuExecutor init / run / deinit `src/a2a3/runtime/.../aicpu/aicpu_executor.cpp`, `src/a5/runtime/.../aicpu/aicpu_executor.cpp`	Replaces stored `orch_to_sched_` with `serial_orch_sched_`; updates `init` to read from `runtime->dev.serial_orch_sched` and drop the flag from `sched_ctx_.init(...)`; narrows dispatch gate in `run` to `thread_idx < sched_thread_num_` only; updates `deinit` resets accordingly.
SchedulerContext cold-path implementation `src/a2a3/runtime/.../scheduler/scheduler_cold_path.cpp`, `src/a5/runtime/.../scheduler/scheduler_cold_path.cpp`	Removes `handle_core_transition` and `reassign_cores_for_all_threads` bodies; updates `init` signature/body (drops `orch_to_sched_` assignment, simplifies L2 swimlane `sched_phase_threads`, updates `dump_args_init`); cleans `deinit` transition-state resets; updates `on_orchestration_done` (removes core-transition control flow, marks `thread_idx` `[[maybe_unused]]`); adds `check_idle_fatal_error` (a5 only).
scheduler_dispatch.cpp core-transition removal `src/a2a3/runtime/.../scheduler/scheduler_dispatch.cpp`, `src/a5/runtime/.../scheduler/scheduler_dispatch.cpp`	Removes `cores_released` local variable and the `if (!cores_released && orch_to_sched_)` block that called `handle_core_transition` in `resolve_and_dispatch`.
L2 swimlane API comments `src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h`, `src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp`, `src/a5/platform/include/aicpu/l2_swimlane_collector_aicpu.h`, `src/a5/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp`	Removes `orch_to_sched` mode qualifiers from Doxygen comments for `l2_swimlane_aicpu_init_phase` and `l2_swimlane_aicpu_set_orch_thread_idx`; simplifies inline orch-phase pool cache comment.
Documentation updates `docs/dynamic-linking.md`, `src/a2a3/runtime/.../docs/`, `src/a5/runtime/.../docs/`	Updates `RUNTIME_LOGIC.md` (init signature, `on_orchestration_done` description, cold-path responsibilities), `SUBMIT_BY_CLUSTER.md` (threading baseline, cluster ownership), `profiling_levels.md` (Level 1 log walkthrough, `LOG_INFO_V9` table, `PTO2_PROFILING` note), and `dynamic-linking.md` (deinit responsibilities).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hw-native-sys/simpler#939: Touches scheduler_cold_path.cpp L2 swimlane init around l2_swimlane_aicpu_init_phase thread-count selection using orch_to_sched_, directly affected by this PR's removal of that flag.
hw-native-sys/simpler#1083: Removes PTO2_ORCH_TO_SCHED env-variable references in runtime_maker.cpp, profiling_levels.md, and runtime.h comments—overlapping with the same files changed here.
hw-native-sys/simpler#1176: Adds serial_orch_sched_ start-gate logic using orchestrator_done_ in the same SchedulerContext/AicpuExecutor code paths modified by this PR.

Poem

🐇 Hop, hop, away we throw
The flag that made the orch thread flow!
orch_to_sched_ — goodbye, farewell,
Scheduler threads now hold the spell.
Cleaner queues and simpler state,
One less flag to contemplate! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.04% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: removing PTO2_ORCH_TO_SCHED and related transition branches.
Description check	✅ Passed	The description matches the changeset, covering removal of the flag, transition machinery, and docs updates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request removes the orchestrator-to-scheduler transition feature (orch_to_sched / PTO2_ORCH_TO_SCHED) from both the a2a3 and a5 runtimes. This cleanup eliminates the dynamic core reassignment logic (reassign_cores_for_all_threads, handle_core_transition), simplifies thread coordination, and updates associated documentation and profiling logs to reflect that orchestrator threads now exit immediately after building the task graph. No review comments were provided, so there is no additional feedback.


The default-off PTO2_ORCH_TO_SCHED env flag made orchestrator AICPU threads transition into scheduler threads after the task graph was built, via a core-reassignment handshake. The flag was exercised by no test, CI job, or build script, so the whole transition path was permanent dead weight under the default.

Removed across both a2a3 and a5 tensormap_and_ringbuffer runtimes:

- Flag plumbing: the getenv("PTO2_ORCH_TO_SCHED") read in runtime_maker.cpp, the Runtime::orch_to_sched member, its init reset, and the orch_to_sched_ executor member + dispatch gate.
- Core-transition machinery, reachable only when the flag was set (now dead): handle_core_transition(), reassign_cores_for_all_threads(), the transition_requested_/wait_reassign_/reassigned_ atomics, the cores_released dispatch-loop branch, and the transition block in on_orchestration_done(). The two orch_to_sched_ ternaries collapse to their default arms.
- Stale doc/comment references (profiling_levels, RUNTIME_LOGIC, SUBMIT_BY_CLUSTER, dynamic-linking, swimlane collector comments).

on_orchestration_done's thread_idx is now used only inside PTO2_PROFILING-gated code (no unconditional use), so it is marked [[maybe_unused]] to keep the PTO2_PROFILING=0 build warning-clean under -Werror=unused-parameter.

The separate serial_orch_sched (PTO2_SERIAL_ORCH_SCHED) flag is left untouched.

Builds clean on a2a3 and a5 across all profiling-flag combos; tensormap sim tests pass on a2a3sim and a5sim.
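The [[maybe_unused]] point can be illustrated with a small sketch. DoneState and on_orchestration_done_sketch are hypothetical names, and the function body is invented for illustration; only the PTO2_PROFILING macro, the thread_idx parameter, and the attribute come from the commit message.

```cpp
#include <cassert>

// Default to the non-profiling build when the macro is not supplied.
#ifndef PTO2_PROFILING
#define PTO2_PROFILING 0
#endif

// Hypothetical state: one field is updated unconditionally, the other only
// when profiling is compiled in.
struct DoneState {
    int done_calls = 0;
    int last_profiled_thread = -1;
};

// With PTO2_PROFILING=0 the only use of thread_idx is compiled out, so the
// attribute suppresses the unused-parameter diagnostic (C++17 and later).
inline void on_orchestration_done_sketch(DoneState& st,
                                         [[maybe_unused]] int thread_idx) {
    ++st.done_calls;
#if PTO2_PROFILING
    st.last_profiled_thread = thread_idx;
#endif
}
```

Without the attribute, a build with -Werror=unused-parameter and PTO2_PROFILING=0 would reject the function, since the parameter would be named but never read.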

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)
875-901: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reset the cached L2 swimlane level before the enable branch.

Both cold-path init paths still reset l2_swimlane_level_ only in the disabled branch. Make the reset unconditional before is_l2_swimlane_enabled() so no prior launch state can leak through another branch.

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901: set l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; immediately before the enable check.

src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908: apply the same unconditional reset.

Based on learnings, SchedulerContext::init() should reset l2_swimlane_level_ unconditionally before conditional logic to prevent cross-launch state leaks.
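A minimal sketch of the suggested reset ordering follows. SchedulerContextSketch, the LEVEL_1 enumerator, and the two boolean parameters are hypothetical stand-ins (for is_l2_swimlane_enabled() and the outcome of swimlane setup); only l2_swimlane_level_ and L2SwimlaneLevel::DISABLED are named in the review comment.

```cpp
#include <cassert>

// DISABLED is named in the review; LEVEL_1 is a hypothetical enabled level.
enum class L2SwimlaneLevel { DISABLED, LEVEL_1 };

struct SchedulerContextSketch {
    L2SwimlaneLevel l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;

    // swimlane_enabled stands in for is_l2_swimlane_enabled();
    // setup_ok stands in for any step in the enabled branch that may bail out.
    void init(bool swimlane_enabled, bool setup_ok) {
        // Unconditional reset: no branch below can inherit the previous launch.
        l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;
        if (swimlane_enabled) {
            if (!setup_ok) {
                return;  // early exit still leaves the level at DISABLED
            }
            l2_swimlane_level_ = L2SwimlaneLevel::LEVEL_1;
        }
    }
};
```

If the reset lived only in the disabled branch, the early exit above would keep whatever level an earlier launch had cached.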
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`
around lines 875 - 901, Reset l2_swimlane_level_ unconditionally at the start of
SchedulerContext::init() before the is_l2_swimlane_enabled() check so stale
state cannot leak between launches. Apply this in
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901
and the matching
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908
path, keeping the enable/disable branch logic intact after the reset.
Source: Learnings
docs/dynamic-linking.md (1)
224-235: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Finish removing stale transition/executor symbols from docs.

The docs cleanup is incomplete in two spots and still references removed or mismatched symbols.

docs/dynamic-linking.md#L224-L235: align the executor-owned field list with the current executor (aicpu_thread_num_, serial_orch_sched_, preserved orch_so_table_, etc.) instead of stale names like thread_num_, orch_func_, orch_so_handle_, and orch_so_path_.

src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md#L555-L561: remove the remaining reassign_* mention now that the reassignment path is deleted.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dynamic-linking.md` around lines 224 - 235, Update the stale
documentation symbols in both affected docs: in docs/dynamic-linking.md replace
the executor-owned field list in AicpuExecutor::deinit() with the current names
from the executor implementation (for example aicpu_thread_num_,
serial_orch_sched_, and preserved orch_so_table_), and remove the outdated
thread_num_ / orch_func_ / orch_so_handle_ / orch_so_path_ references; in
src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md remove the
leftover reassign_* mention since that reassignment path no longer exists.
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (1)
205-216: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject 1-thread runs now that the orchestrator never dispatches.

Line 207 can still derive sched_thread_num_ == 0, but Line 730 now hard-stops dispatch to thread_idx < sched_thread_num_. In that case the only thread takes the orchestrator path on Line 383 and no thread ever calls resolve_and_dispatch(...), so submitted work can be left undrained. Either require aicpu_thread_num_ >= 2 here, or restore an explicit supported single-thread dispatch path.
Proposed fix
-    if (aicpu_thread_num_ < 1 || aicpu_thread_num_ > MAX_AICPU_THREADS) {
+    if (aicpu_thread_num_ < 2 || aicpu_thread_num_ > MAX_AICPU_THREADS) {
         LOG_ERROR("Invalid aicpu_thread_num: %d", aicpu_thread_num_);
         init_failed_.store(true, std::memory_order_release);
         return -1;
     }
Also applies to: 729-730
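The undrained single-thread case can be sketched as follows. This assumes one orchestrator thread with the remaining threads acting as schedulers (so sched_thread_num_ = aicpu_thread_num_ - 1), which matches the review's observation that a 1-thread run yields zero scheduler threads but is not confirmed by the source. The constant and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical upper bound standing in for MAX_AICPU_THREADS.
constexpr int kMaxAicpuThreadsSketch = 8;

// Assumption: exactly one thread orchestrates; the rest schedule.
inline int sched_threads_for(int aicpu_thread_num) {
    return aicpu_thread_num - 1;
}

// Count threads that pass the post-PR gate thread_idx < sched_thread_num_.
inline int count_dispatchers(int aicpu_thread_num) {
    const int sched_thread_num = sched_threads_for(aicpu_thread_num);
    int dispatchers = 0;
    for (int thread_idx = 0; thread_idx < aicpu_thread_num; ++thread_idx) {
        if (thread_idx < sched_thread_num) {
            ++dispatchers;
        }
    }
    return dispatchers;
}

// Validation in the spirit of the proposed fix: require at least 2 threads.
inline bool is_valid_thread_num(int aicpu_thread_num) {
    return aicpu_thread_num >= 2 && aicpu_thread_num <= kMaxAicpuThreadsSketch;
}
```

Under these assumptions count_dispatchers(1) is 0, so no thread would drain submitted work, while any count accepted by is_valid_thread_num leaves at least one dispatcher.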
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` around
lines 205 - 216, The current initialization in aicpu_executor.cpp still allows
aicpu_thread_num_ to become 1, which leaves sched_thread_num_ at 0 and prevents
any thread from reaching resolve_and_dispatch while the orchestrator path
handles everything; fix this in the AicpuExecutor init path by either rejecting
single-thread configurations up front or adding a supported single-thread
dispatch flow. Update the validation around aicpu_thread_num_,
sched_thread_num_, and the dispatch gating logic that uses thread_idx <
sched_thread_num_ so the runtime cannot initialize into an undrained state.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md`:
- Around line 95-99: The Level 1 profiling documentation is counting orch-side
logs that are actually gated by PTO2_ORCH_PROFILING, so the PTO2_PROFILING-only
totals are too high. In
src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
404-410, remove the orch-side log term from the Level 1 formula/count or move it
to Level 3 so it matches what AICPUExecutor emits under PTO2_PROFILING alone;
make the same correction in
src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
374-380. Use AICPUExecutor and the orch_start/orch_end/orch_cost plus PTO2 total
submitted tasks logs as the reference points for the counts.



📥 Commits

Reviewing files that changed from the base of the PR and between f4d14bf and 02ac7e1.

📒 Files selected for processing (25)

docs/dynamic-linking.md
src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h
src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md
src/a2a3/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.md
src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
src/a5/platform/include/aicpu/l2_swimlane_collector_aicpu.h
src/a5/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
src/a5/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md
src/a5/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.md
src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp

💤 Files with no reviewable changes (8)

src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h
src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp

gemini-code-assist Bot reviewed Jun 30, 2026

ChaoWao force-pushed the refactor/remove-pto2-orch-to-sched branch 2 times, most recently from e7a8a95 to 708bf8b Compare June 30, 2026 13:15

ChaoWao force-pushed the refactor/remove-pto2-orch-to-sched branch from 708bf8b to 02ac7e1 Compare June 30, 2026 14:26

coderabbitai Bot reviewed Jun 30, 2026

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md

ChaoWao merged commit fb72ad2 into hw-native-sys:main Jul 1, 2026
16 checks passed

ChaoWao deleted the refactor/remove-pto2-orch-to-sched branch July 1, 2026 00:44

ChaoWao mentioned this pull request Jul 1, 2026

Refactor: gate Scheduler summary log behind PTO2_SCHED_PROFILING #1225

Merged

4 tasks

ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/remove-pto2-orch-to-sched
