Refactor: remove PTO2_ORCH_TO_SCHED feature and dead transition branches#1217

Merged
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
ChaoWao:refactor/remove-pto2-orch-to-sched
Jul 1, 2026

@ChaoWao ChaoWao commented Jun 30, 2026

Collaborator

Summary

PTO2_ORCH_TO_SCHED was a default-off env flag in the
tensormap_and_ringbuffer runtime. When set, orchestrator AICPU threads
transitioned into scheduler threads after the task graph was built, driving
a core-reassignment handshake. The flag was read by no test, CI job, or
build script, so the entire transition path was permanent dead weight under
the default — pure configuration debt per the repo's env-macro-gating rule.

This removes the flag and the core-transition machinery that was
reachable only when it was set. Applied symmetrically to both a2a3 and
a5 runtimes.

Removed

  • Flag plumbing: the getenv("PTO2_ORCH_TO_SCHED") read in
    runtime_maker.cpp, the Runtime::orch_to_sched member, its init reset,
    and the orch_to_sched_ executor member + dispatch gate
    (thread_idx < sched_thread_num_ || orch_to_sched_ →
    thread_idx < sched_thread_num_).
  • Core-transition machinery (dead once the flag is gone):
    handle_core_transition(), reassign_cores_for_all_threads(), the
    transition_requested_ / wait_reassign_ / reassigned_ atomics, the
    cores_released dispatch-loop branch, and the transition block in
    on_orchestration_done(). The two orch_to_sched_ ternaries collapse to
    their default arms.
  • Stale docs/comments: profiling_levels.md, RUNTIME_LOGIC.md,
    SUBMIT_BY_CLUSTER.md, dynamic-linking.md, and the swimlane-collector
    comment clauses.
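
The dispatch-gate change in the first bullet can be illustrated with a minimal sketch. The function names below are hypothetical stand-ins, not the executor's real code; only the identifiers thread_idx, sched_thread_num and orch_to_sched come from the description above. It shows why removing the flag does not change behavior under the default.

```cpp
// Hypothetical stand-ins for the executor's dispatch gate (not the real code).

// Before: an orchestrator thread could also enter dispatch when the flag was set.
constexpr bool may_dispatch_before(int thread_idx, int sched_thread_num, bool orch_to_sched) {
    return thread_idx < sched_thread_num || orch_to_sched;
}

// After: only scheduler threads dispatch.
constexpr bool may_dispatch_after(int thread_idx, int sched_thread_num) {
    return thread_idx < sched_thread_num;
}

// With the flag at its default (false), both gates agree for every thread.
static_assert(may_dispatch_before(0, 3, false) == may_dispatch_after(0, 3),
              "scheduler thread: same result");
static_assert(may_dispatch_before(3, 3, false) == may_dispatch_after(3, 3),
              "orchestrator thread: same result");
// Only a set flag ever admitted an orchestrator thread (index >= sched_thread_num).
static_assert(may_dispatch_before(3, 3, true) && !may_dispatch_after(3, 3),
              "flag-only path is the one being removed");
```

The same reasoning applies to the two ternaries: with the condition fixed at false, each reduces to its default arm.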

Out of scope

The separate serial_orch_sched / PTO2_SERIAL_ORCH_SCHED flag is left
untouched.

Net: 25 files, +62 / −426.

Testing

  • Both runtimes build clean on a2a3 and a5 (AICore/AICPU/Host)
  • Simulation tests pass — dummy_task + batch_paged_attention on
    a2a3sim, dummy_task on a5sim
  • Hardware tests (CI)

coderabbitai Bot commented Jun 30, 2026

Walkthrough

Removes the orch_to_sched_ flag and associated core-transition machinery (handle_core_transition, reassign_cores_for_all_threads, transition_requested_, wait_reassign_, reassigned_) from SchedulerContext and AicpuExecutor across both a2a3 and a5 targets. The PTO2_ORCH_TO_SCHED env-var parsing is removed, dispatcher eligibility is narrowed to scheduler threads only, and documentation/comments are updated throughout.

Changes

Remove orch_to_sched scheduling mode

| Layer / File(s) | Summary |
| --- | --- |
| SchedulerContext public contract: src/a2a3/runtime/.../scheduler/scheduler_context.h, src/a5/runtime/.../scheduler/scheduler_context.h | Removes bool orch_to_sched from init(...), removes member fields transition_requested_, wait_reassign_, reassigned_, orch_to_sched_, removes reassign_cores_for_all_threads() and handle_core_transition declarations, and adds handle_orchestrator_exit/check_idle_fatal_error (a5). |
| Runtime constructor and env-flag cleanup: src/a2a3/runtime/.../runtime/runtime.h, src/a2a3/runtime/.../shared/runtime.cpp, src/a2a3/runtime/.../host/runtime_maker.cpp, src/a5/runtime/.../runtime/runtime.h, src/a5/runtime/.../shared/runtime.cpp, src/a5/runtime/.../host/runtime_maker.cpp | Removes dev.orch_to_sched = false from Runtime constructor and PTO2_ORCH_TO_SCHED env-var parsing from apply_orch_sched_env_flags in both targets. |
| AicpuExecutor init / run / deinit: src/a2a3/runtime/.../aicpu/aicpu_executor.cpp, src/a5/runtime/.../aicpu/aicpu_executor.cpp | Replaces stored orch_to_sched_ with serial_orch_sched_; updates init to read from runtime->dev.serial_orch_sched and drop the flag from sched_ctx_.init(...); narrows dispatch gate in run to thread_idx < sched_thread_num_ only; updates deinit resets accordingly. |
| SchedulerContext cold-path implementation: src/a2a3/runtime/.../scheduler/scheduler_cold_path.cpp, src/a5/runtime/.../scheduler/scheduler_cold_path.cpp | Removes handle_core_transition and reassign_cores_for_all_threads bodies; updates init signature/body (drops orch_to_sched_ assignment, simplifies L2 swimlane sched_phase_threads, updates dump_args_init); cleans deinit transition-state resets; updates on_orchestration_done (removes core-transition control flow, marks thread_idx [[maybe_unused]]); adds check_idle_fatal_error (a5 only). |
| scheduler_dispatch.cpp core-transition removal: src/a2a3/runtime/.../scheduler/scheduler_dispatch.cpp, src/a5/runtime/.../scheduler/scheduler_dispatch.cpp | Removes cores_released local variable and the if (!cores_released && orch_to_sched_) block that called handle_core_transition in resolve_and_dispatch. |
| L2 swimlane API comments: src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h, src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp, src/a5/platform/include/aicpu/l2_swimlane_collector_aicpu.h, src/a5/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp | Removes orch_to_sched mode qualifiers from Doxygen comments for l2_swimlane_aicpu_init_phase and l2_swimlane_aicpu_set_orch_thread_idx; simplifies inline orch-phase pool cache comment. |
| Documentation updates: docs/dynamic-linking.md, src/a2a3/runtime/.../docs/*, src/a5/runtime/.../docs/* | Updates RUNTIME_LOGIC.md (init signature, on_orchestration_done description, cold-path responsibilities), SUBMIT_BY_CLUSTER.md (threading baseline, cluster ownership), profiling_levels.md (Level 1 log walkthrough, LOG_INFO_V9 table, PTO2_PROFILING note), and dynamic-linking.md (deinit responsibilities). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/simpler#939: Touches scheduler_cold_path.cpp L2 swimlane init around l2_swimlane_aicpu_init_phase thread-count selection using orch_to_sched_, directly affected by this PR's removal of that flag.
  • hw-native-sys/simpler#1083: Removes PTO2_ORCH_TO_SCHED env-variable references in runtime_maker.cpp, profiling_levels.md, and runtime.h comments—overlapping with the same files changed here.
  • hw-native-sys/simpler#1176: Adds serial_orch_sched_ start-gate logic using orchestrator_done_ in the same SchedulerContext/AicpuExecutor code paths modified by this PR.

Poem

🐇 Hop, hop, away we throw
The flag that made the orch thread flow!
orch_to_sched_ — goodbye, farewell,
Scheduler threads now hold the spell.
Cleaner queues and simpler state,
One less flag to contemplate! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.04% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: removing PTO2_ORCH_TO_SCHED and related transition branches. |
| Description check | ✅ Passed | The description matches the changeset, covering removal of the flag, transition machinery, and docs updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request removes the orchestrator-to-scheduler transition feature (orch_to_sched / PTO2_ORCH_TO_SCHED) from both the a2a3 and a5 runtimes. This cleanup eliminates the dynamic core reassignment logic (reassign_cores_for_all_threads, handle_core_transition), simplifies thread coordination, and updates associated documentation and profiling logs to reflect that orchestrator threads now exit immediately after building the task graph. No review comments were provided, so there is no additional feedback.

@ChaoWao ChaoWao force-pushed the refactor/remove-pto2-orch-to-sched branch 2 times, most recently from e7a8a95 to 708bf8b Compare June 30, 2026 13:15
The default-off PTO2_ORCH_TO_SCHED env flag made orchestrator AICPU
threads transition into scheduler threads after the task graph was
built, via a core-reassignment handshake. The flag was exercised by
no test, CI job, or build script, so the whole transition path was
permanent dead weight under the default.

Removed across both a2a3 and a5 tensormap_and_ringbuffer runtimes:

- Flag plumbing: the getenv("PTO2_ORCH_TO_SCHED") read in
  runtime_maker.cpp, the Runtime::orch_to_sched member, its init
  reset, and the orch_to_sched_ executor member + dispatch gate.
- Core-transition machinery, reachable only when the flag was set
  (now dead): handle_core_transition(), reassign_cores_for_all_threads(),
  the transition_requested_/wait_reassign_/reassigned_ atomics, the
  cores_released dispatch-loop branch, and the transition block in
  on_orchestration_done(). The two orch_to_sched_ ternaries collapse
  to their default arms.
- Stale doc/comment references (profiling_levels, RUNTIME_LOGIC,
  SUBMIT_BY_CLUSTER, dynamic-linking, swimlane collector comments).

on_orchestration_done's thread_idx is now used only inside
PTO2_PROFILING-guarded code (the removed transition block was its only
unconditional use), so it is marked [[maybe_unused]] to keep the
PTO2_PROFILING=0 build warning-clean under -Werror=unused-parameter.

The separate serial_orch_sched (PTO2_SERIAL_ORCH_SCHED) flag is left
untouched. Builds clean on a2a3 and a5 across all profiling-flag
combos; tensormap sim tests pass on a2a3sim and a5sim.
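
The [[maybe_unused]] note can be shown with a minimal sketch. The function below is a hypothetical reduction, not the real on_orchestration_done(); the fallback definition of PTO2_PROFILING and the return value are assumptions made only so the example stands alone.

```cpp
#include <cstdio>

#ifndef PTO2_PROFILING
#define PTO2_PROFILING 0  // assumed default for this sketch only
#endif

// Hypothetical reduction: thread_idx is read only in the profiling build.
// Without [[maybe_unused]], a PTO2_PROFILING=0 build would trip
// -Werror=unused-parameter because no remaining code path reads it.
int on_orchestration_done_sketch([[maybe_unused]] int thread_idx, int submitted_tasks) {
#if PTO2_PROFILING
    std::printf("thread %d: orchestration done\n", thread_idx);
#endif
    return submitted_tasks;  // stand-in for the real post-orchestration work
}
```

The attribute keeps the signature identical across both build modes, so callers do not need their own #if guards.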
@ChaoWao ChaoWao force-pushed the refactor/remove-pto2-orch-to-sched branch from 708bf8b to 02ac7e1 Compare June 30, 2026 14:26

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)

875-901: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reset the cached L2 swimlane level before the enable branch.

Both cold-path init paths still reset l2_swimlane_level_ only in the disabled branch. Make the reset unconditional before is_l2_swimlane_enabled() so no prior launch state can leak through another branch.

  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901: set l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; immediately before the enable check.
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908: apply the same unconditional reset.

Based on learnings, SchedulerContext::init() should reset l2_swimlane_level_ unconditionally before conditional logic to prevent cross-launch state leaks.
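
A minimal sketch of the suggested pattern, assuming a simplified context type: SchedulerContextSketch, its init() parameters, and the early-return path are hypothetical and exist only to show how a reset placed before the branch protects every exit. Only l2_swimlane_level_ and L2SwimlaneLevel::DISABLED are taken from the comment above.

```cpp
// Hypothetical two-value enum; the real L2SwimlaneLevel may have more levels.
enum class L2SwimlaneLevel { DISABLED, ENABLED };

struct SchedulerContextSketch {
    L2SwimlaneLevel l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;

    // swimlane_enabled stands in for is_l2_swimlane_enabled();
    // setup_ok models an assumed failure path inside the enabled branch.
    void init(bool swimlane_enabled, bool setup_ok) {
        // Reset first, so every path below starts from a known state.
        l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;
        if (swimlane_enabled) {
            if (!setup_ok) {
                return;  // early exit cannot keep the previous launch's level
            }
            l2_swimlane_level_ = L2SwimlaneLevel::ENABLED;
        }
    }
};
```

Calling init(true, true) and then init(true, false) on the same object leaves the level at DISABLED, which is the cross-launch guarantee the comment asks for.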

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`
around lines 875 - 901, Reset l2_swimlane_level_ unconditionally at the start of
SchedulerContext::init() before the is_l2_swimlane_enabled() check so stale
state cannot leak between launches. Apply this in
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901
and the matching
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908
path, keeping the enable/disable branch logic intact after the reset.

Source: Learnings

docs/dynamic-linking.md (1)

224-235: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Finish removing stale transition/executor symbols from docs.

The docs cleanup is incomplete in two spots and still references removed or mismatched symbols.

  • docs/dynamic-linking.md#L224-L235: align the executor-owned field list with the current executor (aicpu_thread_num_, serial_orch_sched_, preserved orch_so_table_, etc.) instead of stale names like thread_num_, orch_func_, orch_so_handle_, and orch_so_path_.
  • src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md#L555-L561: remove the remaining reassign_* mention now that the reassignment path is deleted.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dynamic-linking.md` around lines 224 - 235, Update the stale
documentation symbols in both affected docs: in docs/dynamic-linking.md replace
the executor-owned field list in AicpuExecutor::deinit() with the current names
from the executor implementation (for example aicpu_thread_num_,
serial_orch_sched_, and preserved orch_so_table_), and remove the outdated
thread_num_ / orch_func_ / orch_so_handle_ / orch_so_path_ references; in
src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md remove the
leftover reassign_* mention since that reassignment path no longer exists.
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (1)

205-216: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reject 1-thread runs now that the orchestrator never dispatches.

Line 207 can still derive sched_thread_num_ == 0, but Line 730 now hard-stops dispatch to thread_idx < sched_thread_num_. In that case the only thread takes the orchestrator path on Line 383 and no thread ever calls resolve_and_dispatch(...), so submitted work can be left undrained. Either require aicpu_thread_num_ >= 2 here, or restore an explicit supported single-thread dispatch path.

Proposed fix
-    if (aicpu_thread_num_ < 1 || aicpu_thread_num_ > MAX_AICPU_THREADS) {
+    if (aicpu_thread_num_ < 2 || aicpu_thread_num_ > MAX_AICPU_THREADS) {
         LOG_ERROR("Invalid aicpu_thread_num: %d", aicpu_thread_num_);
         init_failed_.store(true, std::memory_order_release);
         return -1;
     }

Also applies to: 729-730
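
The concern can be reproduced with a small sketch. It assumes one thread orchestrates and the remaining threads schedule (so sched_thread_num is aicpu_thread_num minus one); that split, the helper names, and the value of kMaxAicpuThreadsSketch are assumptions for illustration, not taken from aicpu_executor.cpp.

```cpp
// Hypothetical limit; the real MAX_AICPU_THREADS value is not shown in this PR.
constexpr int kMaxAicpuThreadsSketch = 8;

// Assumption: exactly one orchestrator thread, the rest are scheduler threads.
inline int sched_threads_for(int aicpu_thread_num) {
    return aicpu_thread_num - 1;
}

// Counts the threads that pass the gate thread_idx < sched_thread_num.
inline int dispatching_threads(int aicpu_thread_num) {
    const int sched_thread_num = sched_threads_for(aicpu_thread_num);
    int count = 0;
    for (int thread_idx = 0; thread_idx < aicpu_thread_num; ++thread_idx) {
        if (thread_idx < sched_thread_num) {
            ++count;
        }
    }
    return count;
}

// Validation in the spirit of the proposed fix: require at least two threads.
inline bool thread_count_is_valid(int aicpu_thread_num) {
    return aicpu_thread_num >= 2 && aicpu_thread_num <= kMaxAicpuThreadsSketch;
}
```

Under these assumptions a single-thread configuration yields zero dispatching threads, so rejecting it at init time avoids starting a run that can never drain submitted work.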

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` around
lines 205 - 216, The current initialization in aicpu_executor.cpp still allows
aicpu_thread_num_ to become 1, which leaves sched_thread_num_ at 0 and prevents
any thread from reaching resolve_and_dispatch while the orchestrator path
handles everything; fix this in the AicpuExecutor init path by either rejecting
single-thread configurations up front or adding a supported single-thread
dispatch flow. Update the validation around aicpu_thread_num_,
sched_thread_num_, and the dispatch gating logic that uses thread_idx <
sched_thread_num_ so the runtime cannot initialize into an undrained state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md`:
- Around line 95-99: The Level 1 profiling documentation is counting orch-side
logs that are actually gated by PTO2_ORCH_PROFILING, so the PTO2_PROFILING-only
totals are too high. In
src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
404-410, remove the orch-side log term from the Level 1 formula/count or move it
to Level 3 so it matches what AICPUExecutor emits under PTO2_PROFILING alone;
make the same correction in
src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
374-380. Use AICPUExecutor and the orch_start/orch_end/orch_cost plus PTO2 total
submitted tasks logs as the reference points for the counts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: faa87919-593c-42fa-8113-872a43a92dec

📥 Commits

Reviewing files that changed from the base of the PR and between f4d14bf and 02ac7e1.

📒 Files selected for processing (25)
  • docs/dynamic-linking.md
  • src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h
  • src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md
  • src/a2a3/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.md
  • src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
  • src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
  • src/a5/platform/include/aicpu/l2_swimlane_collector_aicpu.h
  • src/a5/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md
  • src/a5/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.md
  • src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
  • src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
💤 Files with no reviewable changes (8)
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h
  • src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
@ChaoWao ChaoWao merged commit fb72ad2 into hw-native-sys:main Jul 1, 2026
16 checks passed
@ChaoWao ChaoWao deleted the refactor/remove-pto2-orch-to-sched branch July 1, 2026 00:44