Refactor: remove PTO2_ORCH_TO_SCHED feature and dead transition branches#1217
Conversation
📝 WalkthroughWalkthroughRemoves the ChangesRemove orch_to_sched scheduling mode
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request removes the orchestrator-to-scheduler transition feature (orch_to_sched / PTO2_ORCH_TO_SCHED) from both the a2a3 and a5 runtimes. This cleanup eliminates the dynamic core reassignment logic (reassign_cores_for_all_threads, handle_core_transition), simplifies thread coordination, and updates associated documentation and profiling logs to reflect that orchestrator threads now exit immediately after building the task graph. No review comments were provided, so there is no additional feedback.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
e7a8a95 to
708bf8b
Compare
The default-off PTO2_ORCH_TO_SCHED env flag made orchestrator AICPU
threads transition into scheduler threads after the task graph was
built, via a core-reassignment handshake. The flag was exercised by
no test, CI job, or build script, so the whole transition path was
permanent dead weight under the default.
Removed across both a2a3 and a5 tensormap_and_ringbuffer runtimes:
- Flag plumbing: the getenv("PTO2_ORCH_TO_SCHED") read in
runtime_maker.cpp, the Runtime::orch_to_sched member, its init
reset, and the orch_to_sched_ executor member + dispatch gate.
- Core-transition machinery, reachable only when the flag was set
(now dead): handle_core_transition(), reassign_cores_for_all_threads(),
the transition_requested_/wait_reassign_/reassigned_ atomics, the
cores_released dispatch-loop branch, and the transition block in
on_orchestration_done(). The two orch_to_sched_ ternaries collapse
to their default arms.
- Stale doc/comment references (profiling_levels, RUNTIME_LOGIC,
SUBMIT_BY_CLUSTER, dynamic-linking, swimlane collector comments).
on_orchestration_done's thread_idx is now used only inside
unconditional use), so it is marked [[maybe_unused]] to keep the
PTO2_PROFILING=0 build warning-clean under -Werror=unused-parameter.
The separate serial_orch_sched (PTO2_SERIAL_ORCH_SCHED) flag is left
untouched. Builds clean on a2a3 and a5 across all profiling-flag
combos; tensormap sim tests pass on a2a3sim and a5sim.
708bf8b to
02ac7e1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)
875-901: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReset the cached L2 swimlane level before the enable branch.
Both cold-path init paths still reset
l2_swimlane_level_only in the disabled branch. Make the reset unconditional beforeis_l2_swimlane_enabled()so no prior launch state can leak through another branch.
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901: setl2_swimlane_level_ = L2SwimlaneLevel::DISABLED;immediately before the enable check.src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908: apply the same unconditional reset.Based on learnings,
SchedulerContext::init()should resetl2_swimlane_level_unconditionally before conditional logic to prevent cross-launch state leaks.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp` around lines 875 - 901, Reset l2_swimlane_level_ unconditionally at the start of SchedulerContext::init() before the is_l2_swimlane_enabled() check so stale state cannot leak between launches. Apply this in src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901 and the matching src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908 path, keeping the enable/disable branch logic intact after the reset.Source: Learnings
docs/dynamic-linking.md (1)
224-235: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winFinish removing stale transition/executor symbols from docs.
The docs cleanup is incomplete in two spots and still references removed or mismatched symbols.
docs/dynamic-linking.md#L224-L235: align the executor-owned field list with the current executor (aicpu_thread_num_,serial_orch_sched_, preservedorch_so_table_, etc.) instead of stale names likethread_num_,orch_func_,orch_so_handle_, andorch_so_path_.src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md#L555-L561: remove the remainingreassign_*mention now that the reassignment path is deleted.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/dynamic-linking.md` around lines 224 - 235, Update the stale documentation symbols in both affected docs: in docs/dynamic-linking.md replace the executor-owned field list in AicpuExecutor::deinit() with the current names from the executor implementation (for example aicpu_thread_num_, serial_orch_sched_, and preserved orch_so_table_), and remove the outdated thread_num_ / orch_func_ / orch_so_handle_ / orch_so_path_ references; in src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md remove the leftover reassign_* mention since that reassignment path no longer exists.src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (1)
205-216: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winReject 1-thread runs now that the orchestrator never dispatches.
Line 207 can still derive
sched_thread_num_ == 0, but Line 730 now hard-stops dispatch tothread_idx < sched_thread_num_. In that case the only thread takes the orchestrator path on Line 383 and no thread ever callsresolve_and_dispatch(...), so submitted work can be left undrained. Either requireaicpu_thread_num_ >= 2here, or restore an explicit supported single-thread dispatch path.Proposed fix
- if (aicpu_thread_num_ < 1 || aicpu_thread_num_ > MAX_AICPU_THREADS) { + if (aicpu_thread_num_ < 2 || aicpu_thread_num_ > MAX_AICPU_THREADS) { LOG_ERROR("Invalid aicpu_thread_num: %d", aicpu_thread_num_); init_failed_.store(true, std::memory_order_release); return -1; }Also applies to: 729-730
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` around lines 205 - 216, The current initialization in aicpu_executor.cpp still allows aicpu_thread_num_ to become 1, which leaves sched_thread_num_ at 0 and prevents any thread from reaching resolve_and_dispatch while the orchestrator path handles everything; fix this in the AicpuExecutor init path by either rejecting single-thread configurations up front or adding a supported single-thread dispatch flow. Update the validation around aicpu_thread_num_, sched_thread_num_, and the dispatch gating logic that uses thread_idx < sched_thread_num_ so the runtime cannot initialize into an undrained state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md`:
- Around line 95-99: The Level 1 profiling documentation is counting orch-side
logs that are actually gated by PTO2_ORCH_PROFILING, so the PTO2_PROFILING-only
totals are too high. In
src/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
404-410, remove the orch-side log term from the Level 1 formula/count or move it
to Level 3 so it matches what AICPUExecutor emits under PTO2_PROFILING alone;
make the same correction in
src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md at 95-99 and
374-380. Use AICPUExecutor and the orch_start/orch_end/orch_cost plus PTO2 total
submitted tasks logs as the reference points for the counts.
---
Outside diff comments:
In `@docs/dynamic-linking.md`:
- Around line 224-235: Update the stale documentation symbols in both affected
docs: in docs/dynamic-linking.md replace the executor-owned field list in
AicpuExecutor::deinit() with the current names from the executor implementation
(for example aicpu_thread_num_, serial_orch_sched_, and preserved
orch_so_table_), and remove the outdated thread_num_ / orch_func_ /
orch_so_handle_ / orch_so_path_ references; in
src/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.md remove the
leftover reassign_* mention since that reassignment path no longer exists.
In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp`:
- Around line 205-216: The current initialization in aicpu_executor.cpp still
allows aicpu_thread_num_ to become 1, which leaves sched_thread_num_ at 0 and
prevents any thread from reaching resolve_and_dispatch while the orchestrator
path handles everything; fix this in the AicpuExecutor init path by either
rejecting single-thread configurations up front or adding a supported
single-thread dispatch flow. Update the validation around aicpu_thread_num_,
sched_thread_num_, and the dispatch gating logic that uses thread_idx <
sched_thread_num_ so the runtime cannot initialize into an undrained state.
In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`:
- Around line 875-901: Reset l2_swimlane_level_ unconditionally at the start of
SchedulerContext::init() before the is_l2_swimlane_enabled() check so stale
state cannot leak between launches. Apply this in
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L875-L901
and the matching
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp#L878-L908
path, keeping the enable/disable branch logic intact after the reset.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: faa87919-593c-42fa-8113-872a43a92dec
📒 Files selected for processing (25)
docs/dynamic-linking.mdsrc/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.hsrc/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.mdsrc/a2a3/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.mdsrc/a2a3/runtime/tensormap_and_ringbuffer/docs/profiling_levels.mdsrc/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.hsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.hsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cppsrc/a5/platform/include/aicpu/l2_swimlane_collector_aicpu.hsrc/a5/platform/shared/aicpu/l2_swimlane_collector_aicpu.cppsrc/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cppsrc/a5/runtime/tensormap_and_ringbuffer/docs/RUNTIME_LOGIC.mdsrc/a5/runtime/tensormap_and_ringbuffer/docs/SUBMIT_BY_CLUSTER.mdsrc/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.mdsrc/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.hsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.hsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
💤 Files with no reviewable changes (8)
- src/a5/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
- src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
- src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h
- src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
- src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
- src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
- src/a2a3/runtime/tensormap_and_ringbuffer/runtime/shared/runtime.cpp
- src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
Summary
PTO2_ORCH_TO_SCHEDwas a default-off env flag in thetensormap_and_ringbufferruntime. When set, orchestrator AICPU threadstransitioned into scheduler threads after the task graph was built, driving
a core-reassignment handshake. The flag was read by no test, CI job, or
build script, so the entire transition path was permanent dead weight under
the default — pure configuration debt per the repo's env-macro-gating rule.
This removes the flag and the core-transition machinery that was
reachable only when it was set. Applied symmetrically to both
a2a3anda5runtimes.Removed
getenv("PTO2_ORCH_TO_SCHED")read inruntime_maker.cpp, theRuntime::orch_to_schedmember, its init reset,and the
orch_to_sched_executor member + dispatch gate(
thread_idx < sched_thread_num_ || orch_to_sched_→thread_idx < sched_thread_num_).handle_core_transition(),reassign_cores_for_all_threads(), thetransition_requested_/wait_reassign_/reassigned_atomics, thecores_releaseddispatch-loop branch, and the transition block inon_orchestration_done(). The twoorch_to_sched_ternaries collapse totheir default arms.
profiling_levels.md,RUNTIME_LOGIC.md,SUBMIT_BY_CLUSTER.md,dynamic-linking.md, and the swimlane-collectorcomment clauses.
Out of scope
The separate
serial_orch_sched/PTO2_SERIAL_ORCH_SCHEDflag is leftuntouched.
Net: 25 files, +62 / −426.
Testing
a2a3anda5(AICore/AICPU/Host)dummy_task+batch_paged_attentionona2a3sim,dummy_taskona5sim