Refactor: gate Scheduler summary log behind PTO2_SCHED_PROFILING #1225
The per-thread "Scheduler summary" device log was emitted under plain PTO2_PROFILING=1 (Level 1), while every other per-thread profiling line (orch_* / sched_* timing) is gated by its Level 2/3 sub-macro. That made Level 1 the only level with a per-thread device log and left the profiling_levels.md counts inconsistent.

Move the `Scheduler summary` line (and the `sched_total` computation that only feeds it) into the `#if PTO2_SCHED_PROFILING` block in log_l2_swimlane_summary() for both a2a3 and a5. Level 1 now produces no per-thread device logs; it only feeds the host-side Orch/Sched [STRACE] windows via aicpu_phase_set_window.

`cur_thread_completed` is now used only inside the SCHED block, so it is marked [[maybe_unused]] to keep the Level 1 (PTO2_SCHED_PROFILING=0) build warning-clean under -Werror=unused-parameter.

Rewrite profiling_levels.md (both arches) accordingly: Level 1 documents zero device logs; Level 2 gains the Scheduler summary (no longer a "replaced" line); the stale per-level LOG_INFO_V9 count formulas (`N_sched*2 + N_orch*1 + 1`, "11 debug + 2 basic + ...", table count 7) are replaced with per-thread descriptions matching the actual macro gating.

Builds clean on a2a3 and a5 across all profiling-flag combos (pto2-off / sched / orch / orch-sched / all-on); tensormap sim tests pass on a2a3sim and a5sim.
Summary
Follow-up to #1217. The per-thread `Scheduler summary` device log was emitted under plain `PTO2_PROFILING=1` (Level 1), while every other per-thread profiling line (`orch_*` / `sched_*` timing) is gated by its Level 2/3 sub-macro. That made Level 1 the only level producing a per-thread device log, and left the `profiling_levels.md` per-level counts inconsistent (CodeRabbit flagged this on #1217).

This moves `Scheduler summary` (and the `sched_total` computation that only feeds it) into the `#if PTO2_SCHED_PROFILING` block in `log_l2_swimlane_summary()` for both a2a3 and a5. Level 1 now emits no per-thread device logs; it only feeds the host-side Orch/Sched `[STRACE]` windows via `aicpu_phase_set_window`.

Changes
- `scheduler_cold_path.cpp` (a2a3 + a5): gate `Scheduler summary` and `sched_total` behind `PTO2_SCHED_PROFILING`; mark `cur_thread_completed` `[[maybe_unused]]` (now used only inside the SCHED block) to keep the Level-1 build clean under `-Werror=unused-parameter`.
- `profiling_levels.md` (a2a3 + a5): Level 1 documents zero device logs; Level 2 gains the `Scheduler summary`; the stale per-level `LOG_INFO_V9` count formulas (`N_sched*2 + N_orch*1 + 1`, `11 debug + 2 basic + ...`, table count `7`) are replaced with per-thread descriptions matching the actual macro gating.
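The gating pattern described in the changes above can be sketched roughly as follows. This is a minimal illustration and not the code from `scheduler_cold_path.cpp`: the function name `sched_summary_sketch`, its return type, and the log text format are all hypothetical, and only the `PTO2_SCHED_PROFILING` macro and the `cur_thread_completed` parameter name come from this PR. The function returns the line it would log so the behavior is easy to check.

```cpp
#include <cstdint>
#include <string>

// Default to "off" so the sketch compiles without build flags.
// In the real build the macro is assumed to come from the build system.
#ifndef PTO2_SCHED_PROFILING
#define PTO2_SCHED_PROFILING 0
#endif

// Hypothetical stand-in for the summary logger.
// The parameter is referenced only inside the gated block, so
// [[maybe_unused]] keeps the PTO2_SCHED_PROFILING=0 build free of
// unused-parameter warnings (which -Werror would turn into errors).
std::string sched_summary_sketch([[maybe_unused]] std::uint64_t cur_thread_completed) {
    std::string line;  // stays empty when scheduler profiling is off
#if PTO2_SCHED_PROFILING
    // Anything that only feeds the summary line lives inside the gate too,
    // so the non-profiling build does no extra work.
    line = "Scheduler summary: completed=" + std::to_string(cur_thread_completed);
#endif
    return line;
}
```

Compiled as-is (macro defaulting to 0), the function returns an empty string; compiled with `-DPTO2_SCHED_PROFILING=1`, it returns the summary line. Keeping the attribute on the parameter, rather than wrapping the parameter itself in `#if`, leaves the function signature identical across all profiling levels.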
Testing
- […] (`-Werror=unused-variable/parameter`)
- `profiling-flags-smoke` matrix reproduced locally: `pto2-off` / `orch` / `orch-tensormap` / `sched` / `orch-sched` / `all-on` all build on a2a3sim + a5sim
- `dummy_task` sim test passes on a2a3sim and a5sim