Category
Robustness (potential edge-case failure)
Component
Platform (a2a3 / a2a3sim)
Description
The AICPU profiling collectors (L2Swimlane, PMU, DepGen, TensorDump, ScopeStats) keep file-local static state that persists across launches, because the AICPU .so is loaded once and reused. This state includes:
- cached header pointers (
s_*_header),
- pool arrays (
s_*_pools[]),
- "current buffer" pointers (
s_current_*_buffers[] / current_buf_ptr),
- orch thread index (
s_orch_thread_idx),
- base / level globals (
g_*_base, g_*_level).
Hazard — enabled → disabled launch sequence: when a subsequent launch turns profiling off, it calls set_platform_*_base(0) and/or set_*_enabled(false) but does not run *_aicpu_init (init only runs when profiling is enabled). The statics therefore retain the previous launch's pointers, which may point at memory that has since been freed/reallocated. Any record/complete path that becomes reachable in that state can dereference a stale pointer (potential use-after-free).
This is a pre-existing / latent hazard, not introduced by any single change. The a2a3 l2_swimlane_aicpu_init comment already references the same class of bug and its prior fix (#936). There is no proven live crash path today — record sites are gated (s_phase_initialized, is_*_enabled) — so this is filed as defense-in-depth / robustness rather than an active bug.
Why now / symmetry: PR #1162 introduced a per-collector reset_*_cached_state() helper (nulls all cached statics) and wired it into the disabled/base=0 setters — but only for a5 (L2/PMU/DepGen) and common (tensor_dump; scope_stats partial). The a2a3 arch-specific L2/PMU/DepGen collectors, and scope_stats' set_*_enabled(false) path, were left uncovered. Per review, PR #1162 is dropping this hardening from its scope entirely (it is unrelated to that PR's buffer-drop-reduction goal) so it can land focused; this issue tracks doing the hardening properly and symmetrically across both arches and all five collectors as a standalone change.
Acceptance: each collector defines a reset_*_cached_state() that nulls header + pools + current-buffer pointers + orch idx + level globals, and it is invoked from both set_platform_*_base(0) and set_*_enabled(false), on both a2a3 and a5 (and once in common for tensor_dump / scope_stats). Add a small onboard/sim check that an enabled→disabled→(record attempted) sequence does not touch stale state.
Reference mainline commit where the hazard exists: c080089d (merge-base of #1162).
Location
src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp — set_platform_l2_swimlane_base / set_l2_swimlane_enabled (no reset)
src/a2a3/platform/shared/aicpu/pmu_collector_aicpu.cpp — set_platform_pmu_base / set_pmu_enabled (no reset)
src/a2a3/platform/shared/aicpu/dep_gen_collector_aicpu.cpp — set_platform_dep_gen_base / set_dep_gen_enabled (no reset)
src/common/platform/shared/aicpu/scope_stats_collector_aicpu.cpp — set_scope_stats_enabled(false) does not reset header/state
- (reference implementation to mirror)
src/a5/platform/shared/aicpu/{l2_swimlane_collector_aicpu,pmu_collector_aicpu,dep_gen_collector_aicpu}.cpp — reset_*_cached_state() + call sites
- (reference)
src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp — reset_dump_runtime_state()
Proposed Fix
Mirror the a5 pattern onto a2a3 (and complete scope_stats): add reset_l2_swimlane_cached_state() / reset_pmu_cached_state() / reset_dep_gen_cached_state() to the a2a3 collectors and call them from both the base == 0 and !enable branches of the respective setters; route set_scope_stats_enabled(false) through the same reset. Keep the reset helper as the single source of truth (also usable from *_finalize, as a5 dep_gen does). Do this as one focused PR so a2a3 and a5 stay in lockstep (the framework was unified in #944 and is meant to be reviewed against both arches).
Priority
Medium (minor risk, should fix in next few releases)
Related: #1161, #1162
Category
Robustness (potential edge-case failure)
Component
Platform (a2a3 / a2a3sim)
Description
The AICPU profiling collectors (L2Swimlane, PMU, DepGen, TensorDump, ScopeStats) keep file-local static state that persists across launches, because the AICPU
.sois loaded once and reused. This state includes:s_*_header),s_*_pools[]),s_current_*_buffers[]/current_buf_ptr),s_orch_thread_idx),g_*_base,g_*_level).Hazard — enabled → disabled launch sequence: when a subsequent launch turns profiling off, it calls
set_platform_*_base(0)and/orset_*_enabled(false)but does not run*_aicpu_init(init only runs when profiling is enabled). The statics therefore retain the previous launch's pointers, which may point at memory that has since been freed/reallocated. Any record/complete path that becomes reachable in that state can dereference a stale pointer (potential use-after-free).This is a pre-existing / latent hazard, not introduced by any single change. The a2a3
l2_swimlane_aicpu_initcomment already references the same class of bug and its prior fix (#936). There is no proven live crash path today — record sites are gated (s_phase_initialized,is_*_enabled) — so this is filed as defense-in-depth / robustness rather than an active bug.Why now / symmetry: PR #1162 introduced a per-collector
reset_*_cached_state()helper (nulls all cached statics) and wired it into the disabled/base=0 setters — but only for a5 (L2/PMU/DepGen) and common (tensor_dump;scope_statspartial). The a2a3 arch-specific L2/PMU/DepGen collectors, andscope_stats'set_*_enabled(false)path, were left uncovered. Per review, PR #1162 is dropping this hardening from its scope entirely (it is unrelated to that PR's buffer-drop-reduction goal) so it can land focused; this issue tracks doing the hardening properly and symmetrically across both arches and all five collectors as a standalone change.Acceptance: each collector defines a
reset_*_cached_state()that nulls header + pools + current-buffer pointers + orch idx + level globals, and it is invoked from bothset_platform_*_base(0)andset_*_enabled(false), on both a2a3 and a5 (and once incommonfor tensor_dump / scope_stats). Add a small onboard/sim check that an enabled→disabled→(record attempted) sequence does not touch stale state.Reference mainline commit where the hazard exists:
c080089d(merge-base of #1162).Location
src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp—set_platform_l2_swimlane_base/set_l2_swimlane_enabled(no reset)src/a2a3/platform/shared/aicpu/pmu_collector_aicpu.cpp—set_platform_pmu_base/set_pmu_enabled(no reset)src/a2a3/platform/shared/aicpu/dep_gen_collector_aicpu.cpp—set_platform_dep_gen_base/set_dep_gen_enabled(no reset)src/common/platform/shared/aicpu/scope_stats_collector_aicpu.cpp—set_scope_stats_enabled(false)does not reset header/statesrc/a5/platform/shared/aicpu/{l2_swimlane_collector_aicpu,pmu_collector_aicpu,dep_gen_collector_aicpu}.cpp—reset_*_cached_state()+ call sitessrc/common/platform/shared/aicpu/tensor_dump_aicpu.cpp—reset_dump_runtime_state()Proposed Fix
Mirror the a5 pattern onto a2a3 (and complete scope_stats): add
reset_l2_swimlane_cached_state()/reset_pmu_cached_state()/reset_dep_gen_cached_state()to the a2a3 collectors and call them from both thebase == 0and!enablebranches of the respective setters; routeset_scope_stats_enabled(false)through the same reset. Keep the reset helper as the single source of truth (also usable from*_finalize, as a5 dep_gen does). Do this as one focused PR so a2a3 and a5 stay in lockstep (the framework was unified in #944 and is meant to be reviewed against both arches).Priority
Medium (minor risk, should fix in next few releases)
Related: #1161, #1162