Skip to content

Integrate TraCR as the runtime/kernel profiler for Simpler#1173

Open
noabauma wants to merge 29 commits into
hw-native-sys:mainfrom
huawei-csl:tracr
Open

Integrate TraCR as the runtime/kernel profiler for Simpler#1173
noabauma wants to merge 29 commits into
hw-native-sys:mainfrom
huawei-csl:tracr

Conversation

@noabauma

@noabauma noabauma commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Upstreams TraCR into Simpler as a
first-class profiler, replacing the separate development branch. No more
out-of-tree branch to rebase — TraCR lives in tools/tracr/ and is wired into
the tensormap_and_ringbuffer runtime build behind BUILD_TRACR=ON.

What is TraCR

A lightweight, nanoscale C++ instrumentation library. Markers store fixed
16-byte records into a lock-free per-thread ring buffer; on device the
timestamps come from the hardware counter, and the buffers are downloaded
after a run to ~/ascend/tracr_<N>/ and post-processed (tracr_process) into
Perfetto / Paraver timelines.

It captures the full on-device picture: the AICPU scheduler threads
(orchestration / scheduling / phase / drain markers) and per-core task
execution on the AICube / AIVector units.

Why TraCR

  • Extremely low overhead. Compile-time gated (zero-cost stubs when off),
    no locks, fixed-size records, HW-counter timestamps — designed to perturb the
    hot path as little as possible. Crucially, it does not write to the AICPU
    device_log on the hot path, so it avoids the op-serialization /
    op-execute-timeout pitfalls of log-based profiling — which is why it should be
    meaningfully lower-overhead than the current device-log (PTO2_PROFILING)
    approach.
  • Right level of detail. Profiles at both the Simpler runtime (scheduler
    orchestration) and kernel level — the visibility needed to profile PyPTO /
    Simpler kernels.
  • Easy to use. One build flag, run as usual, one post-process command → a
    Perfetto trace you open in ui.perfetto.dev.
  • Demand. Many users are asking to profile their kernels; this gives them a
    supported, in-tree tool instead of a side branch.

Where it's already been used

Future work

  • A5 platform support.
  • Multi-device profiling.

Repo: https://github.com/huawei-csl/TracR

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10a9a727-1f18-455a-b23c-3276efc559b8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates the TraCR profiling and tracing tool into the a2a3 platform, enabling device-side trace collection and host-side downloading. The changes include adding the TraCR submodule, CMake build configurations, API helpers, and instrumenting the orchestration and execution paths with tracing markers. The review feedback highlights several critical areas for improvement: correcting format specifiers for uint64_t variables to avoid undefined behavior, replacing manual memory management with std::vector to prevent leaks, adding defensive null-pointer and bounds checks, quoting CMake string variables, and adding spin hints to tight loops to reduce CPU consumption.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

PROF_INC(prof_make_count, 4);
CYCLE_COUNT_LAP(prof_make_tensor);

LOG_INFO_V0("Thread %d: Orch PTO2_SCOPE loop: #batch=%u, q_loop=%u", g_TraCR_thread_idx, batch, q_loop);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

batch and q_loop are of type uint64_t. Using %u as the format specifier can lead to undefined behavior, truncated output, or compilation errors under -Werror. Use " PRIu64 instead.

    LOG_INFO_V0("Thread %d: Orch PTO2_SCOPE loop: #batch=%" PRIu64 ", q_loop=%" PRIu64, g_TraCR_thread_idx, batch, q_loop);

Comment thread tools/tracr_simpler_api.hpp Outdated
Comment on lines +188 to +217
size_t size = sizeof(TraCR::Payload) * TraCR::CAPACITY * runtime.aicpu_thread_num;
TraCR::Payload* tracrData = reinterpret_cast<TraCR::Payload *>(std::malloc(size));
int rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrData),
reinterpret_cast<void*>(runtime.tracrData_), size);
if (rc != 0) {
LOG_ERROR("device_runner->copy_from_device 'tracrData' failed rc=%d", rc);
return rc;
}

// Download the tracrDataSizes_ from Device to Host
size = sizeof(size_t) * runtime.aicpu_thread_num;
size_t* tracrDataSizes = reinterpret_cast<size_t *>(std::malloc(size));
rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrDataSizes),
reinterpret_cast<void*>(runtime.tracrDataSizes_), size);
if (rc != 0) {
LOG_ERROR("device_runner->copy_from_device 'tracrDataSizes' failed rc=%d", rc);
return rc;
}

// Now, store the traces into '~/ascend/tracr/'
tracr_dir = "~/ascend/tracr_" + std::to_string(sampleID++) + "/proc." + std::to_string(1000 + device_runner->device_id());
rc = TracrData2BTS(tracrData, tracrDataSizes, runtime.aicpu_thread_num);
if (rc != 0) {
LOG_ERROR("TracrData2BTS() failed");
return rc;
}

// Free tmp Host TraCR data placeholders
std::free(reinterpret_cast<void *>(tracrData));
std::free(reinterpret_cast<void *>(tracrDataSizes));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using manual std::malloc and std::free is prone to memory leaks if any of the intermediate operations (like copy_from_device or TracrData2BTS) fail and return early. Using std::vector automatically manages the memory lifetime and prevents leaks on error paths.

    size_t size = sizeof(TraCR::Payload) * TraCR::CAPACITY * runtime.aicpu_thread_num;
    std::vector<TraCR::Payload> tracrData(TraCR::CAPACITY * runtime.aicpu_thread_num);
    int rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrData.data()), 
                                            reinterpret_cast<void*>(runtime.tracrData_), size);
    if (rc != 0) {
        LOG_ERROR("device_runner->copy_from_device 'tracrData' failed rc=%d", rc);
        return rc;
    }

    // Download the tracrDataSizes_ from Device to Host
    size = sizeof(size_t) * runtime.aicpu_thread_num;
    std::vector<size_t> tracrDataSizes(runtime.aicpu_thread_num);
    rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrDataSizes.data()), 
                                            reinterpret_cast<void*>(runtime.tracrDataSizes_), size);
    if (rc != 0) {
        LOG_ERROR("device_runner->copy_from_device 'tracrDataSizes' failed rc=%d", rc);
        return rc;
    }

    // Now, store the traces into '~/ascend/tracr/'
    tracr_dir = "~/ascend/tracr_" + std::to_string(sampleID++) + "/proc." + std::to_string(1000 + device_runner->device_id());
    rc = TracrData2BTS(tracrData.data(), tracrDataSizes.data(), runtime.aicpu_thread_num);
    if (rc != 0) {
        LOG_ERROR("TracrData2BTS() failed");
        return rc;
    }

    // Free device TraCR memory data placeholder
    device_runner->free_tensor(runtime.tracrData_);
    device_runner->free_tensor(runtime.tracrDataSizes_);

Comment thread tools/tracr_simpler_api.hpp Outdated
Comment on lines +247 to +251
runtime.tracrDataSizes_ = device_runner->allocate_tensor(runtime.aicpu_thread_num * sizeof(size_t));
if (runtime.tracrDataSizes_ == nullptr) {
LOG_ERROR("runtime.tracrDataSizes_: alloc %zu bytes failed", size);
return -1;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If the allocation of runtime.tracrDataSizes_ fails, the function returns -1 without freeing runtime.tracrData_, leading to a memory leak. Additionally, the log message prints size (which is the size of tracrData_) instead of the actual size of tracrDataSizes_.

    runtime.tracrDataSizes_ = device_runner->allocate_tensor(runtime.aicpu_thread_num * sizeof(size_t));
    if (runtime.tracrDataSizes_ == nullptr) {
        size_t sizes_bytes = runtime.aicpu_thread_num * sizeof(size_t);
        LOG_ERROR("runtime.tracrDataSizes_: alloc %zu bytes failed", sizes_bytes);
        device_runner->free_tensor(runtime.tracrData_);
        runtime.tracrData_ = nullptr;
        return -1;
    }

Comment on lines +820 to +832
if (tracrThread->_traceIdx > 0) {
TraCR::Payload* tracrData = reinterpret_cast<TraCR::Payload*>(runtime->tracrData_);
const size_t payload_size = tracrThread->_traceIdx * sizeof(TraCR::Payload);

std::memcpy(
&tracrData[g_TraCR_thread_idx * TraCR::CAPACITY],
tracrThread->_traces.data(),
payload_size
);
}

size_t* tracrDataSizes = reinterpret_cast<size_t*>(runtime->tracrDataSizes_);
tracrDataSizes[g_TraCR_thread_idx] = tracrThread->_traceIdx;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Defensive programming requires checking that runtime->tracrData_ and runtime->tracrDataSizes_ are not nullptr before dereferencing them. Additionally, g_TraCR_thread_idx must be validated to be within [0, runtime->aicpu_thread_num) to prevent out-of-bounds memory corruption.

    if (g_TraCR_thread_idx >= 0 && g_TraCR_thread_idx < runtime->aicpu_thread_num) {
        if (runtime->tracrData_ != nullptr && tracrThread->_traceIdx > 0) {
            TraCR::Payload* tracrData = reinterpret_cast<TraCR::Payload*>(runtime->tracrData_);
            const size_t payload_size = tracrThread->_traceIdx * sizeof(TraCR::Payload);

            std::memcpy(
                &tracrData[g_TraCR_thread_idx * TraCR::CAPACITY],
                tracrThread->_traces.data(),
                payload_size
            );
        }

        if (runtime->tracrDataSizes_ != nullptr) {
            size_t* tracrDataSizes = reinterpret_cast<size_t*>(runtime->tracrDataSizes_);
            tracrDataSizes[g_TraCR_thread_idx] = tracrThread->_traceIdx;
        }
    } else {
        LOG_ERROR("[TraCR] thread index %d out of bounds (max=%d)", g_TraCR_thread_idx, runtime->aicpu_thread_num);
    }
References
  1. When using a signed integer for a count or size parameter in functions that perform bounds checks and memory operations, always validate that the count is non-negative to prevent bypassing bounds checks.
  2. Ensure that index-based accessors using signed integers perform bounds checks to prevent undefined behavior, out-of-bounds memory access, or signed-to-unsigned conversion overflows.

Comment thread tools/tracr.cmake Outdated
# TraCR threads capacity (default is 1<<20 ~= 1 million traces per thread = ~17MB per thread buffer size)
set(TRACR_CAPACITY "" CACHE STRING "Optional TraCR buffer capacity (empty = use internal default)")

if(NOT TRACR_CAPACITY STREQUAL "")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the general rules, when checking if a string variable is non-empty, you should quote the variable expansion to prevent unexpected behavior if the string value matches a boolean keyword or is numeric.

        if(NOT "${TRACR_CAPACITY}" STREQUAL "")
References
  1. In CMake, when checking if a string variable is non-empty, explicitly check against an empty string using NOT ... STREQUAL "" and quote the variable expansion.

Comment thread tools/tracr.cmake Outdated
TRACR_POLICY_IGNORE_IF_FULL
)

if(NOT TRACR_POLICY STREQUAL "")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the general rules, when checking if a string variable is non-empty, you should quote the variable expansion to prevent unexpected behavior if the string value matches a boolean keyword or is numeric.

        if(NOT "${TRACR_POLICY}" STREQUAL "")
References
  1. In CMake, when checking if a string variable is non-empty, explicitly check against an empty string using NOT ... STREQUAL "" and quote the variable expansion.

Comment thread tools/tracr_simpler_api.hpp Outdated
Comment on lines +123 to +128
for(int i = 0; i < int(RUNTIME_MAX_WORKER/3); ++i) {
channel_names.push_back("AICube_" + std::to_string(i));
}
for(int i = 0; i < int(2*RUNTIME_MAX_WORKER/3); ++i) {
channel_names.push_back("AIVector_" + std::to_string(i));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using the hardcoded RUNTIME_MAX_WORKER macro creates an implicit dependency on external headers and populates channel_names with inactive channels. Use the active runtime.worker_count instead to ensure the metadata accurately reflects the running configuration.

    for(int i = 0; i < int(runtime.worker_count/3); ++i) {
        channel_names.push_back("AICube_" + std::to_string(i));
    }
    for(int i = 0; i < int(2*runtime.worker_count/3); ++i) {
        channel_names.push_back("AIVector_" + std::to_string(i));
    }

#ifdef INDEP_ORCH
INSTRUMENTATION_MARK_SET(g_TraCR_thread_idx, Barrier, orchestrator_done_);
LOG_INFO_V9("[TraCR] Thread %d: Waiting before the Orch to finish: %d, orchestrator_done_=%d", g_TraCR_thread_idx, g_TraCR_thread_idx_counter.load(), orchestrator_done_);
while (!orchestrator_done_){};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the general rules, performing a tight spin loop without a spin hint can waste CPU cycles and cause high power consumption. Use SPIN_WAIT_HINT() inside the loop to yield execution and prevent compiler hoisting under -O3.

    while (!orchestrator_done_) {
        SPIN_WAIT_HINT();
    }

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (1)

882-893: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

init_failed_ exits leak the TraCR lifecycle.

After TRACR_START(), every return path needs to pair the reset/finalize sequence. The early init_failed_ return skips both, leaving the per-run counter and thread-local tracer state live for the next execution.

Also applies to: 909-911

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` around
lines 882 - 893, The early return in aicpu_executor.cpp after
g_aicpu_executor.init() bypasses the TraCR cleanup, so make every init_failed_
exit in this init-wait loop pair TRACR_START() with the matching reset/finalize
sequence before returning. Update the error path around
g_aicpu_executor.init_done_ / init_failed_ so it performs the same TraCR
lifecycle teardown as the normal exit path, including the additional return site
referenced in the follow-up comment.
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp (1)

390-402: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Don't reset the core lane until the core is actually idle.

running_done can immediately flow into promote_pending_to_running(core) below, so clearing the marker here drops the remaining in-flight task from the trace. Any completion path that leaves one slot loaded should transition back to Running_Task_Single, not blank the lane.

Also applies to: 406-418

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp`
around lines 390 - 402, The completion path in scheduler_completion.cpp is
clearing the core lane too early in the `running_done` branch, which drops
in-flight work from the trace. Update the logic around `complete_slot_task(...)`
and the `promote_pending_to_running(core)` flow so the lane is only reset when
the core is truly idle; if one slot remains loaded, transition it back to
`Running_Task_Single` instead of blanking the lane. Keep the
`INSTRUMENTATION_MARK_RESET(...)` call aligned with the actual idle state in
this completion path.
🧹 Nitpick comments (1)
.gitmodules (1)

1-4: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Pin submodule to a specific branch or tag for reproducible builds.

The TraCR submodule lacks branch or tag pinning. Without this, builds are non-deterministic and may break if upstream pushes incompatible changes.

 [submodule "tools/tracr"]
 	path = tools/tracr
 	url = https://github.com/huawei-csl/TracR.git
+	branch = main
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitmodules around lines 1 - 4, The tools/tracr submodule entry is not
pinned, so update the submodule configuration to reference a specific branch or
tag for deterministic builds. Modify the existing submodule definition for
tools/tracr in .gitmodules to include explicit pinning metadata, and keep the
URL/path mapping intact so future checkouts resolve to the intended TraCR
version.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/a2a3/platform/onboard/host/device_runner.cpp`:
- Around line 270-278: The TraCR allocation/error path in
device_runner::StoreTracrData is not unwind-safe, so failed runs can skip
cleanup and leak buffers. Make the DevAllocTraCR()/runtime pointer setup
transactional or add a single all-exit cleanup path that always runs before
returning, ensuring the StoreTracrData free path and
teardown_shared_collectors_after_run() are executed even on failures. Keep the
final export-error return only after teardown has completed, and preserve the
existing error logging around DevAllocTraCR and later failure points.

In `@src/a2a3/platform/sim/aicpu/CMakeLists.txt`:
- Around line 77-79: The TraCR post-processing include is only registered from
the sim-only aicpu CMake target, so onboard builds never get the shared
post-process step. Move the include of tracr_postprocessing_script.cmake out of
the aicpu-specific CMakeLists and into a shared parent CMake entry that is
evaluated for all platforms, ensuring it is added exactly once for both sim and
onboard builds.

In `@src/a2a3/platform/sim/host/device_runner.cpp`:
- Around line 238-246: The TraCR setup in device_runner.cpp is not unwind-safe:
`DevAllocTraCR()` can partially initialize runtime state, and later early
returns in `StoreTracrData()` / the export path skip the only cleanup and
collector shutdown sequence. Refactor the `StoreTracrData()` flow to funnel all
exits through a single TraCR cleanup path that frees any allocated TraCR buffers
and always runs the collector stop/reconcile/export teardown before returning,
and defer propagating export failure until after that shutdown sequence
completes.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp`:
- Around line 799-807: `TRACR_START()` is treating `g_TraCR_thread_idx == 0` as
the session owner, but that value only means “first arriver,” not “last to
finish.” Update the lifecycle in `TRACR_START()` and the corresponding teardown
path near the related `INSTRUMENTATION_END()` handling so the global trace
session is started once and ended only when all participating threads have
completed. Ensure `g_TraCR_thread_idx_counter` is not reset by thread 0 while
other `INSTRUMENTATION_THREAD_INIT()` threads may still be active.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`:
- Around line 819-823: The INDEP_ORCH wait in scheduler_dispatch.cpp uses
orchestrator_done_ as an unsynchronized spin flag, which can lead to a stale
read and an infinite loop. Update the barrier in the scheduler dispatch path to
use a synchronized/atomic latch for orchestrator_done_ (or an existing atomic
equivalent), and make the polling loop in the INDEP_ORCH block use acquire
semantics together with SPIN_WAIT_HINT() so the wait is safe on weakly ordered
CPUs.

In `@tools/tracr`:
- Line 1: The TraCR submodule pin is incompatible with the header path expected
by the build, so update the gitlink in tools/tracr to a revision that provides
include/tracr/tracr.hpp or adjust tracr_enable/CMake helper to reference the
actual header location for the current TraCR commit. Use the existing
tracr_enable configuration logic and the tools/tracr submodule reference to
locate the mismatch and keep the pinned revision and header path consistent.

In `@tools/tracr_postprocessing_script.cmake`:
- Around line 11-29: The current `set(BUILD_DIR ...)`, `configure_file(...)`,
and `set_target_properties(tracr_process ...)` logic is hard-coding outputs
relative to `CMAKE_CURRENT_LIST_DIR`, which points at the source tree and can
collide across builds. Update `tracr_postprocessing_script.cmake` to derive the
output location from `CMAKE_CURRENT_BINARY_DIR` (or an injected binary output
variable) and use that for both the `configure_file` destination of `state.cfg`
and the `RUNTIME_OUTPUT_DIRECTORY` of `tracr_process`, so artifacts land in the
active binary tree instead of `tools/../build/output/bin/`.

In `@tools/tracr_simpler_api.hpp`:
- Around line 241-250: The allocation path in the runtime setup leaves
runtime.tracrData_ live if the follow-up allocate_tensor call for
runtime.tracrDataSizes_ fails; update the same function to explicitly release or
null out runtime.tracrData_ before returning -1 so the partially initialized
state is cleaned up consistently. Use the existing runtime.tracrData_ and
runtime.tracrDataSizes_ symbols in the device allocation block to keep the
failure path safe for retries and cleanup.
- Around line 188-227: The tracing download flow in the TraCR helper leaks both
host and device buffers on early returns. Update the block around the
copy_from_device calls, TracrData2BTS, and StoreTracrMetaData so cleanup is
centralized with RAII or a scope guard, and make sure both tracrData and
tracrDataSizes are checked for null before use. Also ensure runtime.tracrData_
and runtime.tracrDataSizes_ are always released even when any rc check fails,
not only on the success path.
- Around line 33-40: The header currently defines non-inline globals sampleID
and tracr_dir, which can cause multiple-definition linker errors and unsafe
static initialization; update the declarations in tracr_simpler_api.hpp so these
variables are either marked inline or moved to a .cpp file with extern
declarations in the header. For sampleID, prefer lazy inline initialization via
getSampleID() so std::stoul is not run during problematic static initialization
and remains tied to the existing helper.

In `@tools/tracr.cmake`:
- Around line 29-56: The TraCR header probe in tracr_enable() runs even when
BUILD_TRACR is off, causing unnecessary configure failures. Move the
TRACR_INCLUDE_DIR setup, tracr.hpp existence check, and
target_include_directories call inside the BUILD_TRACR branch in tracr_enable(),
or return early before probing when BUILD_TRACR is false, so downstream targets
can invoke it safely without requiring the submodule.

---

Outside diff comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp`:
- Around line 882-893: The early return in aicpu_executor.cpp after
g_aicpu_executor.init() bypasses the TraCR cleanup, so make every init_failed_
exit in this init-wait loop pair TRACR_START() with the matching reset/finalize
sequence before returning. Update the error path around
g_aicpu_executor.init_done_ / init_failed_ so it performs the same TraCR
lifecycle teardown as the normal exit path, including the additional return site
referenced in the follow-up comment.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp`:
- Around line 390-402: The completion path in scheduler_completion.cpp is
clearing the core lane too early in the `running_done` branch, which drops
in-flight work from the trace. Update the logic around `complete_slot_task(...)`
and the `promote_pending_to_running(core)` flow so the lane is only reset when
the core is truly idle; if one slot remains loaded, transition it back to
`Running_Task_Single` instead of blanking the lane. Keep the
`INSTRUMENTATION_MARK_RESET(...)` call aligned with the actual idle state in
this completion path.

---

Nitpick comments:
In @.gitmodules:
- Around line 1-4: The tools/tracr submodule entry is not pinned, so update the
submodule configuration to reference a specific branch or tag for deterministic
builds. Modify the existing submodule definition for tools/tracr in .gitmodules
to include explicit pinning metadata, and keep the URL/path mapping intact so
future checkouts resolve to the intended TraCR version.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 88334310-c768-4931-80be-a1387c83c6b4

📥 Commits

Reviewing files that changed from the base of the PR and between a27d9f8 and 104c330.

📒 Files selected for processing (22)
  • .gitmodules
  • examples/a2a3/tensormap_and_ringbuffer/paged_attention/kernels/orchestration/paged_attention_orch.cpp
  • simpler_setup/kernel_compiler.py
  • src/a2a3/platform/onboard/aicpu/CMakeLists.txt
  • src/a2a3/platform/onboard/host/CMakeLists.txt
  • src/a2a3/platform/onboard/host/device_runner.cpp
  • src/a2a3/platform/sim/aicpu/CMakeLists.txt
  • src/a2a3/platform/sim/host/CMakeLists.txt
  • src/a2a3/platform/sim/host/device_runner.cpp
  • src/a2a3/runtime/host_build_graph/runtime/pto_runtime2_types.h
  • src/a2a3/runtime/host_build_graph/runtime/runtime.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/runtime.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • tools/tracr
  • tools/tracr.cmake
  • tools/tracr_postprocessing_script.cmake
  • tools/tracr_simpler_api.hpp
  • tools/tracr_simpler_markers.hpp

Comment on lines +270 to +278
// Initialize TraCR memory on the device
#ifdef ENABLE_TRACR
// LOG_INFO_V9("[TraCR] thread[%d] DevAllocTraCR device_id_=%d", sched_getcpu(), device_id_);
rc = DevAllocTraCR(this, runtime);
if (rc != 0) {
LOG_ERROR("DevAllocTraCR failed rc=%d", rc);
return rc;
}
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the TraCR lifecycle unwind-safe.

DevAllocTraCR() can fail after populating one runtime pointer, and every later return before Line 487 skips the only shown free path in StoreTracrData(). The new Line 490 return also bypasses teardown_shared_collectors_after_run() at Line 496. Failed runs will leak TraCR buffers and skip the collector stop/export path. Either guard the TraCR buffers with an all-exit cleanup path here or make the helper transactional, then return the export error only after teardown has run.

Also applies to: 485-493

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/platform/onboard/host/device_runner.cpp` around lines 270 - 278, The
TraCR allocation/error path in device_runner::StoreTracrData is not unwind-safe,
so failed runs can skip cleanup and leak buffers. Make the
DevAllocTraCR()/runtime pointer setup transactional or add a single all-exit
cleanup path that always runs before returning, ensuring the StoreTracrData free
path and teardown_shared_collectors_after_run() are executed even on failures.
Keep the final export-error return only after teardown has completed, and
preserve the existing error logging around DevAllocTraCR and later failure
points.

Comment on lines +77 to +79
# TODO: move this somewhere such that EVERY platform launches this once. Placing this here is hacky...
include(${CMAKE_CURRENT_SOURCE_DIR}/../../../../../tools/tracr_postprocessing_script.cmake)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Hoist TraCR post-processing out of the sim-only target.

Lines 77-79 register tracr_postprocessing_script.cmake only from the sim aicpu target, and the TODO already notes this should run once for every platform. As written, onboard builds miss the advertised post-process step entirely. Move this include to a shared parent CMake entry so it is added exactly once for both sim and onboard builds.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/platform/sim/aicpu/CMakeLists.txt` around lines 77 - 79, The TraCR
post-processing include is only registered from the sim-only aicpu CMake target,
so onboard builds never get the shared post-process step. Move the include of
tracr_postprocessing_script.cmake out of the aicpu-specific CMakeLists and into
a shared parent CMake entry that is evaluated for all platforms, ensuring it is
added exactly once for both sim and onboard builds.

Comment on lines +238 to +246
// Initialize TraCR memory on the device
#ifdef ENABLE_TRACR
rc = DevAllocTraCR(this, runtime);
if (rc != 0) {
LOG_ERROR("DevAllocTraCR failed rc=%d", rc);
return rc;
}
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the TraCR lifecycle unwind-safe.

DevAllocTraCR() can fail after populating one runtime pointer, and every later return before Line 529 skips the only shown free path in StoreTracrData(). The new Line 532 return also bypasses the collector stop/reconcile/export sequence at Lines 538-572. Failed runs will leak TraCR buffers and skip collector teardown. Add an all-exit TraCR cleanup path and defer the export failure until after the collector shutdown path has completed.

Also applies to: 527-535

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/platform/sim/host/device_runner.cpp` around lines 238 - 246, The
TraCR setup in device_runner.cpp is not unwind-safe: `DevAllocTraCR()` can
partially initialize runtime state, and later early returns in
`StoreTracrData()` / the export path skip the only cleanup and collector
shutdown sequence. Refactor the `StoreTracrData()` flow to funnel all exits
through a single TraCR cleanup path that frees any allocated TraCR buffers and
always runs the collector stop/reconcile/export teardown before returning, and
defer propagating export failure until after that shutdown sequence completes.

Comment on lines +799 to +807
inline void TRACR_START() {
g_TraCR_thread_idx = g_TraCR_thread_idx_counter.fetch_add(1, std::memory_order_relaxed);

if (g_TraCR_thread_idx == 0) {
INSTRUMENTATION_START();
} else {
INSTRUMENTATION_THREAD_INIT();
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Don't end the trace session on “TraCR thread 0”.

TRACR_START() assigns g_TraCR_thread_idx in arrival order, so index 0 is just the first caller, not the last finisher. If that thread executes INSTRUMENTATION_END() and resets the global counter while siblings are still running, the remaining threads race session teardown and can reuse thread slots inside the same run.

Also applies to: 835-839

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` around
lines 799 - 807, `TRACR_START()` is treating `g_TraCR_thread_idx == 0` as the
session owner, but that value only means “first arriver,” not “last to finish.”
Update the lifecycle in `TRACR_START()` and the corresponding teardown path near
the related `INSTRUMENTATION_END()` handling so the global trace session is
started once and ended only when all participating threads have completed.
Ensure `g_TraCR_thread_idx_counter` is not reset by thread 0 while other
`INSTRUMENTATION_THREAD_INIT()` threads may still be active.

Comment on lines +819 to +823
#ifdef INDEP_ORCH
INSTRUMENTATION_MARK_SET(g_TraCR_thread_idx, Barrier, orchestrator_done_);
LOG_INFO_V9("[TraCR] Thread %d: Waiting before the Orch to finish: %d, orchestrator_done_=%d", g_TraCR_thread_idx, g_TraCR_thread_idx_counter.load(), orchestrator_done_);
while (!orchestrator_done_){};
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

The INDEP_ORCH barrier is a racy infinite-spin.

This wait reads orchestrator_done_ as an unsynchronized boolean, while the same flag is reset with plain assignment elsewhere. On weakly ordered cores the scheduler can spin forever on a stale false. Make this latch atomic (or reuse an existing synchronized one) and poll it with acquire semantics plus SPIN_WAIT_HINT().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`
around lines 819 - 823, The INDEP_ORCH wait in scheduler_dispatch.cpp uses
orchestrator_done_ as an unsynchronized spin flag, which can lead to a stale
read and an infinite loop. Update the barrier in the scheduler dispatch path to
use a synchronized/atomic latch for orchestrator_done_ (or an existing atomic
equivalent), and make the polling loop in the INDEP_ORCH block use acquire
semantics together with SPIN_WAIT_HINT() so the wait is safe on weakly ordered
CPUs.

Comment on lines +11 to +29
set(BUILD_DIR "${CMAKE_CURRENT_LIST_DIR}/../build/output/bin/")

message(STATUS "TraCR: REAL_SOURCE_DIR: '${CMAKE_CURRENT_LIST_DIR}'")

# Paraver format configuration file
configure_file(
${CMAKE_CURRENT_LIST_DIR}/tracr/postprocessing/paraver/state.cfg
${BUILD_DIR}/state.cfg
COPYONLY
)

add_executable(tracr_process ${CMAKE_CURRENT_LIST_DIR}/tracr/postprocessing/tracr_process.cpp)

tracr_enable(tracr_process)

# Set the output directory for the compiled executable
set_target_properties(tracr_process PROPERTIES
RUNTIME_OUTPUT_DIRECTORY ${BUILD_DIR}
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Emit state.cfg and tracr_process into the binary tree, not the source tree.

CMAKE_CURRENT_LIST_DIR resolves to tools/, so this hard-codes outputs under the repository checkout (../build/output/bin). Out-of-tree or parallel builds will stomp each other's artifacts, and the files won't follow the active build directory. Base this on CMAKE_CURRENT_BINARY_DIR (or a caller-provided binary output dir) instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/tracr_postprocessing_script.cmake` around lines 11 - 29, The current
`set(BUILD_DIR ...)`, `configure_file(...)`, and
`set_target_properties(tracr_process ...)` logic is hard-coding outputs relative
to `CMAKE_CURRENT_LIST_DIR`, which points at the source tree and can collide
across builds. Update `tracr_postprocessing_script.cmake` to derive the output
location from `CMAKE_CURRENT_BINARY_DIR` (or an injected binary output variable)
and use that for both the `configure_file` destination of `state.cfg` and the
`RUNTIME_OUTPUT_DIRECTORY` of `tracr_process`, so artifacts land in the active
binary tree instead of `tools/../build/output/bin/`.

Comment on lines +33 to +40
size_t getSampleID() {
const auto env = std::getenv("PYPTO_RUN_SAMPLE_ID");
return env ? std::stoul(env) : 0;
}
size_t sampleID = getSampleID();

std::string tracr_dir = "~/ascend/tracr/proc.1";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

Move non-inline global variables out of the header or mark them inline.

Defining variables (sampleID, tracr_dir) in a header without inline creates external linkage. Including this header in multiple translation units will cause linker errors (multiple definition). Additionally, the initialization of sampleID uses std::stoul, which can throw an exception during static initialization, causing the program to crash before standard error handling is active.

Since C++17 is available, prefer declaring these as inline variables with lazy initialization to ensure single-definition and safe initialization:

Recommended fix
inline size_t sampleID = []() {
    const char* env = std::getenv("PYPTO_RUN_SAMPLE_ID");
    return env ? std::stoul(env) : 0;
}();
inline std::string tracr_dir = "~/ascend/tracr/proc.1";

Alternatively, move the definitions to a corresponding .cpp file and declare them extern in the header.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/tracr_simpler_api.hpp` around lines 33 - 40, The header currently
defines non-inline globals sampleID and tracr_dir, which can cause
multiple-definition linker errors and unsafe static initialization; update the
declarations in tracr_simpler_api.hpp so these variables are either marked
inline or moved to a .cpp file with extern declarations in the header. For
sampleID, prefer lazy inline initialization via getSampleID() so std::stoul is
not run during problematic static initialization and remains tied to the
existing helper.

Comment thread tools/tracr_simpler_api.hpp Outdated
Comment on lines +188 to +227
size_t size = sizeof(TraCR::Payload) * TraCR::CAPACITY * runtime.aicpu_thread_num;
TraCR::Payload* tracrData = reinterpret_cast<TraCR::Payload *>(std::malloc(size));
int rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrData),
reinterpret_cast<void*>(runtime.tracrData_), size);
if (rc != 0) {
LOG_ERROR("device_runner->copy_from_device 'tracrData' failed rc=%d", rc);
return rc;
}

// Download the tracrDataSizes_ from Device to Host
size = sizeof(size_t) * runtime.aicpu_thread_num;
size_t* tracrDataSizes = reinterpret_cast<size_t *>(std::malloc(size));
rc = device_runner->copy_from_device(reinterpret_cast<void*>(tracrDataSizes),
reinterpret_cast<void*>(runtime.tracrDataSizes_), size);
if (rc != 0) {
LOG_ERROR("device_runner->copy_from_device 'tracrDataSizes' failed rc=%d", rc);
return rc;
}

// Now, store the traces into '~/ascend/tracr/'
tracr_dir = "~/ascend/tracr_" + std::to_string(sampleID++) + "/proc." + std::to_string(1000 + device_runner->device_id());
rc = TracrData2BTS(tracrData, tracrDataSizes, runtime.aicpu_thread_num);
if (rc != 0) {
LOG_ERROR("TracrData2BTS() failed");
return rc;
}

// Free tmp Host TraCR data placeholders
std::free(reinterpret_cast<void *>(tracrData));
std::free(reinterpret_cast<void *>(tracrDataSizes));

// Free device TraCR memory data placeholder
device_runner->free_tensor(runtime.tracrData_);
device_runner->free_tensor(runtime.tracrDataSizes_);

rc = StoreTracrMetaData(runtime);
if (rc != 0) {
LOG_ERROR("StoreTracrMetaData failed: %d", rc);
return rc;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

These error paths leak both host buffers and device trace tensors.

Every return between the first malloc() and StoreTracrMetaData() skips some combination of tracrData, tracrDataSizes, runtime.tracrData_, and runtime.tracrDataSizes_. A failed copy or file write will leave tracing buffers allocated for the rest of the process. Please switch this block to RAII/scope-guard cleanup and reject null malloc() results before calling copy_from_device().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/tracr_simpler_api.hpp` around lines 188 - 227, The tracing download
flow in the TraCR helper leaks both host and device buffers on early returns.
Update the block around the copy_from_device calls, TracrData2BTS, and
StoreTracrMetaData so cleanup is centralized with RAII or a scope guard, and
make sure both tracrData and tracrDataSizes are checked for null before use.
Also ensure runtime.tracrData_ and runtime.tracrDataSizes_ are always released
even when any rc check fails, not only on the success path.

Comment thread tools/tracr_simpler_api.hpp Outdated
Comment on lines +241 to +250
runtime.tracrData_ = device_runner->allocate_tensor(size);
if (runtime.tracrData_ == nullptr) {
LOG_ERROR("runtime.tracrData_: alloc %zu bytes failed", size);
return -1;
}
// LOG_INFO_V9("Device alloc start of size=%u, %p", size, runtime.tracrData_);
runtime.tracrDataSizes_ = device_runner->allocate_tensor(runtime.aicpu_thread_num * sizeof(size_t));
if (runtime.tracrDataSizes_ == nullptr) {
LOG_ERROR("runtime.tracrDataSizes_: alloc %zu bytes failed", size);
return -1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Rollback runtime.tracrData_ if the second device allocation fails.

When allocate_tensor(runtime.aicpu_thread_num * sizeof(size_t)) fails, the function returns with the first device buffer still live in runtime.tracrData_. That leaks device memory and leaves runtime in a half-initialized state for any retry/error cleanup path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/tracr_simpler_api.hpp` around lines 241 - 250, The allocation path in
the runtime setup leaves runtime.tracrData_ live if the follow-up
allocate_tensor call for runtime.tracrDataSizes_ fails; update the same function
to explicitly release or null out runtime.tracrData_ before returning -1 so the
partially initialized state is cleaned up consistently. Use the existing
runtime.tracrData_ and runtime.tracrDataSizes_ symbols in the device allocation
block to keep the failure path safe for retries and cleanup.

Comment thread tools/tracr.cmake
Comment on lines +29 to +56
# Create the TraCR include directory path
set(TRACR_INCLUDE_DIR
${TRACR_ROOT_DIR}/tracr/include
)

# Check if it even exists
if (NOT EXISTS "${TRACR_INCLUDE_DIR}/tracr/tracr.hpp")
message(FATAL_ERROR
"tracr.hpp not found at ${TRACR_INCLUDE_DIR}/tracr/tracr.hpp"
)
endif()

# Append the nlohmann json path as well
set(TRACR_INCLUDE_DIR
${TRACR_INCLUDE_DIR}
${TRACR_ROOT_DIR}
${TRACR_ROOT_DIR}/tracr/extern
)

# --- include the directories ---
target_include_directories(${target} PRIVATE
${TRACR_INCLUDE_DIR}
)

# --- compiler flags of TraCR ---
if (BUILD_TRACR)
# Flag to enable/disable TraCR calls at compile time
target_compile_definitions(${target} PRIVATE ENABLE_TRACR)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Don't probe the TraCR submodule unless BUILD_TRACR is enabled.

tracr_enable() is invoked unconditionally by downstream targets, but the tracr.hpp existence check runs before the if(BUILD_TRACR) guard. That turns every default/off build into a hard configure failure instead of a no-op, which is exactly what CI is doing now. Move the include-dir setup and header validation inside the enabled branch, or return early when BUILD_TRACR is false.

Suggested fix
 function(tracr_enable target)
     message(STATUS "Enabling TraCR '${BUILD_TRACR}' for target: ${target}")

     if (NOT TARGET ${target})
         message(FATAL_ERROR "Target '${target}' does not exist.")
     endif()
+
+    if (NOT BUILD_TRACR)
+        return()
+    endif()

     # Create the TraCR include directory path
     set(TRACR_INCLUDE_DIR
         ${TRACR_ROOT_DIR}/tracr/include
     )
@@
-    # --- compiler flags of TraCR ---
-    if (BUILD_TRACR)
+    # --- compiler flags of TraCR ---
+    if (BUILD_TRACR)
         # Flag to enable/disable TraCR calls at compile time
         target_compile_definitions(${target} PRIVATE ENABLE_TRACR)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Create the TraCR include directory path
set(TRACR_INCLUDE_DIR
${TRACR_ROOT_DIR}/tracr/include
)
# Check if it even exists
if (NOT EXISTS "${TRACR_INCLUDE_DIR}/tracr/tracr.hpp")
message(FATAL_ERROR
"tracr.hpp not found at ${TRACR_INCLUDE_DIR}/tracr/tracr.hpp"
)
endif()
# Append the nlohmann json path as well
set(TRACR_INCLUDE_DIR
${TRACR_INCLUDE_DIR}
${TRACR_ROOT_DIR}
${TRACR_ROOT_DIR}/tracr/extern
)
# --- include the directories ---
target_include_directories(${target} PRIVATE
${TRACR_INCLUDE_DIR}
)
# --- compiler flags of TraCR ---
if (BUILD_TRACR)
# Flag to enable/disable TraCR calls at compile time
target_compile_definitions(${target} PRIVATE ENABLE_TRACR)
# Create the TraCR include directory path
set(TRACR_INCLUDE_DIR
${TRACR_ROOT_DIR}/tracr/include
)
# Check if it even exists
if (NOT EXISTS "${TRACR_INCLUDE_DIR}/tracr/tracr.hpp")
message(FATAL_ERROR
"tracr.hpp not found at ${TRACR_INCLUDE_DIR}/tracr/tracr.hpp"
)
endif()
# Append the nlohmann json path as well
set(TRACR_INCLUDE_DIR
${TRACR_INCLUDE_DIR}
${TRACR_ROOT_DIR}
${TRACR_ROOT_DIR}/tracr/extern
)
# --- include the directories ---
target_include_directories(${target} PRIVATE
${TRACR_INCLUDE_DIR}
)
# --- compiler flags of TraCR ---
if (BUILD_TRACR)
# Flag to enable/disable TraCR calls at compile time
target_compile_definitions(${target} PRIVATE ENABLE_TRACR)
🧰 Tools
🪛 GitHub Actions: CI / 12_pre-commit.txt

[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable; Call Stack: CMakeLists.txt:75).


[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable; Call Stack: CMakeLists.txt:80).

🪛 GitHub Actions: CI / 13_ut (ubuntu-latest, 3.10).txt

[error] 36-36: CMake error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable).

🪛 GitHub Actions: CI / 4_profiling-flags-smoke.txt

[error] 36-36: CMake configuration failed: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp.

🪛 GitHub Actions: CI / 7_packaging-matrix (ubuntu-latest, 3.10).txt

[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable in CMakeLists.txt)

🪛 GitHub Actions: CI / 8_packaging-matrix (macos-latest, 3.10).txt

[error] 36-36: CMake configuration failed: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (Call Stack: CMakeLists.txt:75 (tracr_enable)).


[error] 36-36: CMake configuration failed: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (Call Stack: CMakeLists.txt:80 (tracr_enable)).

🪛 GitHub Actions: CI / 9_ut (macos-latest, 3.10).txt

[error] 36-36: CMake configuration failed: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp.

🪛 GitHub Actions: CI / packaging-matrix (macos-latest, 3.10)

[error] 36-36: CMake Error: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable).

🪛 GitHub Actions: CI / packaging-matrix (ubuntu-latest, 3.10)

[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (called from CMakeLists.txt:75 for aicpu_kernel and CMakeLists.txt:80 for host_runtime).

🪛 GitHub Actions: CI / pre-commit

[error] 36-36: CMake error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered from tracr_enable).

🪛 GitHub Actions: CI / profiling-flags-smoke

[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered by tracr_enable).

🪛 GitHub Actions: CI / st-onboard-a2a3

[error] 36-36: CMake error: tracr.hpp not found at /data/ci-runner/simpler-ci/npu-runner-1/_work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (triggered via tracr_enable in CMakeLists.txt).

🪛 GitHub Actions: CI / st-onboard-a5

[error] 36-36: CMake configuration failed: tracr.hpp not found at /home/pyptouser/simpler_a5ci/actions-runner/_work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (Call Stack: CMakeLists.txt -> tracr_enable)

🪛 GitHub Actions: CI / st-sim-a2a3 (macos-latest, 3.10)

[error] 36-36: CMake failed: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp. Call Stack: CMakeLists.txt:80 (tracr_enable) (HOST target).


[error] 36-36: CMake failed: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp. Call Stack: CMakeLists.txt:75 (tracr_enable) (AICPU target).

🪛 GitHub Actions: CI / st-sim-a2a3 (ubuntu-latest, 3.10)

[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp. (CMakeLists.txt:80 (tracr_enable))


[error] 36-36: CMake Error: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp. (CMakeLists.txt:75 (tracr_enable))

🪛 GitHub Actions: CI / st-sim-a5 (ubuntu-latest, 3.10)

[error] 36-36: CMake configuration failed: tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (CMake Error at tracr.cmake:36 (message)).

🪛 GitHub Actions: CI / ut (macos-latest, 3.10)

[error] 36-36: CMake Error: tracr.hpp not found at /Users/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (tracr_enable).

🪛 GitHub Actions: CI / ut (ubuntu-latest, 3.10)

[error] 36-36: CMake configuration failed for AICPU (exit code 1): tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp


[error] 36-36: CMake configuration failed for HOST (exit code 1): tracr.hpp not found at /home/runner/work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp

🪛 GitHub Actions: CI / ut-a2a3

[error] 36-36: CMake Error: tracr.hpp not found at /data/ci-runner/simpler-ci/npu-runner-2/_work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp (via tracr_enable from CMakeLists.txt)

🪛 GitHub Actions: CI / ut-a5

[error] 36-36: CMake failed because required header is missing: "tracr.hpp not found at /home/pyptouser/simpler_a5ci/actions-runner/_work/simpler/simpler/tools/tracr/include/tracr/tracr.hpp"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/tracr.cmake` around lines 29 - 56, The TraCR header probe in
tracr_enable() runs even when BUILD_TRACR is off, causing unnecessary configure
failures. Move the TRACR_INCLUDE_DIR setup, tracr.hpp existence check, and
target_include_directories call inside the BUILD_TRACR branch in tracr_enable(),
or return early before probing when BUILD_TRACR is false, so downstream targets
can invoke it safely without requiring the submodule.

Source: Pipeline failures

@noabauma noabauma marked this pull request as draft June 29, 2026 07:52
noabauma and others added 6 commits June 29, 2026 11:53
Brings tracr (main+tracr) up to main f4d14bf. Only aicpu_executor.cpp
conflicted: kept main's refactors (RAII AicpuPhaseScope phases hw-native-sys#1183,
simpler_aicpu_register_callable rename hw-native-sys#1207/hw-native-sys#1210, nested arena-setup
scope) and re-wove the TraCR hooks (Allocating marker, TRACR_START/
FINALIZE, aicpu_execute entry/finalize, tracr_getcpu). All other TraCR
footprint files (scheduler .cpp markers, device_runner, runtime.h,
pto_runtime2_types.h) auto-merged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1216)

main hw-native-sys#1216 split the trb Runtime into a device-copied DeviceRuntimeLaunchDesc
(`dev`) + host-only tail, so aicpu_thread_num / worker_count / the tracr
pointers are no longer direct `Runtime` members on trb (they live in `dev`,
reached via accessors); host_build_graph keeps them flat. The templated
shared TraCR code (tracr_simpler_api.hpp) and the trb TRACR_FINALIZE used the
old direct-field API and failed to compile under BUILD_TRACR=ON.

- Add get/set_tracr_data + get/set_tracr_data_sizes accessors to both Runtime
  variants (mirroring get_aicpu_thread_num), identical signatures so the
  shared header compiles against either.
- Switch tracr_simpler_api.hpp (DevAllocTraCR / StoreTracrData /
  StoreTracrMetaData) and aicpu_executor.cpp TRACR_FINALIZE to the accessors
  (get_aicpu_thread_num / get_worker_count / get/set_tracr_data*).

Verified: BUILD_TRACR=ON build is clean (simpler-cann9 docker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Whitespace/brace-style only, applied via the repo's pinned pre-commit
clang-format hook (v21.1.0) scoped to the TraCR diff vs main. No
semantic changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
noabauma and others added 3 commits July 1, 2026 15:28
Formatting only, via the repo's pinned ruff hooks (v0.14.8).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
check-headers requires the standard "# Copyright (c) PyPTO Contributors."
header (PY_HEADER) for .cmake files; tracr.cmake and
tracr_postprocessing_script.cmake carried a different Huawei header with an
extra leading rule line. Also picks up trailing-whitespace / end-of-file
fixes on tracr.cmake.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
end-of-file-fixer requires a trailing newline; tracr_simpler_api.hpp and
tracr_simpler_markers.hpp were missing it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@noabauma noabauma marked this pull request as ready for review July 1, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant