
Revitalize AppleTrace: faster hook, Perfetto-only, richer events #12

Merged
everettjf merged 17 commits into master from claude/review-optimization-roadmap-fJ14l on May 22, 2026

everettjf (Owner) commented May 20, 2026

Summary

Repositions AppleTrace as an actively developed, lightweight, embeddable tracer
that produces Perfetto-ready traces. Drops the maintenance-mode and
Messier-migration messaging and lands the high-value items from the new
ROADMAP.md.

Performance (objc_msgSend hook)

  • Intern (Class, SEL) → formatted name (and the "do not trace" decision), so
    the hot path no longer does a malloc/snprintf per message.
  • Replace the per-call malloc'd linked-list trace stack with a zero-allocation
    per-thread array stack.
  • Fix a latent imbalance: filtered-out sends now push a placeholder so a later
    pop can't unwind an unrelated parent section.
  • Add runtime class-prefix allow/deny filtering
    (APPLETRACE_TRACE_CLASS_ALLOW / APPLETRACE_TRACE_CLASS_DENY).
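A rough Python model of the hook-side bookkeeping in the list above, for illustration only. The real hook is Objective-C/C++; `NameCache`, `CallStack`, the `[Class selector]` name format, and passing allow/deny prefixes as plain lists (instead of parsing APPLETRACE_TRACE_CLASS_ALLOW / APPLETRACE_TRACE_CLASS_DENY) are all assumptions. It shows two ideas: the formatted name and the do-not-trace decision are computed once per (Class, SEL) pair, and a filtered send still pushes a placeholder so the matching pop cannot close the traced parent.

```python
from typing import Dict, List, Optional, Tuple


class NameCache:
    """Caches (class, selector) -> formatted name, or None meaning "do not trace"."""

    def __init__(self, allow: List[str], deny: List[str]) -> None:
        self.allow = allow
        self.deny = deny
        self._cache: Dict[Tuple[str, str], Optional[str]] = {}
        self.formats = 0  # how many times the slow path (decide + format) ran

    def _should_trace(self, cls: str) -> bool:
        # Assumption: deny wins over allow; an empty allow list allows everything.
        if any(cls.startswith(p) for p in self.deny):
            return False
        return not self.allow or any(cls.startswith(p) for p in self.allow)

    def lookup(self, cls: str, sel: str) -> Optional[str]:
        key = (cls, sel)
        if key not in self._cache:
            self.formats += 1
            self._cache[key] = f"[{cls} {sel}]" if self._should_trace(cls) else None
        return self._cache[key]


class CallStack:
    """One per thread (a Python list stands in for the fixed per-thread array).
    Filtered sends push None as a placeholder."""

    def __init__(self) -> None:
        self._items: List[Optional[str]] = []
        self.events: List[Tuple[str, str]] = []  # (phase, name)

    def on_send(self, cache: NameCache, cls: str, sel: str) -> None:
        name = cache.lookup(cls, sel)
        self._items.append(name)  # always push, so every return has a matching pop
        if name is not None:
            self.events.append(("B", name))

    def on_return(self) -> None:
        name = self._items.pop()
        if name is not None:  # popping a placeholder emits nothing
            self.events.append(("E", name))
```

For example, with deny=["NS"], a traced `[MyView layout]` that calls a filtered `[NSString length]` yields exactly one B/E pair: the inner return pops the placeholder and emits nothing, so the outer section closes only when `layout` itself returns.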

Per-thread batched writing

  • Replace the per-event dispatch_async with a per-thread accumulation buffer:
    the hot path appends under that buffer's os_unfair_lock and ships a whole
    batch to the serial writer only when it crosses a threshold.
  • A registry + pthread-key destructor let APTFlush drain every thread and
    reclaim buffers at thread exit, preserving the flush contract.
  • LoggerManager::AddBlock splits a batch on line boundaries so fragment
    rollover never splits a JSON object. Design: docs/perf-batching-design.md.
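A minimal Python sketch of the batching scheme, using stand-ins for the native pieces: `threading.Lock` for os_unfair_lock, a `Writer` list for the serial dispatch queue, and an explicit `thread_exit()` call for the pthread-key destructor. The class names and the threshold are hypothetical. The hot path takes only its own buffer's lock; the registry lock is taken once per new thread, at thread exit, and by flush.

```python
import threading
from typing import List


class Writer:
    """Stand-in for the serial writer queue: receives whole batches."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.batches: List[List[str]] = []

    def ship(self, batch: List[str]) -> None:
        if batch:
            with self.lock:
                self.batches.append(batch)


class ThreadBuffer:
    def __init__(self) -> None:
        self.lock = threading.Lock()  # stand-in for the buffer's os_unfair_lock
        self.lines: List[str] = []


class BatchingTracer:
    def __init__(self, writer: Writer, threshold: int = 64) -> None:
        self.writer = writer
        self.threshold = threshold
        self._local = threading.local()
        self._registry_lock = threading.Lock()
        self._registry: List[ThreadBuffer] = []

    def _buffer(self) -> ThreadBuffer:
        buf = getattr(self._local, "buf", None)
        if buf is None:
            buf = ThreadBuffer()
            self._local.buf = buf
            with self._registry_lock:  # once per thread, not per event
                self._registry.append(buf)
        return buf

    def emit(self, line: str) -> None:
        buf = self._buffer()
        batch = None
        with buf.lock:  # contended only while a flush drains this buffer
            buf.lines.append(line)
            if len(buf.lines) >= self.threshold:
                batch, buf.lines = buf.lines, []
        if batch:
            self.writer.ship(batch)  # one hand-off per batch, not per event

    def _drain(self, buf: ThreadBuffer) -> None:
        with buf.lock:
            batch, buf.lines = buf.lines, []
        self.writer.ship(batch)

    def flush(self) -> None:
        """Drain every registered thread's buffer (the APTFlush contract)."""
        with self._registry_lock:
            buffers = list(self._registry)
        for buf in buffers:
            self._drain(buf)

    def thread_exit(self) -> None:
        """What the pthread-key destructor would do: drain, then unregister."""
        buf = getattr(self._local, "buf", None)
        if buf is not None:
            self._drain(buf)
            with self._registry_lock:
                self._registry.remove(buf)
            self._local.buf = None
```

Because a full batch is shipped after the buffer lock is released, a concurrent flush can deliver a thread's later batch before an earlier one. In this sketch that only reorders whole batches; every event keeps its own timestamp.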

Richer events

  • thread_name metadata so threads are labeled in Perfetto.
  • New public APIs: APTInstant, APTCounter, APTAsyncBegin/APTAsyncEnd.
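The exact JSON AppleTrace writes is not shown in this PR. The sketch below assumes the public Trace Event Format phases that map to these features: `M` metadata for `thread_name`, `i` instant, `C` counter, and `b`/`e` nestable async. The Python helpers only mirror the API names; they are hypothetical stand-ins, not the Objective-C signatures, and the `appletrace` category is made up.

```python
import json
from typing import Any, Dict

Event = Dict[str, Any]


def thread_name(pid: int, tid: int, label: str) -> Event:
    # Metadata ("M") events label a thread's track; they carry no timestamp.
    return {"name": "thread_name", "ph": "M", "pid": pid, "tid": tid,
            "args": {"name": label}}


def instant(name: str, pid: int, tid: int, ts: float) -> Event:
    # "s": "t" scopes the instant to its thread.
    return {"name": name, "ph": "i", "s": "t", "pid": pid, "tid": tid, "ts": ts}


def counter(name: str, pid: int, ts: float, value: float) -> Event:
    # Counter ("C") events are per process; each key in args becomes a series.
    return {"name": name, "ph": "C", "pid": pid, "ts": ts, "args": {"value": value}}


def async_event(ph: str, name: str, pid: int, tid: int, ts: float, async_id: int) -> Event:
    # Nestable async begin ("b") and end ("e") pair up by category and id,
    # not by thread, so they may be emitted on different threads.
    return {"name": name, "ph": ph, "cat": "appletrace", "id": hex(async_id),
            "pid": pid, "tid": tid, "ts": ts}


def async_begin(name: str, pid: int, tid: int, ts: float, async_id: int) -> Event:
    return async_event("b", name, pid, tid, ts, async_id)


def async_end(name: str, pid: int, tid: int, ts: float, async_id: int) -> Event:
    return async_event("e", name, pid, tid, ts, async_id)


if __name__ == "__main__":
    events = [thread_name(1, 7, "main"), instant("tap", 1, 7, 100.0),
              counter("queue_depth", 1, 105.0, 3),
              async_begin("download", 1, 7, 110.0, 42),
              async_end("download", 1, 9, 250.0, 42)]
    print(json.dumps({"traceEvents": events}, indent=1))
```

The async pair in the demo starts on tid 7 and ends on tid 9, which is the cross-thread case APTAsyncBegin/APTAsyncEnd exist for.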

Visualization — Perfetto only

  • Remove the deprecated Catapult/Chrome HTML pipeline (get_catapult.sh,
    sampledata/trace.html, sampledata/genhtml.sh, the html/all CLI
    subcommands).
  • go.sh and `appletrace_cli.py open` now merge the trace and open
    ui.perfetto.dev.
  • merge.py streams its output and by default collapses matched begin/end
    pairs into X complete events (--raw opts out), roughly halving the
    section-event count.
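The collapsing step can be sketched as follows, assuming events are already parsed into Trace Event Format dicts. Each E is paired with the most recent open B on the same (pid, tid), which keeps nesting and per-thread isolation correct; metadata, counter, instant, and async events pass through, and unmatched begins are emitted at the end. Passing unmatched ends through unchanged is an assumption the PR does not state, and `collapse_complete` is a hypothetical name, not merge.py's code.

```python
from typing import Dict, Iterable, Iterator, List


def collapse_complete(events: Iterable[dict]) -> Iterator[dict]:
    """Turn matched B/E pairs into X events; pass everything else through."""
    open_stacks: Dict[tuple, List[dict]] = {}
    for ev in events:
        ph = ev.get("ph")
        key = (ev.get("pid"), ev.get("tid"))
        if ph == "B":
            open_stacks.setdefault(key, []).append(ev)
        elif ph == "E" and open_stacks.get(key):
            begin = open_stacks[key].pop()  # LIFO: innermost open section
            x = dict(begin)
            x["ph"] = "X"
            x["dur"] = ev["ts"] - begin["ts"]
            yield x
        else:
            yield ev  # metadata, counters, instants, async, unmatched E
    for stack in open_stacks.values():
        yield from stack  # unmatched begins are preserved as-is
```

Each matched pair becomes one X event, which is where the roughly 2x reduction comes from. X events are yielded when their E arrives, so the output is not in ts order.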

Platform

  • Scope explicitly to arm64 (x86_64 dropped). arm64e was dropped as well in
    a later commit: its auto-hook would have to rebind authenticated
    __auth_got entries, which was never validated on device, so
    hook_objc_msgSend.m now fails with #error when built for arm64e.

Docs & housekeeping

  • Add the previously-missing CONTRIBUTING.md and ROADMAP.md.
  • Refresh README / README_CN / AGENT for Perfetto-only and arm64-only.

Test plan

  • python3 -m pytest tests — 11 passing (merge streaming, X complete-event
    collapsing incl. nesting / per-thread isolation / async + metadata
    pass-through).
  • CI: Python test job green; native simulator smoke jobs re-run after the
    C++11 build fix (the project compiles as gnu++0x, which lacks the C++14
    std::make_unique, so it was replaced with direct unique_ptr construction).
  • Mac: scripts/test_batching_stress.sh builds appletrace.mm with a
    multi-threaded harness and asserts that no events are lost or duplicated
    across threads, flushes, and thread exits. Run it several times, since
    concurrency bugs may only show up intermittently.
  • Mac: Instruments before/after on TraceAllMsgDemo — confirm
    dispatch_async count and peak queue memory dropped.
  • arm64e device: no longer applicable; arm64e support was dropped and
    arm64e builds now fail with #error.

https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX

claude added 5 commits May 20, 2026 20:00
Document hot-path performance opportunities (string interning, per-thread
ring buffers), visualization modernization (Perfetto over deprecated
Catapult), trace format improvements, a competitive comparison, and a
phased plan.

https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
Reposition the project as actively developed (drop maintenance-mode and
Messier migration messaging) and land the high-value roadmap items:

- objc_msgSend hook: intern (Class, SEL) names and use a zero-allocation
  per-thread call stack, removing per-message malloc/snprintf from the hot
  path; add runtime class-prefix allow/deny filtering.
- Runtime: emit thread_name metadata so threads are labeled, and add
  APTInstant and APTCounter event APIs.
- Tooling: stream merge.py output for large captures.
- Docs: Perfetto-first visualization, document new APIs/env vars, add
  CONTRIBUTING.md, and refresh README/README_CN/AGENT/ROADMAP.
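Streaming here means writing the merged trace event by event instead of building one large list in memory. A minimal sketch, assuming (hypothetically) that text fragments hold one JSON event object per line with an optional trailing comma; the function names are illustrative, not merge.py's.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def iter_fragment_events(lines: Iterable[str]) -> Iterator[dict]:
    # Assumption: one JSON event object per line, optionally followed by a comma.
    for raw in lines:
        line = raw.strip().rstrip(",")
        if line:
            yield json.loads(line)


def write_trace_streaming(events: Iterable[dict], out: TextIO) -> int:
    """Write {"traceEvents": [...]} one event at a time; return the count."""
    out.write('{"traceEvents":[\n')
    count = 0
    for ev in events:
        if count:
            out.write(",\n")
        out.write(json.dumps(ev, separators=(",", ":")))
        count += 1
    out.write("\n]}\n")
    return count


if __name__ == "__main__":
    fragment = io.StringIO('{"name":"a","ph":"B","pid":1,"tid":1,"ts":0},\n'
                           '{"name":"a","ph":"E","pid":1,"tid":1,"ts":5}\n')
    sink = io.StringIO()
    write_trace_streaming(iter_fragment_events(fragment), sink)
    print(sink.getvalue())
```

Chained over open fragment files, only one parsed event is held at a time, so memory no longer grows with capture size.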

https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
Add an opt-in --complete flag to merge.py and the unified CLI that collapses
matched begin/end pairs into X complete events (LIFO per pid/tid), passing
through metadata, counter, and instant events untouched and preserving
unmatched begins. Covered by new tests; documented in README/README_CN and
marked done in ROADMAP.

https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
Per project direction: visualization is now Perfetto-only and the platform
scope is arm64/arm64e.

- Remove the Catapult/Chrome HTML pipeline: delete get_catapult.sh and the
  sampledata HTML demo, drop the html/all CLI subcommands, and repurpose
  go.sh and `appletrace_cli.py open` to merge and open ui.perfetto.dev.
- Default merge.py to X complete events (--raw opts out); pass async events
  through unchanged.
- Add APTAsyncBegin/APTAsyncEnd nestable async events for work that crosses
  threads/queues, with tests.
- Clarify arm64/arm64e scope in the hook guard/comment (note arm64e ptrauth
  GOT rebinding needs on-device validation).
- Refresh README/README_CN/AGENT/CONTRIBUTING/ROADMAP for Perfetto-only and
  arm64/arm64e; remove x86_64 as a goal.

https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
everettjf changed the title from "Add optimization and roadmap analysis" to "Revitalize AppleTrace: faster hook, Perfetto-only, richer events" on May 20, 2026
claude and others added 12 commits May 20, 2026 20:56
Detail the Phase 2 hot-path change: per-thread accumulation buffers with a
global registry, locking discipline that keeps the hot path uncontended,
cross-thread APTFlush that preserves the current contract, thread-exit
handling, edge cases to verify on device, an optional binary-event follow-on,
and a macOS verification plan. Linked from ROADMAP.

Replace the per-event dispatch_async with a per-thread accumulation buffer:
the hot path appends a formatted line under that buffer's os_unfair_lock and
ships a whole batch to the serial writer queue only when it crosses a
threshold. A registry plus a pthread-key destructor let APTFlush drain every
thread and reclaim buffers at thread exit, preserving the flush contract.
LoggerManager gains AddBlock, which splits a batch on line boundaries so
fragment rollover never splits a JSON object.

Unverified: Objective-C/C++ was not compiled in this environment; needs a
macOS build and Instruments profiling per docs/perf-batching-design.md.
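The AddBlock behavior from this commit, modeled in Python: a batch is appended one complete line at a time, and rollover to a new fragment happens only between lines. `FragmentLogger` and the byte limit are hypothetical; the real LoggerManager is C++. In this sketch a single line longer than the limit lands in its own oversized fragment rather than being split.

```python
from typing import List


class FragmentLogger:
    """Hypothetical model of rollover that never splits a line across fragments."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.fragments: List[bytearray] = [bytearray()]

    def add_block(self, block: bytes) -> None:
        # Walk the batch one complete line (one JSON object) at a time.
        for line in block.splitlines(keepends=True):
            current = self.fragments[-1]
            if current and len(current) + len(line) > self.max_bytes:
                current = bytearray()
                self.fragments.append(current)  # roll over before this line
            current += line  # bytearray += extends in place
```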
The framework compiles as gnu++0x (C++11) but the batching writer used
std::make_unique (C++14), failing the simulator build. Use a direct
unique_ptr construction and include <utility> for std::move.

scripts/test_batching_stress.sh compiles appletrace.mm with a multi-threaded
harness (tests/stress/stress_main.mm), runs it, merges the output, and asserts
that exactly threads*pairs "stress" complete events survive — verifying the
per-thread buffers, cross-thread APTFlush, and thread-exit drain lose or
duplicate nothing. Host-only (no Xcode/simulator), so it is independent of the
existing smoke jobs.
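The core assertion of the stress test, sketched in Python against a merged Trace Event Format JSON document. An exact total catches both loss (too few events) and duplication (too many). The per-thread check is an extra assumption layered on top: it catches a loss on one thread masked by a duplicate on another, and relies on each worker having its own tid and emitting exactly `pairs` sections. `check_stress` is a hypothetical name, not the script's code.

```python
import json
from collections import Counter


def check_stress(trace_text: str, threads: int, pairs: int, name: str = "stress") -> None:
    """Raise AssertionError unless exactly threads * pairs sections survived."""
    events = json.loads(trace_text).get("traceEvents", [])
    per_tid = Counter(ev.get("tid") for ev in events
                      if ev.get("ph") == "X" and ev.get("name") == name)
    total = sum(per_tid.values())
    # Loss gives too few events, duplication too many.
    if total != threads * pairs:
        raise AssertionError(f"expected {threads * pairs} {name!r} events, got {total}")
    # Extra check (assumption): each worker owns one tid and emits exactly `pairs`.
    uneven = {tid: n for tid, n in per_tid.items() if n != pairs}
    if uneven:
        raise AssertionError(f"per-thread counts differ from {pairs}: {uneven}")
```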
Fix leftover Chrome/HookZz references, repair the broken intro sentence,
modernize the hook-status and dynamic-hook sections (env var / in-app install
instead of the stale LLDB loader flow), align the Python version, and add the
batching stress test to the testing steps in README and README_CN.

Lock a compact binary fragment format (interned name table + fixed-layout
records) and implement the decoder in appletrace_binary.py, wired into
merge.py: fragments are detected by magic and decode to the same
Chrome/Perfetto JSON as the text path, feeding the X complete-event collapsing.
Tolerates crash zero-padding and truncation. Covered by tests; format and the
remaining native writer documented in docs/binary-fragment-format.md.
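A sketch of a decoder for a format of this shape. The byte layout below (magic `APTB`, a little-endian pid header, tag bytes 1 and 2, and the field widths) is entirely hypothetical and is not the layout in docs/binary-fragment-format.md. It only illustrates magic-based detection, a name table shared across fragments, and stopping cleanly at zero padding or a truncated record.

```python
import struct
from typing import Dict, Iterator

# Hypothetical layout, for illustration only (not the documented format).
MAGIC = b"APTB"
HEADER = struct.Struct("<4sI")    # magic, pid
NAME_DEF = struct.Struct("<IH")   # name id, UTF-8 length; name bytes follow
EVENT = struct.Struct("<cIQQ")    # phase, name id, tid, ts in microseconds
TAG_NAME, TAG_EVENT = 1, 2


def is_binary_fragment(data: bytes) -> bool:
    """Text fragments start with JSON, so the magic tells the two apart."""
    return data[:len(MAGIC)] == MAGIC


def decode_fragment(data: bytes, names: Dict[int, str]) -> Iterator[dict]:
    """Yield Trace Event dicts. `names` is shared across a run's fragments so a
    definition in an earlier fragment resolves a reference in a later one."""
    if len(data) < HEADER.size or not is_binary_fragment(data):
        return
    _, pid = HEADER.unpack_from(data, 0)
    off = HEADER.size
    while off < len(data):
        tag = data[off]
        off += 1
        if tag == TAG_NAME and off + NAME_DEF.size <= len(data):
            name_id, length = NAME_DEF.unpack_from(data, off)
            off += NAME_DEF.size
            if off + length > len(data):
                return  # truncated name bytes
            names[name_id] = data[off:off + length].decode("utf-8")
            off += length
        elif tag == TAG_EVENT and off + EVENT.size <= len(data):
            ph, name_id, tid, ts = EVENT.unpack_from(data, off)
            off += EVENT.size
            yield {"name": names.get(name_id, f"<unknown {name_id}>"),
                   "ph": ph.decode("ascii"), "pid": pid, "tid": tid, "ts": ts}
        else:
            # Tag 0 (crash zero padding), a truncated record, or an unknown
            # tag: nothing after this point can be trusted.
            return
```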
Emit fixed-layout binary records into the per-thread batch buffers instead of
JSON when APPLETRACE_BINARY=1, keeping all formatting off the hot path. Names
are interned per thread (globally unique ids from an atomic counter), so a
thread only references ids it defined; this is safe with batched, out-of-order
flushing.
LoggerManager writes the magic+pid header per fragment, rolls over whole batches
so records are never split, and names fragments .appletracebin.

The exporter shares one name table across a run's fragments (a definition in an
earlier fragment resolves a reference in a later one after rollover). The text
path is unchanged and remains the default, so CI/smoke jobs are unaffected.
test_batching_stress.sh now verifies both text and binary modes.

Unverified: Objective-C/C++ not compiled here; validate on macOS via
scripts/test_batching_stress.sh.
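Why globally unique ids make per-thread interning safe, as a Python model with hypothetical names (`ThreadInterner`, `resolve`; a lock around `itertools.count` stands in for the atomic counter). Each thread writes a name definition before its first use and only references ids it defined. Because ids never collide across threads, the exporter can fold every definition into one shared table and resolve whole thread batches in any order.

```python
import itertools
import threading
from typing import Dict, List, Tuple

_next_id = itertools.count(1)
_id_lock = threading.Lock()  # stand-in for an atomic fetch-add


def alloc_id() -> int:
    with _id_lock:
        return next(_next_id)


class ThreadInterner:
    """One per thread: define each name once, then reference it by id."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.records: List[tuple] = []  # this thread's records, in order

    def event(self, ph: str, name: str, ts: int) -> None:
        name_id = self.ids.get(name)
        if name_id is None:
            name_id = alloc_id()  # unique across all threads
            self.ids[name] = name_id
            self.records.append(("name", name_id, name))  # definition before use
        self.records.append(("event", ph, name_id, ts))


def resolve(batches: List[List[tuple]]) -> List[Tuple[str, str, int]]:
    """Exporter side: one shared table; whole thread batches in any order."""
    names: Dict[int, str] = {}
    out: List[Tuple[str, str, int]] = []
    for batch in batches:
        for rec in batch:
            if rec[0] == "name":
                names[rec[1]] = rec[2]
            else:
                _, ph, name_id, ts = rec
                out.append((ph, names[name_id], ts))
    return out
```

If each thread instead numbered its names from 1, thread A's id 1 and thread B's id 1 would collide in the shared table.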
The native binary writer (APPLETRACE_BINARY=1) now passes the host
stress test (scripts/test_batching_stress.sh) in binary mode through
200k cross-thread event pairs with no loss or duplication. Update the
format spec status from "pending macOS verification" accordingly;
arm64e on-device validation is still recommended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Comprehensive rewrite for clarity and accuracy:
- Add table of contents, "How It Works" diagram, and a platform/hook
  support matrix reflecting validated arm64 vs preview arm64e auto-hook.
- Fix the framework build instructions (cannot cd into a .xcodeproj;
  build from repo root) and add the arm64e override invocation.
- Correct the trace output path to <app sandbox>/Library/appletracedata.
- Document the binary fragment format, APTSyncWait/APTIsObjcMsgSendHook
  controls, and APPLETRACE_BINARY; fix an invalid badge color.
- Move the Star History chart to the very bottom (and drop the forced
  dark theme so it renders in both light and dark mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

arm64e auto-hooking would require rebinding pointer-authenticated GOT
entries (__DATA_CONST.__auth_got) and re-signing pointers with the
correct ptrauth context, which was never validated on device. Rather
than ship a preview-quality hook, drop arm64e entirely:

- hook_objc_msgSend.m now hard-errors (#error) when built for arm64e,
  so the unsupported configuration fails loudly instead of silently
  producing a broken hook. arm64 compiles and the framework builds as
  before.
- Update README, README_CN, AGENT, CONTRIBUTING, ROADMAP, and the
  binary-format doc to state arm64-only and remove the arm64e
  "preview / on-device validation" language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add a macOS CI job that runs scripts/test_batching_stress.sh, which
compiles appletrace.mm with the stress harness and verifies the
per-thread batched writer in both text and binary (APPLETRACE_BINARY=1)
modes — asserting no events are lost or duplicated across threads,
flushes, and thread exits. Needs only system python3 + clang++, matching
the existing smoke-test jobs (no pip deps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pytest <9.0.3 uses a predictable /tmp/pytest-of-{user} directory that
lets local users cause a DoS or possibly escalate privileges
(GHSA-6w46-j5rx-g56g). The old pin (>=8.0,<9.0) sat entirely in the
vulnerable range. 9.0.3 is the patched release and passes the existing
suite (24 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
everettjf merged commit 5be7fdc into master on May 22, 2026
8 checks passed