Revitalize AppleTrace: faster hook, Perfetto-only, richer events #12
Merged
Document hot-path performance opportunities (string interning, per-thread ring buffers), visualization modernization (Perfetto over deprecated Catapult), trace format improvements, a competitive comparison, and a phased plan. https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
Reposition the project as actively developed (drop maintenance-mode and Messier migration messaging) and land the high-value roadmap items:
- objc_msgSend hook: intern (Class, SEL) names and use a zero-allocation per-thread call stack, removing per-message malloc/snprintf from the hot path; add runtime class-prefix allow/deny filtering.
- Runtime: emit thread_name metadata so threads are labeled, and add APTInstant and APTCounter event APIs.
- Tooling: stream merge.py output for large captures.
- Docs: Perfetto-first visualization, document new APIs/env vars, add CONTRIBUTING.md, and refresh README/README_CN/AGENT/ROADMAP.
https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
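For reference, the new event kinds map naturally onto Chrome Trace Event Format records, which Perfetto imports. The sketch below assumes AppleTrace emits the standard fields for these kinds (`ph` of `"M"`, `"i"`, and `"C"`). The helpers `thread_name_event`, `instant_event`, and `counter_event` are hypothetical and are not the runtime's API.

```python
import json

# Hypothetical helpers: they build Trace Event Format records of the kind the
# new runtime APIs are described as producing. Field choices are assumptions.

def thread_name_event(pid, tid, name):
    # "M" = metadata; Perfetto uses args.name as the label for this thread's track.
    return {"name": "thread_name", "ph": "M", "pid": pid, "tid": tid,
            "args": {"name": name}}

def instant_event(name, pid, tid, ts_us):
    # "i" = instant; "s": "t" scopes the marker to a single thread.
    return {"name": name, "ph": "i", "s": "t", "pid": pid, "tid": tid,
            "ts": ts_us}

def counter_event(name, pid, ts_us, **series):
    # "C" = counter; each key in args becomes one series on the counter track.
    return {"name": name, "ph": "C", "pid": pid, "ts": ts_us, "args": series}

events = [
    thread_name_event(1, 259, "main"),
    instant_event("cache_miss", 1, 259, 1000),
    counter_event("queue_depth", 1, 1010, depth=3),
]
print(json.dumps(events, indent=1))
```

Timestamps are in microseconds, which is the Trace Event Format's default unit.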
Add an opt-in --complete flag to merge.py and the unified CLI that collapses matched begin/end pairs into X complete events (LIFO per pid/tid), passing through metadata, counter, and instant events untouched and preserving unmatched begins. Covered by new tests; documented in README/README_CN and marked done in ROADMAP. https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
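Here is a minimal sketch of the collapsing rule as described. It keeps one stack per (pid, tid), and each `E` closes the most recent open `B` on the same thread. The sketch assumes events arrive in per-thread timestamp order. The commit only specifies preserving unmatched begins, so passing unmatched `E` events through is an assumption. `collapse_complete` is a hypothetical name, not merge.py's function.

```python
def collapse_complete(events):
    """Collapse matched B/E pairs into X events, LIFO per (pid, tid).

    Everything that is not a matched B/E passes through untouched. That covers
    metadata, counters, instants, async events, and (in this sketch) unmatched
    E events. Begins still open at the end are kept as raw B events.
    """
    open_stacks = {}  # (pid, tid) -> stack of open B events
    out = []
    for ev in events:
        key = (ev.get("pid"), ev.get("tid"))
        if ev.get("ph") == "B":
            open_stacks.setdefault(key, []).append(ev)
        elif ev.get("ph") == "E" and open_stacks.get(key):
            begin = open_stacks[key].pop()  # innermost open section on this thread
            out.append(dict(begin, ph="X", dur=ev["ts"] - begin["ts"]))
        else:
            out.append(ev)
    for stack in open_stacks.values():
        out.extend(stack)  # unmatched begins survive unchanged
    return out

events = [
    {"name": "outer", "ph": "B", "pid": 1, "tid": 1, "ts": 0},
    {"name": "inner", "ph": "B", "pid": 1, "tid": 1, "ts": 10},
    {"name": "inner", "ph": "E", "pid": 1, "tid": 1, "ts": 20},
    {"name": "outer", "ph": "E", "pid": 1, "tid": 1, "ts": 50},
    {"name": "thread_name", "ph": "M", "pid": 1, "tid": 1, "args": {"name": "main"}},
    {"name": "pending", "ph": "B", "pid": 1, "tid": 2, "ts": 5},
]
for ev in collapse_complete(events):
    print(ev["ph"], ev["name"], ev.get("dur"))
```

Inner sections close first, so an inner `X` is emitted before its parent. If a consumer needs timestamp order, sort the output by `ts`.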
Per project direction: visualization is now Perfetto-only and the platform scope is arm64/arm64e.
- Remove the Catapult/Chrome HTML pipeline: delete get_catapult.sh and the sampledata HTML demo, drop the html/all CLI subcommands, and repurpose go.sh and `appletrace_cli.py open` to merge and open ui.perfetto.dev.
- Default merge.py to X complete events (--raw opts out); pass async events through unchanged.
- Add APTAsyncBegin/APTAsyncEnd nestable async events for work that crosses threads/queues, with tests.
- Clarify arm64/arm64e scope in the hook guard/comment (note arm64e ptrauth GOT rebinding needs on-device validation).
- Refresh README/README_CN/AGENT/CONTRIBUTING/ROADMAP for Perfetto-only and arm64/arm64e; remove x86_64 as a goal.
https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX
Detail the Phase 2 hot-path change: per-thread accumulation buffers with a global registry, locking discipline that keeps the hot path uncontended, cross-thread APTFlush that preserves the current contract, thread-exit handling, edge cases to verify on device, an optional binary-event follow-on, and a macOS verification plan. Linked from ROADMAP.
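The registry-and-drain pattern can be sketched in Python. In this sketch, `threading.Lock` stands in for `os_unfair_lock` and a lock-guarded list stands in for the serial writer queue. Python offers no pthread-key destructor hook here, so `thread_exit()` is called explicitly to simulate it. All names (`BatchedWriter`, `append`, `flush`, `thread_exit`) are hypothetical.

```python
import threading

class _ThreadBuffer:
    def __init__(self):
        self.lock = threading.Lock()  # contended only when another thread flushes
        self.lines = []

class BatchedWriter:
    def __init__(self, threshold=4):
        self.threshold = threshold
        self._local = threading.local()
        self._registry = set()                # every buffer that may still hold lines
        self._registry_lock = threading.Lock()
        self._writer_lock = threading.Lock()  # stands in for the serial writer queue
        self.written = []

    def _buffer(self):
        buf = getattr(self._local, "buf", None)
        if buf is None:
            buf = _ThreadBuffer()
            self._local.buf = buf
            with self._registry_lock:         # taken once per thread, not per event
                self._registry.add(buf)
        return buf

    def _ship(self, batch):
        with self._writer_lock:
            self.written.extend(batch)

    def _drain(self, buf):
        with buf.lock:
            batch, buf.lines = buf.lines, []
            if batch:
                self._ship(batch)  # shipped under buf.lock so a thread's batches stay ordered

    def append(self, line):
        buf = self._buffer()
        with buf.lock:
            buf.lines.append(line)
            if len(buf.lines) >= self.threshold:
                batch, buf.lines = buf.lines, []
                self._ship(batch)

    def flush(self):
        # Snapshot the registry, then drain each buffer under its own lock.
        with self._registry_lock:
            buffers = list(self._registry)
        for buf in buffers:
            self._drain(buf)

    def thread_exit(self):
        # Simulates the pthread-key destructor: drain, then unregister.
        buf = getattr(self._local, "buf", None)
        if buf is None:
            return
        self._drain(buf)
        with self._registry_lock:
            self._registry.discard(buf)
        del self._local.buf

w = BatchedWriter(threshold=4)

def worker(tid):
    for i in range(10):
        w.append(f"{tid}:{i}")
    if tid % 2 == 0:
        w.thread_exit()  # even threads drain at exit; odd ones wait for flush()

threads = [threading.Thread(target=worker, args=(t,)) for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
w.flush()
print(len(w.written))  # 80
```

Shipping while still holding the buffer lock is this sketch's way of keeping one thread's batches in order. The native writer's ordering and locking rules may differ; docs/perf-batching-design.md is the reference.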
Replace the per-event dispatch_async with a per-thread accumulation buffer: the hot path appends a formatted line under that buffer's os_unfair_lock and ships a whole batch to the serial writer queue only when it crosses a threshold. A registry plus a pthread-key destructor let APTFlush drain every thread and reclaim buffers at thread exit, preserving the flush contract. LoggerManager gains AddBlock, which splits a batch on line boundaries so fragment rollover never splits a JSON object. Unverified: Objective-C/C++ was not compiled in this environment; needs a macOS build and Instruments profiling per docs/perf-batching-design.md.
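The line-boundary rule behind `AddBlock` is easiest to see in isolation. This is a Python sketch, not the C++ implementation. `FragmentWriter`, `add_block`, and the byte-size rollover threshold are assumptions chosen for illustration.

```python
import json

class FragmentWriter:
    """Rolls fragments over at max_bytes, but only ever between lines."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.fragments = [b""]

    def add_block(self, block: bytes):
        # A batch may hold many lines; split it so each line lands whole.
        for line in block.splitlines(keepends=True):
            current = self.fragments[-1]
            if current and len(current) + len(line) > self.max_bytes:
                self.fragments.append(b"")  # roll over before this line
            # A single line longer than max_bytes still goes in whole,
            # in a fragment of its own.
            self.fragments[-1] += line

fw = FragmentWriter(max_bytes=20)
fw.add_block(b'{"a":1}\n{"b":22}\n{"c":333}\n')
for frag in fw.fragments:
    print([json.loads(line) for line in frag.splitlines()])
# [{'a': 1}, {'b': 22}]
# [{'c': 333}]
```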
The framework compiles as gnu++0x (C++11) but the batching writer used std::make_unique (C++14), failing the simulator build. Use a direct unique_ptr construction and include <utility> for std::move.
scripts/test_batching_stress.sh compiles appletrace.mm with a multi-threaded harness (tests/stress/stress_main.mm), runs it, merges the output, and asserts that exactly threads*pairs "stress" complete events survive — verifying the per-thread buffers, cross-thread APTFlush, and thread-exit drain lose or duplicate nothing. Host-only (no Xcode/simulator), so it is independent of the existing smoke jobs.
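The survival check reduces to an exact count: a lost event lowers it and a duplicated one raises it. Below is a sketch in Python. It assumes the merged output is a JSON array of trace events with pairs already collapsed to `X`. `check_stress` is hypothetical and is not taken from the script.

```python
def check_stress(events, threads, pairs):
    """Raise unless exactly threads * pairs "stress" X events survived."""
    found = sum(1 for e in events
                if e.get("ph") == "X" and e.get("name") == "stress")
    expected = threads * pairs
    if found != expected:
        raise AssertionError(f"expected {expected} stress events, found {found}")
    return found

# Tiny synthetic merged trace: 4 threads x 2 pairs, plus one metadata event.
merged = [{"name": "stress", "ph": "X", "pid": 1, "tid": i % 4, "ts": i, "dur": 1}
          for i in range(8)]
merged.append({"name": "thread_name", "ph": "M", "pid": 1, "tid": 0,
               "args": {"name": "w0"}})
print(check_stress(merged, threads=4, pairs=2))  # 8
```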
Fix leftover Chrome/HookZz references, repair the broken intro sentence, modernize the hook-status and dynamic-hook sections (env var / in-app install instead of the stale LLDB loader flow), align the Python version, and add the batching stress test to the testing steps in README and README_CN.
Lock a compact binary fragment format (interned name table + fixed-layout records) and implement the decoder in appletrace_binary.py, wired into merge.py: fragments are detected by magic and decode to the same Chrome/Perfetto JSON as the text path, feeding the X complete-event collapsing. Tolerates crash zero-padding and truncation. Covered by tests; format and the remaining native writer documented in docs/binary-fragment-format.md.
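Here is a decoder over an invented layout that shows what "detected by magic" and the crash-tolerance rules mean in practice. The magic bytes, record tags, and field widths below are placeholders, not the format locked in docs/binary-fragment-format.md. The logic follows the description: a magic+pid header, name-table records, fixed-layout event records, a shared name table, and a clean stop at zero padding or a truncated record. All function names are hypothetical.

```python
import struct

# Invented layout for illustration only; the format doc defines the real one.
MAGIC = b"APTB"                  # placeholder magic
HEADER = struct.Struct("<4sI")   # magic, pid (written once per fragment)
NAME_DEF = struct.Struct("<IH")  # name id, UTF-8 byte length; name bytes follow
EVENT = struct.Struct("<BIQQ")   # phase char, name id, tid, ts in microseconds
KIND_NAME, KIND_EVENT = 1, 2     # record tag byte; 0 means zero padding

def is_binary_fragment(data: bytes) -> bool:
    return len(data) >= HEADER.size and data[:4] == MAGIC

def decode_fragment(data: bytes, names: dict) -> list:
    """Decode one fragment to trace-event dicts.

    `names` is shared across all fragments of a run, so a definition seen in
    an earlier fragment resolves a reference in a later one.
    """
    if not is_binary_fragment(data):
        raise ValueError("missing binary fragment magic")
    _, pid = HEADER.unpack_from(data, 0)
    off, events = HEADER.size, []
    while off < len(data):
        kind = data[off]
        if kind == 0:                   # crash zero-padding: stop cleanly
            break
        off += 1
        if kind == KIND_NAME:
            if off + NAME_DEF.size > len(data):
                break                   # truncated record header
            nid, length = NAME_DEF.unpack_from(data, off)
            off += NAME_DEF.size
            if off + length > len(data):
                break                   # truncated name bytes
            names[nid] = data[off:off + length].decode("utf-8")
            off += length
        elif kind == KIND_EVENT:
            if off + EVENT.size > len(data):
                break                   # truncated event record
            ph, nid, tid, ts = EVENT.unpack_from(data, off)
            off += EVENT.size
            events.append({"name": names.get(nid, f"<unresolved {nid}>"),
                           "ph": chr(ph), "pid": pid, "tid": tid, "ts": ts})
        else:
            break                       # unknown tag: treat the rest as corrupt
    return events

# Usage: two fragments of one run; the name is defined only in the first.
def _name(nid, s):
    b = s.encode("utf-8")
    return bytes([KIND_NAME]) + NAME_DEF.pack(nid, len(b)) + b

def _event(ph, nid, tid, ts):
    return bytes([KIND_EVENT]) + EVENT.pack(ord(ph), nid, tid, ts)

frag1 = HEADER.pack(MAGIC, 42) + _name(7, "-[Foo bar]") + _event("B", 7, 1, 100)
frag2 = HEADER.pack(MAGIC, 42) + _event("E", 7, 1, 150) + bytes(16)  # zero tail
names = {}
print(decode_fragment(frag1, names) + decode_fragment(frag2, names))
```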
Emit fixed-layout binary records into the per-thread batch buffers instead of JSON when APPLETRACE_BINARY=1, keeping all formatting off the hot path. Names are interned per thread (globally unique ids from an atomic counter), so a thread only references ids it defined, which is safe with batched, out-of-order flushing. LoggerManager writes the magic+pid header per fragment, rolls over whole batches so records are never split, and names fragments .appletracebin. The exporter shares one name table across a run's fragments (a definition in an earlier fragment resolves a reference in a later one after rollover). The text path is unchanged and remains the default, so CI/smoke jobs are unaffected. test_batching_stress.sh now verifies both text and binary modes. Unverified: Objective-C/C++ not compiled here; validate on macOS via scripts/test_batching_stress.sh.
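The interning scheme can be sketched in Python, with a lock-guarded counter standing in for the atomic fetch-add. Ids are globally unique, so two threads that intern the same name get different ids and never collide. Each thread also emits a definition before its first reference, so no thread depends on another thread's batch having been flushed. `ThreadInterner` and its record tuples are hypothetical.

```python
import itertools
import threading

_ids = itertools.count(1)
_ids_lock = threading.Lock()  # stands in for an atomic fetch-add

def _next_id():
    with _ids_lock:
        return next(_ids)

class ThreadInterner:
    """One per thread: maps names to ids this thread defined itself."""

    def __init__(self):
        self.ids = {}
        self.records = []  # what this thread would append to its batch buffer

    def event(self, name, ph, ts):
        nid = self.ids.get(name)
        if nid is None:
            nid = _next_id()
            self.ids[name] = nid
            self.records.append(("name", nid, name))  # definition before first use
        self.records.append(("event", nid, ph, ts))
        return nid

results = {}

def worker(key):
    interner = ThreadInterner()
    interner.event("-[Foo bar]", "B", 1)
    interner.event("-[Foo bar]", "E", 2)
    results[key] = interner

threads = [threading.Thread(target=worker, args=(k,)) for k in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(i.ids["-[Foo bar]"] for i in results.values()))  # [1, 2]
```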
The native binary writer (APPLETRACE_BINARY=1) now passes the host stress test (scripts/test_batching_stress.sh) in binary mode through 200k cross-thread event pairs with no loss or duplication. Update the format spec status from "pending macOS verification" accordingly; arm64e on-device validation is still recommended. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive rewrite for clarity and accuracy:
- Add table of contents, "How It Works" diagram, and a platform/hook support matrix reflecting validated arm64 vs preview arm64e auto-hook.
- Fix the framework build instructions (cannot cd into a .xcodeproj; build from repo root) and add the arm64e override invocation.
- Correct the trace output path to <app sandbox>/Library/appletracedata.
- Document the binary fragment format, APTSyncWait/APTIsObjcMsgSendHook controls, and APPLETRACE_BINARY; fix an invalid badge color.
- Move the Star History chart to the very bottom (and drop the forced dark theme so it renders in both light and dark mode).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
arm64e auto-hooking would require rebinding pointer-authenticated GOT entries (__DATA_CONST.__auth_got) and re-signing pointers with the correct ptrauth context, which was never validated on device. Rather than ship a preview-quality hook, drop arm64e entirely:
- hook_objc_msgSend.m now hard-errors (#error) when built for arm64e, so the unsupported configuration fails loudly instead of silently producing a broken hook. arm64 compiles and the framework builds as before.
- Update README, README_CN, AGENT, CONTRIBUTING, ROADMAP, and the binary-format doc to state arm64-only and remove the arm64e "preview / on-device validation" language.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a macOS CI job that runs scripts/test_batching_stress.sh, which compiles appletrace.mm with the stress harness and verifies the per-thread batched writer in both text and binary (APPLETRACE_BINARY=1) modes — asserting no events are lost or duplicated across threads, flushes, and thread exits. Needs only system python3 + clang++, matching the existing smoke-test jobs (no pip deps). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pytest <9.0.3 uses a predictable /tmp/pytest-of-{user} directory that
lets local users cause a DoS or possibly escalate privileges
(GHSA-6w46-j5rx-g56g). The old pin (>=8.0,<9.0) sat entirely in the
vulnerable range. 9.0.3 is the patched release and passes the existing
suite (24 tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Repositions AppleTrace as an actively developed, lightweight, embeddable tracer
that produces Perfetto-ready traces. Drops the maintenance-mode / Messier
messaging and lands the high-value items from the new ROADMAP.md.
Performance (objc_msgSend hook)
- Intern (Class, SEL) → formatted name (and the "do not trace" decision), so the hot path no longer does a malloc/snprintf per message.
- Use a zero-allocation per-thread array stack.
- … pop can't unwind an unrelated parent section.
- Runtime class-prefix allow/deny filtering (APPLETRACE_TRACE_CLASS_ALLOW / APPLETRACE_TRACE_CLASS_DENY).
Per-thread batched writing
- Replace the per-event dispatch_async with a per-thread accumulation buffer: the hot path appends under that buffer's os_unfair_lock and ships a whole batch to the serial writer only when it crosses a threshold.
- A registry plus a pthread-key destructor let APTFlush drain every thread and reclaim buffers at thread exit, preserving the flush contract.
- LoggerManager::AddBlock splits a batch on line boundaries so fragment rollover never splits a JSON object.
Design: docs/perf-batching-design.md.
Richer events
- Emit thread_name metadata so threads are labeled in Perfetto.
- New event APIs: APTInstant, APTCounter, APTAsyncBegin/APTAsyncEnd.
Visualization: Perfetto only
- Remove the Catapult/Chrome HTML pipeline (get_catapult.sh, sampledata/trace.html, sampledata/genhtml.sh, the html/all CLI subcommands).
- go.sh and appletrace_cli.py open now merge and open ui.perfetto.dev.
- merge.py streams output and defaults to X complete events (--raw opts out), roughly halving the section-event count.
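"Streams output" can be read as writing the merged array incrementally instead of building it in memory. The sketch below assumes each text fragment line holds one JSON event, optionally followed by a comma; the real fragment line format may differ. `iter_fragment_events` and `stream_json_array` are hypothetical names.

```python
import io
import json

def iter_fragment_events(lines):
    # Parse lazily: one event per non-empty line, trailing comma tolerated.
    for line in lines:
        line = line.strip().rstrip(",")
        if line:
            yield json.loads(line)

def stream_json_array(events, out):
    # Emit a JSON array one element at a time; memory use does not grow
    # with the number of events.
    out.write("[")
    for i, ev in enumerate(events):
        if i:
            out.write(",\n")
        out.write(json.dumps(ev, separators=(",", ":")))
    out.write("]\n")

fragment = ['{"name":"a","ph":"B","ts":1},', '', '{"name":"a","ph":"E","ts":4},']
buf = io.StringIO()
stream_json_array(iter_fragment_events(fragment), buf)
print(buf.getvalue())
```

Because both functions are generators or writers, large captures never need to fit in memory as a single list.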
Platform
- arm64/arm64e scope; the arm64e auto-hook rebinds authenticated __auth_got entries and still needs on-device ptrauth validation.
Docs & housekeeping
- Add CONTRIBUTING.md and ROADMAP.md.
Test plan
- python3 -m pytest tests: 11 passing (merge streaming, X complete-event collapsing incl. nesting / per-thread isolation / async + metadata pass-through).
- test job green; native simulator smoke jobs re-run after the C++11 build fix (the project compiles as gnu++0x, so std::make_unique was replaced).
- scripts/test_batching_stress.sh: builds appletrace.mm with a multi-threaded harness and asserts no events are lost/duplicated across threads + flush + thread-exit. Run it a few times to shake out concurrency issues.
- TraceAllMsgDemo: confirm dispatch_async count and peak queue memory dropped.
https://claude.ai/code/session_018QmSENiXvZHgJVBTnemWLX