Conversation
`clear profile` now writes heap.pb.gz, lock.pb.gz, and mvcc.pb.gz
into the .profile/ dir alongside the existing text reports. CPU
profiles are delegated to `perf_to_profile` (the standard converter
from perf.data to pprof format); we print a one-line install hint
when it's missing.
Pure-Ruby pprof v3 encoder (`src/tools/pprof.rb`, ~190 lines) so the
feature stays stdlib-only — no `google-protobuf` C-extension dep.
Sample-type columns mirror Go's conventions so pprof's standard
flags work out of the box:
- heap: alloc_objects / alloc_space / inuse_objects / inuse_space
- lock: contentions / delay / hold / acquisitions
- mvcc: reads / commits / retries / cow_bytes
(cow_bytes = struct_size * (commits + retries))
Each per-site address resolves via addr2line + the transpiler's
`// CLR:N` markers, so `pprof -list <fn>` shows CLEAR source lines.
Verified end-to-end: `pprof -top -alloc_space heap.pb.gz` reads
function names correctly (clearMain / entryWrapper / makeResultSet
/ intToString) and `-tags` shows the per-sample addr labels. 20 new
specs cover the wire encoder, dedup, gzip envelope, and per-format
column math; 4096 unit specs total, 0 failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
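The encoder itself is not shown in this message. As a rough illustration of what a stdlib-only pprof writer has to do, here is a minimal sketch of the protobuf wire primitives and the gzip envelope. The helper names (`encode_varint`, `encode_field_varint`, `encode_field_bytes`, `gzip_envelope`) are hypothetical and are not taken from `src/tools/pprof.rb`.

```ruby
require "zlib"

# Hypothetical helpers for illustration; not the names used in src/tools/pprof.rb.

# Encode a non-negative integer as a protobuf base-128 varint.
def encode_varint(n)
  raise ArgumentError, "negative values are not handled in this sketch" if n.negative?
  bytes = []
  loop do
    low = n & 0x7f
    n >>= 7
    if n.zero?
      bytes << low          # last byte: continuation bit clear
      break
    end
    bytes << (low | 0x80)   # more bytes follow: continuation bit set
  end
  bytes.pack("C*")
end

# Wire type 0: a varint-valued field.
def encode_field_varint(field_no, value)
  encode_varint((field_no << 3) | 0) + encode_varint(value)
end

# Wire type 2: a length-delimited field (strings, bytes, nested messages).
def encode_field_bytes(field_no, payload)
  payload = payload.b
  encode_varint((field_no << 3) | 2) + encode_varint(payload.bytesize) + payload
end

# pprof expects the serialized Profile message to be gzip-compressed.
def gzip_envelope(bytes)
  Zlib.gzip(bytes)
end
```

A string-table entry, for example, would be written as `encode_field_bytes(6, "alloc_space")`, since `string_table` is field 6 of the `Profile` message in pprof's profile.proto.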
Adds tab-completion for bash, zsh, and fish. The CLI is the source of
truth for the completion script: regenerating `clear completions bash`
after a subcommand change refreshes everything.
Per-subcommand semantics:
- build / run / fmt / fix / profile / explain -> *.cht files
- test / benchmark -> *.cht or dirs
- doctor -> *.profile/ dirs
- completions -> bash | zsh | fish
Install (bash):
    echo 'source <(clear completions bash)' >> ~/.bashrc
Install (zsh):
    mkdir -p ~/.zsh/completions
    clear completions zsh > ~/.zsh/completions/_clear
    # ensure fpath=(~/.zsh/completions $fpath) BEFORE compinit
Install (fish):
    clear completions fish > ~/.config/fish/completions/clear.fish
17 specs cover script structure, dispatch, and CLI integration.
Live-verified on this machine: tab after `clear doctor examples/litedb/`
suggests only the .profile/ directory; tab after
`clear profile examples/litedb/` suggests the .cht file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
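The per-subcommand semantics above amount to a small dispatch table. The sketch below models that table in Ruby. The constant and method names are hypothetical, and it assumes directory entries are passed with a trailing slash; the generated shell scripts themselves are not reproduced here.

```ruby
# Hypothetical model of the dispatch table; the real CLI generates shell
# scripts from its own subcommand definitions.
COMPLETION_KINDS = {
  %w[build run fmt fix profile explain] => :cht_files,
  %w[test benchmark] => :cht_files_or_dirs,
  %w[doctor] => :profile_dirs,
  %w[completions] => :shells,
}.freeze

# Look up which kind of candidate a subcommand accepts.
def completion_kind(subcommand)
  COMPLETION_KINDS.each { |names, kind| return kind if names.include?(subcommand) }
  nil
end

# Filter directory entries (directories end in "/") down to valid candidates.
def candidates(subcommand, entries)
  case completion_kind(subcommand)
  when :cht_files then entries.select { |e| e.end_with?(".cht") }
  when :cht_files_or_dirs then entries.select { |e| e.end_with?(".cht", "/") }
  when :profile_dirs then entries.select { |e| e.end_with?(".profile/") }
  when :shells then %w[bash zsh fish]
  else []
  end
end
```

With entries `["litedb.cht", "litedb.profile/", "src/"]`, `doctor` keeps only the `.profile/` directory and `profile` keeps only the `.cht` file, which matches the live-verified behavior described above.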
`alloc-profile.zig` now captures up to 16 stack frames per allocation
via `std.debug.captureCurrentStackTrace` (FP-based unwinder; profile
builds keep frame pointers via omit_frame_pointer=false). Hash table
keyed by the full trace, so distinct call paths to the same helper
(intToString, makeResultSet, etc.) resolve as separate sites instead
of collapsing into one.
Format change (alloc-profile v1 -> v2): the first column of alloc.txt
goes from a single hex addr to a comma-separated leaf-first trace.
Both consumers updated to parse it; v1 single-addr files still parse
as 1-element traces.
    alloc.txt before: 0x401234 1000 40000 500 20000 500
    alloc.txt after:  0x401234,0x402000 1000 40000 500 20000 500
Stack budget: trace buffer is `[16]usize` = 128 B on the recorder's
stack, well within the 60 KB Large stacks `clear profile` already
forces. captureCurrentStackTrace itself is a frame-pointer walk, no
additional frames pushed. SpinLock prevents recorder reentrance, so
the existing no-allocation contract holds.
Sampling: `clear profile --sample=N` records every Nth alloc and
scales captured values by N at record time so pprof / doctor see
estimated totals. Default N=1 (no sampling). Header records `sample_n`
so consumers can flag the approximation. Useful for hot allocators
where the per-alloc unwind cost matters; otherwise leave off for
accurate counts.
Consumer updates:
- src/tools/pprof_converter.rb -- emits multi-frame
Sample.location_id; shares Locations across samples that include
the same frame (a shared frame like entryWrapper appears in every
trace and is interned once).
- src/tools/doctor.rb -- splits the addrs column into a trace; leaf
addr remains the "primary site" so existing top-N output is
unchanged but now backed by trace data for future caller-aware
views.
End-to-end verified on examples/litedb: pprof tree shows real
caller/callee relationships (`entryWrapper -> clearMain ->
intToString`) where previously cum% always equaled flat%.
4095 unit specs / 0 failures; 2 new specs cover multi-frame parsing
and Location reuse. lock-profile and mvcc-profile remain single-frame
for now; same pattern applies and will follow as separate commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
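To make the v2 row format and the Location sharing concrete, here is a minimal sketch of how a consumer might parse a row and intern frames. `parse_alloc_row` and `LocationTable` are hypothetical names, and the numeric columns are returned in file order without assuming what each one means.

```ruby
# Parse one alloc.txt data row. The first column is a comma-separated,
# leaf-first trace (v2) or a single address (v1, read as a 1-frame trace).
# The numeric columns are returned in order; their meaning is not assumed here.
def parse_alloc_row(line)
  trace_col, *counters = line.split
  {
    trace: trace_col.split(",").map { |addr| Integer(addr, 16) },
    counters: counters.map { |c| Integer(c, 10) },
  }
end

# Intern frame addresses so a frame shared by many traces maps to one id.
class LocationTable
  def initialize
    @ids = {}
  end

  # Returns a stable, non-zero id for the address, allocating one on first use.
  def id_for(addr)
    @ids[addr] ||= @ids.size + 1
  end

  def size
    @ids.size
  end
end

row = parse_alloc_row("0x401234,0x402000 1000 40000 500 20000 500")
table = LocationTable.new
location_ids = row[:trace].map { |addr| table.id_for(addr) } # leaf first
```

Because ids are handed out on first sight, a second sample whose trace ends in the same outer frame reuses that frame's id rather than creating a new Location.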
Adds optional per-record stack capture to lock-profile (v3) and
mvcc-profile (v2). Off by default; opt in via:
clear profile foo.cht --sync-callstacks
When on, each (lock, caller-trace) pair becomes a distinct row, so
pprof's tree/flame views show per-caller attribution. Doctor
aggregates rows back to one-per-lock for its existing diagnoses
(no behavior change there). Same shape for (cell, caller-trace) in
mvcc-profile.
The flag is off by default because the FP walk costs ~100-500ns per
record, while uncontended mutex acquire is ~10-20ns and MVCC commit
fast paths are ~20-50ns -- the trace can dominate the operation it
measures. When --sync-callstacks is set without an explicit --sample,
we auto-default to --sample=100 to keep the cost manageable. Users
can pass --sample=1 to opt in to full capture at full cost.
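The trade-off is simple arithmetic. The sketch below uses the figures quoted above and assumes the unwind is skipped entirely on records that are not sampled; `amortized_trace_ns` is a hypothetical helper.

```ruby
# Average unwind cost per record when only every Nth record is traced.
# Assumes unsampled records pay no unwind cost at all.
def amortized_trace_ns(trace_ns, stride)
  trace_ns.to_f / stride
end

[1, 100].each do |stride|
  low = amortized_trace_ns(100, stride)
  high = amortized_trace_ns(500, stride)
  puts "--sample=#{stride}: #{low}-#{high} ns of unwind cost per record on average"
end
```

At a stride of 100 the average cost falls to 1-5 ns per record, below the ~10-20 ns of an uncontended acquire, whereas at a stride of 1 the 100-500 ns unwind dominates.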
FP-unwind through parking-lot verified end-to-end:
parking_mutex_lock -> clearMain -> entryWrapper
parking_mutex_lock -> clearMain.__DoBranchCtx0_0.run -> ...
addr2line resolves all captured frames cleanly. Frame pointers are
already retained in profile builds (omit_frame_pointer = false), and
captureCurrentStackTrace's FP walker handles parking-lot's hot/slow
paths without symbol noise.
New shared module zig/runtime/profile-trace.zig:
- MAX_FRAMES (single source of truth, 16)
- captureFromHere() -- thin wrapper around
std.debug.captureCurrentStackTrace
- syncCallstacksEnabled() -- reads CLEAR_PROFILE_SYNC_CALLSTACKS once
- sampleStride() -- reads CLEAR_PROFILE_SAMPLE once
alloc-profile uses the shared MAX_FRAMES + captureFromHere; its
sample-stride read is unchanged so it doesn't move under us.
Wire format:
- lock-profile v3: 12th tab-separated column `caller_trace`. `-` =
empty (sync-callstacks off); comma-separated leaf-first addrs
when populated.
- mvcc-profile v2: 8th column same shape.
Tab-separated lines so the trace field can carry commas without a
column-count ambiguity. Older whitespace-only files still parse via
fallback split.
Consumers (doctor.rb, pprof_converter.rb) updated to:
- tab-split the line; fall back to whitespace-split if too few cols
- parse `-` as empty trace
- aggregate (addr, trace) rows by addr in doctor for the per-lock /
per-cell view
- emit multi-frame Sample.location_id in pprof_converter, with the
lock/cell pointer as the Sample's leaf and caller frames stacked
below
4 new specs cover the v3/v2 wire format, empty-trace handling, multi-
frame stack emission, and per-row sample emission. 4099 unit specs
total / 0 failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
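The consumer-side rules listed above (tab split with whitespace fallback, `-` as an empty trace, aggregation by addr) can be sketched as follows. The helper names are hypothetical, and the example uses a simplified three-column row (addr, contentions, caller_trace) rather than the real 12-column lock-profile layout.

```ruby
# Split a profile row: prefer tabs, fall back to whitespace for older files.
def split_row(line, min_cols)
  cols = line.chomp.split("\t")
  cols = line.split if cols.size < min_cols
  cols
end

# Parse the caller_trace column; "-" (or a missing column) means no trace.
def parse_trace(field)
  return [] if field.nil? || field == "-"
  field.split(",").map { |addr| Integer(addr, 16) }
end

# Collapse (addr, trace) rows back to one row per addr, as doctor's
# per-lock view needs. Only a single counter is summed in this sketch.
def aggregate_by_addr(rows)
  rows.group_by { |r| r[:addr] }.map do |addr, group|
    { addr: addr, contentions: group.sum { |r| r[:contentions] } }
  end
end

lines = ["0xa0\t5\t0x401234,0x402000", "0xa0\t7\t0x401999", "0xb0 1"]
rows = lines.map do |line|
  addr, contentions, trace = split_row(line, 3)
  { addr: Integer(addr, 16), contentions: Integer(contentions), trace: parse_trace(trace) }
end
per_lock = aggregate_by_addr(rows)
```

The third line has no tabs, so it goes through the whitespace fallback and ends up with an empty trace, which is the same result a `-` column produces.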
Three new doctor flags that exploit the multi-frame trace data
shipped in the prior commits, plus a Mapping in our pb.gz output so
pprof stops complaining about the missing binary name.
clear doctor foo.profile/ --cumulative
Rank functions by cumulative bytes/allocs across every frame in
each trace. A function high in the call stack accrues its
callees' costs ("intToString shows up on N% of allocation paths"),
where the existing flat view only credits the leaf alloc site.
clear doctor foo.profile/ --focus=REGEX
Keep only sites whose trace touches a function matching the
pattern. Composes with the existing top-N display.
clear doctor new.profile/ --diff old.profile/
Per-function delta on alloc_space, lock contention, and MVCC
retries. Annotates with directional arrows and brief diagnoses:
"newly contended", "retries eliminated", "new retry storm",
cold->hot site lists. Computes deltas ourselves rather than
shelling to `pprof -base` so doctor stays self-contained and can
layer commentary on top of the math.
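The difference between the flat and cumulative views, and the `--focus` filter, can be sketched in a few lines. This assumes each sample already carries a leaf-first list of resolved function names; the method names are hypothetical, and counting a function once per sample even when it recurses is an assumption rather than documented doctor behavior.

```ruby
# Flat view: credit a sample's bytes to its leaf frame only.
def flat_bytes(samples)
  samples.each_with_object(Hash.new(0)) { |s, acc| acc[s[:trace].first] += s[:bytes] }
end

# Cumulative view: credit every distinct function in the trace, so callers
# accrue the cost of their callees. `uniq` avoids double-counting recursion.
def cumulative_bytes(samples)
  samples.each_with_object(Hash.new(0)) do |s, acc|
    s[:trace].uniq.each { |fn| acc[fn] += s[:bytes] }
  end
end

# --focus: keep only samples whose trace touches a matching function.
def focus(samples, pattern)
  samples.select { |s| s[:trace].any? { |fn| fn.match?(pattern) } }
end

samples = [
  { trace: %w[intToString clearMain entryWrapper], bytes: 100 },
  { trace: %w[makeResultSet clearMain entryWrapper], bytes: 300 },
]
flat = flat_bytes(samples)
cumulative = cumulative_bytes(samples)
focused = focus(samples, /intToString/)
```

In this example `clearMain` has no flat bytes at all but 400 cumulative bytes, which is the kind of caller the flat view hides.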
Mapping in pb.gz (src/tools/pprof.rb):
- New `add_mapping(binary:, build_id:)` registers the binary the
profile is about. pprof picks up the filename for its header
("File: litedb" replaces "Main binary filename not available").
- Locations now carry `mapping_id = primary_mapping` so symbol
metadata (has_functions / has_filenames / has_line_numbers)
propagates without per-Location plumbing.
- All three converters (heap / lock / mvcc) call add_mapping when
a binary is available; converter changes are 1 line each.
Source-line view (`pprof -list <fn>`) already worked because
Function.filename was set to source.cht; verified end-to-end.
Diff implementation notes:
- parse_alloc_for_diff keys by leaf-function name when a binary is
available, raw hex addr otherwise (so unit tests can run without
a real binary, and the diff still groups distinct sites).
- Uses the after-dir's binary as the canonical addr2line target
so before/after addresses resolve through the same symbol table.
ASLR / rebuild can shift addresses but the function name stays
stable for the same source.
- Lock diff sums (lock,trace) rows by addr first, then deltas; same
for mvcc by cell pointer.
10 new specs (6 doctor-diff scenarios, 2 pprof Mapping cases, plus
1 cumulative/focus integration each via existing CLI tests). 4108
unit specs total, 0 failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
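The core of a self-contained diff is a per-key delta over two aggregated maps. The sketch below assumes both profiles have already been reduced to `{ key => value }` hashes (function name or raw hex addr as the key). `diff_by_key` and `note_for` are hypothetical names, and the note strings are placeholders, not doctor's actual diagnoses.

```ruby
# Classify a before/after pair. The labels are placeholders for illustration.
def note_for(was, now)
  if was.zero? && now.positive? then "new"
  elsif was.positive? && now.zero? then "eliminated"
  elsif now > was then "up"
  elsif now < was then "down"
  else "unchanged"
  end
end

# Per-key delta between two aggregated profiles, largest change first.
def diff_by_key(before, after)
  rows = (before.keys | after.keys).map do |key|
    was = before.fetch(key, 0)
    now = after.fetch(key, 0)
    { key: key, before: was, after: now, delta: now - was, note: note_for(was, now) }
  end
  rows.sort_by { |row| -row[:delta].abs }
end

before = { "intToString" => 100, "makeResultSet" => 30 }
after = { "intToString" => 150, "newHelper" => 500 }
report = diff_by_key(before, after)
```

Taking the union of keys is what lets the report surface sites that exist on only one side, such as a site that went from cold to hot.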
Three changes building on the prior round of doctor + pprof work:
1. **Runtime-function misattribution fix.** addr2line returns paths
like `/.../runtime/runtime.zig:652` for runtime frames, but our
resolver walked back through `transpiled.zig` looking for `// CLR:`
markers regardless of what file the address came from. As a result
`pprof -list entryWrapper` rendered the runtime function against
random source.cht line numbers. Resolver now only assigns
`clear_line` when addr2line's file is the user's transpiled CLEAR
build target (basename matches `._clear_tmp_*.zig`); runtime and
stdlib frames keep their actual zig file path so `pprof -list`
shows the right source. Discriminator metadata
("(discriminator 4)" appended to the file:line) handled in the
line-extract regex too.
2. **`clear doctor --peek=REGEX`.** Aggregates samples whose trace
touches the regex; reports self-bytes plus the callers (frames
above) and callees (frames below, when matched non-leaf). Mirrors
`pprof -peek`. ~80 lines using the trace data we already capture.
3. **`clear doctor --ignore=REGEX`.** Inverse of `--focus`. Drops
samples whose trace touches a matching function. Composes with
`--focus`: ignore wins on overlap so a focused-but-also-ignored
site is excluded. ~5 lines.
Also: `clear_source_path` is now is_user_zig-aware so user functions
point at `source.cht` while runtime/stdlib functions point at their
actual `.zig` file. pprof's web UI and `-list` show the right source
for each.
10 new specs (5 peek scenarios, 3 ignore scenarios, 2 misattribution
unit checks). 4118 unit specs total / 0 failures. Verified end-to-end:
`pprof -list entryWrapper` now shows runtime.zig:651 ("// 4. EXECUTE
USER CODE"), `pprof -list clearMain` shows source.cht:51.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
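Two of the rules in this commit are easy to show in isolation: deciding whether an addr2line result belongs to the user's transpiled file, and letting `--ignore` win over `--focus`. The sketch below is an illustration under assumptions. The regex, `parse_addr2line`, `user_zig?`, and `keep_sample?` are hypothetical and not the code in the resolver or doctor.rb.

```ruby
# Match "file:line" with an optional trailing "(discriminator N)".
ADDR2LINE_RE = /\A(?<file>.+):(?<line>\d+)(?:\s+\(discriminator \d+\))?\z/

# Returns { file:, line: } or nil when the text has no numeric line.
def parse_addr2line(text)
  m = ADDR2LINE_RE.match(text.strip)
  return nil unless m
  { file: m[:file], line: m[:line].to_i }
end

# Only the user's transpiled build target (basename ._clear_tmp_*.zig) should
# be mapped back to CLEAR source lines; runtime and stdlib frames are not.
def user_zig?(path)
  File.basename(path).match?(/\A\._clear_tmp_.*\.zig\z/)
end

# --ignore wins on overlap: a focused-but-ignored sample is dropped.
def keep_sample?(trace, focus: nil, ignore: nil)
  return false if ignore && trace.any? { |fn| fn.match?(ignore) }
  return true unless focus
  trace.any? { |fn| fn.match?(focus) }
end

frame = parse_addr2line("/build/runtime/runtime.zig:652 (discriminator 4)")
map_to_clear_line = user_zig?(frame[:file]) # false for a runtime frame
```

The paths in the example are made up; only the `._clear_tmp_*.zig` basename pattern and the discriminator suffix come from the commit message.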
Two final pprof additions:
1. **`clear doctor --by=COL`.** Sorts/displays the heap section by one
of `bytes` (default), `allocs`, `inuse_bytes`, or `inuse_allocs`.
Mirrors pprof's `-sample_index=` switching for the doctor view.
Reveals "this site has tons of small allocs" vs "few big ones"
without re-running. Inuse columns are derived from alloc - free at
parse time, so the column is available even when the runtime didn't
record `live_bytes` directly.
2. **channels.txt → channels.pb.gz.** Completes the pb.gz emission
set. One sample per registered channel; sample types: pushes / pops /
push_blocked / pop_blocked / max_depth. Synthetic function names like
`channel#7` give pprof a stable label per channel; the capacity
travels as a per-sample tag (visible via `pprof -tags`). Skipped
(returns nil) when channels.txt has no rows.
Verified end-to-end on transpile-tests/241_open_stream_pipelines,
which exercises `BG STREAM` and produces real channel telemetry:
`pprof -top -sample_index=pushes channels.pb.gz` ranks the channels
by push count.
6 new specs (3 channels conversion, 3 --by pivoting). 4124 unit specs
total / 0 failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
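The `--by` pivot comes down to deriving the in-use columns and sorting by the chosen one. Here is a minimal sketch, assuming each site carries alloc and free counters under the hypothetical keys `allocs`, `bytes`, `frees`, and `freed_bytes` (the real column names in alloc.txt may differ).

```ruby
SORT_COLUMNS = %i[bytes allocs inuse_bytes inuse_allocs].freeze

# Derive in-use figures as alloc minus free, so they are available even
# when no live-bytes column was recorded.
def with_inuse(site)
  site.merge(
    inuse_bytes: site[:bytes] - site[:freed_bytes],
    inuse_allocs: site[:allocs] - site[:frees],
  )
end

# Rank sites by the requested column, largest first.
def rank_sites(sites, by: :bytes)
  raise ArgumentError, "unknown column #{by}" unless SORT_COLUMNS.include?(by)
  sites.map { |s| with_inuse(s) }.sort_by { |s| -s[by] }
end

sites = [
  { name: "many_small", allocs: 1000, bytes: 8000, frees: 1000, freed_bytes: 8000 },
  { name: "few_big", allocs: 2, bytes: 4000, frees: 0, freed_bytes: 0 },
]
by_bytes = rank_sites(sites).map { |s| s[:name] }
by_inuse = rank_sites(sites, by: :inuse_bytes).map { |s| s[:name] }
```

Here `many_small` leads on total bytes but has nothing live, so it drops below `few_big` when ranked by `inuse_bytes`.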
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
+ Coverage   89.60%   89.85%   +0.24%
==========================================
  Files         170      185      +15
  Lines       45996    47991    +1995
  Branches    11290    11953     +663
==========================================
+ Hits        41216    43123    +1907
- Misses       4780     4868      +88
| Branch | pprof |
| Testbed | ubuntu-latest |
| Benchmark | leak-build-ms (units x 1e3) | leak-count (units) | leak-run-ms (units) |
|---|---|---|---|
| benchmarks/concurrent/02_concurrent_search/bench | 1.81 | 0.00 | 6.74 |
| benchmarks/concurrent/07_stream_merge/bench | 1.60 | 0.00 | 34.10 |
| benchmarks/concurrent/12_false_sharing/bench | 1.74 | 0.00 | 1,083.89 |
| benchmarks/concurrent/19_atomic_ptr/bench | 1.78 | 0.00 | 269.81 |
| benchmarks/inter-clear/05_concurrent_mvcc_pure_read/bench | 1.67 | 0.00 | 496.73 |
| benchmarks/sequential/04_hashmap/bench | 1.56 | 0.00 | 2,031.81 |
| benchmarks/sequential/09_frame_vs_heap/bench | 1.63 | 0.00 | 1,717.45 |
| benchmarks/sequential/14_iterator/bench | 1.58 | 0.00 | 372.94 |