
Pprof #22

Open

cuzzo wants to merge 7 commits into master from pprof

Conversation

@cuzzo
Owner

@cuzzo cuzzo commented May 8, 2026

No description provided.

cuzzo and others added 7 commits May 8, 2026 02:18
`clear profile` now writes heap.pb.gz, lock.pb.gz, and mvcc.pb.gz
into the .profile/ dir alongside the existing text reports. CPU is
delegated to `perf_to_profile` (the standard converter from Linux
perf.data to pprof format); we print a one-line install hint when
it's missing.

Pure-Ruby pprof v3 encoder (`src/tools/pprof.rb`, ~190 lines) so the
feature stays stdlib-only — no `google-protobuf` C-extension dep.

Sample-type columns mirror Go's conventions so pprof's standard
flags work out of the box:
  - heap:  alloc_objects / alloc_space / inuse_objects / inuse_space
  - lock:  contentions / delay / hold / acquisitions
  - mvcc:  reads / commits / retries / cow_bytes
            (cow_bytes = struct_size * (commits + retries))
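A stdlib-only pprof encoder boils down to two protobuf primitives: base-128 varints and length-delimited fields. A minimal sketch of those primitives (helper names are illustrative, not the actual `src/tools/pprof.rb` API):

```ruby
# Minimal sketch of the protobuf wire primitives a pure-Ruby pprof
# encoder needs. Hypothetical helpers, not the real encoder's API.

# Encode a non-negative integer as a protobuf base-128 varint:
# 7 bits per byte, high bit set on every byte except the last.
def varint(n)
  bytes = []
  loop do
    byte = n & 0x7f
    n >>= 7
    bytes << (n.zero? ? byte : (byte | 0x80))
    break if n.zero?
  end
  bytes.pack("C*")
end

# Field tag = (field_number << 3) | wire_type.
# Wire type 0 = varint; wire type 2 = length-delimited
# (strings and nested messages such as Sample or Location).
def varint_field(field_number, value)
  varint((field_number << 3) | 0) + varint(value)
end

def bytes_field(field_number, payload)
  varint((field_number << 3) | 2) + varint(payload.bytesize) + payload
end
```

Nested messages (Profile, Sample, Location) are then just `bytes_field` applied to the concatenation of their own encoded fields, which is why ~190 lines suffice.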

Each per-site address resolves via addr2line + the transpiler's
`// CLR:N` markers, so `pprof -list <fn>` shows CLEAR source lines.
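The marker-based resolution can be sketched as: take the transpiled-source line addr2line reports, then walk upward to the nearest `// CLR:N` comment. A hedged sketch (the helper name and inputs are invented; the real resolver lives in the doctor/pprof tooling):

```ruby
# Hypothetical sketch of mapping an addr2line line number back to a
# CLEAR source line via `// CLR:N` markers in the transpiled Zig.

# zig_lines: array of transpiled-source lines (0-indexed);
# zig_lineno: 1-based line number reported by addr2line.
def clear_line_for(zig_lines, zig_lineno)
  # Walk upward from the reported line to the nearest CLR marker.
  (zig_lineno - 1).downto(0) do |i|
    if (m = zig_lines[i].match(%r{//\s*CLR:(\d+)}))
      return m[1].to_i
    end
  end
  nil # no marker above this address: likely runtime code
end
```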

Verified end-to-end: `pprof -top -alloc_space heap.pb.gz` reads
function names correctly (clearMain / entryWrapper / makeResultSet
/ intToString) and `-tags` shows the per-sample addr labels. 20 new
specs cover the wire encoder, dedup, gzip envelope, and per-format
column math; 4096 unit specs total, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tab-completion for bash, zsh, and fish. The CLI is the source
of truth for the completion script — regenerating `clear completions
bash` after a subcommand change refreshes everything.

Per-subcommand semantics:
  - build / run / fmt / fix / profile / explain  -> *.cht files
  - test / benchmark                              -> *.cht or dirs
  - doctor                                        -> *.profile/ dirs
  - completions                                   -> bash | zsh | fish

Install (bash):
  echo 'source <(clear completions bash)' >> ~/.bashrc

Install (zsh):
  mkdir -p ~/.zsh/completions
  clear completions zsh > ~/.zsh/completions/_clear
  # ensure fpath=(~/.zsh/completions $fpath) BEFORE compinit

Install (fish):
  clear completions fish > ~/.config/fish/completions/clear.fish

17 specs cover script structure, dispatch, and CLI integration. Live-
verified on this machine: tab after `clear doctor examples/litedb/`
suggests only the .profile/ directory; tab after `clear profile
examples/litedb/` suggests the .cht file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`alloc-profile.zig` now captures up to 16 stack frames per allocation
via `std.debug.captureCurrentStackTrace` (FP-based unwinder; profile
builds keep frame pointers via omit_frame_pointer=false). Hash table
keyed by the full trace, so distinct call paths to the same helper
(intToString, makeResultSet, etc.) resolve as separate sites instead
of collapsing into one.

Format change (alloc-profile v1 -> v2): the first column of alloc.txt
goes from a single hex addr to a comma-separated leaf-first trace.
Both consumers updated to parse it; v1 single-addr files still parse
as 1-element traces.

  alloc.txt  before:  0x401234           1000  40000  500  20000  500
  alloc.txt  after:   0x401234,0x402000  1000  40000  500  20000  500
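The backward-compatible parse described above can be sketched in a few lines: splitting the first column on commas turns a v1 single address into a 1-element trace for free. (Struct and helper names are illustrative; only the first three columns are modeled here.)

```ruby
# Sketch of parsing the alloc.txt address column across versions.
# v2: comma-separated leaf-first trace; v1: single hex addr, which
# parses as a 1-element trace with no special casing.

AllocSite = Struct.new(:trace, :count, :bytes)

def parse_alloc_line(line)
  cols = line.split
  trace = cols[0].split(",")   # "0x401234" -> ["0x401234"]
  AllocSite.new(trace, cols[1].to_i, cols[2].to_i)
end
```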

Stack budget: trace buffer is `[16]usize` = 128 B on the recorder's
stack, well within the 60 KB Large stacks `clear profile` already
forces. captureCurrentStackTrace itself is a frame-pointer walk, no
additional frames pushed. SpinLock prevents recorder reentrance, so
the existing no-allocation contract holds.

Sampling: `clear profile --sample=N` records every Nth alloc and
scales captured values by N at record time so pprof / doctor see
estimated totals. Default N=1 (no sampling). Header records
`sample_n` so consumers can flag the approximation. Useful for hot
allocators where the per-alloc unwind cost matters; otherwise leave
off for accurate counts.
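The record-time scaling can be sketched as follows: record every Nth event and multiply the recorded values by N, so downstream consumers see unbiased estimates of the true totals. (This is an illustrative Ruby model of the bookkeeping, not the Zig recorder.)

```ruby
# Sketch of --sample=N bookkeeping: record every Nth allocation and
# scale at record time so totals are estimates of the true totals.

class SampledRecorder
  attr_reader :count, :bytes

  def initialize(stride)
    @stride = stride
    @seen = 0
    @count = 0
    @bytes = 0
  end

  def alloc(size)
    @seen += 1
    return unless (@seen % @stride).zero?
    # Each recorded sample stands in for `stride` allocations.
    @count += @stride
    @bytes += size * @stride
  end
end
```

With the default stride of 1, every allocation is recorded and the totals are exact.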

Consumer updates:
- src/tools/pprof_converter.rb -- emits multi-frame Sample.location_id;
  shares Locations across samples that include the same frame (a hot
  leaf like entryWrapper appears in every trace and is interned once).
- src/tools/doctor.rb -- splits the addrs column into a trace; leaf
  addr remains the "primary site" so existing top-N output is
  unchanged but now backed by trace data for future caller-aware
  views.
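The Location-sharing point above is a classic interning table: each distinct frame address gets one id, and every sample's `location_id` list reuses it. A hedged sketch (class and method names are invented, not the converter's actual interface):

```ruby
# Sketch of Location interning: frames shared across samples (a hot
# leaf like entryWrapper) get one Location id, reused everywhere.

class LocationTable
  def initialize
    @ids = {} # addr -> location id (pprof ids are 1-based)
  end

  def id_for(addr)
    @ids[addr] ||= @ids.size + 1
  end

  def ids_for_trace(addrs)
    addrs.map { |a| id_for(a) }
  end

  def size
    @ids.size
  end
end
```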

End-to-end verified on examples/litedb: pprof tree shows real caller/
callee relationships (`entryWrapper -> clearMain -> intToString`)
where previously cum% always equaled flat%. 4095 unit specs / 0
failures; 2 new specs cover multi-frame parsing and Location reuse.

lock-profile and mvcc-profile remain single-frame for now; same
pattern applies and will follow as separate commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds optional per-record stack capture to lock-profile (v3) and
mvcc-profile (v2). Off by default; opt in via:

  clear profile foo.cht --sync-callstacks

When on, each (lock, caller-trace) pair becomes a distinct row, so
pprof's tree/flame views show per-caller attribution. Doctor
aggregates rows back to one-per-lock for its existing diagnoses
(no behavior change there). Same shape for (cell, caller-trace) in
mvcc-profile.

The flag is off by default because the FP walk costs ~100-500ns per
record, while uncontended mutex acquire is ~10-20ns and MVCC commit
fast paths are ~20-50ns -- the trace can dominate the operation it
measures. When --sync-callstacks is set without an explicit --sample,
we auto-default to --sample=100 to keep the cost manageable. Users
can pass --sample=1 to opt in to full capture at full cost.

FP-unwind through parking-lot verified end-to-end:
  parking_mutex_lock -> clearMain -> entryWrapper
  parking_mutex_lock -> clearMain.__DoBranchCtx0_0.run -> ...
addr2line resolves all captured frames cleanly. Frame pointers are
already retained in profile builds (omit_frame_pointer = false), and
captureCurrentStackTrace's FP walker handles parking-lot's hot/slow
paths without symbol noise.

New shared module zig/runtime/profile-trace.zig:
  - MAX_FRAMES (single source of truth, 16)
  - captureFromHere() -- thin wrapper around
    std.debug.captureCurrentStackTrace
  - syncCallstacksEnabled() -- reads CLEAR_PROFILE_SYNC_CALLSTACKS once
  - sampleStride() -- reads CLEAR_PROFILE_SAMPLE once

alloc-profile uses the shared MAX_FRAMES + captureFromHere; its
sample-stride read is unchanged so it doesn't move under us.

Wire format:
  - lock-profile v3: 12th tab-separated column `caller_trace`. `-` =
    empty (sync-callstacks off); comma-separated leaf-first addrs
    when populated.
  - mvcc-profile v2: 8th column same shape.
  Tab-separated lines so the trace field can carry commas without a
  column-count ambiguity. Older whitespace-only files still parse via
  fallback split.

Consumers (doctor.rb, pprof_converter.rb) updated to:
  - tab-split the line; fall back to whitespace-split if too few cols
  - parse `-` as empty trace
  - aggregate (addr, trace) rows by addr in doctor for the per-lock /
    per-cell view
  - emit multi-frame Sample.location_id in pprof_converter, with the
    lock/cell pointer as the Sample's leaf and caller frames stacked
    below
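The two consumer-side parsing rules (tab-split with whitespace fallback, `-` as empty trace) can be sketched as:

```ruby
# Sketch of the parsing rules above. Column counts and helper names
# are illustrative, not doctor.rb / pprof_converter.rb internals.

def split_profile_line(line, min_cols)
  cols = line.chomp.split("\t")
  cols = line.split if cols.size < min_cols # legacy whitespace format
  cols
end

def parse_trace(field)
  return [] if field.nil? || field == "-"
  field.split(",")
end
```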

4 new specs cover the v3/v2 wire format, empty-trace handling, multi-
frame stack emission, and per-row sample emission. 4099 unit specs
total / 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new doctor flags that exploit the multi-frame trace data
shipped in the prior commits, plus a Mapping in our pb.gz output so
pprof stops complaining about the missing binary name.

  clear doctor foo.profile/ --cumulative
    Rank functions by cumulative bytes/allocs across every frame in
    each trace. A function high in the call stack accrues its
    callees' costs ("intToString shows up on N% of allocation paths"),
    where the existing flat view only credits the leaf alloc site.

  clear doctor foo.profile/ --focus=REGEX
    Keep only sites whose trace touches a function matching the
    pattern. Composes with the existing top-N display.

  clear doctor new.profile/ --diff old.profile/
    Per-function delta on alloc_space, lock contention, and MVCC
    retries. Annotates with directional arrows and brief diagnoses:
    "newly contended", "retries eliminated", "new retry storm",
    cold->hot site lists. Deltas are computed in-process rather than
    shelling out to `pprof -base`, so doctor stays self-contained and
    can layer commentary on top of the math.
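The `--cumulative` accrual rule can be sketched as: every function appearing anywhere in a sample's trace accrues that sample's bytes, deduplicated within a trace so recursive frames aren't double-counted. (Hypothetical helper; the input shape is assumed.)

```ruby
# Sketch of cumulative ranking over multi-frame traces.
# samples: [{ trace: [fn_name, ...], bytes: Integer }, ...]
def cumulative_bytes(samples)
  totals = Hash.new(0)
  samples.each do |s|
    # uniq: a function counts once per trace, even if recursive.
    s[:trace].uniq.each { |fn| totals[fn] += s[:bytes] }
  end
  totals.sort_by { |_, b| -b }
end
```

A caller present on every path (like entryWrapper) thus accumulates the sum of all its callees' costs, which is exactly what the flat leaf-only view cannot show.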

Mapping in pb.gz (src/tools/pprof.rb):
  - New `add_mapping(binary:, build_id:)` registers the binary the
    profile is about. pprof picks up the filename for its header
    ("File: litedb" replaces "Main binary filename not available").
  - Locations now carry `mapping_id = primary_mapping` so symbol
    metadata (has_functions / has_filenames / has_line_numbers)
    propagates without per-Location plumbing.
  - All three converters (heap / lock / mvcc) call add_mapping when
    a binary is available; converter changes are 1 line each.

Source-line view (`pprof -list <fn>`) already worked because
Function.filename was set to source.cht; verified end-to-end.

Diff implementation notes:
  - parse_alloc_for_diff keys by leaf-function name when a binary is
    available, raw hex addr otherwise (so unit tests can run without
    a real binary, and the diff still groups distinct sites).
  - Uses the after-dir's binary as the canonical addr2line target
    so before/after addresses resolve through the same symbol table.
    ASLR / rebuild can shift addresses but the function name stays
    stable for the same source.
  - Lock diff sums (lock,trace) rows by addr first, then deltas; same
    for mvcc by cell pointer.
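The per-function delta step reduces to: sum each side by a stable key (function name when a binary is available, raw addr otherwise), then subtract. A minimal sketch (helper name and input shape are invented):

```ruby
# Sketch of the per-key delta: after - before, dropping unchanged
# keys, largest regressions first. Keys are function names or addrs.
def diff_by_function(before, after)
  (before.keys | after.keys)
    .map { |k| [k, after.fetch(k, 0) - before.fetch(k, 0)] }
    .reject { |_, delta| delta.zero? }
    .sort_by { |_, delta| -delta }
end
```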

10 new specs (6 doctor-diff scenarios, 2 pprof Mapping cases, plus
1 cumulative/focus integration each via existing CLI tests). 4108
unit specs total, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes building on the prior round of doctor + pprof work:

1. **Runtime-function misattribution fix.** addr2line returns paths
   like `/.../runtime/runtime.zig:652` for runtime frames, but our
   resolver walked back through `transpiled.zig` looking for `// CLR:`
   markers regardless of what file the address came from. As a result
   `pprof -list entryWrapper` rendered the runtime function against
   random source.cht line numbers. Resolver now only assigns
   `clear_line` when addr2line's file is the user's transpiled CLEAR
   build target (basename matches `._clear_tmp_*.zig`); runtime and
   stdlib frames keep their actual zig file path so `pprof -list`
   shows the right source. Discriminator metadata
   ("(discriminator 4)" appended to the file:line) handled in the
   line-extract regex too.

2. **`clear doctor --peek=REGEX`.** Aggregates samples whose trace
   touches the regex; reports self-bytes plus the callers (frames
   above) and callees (frames below, when matched non-leaf). Mirrors
   `pprof -peek`. ~80 lines using the trace data we already capture.

3. **`clear doctor --ignore=REGEX`.** Inverse of `--focus`. Drops
   samples whose trace touches a matching function. Composes with
   `--focus`: ignore wins on overlap so a focused-but-also-ignored
   site is excluded. ~5 lines.
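The two parsing rules from item 1 (strip addr2line's discriminator suffix, gate `clear_line` on the transpiled-target basename) can be sketched as:

```ruby
# Sketch of the resolver rules. Patterns mirror the commit text;
# helper names are invented, not the resolver's real interface.

# "file.zig:652" or "file.zig:652 (discriminator 4)" -> [file, line]
def parse_addr2line(file_line)
  m = file_line.match(/\A(.+):(\d+)(?:\s+\(discriminator \d+\))?\z/)
  return nil unless m
  [m[1], m[2].to_i]
end

# Only the user's transpiled CLEAR build target gets a clear_line;
# runtime/stdlib frames keep their actual zig path.
def user_frame?(path)
  File.basename(path).match?(/\A\._clear_tmp_.*\.zig\z/)
end
```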

Also: `clear_source_path` is now `is_user_zig`-aware, so user functions
point at `source.cht` while runtime/stdlib functions point at their
actual `.zig` file. pprof's web UI and `-list` show the right source
for each.

10 new specs (5 peek scenarios, 3 ignore scenarios, 2 misattribution
unit checks). 4118 unit specs total / 0 failures. Verified end-to-end:
`pprof -list entryWrapper` now shows runtime.zig:651 ("// 4. EXECUTE
USER CODE"), `pprof -list clearMain` shows source.cht:51.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two final pprof additions:

1. **`clear doctor --by=COL`**. Sorts/displays the heap section by
   one of `bytes` (default), `allocs`, `inuse_bytes`, or
   `inuse_allocs`. Mirrors pprof's `-sample_index=` switching for the
   doctor view. Reveals "this site has tons of small allocs" vs
   "few big ones" without re-running. Inuse columns are derived from
   alloc - free at parse time, so the column is available even when
   the runtime didn't record `live_bytes` directly.

2. **channels.txt → channels.pb.gz**. Completes the pb.gz emission
   set. One sample per registered channel; sample types: pushes /
   pops / push_blocked / pop_blocked / max_depth. Synthetic function
   names like `channel#7` give pprof a stable label per channel; the
   capacity travels as a per-sample tag (visible via `pprof -tags`).
   Skipped (returns nil) when channels.txt has no rows.
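The derived-column idea from item 1 can be sketched as: compute inuse at parse time as alloc minus free, then let `--by` pivot on any column name. (Struct fields and helper names are illustrative.)

```ruby
# Sketch of deriving inuse columns at parse time so --by can pivot
# even when the runtime didn't record live counts directly.

Site = Struct.new(:name, :allocs, :bytes, :frees, :freed_bytes) do
  def inuse_allocs
    allocs - frees
  end

  def inuse_bytes
    bytes - freed_bytes
  end
end

# col: :bytes, :allocs, :inuse_bytes, or :inuse_allocs
def top_by(sites, col)
  sites.sort_by { |s| -s.public_send(col) }
end
```

Pivoting the same parsed data by `:allocs` vs `:bytes` is what surfaces "tons of small allocs" vs "few big ones" without re-running.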

Verified end-to-end on transpile-tests/241_open_stream_pipelines,
which exercises `BG STREAM` and produces real channel telemetry —
`pprof -top -sample_index=pushes channels.pb.gz` ranks the channels
by push count.

6 new specs (3 channels conversion, 3 --by pivoting). 4124 unit
specs total / 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented May 8, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 88.76081% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.85%. Comparing base (07b3f81) to head (9c27336).
⚠️ Report is 31 commits behind head on master.

Files with missing lines        Patch %   Lines
src/tools/doctor.rb              87.74%   43 Missing ⚠️
src/tools/pprof_converter.rb     82.90%   33 Missing ⚠️
src/tools/pprof.rb               98.47%    2 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
+ Coverage   89.60%   89.85%   +0.24%     
==========================================
  Files         170      185      +15     
  Lines       45996    47991    +1995     
  Branches    11290    11953     +663     
==========================================
+ Hits        41216    43123    +1907     
- Misses       4780     4868      +88     
Flag Coverage Δ
ruby 86.06% <88.76%> (+0.68%) ⬆️
zig 95.62% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.


@github-actions

github-actions Bot commented May 8, 2026

🐰 Bencher Report

Branch: pprof
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

To only post results if a Threshold exists, set the --ci-only-thresholds flag.

Benchmark                                                    leak-build-ms (x 1e3)   leak-count   leak-run-ms
benchmarks/concurrent/02_concurrent_search/bench                              1.81         0.00          6.74
benchmarks/concurrent/07_stream_merge/bench                                   1.60         0.00         34.10
benchmarks/concurrent/12_false_sharing/bench                                  1.74         0.00      1,083.89
benchmarks/concurrent/19_atomic_ptr/bench                                     1.78         0.00        269.81
benchmarks/inter-clear/05_concurrent_mvcc_pure_read/bench                     1.67         0.00        496.73
benchmarks/sequential/04_hashmap/bench                                        1.56         0.00      2,031.81
benchmarks/sequential/09_frame_vs_heap/bench                                  1.63         0.00      1,717.45
benchmarks/sequential/14_iterator/bench                                       1.58         0.00        372.94

(All measures above carry the ⚠️ NO THRESHOLD warning.)

