Skip to content

perf: Linux perf call-graph flamegraph artifact + README embed#134

Merged
div0rce merged 17 commits into
mainfrom
perf/flamegraph-artifact
Jun 22, 2026
Merged

perf: Linux perf call-graph flamegraph artifact + README embed#134
div0rce merged 17 commits into
mainfrom
perf/flamegraph-artifact

Conversation

@div0rce

@div0rce div0rce commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Supersedes #130, which GitHub auto-closed when its base branch docs/codex-resume-anchor-sync was deleted on the #129 squash-merge. Same branch (perf/flamegraph-artifact), same content, now retargeted to main.


Summary

Closes #32 — the missing-flamegraph follow-up. PR #89 landed the perf stat / perf record / text perf report --stdio workflow but did not produce a flamegraph; this adds make flamegraph and commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.

Stacked PR. Based on docs/codex-resume-anchor-sync (#129) so the diff is clean; merge #129 first and GitHub will retarget this to main.

What's added

  • scripts/flamegraph.py — a dependency-free (Python stdlib only) stackcollapse + SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.
  • scripts/flamegraph.sh — driver modeled on perf_record.sh: Linux-only, records perf record --call-graph dwarf -F 4000 -g -e cpu-clock on qsl-bench, reuses qsl_common.sh provenance + qsl_publish_artifact, and honors QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs are used because the Release bench preset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.
  • make flamegraph target (+ .PHONY, command lists in CLAUDE.md/AGENTS.md).
  • tests/shell/test_flamegraph.sh — CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.
  • Docs: docs/perf_analysis.md (flamegraph section + artifact entry), results/README.md, MILESTONES.md backlog, PROGRESS.md log.

Committed artifact

results/flamegraph.svg (+ results/flamegraph.txt provenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks, Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols — OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow.

Honesty / scope

This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.

Verification

  • make check242/242 (includes the new qsl_flamegraph_render CTest).
  • make flamegraph regenerates the artifact with Dirty inputs: no; SVG validated well-formed via xml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths); git grep confirms no non-broadcast MAC leaked into the artifact.
  • bash -n scripts/flamegraph.sh clean.

🤖 Generated with Claude Code

div0rce and others added 16 commits June 21, 2026 02:23
…ollow-up)

Resolve the Codex review findings left on `main` by PRs #127 and #128:

- PROGRESS.md: remove the stale "Next action remains" block that still steered
  /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace
  with the v0.2.0 between-releases state (no active milestone; #94/#90 gated).
- AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The
  constraints bullet, the "correct claim" block, and the "M29 perf evidence
  status" subsection no longer label the artifacts "constrained Docker
  validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is
  updated (also in CLAUDE.md) to the released state.
- docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple
  Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows
  read `<not counted>` in results/perf_stat_linux.txt because the single-threaded
  benchmark stays on the Avalanche P-cores — expected scheduling, not a counter.

Docs/memory only; no code or artifacts changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32.
The perf stat/record text workflow already existed; this renders a perf
call-graph flamegraph.

- scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e
  cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a
  results/flamegraph.txt provenance/classification companion (top folded
  stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance
  + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained
  hosts. DWARF call graphs unwind correctly despite the Release `bench` preset
  omitting frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG
  renderer, so the artifact is reproducible from the repo without vendoring the
  Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure
  function of the name, no RNG/timestamps in the drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly
  if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base,
  count aggregation, sortedness), SVG well-formedness, XML escaping,
  determinism, empty-input handling.
- docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), MILESTONES.md backlog, PROGRESS.md log.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present so a tab-separated folded line whose stack
  contains spaces (C++ signatures) splits on the trailing count instead of an
  interior space and is silently dropped.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input with
  0/negative counts cannot render a misleading SVG (all-non-positive input now
  fails with exit 1 via the existing empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input, so regenerate results/flamegraph.svg +
.txt from the clean tree to keep the Source digest consistent (423 samples,
Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large
Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested
complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt
from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a
  flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings
  never contain parens, so a non-nested " (...)$" match is exact and won't strip
  a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Provenance input changed; regenerate from clean tree (416 samples, Dirty
inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The decision-log entry and bottom "Next action remains" block already
covered this follow-up, but the top `## Current state` bullets — the first
thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the
"Last action". A resuming agent could therefore miss that this resume-anchor
/ PMU sync already happened and duplicate it.

Record the sync as the current "Last action" and "Last completed docs sync",
demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf
   limitation, matching scripts/perf_record.sh, so the documented
   QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping
   the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no
   folded stacks, so a constrained rerun cannot leave a previous host's SVG
   beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the
   minimum-sample gate on the authoritative folded sample total rather than
   perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`;
   a renderer/parser failure now exits 4 (unmaskable) rather than being
   published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software
   cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment,
   and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the
hardened scripts/flamegraph.sh; the .txt reports both the folded total and
perf record's estimate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… profile (Codex #129)

The constraints bullet labeled all perf artifacts as partial hardware PMU
evidence, but only results/perf_stat_linux.txt carries real PMU counters
(cycles/instructions/branches/branch-misses). results/perf_report_linux.txt
is a software cpu-clock sampling profile, not PMU evidence. Scope the claim
to the perf-stat artifact and call out perf-record separately, identically in
AGENTS.md and CLAUDE.md so the two memories stay in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The flamegraph artifact, generator, provenance companion, and docs already
existed but no page actually displayed the SVG — it was only referenced by
filename. Embed the rendered results/flamegraph.svg as a visible image under
the Benchmarks section, with a caption that classifies it honestly as a
software cpu-clock sampling hot-symbol profile (not PMU evidence), names the
hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@div0rce, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 42 minutes and 53 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64e72000-ad03-4b95-94a9-49484c48ecf9

📥 Commits

Reviewing files that changed from the base of the PR and between 94fb544 and b8351de.

⛔ Files ignored due to path filters (1)
  • results/flamegraph.svg is excluded by !**/*.svg
📒 Files selected for processing (13)
  • AGENTS.md
  • CLAUDE.md
  • MILESTONES.md
  • Makefile
  • PROGRESS.md
  • README.md
  • docs/perf_analysis.md
  • results/README.md
  • results/flamegraph.txt
  • scripts/flamegraph.py
  • scripts/flamegraph.sh
  • tests/CMakeLists.txt
  • tests/shell/test_flamegraph.sh
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/flamegraph-artifact

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@div0rce div0rce merged commit 31d16f6 into main Jun 22, 2026
7 of 8 checks passed

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@div0rce div0rce deleted the perf/flamegraph-artifact branch June 22, 2026 16:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5093beb518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh
echo "Record status: $RECORD_STATUS"
echo "Script status: $SCRIPT_STATUS"
echo "Perf access limitation: $PERF_LIMITATION"
echo "Flamegraph SVG: $(qsl_repo_relative_or_empty "$OUT_SVG")"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Report missing partial SVGs as absent

When the documented constrained path is used (QSL_PERF_ALLOW_PARTIAL=1) and perf captures zero folded stacks, the branch above deletes $OUT_SVG, but the text artifact still records Flamegraph SVG: results/flamegraph.svg. That produces a successful partial report whose metadata points readers or scripts at a file the script deliberately removed; gate this line on STACK_COUNT > 0 or emit an explicit “not generated” value.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.sh
--countname "$EVENT samples" \
--from-collapsed <"$FOLDED" | tail -n +2
} >"$SVG_TMP"
qsl_publish_artifact "$SVG_TMP" "$OUT_SVG"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid overwriting clean flamegraphs before the sample gate

When perf captures some stacks but fewer than QSL_FLAMEGRAPH_MIN_SAMPLES and QSL_PERF_ALLOW_PARTIAL is not set, this publishes $OUT_SVG before the final gate exits with an error. A failed make flamegraph on an under-sampled host can therefore overwrite the committed SVG with an artifact the script itself rejects; defer publishing until after the sample/partial-mode gate, or keep rejected output in temp diagnostics.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.sh
Comment on lines +150 to +152
case "$EVENT" in
cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;;
*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not label all custom perf events as PMU samples

When callers set QSL_FLAMEGRAPH_EVENT to a software event with a modifier (for example cpu-clock:u) or another perf software event such as page-faults, this case falls through to the hardware-PMU label. The perf_event_open(2) docs list PERF_COUNT_SW_CPU_CLOCK and page-fault/context-switch counters as software events, so the generated artifact can overclaim PMU evidence even though the sampling source is still software; classify known software events/modifiers separately or use a neutral custom-event label.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Follow-up: generate flamegraph profiling artifact

1 participant