perf: Linux perf call-graph flamegraph artifact + README embed by div0rce · Pull Request #134 · div0rce/quant-systems-lab

div0rce · 2026-06-22T16:45:27Z

Supersedes #130, which GitHub auto-closed when its base branch docs/codex-resume-anchor-sync was deleted on the #129 squash-merge. Same branch (perf/flamegraph-artifact), same content, now retargeted to main.

Summary

Closes #32 — the missing-flamegraph follow-up. PR #89 landed the perf stat / perf record / text perf report --stdio workflow but did not produce a flamegraph; this adds make flamegraph and commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.

Stacked PR. Based on docs/codex-resume-anchor-sync (#129) so the diff is clean; merge #129 first and GitHub will retarget this to main.

What's added

scripts/flamegraph.py — a dependency-free (Python stdlib only) stackcollapse + SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.
scripts/flamegraph.sh — driver modeled on perf_record.sh: Linux-only, records perf record --call-graph dwarf -F 4000 -g -e cpu-clock on qsl-bench, reuses qsl_common.sh provenance + qsl_publish_artifact, and honors QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs are used because the Release bench preset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.
make flamegraph target (+ .PHONY, command lists in CLAUDE.md/AGENTS.md).
tests/shell/test_flamegraph.sh — CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.
Docs: docs/perf_analysis.md (flamegraph section + artifact entry), results/README.md, MILESTONES.md backlog, PROGRESS.md log.

Committed artifact

results/flamegraph.svg (+ results/flamegraph.txt provenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks, Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols — OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow.

Honesty / scope

This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.

Verification

make check → 242/242 (includes the new qsl_flamegraph_render CTest).
make flamegraph regenerates the artifact with Dirty inputs: no; SVG validated well-formed via xml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths); git grep confirms no non-broadcast MAC leaked into the artifact.
bash -n scripts/flamegraph.sh clean.

🤖 Generated with Claude Code

…ollow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ifact

Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ifact

The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-22T16:45:41Z

Warning

Review limit reached

@div0rce, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 42 minutes and 53 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64e72000-ad03-4b95-94a9-49484c48ecf9

📥 Commits

Reviewing files that changed from the base of the PR and between 94fb544 and b8351de.

⛔ Files ignored due to path filters (1)

results/flamegraph.svg is excluded by !**/*.svg

📒 Files selected for processing (13)

AGENTS.md
CLAUDE.md
MILESTONES.md
Makefile
PROGRESS.md
README.md
docs/perf_analysis.md
results/README.md
results/flamegraph.txt
scripts/flamegraph.py
scripts/flamegraph.sh
tests/CMakeLists.txt
tests/shell/test_flamegraph.sh

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/flamegraph-artifact

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

# Conflicts: # PROGRESS.md

codescene-delta-analysis

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5093beb518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T16:55:30Z

+    echo "Record status: $RECORD_STATUS"
+    echo "Script status: $SCRIPT_STATUS"
+    echo "Perf access limitation: $PERF_LIMITATION"
+    echo "Flamegraph SVG: $(qsl_repo_relative_or_empty "$OUT_SVG")"


Report missing partial SVGs as absent

When the documented constrained path is used (QSL_PERF_ALLOW_PARTIAL=1) and perf captures zero folded stacks, the branch above deletes $OUT_SVG, but the text artifact still records Flamegraph SVG: results/flamegraph.svg. That produces a successful partial report whose metadata points readers or scripts at a file the script deliberately removed; gate this line on STACK_COUNT > 0 or emit an explicit “not generated” value.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:55:30Z

+            --countname "$EVENT samples" \
+            --from-collapsed <"$FOLDED" | tail -n +2
+    } >"$SVG_TMP"
+    qsl_publish_artifact "$SVG_TMP" "$OUT_SVG"


Avoid overwriting clean flamegraphs before the sample gate

When perf captures some stacks but fewer than QSL_FLAMEGRAPH_MIN_SAMPLES and QSL_PERF_ALLOW_PARTIAL is not set, this publishes $OUT_SVG before the final gate exits with an error. A failed make flamegraph on an under-sampled host can therefore overwrite the committed SVG with an artifact the script itself rejects; defer publishing until after the sample/partial-mode gate, or keep rejected output in temp diagnostics.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:55:30Z

+case "$EVENT" in
+cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;;
+*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;;


Do not label all custom perf events as PMU samples

When callers set QSL_FLAMEGRAPH_EVENT to a software event with a modifier (for example cpu-clock:u) or another perf software event such as page-faults, this case falls through to the hardware-PMU label. The perf_event_open(2) docs list PERF_COUNT_SW_CPU_CLOCK and page-fault/context-switch counters as software events, so the generated artifact can overclaim PMU evidence even though the sampling source is still software; classify known software events/modifiers separately or use a neutral custom-event label.

Useful? React with 👍 / 👎.

div0rce and others added 16 commits June 21, 2026 02:23

perf: regenerate flamegraph artifact after parser hardening

0201d54

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

perf: regenerate flamegraph artifact after code-health refactor

d4be2da

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

perf: regenerate flamegraph artifact after complexity flattening

3905059

Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge branch 'docs/codex-resume-anchor-sync' into perf/flamegraph-art…

6ef5015

…ifact

Merge branch 'docs/codex-resume-anchor-sync' into perf/flamegraph-art…

2199820

…ifact

This comment was marked as outdated.

Sign in to view

Merge remote-tracking branch 'origin/main' into perf/flamegraph-artifact

b8351de

# Conflicts: # PROGRESS.md

div0rce merged commit 31d16f6 into main Jun 22, 2026
7 of 8 checks passed

codescene-delta-analysis Bot approved these changes Jun 22, 2026

View reviewed changes

div0rce deleted the perf/flamegraph-artifact branch June 22, 2026 16:54

chatgpt-codex-connector Bot reviewed Jun 22, 2026

View reviewed changes

chatgpt-codex-connector Bot mentioned this pull request Jun 22, 2026

release: v0.2.1 — FIX text protocol adapter (#29), perf flamegraph (#32), anchor sweep #133

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Linux perf call-graph flamegraph artifact + README embed#134

perf: Linux perf call-graph flamegraph artifact + README embed#134
div0rce merged 17 commits into
mainfrom
perf/flamegraph-artifact

div0rce commented Jun 22, 2026

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Review limit reached

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

codescene-delta-analysis Bot left a comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

div0rce commented Jun 22, 2026

Summary

What's added

Committed artifact

Honesty / scope

Verification

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

codescene-delta-analysis Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading