perf: Linux perf call-graph flamegraph artifact + README embed#134
Conversation
…ollow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 42 minutes and 53 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (13)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
# Conflicts: # PROGRESS.md
There was a problem hiding this comment.
Gates Passed
6 Quality Gates Passed
See analysis details in CodeScene
Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5093beb518
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| echo "Record status: $RECORD_STATUS" | ||
| echo "Script status: $SCRIPT_STATUS" | ||
| echo "Perf access limitation: $PERF_LIMITATION" | ||
| echo "Flamegraph SVG: $(qsl_repo_relative_or_empty "$OUT_SVG")" |
There was a problem hiding this comment.
Report missing partial SVGs as absent
When the documented constrained path is used (QSL_PERF_ALLOW_PARTIAL=1) and perf captures zero folded stacks, the branch above deletes $OUT_SVG, but the text artifact still records Flamegraph SVG: results/flamegraph.svg. That produces a successful partial report whose metadata points readers or scripts at a file the script deliberately removed; gate this line on STACK_COUNT > 0 or emit an explicit “not generated” value.
Useful? React with 👍 / 👎.
| --countname "$EVENT samples" \ | ||
| --from-collapsed <"$FOLDED" | tail -n +2 | ||
| } >"$SVG_TMP" | ||
| qsl_publish_artifact "$SVG_TMP" "$OUT_SVG" |
There was a problem hiding this comment.
Avoid overwriting clean flamegraphs before the sample gate
When perf captures some stacks but fewer than QSL_FLAMEGRAPH_MIN_SAMPLES and QSL_PERF_ALLOW_PARTIAL is not set, this publishes $OUT_SVG before the final gate exits with an error. A failed make flamegraph on an under-sampled host can therefore overwrite the committed SVG with an artifact the script itself rejects; defer publishing until after the sample/partial-mode gate, or keep rejected output in temp diagnostics.
Useful? React with 👍 / 👎.
| case "$EVENT" in | ||
| cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;; | ||
| *) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;; |
There was a problem hiding this comment.
Do not label all custom perf events as PMU samples
When callers set QSL_FLAMEGRAPH_EVENT to a software event with a modifier (for example cpu-clock:u) or another perf software event such as page-faults, this case falls through to the hardware-PMU label. The perf_event_open(2) docs list PERF_COUNT_SW_CPU_CLOCK and page-fault/context-switch counters as software events, so the generated artifact can overclaim PMU evidence even though the sampling source is still software; classify known software events/modifiers separately or use a neutral custom-event label.
Useful? React with 👍 / 👎.
Supersedes #130, which GitHub auto-closed when its base branch
docs/codex-resume-anchor-syncwas deleted on the #129 squash-merge. Same branch (perf/flamegraph-artifact), same content, now retargeted tomain.Summary
Closes #32 — the missing-flamegraph follow-up. PR #89 landed the
perf stat/perf record/ textperf report --stdioworkflow but did not produce a flamegraph; this addsmake flamegraphand commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.What's added
scripts/flamegraph.py— a dependency-free (Python stdlib only)stackcollapse+ SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.scripts/flamegraph.sh— driver modeled onperf_record.sh: Linux-only, recordsperf record --call-graph dwarf -F 4000 -g -e cpu-clockonqsl-bench, reusesqsl_common.shprovenance +qsl_publish_artifact, and honorsQSL_PERF_ALLOW_PARTIALfor constrained hosts. DWARF call graphs are used because the Releasebenchpreset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.make flamegraphtarget (+.PHONY, command lists in CLAUDE.md/AGENTS.md).tests/shell/test_flamegraph.sh— CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.docs/perf_analysis.md(flamegraph section + artifact entry),results/README.md,MILESTONES.mdbacklog,PROGRESS.mdlog.Committed artifact
results/flamegraph.svg(+results/flamegraph.txtprovenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks,Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols —OrderBook::modify/cancel/add_limit, thedispatch_storagecancel path,decode_new_order, the gatewaySessionpath,replay::generate_flow.Honesty / scope
This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.
Verification
make check→ 242/242 (includes the newqsl_flamegraph_renderCTest).make flamegraphregenerates the artifact withDirty inputs: no; SVG validated well-formed viaxml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths);git grepconfirms no non-broadcast MAC leaked into the artifact.bash -n scripts/flamegraph.shclean.🤖 Generated with Claude Code