perf: generate perf call-graph flamegraph artifact (#32)#130
Conversation
Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: beec2d0c11
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ | ||
| "$RECORD_ERR" "$SCRIPT_ERR"; then |
There was a problem hiding this comment.
Classify zero-sized perf data as partial
In no-sample environments perf script can report zero-sized data; the existing scripts/perf_record.sh wrapper treats that as a perf limitation, but this grep omits it. In that scenario SCRIPT_STATUS is nonzero, PERF_LIMITATION stays no, and the later failure path exits 3 even when QSL_PERF_ALLOW_PARTIAL=1, so the documented constrained-host artifact path does not work. Please include the same no-sample diagnostic here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 31070b1: zero-sized data is now in the PERF_LIMITATION grep (matching scripts/perf_record.sh), so a no-sample capture is classified as a perf limitation and the QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit.
| SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE" | ||
|
|
||
| # Render the SVG (deterministic for a fixed folded input + fixed subtitle). | ||
| if [[ "$STACK_COUNT" -gt 0 ]]; then |
There was a problem hiding this comment.
Remove stale SVGs on zero-stack partial runs
When a partial run captures no folded stacks, this branch skips writing OUT_SVG but leaves any existing results/flamegraph.svg in place. A constrained rerun can therefore publish a fresh .txt saying there is no clean sample report while the old SVG from a previous host remains beside it and can be committed as if it matched. Please either replace/remove the SVG on zero-stack runs or emit a placeholder that matches the .txt classification.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 31070b1: a zero-stack run now rm -f "$OUT_SVG", so a constrained rerun can't leave a previous host's SVG beside a .txt that reports no sample report.
Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4be2daf64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| PERF_LIMITATION=yes | ||
| fi | ||
|
|
||
| SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)" |
There was a problem hiding this comment.
Use folded sample totals for the sample gate
On perf versions that print the record summary with an estimate marker, e.g. (~2526 samples) as shown in the perf tutorial's collecting-samples example (which also notes this record summary is only an estimate: https://perfwiki.github.io/main/tutorial/#collecting-samples), this regex leaves SAMPLE_TOKEN empty and SAMPLE_COUNT becomes 0. A valid run with plenty of folded stacks is then classified as insufficient and make flamegraph exits 3 unless partial mode is enabled; even when the regex does match, the gate is based on perf's estimate rather than the actual folded sample total. Please derive the minimum-sample gate from the folded counts or at least accept the optional ~ and verify against actual folded samples.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 31070b1: the regex now accepts the optional ~ ((~?N samples)), and — more importantly — the minimum-sample gate now uses the authoritative folded sample total (sum of folded counts), not perf record's estimate. Both are reported (folded total + perf estimate).
| # Fold to collapsed stacks for the text summary and as an SVG precondition. | ||
| STACK_COUNT=0 | ||
| if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then | ||
| python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true |
There was a problem hiding this comment.
Fail on collapsed-stack renderer errors
When perf script succeeds but flamegraph.py --collapse-only fails, this || true converts the renderer failure into an empty folded file and STACK_COUNT=0. In runs with QSL_PERF_ALLOW_PARTIAL=1, that publishes a constrained/partial artifact instead of failing, so a Python incompatibility or parser regression can be mistaken for a perf sampling limitation; capture the collapse status/stderr and treat nonzero as an unexpected generator failure.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 31070b1: --collapse-only now captures COLLAPSE_STATUS (and stderr) instead of || true; a renderer/parser failure exits 4 (unmaskable) rather than being published as a constrained-environment artifact.
Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3905059eba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol" | ||
| echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock" | ||
| echo "latency or throughput, and is hardware/kernel/compiler/build dependent." |
There was a problem hiding this comment.
Label hardware-event flamegraphs accurately
When QSL_FLAMEGRAPH_EVENT=cycles is used (the new perf docs advertise this hardware-PMU mode), ARTIFACT_TYPE changes to flamegraph (cycles hardware-PMU...), but the emitted text companion still hard-codes the caveat as a software cpu-clock profile. The SVG provenance comment has the same wording, so a regenerated hardware-event artifact would contain contradictory metadata and violate the repo's measured-evidence classification rules; make these lines derive from the selected event/artifact type.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 31070b1: a single SAMPLE_KIND (software cpu-clock/task-clock vs hardware-PMU) is derived from the event and drives the artifact type, the SVG provenance caveat, and the text-companion caveat, so QSL_FLAMEGRAPH_EVENT=cycles no longer yields contradictory cpu-clock wording. Artifact regenerated in 06b7675.
Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Propagate the #129/#130/#131 Codex fixes up the stack and resolve the v0.2.1 release-anchor Codex findings (#133): - Reconcile the PROGRESS/HANDOFF backlog conflict to the release state: #29 (FIX adapter) and #32 (flamegraph) are both done, shipped in v0.2.1; no open legacy backlog remains. - Sync AGENTS.md and CLAUDE.md released-state anchors from v0.2.0 to v0.2.1, recording the #129/#130/#131 content (the canonical files a resume reads). - Refresh docs/release_readiness.md for v0.2.1: 263/263 tests, the FIX-adapter and flamegraph coverage, the bare-metal flamegraph artifact, and the next release re-pointed to v0.2.1. - Update the v0.2.1 current-state test count 261 -> 263 (the two new FIX rejection tests) and add a decision-log entry for this second Codex round. make check / make asan 263/263 on the bare-metal Apple M2 Fedora Asahi host. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21998201de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| exit 1 | ||
| fi | ||
|
|
||
| mkdir -p "$(dirname "$OUT_SVG")" "$(dirname "$DATA")" |
There was a problem hiding this comment.
Create the text output directory before publishing
When QSL_FLAMEGRAPH_TXT is overridden to a path in a directory that does not already exist, this only creates the SVG and perf-data directories, but qsl_publish_artifact later calls mktemp inside the text output directory. That makes make flamegraph fail after the SVG may already have been published, leaving a half-generated artifact set instead of honoring the advertised text-output override; include $(dirname "$OUT_TXT") in the directory creation.
Useful? React with 👍 / 👎.
| if grep -Eiq 'zero-sized data|No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ | ||
| "$RECORD_ERR" "$SCRIPT_ERR"; then |
There was a problem hiding this comment.
Classify perf no-permission errors as partial
On hosts where perf record reports the standard access error No permission to enable ... event without also printing Operation not permitted, this grep leaves PERF_LIMITATION=no; I reproduced that path with a fake perf and QSL_PERF_ALLOW_PARTIAL=1, and the script still exited 3 after writing a constrained artifact. Since perf's own error path emits No permission to enable %s event, include that diagnostic so permission-limited Linux hosts follow the documented partial-artifact path instead of being treated as unexpected generator failures.
Useful? React with 👍 / 👎.
| if self._comm: | ||
| frames.insert(0, self._comm) | ||
| key = ";".join(frames) | ||
| self.folded[key] = self.folded.get(key, 0) + 1 |
There was a problem hiding this comment.
Weight folded stacks by perf sample period
For perf-script records whose header period is greater than 1 (the test fixture itself uses ...: 1000 cpu-clock:u:), _flush() still adds only 1 to the collapsed stack. The standard stackcollapse-perf.pl parses that event period and records stacks with $m_period, so make flamegraph can skew frame widths and understate folded totals for cpu-clock or hardware-event runs where perf emits variable periods.
Useful? React with 👍 / 👎.
The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Gates Passed
6 Quality Gates Passed
See analysis details in CodeScene
Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: add FIX-like text protocol adapter (#29) Add a human-readable `tag=value` text protocol alongside the binary codec, mapping the same internal message structs (issue #29, reprioritized by the human from the backlog). - include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value` adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType / ... / 10 mod-256 CheckSum — for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder. Decoding is total, deterministic, and noexcept (fixed field table, std::from_chars, std::string_view; no heap on the decode path) and reports every malformed input through a FixError taxonomy mirroring the binary codec's DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, so NewOrder<->FIX is a lossless bijection like the binary codec (price is never a float). - tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized messages. - docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md and PROGRESS.md updated. make check 260/260; make asan 260/260 (the adapter parses untrusted text). Closes #29. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review) decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was accepted. Validate tag 11 (present and numeric) on decode without storing it, keeping decode symmetric with encode. Adds a rejection test and clarifies the docs/fix_protocol.md note for tag 11. make check 261/261; make asan 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate) CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order: Complex Method / Complex Conditional / Overall Code Complexity). Restructure without changing behavior: - Extract map_side / map_ord_type / map_tif (the enum-code switches) and expect_msg_type (the tag-35 check) as small helpers. - Add a FieldReader accumulator that reads required integer/coded fields and short-circuits on the first error, so decode_new_order and decode_cancel_order become a flat fluent chain plus a single error check instead of a long if-return ladder with three inline switches. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: remove decoder duplication and split parse_envelope (CodeScene) Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication, Complex Method, Complex Conditional): - Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType -> fill body via FieldReader -> error check) so decode_new_order and decode_cancel_order collapse to just their field maps, removing the duplicated prologue/epilogue. - Split parse_envelope into tokenize / check_envelope_shape / verify_length_and_checksum, each a small single-purpose function, and fold the 8/9/10 ordering check into a named bool. Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: table-driven enum maps + simpler msg-type check (CodeScene) Clear the last two CodeScene advisory flags on src/protocol/fix.cpp: - Code Duplication: the three near-identical map_side/map_ord_type/map_tif switches are replaced by one generic FieldReader::coded(tag, out, table) that looks a code up in a small constexpr {code, enum} table; the decoder supplies the per-enum tables. No per-enum mapping duplication remains. - Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has a single operator. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131) Two Codex review findings on the FIX adapter's parser strictness: - check_envelope_shape now requires MsgType (35) as the first body field, immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10 is rejected as Malformed instead of decoding via a first-match scan. - tokenize now rejects any repeated tag. This adapter maps each business tag exactly once (no repeating groups), so a duplicate such as `55=2` then `55=999` is an ambiguous/malformed frame rather than a silently-ignored later value. Adds a deterministic rejection test for each. (The earlier ClOrdID-required finding was already resolved by 3e4c8e3 and is covered by an existing test.) make check 263/263, make asan 263/263. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: drop delivered #29 from open-backlog anchors (Codex #131) The decision log marks #29 (FIX adapter) closed by this PR, but the current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it as open backlog, so /resume could send the next session to re-implement work this PR just added. Remove #29 from those backlog lists and note it as delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it as part of its release sweep.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5093beb518
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ARTIFACT_TYPE="constrained-environment validation (partial; insufficient samples for hot-symbol conclusions)" | ||
| fi | ||
|
|
||
| PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")" |
There was a problem hiding this comment.
Generate provenance for each flamegraph output
When make flamegraph writes the text companion, this single provenance block is emitted with OUT_SVG, so results/flamegraph.txt currently self-reports Generated output: results/flamegraph.svg. That corrupts the metadata for the .txt artifact (and any QSL_FLAMEGRAPH_TXT override), even though the results policy expects each artifact to identify its own generated output; emit separate provenance for the SVG and text outputs or make the text file explicitly self-identify.
Useful? React with 👍 / 👎.
| case "$EVENT" in | ||
| cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;; | ||
| *) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;; |
There was a problem hiding this comment.
Strip perf modifiers before classifying sample source
The new classifier handles the default cpu-clock, but a common perf event spelling such as QSL_FLAMEGRAPH_EVENT=cpu-clock:u or another kernel software event like context-switches falls into the *) arm and is published as hardware-PMU sampling. Linux documents cpu-clock, task-clock, page faults, and context switches as software events in perf_event_open(2), so this can produce contradictory evidence labels in both the SVG and .txt; normalize the base event name before deciding whether it is PMU-backed.
Useful? React with 👍 / 👎.
| (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in | ||
| `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores. | ||
| Docs/memory only; no code or artifacts changed. | ||
| - [2026-06-21] Issue #32 flamegraph profiling artifact (`perf/flamegraph-artifact`, stacked on the |
There was a problem hiding this comment.
Refresh resume anchors after closing #32
This new entry records issue #32 as done and adds a 242nd check plus a new flamegraph artifact, but the current-state resume anchors still list #32 as open legacy backlog and report the old 241-test / 15-result state. Because /resume reads those top-level anchors before the history log, the next agent can restart the just-finished flamegraph work or report stale verification; update PROGRESS/HANDOFF current state alongside this completion entry.
Useful? React with 👍 / 👎.
| echo "Insufficient samples: $INSUFFICIENT_SAMPLES" | ||
| echo "Record status: $RECORD_STATUS" | ||
| echo "Script status: $SCRIPT_STATUS" | ||
| echo "Perf access limitation: $PERF_LIMITATION" |
There was a problem hiding this comment.
Include perf stderr in partial flamegraph reports
When QSL_PERF_ALLOW_PARTIAL=1 is used on a no-sample or permission-limited host, the published .txt only records numeric statuses and this boolean, but it drops the actual perf record / perf script diagnostics that justified the constrained classification. That makes a committed partial artifact hard to audit or distinguish from an unexpected perf/script failure after the fact; include the captured stderr blocks in the text companion, as the existing perf-record report does.
Useful? React with 👍 / 👎.
…), anchor sweep (#133) * docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: add FIX-like text protocol adapter (#29) Add a human-readable `tag=value` text protocol alongside the binary codec, mapping the same internal message structs (issue #29, reprioritized by the human from the backlog). - include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value` adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType / ... / 10 mod-256 CheckSum — for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder. Decoding is total, deterministic, and noexcept (fixed field table, std::from_chars, std::string_view; no heap on the decode path) and reports every malformed input through a FixError taxonomy mirroring the binary codec's DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, so NewOrder<->FIX is a lossless bijection like the binary codec (price is never a float). - tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized messages. - docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md and PROGRESS.md updated. make check 260/260; make asan 260/260 (the adapter parses untrusted text). Closes #29. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review) decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was accepted. Validate tag 11 (present and numeric) on decode without storing it, keeping decode symmetric with encode. Adds a rejection test and clarifies the docs/fix_protocol.md note for tag 11. make check 261/261; make asan 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate) CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order: Complex Method / Complex Conditional / Overall Code Complexity). Restructure without changing behavior: - Extract map_side / map_ord_type / map_tif (the enum-code switches) and expect_msg_type (the tag-35 check) as small helpers. - Add a FieldReader accumulator that reads required integer/coded fields and short-circuits on the first error, so decode_new_order and decode_cancel_order become a flat fluent chain plus a single error check instead of a long if-return ladder with three inline switches. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: remove decoder duplication and split parse_envelope (CodeScene) Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication, Complex Method, Complex Conditional): - Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType -> fill body via FieldReader -> error check) so decode_new_order and decode_cancel_order collapse to just their field maps, removing the duplicated prologue/epilogue. - Split parse_envelope into tokenize / check_envelope_shape / verify_length_and_checksum, each a small single-purpose function, and fold the 8/9/10 ordering check into a named bool. Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: table-driven enum maps + simpler msg-type check (CodeScene) Clear the last two CodeScene advisory flags on src/protocol/fix.cpp: - Code Duplication: the three near-identical map_side/map_ord_type/map_tif switches are replaced by one generic FieldReader::coded(tag, out, table) that looks a code up in a small constexpr {code, enum} table; the decoder supplies the per-enum tables. No per-enum mapping duplication remains. - Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has a single operator. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * release: v0.2.1 — FIX text protocol adapter, perf flamegraph, anchor sweep Bump the project version 0.2.0 -> 0.2.1 and record the release. - CMakeLists.txt: project VERSION 0.2.1. - CHANGELOG.md: new [0.2.1] section — FIX-like text protocol adapter (#29), `make flamegraph` + dependency-free renderer (#32), and the Codex resume-anchor / PMU-claim consistency sweep (#127/#128 follow-up); [Unreleased] reset. - PROGRESS.md / HANDOFF.md: release and resume anchors brought to the v0.2.1 released state; #29 and #32 marked done. No code or benchmark artifacts change in this release PR. On squash-merge, tag `v0.2.1` on the merge commit and publish the GitHub release. make check 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131) Two Codex review findings on the FIX adapter's parser strictness: - check_envelope_shape now requires MsgType (35) as the first body field, immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10 is rejected as Malformed instead of decoding via a first-match scan. - tokenize now rejects any repeated tag. This adapter maps each business tag exactly once (no repeating groups), so a duplicate such as `55=2` then `55=999` is an ambiguous/malformed frame rather than a silently-ignored later value. Adds a deterministic rejection test for each. (The earlier ClOrdID-required finding was already resolved by 3e4c8e3 and is covered by an existing test.) make check 263/263, make asan 263/263. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: drop delivered #29 from open-backlog anchors (Codex #131) The decision log marks #29 (FIX adapter) closed by this PR, but the current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it as open backlog, so /resume could send the next session to re-implement work this PR just added. Remove #29 from those backlog lists and note it as delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it as part of its release sweep.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Closes #32 — the missing-flamegraph follow-up. PR #89 landed the
perf stat/perf record/ textperf report --stdioworkflow but did not produce a flamegraph; this addsmake flamegraphand commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.What's added
scripts/flamegraph.py— a dependency-free (Python stdlib only)stackcollapse+ SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.scripts/flamegraph.sh— driver modeled onperf_record.sh: Linux-only, recordsperf record --call-graph dwarf -F 4000 -g -e cpu-clockonqsl-bench, reusesqsl_common.shprovenance +qsl_publish_artifact, and honorsQSL_PERF_ALLOW_PARTIALfor constrained hosts. DWARF call graphs are used because the Releasebenchpreset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.make flamegraphtarget (+.PHONY, command lists in CLAUDE.md/AGENTS.md).tests/shell/test_flamegraph.sh— CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.docs/perf_analysis.md(flamegraph section + artifact entry),results/README.md,MILESTONES.mdbacklog,PROGRESS.mdlog.Committed artifact
results/flamegraph.svg(+results/flamegraph.txtprovenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks,Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols —OrderBook::modify/cancel/add_limit, thedispatch_storagecancel path,decode_new_order, the gatewaySessionpath,replay::generate_flow.Honesty / scope
This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.
Verification
make check→ 242/242 (includes the newqsl_flamegraph_renderCTest).make flamegraphregenerates the artifact withDirty inputs: no; SVG validated well-formed viaxml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths);git grepconfirms no non-broadcast MAC leaked into the artifact.bash -n scripts/flamegraph.shclean.🤖 Generated with Claude Code