perf: generate perf call-graph flamegraph artifact (#32) by div0rce · Pull Request #130 · div0rce/quant-systems-lab

div0rce · 2026-06-21T06:40:33Z

Summary

Closes #32 — the missing-flamegraph follow-up. PR #89 landed the perf stat / perf record / text perf report --stdio workflow but did not produce a flamegraph; this adds make flamegraph and commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.

Stacked PR. Based on docs/codex-resume-anchor-sync (#129) so the diff is clean; merge #129 first and GitHub will retarget this to main.

What's added

scripts/flamegraph.py — a dependency-free (Python stdlib only) stackcollapse + SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.
scripts/flamegraph.sh — driver modeled on perf_record.sh: Linux-only, records perf record --call-graph dwarf -F 4000 -g -e cpu-clock on qsl-bench, reuses qsl_common.sh provenance + qsl_publish_artifact, and honors QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs are used because the Release bench preset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.
make flamegraph target (+ .PHONY, command lists in CLAUDE.md/AGENTS.md).
tests/shell/test_flamegraph.sh — CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.
Docs: docs/perf_analysis.md (flamegraph section + artifact entry), results/README.md, MILESTONES.md backlog, PROGRESS.md log.

Committed artifact

results/flamegraph.svg (+ results/flamegraph.txt provenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks, Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols — OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow.

Honesty / scope

This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.

Verification

make check → 242/242 (includes the new qsl_flamegraph_render CTest).
make flamegraph regenerates the artifact with Dirty inputs: no; SVG validated well-formed via xml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths); git grep confirms no non-broadcast MAC leaked into the artifact.
bash -n scripts/flamegraph.sh clean.

🤖 Generated with Claude Code

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-21T06:40:40Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 82278c5a-2514-4edb-842f-25e2a78e6777

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/flamegraph-artifact

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: beec2d0c11

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-21T06:50:04Z

+if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
+    "$RECORD_ERR" "$SCRIPT_ERR"; then


Classify zero-sized perf data as partial

In no-sample environments perf script can report zero-sized data; the existing scripts/perf_record.sh wrapper treats that as a perf limitation, but this grep omits it. In that scenario SCRIPT_STATUS is nonzero, PERF_LIMITATION stays no, and the later failure path exits 3 even when QSL_PERF_ALLOW_PARTIAL=1, so the documented constrained-host artifact path does not work. Please include the same no-sample diagnostic here.

Useful? React with 👍 / 👎.

Fixed in 31070b1: zero-sized data is now in the PERF_LIMITATION grep (matching scripts/perf_record.sh), so a no-sample capture is classified as a perf limitation and the QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit.

chatgpt-codex-connector · 2026-06-21T06:50:04Z

+SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE"
+
+# Render the SVG (deterministic for a fixed folded input + fixed subtitle).
+if [[ "$STACK_COUNT" -gt 0 ]]; then


Remove stale SVGs on zero-stack partial runs

When a partial run captures no folded stacks, this branch skips writing OUT_SVG but leaves any existing results/flamegraph.svg in place. A constrained rerun can therefore publish a fresh .txt saying there is no clean sample report while the old SVG from a previous host remains beside it and can be committed as if it matched. Please either replace/remove the SVG on zero-stack runs or emit a placeholder that matches the .txt classification.

Useful? React with 👍 / 👎.

Fixed in 31070b1: a zero-stack run now rm -f "$OUT_SVG", so a constrained rerun can't leave a previous host's SVG beside a .txt that reports no sample report.

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4be2daf64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T01:27:23Z

+    PERF_LIMITATION=yes
+fi
+
+SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)"


Use folded sample totals for the sample gate

On perf versions that print the record summary with an estimate marker, e.g. (~2526 samples) as shown in the perf tutorial's collecting-samples example (which also notes this record summary is only an estimate: https://perfwiki.github.io/main/tutorial/#collecting-samples), this regex leaves SAMPLE_TOKEN empty and SAMPLE_COUNT becomes 0. A valid run with plenty of folded stacks is then classified as insufficient and make flamegraph exits 3 unless partial mode is enabled; even when the regex does match, the gate is based on perf's estimate rather than the actual folded sample total. Please derive the minimum-sample gate from the folded counts or at least accept the optional ~ and verify against actual folded samples.

Useful? React with 👍 / 👎.

Fixed in 31070b1: the regex now accepts the optional ~ ((~?N samples)), and — more importantly — the minimum-sample gate now uses the authoritative folded sample total (sum of folded counts), not perf record's estimate. Both are reported (folded total + perf estimate).

chatgpt-codex-connector · 2026-06-22T01:27:23Z

+# Fold to collapsed stacks for the text summary and as an SVG precondition.
+STACK_COUNT=0
+if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then
+    python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true


Fail on collapsed-stack renderer errors

When perf script succeeds but flamegraph.py --collapse-only fails, this || true converts the renderer failure into an empty folded file and STACK_COUNT=0. In runs with QSL_PERF_ALLOW_PARTIAL=1, that publishes a constrained/partial artifact instead of failing, so a Python incompatibility or parser regression can be mistaken for a perf sampling limitation; capture the collapse status/stderr and treat nonzero as an unexpected generator failure.

Useful? React with 👍 / 👎.

Fixed in 31070b1: --collapse-only now captures COLLAPSE_STATUS (and stderr) instead of || true; a renderer/parser failure exits 4 (unmaskable) rather than being published as a constrained-environment artifact.

Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3905059eba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T01:40:30Z

+        echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol"
+        echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock"
+        echo "latency or throughput, and is hardware/kernel/compiler/build dependent."


Label hardware-event flamegraphs accurately

When QSL_FLAMEGRAPH_EVENT=cycles is used (the new perf docs advertise this hardware-PMU mode), ARTIFACT_TYPE changes to flamegraph (cycles hardware-PMU...), but the emitted text companion still hard-codes the caveat as a software cpu-clock profile. The SVG provenance comment has the same wording, so a regenerated hardware-event artifact would contain contradictory metadata and violate the repo's measured-evidence classification rules; make these lines derive from the selected event/artifact type.

Useful? React with 👍 / 👎.

Fixed in 31070b1: a single SAMPLE_KIND (software cpu-clock/task-clock vs hardware-PMU) is derived from the event and drives the artifact type, the SVG provenance caveat, and the text-companion caveat, so QSL_FLAMEGRAPH_EVENT=cycles no longer yields contradictory cpu-clock wording. Artifact regenerated in 06b7675.

…ifact

Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-22T02:19:36Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Propagate the #129/#130/#131 Codex fixes up the stack and resolve the v0.2.1 release-anchor Codex findings (#133): - Reconcile the PROGRESS/HANDOFF backlog conflict to the release state: #29 (FIX adapter) and #32 (flamegraph) are both done, shipped in v0.2.1; no open legacy backlog remains. - Sync AGENTS.md and CLAUDE.md released-state anchors from v0.2.0 to v0.2.1, recording the #129/#130/#131 content (the canonical files a resume reads). - Refresh docs/release_readiness.md for v0.2.1: 263/263 tests, the FIX-adapter and flamegraph coverage, the bare-metal flamegraph artifact, and the next release re-pointed to v0.2.1. - Update the v0.2.1 current-state test count 261 -> 263 (the two new FIX rejection tests) and add a decision-log entry for this second Codex round. make check / make asan 263/263 on the bare-metal Apple M2 Fedora Asahi host. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ifact

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21998201de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T16:39:58Z

+    exit 1
+fi
+
+mkdir -p "$(dirname "$OUT_SVG")" "$(dirname "$DATA")"


Create the text output directory before publishing

When QSL_FLAMEGRAPH_TXT is overridden to a path in a directory that does not already exist, this only creates the SVG and perf-data directories, but qsl_publish_artifact later calls mktemp inside the text output directory. That makes make flamegraph fail after the SVG may already have been published, leaving a half-generated artifact set instead of honoring the advertised text-output override; include $(dirname "$OUT_TXT") in the directory creation.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:39:58Z

+if grep -Eiq 'zero-sized data|No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
+    "$RECORD_ERR" "$SCRIPT_ERR"; then


Classify perf no-permission errors as partial

On hosts where perf record reports the standard access error No permission to enable ... event without also printing Operation not permitted, this grep leaves PERF_LIMITATION=no; I reproduced that path with a fake perf and QSL_PERF_ALLOW_PARTIAL=1, and the script still exited 3 after writing a constrained artifact. Since perf's own error path emits No permission to enable %s event, include that diagnostic so permission-limited Linux hosts follow the documented partial-artifact path instead of being treated as unexpected generator failures.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:39:58Z

+            if self._comm:
+                frames.insert(0, self._comm)
+            key = ";".join(frames)
+            self.folded[key] = self.folded.get(key, 0) + 1


Weight folded stacks by perf sample period

For perf-script records whose header period is greater than 1 (the test fixture itself uses ...: 1000 cpu-clock:u:), _flush() still adds only 1 to the collapsed stack. The standard stackcollapse-perf.pl parses that event period and records stacks with $m_period, so make flamegraph can skew frame widths and understate folded totals for cpu-clock or hardware-event runs where perf emits variable periods.

Useful? React with 👍 / 👎.

The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codescene-delta-analysis

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: add FIX-like text protocol adapter (#29) Add a human-readable `tag=value` text protocol alongside the binary codec, mapping the same internal message structs (issue #29, reprioritized by the human from the backlog). - include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value` adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType / ... / 10 mod-256 CheckSum — for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder. Decoding is total, deterministic, and noexcept (fixed field table, std::from_chars, std::string_view; no heap on the decode path) and reports every malformed input through a FixError taxonomy mirroring the binary codec's DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, so NewOrder<->FIX is a lossless bijection like the binary codec (price is never a float). - tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized messages. - docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md and PROGRESS.md updated. make check 260/260; make asan 260/260 (the adapter parses untrusted text). Closes #29. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review) decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was accepted. Validate tag 11 (present and numeric) on decode without storing it, keeping decode symmetric with encode. Adds a rejection test and clarifies the docs/fix_protocol.md note for tag 11. make check 261/261; make asan 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate) CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order: Complex Method / Complex Conditional / Overall Code Complexity). Restructure without changing behavior: - Extract map_side / map_ord_type / map_tif (the enum-code switches) and expect_msg_type (the tag-35 check) as small helpers. - Add a FieldReader accumulator that reads required integer/coded fields and short-circuits on the first error, so decode_new_order and decode_cancel_order become a flat fluent chain plus a single error check instead of a long if-return ladder with three inline switches. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: remove decoder duplication and split parse_envelope (CodeScene) Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication, Complex Method, Complex Conditional): - Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType -> fill body via FieldReader -> error check) so decode_new_order and decode_cancel_order collapse to just their field maps, removing the duplicated prologue/epilogue. - Split parse_envelope into tokenize / check_envelope_shape / verify_length_and_checksum, each a small single-purpose function, and fold the 8/9/10 ordering check into a named bool. Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: table-driven enum maps + simpler msg-type check (CodeScene) Clear the last two CodeScene advisory flags on src/protocol/fix.cpp: - Code Duplication: the three near-identical map_side/map_ord_type/map_tif switches are replaced by one generic FieldReader::coded(tag, out, table) that looks a code up in a small constexpr {code, enum} table; the decoder supplies the per-enum tables. No per-enum mapping duplication remains. - Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has a single operator. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131) Two Codex review findings on the FIX adapter's parser strictness: - check_envelope_shape now requires MsgType (35) as the first body field, immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10 is rejected as Malformed instead of decoding via a first-match scan. - tokenize now rejects any repeated tag. This adapter maps each business tag exactly once (no repeating groups), so a duplicate such as `55=2` then `55=999` is an ambiguous/malformed frame rather than a silently-ignored later value. Adds a deterministic rejection test for each. (The earlier ClOrdID-required finding was already resolved by 3e4c8e3 and is covered by an existing test.) make check 263/263, make asan 263/263. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: drop delivered #29 from open-backlog anchors (Codex #131) The decision log marks #29 (FIX adapter) closed by this PR, but the current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it as open backlog, so /resume could send the next session to re-implement work this PR just added. Remove #29 from those backlog lists and note it as delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it as part of its release sweep.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5093beb518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T16:50:08Z

+    ARTIFACT_TYPE="constrained-environment validation (partial; insufficient samples for hot-symbol conclusions)"
+fi
+
+PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")"


Generate provenance for each flamegraph output

When make flamegraph writes the text companion, this single provenance block is emitted with OUT_SVG, so results/flamegraph.txt currently self-reports Generated output: results/flamegraph.svg. That corrupts the metadata for the .txt artifact (and any QSL_FLAMEGRAPH_TXT override), even though the results policy expects each artifact to identify its own generated output; emit separate provenance for the SVG and text outputs or make the text file explicitly self-identify.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:50:08Z

+case "$EVENT" in
+cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;;
+*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;;


Strip perf modifiers before classifying sample source

The new classifier handles the default cpu-clock, but a common perf event spelling such as QSL_FLAMEGRAPH_EVENT=cpu-clock:u or another kernel software event like context-switches falls into the *) arm and is published as hardware-PMU sampling. Linux documents cpu-clock, task-clock, page faults, and context switches as software events in perf_event_open(2), so this can produce contradictory evidence labels in both the SVG and .txt; normalize the base event name before deciding whether it is PMU-backed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:50:08Z

  (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in
  `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores.
  Docs/memory only; no code or artifacts changed.
+- [2026-06-21] Issue #32 flamegraph profiling artifact (`perf/flamegraph-artifact`, stacked on the


Refresh resume anchors after closing #32

This new entry records issue #32 as done and adds a 242nd check plus a new flamegraph artifact, but the current-state resume anchors still list #32 as open legacy backlog and report the old 241-test / 15-result state. Because /resume reads those top-level anchors before the history log, the next agent can restart the just-finished flamegraph work or report stale verification; update PROGRESS/HANDOFF current state alongside this completion entry.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-22T16:50:08Z

+    echo "Insufficient samples: $INSUFFICIENT_SAMPLES"
+    echo "Record status: $RECORD_STATUS"
+    echo "Script status: $SCRIPT_STATUS"
+    echo "Perf access limitation: $PERF_LIMITATION"


Include perf stderr in partial flamegraph reports

When QSL_PERF_ALLOW_PARTIAL=1 is used on a no-sample or permission-limited host, the published .txt only records numeric statuses and this boolean, but it drops the actual perf record / perf script diagnostics that justified the constrained classification. That makes a committed partial artifact hard to audit or distinguish from an unexpected perf/script failure after the fact; include the captured stderr blocks in the text companion, as the existing perf-record report does.

Useful? React with 👍 / 👎.

…), anchor sweep (#133) * docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add flamegraph generator and make target (#32) Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat: add FIX-like text protocol adapter (#29) Add a human-readable `tag=value` text protocol alongside the binary codec, mapping the same internal message structs (issue #29, reprioritized by the human from the backlog). - include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value` adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType / ... / 10 mod-256 CheckSum — for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder. Decoding is total, deterministic, and noexcept (fixed field table, std::from_chars, std::string_view; no heap on the decode path) and reports every malformed input through a FixError taxonomy mirroring the binary codec's DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, so NewOrder<->FIX is a lossless bijection like the binary codec (price is never a float). - tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized messages. - docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md and PROGRESS.md updated. make check 260/260; make asan 260/260 (the adapter parses untrusted text). Closes #29. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of an interior space and is silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review) decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was accepted. Validate tag 11 (present and numeric) on decode without storing it, keeping decode symmetric with encode. Adds a rejection test and clarifies the docs/fix_protocol.md note for tag 11. make check 261/261; make asan 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: improve flamegraph.py code health (CodeScene gate) CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output: - fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block. - render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry. Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate) CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order: Complex Method / Complex Conditional / Overall Code Complexity). Restructure without changing behavior: - Extract map_side / map_ord_type / map_tif (the enum-code switches) and expect_msg_type (the tag-35 check) as small helpers. - Add a FieldReader accumulator that reads required integer/coded fields and short-circuits on the first error, so decode_new_order and decode_cancel_order become a flat fluent chain plus a single error check instead of a long if-return ladder with three inline switches. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: remove decoder duplication and split parse_envelope (CodeScene) Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication, Complex Method, Complex Conditional): - Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType -> fill body via FieldReader -> error check) so decode_new_order and decode_cancel_order collapse to just their field maps, removing the duplicated prologue/epilogue. - Split parse_envelope into tokenize / check_envelope_shape / verify_length_and_checksum, each a small single-purpose function, and fold the 8/9/10 ordering check into a named bool. Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor: table-driven enum maps + simpler msg-type check (CodeScene) Clear the last two CodeScene advisory flags on src/protocol/fix.cpp: - Code Duplication: the three near-identical map_side/map_ord_type/map_tif switches are replaced by one generic FieldReader::coded(tag, out, table) that looks a code up in a small constexpr {code, enum} table; the decoder supplies the per-enum tables. No per-enum mapping duplication remains. - Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has a single operator. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * release: v0.2.1 — FIX text protocol adapter, perf flamegraph, anchor sweep Bump the project version 0.2.0 -> 0.2.1 and record the release. - CMakeLists.txt: project VERSION 0.2.1. - CHANGELOG.md: new [0.2.1] section — FIX-like text protocol adapter (#29), `make flamegraph` + dependency-free renderer (#32), and the Codex resume-anchor / PMU-claim consistency sweep (#127/#128 follow-up); [Unreleased] reset. - PROGRESS.md / HANDOFF.md: release and resume anchors brought to the v0.2.1 released state; #29 and #32 marked done. No code or benchmark artifacts change in this release PR. On squash-merge, tag `v0.2.1` on the merge commit and publish the GitHub release. make check 261/261. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: record resume-anchor sync in PROGRESS current-state (Codex #129) The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf: regenerate flamegraph artifact after classification hardening Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131) Two Codex review findings on the FIX adapter's parser strictness: - check_envelope_shape now requires MsgType (35) as the first body field, immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10 is rejected as Malformed instead of decoding via a first-match scan. - tokenize now rejects any repeated tag. This adapter maps each business tag exactly once (no repeating groups), so a duplicate such as `55=2` then `55=999` is an ambiguous/malformed frame rather than a silently-ignored later value. Adds a deterministic rejection test for each. (The earlier ClOrdID-required finding was already resolved by 3e4c8e3 and is covered by an existing test.) make check 263/263, make asan 263/263. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: drop delivered #29 from open-backlog anchors (Codex #131) The decision log marks #29 (FIX adapter) closed by this PR, but the current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it as open backlog, so /resume could send the next session to re-implement work this PR just added. Remove #29 from those backlog lists and note it as delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it as part of its release sweep.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs: embed the flamegraph as a visible image in the README The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

div0rce and others added 2 commits June 21, 2026 02:36