Skip to content

perf: generate perf call-graph flamegraph artifact (#32)#130

Closed
div0rce wants to merge 13 commits into
docs/codex-resume-anchor-syncfrom
perf/flamegraph-artifact
Closed

perf: generate perf call-graph flamegraph artifact (#32)#130
div0rce wants to merge 13 commits into
docs/codex-resume-anchor-syncfrom
perf/flamegraph-artifact

Conversation

@div0rce

@div0rce div0rce commented Jun 21, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #32 — the missing-flamegraph follow-up. PR #89 landed the perf stat / perf record / text perf report --stdio workflow but did not produce a flamegraph; this adds make flamegraph and commits a real artifact generated on the bare-metal Apple M2 (aarch64) Fedora Asahi host.

Stacked PR. Based on docs/codex-resume-anchor-sync (#129) so the diff is clean; merge #129 first and GitHub will retarget this to main.

What's added

  • scripts/flamegraph.py — a dependency-free (Python stdlib only) stackcollapse + SVG flamegraph renderer. Deliberately not vendoring Brendan Gregg's Perl FlameGraph toolkit, so the artifact is reproducible from this repo alone. Deterministic by construction: frames sorted by name, colors a pure function of the frame name, no RNG and no timestamps in the drawn body.
  • scripts/flamegraph.sh — driver modeled on perf_record.sh: Linux-only, records perf record --call-graph dwarf -F 4000 -g -e cpu-clock on qsl-bench, reuses qsl_common.sh provenance + qsl_publish_artifact, and honors QSL_PERF_ALLOW_PARTIAL for constrained hosts. DWARF call graphs are used because the Release bench preset omits frame pointers; application symbols still resolve from the symbol table without changing the optimization level under measurement.
  • make flamegraph target (+ .PHONY, command lists in CLAUDE.md/AGENTS.md).
  • tests/shell/test_flamegraph.sh — CTest-registered (python3-only; skips cleanly if absent), 16 assertions over folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, render determinism, and empty-input handling.
  • Docs: docs/perf_analysis.md (flamegraph section + artifact entry), results/README.md, MILESTONES.md backlog, PROGRESS.md log.

Committed artifact

results/flamegraph.svg (+ results/flamegraph.txt provenance/classification companion) generated from the clean committed tree: 397 cpu-clock samples, 171 folded stacks, Dirty inputs: no, well-formed XML. Hot paths resolve to real engine symbols — OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow.

Honesty / scope

This is a software cpu-clock sampling hot-symbol profile (frame width ∝ on-CPU samples), not a latency or throughput measurement, and is hardware/build dependent. Full hardware cache-PMU evidence remains tracked by #90.

Verification

  • make check242/242 (includes the new qsl_flamegraph_render CTest).
  • make flamegraph regenerates the artifact with Dirty inputs: no; SVG validated well-formed via xml.dom.minidom; geometry checked (root frame spans the plot; no NaN/negative widths); git grep confirms no non-broadcast MAC leaked into the artifact.
  • bash -n scripts/flamegraph.sh clean.

🤖 Generated with Claude Code

div0rce and others added 2 commits June 21, 2026 02:36
Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32.
The perf stat/record text workflow already existed; this renders a perf
call-graph flamegraph.

- scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e
  cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a
  results/flamegraph.txt provenance/classification companion (top folded
  stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance
  + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained
  hosts. DWARF call graphs unwind correctly despite the Release `bench` preset
  omitting frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG
  renderer, so the artifact is reproducible from the repo without vendoring the
  Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure
  function of the name, no RNG/timestamps in the drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly
  if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base,
  count aggregation, sortedness), SVG well-formedness, XML escaping,
  determinism, empty-input handling.
- docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), MILESTONES.md backlog, PROGRESS.md log.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 82278c5a-2514-4edb-842f-25e2a78e6777

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/flamegraph-artifact

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: beec2d0c11

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh Outdated
Comment on lines +108 to +109
if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
"$RECORD_ERR" "$SCRIPT_ERR"; then

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify zero-sized perf data as partial

In no-sample environments perf script can report zero-sized data; the existing scripts/perf_record.sh wrapper treats that as a perf limitation, but this grep omits it. In that scenario SCRIPT_STATUS is nonzero, PERF_LIMITATION stays no, and the later failure path exits 3 even when QSL_PERF_ALLOW_PARTIAL=1, so the documented constrained-host artifact path does not work. Please include the same no-sample diagnostic here.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31070b1: zero-sized data is now in the PERF_LIMITATION grep (matching scripts/perf_record.sh), so a no-sample capture is classified as a perf limitation and the QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit.

Comment thread scripts/flamegraph.sh
SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE"

# Render the SVG (deterministic for a fixed folded input + fixed subtitle).
if [[ "$STACK_COUNT" -gt 0 ]]; then

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove stale SVGs on zero-stack partial runs

When a partial run captures no folded stacks, this branch skips writing OUT_SVG but leaves any existing results/flamegraph.svg in place. A constrained rerun can therefore publish a fresh .txt saying there is no clean sample report while the old SVG from a previous host remains beside it and can be committed as if it matched. Please either replace/remove the SVG on zero-stack runs or emit a placeholder that matches the .txt classification.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31070b1: a zero-stack run now rm -f "$OUT_SVG", so a constrained rerun can't leave a previous host's SVG beside a .txt that reports no sample report.

div0rce and others added 2 commits June 21, 2026 08:54
Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present so a tab-separated folded line whose stack
  contains spaces (C++ signatures) splits on the trailing count instead of an
  interior space and is silently dropped.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input with
  0/negative counts cannot render a misleading SVG (all-non-positive input now
  fails with exit 1 via the existing empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input, so regenerate results/flamegraph.svg +
.txt from the clean tree to keep the Source digest consistent (423 samples,
Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

div0rce and others added 2 commits June 21, 2026 21:13
CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large
Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested
complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt
from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4be2daf64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh Outdated
PERF_LIMITATION=yes
fi

SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use folded sample totals for the sample gate

On perf versions that print the record summary with an estimate marker, e.g. (~2526 samples) as shown in the perf tutorial's collecting-samples example (which also notes this record summary is only an estimate: https://perfwiki.github.io/main/tutorial/#collecting-samples), this regex leaves SAMPLE_TOKEN empty and SAMPLE_COUNT becomes 0. A valid run with plenty of folded stacks is then classified as insufficient and make flamegraph exits 3 unless partial mode is enabled; even when the regex does match, the gate is based on perf's estimate rather than the actual folded sample total. Please derive the minimum-sample gate from the folded counts or at least accept the optional ~ and verify against actual folded samples.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31070b1: the regex now accepts the optional ~ ((~?N samples)), and — more importantly — the minimum-sample gate now uses the authoritative folded sample total (sum of folded counts), not perf record's estimate. Both are reported (folded total + perf estimate).

Comment thread scripts/flamegraph.sh Outdated
# Fold to collapsed stacks for the text summary and as an SVG precondition.
STACK_COUNT=0
if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then
python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail on collapsed-stack renderer errors

When perf script succeeds but flamegraph.py --collapse-only fails, this || true converts the renderer failure into an empty folded file and STACK_COUNT=0. In runs with QSL_PERF_ALLOW_PARTIAL=1, that publishes a constrained/partial artifact instead of failing, so a Python incompatibility or parser regression can be mistaken for a perf sampling limitation; capture the collapse status/stderr and treat nonzero as an unexpected generator failure.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31070b1: --collapse-only now captures COLLAPSE_STATUS (and stderr) instead of || true; a renderer/parser failure exits 4 (unmaskable) rather than being published as a constrained-environment artifact.

div0rce and others added 2 commits June 21, 2026 21:28
Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a
  flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings
  never contain parens, so a non-nested " (...)$" match is exact and won't strip
  a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Provenance input changed; regenerate from clean tree (416 samples, Dirty
inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3905059eba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh Outdated
Comment on lines +200 to +202
echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol"
echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock"
echo "latency or throughput, and is hardware/kernel/compiler/build dependent."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Label hardware-event flamegraphs accurately

When QSL_FLAMEGRAPH_EVENT=cycles is used (the new perf docs advertise this hardware-PMU mode), ARTIFACT_TYPE changes to flamegraph (cycles hardware-PMU...), but the emitted text companion still hard-codes the caveat as a software cpu-clock profile. The SVG provenance comment has the same wording, so a regenerated hardware-event artifact would contain contradictory metadata and violate the repo's measured-evidence classification rules; make these lines derive from the selected event/artifact type.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31070b1: a single SAMPLE_KIND (software cpu-clock/task-clock vs hardware-PMU) is derived from the event and drives the artifact type, the SVG provenance caveat, and the text-companion caveat, so QSL_FLAMEGRAPH_EVENT=cycles no longer yields contradictory cpu-clock wording. Artifact regenerated in 06b7675.

div0rce and others added 3 commits June 21, 2026 22:14
Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf
   limitation, matching scripts/perf_record.sh, so the documented
   QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping
   the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no
   folded stacks, so a constrained rerun cannot leave a previous host's SVG
   beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the
   minimum-sample gate on the authoritative folded sample total rather than
   perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`;
   a renderer/parser failure now exits 4 (unmaskable) rather than being
   published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software
   cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment,
   and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the
hardened scripts/flamegraph.sh; the .txt reports both the folded total and
perf record's estimate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

codescene-delta-analysis[bot]

This comment was marked as outdated.

div0rce added a commit that referenced this pull request Jun 22, 2026
Propagate the #129/#130/#131 Codex fixes up the stack and resolve the v0.2.1
release-anchor Codex findings (#133):

- Reconcile the PROGRESS/HANDOFF backlog conflict to the release state: #29
  (FIX adapter) and #32 (flamegraph) are both done, shipped in v0.2.1; no open
  legacy backlog remains.
- Sync AGENTS.md and CLAUDE.md released-state anchors from v0.2.0 to v0.2.1,
  recording the #129/#130/#131 content (the canonical files a resume reads).
- Refresh docs/release_readiness.md for v0.2.1: 263/263 tests, the FIX-adapter
  and flamegraph coverage, the bare-metal flamegraph artifact, and the next
  release re-pointed to v0.2.1.
- Update the v0.2.1 current-state test count 261 -> 263 (the two new FIX
  rejection tests) and add a decision-log entry for this second Codex round.

make check / make asan 263/263 on the bare-metal Apple M2 Fedora Asahi host.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21998201de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh
exit 1
fi

mkdir -p "$(dirname "$OUT_SVG")" "$(dirname "$DATA")"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Create the text output directory before publishing

When QSL_FLAMEGRAPH_TXT is overridden to a path in a directory that does not already exist, this only creates the SVG and perf-data directories, but qsl_publish_artifact later calls mktemp inside the text output directory. That makes make flamegraph fail after the SVG may already have been published, leaving a half-generated artifact set instead of honoring the advertised text-output override; include $(dirname "$OUT_TXT") in the directory creation.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.sh
Comment on lines +113 to +114
if grep -Eiq 'zero-sized data|No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
"$RECORD_ERR" "$SCRIPT_ERR"; then

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify perf no-permission errors as partial

On hosts where perf record reports the standard access error No permission to enable ... event without also printing Operation not permitted, this grep leaves PERF_LIMITATION=no; I reproduced that path with a fake perf and QSL_PERF_ALLOW_PARTIAL=1, and the script still exited 3 after writing a constrained artifact. Since perf's own error path emits No permission to enable %s event, include that diagnostic so permission-limited Linux hosts follow the documented partial-artifact path instead of being treated as unexpected generator failures.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.py
if self._comm:
frames.insert(0, self._comm)
key = ";".join(frames)
self.folded[key] = self.folded.get(key, 0) + 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Weight folded stacks by perf sample period

For perf-script records whose header period is greater than 1 (the test fixture itself uses ...: 1000 cpu-clock:u:), _flush() still adds only 1 to the collapsed stack. The standard stackcollapse-perf.pl parses that event period and records stacks with $m_period, so make flamegraph can skew frame widths and understate folded totals for cpu-clock or hardware-event runs where perf emits variable periods.

Useful? React with 👍 / 👎.

The flamegraph artifact, generator, provenance companion, and docs already
existed but no page actually displayed the SVG — it was only referenced by
filename. Embed the rendered results/flamegraph.svg as a visible image under
the Benchmarks section, with a caption that classifies it honestly as a
software cpu-clock sampling hot-symbol profile (not PMU evidence), names the
hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@div0rce div0rce deleted the branch docs/codex-resume-anchor-sync June 22, 2026 16:43
@div0rce div0rce closed this Jun 22, 2026
div0rce added a commit that referenced this pull request Jun 22, 2026
* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up)

Resolve the Codex review findings left on `main` by PRs #127 and #128:

- PROGRESS.md: remove the stale "Next action remains" block that still steered
  /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace
  with the v0.2.0 between-releases state (no active milestone; #94/#90 gated).
- AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The
  constraints bullet, the "correct claim" block, and the "M29 perf evidence
  status" subsection no longer label the artifacts "constrained Docker
  validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is
  updated (also in CLAUDE.md) to the released state.
- docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple
  Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows
  read `<not counted>` in results/perf_stat_linux.txt because the single-threaded
  benchmark stays on the Avalanche P-cores — expected scheduling, not a counter.

Docs/memory only; no code or artifacts changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add flamegraph generator and make target (#32)

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32.
The perf stat/record text workflow already existed; this renders a perf
call-graph flamegraph.

- scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e
  cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a
  results/flamegraph.txt provenance/classification companion (top folded
  stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance
  + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained
  hosts. DWARF call graphs unwind correctly despite the Release `bench` preset
  omitting frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG
  renderer, so the artifact is reproducible from the repo without vendoring the
  Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure
  function of the name, no RNG/timestamps in the drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly
  if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base,
  count aggregation, sortedness), SVG well-formedness, XML escaping,
  determinism, empty-input handling.
- docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), MILESTONES.md backlog, PROGRESS.md log.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32)

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph collapsed-stack parsing (Codex review)

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present so a tab-separated folded line whose stack
  contains spaces (C++ signatures) splits on the trailing count instead of an
  interior space and is silently dropped.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input with
  0/negative counts cannot render a misleading SVG (all-non-positive input now
  fails with exit 1 via the existing empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after parser hardening

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg +
.txt from the clean tree to keep the Source digest consistent (423 samples,
Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: improve flamegraph.py code health (CodeScene gate)

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large
Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested
complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after code-health refactor

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt
from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: flatten flamegraph.py remaining complexity (CodeScene)

Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a
  flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings
  never contain parens, so a non-nested " (...)$" match is exact and won't strip
  a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after complexity flattening

Provenance input changed; regenerate from clean tree (416 samples, Dirty
inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: record resume-anchor sync in PROGRESS current-state (Codex #129)

The decision-log entry and bottom "Next action remains" block already
covered this follow-up, but the top `## Current state` bullets — the first
thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the
"Last action". A resuming agent could therefore miss that this resume-anchor
/ PMU sync already happened and duplicate it.

Record the sync as the current "Last action" and "Last completed docs sync",
demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph.sh classification + sample gating (Codex #130)

Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf
   limitation, matching scripts/perf_record.sh, so the documented
   QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping
   the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no
   folded stacks, so a constrained rerun cannot leave a previous host's SVG
   beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the
   minimum-sample gate on the authoritative folded sample total rather than
   perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`;
   a renderer/parser failure now exits 4 (unmaskable) rather than being
   published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software
   cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment,
   and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after classification hardening

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the
hardened scripts/flamegraph.sh; the .txt reports both the folded total and
perf record's estimate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129)

The constraints bullet labeled all perf artifacts as partial hardware PMU
evidence, but only results/perf_stat_linux.txt carries real PMU counters
(cycles/instructions/branches/branch-misses). results/perf_report_linux.txt
is a software cpu-clock sampling profile, not PMU evidence. Scope the claim
to the perf-stat artifact and call out perf-record separately, identically in
AGENTS.md and CLAUDE.md so the two memories stay in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: embed the flamegraph as a visible image in the README

The flamegraph artifact, generator, provenance companion, and docs already
existed but no page actually displayed the SVG — it was only referenced by
filename. Embed the rendered results/flamegraph.svg as a visible image under
the Benchmarks section, with a caption that classifies it honestly as a
software cpu-clock sampling hot-symbol profile (not PMU evidence), names the
hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
div0rce added a commit that referenced this pull request Jun 22, 2026
* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up)

Resolve the Codex review findings left on `main` by PRs #127 and #128:

- PROGRESS.md: remove the stale "Next action remains" block that still steered
  /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace
  with the v0.2.0 between-releases state (no active milestone; #94/#90 gated).
- AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The
  constraints bullet, the "correct claim" block, and the "M29 perf evidence
  status" subsection no longer label the artifacts "constrained Docker
  validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is
  updated (also in CLAUDE.md) to the released state.
- docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple
  Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows
  read `<not counted>` in results/perf_stat_linux.txt because the single-threaded
  benchmark stays on the Avalanche P-cores — expected scheduling, not a counter.

Docs/memory only; no code or artifacts changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add flamegraph generator and make target (#32)

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32.
The perf stat/record text workflow already existed; this renders a perf
call-graph flamegraph.

- scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e
  cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a
  results/flamegraph.txt provenance/classification companion (top folded
  stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance
  + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained
  hosts. DWARF call graphs unwind correctly despite the Release `bench` preset
  omitting frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG
  renderer, so the artifact is reproducible from the repo without vendoring the
  Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure
  function of the name, no RNG/timestamps in the drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly
  if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base,
  count aggregation, sortedness), SVG well-formedness, XML escaping,
  determinism, empty-input handling.
- docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), MILESTONES.md backlog, PROGRESS.md log.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32)

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: add FIX-like text protocol adapter (#29)

Add a human-readable `tag=value` text protocol alongside the binary codec,
mapping the same internal message structs (issue #29, reprioritized by the human
from the backlog).

- include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value`
  adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType /
  ... / 10 mod-256 CheckSum — for the client->gateway order path:
  NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder.
  Decoding is total, deterministic, and noexcept (fixed field table,
  std::from_chars, std::string_view; no heap on the decode path) and reports
  every malformed input through a FixError taxonomy mirroring the binary codec's
  DecodeError. Documented, deliberate simplifications: Symbol (55) carries the
  numeric SymbolId; Price (44) carries integer ticks and is always present, so
  NewOrder<->FIX is a lossless bijection like the binary codec (price is never a
  float).
- tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a
  cross-codec equivalence test (binary and FIX decode the same order to identical
  structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection
  of malformed framing / unsupported BeginString / unknown-or-wrong MsgType /
  BodyLength mismatch / CheckSum mismatch / missing field / invalid field /
  invalid enum / out-of-range / oversized messages.
- docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md
  and PROGRESS.md updated.

make check 260/260; make asan 260/260 (the adapter parses untrusted text).

Closes #29.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph collapsed-stack parsing (Codex review)

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present so a tab-separated folded line whose stack
  contains spaces (C++ signatures) splits on the trailing count instead of an
  interior space and is silently dropped.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input with
  0/negative counts cannot render a misleading SVG (all-non-positive input now
  fails with exit 1 via the existing empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after parser hardening

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg +
.txt from the clean tree to keep the Source digest consistent (423 samples,
Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review)

decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F
message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was
accepted. Validate tag 11 (present and numeric) on decode without storing it,
keeping decode symmetric with encode. Adds a rejection test and clarifies the
docs/fix_protocol.md note for tag 11.

make check 261/261; make asan 261/261.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: improve flamegraph.py code health (CodeScene gate)

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large
Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested
complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after code-health refactor

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt
from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate)

CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order:
Complex Method / Complex Conditional / Overall Code Complexity). Restructure
without changing behavior:

- Extract map_side / map_ord_type / map_tif (the enum-code switches) and
  expect_msg_type (the tag-35 check) as small helpers.
- Add a FieldReader accumulator that reads required integer/coded fields and
  short-circuits on the first error, so decode_new_order and decode_cancel_order
  become a flat fluent chain plus a single error check instead of a long
  if-return ladder with three inline switches.

make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: flatten flamegraph.py remaining complexity (CodeScene)

Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a
  flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings
  never contain parens, so a non-nested " (...)$" match is exact and won't strip
  a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after complexity flattening

Provenance input changed; regenerate from clean tree (416 samples, Dirty
inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: remove decoder duplication and split parse_envelope (CodeScene)

Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication,
Complex Method, Complex Conditional):

- Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType ->
  fill body via FieldReader -> error check) so decode_new_order and
  decode_cancel_order collapse to just their field maps, removing the duplicated
  prologue/epilogue.
- Split parse_envelope into tokenize / check_envelope_shape /
  verify_length_and_checksum, each a small single-purpose function, and fold the
  8/9/10 ordering check into a named bool.

Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases /
140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: table-driven enum maps + simpler msg-type check (CodeScene)

Clear the last two CodeScene advisory flags on src/protocol/fix.cpp:

- Code Duplication: the three near-identical map_side/map_ord_type/map_tif
  switches are replaced by one generic FieldReader::coded(tag, out, table) that
  looks a code up in a small constexpr {code, enum} table; the decoder supplies
  the per-enum tables. No per-enum mapping duplication remains.
- Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has
  a single operator.

make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: record resume-anchor sync in PROGRESS current-state (Codex #129)

The decision-log entry and bottom "Next action remains" block already
covered this follow-up, but the top `## Current state` bullets — the first
thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the
"Last action". A resuming agent could therefore miss that this resume-anchor
/ PMU sync already happened and duplicate it.

Record the sync as the current "Last action" and "Last completed docs sync",
demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph.sh classification + sample gating (Codex #130)

Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf
   limitation, matching scripts/perf_record.sh, so the documented
   QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping
   the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no
   folded stacks, so a constrained rerun cannot leave a previous host's SVG
   beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the
   minimum-sample gate on the authoritative folded sample total rather than
   perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`;
   a renderer/parser failure now exits 4 (unmaskable) rather than being
   published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software
   cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment,
   and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after classification hardening

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the
hardened scripts/flamegraph.sh; the .txt reports both the folded total and
perf record's estimate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131)

Two Codex review findings on the FIX adapter's parser strictness:

- check_envelope_shape now requires MsgType (35) as the first body field,
  immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10
  is rejected as Malformed instead of decoding via a first-match scan.
- tokenize now rejects any repeated tag. This adapter maps each business tag
  exactly once (no repeating groups), so a duplicate such as `55=2` then
  `55=999` is an ambiguous/malformed frame rather than a silently-ignored
  later value.

Adds a deterministic rejection test for each. (The earlier ClOrdID-required
finding was already resolved by 3e4c8e3 and is covered by an existing test.)
make check 263/263, make asan 263/263.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: drop delivered #29 from open-backlog anchors (Codex #131)

The decision log marks #29 (FIX adapter) closed by this PR, but the
current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it
as open backlog, so /resume could send the next session to re-implement work
this PR just added. Remove #29 from those backlog lists and note it as
delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it
as part of its release sweep.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129)

The constraints bullet labeled all perf artifacts as partial hardware PMU
evidence, but only results/perf_stat_linux.txt carries real PMU counters
(cycles/instructions/branches/branch-misses). results/perf_report_linux.txt
is a software cpu-clock sampling profile, not PMU evidence. Scope the claim
to the perf-stat artifact and call out perf-record separately, identically in
AGENTS.md and CLAUDE.md so the two memories stay in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: embed the flamegraph as a visible image in the README

The flamegraph artifact, generator, provenance companion, and docs already
existed but no page actually displayed the SVG — it was only referenced by
filename. Embed the rendered results/flamegraph.svg as a visible image under
the Benchmarks section, with a caption that classifies it honestly as a
software cpu-clock sampling hot-symbol profile (not PMU evidence), names the
hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5093beb518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/flamegraph.sh
ARTIFACT_TYPE="constrained-environment validation (partial; insufficient samples for hot-symbol conclusions)"
fi

PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Generate provenance for each flamegraph output

When make flamegraph writes the text companion, this single provenance block is emitted with OUT_SVG, so results/flamegraph.txt currently self-reports Generated output: results/flamegraph.svg. That corrupts the metadata for the .txt artifact (and any QSL_FLAMEGRAPH_TXT override), even though the results policy expects each artifact to identify its own generated output; emit separate provenance for the SVG and text outputs or make the text file explicitly self-identify.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.sh
Comment on lines +150 to +152
case "$EVENT" in
cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;;
*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip perf modifiers before classifying sample source

The new classifier handles the default cpu-clock, but a common perf event spelling such as QSL_FLAMEGRAPH_EVENT=cpu-clock:u or another kernel software event like context-switches falls into the *) arm and is published as hardware-PMU sampling. Linux documents cpu-clock, task-clock, page faults, and context switches as software events in perf_event_open(2), so this can produce contradictory evidence labels in both the SVG and .txt; normalize the base event name before deciding whether it is PMU-backed.

Useful? React with 👍 / 👎.

Comment thread PROGRESS.md
(E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in
`results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores.
Docs/memory only; no code or artifacts changed.
- [2026-06-21] Issue #32 flamegraph profiling artifact (`perf/flamegraph-artifact`, stacked on the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refresh resume anchors after closing #32

This new entry records issue #32 as done and adds a 242nd check plus a new flamegraph artifact, but the current-state resume anchors still list #32 as open legacy backlog and report the old 241-test / 15-result state. Because /resume reads those top-level anchors before the history log, the next agent can restart the just-finished flamegraph work or report stale verification; update PROGRESS/HANDOFF current state alongside this completion entry.

Useful? React with 👍 / 👎.

Comment thread scripts/flamegraph.sh
echo "Insufficient samples: $INSUFFICIENT_SAMPLES"
echo "Record status: $RECORD_STATUS"
echo "Script status: $SCRIPT_STATUS"
echo "Perf access limitation: $PERF_LIMITATION"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include perf stderr in partial flamegraph reports

When QSL_PERF_ALLOW_PARTIAL=1 is used on a no-sample or permission-limited host, the published .txt only records numeric statuses and this boolean, but it drops the actual perf record / perf script diagnostics that justified the constrained classification. That makes a committed partial artifact hard to audit or distinguish from an unexpected perf/script failure after the fact; include the captured stderr blocks in the text companion, as the existing perf-record report does.

Useful? React with 👍 / 👎.

div0rce added a commit that referenced this pull request Jun 22, 2026
…), anchor sweep (#133)

* docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up)

Resolve the Codex review findings left on `main` by PRs #127 and #128:

- PROGRESS.md: remove the stale "Next action remains" block that still steered
  /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace
  with the v0.2.0 between-releases state (no active milestone; #94/#90 gated).
- AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The
  constraints bullet, the "correct claim" block, and the "M29 perf evidence
  status" subsection no longer label the artifacts "constrained Docker
  validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is
  updated (also in CLAUDE.md) to the released state.
- docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple
  Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows
  read `<not counted>` in results/perf_stat_linux.txt because the single-threaded
  benchmark stays on the Avalanche P-cores — expected scheduling, not a counter.

Docs/memory only; no code or artifacts changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add flamegraph generator and make target (#32)

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32.
The perf stat/record text workflow already existed; this renders a perf
call-graph flamegraph.

- scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e
  cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a
  results/flamegraph.txt provenance/classification companion (top folded
  stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance
  + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained
  hosts. DWARF call graphs unwind correctly despite the Release `bench` preset
  omitting frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG
  renderer, so the artifact is reproducible from the repo without vendoring the
  Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure
  function of the name, no RNG/timestamps in the drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly
  if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base,
  count aggregation, sortedness), SVG well-formedness, XML escaping,
  determinism, empty-input handling.
- docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), MILESTONES.md backlog, PROGRESS.md log.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32)

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: add FIX-like text protocol adapter (#29)

Add a human-readable `tag=value` text protocol alongside the binary codec,
mapping the same internal message structs (issue #29, reprioritized by the human
from the backlog).

- include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value`
  adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType /
  ... / 10 mod-256 CheckSum — for the client->gateway order path:
  NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder.
  Decoding is total, deterministic, and noexcept (fixed field table,
  std::from_chars, std::string_view; no heap on the decode path) and reports
  every malformed input through a FixError taxonomy mirroring the binary codec's
  DecodeError. Documented, deliberate simplifications: Symbol (55) carries the
  numeric SymbolId; Price (44) carries integer ticks and is always present, so
  NewOrder<->FIX is a lossless bijection like the binary codec (price is never a
  float).
- tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a
  cross-codec equivalence test (binary and FIX decode the same order to identical
  structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection
  of malformed framing / unsupported BeginString / unknown-or-wrong MsgType /
  BodyLength mismatch / CheckSum mismatch / missing field / invalid field /
  invalid enum / out-of-range / oversized messages.
- docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md
  and PROGRESS.md updated.

make check 260/260; make asan 260/260 (the adapter parses untrusted text).

Closes #29.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph collapsed-stack parsing (Codex review)

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present so a tab-separated folded line whose stack
  contains spaces (C++ signatures) splits on the trailing count instead of an
  interior space and is silently dropped.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input with
  0/negative counts cannot render a misleading SVG (all-non-positive input now
  fails with exit 1 via the existing empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after parser hardening

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg +
.txt from the clean tree to keep the Source digest consistent (423 samples,
Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review)

decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a 35=F
message missing ClOrdID (tag 11) — which FIX requires and encode() emits — was
accepted. Validate tag 11 (present and numeric) on decode without storing it,
keeping decode symmetric with encode. Adds a rejection test and clarifies the
docs/fix_protocol.md note for tag 11.

make check 261/261; make asan 261/261.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: improve flamegraph.py code health (CodeScene gate)

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large
Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested
complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after code-health refactor

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt
from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate)

CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order:
Complex Method / Complex Conditional / Overall Code Complexity). Restructure
without changing behavior:

- Extract map_side / map_ord_type / map_tif (the enum-code switches) and
  expect_msg_type (the tag-35 check) as small helpers.
- Add a FieldReader accumulator that reads required integer/coded fields and
  short-circuits on the first error, so decode_new_order and decode_cancel_order
  become a flat fluent chain plus a single error check instead of a long
  if-return ladder with three inline switches.

make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: flatten flamegraph.py remaining complexity (CodeScene)

Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a
  flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings
  never contain parens, so a non-nested " (...)$" match is exact and won't strip
  a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after complexity flattening

Provenance input changed; regenerate from clean tree (416 samples, Dirty
inputs: no).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: remove decoder duplication and split parse_envelope (CodeScene)

Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication,
Complex Method, Complex Conditional):

- Extract a decode_typed<T> skeleton (validate envelope -> confirm MsgType ->
  fill body via FieldReader -> error check) so decode_new_order and
  decode_cancel_order collapse to just their field maps, removing the duplicated
  prologue/epilogue.
- Split parse_envelope into tokenize / check_envelope_shape /
  verify_length_and_checksum, each a small single-purpose function, and fold the
  8/9/10 ordering check into a named bool.

Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases /
140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor: table-driven enum maps + simpler msg-type check (CodeScene)

Clear the last two CodeScene advisory flags on src/protocol/fix.cpp:

- Code Duplication: the three near-identical map_side/map_ord_type/map_tif
  switches are replaced by one generic FieldReader::coded(tag, out, table) that
  looks a code up in a small constexpr {code, enum} table; the decoder supplies
  the per-enum tables. No per-enum mapping duplication remains.
- Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has
  a single operator.

make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* release: v0.2.1 — FIX text protocol adapter, perf flamegraph, anchor sweep

Bump the project version 0.2.0 -> 0.2.1 and record the release.

- CMakeLists.txt: project VERSION 0.2.1.
- CHANGELOG.md: new [0.2.1] section — FIX-like text protocol adapter (#29),
  `make flamegraph` + dependency-free renderer (#32), and the Codex
  resume-anchor / PMU-claim consistency sweep (#127/#128 follow-up); [Unreleased]
  reset.
- PROGRESS.md / HANDOFF.md: release and resume anchors brought to the v0.2.1
  released state; #29 and #32 marked done.

No code or benchmark artifacts change in this release PR. On squash-merge, tag
`v0.2.1` on the merge commit and publish the GitHub release. make check 261/261.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: record resume-anchor sync in PROGRESS current-state (Codex #129)

The decision-log entry and bottom "Next action remains" block already
covered this follow-up, but the top `## Current state` bullets — the first
thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the
"Last action". A resuming agent could therefore miss that this resume-anchor
/ PMU sync already happened and duplicate it.

Record the sync as the current "Last action" and "Last completed docs sync",
demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: harden flamegraph.sh classification + sample gating (Codex #130)

Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf
   limitation, matching scripts/perf_record.sh, so the documented
   QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping
   the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no
   folded stacks, so a constrained rerun cannot leave a previous host's SVG
   beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the
   minimum-sample gate on the authoritative folded sample total rather than
   perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`;
   a renderer/parser failure now exits 4 (unmaskable) rather than being
   published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software
   cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment,
   and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf: regenerate flamegraph artifact after classification hardening

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the
hardened scripts/flamegraph.sh; the .txt reports both the folded total and
perf record's estimate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131)

Two Codex review findings on the FIX adapter's parser strictness:

- check_envelope_shape now requires MsgType (35) as the first body field,
  immediately after BodyLength, so a non-standard header like 8/9/34/35/.../10
  is rejected as Malformed instead of decoding via a first-match scan.
- tokenize now rejects any repeated tag. This adapter maps each business tag
  exactly once (no repeating groups), so a duplicate such as `55=2` then
  `55=999` is an ambiguous/malformed frame rather than a silently-ignored
  later value.

Adds a deterministic rejection test for each. (The earlier ClOrdID-required
finding was already resolved by 3e4c8e3 and is covered by an existing test.)
make check 263/263, make asan 263/263.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: drop delivered #29 from open-backlog anchors (Codex #131)

The decision log marks #29 (FIX adapter) closed by this PR, but the
current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it
as open backlog, so /resume could send the next session to re-implement work
this PR just added. Remove #29 from those backlog lists and note it as
delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it
as part of its release sweep.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129)

The constraints bullet labeled all perf artifacts as partial hardware PMU
evidence, but only results/perf_stat_linux.txt carries real PMU counters
(cycles/instructions/branches/branch-misses). results/perf_report_linux.txt
is a software cpu-clock sampling profile, not PMU evidence. Scope the claim
to the perf-stat artifact and call out perf-record separately, identically in
AGENTS.md and CLAUDE.md so the two memories stay in sync.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: embed the flamegraph as a visible image in the README

The flamegraph artifact, generator, provenance companion, and docs already
existed but no page actually displayed the SVG — it was only referenced by
filename. Embed the rendered results/flamegraph.svg as a visible image under
the Benchmarks section, with a caption that classifies it honestly as a
software cpu-clock sampling hot-symbol profile (not PMU evidence), names the
hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant