From 587778dd8453c236e44524b7151e4466cf784cc3 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 02:23:23 -0400 Subject: [PATCH 1/3] docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a missing counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 47 +++++++++++++++++++++++++++---------------- CLAUDE.md | 12 ++++++----- PROGRESS.md | 31 ++++++++++++++++++++-------- docs/perf_analysis.md | 12 +++++++---- 4 files changed, 68 insertions(+), 34 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 429bd1e..3b303db 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -159,7 +159,12 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. -- M29 perf artifacts are constrained-environment evidence until issue #90 is completed.
+- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) + Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but + `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual + is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it + (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either + "full PMU evidence" or "constrained Docker validation". - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it. @@ -1144,10 +1149,14 @@ aesthetic product work before M24-M49 unless the human explicitly changes priori The correct claim after this arc is: > "correctness-first deterministic exchange-systems lab with measured concurrency, allocator, -> constrained Linux perf workflow, and socket-profiling evidence." +> bare-metal partial-PMU Linux perf, and socket-profiling evidence." -Do not claim real hardware PMU evidence until issue #90 is completed on a bare-metal or -PMU-capable Linux target. Current M29 artifacts are constrained-environment validation only. +Real hardware PMU evidence now exists on a bare-metal Apple M2 (aarch64) Fedora Asahi host — +`cycles`/`instructions`/`branches`/`branch-misses` are genuine counters. Do not claim *full* PMU +evidence (the Apple Silicon PMU does not expose `cache-references`/`cache-misses`), and do not call +the current artifacts "constrained Docker validation" either: they are **partial hardware PMU +evidence**. Issue #90's residual is the cache-counter set, which needs a PMU microarchitecture that +exposes it (x86_64, or an ARM server core). The incorrect claims remain forbidden: @@ -1185,18 +1194,20 @@ M29 currently means: - Metadata-rich profiling artifacts exist. - Dirty-tree handling exists. - PMU preflight/validation exists. 
-- Constrained-environment validation exists. +- Bare-metal partial-PMU validation exists. - CI validation exists. - The workflow is reproducible. -M29 does **not** currently mean real hardware PMU evidence has been captured. The committed -artifacts were generated in a constrained Docker Desktop Linux environment where hardware PMU -counters and sampling were unavailable or permission-limited. Do not claim real PMU evidence at -this time. +M29 now means **partial hardware PMU evidence**: the committed artifacts were regenerated on a +bare-metal Apple M2 (aarch64) Fedora Asahi host (`systemd-detect-virt` reports `none`), where +`perf stat` reads genuine `cycles`/`instructions`/`branches`/`branch-misses` counters off the Apple +Avalanche P-core PMU. They are no longer "constrained Docker validation" — but they are not *full* +PMU evidence either, because the Apple Silicon PMU does not expose `cache-references`/`cache-misses`. -Issue #90 tracks full PMU-backed evidence generation on a bare-metal Linux host or a Linux VM/server -with real `perf_event` hardware counter access. Treat this as: problem identified -> limitation -documented -> follow-up issue created -> acceptance criteria defined. This is intentional +Issue #90 now tracks only that residual: a *full* counter set (including cache events) requires a +PMU microarchitecture that exposes those events to Linux (e.g. an x86_64 Intel/AMD host, or an ARM +server core such as Graviton/Ampere) — not "more bare metal." Treat this as: problem identified -> +limitation documented -> follow-up issue created -> acceptance criteria defined. This is intentional engineering transparency, not a repo deficiency. ## Dynamic-analysis limits @@ -1241,8 +1252,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency networking study. -Issue #90 remains the full hardware-PMU evidence debt. 
Issues #99 and #110 were addressed by PR -#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical -review request and remains one of the highest remaining credibility signals. PR #124 completed M49; -the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on -Fedora Asahi without adding new networking claims. +Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues +#99 and #110 were addressed by PR #111. Issues #95, #28, and #26 were addressed by PR #112. Issue +#94 is the external technical review request and remains one of the highest remaining credibility +signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal +Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There +is no active milestone; the highest-value remaining work is non-code and gated on #94 (external +review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/CLAUDE.md b/CLAUDE.md index 13aad01..5c52266 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1196,8 +1196,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency networking study. -Issue #90 remains the full hardware-PMU evidence debt. Issues #99 and #110 were addressed by PR -#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical -review request and remains one of the highest remaining credibility signals. PR #124 completed M49; -the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on -Fedora Asahi without adding new networking claims. +Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues +#99 and #110 were addressed by PR #111. 
Issues #95, #28, and #26 were addressed by PR #112. Issue +#94 is the external technical review request and remains one of the highest remaining credibility +signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal +Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There +is no active milestone; the highest-value remaining work is non-code and gated on #94 (external +review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/PROGRESS.md b/PROGRESS.md index 5f06551..070fece 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -350,6 +350,18 @@ Lower priority: supersedes CodeRabbit's PR #126, whose generated tests covered only trimming and were based on `d8c16b2` (where `qsl_publish_artifact` does not yet exist, so #126 could not merge before #125); #126 was closed as superseded. Do not merge from automation; the human squash-merges PR #125. +- [2026-06-21] Codex follow-up resume-anchor sync (`docs/codex-resume-anchor-sync`). Resolved the + Codex review findings left on `main` by PRs #127 and #128: (1) removed PROGRESS.md's stale + "Next action remains" block that still pointed `/resume` at the merged PR #125 on + `perf/linux-host-artifact-refresh`, replacing it with the v0.2.0 between-releases state (#94/#90 + gated); (2) brought AGENTS.md into sync with CLAUDE.md's v0.2.0 partial-PMU reframe — the + constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no + longer call the artifacts "constrained Docker validation," and the stale + `perf/linux-host-artifact-refresh` follow-up line was updated in both AGENTS.md and CLAUDE.md to + the released state; (3) narrowed docs/perf_analysis.md so it no longer implies the Apple Blizzard + (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in + `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores.
+ Docs/memory only; no code or artifacts changed. - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. 
M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. @@ -756,14 +768,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator ## Next action remains -Current action is the Linux host artifact refresh PR #125 on `perf/linux-host-artifact-refresh`: -wait for human review / CI and do not merge from automation. M49 (PR #124) is already merged to -`main` as d8c16b2. The refreshed artifacts are host-specific Linux evidence — partial Apple PMU -counters (cycles/instructions/branches/branch-misses) with cache-reference/cache-miss counters -unsupported — not NIC-offload, latency, or full hardware-PMU evidence. - -Issue #90 remains the evidence debt for full Linux hardware PMU artifacts (cache counters). Work it -only on a PMU-capable Linux host; do not relabel constrained or partial artifacts as full evidence. +There is no active milestone. `v0.2.0` is released (PR #127 ded6e80, tag on ded6e80, marked Latest; +resume-anchor sync PR #128 ae93545). M0–M49, the Linux host artifact refresh (PR #125, d9094df), and +the v0.2.0 release are all merged to `main`. The committed perf artifacts are **partial hardware PMU +evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real +cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by +the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence. + +Highest-value remaining work is non-code and gated: issue #94 (independent external review) and +issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache +counters to Linux (x86_64, or an ARM server core); do not relabel partial artifacts as full +evidence. Do not work either from automation; the human drives them. After each squash merge, return to this file and update state factually. If benchmark numbers are not measured, write `not measured`. 
Do not guess. Nobody is impressed by imaginary throughput. diff --git a/docs/perf_analysis.md b/docs/perf_analysis.md index 8cb1ccc..3ab3881 100644 --- a/docs/perf_analysis.md +++ b/docs/perf_analysis.md @@ -11,10 +11,14 @@ CI validation, and a reproducible command path. The committed artifacts are now generated on a **bare-metal Linux host** — an Apple MacBook Air (M2, aarch64) running Fedora Asahi Remix, directly on the hardware (`systemd-detect-virt` reports -`none`, no `hypervisor` CPU flag). `perf stat` reads **real hardware counters** off the Apple -Avalanche (P-core) and Blizzard (E-core) PMUs: `cycles`, `instructions`, `branches`, and -`branch-misses` are live. The artifact is therefore classified **partial hardware PMU evidence**, -not constrained-environment validation: the counters that are present are real, not emulated. +`none`, no `hypervisor` CPU flag). On this heterogeneous SoC, `perf` opens each event against both +PMU instances — the Apple Avalanche (P-core) and Blizzard (E-core) PMUs — but the single-threaded +benchmark is scheduled on the performance cores, so **the Avalanche counters carry the real +counts**: `cycles`, `instructions`, `branches`, and `branch-misses` are live there. The +corresponding `apple_blizzard_pmu/...` rows read `<not counted>` in `results/perf_stat_linux.txt` +because the workload never ran on the E-cores — that is expected scheduling behavior, not a missing +counter. The artifact is therefore classified **partial hardware PMU evidence**, not +constrained-environment validation: the counters that are present are real, not emulated.
The residual gap is specific and is what issue #90 now tracks: the Apple Silicon PMU, as exposed by the current Asahi kernel driver, does **not** implement the generic `cache-references` / From dfa4da28d3b5ca94efd33c066aeb9725b0de1464 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 22:13:47 -0400 Subject: [PATCH 2/3] docs: record resume-anchor sync in PROGRESS current-state (Codex #129) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 --- PROGRESS.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/PROGRESS.md b/PROGRESS.md index 070fece..5888c0e 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -26,14 +26,22 @@ Do not rely on prior chat memory. - **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124, d8c16b2), then the Linux host artifact refresh (PR #125, d9094df) and the v0.2.0 release (PR #127, ded6e80) -- **Last completed docs sync:** v0.2.0 documentation staleness sweep (PR #127): perf evidence - reframed as bare-metal partial PMU, release-readiness rewritten, every doc read and brought current +- **Last completed docs sync:** resume-anchor + PMU-claim sync (`docs/codex-resume-anchor-sync`, + this PR) resolving the Codex findings left on `main` by PRs #127/#128. 
Prior sweep: v0.2.0 + documentation staleness sweep (PR #127): perf evidence reframed as bare-metal partial PMU, + release-readiness rewritten, every doc read and brought current - **Release:** `v0.1.0` (tag on 9857e1a) and `v0.2.0` (tag on ded6e80, marked Latest) published as GitHub-only releases; no packages published - **`make check` passing:** yes — `make check` 241/241 and `make asan` 241/241 on the bare-metal Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 -- **Last action:** prepared and released `v0.2.0`. Reframed the perf evidence from "constrained - Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real +- **Last action:** resume-anchor + PMU-claim sync on `docs/codex-resume-anchor-sync` resolving the + Codex review findings left on `main` by PRs #127/#128 — removed PROGRESS's stale "Next action + remains" block that still pointed `/resume` at the merged PR #125, brought AGENTS.md in line with + CLAUDE.md's v0.2.0 partial-PMU reframe (no more "constrained Docker validation" wording), and + narrowed docs/perf_analysis.md so the Apple Blizzard (E-core) PMU rows are not implied to carry + live counts. Docs/memory only; no code or artifacts changed (`make check` still 241/241). +- **Prior action (v0.2.0 release):** prepared and released `v0.2.0`. Reframed the perf evidence from + "constrained Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real cycles/instructions/branches/branch-misses; cache-references/cache-misses unsupported by the Apple Silicon PMU), with a new three-way `perf_stat.sh` classifier and a reframed issue #90. 
Regenerated all 15 `results/*.txt` on bare metal (`Dirty inputs: no`, MAC-leak grep clean), bumped the project From 4a2aa67dd70769faef37a8a9cf983b302e0742c7 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Mon, 22 Jun 2026 12:28:33 -0400 Subject: [PATCH 3/3] docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 14 ++++++++------ CLAUDE.md | 14 ++++++++------ 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 3b303db..0a7abc4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -159,12 +159,14 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. -- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) - Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but - `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual - is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it - (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either - "full PMU evidence" or "constrained Docker validation". 
+- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence** + from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real + `cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are + unsupported by the Apple Silicon PMU. Issue #90's residual is the cache-counter set specifically, + which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal + alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The + `perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock + sampling** profile, not PMU evidence. - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it. diff --git a/CLAUDE.md b/CLAUDE.md index 5c52266..5fbdf6c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -159,12 +159,14 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. -- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) - Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but - `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual - is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it - (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either - "full PMU evidence" or "constrained Docker validation". +- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence** + from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real + `cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are + unsupported by the Apple Silicon PMU. 
Issue #90's residual is the cache-counter set specifically, + which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal + alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The + `perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock + sampling** profile, not PMU evidence. - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it.