Merged
49 changes: 32 additions & 17 deletions AGENTS.md
@@ -159,7 +159,14 @@ Known constraints:

- The gateway and feed are loopback-only, unauthenticated simulator surfaces.
- The core engine cannot depend on wall-clock time or floating-point prices.
-- M29 perf artifacts are constrained-environment evidence until issue #90 is completed.
+- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence**
+from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real
+`cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are
+unsupported by the Apple Silicon PMU. Issue #90's residual is the cache-counter set specifically,
+which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal
+alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The
+`perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock
+sampling** profile, not PMU evidence.
- Issue #94 external review remains one of the highest remaining credibility signals; do not imply
independent review has happened until `docs/review_feedback.md` records it.

@@ -1144,10 +1151,14 @@ aesthetic product work before M24-M49 unless the human explicitly changes priori
The correct claim after this arc is:

> "correctness-first deterministic exchange-systems lab with measured concurrency, allocator,
-> constrained Linux perf workflow, and socket-profiling evidence."
+> bare-metal partial-PMU Linux perf, and socket-profiling evidence."

-Do not claim real hardware PMU evidence until issue #90 is completed on a bare-metal or
-PMU-capable Linux target. Current M29 artifacts are constrained-environment validation only.
+Real hardware PMU evidence now exists on a bare-metal Apple M2 (aarch64) Fedora Asahi host —
+`cycles`/`instructions`/`branches`/`branch-misses` are genuine counters. Do not claim *full* PMU
+evidence (the Apple Silicon PMU does not expose `cache-references`/`cache-misses`), and do not call
+the current artifacts "constrained Docker validation" either: they are **partial hardware PMU
+evidence**. Issue #90's residual is the cache-counter set, which needs a PMU microarchitecture that
+exposes it (x86_64, or an ARM server core).

The incorrect claims remain forbidden:

@@ -1185,18 +1196,20 @@ M29 currently means:
- Metadata-rich profiling artifacts exist.
- Dirty-tree handling exists.
- PMU preflight/validation exists.
-- Constrained-environment validation exists.
+- Bare-metal partial-PMU validation exists.
- CI validation exists.
- The workflow is reproducible.

-M29 does **not** currently mean real hardware PMU evidence has been captured. The committed
-artifacts were generated in a constrained Docker Desktop Linux environment where hardware PMU
-counters and sampling were unavailable or permission-limited. Do not claim real PMU evidence at
-this time.
+M29 now means **partial hardware PMU evidence**: the committed artifacts were regenerated on a
+bare-metal Apple M2 (aarch64) Fedora Asahi host (`systemd-detect-virt` reports `none`), where
+`perf stat` reads genuine `cycles`/`instructions`/`branches`/`branch-misses` counters off the Apple
+Avalanche P-core PMU. They are no longer "constrained Docker validation" — but they are not *full*
+PMU evidence either, because the Apple Silicon PMU does not expose `cache-references`/`cache-misses`.

-Issue #90 tracks full PMU-backed evidence generation on a bare-metal Linux host or a Linux VM/server
-with real `perf_event` hardware counter access. Treat this as: problem identified -> limitation
-documented -> follow-up issue created -> acceptance criteria defined. This is intentional
+Issue #90 now tracks only that residual: a *full* counter set (including cache events) requires a
+PMU microarchitecture that exposes those events to Linux (e.g. an x86_64 Intel/AMD host, or an ARM
+server core such as Graviton/Ampere) — not "more bare metal." Treat this as: problem identified ->
+limitation documented -> follow-up issue created -> acceptance criteria defined. This is intentional
engineering transparency, not a repo deficiency.

## Dynamic-analysis limits
@@ -1241,8 +1254,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu
storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency
networking study.

-Issue #90 remains the full hardware-PMU evidence debt. Issues #99 and #110 were addressed by PR
-#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical
-review request and remains one of the highest remaining credibility signals. PR #124 completed M49;
-the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on
-Fedora Asahi without adding new networking claims.
+Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues
+#99 and #110 were addressed by PR #111. Issues #95, #28, and #26 were addressed by PR #112. Issue
+#94 is the external technical review request and remains one of the highest remaining credibility
+signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal
+Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There
+is no active milestone; the highest-value remaining work is non-code and gated on #94 (external
+review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture).
26 changes: 15 additions & 11 deletions CLAUDE.md
@@ -159,12 +159,14 @@ Known constraints:

- The gateway and feed are loopback-only, unauthenticated simulator surfaces.
- The core engine cannot depend on wall-clock time or floating-point prices.
-- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64)
-Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but
-`cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual
-is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it
-(x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either
-"full PMU evidence" or "constrained Docker validation".
+- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence**
+from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real
+`cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are
+unsupported by the Apple Silicon PMU. Issue #90's residual is the cache-counter set specifically,
+which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal
+alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The
+`perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock
+sampling** profile, not PMU evidence.
- Issue #94 external review remains one of the highest remaining credibility signals; do not imply
independent review has happened until `docs/review_feedback.md` records it.

@@ -1196,8 +1198,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu
storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency
networking study.

-Issue #90 remains the full hardware-PMU evidence debt. Issues #99 and #110 were addressed by PR
-#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical
-review request and remains one of the highest remaining credibility signals. PR #124 completed M49;
-the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on
-Fedora Asahi without adding new networking claims.
+Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues
+#99 and #110 were addressed by PR #111. Issues #95, #28, and #26 were addressed by PR #112. Issue
+#94 is the external technical review request and remains one of the highest remaining credibility
+signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal
+Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There
+is no active milestone; the highest-value remaining work is non-code and gated on #94 (external
+review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture).
47 changes: 35 additions & 12 deletions PROGRESS.md
@@ -26,14 +26,22 @@ Do not rely on prior chat memory.
- **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124,
d8c16b2), then the Linux host artifact refresh (PR #125, d9094df) and the v0.2.0 release
(PR #127, ded6e80)
-- **Last completed docs sync:** v0.2.0 documentation staleness sweep (PR #127): perf evidence
-reframed as bare-metal partial PMU, release-readiness rewritten, every doc read and brought current
+- **Last completed docs sync:** resume-anchor + PMU-claim sync (`docs/codex-resume-anchor-sync`,
+this PR) resolving the Codex findings left on `main` by PRs #127/#128. Prior sweep: v0.2.0
+documentation staleness sweep (PR #127): perf evidence reframed as bare-metal partial PMU,
+release-readiness rewritten, every doc read and brought current
- **Release:** `v0.1.0` (tag on 9857e1a) and `v0.2.0` (tag on ded6e80, marked Latest) published as
GitHub-only releases; no packages published
- **`make check` passing:** yes — `make check` 241/241 and `make asan` 241/241 on the bare-metal
Apple M2 (aarch64) Fedora Asahi host on 2026-06-21
-- **Last action:** prepared and released `v0.2.0`. Reframed the perf evidence from "constrained
-Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real
+- **Last action:** resume-anchor + PMU-claim sync on `docs/codex-resume-anchor-sync` resolving the
+Codex review findings left on `main` by PRs #127/#128 — removed PROGRESS's stale "Next action
+remains" block that still pointed `/resume` at the merged PR #125, brought AGENTS.md in line with
+CLAUDE.md's v0.2.0 partial-PMU reframe (no more "constrained Docker validation" wording), and
+narrowed docs/perf_analysis.md so the Apple Blizzard (E-core) PMU rows are not implied to carry
+live counts. Docs/memory only; no code or artifacts changed (`make check` still 241/241).
+- **Prior action (v0.2.0 release):** prepared and released `v0.2.0`. Reframed the perf evidence from
+"constrained Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real
cycles/instructions/branches/branch-misses; cache-references/cache-misses unsupported by the Apple
Silicon PMU), with a new three-way `perf_stat.sh` classifier and a reframed issue #90. Regenerated
all 15 `results/*.txt` on bare metal (`Dirty inputs: no`, MAC-leak grep clean), bumped the project
@@ -350,6 +358,18 @@ Lower priority:
supersedes CodeRabbit's PR #126, whose generated tests covered only trimming and were based on
`d8c16b2` (where `qsl_publish_artifact` does not yet exist, so #126 could not merge before #125);
#126 was closed as superseded. Do not merge from automation; the human squash-merges PR #125.
+- [2026-06-21] Codex-followup resume-anchor sync (`docs/codex-resume-anchor-sync`). Resolved the

[P2] Refresh current state for this follow-up

When /resume reads PROGRESS.md, it starts with the ## Current state bullets, which still list the v0.2.0 release / PR #127 as the last action and docs sync. Recording this follow-up only down in the decision log means the next agent can miss that this resume-anchor/PMU sync already happened or duplicate it; update the top current-state bullets alongside this new entry.

Owner Author

Addressed in dfa4da2: the top ## Current state bullets now record this resume-anchor/PMU sync as the "Last action" and "Last completed docs sync", with the v0.2.0 release demoted to a "Prior action" bullet — so /resume no longer reads PR #127 as the latest action.

+Codex review findings left on `main` by PRs #127 and #128: (1) removed PROGRESS.md's stale
+"Next action remains" block that still pointed `/resume` at the merged PR #125 on
+`perf/linux-host-artifact-refresh`, replacing it with the v0.2.0 between-releases state (#94/#90
+gated); (2) brought AGENTS.md into sync with CLAUDE.md's v0.2.0 partial-PMU reframe — the
+constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no
+longer call the artifacts "constrained Docker validation," and the stale
+`perf/linux-host-artifact-refresh` follow-up line was updated in both AGENTS.md and CLAUDE.md to
+the released state; (3) narrowed docs/perf_analysis.md so it no longer implies the Apple Blizzard
+(E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in
+`results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores.
+Docs/memory only; no code or artifacts changed.
- [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts.
- [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30).
- [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human.
@@ -756,14 +776,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator

## Next action remains

-Current action is the Linux host artifact refresh PR #125 on `perf/linux-host-artifact-refresh`:
-wait for human review / CI and do not merge from automation. M49 (PR #124) is already merged to
-`main` as d8c16b2. The refreshed artifacts are host-specific Linux evidence — partial Apple PMU
-counters (cycles/instructions/branches/branch-misses) with cache-reference/cache-miss counters
-unsupported — not NIC-offload, latency, or full hardware-PMU evidence.
-
-Issue #90 remains the evidence debt for full Linux hardware PMU artifacts (cache counters). Work it
-only on a PMU-capable Linux host; do not relabel constrained or partial artifacts as full evidence.
+There is no active milestone. `v0.2.0` is released (PR #127 ded6e80, tag on ded6e80, marked Latest;
+resume-anchor sync PR #128 ae93545). M0–M49, the Linux host artifact refresh (PR #125, d9094df), and
+the v0.2.0 release are all merged to `main`. The committed perf artifacts are **partial hardware PMU
+evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real
+cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by
+the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence.
+
+Highest-value remaining work is non-code and gated: issue #94 (independent external review) and
+issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache
+counters to Linux (x86_64, or an ARM server core); do not relabel partial artifacts as full
+evidence. Do not work either from automation; the human drives them.

After each squash merge, return to this file and update state factually. If benchmark numbers are not measured, write `not measured`. Do not guess. Nobody is impressed by imaginary throughput.

12 changes: 8 additions & 4 deletions docs/perf_analysis.md
@@ -11,10 +11,14 @@ CI validation, and a reproducible command path.

The committed artifacts are now generated on a **bare-metal Linux host** — an Apple MacBook Air
(M2, aarch64) running Fedora Asahi Remix, directly on the hardware (`systemd-detect-virt` reports
-`none`, no `hypervisor` CPU flag). `perf stat` reads **real hardware counters** off the Apple
-Avalanche (P-core) and Blizzard (E-core) PMUs: `cycles`, `instructions`, `branches`, and
-`branch-misses` are live. The artifact is therefore classified **partial hardware PMU evidence**,
-not constrained-environment validation: the counters that are present are real, not emulated.
+`none`, no `hypervisor` CPU flag). On this heterogeneous SoC `perf` opens each event against both
+PMU instances — the Apple Avalanche (P-core) and Blizzard (E-core) PMUs — but the single-threaded
+benchmark is scheduled on the performance cores, so **the Avalanche counters carry the real
+counts**: `cycles`, `instructions`, `branches`, and `branch-misses` are live there. The
+corresponding `apple_blizzard_pmu/...` rows read `<not counted>` in `results/perf_stat_linux.txt`
+because the workload never ran on the E-cores — that is expected scheduling behavior, not a missing
+counter. The artifact is therefore classified **partial hardware PMU evidence**, not
+constrained-environment validation: the counters that are present are real, not emulated.

The residual gap is specific and is what issue #90 now tracks: the Apple Silicon PMU, as exposed
by the current Asahi kernel driver, does **not** implement the generic `cache-references` /
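
The distinction this hunk draws between live rows, `<not counted>` rows, and `<not supported>` rows can be illustrated with a short Python sketch. It is an illustration only and not the repository's `perf_stat.sh` classifier: the sample text is hypothetical and merely shaped like `perf stat` output, and the label strings `full` / `partial` / `none`, the `REQUIRED` tuple, and the function names are assumptions made for this example.

```python
import re

# Hypothetical sample shaped like `perf stat` output. It is NOT the committed
# results/perf_stat_linux.txt artifact, and the counts are invented.
SAMPLE = """\
         1,234,567      apple_avalanche_pmu/cycles/
     <not counted>      apple_blizzard_pmu/cycles/
         2,345,678      apple_avalanche_pmu/instructions/
     <not counted>      apple_blizzard_pmu/instructions/
           345,678      apple_avalanche_pmu/branches/
     <not counted>      apple_blizzard_pmu/branches/
             4,567      apple_avalanche_pmu/branch-misses/
     <not counted>      apple_blizzard_pmu/branch-misses/
   <not supported>      cache-references
   <not supported>      cache-misses
"""

# Assumed definition of a "full" counter set for this sketch.
REQUIRED = (
    "cycles", "instructions", "branches", "branch-misses",
    "cache-references", "cache-misses",
)

# A row is either a placeholder or a comma-grouped integer, then the event name.
ROW = re.compile(r"^\s*(<not supported>|<not counted>|[\d,]+)\s+(\S+)")


def event_name(raw):
    """Map 'apple_avalanche_pmu/cycles/' to 'cycles'; leave bare names alone."""
    parts = [part for part in raw.split("/") if part]
    return parts[-1] if parts else raw


def live_events(text):
    """Return the generic event names that carry a real count on any PMU."""
    live = set()
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        value, raw = match.groups()
        # '<not counted>' (never scheduled on that PMU) and '<not supported>'
        # (PMU lacks the event) are both skipped; only numeric rows are live.
        if not value.startswith("<"):
            live.add(event_name(raw))
    return live


def classify(text, required=REQUIRED):
    """Three-way label: 'none', 'partial', or 'full' (labels are assumptions)."""
    live = live_events(text)
    if not live:
        return "none"
    if all(name in live for name in required):
        return "full"
    return "partial"


print(classify(SAMPLE))  # prints "partial"
```

An event counts as live if any PMU instance reports a number for it, so the `<not counted>` E-core rows do not downgrade the result. Only the cache events, which no row reports, keep the sample at `partial`.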