PROGRESS.md — Quant Systems Lab live state

Single source of truth for current build state. Claude Code updates this at the end of every milestone and before stopping mid-milestone.

On resume:

Read AGENTS.md / CLAUDE.md depending on agent type.
Read this file.
Read MILESTONES.md.
Read HANDOFF.md.
Verify with:
- git status
- git branch --show-current
- git log --oneline -10
- gh pr list

Do not rely on prior chat memory.

Current state

Active milestone: none — v0.2.1 released; project is between releases
Status: ☑ v0.2.1 published (FIX-like text protocol adapter #29, perf flamegraph #32, and a resume-anchor/PMU consistency sweep) on top of v0.2.0
Active branch: none (work lands via scoped PRs from main)
Last completed milestone: M49, NIC offload and low-latency networking study (PR #124, d8c16b2). Since then: v0.2.0 (PR #127, ded6e80), followed by the v0.2.1 content: the Codex resume-anchor sweep (PR #129), the perf flamegraph #32 (PR #134), and the FIX text adapter #29 (PR #131)
Last completed docs sync: v0.2.1 release prep (this PR): version bump + CHANGELOG [0.2.1] and resume/release anchors brought current
Release: v0.1.0 (tag on 9857e1a), v0.2.0 (tag on ded6e80), and v0.2.1 (tag created on the squash-merge of the release PR, marked Latest) published as GitHub-only releases; no packages published
make check passing: yes — make check 263/263 and make asan 263/263 on the bare-metal Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 (includes the v0.2.1 FIX-adapter and flamegraph renderer tests)
Last action: delivered the v0.2.1 content as scoped PRs and prepared this version-bump release. That content comprises two reprioritized backlog items, the FIX-like text protocol adapter (#29) and the perf call-graph flamegraph (#32), plus the Codex resume-anchor/PMU consistency sweep (#127/#128 follow-up). Ran Codex as an independent reviewer across the stack and resolved every finding: the FIX envelope now requires MsgType as the first body field and rejects duplicate tags; flamegraph.sh labels zero-sample and partial runs as such instead of reporting them as complete, fails hard on renderer errors, and gates on the folded sample total (not perf's estimate); and the resume anchors were made consistent across PROGRESS/HANDOFF/AGENTS/CLAUDE. Brought every touched file through the CodeScene Code Health gate (table-driven enum maps, a decode_typed skeleton, split parse_envelope, flattened flamegraph.py). make check/make asan 263/263.
Next action: no active milestone. Highest-value remaining work is non-code and gated: issue #94 (independent external review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g. x86_64). The #32 (flamegraph) and #29 (FIX adapter) backlog items are done — shipped in v0.2.1 (PR #134 and PR #131) — so do not reopen them.
Blockers: issue #90 is now a cache-counter PMU gap, not a host-access gap — this bare-metal Apple M2 exposes real cycles/instructions/branches/branch-misses but its PMU does not implement cache-references/cache-misses; closing it needs a PMU microarchitecture that exposes cache events (x86_64, or an ARM server core). Issue #94 remains open for independent external review (human-gated). Hardware NIC/offload latency measurement still requires suitable wired NIC hardware, driver support, timestamping/offload/RSS access, and a measured packet workload; the current wld0 Wi-Fi capability observation is not NIC-offload latency evidence. The legacy backlog is now clear: #32 and #29 shipped in v0.2.1 (PR #134, PR #131); issues #95, #28, and #26 were closed by PR #112.

Milestone status

#	Milestone	Branch	Status	PR	Notes
M0	Scaffold, tooling, CI	`feat/m00-scaffold`	☑ merged	#1	Create repo structure, CMake, CI, Claude commands
M1	Core domain	`feat/m01-core-domain`	☑ merged	#2	Types, ticks, enums, logical clock
M2	Binary protocol	`feat/m02-binary-protocol`	☑ merged	#3	Explicit encode/decode, byte fixtures
M3	Order book	`feat/m03-order-book`	☑ merged	#4	Single-symbol price-time priority
M4	Matching engine	`feat/m04-matching-engine`	☑ merged	#5	Multi-symbol sequencing and snapshots
M5	Risk + gateway	`feat/m05-risk-gateway`	☑ merged	#6	Deterministic checks before engine
M6	Market data	`feat/m06-market-data`	☑ merged	#7	Trade/top-of-book/delta/snapshot publisher
M7	Event log	`feat/m07-event-log`	☑ merged	#8	Append-only records and reader
M8	Replay/recovery	`feat/m08-replay-recovery`	☑ merged	#9	Rebuild engine state from log
M9	TCP gateway	`feat/m09-tcp-gateway`	☑ merged	#10	Binary order gateway over TCP
M10	Network market data	`feat/m10-network-market-data`	☑ merged	#11	Network feed client/publisher
M11	Benchmarks	`feat/m11-benchmarks`	☑ merged	#12	Measured performance outputs
M12	Hardening	`feat/m12-hardening`	☑ merged	#13	Sanitizers and invariant tests
M13	Docs polish	`feat/m13-docs-polish`	☑ merged	#14	README, diagram, demo, recruiting notes
M14	OCaml replay verifier	`feat/m14-ocaml-replay-verifier`	☑ merged	#16	Independent typed-functional replay invariant checker

Status key:

☐ not started
◐ in progress
☑ merged
⚠ blocked

Decision log

One line per real decision. Add an ADR under docs/adr/ for architectural choices.

Initial project scope: deterministic C++20 exchange simulator for quant SWE recruiting.
Initial implementation strategy: build deterministic in-process core before networking.
Price representation: integer ticks only; no floating-point price types.
Benchmark policy: no performance numbers may be claimed until generated by committed benchmark scripts.
Human-in-the-loop workflow: Claude Code creates branches/PRs; human squash-merges.
[M0] Test framework: Catch2 v3.5.2 via FetchContent.
[M0] Build generator: Ninja via CMake presets (dev, release, asan).
[M0] Compiler warnings: strict warning flags applied through an interface library target.
[M0] build/test/check auto-configure via a CMakeCache stamp so a cold clone works.
[M1] Domain enums are fixed-width (uint8_t) with is_valid + to_string; out-of-range casts reject deterministically.
[M1] Result is a minimal {bool ok; RejectReason} value type (no std::expected; that is C++23).
[M1] LogicalClock provides a monotonic logical Timestamp; core paths avoid wall-clock time.
[M1] Test targets now link qsl_warnings, so header-only domain code is warning-checked through tests.
[M1] RejectReason stringification test is exhaustive (all enumerators + out-of-range cast).
[M2] Wire format is big-endian, fixed-width: 16-byte header (type/version/body_len/seq_no) + body.
[M2] Serialization is explicit byte-shift (endian.hpp); no reinterpret_cast/memcpy/struct overlay.
[M2] Decoders are non-throwing, bounds-safe, and return a deterministic DecodeError.
[M2] MsgType starts with NewOrder/CancelOrder; registry grows in later milestones.
[M2] DecodeError stringification test is exhaustive (all enumerators + out-of-range cast).
[M2] Signed Price round-trip coverage includes negative and int64 min/max values.
[M2] Typed decoders intentionally parse declared bodies; stream exact-size enforcement is deferred to M9.
[M2] Protocol decode success implies valid NewOrder enum fields; invalid wire enum values return DecodeError::InvalidEnumValue.
[M3] Single-symbol book: price-ordered std::map levels (best-first) + FIFO std::list per level; matching iterates these deterministically.
[M3] OrderId→{side,price,iterator} index for O(1) cancel/modify; never iterated for matching, so its unordered layout does not affect determinism.
[M3] Fills execute at the resting maker's price; GTC rests remainder, IOC discards, market never rests.
[M3] Modify: same-price quantity reduction preserves priority (in place); price change or quantity increase loses priority (cancel + re-add, may cross).
[M3] Distinguish per-order Quantity (u32) from aggregate QuantityTotal (u64); quantity_at() accumulates/returns the 64-bit total so level liquidity never wraps.
[M3] Modify zero-quantity (cancel) and unknown-id (no-op) branches are now tested; partially-filled maker priority retention is explicitly tested.
[M4] EngineEvent is a std::variant (OrderAccepted/OrderCanceled/OrderModified/TradeEvent); each event carries its SeqNo.
[M4] Single monotonic counter assigns each emitted event a strictly-increasing SeqNo.
[M4] Engine keys books in an ordered std::map<SymbolId, OrderBook>; snapshot iterates it for deterministic ordering.
[M4] Unknown symbol / unknown cancel-modify is a no-op at the engine; structured rejection (OrderRejected) is deferred to M5, BookUpdate to M6.
[M4] Global cross-symbol sequence monotonicity is explicitly tested (interleaved AAPL/MSFT share one counter).
[M4] new_market emitted event contents are asserted (OrderAccepted + TradeEvent fields).
[M5] Risk split: pure value checks in engine/risk.hpp (side, price tick, quantity, max qty, max notional); identity checks (unknown symbol, duplicate, unknown order) in the gateway, which knows engine state.
[M5] Notional check is overflow-safe (quantity > max_notional / price), never computing price * quantity.
[M5] "Duplicate" = order id currently active (resting), via MatchingEngine::contains, matching the engine's duplicate-active-id no-op; completed-order ids may be reused.
[M5] Rejections return a structured GatewayResult and never reach the engine, so the sequence counter and state stay clean; rejections are not part of the engine's sequenced event stream.
[M6] Market-data messages are a std::variant (MdTrade, MdTopOfBook); each carries a monotonic md_seq from a single counter, emitted in engine-event order.
[M6] Publisher derives top-of-book by reading the deterministic engine (best_bid/best_ask); MdTopOfBook is emitted only when the top changes.
[M6] MD wire encoding reuses the M2 framing (write_header promoted out of the codec anon namespace); layering is feed -> protocol -> core.
[M6] BookDelta/Snapshot (full depth) deferred to the networked-feed/snapshot work.
[M7] Log record framing: big-endian header (seq_no/record_type/logical_timestamp/payload_size) + opaque payload + FNV-1a u32 checksum; reuses the M2 byte helpers (replay -> protocol -> core).
[M7] read_log is pure and bounds-safe; corruption yields a deterministic LogError (Truncated/BadChecksum/PayloadTooLarge) and returns intact records read so far.
[M7] File I/O uses C stdio (fwrite/fread) so byte buffers pass as void* — no reinterpret_cast. Writer opens in append mode only (no update-in-place).
[M7] Payload is opaque to the log (e.g. a serialized protocol command frame); typed replay interpretation is M8.
[M8] Recorded unit is a Command sum type including RegisterSymbol, so a log replays standalone (same names in order -> same SymbolIds).
[M8] EngineSnapshot extended with per-level aggregates (LevelView bids/asks); replay equivalence compares snapshot (best bid/ask, levels, counts, last_seq) and the full emitted event sequence.
[M8] Synthetic flow uses a seeded mt19937_64; replay is deterministic because the engine is wall-clock independent.
[M8] qsl-replay generate|<file> provides a self-contained recovery demo (write a flow log, then rebuild from it).
[M8] qsl-replay generation fails if any EventLogWriter append fails.
[M8] qsl-replay distinguishes missing/unreadable logs from valid empty logs.
[M9] Protocol/session logic lives in a pure Session (bytes in -> OrderGateway -> bytes out); the TCP transport (TcpServer) is a thin POSIX wrapper, so the gateway logic is unit-tested without sockets.
[M9] A malformed frame flags the session for disconnect (server drops the peer, no crash/desync); a well-formed but risk-rejected order returns a structured Reject.
[M9] Single-threaded accept-and-serve loop (one connection at a time); no thread pool / event loop.
[M9] The only reinterpret_cast in the codebase is the POSIX sockaddr* cast; protocol serialization stays shift-based.
[M9] No authentication; binds 127.0.0.1 only. Local simulator, not a venue (documented in docs/socket_gateway.md).
[M10] Market data is exposed over UDP: one datagram per MdTrade/MdTopOfBook, encoded with the binary protocol; decode_market_data dispatches on header type.
[M10] Gap detection is a pure SequenceTracker (forward gaps only; duplicates/reorder ignored); UDP loss is detected, not recovered — no retransmit/snapshot channel.
[M10] UdpPublisher is a MarketDataSubscriber, so the network feed reuses the M6 publisher unchanged; inet_pton is checked (invalid host -> unusable), applying the M9 bind-validation lesson.
[M10] UDP unicast on 127.0.0.1 only; no multicast/auth/encryption (documented honestly).
[M10] Wire-level UDP test covers out-of-sequence datagrams and gap detection through the real receive/decode/client path.
[M10] Known non-market-data frames decode to no market-data message (decode_market_data -> nullopt).
[M10] UdpFeedClient receive uses a bounded SO_RCVTIMEO so demo subscribers do not hang indefinitely.
[M11] Custom qsl-bench harness (no external benchmark dep); timing uses steady_clock at the benchmark layer only, with a volatile sink to prevent dead-code elimination.
[M11] make bench builds the bench preset and runs scripts/run_benchmarks.sh, which writes results/latest.txt with full metadata; only measured numbers appear anywhere.
[M11] Benchmark preset disables test configuration so make bench does not depend on Catch2/test-only FetchContent.
[M11] Benchmark workloads use the deterministic generate_flow(seed=42); numbers are hardware/compiler/build-dependent (caveat documented in docs/benchmarking.md + docs/linux_performance.md).
[M11] Benchmarks avoid degenerate paths: the gateway-session benchmark crosses deep resting liquidity so a real fill is produced each call (not an empty-book no-op); cross-TU calls + a volatile sink prevent dead-code elimination.
[M11] results/latest.txt records iterations, warmup, seed, units, dirty-tree flag, and an explicit "synthetic microbenchmark, not production throughput" caveat.
[M11] README benchmark section leads with methodology/exclusions; no production/low-latency/exchange-grade claims; numbers labelled single-machine.
[M11] make fmt/fmt-check now also cover apps/ (benchmark and CLI code is formatting-checked).
[M12] Invariants verified over randomized deterministic flows (generate_flow, seeds 1–8): strictly-increasing sequence, no crossed book, executed ≤ submitted, positive quantities; plus focused rejected-cannot-rest, canceled-cannot-trade, and replay-stress (seeds 11/22/33 × 8000 orders) tests.
[M12] Protocol/session fuzz tests feed tens of thousands of random buffers to every decoder and the TCP session; decoders are non-throwing and bounds-safe, so the assertion is "no crash / no UB" — proven under the ASan+UBSan build.
[M12] Added a CI sanitizers job (cmake --preset asan + ctest --preset asan) so ASan/UBSan runs on every PR, not just locally.
[M12] docs/invariants.md enumerates each invariant and where it is tested; reaffirms structural guarantees (integer-only prices, wall-clock-independent core). No new floating-point or clock dependency introduced.
[M12] Added structure-aware mutated-frame fuzz tests so malformed-but-plausible inputs reach body decoders, not only header rejection.
[M12] Added valid-frame chunking tests for TCP Session reassembly.
[M12] Randomized invariant tests include non-vacuity guards so future generator changes cannot silently remove trades/book activity.
[M13] README rewritten for a <60s read with a committed mermaid architecture diagram, clean-clone quickstart, demo section, and honest limitations; benchmark table cites results/latest.txt exactly (no new/unmeasured numbers).
[M13] Added scripts/demo.sh + make demo: deterministic replay/recovery then a loopback TCP gateway round-trip (readiness polling + cleanup trap). Gateway is unauthenticated/loopback-only — flagged in README and the script.
[M13] Expanded docs/recruiting_notes.md with conservative SWE + Linux résumé bullets, measured-only benchmark bullets (with exclusions caveat), and interview-defense notes. No production/profitability claims.
[M14] OCaml toolchain installed via Homebrew (ocaml 5.4.1 + dune 3.23.1); no opam, no third-party packages — verifier uses only the standard library.
[M14] Added a C++ qsl-export-fixture app that drives a deterministic flow through the gateway (low max_qty → real rejects) and writes a normalized textual event-log fixture. No engine code changed.
[M14] OCaml verifier (ocaml/) re-derives replay invariants from the exported log (sequence monotonicity, positive trade qty, canceled-can't-trade, rejected-can't-rest, event-log/summary consistency). It is an independent cross-check, not a re-implementation of matching and not formal verification.
[M14] Committed a generated valid fixture plus two hand-crafted violation fixtures so dune runtest proves the checker both passes clean logs and catches violations. CI runs a dedicated ocaml-verifier job (apt ocaml+dune; opam-free).
[M14] OCaml verifier tracks OrderId lifetimes instead of treating rejected/canceled numeric IDs as globally dead (a later accept reuses an id legally); added valid-reuse fixtures alongside the violation fixtures.
[M14] Malformed verifier fixtures now fail cleanly with exit 2 instead of uncaught parser exceptions.
[M15] Added a differential-fixture exporter (qsl-export-stream + replay::write_stream_fixture) emitting a normalized command stream, engine events, command-scoped gateway rejections, and the full final per-symbol snapshot (best bid/ask, per-price level aggregates, order counts, last_seq, trade count). Schema in docs/differential_testing.md.
[M15] Reused the deterministic generate_flow (no new/random generation — that is M18); a low max_qty yields real rejections. Export is byte-deterministic per seed.
[M15] M15 only exports + tests determinism/parseability; the independent OCaml replay engine (M16) and C++-vs-OCaml snapshot equality (M17) are deferred. M14 verifier (v1 event-log fixtures) is untouched and still green.
[M15] Stream fixtures emit structured modify rejection outcomes instead of dropping rejected modify commands.
[M15] Differential fixture reject lines are command-scoped so M17 can distinguish C++ risk rejection from no-op commands.
[M15] Meta risk fields are parse-tested because M16 needs them to reproduce C++ risk-filtered state.
[M15] Registered-but-empty symbols are part of the snapshot contract and must be mirrored by the OCaml replay engine.
[M15] Byte-for-byte golden pinning for stream fixtures remains deferred to M17; M15 covers deterministic generation, schema parseability, and targeted outcome contracts.
[M13] Demo script uses a portable mktemp template so make demo works on GNU/Linux and macOS.
[M9] TCP server rejects invalid numeric IPv4 bind hosts instead of falling back to 0.0.0.0.
[M9] Socket writes avoid process termination from SIGPIPE where supported by using send/MSG_NOSIGNAL.
[M9] Session tests cover cancel, malformed body, unexpected message type, and closed-peer write behavior where feasible.
[M8] Replay integration tests exercise EventLogWriter -> EventLogReader -> replay through the real M7 byte framing.
[M8] decode_command failure branches are tested.
[M7] Writer enforces the same kMaxPayload cap as the reader.
[M7] Append checks both fwrite and fflush before reporting success.
[M7] Tests cover oversized payload rejection, PayloadTooLarge, and header checksum corruption.
[M6] Publisher treats first empty observation as initialization, not a top-of-book delta.
[M6] Tests cover no-op first touch, cancel-driven TOB updates, MD malformed-frame decode paths, and a deterministic MdTrade byte fixture.
[M5] Nonzero modify commands are re-validated with limit-order value constraints before reaching the engine.
[M5] Modify quantity 0 remains cancel-via-modify.
[M5] Market-order rejection branches are explicitly tested.
[M5] Rejected modifies do not mutate engine state or consume sequence numbers.
[M4] Active resting OrderIds are unique per symbol; duplicate active IDs are no-ops in M4 and become structured DuplicateOrderId rejections in M5.
[M4] Tests cover no orphaned liquidity after duplicate-id attempts.
[M34] Epoll response budgeting is enforced at the Session/gateway boundary: over-cap NewOrder fanout is rejected before appending responses or mutating engine state.
[M34] Epoll transport drains writable backlog before accepting more input for a client, retries EINTR sends, drains EPOLLIN data before honoring EPOLLHUP, treats EPOLLERR as an immediate close, suppresses EPOLLIN after EOF, and closes active clients on fatal listener failure.
[M34] Epoll client events carry fd-generation tokens; stale events from a closed connection cannot act on a new connection that reused the same numeric fd.
[M34] Once a session is closing, epoll does not re-arm EPOLLIN; queued replies are either flushed under the hard cap or discarded immediately when no reply was accepted in the current read.
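
The [M5] notional entry above avoids ever computing price * quantity. A minimal sketch of that check, assuming a hypothetical exceeds_max_notional helper with a u32 quantity, a signed i64 price, and a u64 limit (types chosen to match the [M1]/[M3] entries, not copied from engine/risk.hpp); the early return for non-positive prices is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the max-notional rule in engine/risk.hpp; the real
// signature and types may differ.
//
// Returns true when price * quantity would exceed max_notional. The product is
// never computed, so it cannot overflow: for positive integers
//   q * p > M   <=>   q > floor(M / p).
inline bool exceeds_max_notional(std::uint32_t quantity, std::int64_t price,
                                 std::uint64_t max_notional) {
    if (price <= 0) {
        return true;  // assumption: treat non-positive prices as a reject in this sketch
    }
    return quantity > max_notional / static_cast<std::uint64_t>(price);
}

// Boundary and overflow cases for the check above.
inline void notional_examples() {
    assert(!exceeds_max_notional(10, 100, 1000));  // exactly 1000: allowed
    assert(exceeds_max_notional(11, 100, 1000));   // 1100 > 1000: rejected
    assert(!exceeds_max_notional(9, 111, 1000));   // 999: floor division loses nothing
    // A naive 64-bit product would overflow here; the division form does not.
    assert(exceeds_max_notional(std::numeric_limits<std::uint32_t>::max(),
                                std::numeric_limits<std::int64_t>::max(),
                                std::numeric_limits<std::uint64_t>::max()));
}
```

The division form is exact rather than approximate: for positive integers, q * p > M holds precisely when q > floor(M / p), so no borderline order is misclassified.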

Measured benchmark results

Fill only after M11. Never estimate.

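Measured by make bench (full metadata + raw output in results/latest.txt). Hardware-, compiler-, and build-dependent: these are from one machine, not a production-latency claim.

Earlier run (stale; superseded by the 2026-06-20 regeneration listed next): arm64, Apple clang 17, Release, seed 42, commit fbb8180 (synthetic, in-process; excludes network/disk/kernel path).
order book add/modify/cancel: ~126 ns/op
protocol NewOrder encode+decode: ~39 ns/op
in-process gateway session (crossing order with fill): ~270 ns/op
matching-engine flow apply: ~121 ns/command
replay from command log: ~132 ns/command

Current run (2026-06-20, Fedora Asahi host, aarch64, GCC 16.1.1; see the PR #125 entry under Decision log additions; results/latest.txt remains the single source of truth):
order book add/modify/cancel: ~114 ns/op
protocol NewOrder encode+decode: ~19 ns/op
in-process gateway session (crossing order with fill): ~121 ns/op
matching-engine flow apply: ~99 ns/command
replay from command log: ~114 ns/command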

Mid-milestone scratch

If stopping mid-milestone, write exactly what is half-done and the precise next step. Clear this when the milestone merges.

none

Additive Jane Street Targeting State

This section adds Jane Street internship targeting context. Do not delete earlier progress state.

Target roles

Primary:

Jane Street Software Engineer Internship, Hong Kong, December–February

Secondary:

Jane Street Linux Engineer Internship, Hong Kong, December–February

Lower priority:

Strategy and Product Internship
IT Operations Engineer Internship

Current strategic decision

The repo remains a C++20 exchange-systems simulator.
Add Linux/socket/performance documentation as the networking and benchmark milestones mature.
Add an OCaml replay verifier as a late milestone to signal Jane Street SWE alignment.
Do not optimize for IT Ops.
Do not build trading strategies, dashboards, or real-market integrations.

Added milestone rows

#	Milestone	Branch	Status	PR	Notes
M14	OCaml replay verifier	`feat/m14-ocaml-replay-verifier`	☑ merged	#16	Jane Street SWE language/culture signal
M15	Export normalized command streams + final snapshots	`feat/m15-export-command-streams-and-snapshots`	☑ merged	#17	Normalized command stream + final snapshot export for Phase II differential testing
M16	Independent OCaml replay engine	`feat/m16-independent-ocaml-replay-engine`	☑ merged	#19	OCaml computes final snapshot independently
M17	Differential replay tests	`feat/m17-differential-replay-tests`	☑ merged	#20	C++ vs OCaml snapshot equality in CI
M18	Property-based command generator	`feat/m18-property-command-generator`	☑ merged	#21	Seeded randomized market command streams
M19	Shrinker + minimal failing fixture exporter	`feat/m19-shrinker-minimal-failing-fixtures`	☑ merged	#22	Minimal counterexamples for failed properties
M20	Differential testing architecture docs	`feat/m20-differential-testing-docs`	☑ merged	#52	Final docs for differential/property testing system
M21	Repository license and maintainer docs	`feat/m21-repo-license-maintainer-docs`	☑ merged	#53	MIT LICENSE + CONTRIBUTING/SECURITY/CHANGELOG (one-maintainer, honest)
M22	Release readiness audit	`feat/m22-release-readiness-audit`	☑ merged	#54	M13-style final polish/readiness pass after Phase II
M23	Optional v0.1.0 release	`feat/m23-v0-1-0-release-notes`	☑ released	#82 / tag `v0.1.0`	GitHub-only release; no packages
M24	Bounded SPSC ring buffer	`feat/m24-spsc-ring-buffer`	☑ merged	#84	Phase III begins: bounded SPSC queue, memory ordering, backpressure
M25	Memory-ordering and concurrency evidence package	`feat/m25-memory-ordering-evidence`	☑ merged	#85	Ownership model, acquire/release documentation, stress/backpressure tests
M26	Multithreaded gateway-engine-feed pipeline prototype	`claude/serene-fermi-rhuFJ` (env-designated)	☑ merged	#86	Explicit thread boundaries and deterministic shutdown
M27	ThreadSanitizer coverage	`claude/serene-fermi-rhuFJ` (env-designated)	☑ merged	#87	TSan preset/CI for concurrent tests
M28	Memory pool allocator experiment	`feat/m28-memory-pool-allocator`	☑ merged	#88	Hot-path allocation experiment with benchmark evidence
M29	Linux perf profiling workflow and artifacts	`feat/m29-linux-perf-profiling`	☑ merged	#89	perf workflow + constrained validation; full PMU evidence backlogged in #90
M30	Kernel/socket path profiling and Linux socket hardening	`feat/m30-socket-profiling-hardening`	☑ merged	#92	syscall/socket-buffer/UDP pressure evidence; epoll deferred to M34/M35
M31	External review / maintainer signal	`docs/m31-external-review`	☑ merged	#93	Review-request checklist + feedback template; review request opened as issue #94
M32	Pool-backed order-book storage experiment	`feat/m32-pool-backed-order-book-storage`	☑ merged	#96	PMR-backed node allocation in order-book paths; direct intrusive `OrderPool` storage later handled by PR #112
M33	Advanced concurrency validation	`feat/m33-advanced-concurrency-validation`	☑ merged	#97	Scheduling perturbation, longer stress, and stronger concurrency methodology
M34	epoll gateway architecture	`feat/m34-epoll-gateway-architecture`	☑ merged	#98	Event-driven multi-client gateway design
M35	Multi-client load and socket-pressure testing	`feat/m35-multi-client-socket-pressure`	☑ merged	#100	TCP connection-scaling load (portable vs epoll) + M30 UDP pressure
M36	Decompose the epoll event loop and connection lifecycle	`refactor/m36-epoll-event-loop-decomposition`	☑ merged	#104	Repository-health refactor; `epoll_server.cpp` Code Health 5.35 → 10.0
M37	Extract threaded-pipeline stage helpers	`refactor/m37-threaded-pipeline-stage-helpers`	☑ merged	#105	Repository-health refactor; CodeScene 10.0 for `pipeline.hpp`, `test_pipeline`, and `test_backpressure`
M38	Split the command-stream shrinker into named passes	`refactor/m38-shrinker-reduction-passes`	☑ merged	#106	Repository-health refactor; `shrink.cpp` 8.15 → 10.0
M39	Encapsulate order-book matching parameters	`refactor/m39-order-book-matching-parameters`	☑ merged	#107	Repository-health refactor; `order_book.cpp` 8.55 → 9.68, determinism preserved
M40	Consolidate engine correctness test suites	`refactor/m40-engine-test-consolidation`	☑ merged	#108	Repository-health refactor (test-only); `test_order_book`/`matching_engine`/`invariants`/`risk_gateway`
M41	Simplify gateway session frame dispatch	`refactor/m41-session-frame-dispatch`	☑ merged	#109	Repository-health refactor; `session.cpp` 8.99 → 10.0
M42	Extract shared shell-script helpers	`refactor/m42-shared-shell-script-helpers`	☑ merged	#111	Repository-health refactor (manual; shell unscored); expanded to address #99/#110
Follow-up	Intrusive storage, realistic flow, threaded TCP gateway	`feat/close-storage-flow-tcp-followups`	☑ merged	#112	Closed #95/#28/#26
Docs	Systems-engineering roadmap audit	`docs/systems-roadmap-audit`	☑ merged	#113	Docs-only update to future systems roadmap and agent guidance
M43	NUMA awareness study	`feat/m43-numa-awareness-study`	☑ merged	#114	CPU affinity, scheduler migration, NUMA, and cache-locality caveats where hardware exists; constrained Docker artifact generated
M44	Ingress queue memory-ordering and false-sharing study	`feat/m44-ingress-memory-ordering-false-sharing`	☑ merged	#115	Ingress queue ordering/backpressure plus false-sharing validation; not lock-free matching
M45B	Artifact provenance migration follow-up	`perf/m45b-artifact-provenance-migration`	☑ merged	#116	Converted remaining artifact generators from commit identity to source-digest provenance
M45	Exchange-grade persistence prototype	`feat/m45-persistence-prototype`	☑ merged	#117	Durability modes, torn-tail recovery/repair, crash harness; no production-durability claims
M46	Recovery benchmarking	`feat/m46-recovery-benchmarking`	☑ merged	#118	Full-replay restart cost vs in-memory book rebuild; no production recovery-time claims
M47	Contiguous order-book storage and cache-locality study	`feat/m47-contiguous-order-book-storage`	☑ merged	#119	Fixed-band direct-price-index storage compared against baseline, PMR, and intrusive modes
Follow-up	M47 storage benchmark diagnosis	`perf/m47-storage-benchmark-diagnosis`	☑ merged	#122	Workload-shape variants + corrected timing (excludes per-run pool-init setup); overturns the earlier intrusive-slow reading
M48	DPDK research and prototype	`feat/m48-dpdk-research-prototype`	☑ merged	#123	DPDK research notes + non-mutating environment support check; no packet-path benchmark or kernel-bypass claim
M49	NIC offload and low-latency networking study	`feat/m49-nic-offload-study`	☑ merged	#124	Solarflare/Mellanox/RSS/timestamping study + non-mutating capability check; no offload latency claim

Decision log additions

[2026-06-15] M48: added DPDK research notes and a non-mutating make dpdk-check support artifact. The macOS development host classifies as unsupported-host, with no device binding, hugepage mutation, packet-path benchmark, or kernel-bypass performance claim. A prototype remains optional and only valid on a host that can support it cleanly.
[2026-06-17] M49: PR #124 squash-merged to main as d8c16b2. On Fedora Asahi Linux, refreshed host artifacts on perf/linux-host-artifact-refresh: NIC/offload check observes wld0 (brcmfmac, Broadcom BCM4387) read-only with feature listing visible, no RSS/channel support, no hardware timestamping, no setting changes, no packet generator, and no latency benchmark. DPDK remains linux-missing-dpdk; NUMA is single-node constrained; perf-stat has partial Apple PMU cycles/instructions/branches/branch-misses but unsupported cache counters, so issue #90 remains open; perf-record produced a software cpu-clock hot-symbol profile. Loopback socket and crash-recovery artifacts were regenerated on the same Linux host with clean source-digest provenance. These artifacts are host-specific evidence, not production networking claims.
[2026-06-20] PR #125 Linux artifact regeneration (finishing the review-fix follow-up). On the Fedora Asahi host (aarch64, GCC 16.1.1, make check 240/240), first generalized the publish-time MAC sanitizer (fix: commit a34a927): qsl_publish_artifact previously redacted only link/ether / permaddr / altname wlx, so a bridge interface in nic-offload-check's ip -details output still leaked a host MAC via bridge_id / designated_root / group_address. It now redacts every MAC-shaped token except the universal broadcast address, mirroring the limitations-audit leak regex. Because scripts/qsl_common.sh is a declared PROVENANCE_INPUTS entry for all 15 result generators, regenerated every artifact from the clean committed tree: QSL_PERF_ALLOW_PARTIAL=1 make perf-stat (partial Apple Avalanche PMU cycles/instructions/branches/branch-misses; cache-reference/cache-miss <not supported>, so #90 stays open), make perf-record (cpu-clock software hot-symbol profile, 164 samples), QSL_NUMA_ALLOW_CONSTRAINED=1 make numa-study (single-node constrained), make nic-offload-check (read-only wld0 observation, now MAC-clean), make dpdk-check (linux-missing-dpdk), make false-sharing-study, make profile-io, make socket-stress, make crash-recovery, make socket-load, plus make bench / bench-allocator / bench-storage / bench-recovery / bench-diff. All 15 artifacts report Dirty inputs: no with refreshed source digests, and git grep -nE '([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}' -- results/ | grep -v 'ff:ff:ff:ff:ff:ff' returns nothing. Closed the README/recruiting benchmark drift: synced the README benchmark table and docs/recruiting_notes.md to the regenerated results/latest.txt (single source of truth): order book ~114 ns/op, protocol ~19 ns/op, gateway ~121 ns/op, matching ~99 ns/command, replay ~114 ns/command (the prior macOS-era ~126/39/270/121/132 ns numbers were stale). Added results/*.sqlite to .gitignore so the local qsl-results MCP store is never committed.
No runtime C++ changed, so make asan / make tsan were not required. Also added tests/shell/test_qsl_common.sh (registered in CTest; portable) covering both the MAC sanitization (link/ether, permaddr, bridge_id/designated_root, group_address redacted; broadcast and the audit leak-grep invariant preserved; wlx altname redacted) and the trailing-whitespace / blank-line trimming, so the security-relevant publish logic ships with tests — make check 241/241. This supersedes CodeRabbit's PR #126, whose generated tests covered only trimming and were based on d8c16b2 (where qsl_publish_artifact does not yet exist, so #126 could not merge before #125); #126 was closed as superseded. Do not merge from automation; the human squash-merges PR #125.
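The publish-time rule above (redact every MAC-shaped token except the broadcast address) can be illustrated as a minimal C++17 sketch. The real logic is shell in qsl_publish_artifact (scripts/qsl_common.sh). The function name redact_macs and the xx:xx:xx:xx:xx:xx placeholder are hypothetical, and the regex uses the same six-octet shape as the leak grep above.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical C++ illustration of the publish-time sanitizer rule; the real
// implementation is shell. Every MAC-shaped token is replaced except the
// universal broadcast address.
std::string redact_macs(const std::string& text) {
    // Same shape as the audit leak regex: six colon-separated hex octets.
    static const std::regex mac(R"(([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})");
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.begin(), text.end(), mac), end; it != end; ++it) {
        out.append(last, (*it)[0].first);  // copy text between matches verbatim
        const std::string token = it->str();
        std::string lower;
        for (char c : token) lower += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        // Keep broadcast (any case); redact everything else with a fixed placeholder.
        out += (lower == "ff:ff:ff:ff:ff:ff") ? token : "xx:xx:xx:xx:xx:xx";
        last = (*it)[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```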
[2026-06-21] Codex-followup resume-anchor sync (docs/codex-resume-anchor-sync). Resolved the Codex review findings left on main by PRs #127 and #128: (1) removed PROGRESS.md's stale "Next action remains" block that still pointed /resume at the merged PR #125 on perf/linux-host-artifact-refresh, replacing it with the v0.2.0 between-releases state (#94/#90 gated); (2) brought AGENTS.md into sync with CLAUDE.md's v0.2.0 partial-PMU reframe — the constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer call the artifacts "constrained Docker validation," and the stale perf/linux-host-artifact-refresh follow-up line was updated in both AGENTS.md and CLAUDE.md to the released state; (3) narrowed docs/perf_analysis.md so it no longer implies the Apple Blizzard (E-core) PMU carries live counts — the apple_blizzard_pmu/... rows read <not counted> in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores. Docs/memory only; no code or artifacts changed.
[2026-06-21] Issue #32 flamegraph profiling artifact (perf/flamegraph-artifact, stacked on the Codex-followup branch). Added make flamegraph → scripts/flamegraph.sh, which records perf record --call-graph dwarf -F 4000 -g -e cpu-clock on qsl-bench and renders results/flamegraph.svg (+ results/flamegraph.txt provenance/classification companion). The fold + SVG render live in scripts/flamegraph.py, a dependency-free stdlib-only stackcollapse + flamegraph renderer (no vendored Perl FlameGraph toolkit), deterministic by design (frames sorted by name; colors a pure function of the name; no RNG/timestamps in the drawn body). DWARF call graphs are used because the Release bench preset omits frame pointers; application symbols (OrderBook::add_limit, MatchingEngine::new_limit, the replay path, …) still resolve from the symtab. Added tests/shell/test_flamegraph.sh (CTest-registered, python3-only, skips cleanly if absent) covering folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, and empty-input handling; make check 242/242. The committed results/flamegraph.svg/.txt were generated on the bare-metal Fedora Asahi host (aarch64) from the clean committed tree (Dirty inputs: no). This is a software cpu-clock sampling hot-symbol profile, not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Do not merge from automation; human squash-merges.
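The folding step in scripts/flamegraph.py is Python. As a language-neutral illustration, here is a hypothetical C++17 sketch of the same stackcollapse idea. perf script lists each sample's frames leaf-first, so the fold reverses them, puts comm at the base, strips +0x offsets, and counts identical stacks. Sample, strip_offset, and fold are illustrative names, not the script's API, and dso stripping is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One perf sample: command name plus frames in perf script order (leaf first).
struct Sample {
    std::string comm;
    std::vector<std::string> frames_leaf_first;
};

// Drop a trailing "+0x..." offset so samples in the same function aggregate.
std::string strip_offset(const std::string& sym) {
    const auto pos = sym.rfind("+0x");
    return pos == std::string::npos ? sym : sym.substr(0, pos);
}

// Fold samples into "comm;root;...;leaf" -> count. std::map keeps keys sorted.
std::map<std::string, std::uint64_t> fold(const std::vector<Sample>& samples) {
    std::map<std::string, std::uint64_t> folded;
    for (const auto& s : samples) {
        std::string key = s.comm;  // comm sits at the base of every stack
        for (auto it = s.frames_leaf_first.rbegin(); it != s.frames_leaf_first.rend(); ++it) {
            key += ';';
            key += strip_offset(*it);  // reversed: outermost caller first
        }
        ++folded[key];
    }
    return folded;
}
```

Using an ordered map gives sorted output without a separate sort step, which matches the script's deterministic-ordering design choice.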
[2026-06-21] Issue #29 FIX-like text protocol adapter (feat/fix-text-protocol-adapter, stacked on the flamegraph branch). Added include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: a tag=value SOH-framed adapter over the SAME internal structs as the binary codec, with genuine FIX framing (8 BeginString / 9 BodyLength / 35 MsgType / … / 10 mod-256 CheckSum) for the client→gateway order path — NewOrderSingle (35=D)→NewOrder and OrderCancelRequest (35=F)→CancelOrder. Decoding is total/deterministic/noexcept (fixed field table, std::from_chars, string_view; no heap on decode) and reports every malformed input through a FixError taxonomy mirroring the binary DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, making NewOrder↔FIX a lossless bijection like the binary codec (never float for price). tests/unit/test_fix_protocol.cpp mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side×OrdType×TIF), a byte-pinned fixture (checksum 164 / body-length 50), and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized. Docs in docs/fix_protocol.md (+ pointer from docs/binary_protocol.md). make check 260/260 and make asan 260/260 clean (the parser handles untrusted text). Closes #29. Do not merge from automation; human squash-merges.
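The framing rules the adapter relies on are the standard FIX ones. BodyLength (9) counts the bytes after the SOH that ends the 9= field, up to and including the SOH before 10=. CheckSum (10) is the byte sum of everything before 10=, mod 256, written as three digits. Below is a minimal C++17 encoder sketch. fix_frame and fix_checksum are hypothetical names, not the API of include/qsl/protocol/fix.hpp, and the sketch does not reproduce the byte-pinned fixture.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

constexpr char SOH = '\x01';

// FIX CheckSum: sum of every byte, mod 256.
unsigned fix_checksum(std::string_view bytes) {
    unsigned sum = 0;
    for (unsigned char c : bytes) sum += c;
    return sum % 256;
}

// Hypothetical encoder. `body` starts at 35=... and must already end with SOH.
std::string fix_frame(std::string_view begin_string, std::string_view body) {
    std::string msg = "8=";
    msg += begin_string;
    msg += SOH;
    msg += "9=" + std::to_string(body.size()) + SOH;  // BodyLength covers exactly `body`
    msg += body;
    char trailer[8];
    std::snprintf(trailer, sizeof trailer, "10=%03u", fix_checksum(msg));  // 3-digit, zero-padded
    msg += trailer;
    msg += SOH;
    return msg;
}
```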
[2026-06-21] Post-review code-health pass on #130/#131 after Codex + the CI CodeScene Code Health gate flagged flamegraph.py and fix.cpp below the 10.0 health bar. flamegraph.py: bundled render args into a FlameOptions dataclass + extracted _append_chrome/_frame_svg, flattened fold_perf_script into a _Folder, replaced the nested dso scan with a regex, and dropped an unused _layout arg. fix.cpp: table-driven enum maps via FieldReader::coded, a decode_typed skeleton to remove decoder duplication, and parse_envelope split into tokenize/check-shape/verify-length-checksum. Behavior unchanged (make check/make asan 261/261); both PRs' CodeScene gate now passes. Also fixed three Codex findings (cancel ClOrdID enforcement; flamegraph tab/non-positive collapsed parsing). The local CodeScene MCP token is expired, so the authoritative gate is the CI CodeScene Code Health Review check.
[2026-06-21] Prepared the v0.2.1 release (docs/v0.2.1-release, stacked on the FIX adapter PR): bumped CMakeLists.txt to 0.2.1, added the CHANGELOG [0.2.1] section (FIX adapter #29, perf flamegraph #32, resume-anchor/PMU sweep), and brought the PROGRESS/HANDOFF release anchors current. No code or benchmark artifacts change in the release PR itself. On squash-merge the human tags v0.2.1 on the merge commit and publishes the GitHub release. Do not merge from automation.
[2026-06-22] Second Codex review round on the open v0.2.1 stack — addressed every remaining inline finding across PRs #129/#130/#131/#133. #131 (fix.cpp): the FIX envelope now requires MsgType (35) as the first body field (rejecting 8/9/34/35/.../10) and tokenize rejects duplicate tags (no repeating groups), with a deterministic rejection test for each; the older cancel-ClOrdID finding was already resolved and is covered by an existing test. #130 (flamegraph.sh): classify zero-sized data as a perf limitation (matching perf_record.sh), remove a stale SVG on zero-stack partial runs, accept perf's (~N samples) estimate marker and gate on the folded sample total instead of perf's estimate, fail hard (exit 4) on renderer errors instead of || true, and derive the software/hardware sampling label+caveat from the selected event; regenerated the bare-metal artifact (329 folded samples, Dirty inputs: no). #129: recorded the resume-anchor sync in PROGRESS's top current-state. #133: bumped the v0.2.1 release anchors and removed completed #29/#32 from every backlog list, synced AGENTS.md/CLAUDE.md to the v0.2.1 released state, and refreshed this release-readiness audit to 263 tests. make check/make asan 263/263. CodeScene MCP token still expired; CI is the authoritative gate.
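The two envelope rules from this review round (MsgType as the first body field, and no duplicate tags) fit the tokenize / check-shape split described for parse_envelope. Below is a hypothetical C++17 sketch. Field, ShapeError, tokenize, and check_shape are illustrative names, and BodyLength/CheckSum verification is left to a separate step.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>
#include <vector>

enum class ShapeError { None, MalformedField, DuplicateTag, BadHeaderOrder };

struct Field {
    int tag;
    std::string_view value;  // points into the original message
};

// Split "tag=value<SOH>..." and reject any tag that appears twice (no repeating groups).
ShapeError tokenize(std::string_view msg, std::vector<Field>& out) {
    while (!msg.empty()) {
        const auto soh = msg.find('\x01');
        if (soh == std::string_view::npos) return ShapeError::MalformedField;
        const auto field = msg.substr(0, soh);
        msg.remove_prefix(soh + 1);
        const auto eq = field.find('=');
        if (eq == std::string_view::npos || eq == 0) return ShapeError::MalformedField;
        int tag = 0;
        const auto [p, ec] = std::from_chars(field.data(), field.data() + eq, tag);
        if (ec != std::errc() || p != field.data() + eq) return ShapeError::MalformedField;
        for (const Field& prev : out)
            if (prev.tag == tag) return ShapeError::DuplicateTag;
        out.push_back({tag, field.substr(eq + 1)});
    }
    return ShapeError::None;
}

// Header must be 8, 9, then 35 as the first body field; the last field must be 10.
ShapeError check_shape(const std::vector<Field>& f) {
    if (f.size() < 4 || f[0].tag != 8 || f[1].tag != 9 || f[2].tag != 35 || f.back().tag != 10)
        return ShapeError::BadHeaderOrder;
    return ShapeError::None;
}
```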
[2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (scripts/socket_load.sh, make socket-load, Linux-only) driving N concurrent qsl-clients against the portable TCP and epoll (M34) gateways; results/socket_load_summary.txt is Docker-generated and constrained. A /code-review (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's wall=0 no longer poisons the reported best (only trials whose gateway served count toward the min); the completed column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client timeout bounds a hang if the gateway dies; and QSL_LOAD_TRIALS is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless QSL_LOAD_ALLOW_PARTIAL=1 is set intentionally; the refreshed artifact records Dirty tree: no. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared scripts/lib to remove the dirty-tree / wait_ready / gateway-stop duplication across the three socket scripts.
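The two trial-aggregation rules in this entry are easy to get wrong:

- the best wall time is taken only over trials whose gateway served;
- the completed column reports the worst trial, not the last one.

Below is a hypothetical C++17 sketch of both rules. The real logic lives in scripts/socket_load.sh, and Trial, Summary, and summarize are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct Trial {
    bool gateway_served;    // did the gateway actually serve this trial?
    std::uint64_t wall_ns;  // 0 when the trial failed before timing
    unsigned completed;     // clients that finished in this trial
};

struct Summary {
    std::optional<std::uint64_t> best_wall_ns;  // empty if no trial served
    unsigned worst_completed;                   // minimum across all trials
};

Summary summarize(const std::vector<Trial>& trials) {
    Summary s{std::nullopt, trials.empty() ? 0u : trials.front().completed};
    for (const auto& t : trials) {
        s.worst_completed = std::min(s.worst_completed, t.completed);  // surface failures
        if (!t.gateway_served) continue;  // a failed trial's wall=0 must not win the min
        if (!s.best_wall_ns || t.wall_ns < *s.best_wall_ns) s.best_wall_ns = t.wall_ns;
    }
    return s;
}
```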
[2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30).
[2026-06-04] M35: PR #100 squash-merged to main as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human.
[2026-06-04] Project-memory sync after M35: PR #101 squash-merged to main as 40f9249. It established the CLAUDE.md / AGENTS.md memory relationship, exact Branch: handling for /start-milestone, and the guard that repository-health planning happens before original M36 NUMA. No CodeScene/MCP analysis has started, no refactor milestones have been inserted, and no NUMA work has started.
[2026-06-04] Post-merge project-memory sync PR #102 squash-merged to main as 7092423.
[2026-06-04] Repository-health refactor phase inserted after M35 (docs-only roadmap renumber, branch docs/roadmap-health-refactor-insertion). A CodeScene Code Health analysis (project 80913, via MCP) of all production and test files found eleven files below 9.0: production epoll_server.cpp 5.35, pipeline.hpp 7.13, shrink.cpp 8.15, order_book.cpp 8.55, session.cpp 8.99; tests test_risk_gateway 6.69, test_order_book 7.32, test_pipeline 8.28, test_backpressure 8.44, test_invariants 8.45, test_matching_engine 8.54. These became seven refactor milestones M36–M42: M36 epoll decomposition, M37 pipeline stage helpers (+ its concurrency tests), M38 shrinker passes, M39 order-book matching parameters, M40 engine test-suite consolidation (test-only; split out per the human), M41 session frame dispatch, M42 shared shell-script helpers (manual — CodeScene cannot score shell; the M35 deferred follow-up). The original networking/persistence roadmap shifted after those refactors; the later systems-roadmap audit extends future scope to M43–M49. Behavior-preserving refactors only; determinism/replay/differential/integer-tick pricing remain invariants. No implementation started.
[2026-06-04] M36: decomposed the Linux epoll transport while preserving public API and wire/session behavior. src/gateway/epoll_server.cpp now separates listener setup, accept outcome handling, ready-event dispatch, client fd-generation lookup, read/write outcomes, output-budget/backpressure handling, and re-arm/close decisions. CodeScene MCP score improved from 5.35 to 10.0 with no remaining review findings; final verification passed git diff --check, Docker Ubuntu Linux test_epoll_gateway (8 test cases / 280 assertions), make check 191/191, and make asan 191/191. PR #104 squash-merged as 0d2b97a.
[2026-06-04] M37: started on refactor/m37-threaded-pipeline-stage-helpers from updated main (0d2b97a). Scope is behavior-preserving threaded-pipeline helper extraction: decompose include/qsl/concurrency/pipeline.hpp and reduce assertion duplication/nesting in tests/concurrency/test_pipeline.cpp and tests/concurrency/test_backpressure.cpp. DoD requires Code Health >= 9.0 for all three files plus make check, make asan, and make tsan.
[2026-06-04] M37: completed behavior-preserving threaded-pipeline decomposition and opened draft PR #105. ThreadedPipeline::run now uses PipelineRunOptions, a run context, and named input/engine/output stage helpers; concurrency tests use shared assertion and producer/consumer helpers without changing scenarios. CodeScene re-score: pipeline.hpp 7.13 -> 10.0, test_pipeline.cpp 8.28 -> 10.0, test_backpressure.cpp 8.44 -> 10.0. PR review feedback fixed reused-probe accounting so shared PipelineProbe counters stay live/cumulative while each returned PipelineResult reports per-run backpressure deltas. Verification passed git diff --check, focused test_pipeline (12 cases / 629 assertions), make check 192/192, make asan 192/192, and make tsan 20/20.
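The reused-probe fix keeps shared counters live and cumulative while each run reports only its own delta. Below is a minimal C++17 sketch of that snapshot-and-subtract pattern. It assumes only one run uses a given probe at a time. Probe, RunResult, run_once, and the backpressure_events field are hypothetical stand-ins for PipelineProbe and PipelineResult.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Probe {
    std::atomic<std::uint64_t> backpressure_events{0};  // live, cumulative across runs
};

struct RunResult {
    std::uint64_t backpressure_events;  // this run only
};

// `stalls_this_run` stands in for backpressure observed by the pipeline stages.
RunResult run_once(Probe& probe, std::uint64_t stalls_this_run) {
    const auto before = probe.backpressure_events.load(std::memory_order_relaxed);
    for (std::uint64_t i = 0; i < stalls_this_run; ++i)
        probe.backpressure_events.fetch_add(1, std::memory_order_relaxed);
    const auto after = probe.backpressure_events.load(std::memory_order_relaxed);
    return RunResult{after - before};  // per-run delta; probe keeps the running total
}
```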
[2026-06-04] M37: PR #105 squash-merged to main as a8c0485 after CI, CodeScene, Codex, and CodeRabbit passed.
[2026-06-04] M38: started on refactor/m38-shrinker-reduction-passes from updated main (a8c0485). Scope is behavior-preserving shrinker decomposition: split src/replay/shrink.cpp shrink / renumber into named reduction passes and id-remap helpers while preserving deterministic, byte-identical shrink output. DoD requires Code Health >= 9.0 for shrink.cpp, shrinker tests + OCaml differential suite green, make check, and PROGRESS.md updated.
[2026-06-04] M38: completed behavior-preserving shrinker decomposition and opened draft PR #106. renumber now delegates duplicate-registration checks, referenced-symbol collection, symbol remapping, and order-id remapping to named helpers; shrink now runs named reduction passes for contiguous removal, quantity simplification, price simplification, and renumbering while preserving the original pass order and fixed-point semantics. CodeScene re-score: src/replay/shrink.cpp 8.15 -> 10.0 with no findings. Deterministic-output verification passed focused test_shrink (6 cases / 23 assertions), test_oracle_selftest (1 case / 7 assertions), make check-fixtures, make check-manifest, dune runtest --root ocaml, make check 192/192, and make asan 192/192.
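The shrinker structure described here is a fixed-point reduction: named passes run in a fixed order until none makes progress. Below is a hypothetical C++17 sketch over a vector of ints that stands in for commands. It models only contiguous removal and one value-simplification pass, and none of its names come from src/replay/shrink.cpp. A candidate is kept only if the failure predicate still holds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Trace = std::vector<int>;  // stand-in for a command list
using StillFails = std::function<bool(const Trace&)>;

// Pass 1: try removing contiguous chunks, largest first.
bool remove_chunks(Trace& t, const StillFails& fails) {
    for (std::size_t len = t.size(); len > 0; len /= 2) {
        for (std::size_t start = 0; start + len <= t.size(); ++start) {
            const auto s = static_cast<std::ptrdiff_t>(start);
            const auto n = static_cast<std::ptrdiff_t>(len);
            Trace candidate(t.begin(), t.begin() + s);
            candidate.insert(candidate.end(), t.begin() + s + n, t.end());
            if (fails(candidate)) { t = std::move(candidate); return true; }
        }
    }
    return false;
}

// Pass 2: try shrinking each value to 1 (stand-in for quantity/price simplification).
bool simplify_values(Trace& t, const StillFails& fails) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (t[i] <= 1) continue;
        Trace candidate = t;
        candidate[i] = 1;
        if (fails(candidate)) { t = std::move(candidate); return true; }
    }
    return false;
}

// Run passes in a fixed order until none makes progress (deterministic, no RNG).
Trace shrink(Trace t, const StillFails& fails) {
    bool progress = true;
    while (progress) progress = remove_chunks(t, fails) || simplify_values(t, fails);
    return t;
}
```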
[2026-06-04] M38: PR #106 squash-merged to main as 9ccf157 after CI, CodeScene, Codex, and CodeRabbit passed.
[2026-06-04] M39: started on refactor/m39-order-book-matching-parameters from updated main (9ccf157). Scope is behavior-preserving order-book matching decomposition: collapse match_against / count_matches parameter lists into context structs and extract fill-loop structure while preserving byte-identical deterministic event streams, snapshots, integer-tick prices, and wall-clock-independent matching. DoD requires Code Health >= 9.0 for src/engine/order_book.cpp, replay/differential/property/invariant tests green, make check, and make asan.
[2026-06-04] M39: completed local behavior-preserving order-book decomposition. match_against now takes a MatchContext, count_matches takes a MatchQuery, and fill/erase/level lookup helpers isolate the matching loop without changing deterministic engine output. CodeScene re-score: src/engine/order_book.cpp 8.55 -> 9.68; the only remaining finding is the unchanged public OrderBook::add_limit argument count. Local warning hygiene also renamed replay RecordType::Command / Event to CommandRecord / EventRecord while preserving numeric log values 1 / 2.
[2026-06-04] M39: opened draft PR #107 from refactor/m39-order-book-matching-parameters after git diff --check, focused engine/replay tests, fixture/manifest checks, dune runtest --root ocaml, make check 192/192, make asan 192/192, and CodeScene pre-commit/change-set quality gates passed.
[2026-06-04] M39: fixed Codex review feedback on PR #107 by making find_level return only the requested side and adding test coverage that quantity_at is side-specific at the same price. Post-fix verification passed focused test_order_book (20 cases / 87 assertions), make check 193/193, make asan 193/193, and CodeScene score remained 9.68.
[2026-06-05] M39: PR #107 squash-merged to main as 880fbc7.
[2026-06-05] M40: consolidated the four sub-9.0 engine/risk test suites to Code Health >= 9.0 — test_risk_gateway.cpp 6.69 -> 10.0, test_order_book.cpp 7.78 -> 9.09, test_invariants.cpp 8.45 -> 10.0, test_matching_engine.cpp 8.54 -> 9.38 — by extracting behavior-named helpers (expect_reject / expect_accepted_one / expect_trade / expect_top / expect_strictly_increasing / expect_storage_equivalent, a shared Gateway fixture, and per-invariant classify_command / check_events / check_book_invariants), de-duplicating near-identical TEST_CASEs, and flattening the deep property-loop nesting in the invariants suite. Test-only: git diff --stat main touches only the four test files + PROGRESS.md (no include/ or src/). All TEST_CASE names/scenarios, deterministic seeds (1-8, 11/22/33, 1-6), and non-vacuity guards preserved; a few previously-partial trade/reject assertions were strengthened to the full verified contract — one (modify emits OrderModified and any resulting trades) was caught by make check as a wrong quantity (5 vs the post-reduce 3) and corrected. make check 193/193 and make asan green.
[2026-06-05] M40: fixed PR #108 review feedback by collapsing expect_trade expected fields into an ExpectedTrade value object, clearing the CodeScene excess-arguments finding while preserving the same trade assertions. Also reconciled stale M39/M40 PROGRESS.md anchors so PR #108 is the single current action. Verification passed focused matching-engine tests, git diff --check, make check 193/193, make asan 193/193, and CodeScene score for test_matching_engine.cpp is 9.38.
[2026-06-05] M40: PR #108 squash-merged to main as b939730.
[2026-06-05] M41: started on refactor/m41-session-frame-dispatch from updated main (b939730). Baseline CodeScene for src/gateway/session.cpp: 8.99; finding is Session::process_frame complexity 15. Pre-refactor ./build/dev/tests/test_session passed 11 cases / 36 assertions.
[2026-06-05] M41: completed behavior-preserving session dispatch decomposition. Session::process_frame now delegates to per-message helpers for new order, cancel, heartbeat, and unexpected-message paths while preserving malformed-body logout, output-limit logout, bounded new-order preview before mutation, and response ordering. CodeScene for src/gateway/session.cpp: 8.99 -> 10.0 with no findings. Verification passed rebuilt ./build/dev/tests/test_session 11 cases / 36 assertions, git diff --check, make check 193/193, and make asan 193/193.
[2026-06-05] M41: opened PR #109 (refactor: simplify gateway session frame dispatch). No manual CodeRabbit review was requested.
[2026-06-05] M41: PR #109 squash-merged to main as 68061e6.
[2026-06-05] M42: started on refactor/m42-shared-shell-script-helpers from updated main (68061e6). Initial scope was behavior-preserving shell-helper extraction across scripts/socket_load.sh, scripts/socket_stress.sh, scripts/profile_gateway_io.sh, scripts/perf_record.sh, and scripts/perf_stat.sh; shell is not CodeScene-scored. The human later expanded PR #111 to address issue #99 and issue #110 in the same PR.
[2026-06-05] M42: completed shared shell-helper extraction by adding scripts/qsl_common.sh for safe repo-relative dirty-tree exclusions, CPU/compiler/git/date metadata helpers, Linux guard helpers, TCP readiness probes, and process stop escalation. The five target scripts now source that helper while preserving workload logic and artifact fields. Initial local verification passed bash -n scripts/qsl_common.sh scripts/socket_load.sh scripts/socket_stress.sh scripts/profile_gateway_io.sh scripts/perf_record.sh scripts/perf_stat.sh, helper external/internal path smoke check, git diff --check, make check 193/193, make asan 193/193, reduced temporary-output make socket-stress, direct Linux-only script guard checks, and Makefile Linux-only target guard checks. Later Docker validation is recorded below.
[2026-06-05] M42: opened draft PR #111 (refactor: extract shared shell-script helpers).
[2026-06-05] M42 expanded scope: issue #110 is addressed by adding FixtureExportRequest / FixtureExportMode and write_fixture_export, so src/replay/fixture.cpp owns stream fixture export mode semantics while apps/qsl-export-stream/main.cpp only parses CLI arguments and calls the library entrypoint. Existing export bytes are covered by fixture-dispatch regression tests and fixture/OCaml checks.
[2026-06-05] M42 expanded scope: issue #99 is addressed for the portable TCP transport by routing TcpServer::serve_connection through the bounded Session::on_bytes overload with TcpServerOptions::max_response_bytes defaulting to the epoll hard cap (8 MiB, 0 disables). The new TCP regression proves an already-accepted heartbeat reply is flushed while an over-budget high-fanout sweep is rejected before gateway mutation.
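The #99 fix is an admission check: the gateway previews the reply size before mutating state and rejects the request if it would exceed the remaining output budget. Below is a hypothetical C++17 sketch. Budget and admit are illustrative names, and modeling the budget as cumulative pending bytes is an assumption. Only the 8 MiB default and the rule that 0 disables the limit come from the entry above.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the epoll hard cap named above: 8 MiB.
constexpr std::size_t kDefaultMaxResponseBytes = 8u * 1024 * 1024;

struct Budget {
    std::size_t max_bytes = kDefaultMaxResponseBytes;  // 0 disables the limit
    std::size_t pending = 0;                           // bytes already accepted for output
};

// Decide before mutating the gateway: accept only if the previewed reply fits.
bool admit(Budget& b, std::size_t previewed_reply_bytes) {
    if (b.max_bytes != 0 && previewed_reply_bytes > b.max_bytes - b.pending)
        return false;  // rejected; nothing was applied to the book
    b.pending += previewed_reply_bytes;
    return true;
}
```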
[2026-06-05] M42 Docker validation after Docker became available: copied the current working tree into a fresh in-container git repo to avoid host worktree .git indirection, then ran reduced-output Linux make socket-load, make profile-io, and make socket-stress with output redirected to /tmp; all produced constrained artifacts with Dirty tree: no. make perf-stat and make perf-record were also exercised in QSL_PERF_ALLOW_PARTIAL=1 mode using Ubuntu's packaged perf; Docker Desktop denied cycles and cpu-clock events, so the scripts correctly emitted constrained/partial artifacts with Dirty tree: no and no PMU/hot-profile claim.
[2026-06-05] M42 review fix: qsl-export-stream now reports clean usage errors (exit 2) for missing or invalid numeric CLI arguments instead of allowing parse exceptions to abort the process. Added CTest coverage for prop without a seed, an invalid seed, and invalid orders. HANDOFF.md no longer contains the stale M39/PR #107 active-priority line.
[2026-06-06] M42: PR #111 squash-merged to main as 003504f.
[2026-06-06] Follow-up branch feat/close-storage-flow-tcp-followups started from post-M42 main to close #95, #28, and #26 in one feature PR. Scope is explicit intrusive OrderPool-backed resting-order storage, a richer deterministic synthetic flow model, and a portable threaded TCP accept path. M43 NUMA remains next after this follow-up lands.
[2026-06-08] PR #112 squash-merged to main as 2369f84, closing #95, #28, and #26.
[2026-06-08] Roadmap audit branch docs/systems-roadmap-audit started from post-PR #112 main. Scope is documentation-only: expand M43 for CPU affinity/scheduler migration/core-cache locality, expand M44 in place for ingress memory ordering and false sharing because it already owned ingress contention, insert M47 for contiguous order-book storage/cache-locality, shift late DPDK/NIC research to M48/M49, keep issue #94 external review highly visible, and document rejected low-signal additions. Completed milestone history and merged PR references are not rewritten.
[2026-06-08] PR #113 squash-merged to main as f3cc4dd; M43 started on feat/m43-numa-awareness-study. Scope is Linux CPU-affinity / scheduler-migration / NUMA-locality study tooling and docs, with artifacts self-classified as full-linux-numa, linux-constrained, or unsupported-host.
[2026-06-08] M43: implemented make numa-study and scripts/numa_affinity_study.sh with early env validation, repo-local output exclusion for dirty-tree metadata, Linux-only Makefile guard, taskset pinned/unpinned attempts, optional perf stat for context-switches,cpu-migrations, topology capture via lscpu/numactl where available, and artifact self-classification. Docker Desktop/LinuxKit generated results/numa_affinity_study.txt as Evidence class: linux-constrained from clean source commit 40919de: taskset succeeded, numactl topology was unavailable, perf was unavailable, and no full NUMA or production-latency claim is made. Verification passed bash -n scripts/numa_affinity_study.sh, bad-env parser checks, macOS Linux-only guard, Docker constrained make numa-study, Docker absolute-output dirty-tree check, git diff --check, and make check 204/204. make asan was not run because M43 changed Bash/docs/results only.
[2026-06-08] M43: opened PR #114 (docs: study NUMA and CPU affinity effects).
[2026-06-09] M43 review fixes: numa_affinity_study.sh now records dynamic build-type metadata, refuses to bless constrained artifacts when benchmark runs fail, requires pinned and unpinned perf counter capture before full evidence, and requires actual node-local/remote NUMA binding before full-linux-numa classification. AGENTS.md and CLAUDE.md command lists now include make numa-study. Regenerated results/numa_affinity_study.txt from clean source commit 4e7c598 with Dirty tree: no, Evidence class: linux-constrained, explicit constrained rerun command, benchmark completion status, and NUMA binding status. Verification passed parser checks, macOS Linux-only guard, Docker failing-benchmark regression, Docker absolute-output dirty-tree/rerun-command check, Docker constrained artifact regeneration, git diff --check, and make check 204/204.
[2026-06-09] M43 review follow-up: make numa-study now lets scripts/numa_affinity_study.sh emit the promised unsupported-host artifact on non-Linux hosts instead of stopping at a Makefile guard, while still exiting 2. NUMA_NODES now falls back to the parsed numactl --hardware node list when lscpu lacks NUMA node(s), so valid local/remote binding evidence is not mislabeled solely because lscpu is incomplete. Regenerated results/numa_affinity_study.txt from clean source commit d77c98a with Dirty tree: no. Verification passed parser checks, QSL_NUMA_OUT=/tmp/... make numa-study unsupported-host artifact check, Docker failing-benchmark regression, Docker synthetic numactl node-count fallback, Docker constrained artifact regeneration, git diff --check, and make check 204/204.
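The evidence-class rules accumulated across these M43 entries can be read as one decision function. Below is a hypothetical C++17 sketch. NumaObservation and classify are illustrative names, and the real logic is Bash in scripts/numa_affinity_study.sh. Requiring at least two nodes for full-linux-numa is an assumption, on the grounds that a remote binding needs a second node.

```cpp
#include <cassert>
#include <string>

// Hypothetical inputs; the real script derives these from taskset/perf/numactl/lscpu.
struct NumaObservation {
    bool is_linux;
    bool benchmarks_completed;
    bool perf_pinned_and_unpinned;  // perf stat captured for both runs
    int lscpu_nodes;                // 0 when lscpu omits "NUMA node(s)"
    int numactl_nodes;              // parsed from numactl --hardware, 0 if unavailable
    bool local_and_remote_bound;    // actual node-local and remote binding succeeded
};

int numa_nodes(const NumaObservation& o) {
    return o.lscpu_nodes > 0 ? o.lscpu_nodes : o.numactl_nodes;  // numactl fallback
}

// Returns the evidence class, or "" when no artifact should be blessed.
std::string classify(const NumaObservation& o) {
    if (!o.is_linux) return "unsupported-host";
    if (!o.benchmarks_completed) return "";  // refuse to bless a failed run
    if (o.perf_pinned_and_unpinned && numa_nodes(o) >= 2 && o.local_and_remote_bound)
        return "full-linux-numa";
    return "linux-constrained";
}
```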
[2026-06-09] M43 provenance follow-up: numa_affinity_study.sh now writes Artifact provenance: generated from the clean source commit above; output path excluded from dirty-tree check: ... in both supported and unsupported artifacts. Regenerated results/numa_affinity_study.txt from clean source commit b3f316f with Dirty tree: no and the explicit output-path exclusion. Verification passed parser checks, unsupported-host provenance through make numa-study, Docker failing-benchmark regression, Docker synthetic numactl fallback, Docker constrained artifact regeneration, git diff --check, and make check 204/204.
[2026-06-09] PR #114 squash-merged to main as 29ed491; M44 started on feat/m44-ingress-memory-ordering-false-sharing. Scope is benchmark-only packed-vs-padded SPSC queue-cursor contention evidence plus memory-ordering/false-sharing documentation. The production SpscRing layout and deterministic matching ownership must remain unchanged.
[2026-06-09] M44: added make false-sharing-study, scripts/run_false_sharing_study.sh, and qsl-bench false-sharing. The benchmark compares benchmark-only packed vs cache-line-padded SPSC queue cursor layouts with producer-owned tail / consumer-owned head release/acquire traffic; production SpscRing layout and matching ownership are unchanged. Regenerated results/false_sharing_study.txt from clean source commit e15a4ed with Dirty tree: no and Evidence class: research-notes. Verification passed bash -n scripts/run_false_sharing_study.sh, make false-sharing-study, git diff --check, make check 204/204, make asan 204/204, and make tsan 20/20 concurrency-labelled tests.
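The two layouts being compared differ only in cursor placement. In the packed layout both cursors share one cache line, so the producer's tail stores keep invalidating the line the consumer reads head from. The padded control separates them by 128 bytes. Below is a minimal C++17 single-producer/single-consumer sketch using the same release/acquire pairing. PackedCursors, PaddedCursors, and Ring are hypothetical and are not the production SpscRing, which pads to 64 bytes.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Packed: head and tail likely share a cache line (false sharing).
struct PackedCursors {
    std::atomic<std::size_t> head{0};  // consumer-owned
    std::atomic<std::size_t> tail{0};  // producer-owned
};

// Padded: 128-byte separation, like the benchmark-only control.
struct PaddedCursors {
    alignas(128) std::atomic<std::size_t> head{0};
    alignas(128) std::atomic<std::size_t> tail{0};
};

template <typename Cursors, std::size_t N>
struct Ring {
    Cursors c;
    std::array<std::uint64_t, N> slots{};

    bool push(std::uint64_t v) {  // producer thread only
        const auto t = c.tail.load(std::memory_order_relaxed);
        if (t - c.head.load(std::memory_order_acquire) == N) return false;  // full
        slots[t % N] = v;
        c.tail.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    std::optional<std::uint64_t> pop() {  // consumer thread only
        const auto h = c.head.load(std::memory_order_relaxed);
        if (h == c.tail.load(std::memory_order_acquire)) return std::nullopt;  // empty
        const auto v = slots[h % N];
        c.head.store(h + 1, std::memory_order_release);  // hand the slot back
        return v;
    }
};
```

Each acquire load of the other side's cursor pairs with that side's release store. This pairing makes a slot write visible before the slot is read, and makes the read finish before the slot is overwritten.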
[2026-06-09] M44: opened PR #115 (perf: study ingress memory ordering and false sharing).
[2026-06-09] M44 review fixes: scripts/run_false_sharing_study.sh now emits the required Dataset metadata line and clarifies that the committed artifact is generated from the clean source commit above in a later artifact-only commit. Regenerated results/false_sharing_study.txt from clean source commit 2838a90 with Dirty tree: no.
[2026-06-09] M44 review fixes: qsl-bench now links Threads::Threads when benchmarks are enabled, so the two-thread false-sharing benchmark has portable pthread flags on Linux/libstdc++ toolchains. docs/concurrency_model.md now states that the benchmark-only padded control uses 128-byte separation, while production SpscRing still pads to 64 bytes and is not validated by this artifact on wider-coherency-line hosts. Regenerated results/false_sharing_study.txt from clean source commit 1b2f342 with Dirty tree: no.
[2026-06-09] M44 review fixes: scripts/run_false_sharing_study.sh now records the compiler from the bench preset's CMAKE_CXX_COMPILER rather than c++ from PATH. Regenerated results/false_sharing_study.txt from clean source commit f02b8ac with Dirty tree: no.
[2026-06-10] M44 review fixes: false-sharing artifacts temporarily added a machine-checkable source-tree hash while the generated output was excluded from dirty-tree checks. This commit-hash-oriented workaround is superseded by the 2026-06-11 source-digest provenance policy.
[2026-06-11] M44 review follow-up: results/false_sharing_study.txt was regenerated from the current PR head as an interim fix. This is superseded by the source-digest policy below, which makes declared source inputs, not branch-only commit objects, the authoritative provenance identity.
[2026-06-11] Artifact provenance process fix: to eliminate repeated stale-commit review churn, migrated artifacts use Provenance version: 1 with Source digest as the authoritative identity and Git commit (informational) as non-authoritative context. The valid stale-artifact checks are source-digest mismatch or Dirty inputs: yes, not commit-hash equality after rebase/squash. M45A was intentionally narrow and converted only the current pain points (make false-sharing-study and make numa-study); M45B migrates perf, socket, allocator, storage, differential, and core benchmark artifacts after the schema was proven.
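The staleness rule is two checks and ignores commit hashes. Below is a hypothetical C++17 sketch that reads the provenance header lines and applies the rule. The exact line formats (especially "Git commit (informational): ") are assumptions, as are the names Provenance, parse_provenance, and is_stale. Computing the current digest over the declared inputs is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Provenance {
    std::string source_digest;
    bool dirty_inputs = true;  // a missing line is treated as dirty
    std::string git_commit;    // informational only; never used for staleness
};

Provenance parse_provenance(const std::string& artifact) {
    Provenance p;
    std::istringstream in(artifact);
    for (std::string line; std::getline(in, line);) {
        // Returns the text after `key` if the line starts with it, else "".
        auto value_after = [&](const std::string& key) -> std::string {
            return line.rfind(key, 0) == 0 ? line.substr(key.size()) : std::string();
        };
        if (auto v = value_after("Source digest: "); !v.empty()) p.source_digest = v;
        else if (auto d = value_after("Dirty inputs: "); !d.empty()) p.dirty_inputs = (d != "no");
        else if (auto c = value_after("Git commit (informational): "); !c.empty()) p.git_commit = c;
    }
    return p;
}

// Stale iff the declared-input digest changed or inputs were dirty.
bool is_stale(const Provenance& p, const std::string& current_digest) {
    return p.dirty_inputs || p.source_digest != current_digest;
}
```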
[2026-06-11] M45A verification in PR #115: bash -n scripts/qsl_common.sh scripts/run_false_sharing_study.sh scripts/numa_affinity_study.sh, helper regressions for stable output exclusion / dirty-input detection / external output paths, make false-sharing-study, Docker QSL_NUMA_ALLOW_CONSTRAINED=1 make numa-study, git diff --check, and make check 204/204 passed. make asan was not rerun because this slice changed Bash/docs/results only.
[2026-06-11] M45A Codex review fix in PR #115: qsl_source_digest now uses a portable SHA-256 stdin helper (sha256sum first, shasum -a 256 fallback) so Linux/coreutils containers without Perl shasum can still generate provenance. Regenerated results/false_sharing_study.txt and Docker results/numa_affinity_study.txt from the fixed source with Dirty inputs: no. Verification re-ran shell syntax checks, helper regressions, make false-sharing-study, Docker constrained make numa-study, git diff --check, and make check 204/204.
[2026-06-11] PR #115 squash-merged to main as cd05b37. M45B process follow-up started on perf/m45b-artifact-provenance-migration to migrate the remaining perf, socket, allocator, storage, differential, and core benchmark artifact generators to the source-digest provenance schema. This follow-up is not M45 persistence work and does not renumber the roadmap.
[2026-06-11] M45B migrated run_benchmarks.sh, run_diff_benchmarks.sh, run_allocator_experiment.sh, run_storage_benchmarks.sh, perf_stat.sh, perf_record.sh, profile_gateway_io.sh, socket_load.sh, and socket_stress.sh to Provenance version: 1. Regenerated results/latest.txt, differential.txt, allocator_experiment.txt, pool_backed_storage.txt, perf_stat_linux.txt, perf_report_linux.txt, socket_profile_loopback.txt, socket_load_summary.txt, and socket_stress_summary.txt with Dirty inputs: no. Verification passed shell syntax checks, source-digest helper regressions, old-provenance scan over results/*.txt, make bench, make bench-diff, make bench-allocator, make bench-storage, make socket-stress, Docker constrained make perf-stat, Docker constrained make perf-record, Docker make profile-io, Docker make socket-load, git diff --check, and make check 204/204.
[2026-06-11] M45B opened PR #116 (perf: migrate artifact provenance metadata). Do not merge from automation; wait for Codex no-bugs review before treating M45B as complete.
[2026-06-11] PR #116 squash-merged to main as b9ea27a, completing M45B. M45 started on feat/m45-persistence-prototype. Scope: durability strategy beyond the current append-only lab log, WAL analysis, and automated crash/recovery validation. Constraints: no production-durability claims; deterministic replay, integer-tick prices, and wall-clock-independent core remain invariants; M46 recovery benchmarking is out of scope here.
[2026-06-11] M45: EventLogWriter gains an explicit caller-chosen DurabilityMode (BufferedOnly / FlushOnAppend / FsyncOnAppend) plus a sync() group-commit point; the default FlushOnAppend preserves pre-M45 behavior so existing call sites are unchanged. Fsync uses F_FULLFSYNC on macOS with fsync fallback, and FsyncOnAppend also fsyncs the parent directory at open so a new log's directory entry is durable, not just its bytes.
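A minimal sketch of the buffering ladder described above, assuming a raw POSIX file descriptor and a caller-owned user-space buffer. `DurabilityMode` reuses the mode names from the entry, but `write_all`, `durable_sync`, `fsync_parent_dir`, and `append_record` are hypothetical helpers, not the repo's EventLogWriter API.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

enum class DurabilityMode { BufferedOnly, FlushOnAppend, FsyncOnAppend };

// Write every byte, retrying short writes and EINTR.
inline bool write_all(int fd, const std::string& bytes) {
    std::size_t off = 0;
    while (off < bytes.size()) {
        const ssize_t n = ::write(fd, bytes.data() + off, bytes.size() - off);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        off += static_cast<std::size_t>(n);
    }
    return true;
}

// Force data to stable storage. On macOS fsync() may leave data in the drive's cache,
// so F_FULLFSYNC is tried first and plain fsync() is the fallback.
inline bool durable_sync(int fd) {
#if defined(__APPLE__)
    if (::fcntl(fd, F_FULLFSYNC) == 0) return true;
#endif
    return ::fsync(fd) == 0;
}

// A newly created file's directory entry is only durable once its parent directory is synced.
inline bool fsync_parent_dir(const std::string& path) {
    const auto slash = path.find_last_of('/');
    const std::string dir =
        slash == std::string::npos ? "." : (slash == 0 ? "/" : path.substr(0, slash));
    const int dfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (dfd < 0) return false;
    const bool ok = durable_sync(dfd);
    ::close(dfd);
    return ok;
}

// BufferedOnly: bytes stay in user space until a later non-buffered append.
// FlushOnAppend: write() hands them to the kernel page cache.
// FsyncOnAppend: additionally forces them to stable storage before acknowledging.
inline bool append_record(int fd, std::string& pending, const std::string& record,
                          DurabilityMode mode) {
    pending += record;
    if (mode == DurabilityMode::BufferedOnly) return true;
    if (!write_all(fd, pending)) return false;
    pending.clear();
    return mode == DurabilityMode::FlushOnAppend || durable_sync(fd);
}
```

Each mode stops at a different layer (user-space buffer, page cache, stable storage), which is why only the fsync mode can make an acknowledgment survive more than a process kill.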
[2026-06-11] M45: recovery classifies a log tail as CleanTail / TornTail / Corrupt (recover_log, recover_log_file). A partial next-record header is torn; once a full header has declared a payload size, a truncated frame is corrupt because that untrusted size could span later valid records. BadChecksum is torn only when the failing frame ends exactly at end of file; PayloadTooLarge headers are never trusted. repair_log_file truncates torn tails to the last valid record boundary and fsyncs the truncation; it refuses Corrupt logs because truncating mid-file damage would silently discard acknowledged records beyond it — a human decision, not automation. The fsync-mode contract: an acknowledged append is never removed by tail repair unless the storage stack lied.
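The tail-classification rules above can be sketched over an in-memory byte string. The frame layout here (4-byte little-endian payload length, 4-byte FNV-1a checksum, then payload) and the `kMaxPayload` limit are assumptions for illustration; the repo's real framing, checksum, and `recover_log` signature may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class TailState { CleanTail, TornTail, Corrupt };

struct Recovery {
    TailState state;
    std::size_t valid_bytes;  // length of the verified record prefix
};

constexpr std::size_t kHeaderBytes = 8;         // hypothetical: u32 size + u32 checksum
constexpr std::uint32_t kMaxPayload = 1u << 20;  // hypothetical PayloadTooLarge limit

inline std::uint32_t read_le32(const std::string& b, std::size_t at) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(b[at])) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b[at + 1])) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b[at + 2])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b[at + 3])) << 24);
}

inline std::uint32_t fnv1a(const std::string& b, std::size_t at, std::size_t len) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = at; i < at + len; ++i) {
        h ^= static_cast<unsigned char>(b[i]);
        h *= 16777619u;
    }
    return h;
}

inline std::string encode_frame(const std::string& payload) {
    std::string out;
    const auto put = [&out](std::uint32_t v) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    };
    put(static_cast<std::uint32_t>(payload.size()));
    put(fnv1a(payload, 0, payload.size()));
    out += payload;
    return out;
}

inline Recovery classify_tail(const std::string& log) {
    std::size_t pos = 0;
    while (pos < log.size()) {
        // A partial header can only be an interrupted append: torn.
        if (log.size() - pos < kHeaderBytes) return {TailState::TornTail, pos};
        const std::uint32_t size = read_le32(log, pos);
        const std::uint32_t sum = read_le32(log, pos + 4);
        // An oversized declared size is never trusted.
        if (size > kMaxPayload) return {TailState::Corrupt, pos};
        const std::size_t end = pos + kHeaderBytes + size;
        // A full header whose untrusted size overruns the file is corrupt, not torn.
        if (end > log.size()) return {TailState::Corrupt, pos};
        if (fnv1a(log, pos + kHeaderBytes, size) != sum) {
            // A bad checksum is torn only when the failing frame ends exactly at EOF.
            return {end == log.size() ? TailState::TornTail : TailState::Corrupt, pos};
        }
        pos = end;
    }
    return {TailState::CleanTail, pos};
}
```

`valid_bytes` is the offset a repair would truncate to, and in this sketch it would only be acted on for `TornTail`.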
[2026-06-11] M45: qsl-replay gains recover <file> [--repair] and append-loop <file> <buffered|flush|fsync> [max_records] subcommands; argument parsing moved to exception-free from_chars (fixing the previously unguarded std::stoull generate-seed parse, the same bug class M42 review fixed in qsl-export-stream).
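A minimal sketch of exception-free operand parsing with `std::from_chars`; `parse_u64` is a hypothetical name. Unlike `std::stoull`, which throws `std::invalid_argument` or `std::out_of_range` and silently stops at trailing characters (so `"12abc"` parses as 12), this reports every failure through `std::nullopt`.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Accept only a complete, in-range, unsigned decimal token.
inline std::optional<std::uint64_t> parse_u64(std::string_view text) {
    std::uint64_t value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    // ec covers empty input, signs, whitespace, and overflow; ptr != last covers trailing junk.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```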
[2026-06-11] M45 CodeScene cleanup: qsl-replay command dispatch now uses local command-line, request, path, seed, repair, and max-record value objects instead of raw primitive/string helper arguments. Local CodeScene review scores apps/qsl-replay/main.cpp 10.0 with no findings, addressing the hosted primitive-obsession and string-heavy argument findings without changing the CLI or persistence semantics.
[2026-06-11] M45: added make crash-recovery / scripts/crash_recovery_validation.sh (portable Linux/macOS): SIGKILLs live append-loop writers mid-stream per durability mode and asserts recovered records ∈ [acked, acked+1] for flush/fsync (the +1 allows for a record written before the kill but not yet acknowledged), repairs provably torn tails to a clean appendable log, and refuses ambiguous full-header truncations as corrupt instead of auto-repairing them; the buffered trial demonstrates (without asserting) acknowledged-data loss. Committed results/crash_recovery_validation.txt with Provenance version: 1 and Dirty inputs: no; the latest run showed buffered mode losing 94 acknowledged records under SIGKILL, one torn tail repaired to a clean appendable log, and one explicit ambiguous full-header fixture refused as corrupt while all flush/fsync trials preserved every acknowledged record. The artifact is explicitly process-kill evidence only: SIGKILL leaves the page cache intact, so the fsync path is exercised but its power-loss/OS-crash durability is not falsifiable by this harness and is not claimed.
[2026-06-11] M45: unit tests extend test_event_log.cpp with a truncation sweep at every byte offset (exact valid-prefix recovery + classification), final-record vs mid-file checksum-damage classification, untrusted-header corruption, in-range full-header truncation corruption, repair semantics (torn repaired/appendable, corrupt refused/untouched, clean no-op), durability-mode round trips, and missing-file recovery. 219/219 with make check and make asan.
[2026-06-12] M45 PR #117 review fixes: qsl-replay now recognizes known subcommands before the replay fallback and enforces exact arity for generate, recover, and append-loop; CTest covers missing/extra operand failures. scripts/crash_recovery_validation.sh now includes include/qsl/protocol/endian.hpp in the source-digest scope and constructs an explicit corrupted in-range payload-size fixture so the artifact's ambiguous full-header repair-refusal claim is directly exercised. The harness also captures qsl-replay recover exit statuses: clean tails must exit 0, torn/corrupt tails must fail before repair, torn repair must produce a clean log, and corrupt repair must fail.
[2026-06-11] M45: docs/persistence.md documents the buffering-layer ladder per mode, the tail-classification/repair contract, parent-directory fsync requirement for newly created logs at the first durable point, the residual final-record-BadChecksum-vs-bit-rot ambiguity, and a WAL analysis: the lab log is log-behind (gateway acks are not coupled to durability), and closing that gap was deliberately rejected for M45 as a pipeline rearchitecture for a durability property the simulator does not claim. ADR 0011 records the durability-mode and repair-only-provably-torn decisions. M46 will measure full-replay recovery cost before any segmentation/snapshot design.
[2026-06-11] PR #117 squash-merged to main as d10bfb0, completing M45. M46 started on feat/m46-recovery-benchmarking. Scope: recovery benchmarking — replay performance, snapshot restoration performance, and recovery-objective framing. Constraints: benchmarks generated only by committed scripts with Provenance version: 1 metadata and dirty-inputs state; docs must state exactly what recovery objective was measured; no production-recovery or RTO claims beyond the measured synthetic workloads.
[2026-06-12] M46: added OrderBook::resting_orders() / MatchingEngine::resting_orders(symbol) — deterministic priority-order enumeration of resting state (bids best-first then asks, FIFO within level) across all three storage modes, with Order::operator==. Re-adding the sequence into an empty book reproduces levels and intra-level time priority exactly; unit tests cover per-mode enumeration, partial-fill/cancel/priority-losing-modify effects, and a generated-flow rebuild-equivalence test. This is the minimal read-only API a snapshot path needs; no snapshot persistence was added.
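A sketch of the enumeration order and the rebuild-equivalence property, assuming a simple map-of-deques book; `Book`, `Order`, and `rebuild` are hypothetical and are not the repo's OrderBook or storage modes.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <vector>

struct Order {
    std::uint64_t id;
    bool is_bid;
    std::int64_t price_ticks;
    std::uint64_t qty;
    bool operator==(const Order& o) const {
        return id == o.id && is_bid == o.is_bid && price_ticks == o.price_ticks && qty == o.qty;
    }
};

struct Book {
    std::map<std::int64_t, std::deque<Order>, std::greater<>> bids;  // best (highest) first
    std::map<std::int64_t, std::deque<Order>> asks;                  // best (lowest) first

    // Appending to the level's queue preserves time priority within the level.
    void add_resting(const Order& o) {
        (o.is_bid ? bids[o.price_ticks] : asks[o.price_ticks]).push_back(o);
    }

    // Priority order: bids best-first, then asks best-first, FIFO within each level.
    std::vector<Order> resting_orders() const {
        std::vector<Order> out;
        for (const auto& kv : bids) out.insert(out.end(), kv.second.begin(), kv.second.end());
        for (const auto& kv : asks) out.insert(out.end(), kv.second.begin(), kv.second.end());
        return out;
    }
};

// Re-adding the enumerated sequence into an empty book reproduces levels and FIFO order.
inline Book rebuild(const std::vector<Order>& seq) {
    Book b;
    for (const auto& o : seq) b.add_resting(o);
    return b;
}
```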
[2026-06-12] M46: added qsl-bench recovery (benchmarks/bench_recovery.cpp), scripts/run_recovery_benchmarks.sh, and make bench-recovery producing results/recovery_benchmarks.txt. Measured phases per log length (5k/20k/80k commands): recover_log_file read+verify+classify, replay decode+apply into a fresh engine, and the combined full restart; plus a benchmark-only in-memory snapshot-restoration prototype (capture resting state, rebuild book) at the flow's live state and at controlled synthetic depths (1k/10k/50k resting orders), because the realistic flow leaves only ~24–37 resting orders and per-order timings on such small books are noisy. Every phase self-verifies against the reference snapshot and the harness aborts rather than report numbers from a wrong rebuild. The committed artifact (clean declared inputs, Dirty inputs: no) shows full-replay restart cost linear in history (237–286 ns/command end-to-end on this host; ~23 ms at 80k commands) while book rebuild is linear in live state (109–187 ns/order; ~5.4 ms at 50k resting orders). Explicitly framed: restart cost (RTO-style) measured here; loss bounds (RPO-style) belong to the M45 durability modes; prototype numbers are an in-memory lower bound (no serialization, disk I/O, or tail replay); no production recovery-time claim.
[2026-06-12] M46 CodeScene pass: an initial bench_recovery_at_size scored 8.23 (complex/long method, bumpy road, five arguments, complex conditional), so it was decomposed into named reference/log-writing/restart-phase/capture-rebuild/verification helpers shared by the flow and depth scenarios. benchmarks/bench_recovery.cpp now scores 10.0 with no findings; src/engine/order_book.cpp stays 9.68 and src/engine/matching_engine.cpp stays 9.09 (both unchanged from pre-M46 baselines). The artifact was regenerated from the refactored clean source in an artifact-only commit so its Source digest matches the committed generator.
[2026-06-12] PR #118 squash-merged to main as aeba72c, completing M46. M47 started on feat/m47-contiguous-order-book-storage. Scope: flat/contiguous order-book storage study against baseline, PMR pooled, and intrusive pooled modes — explicit symbol/price-domain assumptions, replay/differential equivalence (identical event streams, EngineSnapshot, last_seq), engine-level benchmark artifacts from committed scripts, cache-locality analysis only where tooling supports it, and honest documentation of negative/neutral results. No speedup or cache-locality claim without measured evidence; matching determinism and integer-tick prices remain invariants.
[2026-06-12] M47 implemented OrderBook::Storage::Contiguous: a fixed direct price-index band [1, 1024], occupancy bitmaps for best-level discovery, and contiguous per-level FIFO vectors. Out-of-band prices may still cross in-band liquidity, but GTC remainders that would rest outside the band are refused before engine mutation. Baseline remains default.
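A sketch of occupancy-bitmap best-level discovery for the [1, 1024] band, assuming one bitmap per side with bit `price - 1` set while that level is non-empty. `LevelBitmap` is hypothetical; `__builtin_clzll` / `__builtin_ctzll` are GCC/Clang builtins, and C++20's `std::countl_zero` / `std::countr_zero` from `<bit>` are the standard equivalents.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::int64_t kMinPrice = 1;
constexpr std::int64_t kBandSize = 1024;
constexpr std::size_t kWords = kBandSize / 64;

class LevelBitmap {
public:
    static bool in_band(std::int64_t price) {
        return price >= kMinPrice && price < kMinPrice + kBandSize;
    }

    // Callers must check in_band() first; this sketch does no bounds checking.
    void set(std::int64_t price, bool occupied) {
        const auto idx = static_cast<std::size_t>(price - kMinPrice);
        const std::uint64_t mask = std::uint64_t{1} << (idx % 64);
        if (occupied) words_[idx / 64] |= mask;
        else words_[idx / 64] &= ~mask;
    }

    // Best bid: highest occupied price, found by scanning words from the top.
    std::optional<std::int64_t> highest() const {
        for (std::size_t w = kWords; w-- > 0;) {
            if (words_[w] != 0) {
                const int bit = 63 - __builtin_clzll(words_[w]);
                return kMinPrice + static_cast<std::int64_t>(w * 64 + bit);
            }
        }
        return std::nullopt;
    }

    // Best ask: lowest occupied price, found by scanning words from the bottom.
    std::optional<std::int64_t> lowest() const {
        for (std::size_t w = 0; w < kWords; ++w) {
            if (words_[w] != 0) {
                const int bit = __builtin_ctzll(words_[w]);
                return kMinPrice + static_cast<std::int64_t>(w * 64 + bit);
            }
        }
        return std::nullopt;
    }

private:
    std::array<std::uint64_t, kWords> words_{};
};
```

With 1024 ticks the scan touches at most 16 words, which is the trade the fixed band buys; prices outside the band need the separate refusal path the entry describes.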
[2026-06-12] M47 verification passed focused test_matching_engine and test_order_book, make check 225/225, make asan 225/225, and make bench-storage. The regenerated results/pool_backed_storage.txt records source digest sha256:9bc7cc42609e75feacbf9f3db7b2c10e27104c4e38428b6a71f8875e11ca122c and Dirty inputs: no; the artifact supports no portable speedup claim.
[2026-06-12] M47 CodeScene follow-up: storage-mode public dispatch was consolidated through internal helpers, and contiguous fill_count was split into narrower match-count helpers to reduce repeated wrapper structure and local branch complexity without changing semantics.
[2026-06-12] M47 PR #119 review fix (Codex P2): a contiguous reprice whose re-add remainder would rest out of band was refused inside OrderBook::modify after the engine had already emitted OrderModified, so the event stream could report a modify the book never applied. The refusal now happens before the event is emitted: OrderBook::can_apply_modify (true for baseline/PMR/intrusive, band check for contiguous) pre-gates MatchingEngine::modify exactly like can_store_limit pre-gates new_limit, the gateway rejects such modifies with structured StorageExhausted, and the store-level refusal remains as defense in depth for direct book callers (the original order keeps resting). Side effect: the engine-level pre-gate cleared the pre-existing CodeScene Code Duplication finding between MatchingEngine::new_market and modify. New tests pin the engine no-event/no-seq refusal, the crossing out-of-band reprice that still applies, the gateway rejection, and the direct-book refusal.
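A minimal model of the gate-before-emit ordering described above. `Engine`, `BandedBook`, and `Event` are hypothetical, and the caller-supplied `remainder_after_cross` stands in for the real matching computation, so a fully crossing out-of-band reprice is allowed while one that would leave an out-of-band remainder is refused.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Event {
    std::uint64_t seq;
    std::string kind;
    std::uint64_t order_id;
};

struct BandedBook {
    std::int64_t min_price = 1;
    std::int64_t max_price = 1024;
    // Contiguous-style check: only a remainder that must rest needs an in-band price.
    bool can_apply_modify(std::int64_t new_price, std::uint64_t remainder_after_cross) const {
        return remainder_after_cross == 0 || (new_price >= min_price && new_price <= max_price);
    }
    void apply_modify(std::uint64_t /*id*/, std::int64_t /*new_price*/) { /* mutate book here */ }
};

struct Engine {
    BandedBook book;
    std::uint64_t last_seq = 0;
    std::vector<Event> events;

    // Gate first: a refused modify emits no event and consumes no sequence number,
    // so the stream never reports a modify the book did not apply.
    bool modify(std::uint64_t id, std::int64_t new_price, std::uint64_t remainder_after_cross) {
        if (!book.can_apply_modify(new_price, remainder_after_cross)) return false;
        events.push_back({++last_seq, "OrderModified", id});
        book.apply_modify(id, new_price);
        return true;
    }
};
```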
[2026-06-12] M47 PR #119 CodeScene gate fix: adding the modify pre-gate pushed src/engine/order_book.cpp to the brain-class function-count threshold, so the M47 ContiguousStore moved to an internal header src/engine/contiguous_store.hpp (included only by order_book.cpp; no public API or CMake change). Both files score 9.68 with only the pre-existing argument-count findings, and analyze_change_set vs origin/main passes the quality gates.
[2026-06-12] M47 PR #119 final local validation: split the new modify-pre-gate regression assertions into small helpers so the tests pin event-stream/book-state consistency without introducing new CodeScene assertion-block noise. Focused test_matching_engine, test_order_book, and test_risk_gateway passed; git diff --check, make check 229/229, and make asan 229/229 passed locally.
[2026-06-12] M47 PR #119 Codex follow-up: direct OrderBook callers using contiguous storage now run the out-of-band residual preflight before matching, so a partially crossing GTC order that would leave an un-restable remainder is refused without removing maker liquidity. Added a direct-book regression; focused storage/gateway tests passed, then git diff --check, make check 230/230, and make asan 230/230 passed locally.
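A sketch of a residual preflight for the buy side, assuming a simplified ask ladder of aggregate quantity per price and a `can_rest` predicate standing in for whichever storage check applies (the contiguous band here, or pool capacity for IntrusivePooled). `crossable_qty` and `preflight_gtc_buy` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical ask side: price ticks -> total resting quantity at that level.
using AskLevels = std::map<std::int64_t, std::uint64_t>;

// Quantity a buy limit could fill, computed without mutating the book.
inline std::uint64_t crossable_qty(const AskLevels& asks, std::int64_t limit, std::uint64_t qty) {
    std::uint64_t filled = 0;
    for (const auto& [price, level_qty] : asks) {
        if (price > limit || filled == qty) break;
        filled += std::min(level_qty, qty - filled);
    }
    return filled;
}

// Refuse the whole GTC order before matching if its remainder could not rest,
// so no maker liquidity is consumed by an order that will be rejected anyway.
inline bool preflight_gtc_buy(const AskLevels& asks, std::int64_t limit, std::uint64_t qty,
                              const std::function<bool(std::int64_t)>& can_rest) {
    const std::uint64_t remainder = qty - crossable_qty(asks, limit, qty);
    return remainder == 0 || can_rest(limit);
}
```

The check is read-only, which is the point: running it after matching would already have removed maker liquidity.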
[2026-06-12] M47 PR #119 CodeRabbit follow-up: applied the same residual preflight to IntrusiveStore::add_limit — a direct OrderBook{Storage::IntrusivePooled} caller that partially crossed a GTC order and then hit pool exhaustion previously dropped the remainder despite the rest-the-remainder contract; it now refuses the whole order via can_store_limit before matching (engine/gateway callers were already pre-gated, so their behavior is unchanged). Added a direct-book regression that fills the intrusive pool and asserts a no-capacity bid is refused with maker liquidity intact. Narrowed the Storage doc comment so the "preserves matching semantics" claim is scoped to each mode's declared domain (out of it, IntrusivePooled/Contiguous can refuse a GTC remainder the others would rest), per CodeRabbit. Include hygiene: added <utility> to contiguous_store.hpp (uses std::move) and dropped the now-unused <bit> from order_book.cpp (the bit-scan code moved into the header). The Codex contiguous-residual and benchmark-provenance comments were already resolved by the earlier f7c40fe/f0f268b commits on this branch.
[2026-06-12] M47 PR #119 review round 3 fixes (docs/roadmap only; committed artifact numbers unchanged): corrected a benchmark-claim error in docs/pool_backed_storage.md (it named contiguous the fastest row, but the committed artifact has PMR 209.6 < contiguous 222.4 < baseline 273.7 < intrusive 373.0 ns/cmd — PMR is fastest, contiguous second) and fixed stale MILESTONES.md statuses so resume/finish workflows route to the right milestone: M45 is merged via PR #117 (not PR #119), M44 (#115) and M46 (#118) are also merged, and M47 is the active PR #119.
[2026-06-15] M47 follow-up started after PR #119 squash-merged to main as 93d5062. Branch perf/m47-storage-benchmark-diagnosis diagnoses the storage artifact ordering rather than forcing contiguous storage to win. Implemented deterministic storage workload variants (general generated, dense bounded, sparse wide, cancel/modify-heavy, match/traversal-heavy), non-timed workload-shape metrics, median/min/max timing output, and a compact all-mode benchmark-mix equivalence regression. Fixed small intrusive overheads: can_store_limit now returns immediately for IOC or when pool capacity exists, priority-losing modifies erase via the already-found locator instead of doing a second cancel lookup, and rest uses checked locator emplace with cleanup on unexpected insertion failure. Regenerated results/pool_backed_storage.txt through make bench-storage; source digest is sha256:c34b52a84fad30f446938b120ebf9ad0e5c0769f486c3f2015fb9d9f18243b08 and Dirty inputs: no. Focused Docker verification passed the benchmark-mix storage test, CodeScene passed for the changed C++ files and the branch diff, make bench-storage regenerated the artifact from clean source inputs, and final Docker verification passed make check 240/240 and make asan 240/240. Docker Desktop Linux does not provide bare-metal PMU/cache evidence.
[2026-06-15] M47 follow-up review correction (PR #122): a Codex review found the storage benchmark timed the full per-replay run_once, including MatchingEngine construction and the RegisterSymbol prefix. Book construction is eager (OrderBook builds its IntrusiveStore / ContiguousStore in its constructor), so for the pooled modes that prefix runs OrderPool / RawPool free-list initialization over 65536 slots per book, a fixed per-run setup cost that was charged to per-command time and amortized over only ~5k commands, scaling with symbol count and inflating the intrusive mode most. The fix (bench_storage.cpp, run_storage_benchmarks.sh) moves engine construction, the registration prefix, and the end-of-run snapshot outside the timed interval and normalizes over timed commands only. A macOS before/after on the same host confirmed the effect is intrusive-specific and scales with symbol count (4-symbol flows dropped ~80-92 ns/cmd, 2-symbol flows ~45), leaving baseline/PMR/contiguous within noise. The Docker-regenerated artifact (digest sha256:81ff74300a1633d0d9ddaed68f8880f121bd03cddc568ab212056b8eddd53b1b, Dirty inputs: no, informational commit 476ba71) supersedes the c34b52a… artifact and overturns the earlier "intrusive is the slow outlier / PMR-fastest" reading (including the M47 single-flow ranking, which was contaminated the same way): with setup excluded the four modes cluster into a tight ~40-120 ns/cmd band, intrusive and contiguous are the two fastest (trading the lead by workload shape), and intrusive still carries a large fixed init cost the per-command metric deliberately excludes. docs/pool_backed_storage.md interpretation rewritten accordingly.
[2026-06-15] M47 follow-up review correction #2 (PR #122, same root cause as #1): a second Codex finding noted the non-timed characterize shape pass still walked the full command stream and derived top_probe_calls from a formula on total commands, so the dense shape line printed top_probe_calls=20016 against the measured probes/run=20008 (the 8-probe difference is the 2 registration commands x 2 symbols x bid/ask). Root cause: the first fix moved the registration prefix out of the timed path but not the characterization path, leaving the shape line describing a different sequence than the rows. Fix: characterize now applies the registration prefix unobserved and observes only the trading range, sharing one registration_prefix_len boundary with apply_registration and counting probes with the same should_probe predicate over the same range, so commands and top_probe_calls on the shape line match the per-run cmds/probes/run by construction. Audited the class: top_probe_calls and commands were the only live instances (events/resting/last_seq are immune because RegisterSymbol emits no events and rests nothing), so this closes the registration-prefix accounting class at the source. The timed path is unchanged, so medians moved only within noise; artifact regenerated in Docker Linux (digest sha256:e12d141670f00f56846697529987006e14aedf7bac2c4f44c994e687ec8cc38f, Dirty inputs: no, informational commit d3ed253), and the storage-doc and PR-body tables were rebuilt mechanically from that one artifact.
[2026-06-15] M47 follow-up CodeRabbit nits (PR #122): (1) time_storage now resolves the timed-command count once (identical across reps) and returns early when reps == 0 or a workload has no post-registration commands, avoiding empty-vector indexing / divide-by-zero in the sampling math; (2) docs/benchmarking.md no longer describes storage timing as "full workload replays", because it times only the post-registration command path, matching the storage doc and harness. Behavior on the real workloads is unchanged (those degenerate inputs cannot occur with the fixed reps and multi-thousand-command workloads). Artifact regenerated in Docker Linux (digest sha256:b606452b1bbff3d1c4eed8f59839701590cfbc824207f7b707c03ca66766353a, Dirty inputs: no, informational commit cf0396f), tables rebuilt from that one artifact; ranking unchanged.
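The setup-exclusion fix and the degenerate-input guards from the PR #122 entries can be sketched as a generic timing helper. `median_ns_per_cmd`, `Setup`, and `Apply` are hypothetical and are not the repo's `time_storage`; the point is that engine construction, pool initialization, and the registration prefix run before the clock starts, and the per-command figure divides only by timed commands.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <optional>
#include <vector>

// Setup builds a fresh engine (construction, pool init, registration prefix), untimed.
// Apply runs one trading command against it; only the trading range is timed.
template <class Setup, class Apply, class Command>
std::optional<double> median_ns_per_cmd(std::size_t reps, const std::vector<Command>& timed_cmds,
                                        Setup setup, Apply apply) {
    // Degenerate inputs: no samples to take a median of, or nothing to divide by.
    if (reps == 0 || timed_cmds.empty()) return std::nullopt;
    std::vector<double> samples;
    samples.reserve(reps);
    for (std::size_t r = 0; r < reps; ++r) {
        auto engine = setup();
        const auto start = std::chrono::steady_clock::now();
        for (const auto& cmd : timed_cmds) apply(engine, cmd);
        const auto stop = std::chrono::steady_clock::now();
        const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
        samples.push_back(ns / static_cast<double>(timed_cmds.size()));
    }
    // Upper median for an even number of reps.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Any end-of-run snapshot or verification would go after `stop`, keeping it out of the per-command number as well.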
[2026-06-05] Repo review policy: added .coderabbit.yaml to disable CodeRabbit docstring coverage because this repo uses sparse "why" comments rather than blanket function docstrings. CodeRabbit Infer is disabled because the trusted C++ analysis path is CMake/CI/sanitizers/CodeScene and CodeRabbit's Infer run currently lacks the compile context needed for useful C++ analysis.
[2026-06-04] Local MCP/tooling memory: Codex client has CodeScene, Playwright, filesystem, sequential-thinking, memory, Docker, Context7, and node_repl MCP servers configured. Postgres and Perplexity MCP servers are intentionally not configured; do not assume database or Perplexity access unless the human configures them later.
[2026-06-02] M34: started after M33 (#97) squash-merged (commit fe8679a). Scope: Linux epoll gateway architecture prototype only — event-driven multi-client readiness, nonblocking accept/read/write behavior, deterministic Session semantics preserved. Do not start M35 load/socket-pressure testing and do not make production-capacity claims.
[2026-06-02] M34: added EpollServer, a Linux-only event-driven transport with one epoll loop, nonblocking accept4/read/write, per-client outbound buffers, and one existing deterministic Session per connection. qsl-gateway <port> --epoll opts in; the blocking TcpServer remains the default.
[2026-06-02] M34: epoll tests are platform-scoped. macOS verifies unsupported mode; Docker Ubuntu Linux verifies availability, invalid bind-host rejection, and two simultaneous loopback clients handled by one event loop without thread-per-connection design.
[2026-06-02] M34: local verification passed: make check 190/190, make asan 190/190, git diff --check, and Docker Ubuntu Linux test_epoll_gateway 3 tests / 36 assertions.
[2026-06-02] M34: opened draft PR #98; do not merge from the automation side.
[2026-06-03] M34: Codex review of #98 iterated several rounds (CI green throughout), each fix verified on macOS and Linux Docker as noted in the current-state block: read-backpressure; --epoll flag-or-port parsing with whole-token/range/duplicate validation; a soft high-water mark (stop reading) plus a hard outbound cap; O(n²) front-erase flush replaced with a write offset; EINTR-on-send retry; survival of transient accept4 errors (ECONNABORTED/pending network errors) instead of tearing down the loop; bounded high-fanout NewOrder response budgeting before gateway mutation; EPOLLERR immediate close; EPOLLHUP drain of already-readable bytes before close; no EPOLLIN re-arm once a session is closing; fd-generation checks for stale events after fd reuse; and queued-reply preservation when an over-cap frame follows earlier accepted frames in the same read. Issue #99 was opened as a broader follow-up for shared streaming/byte-budgeted response generation outside the epoll-specific bounded path and is addressed later by PR #111.
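A sketch of the per-client outbound buffer behavior listed above, assuming one `std::string` per connection; `OutboundBuffer` and its thresholds are hypothetical. It shows the soft high-water mark (stop reading), the hard cap (reject), and a write offset with amortized compaction in place of erasing from the front after every partial send.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

class OutboundBuffer {
public:
    OutboundBuffer(std::size_t high_water, std::size_t hard_cap)
        : high_water_(high_water), hard_cap_(hard_cap) {}

    // Reject bytes that would exceed the hard cap; the caller decides how to close.
    bool enqueue(std::string_view bytes) {
        if (pending() + bytes.size() > hard_cap_) return false;
        buf_.append(bytes.data(), bytes.size());
        return true;
    }

    std::string_view unsent() const { return std::string_view(buf_).substr(offset_); }
    std::size_t pending() const { return buf_.size() - offset_; }

    // Drop read interest while at or above the soft mark.
    bool should_read() const { return pending() < high_water_; }

    // Record n bytes accepted by a (possibly partial) send(); n must not exceed pending().
    void consumed(std::size_t n) {
        offset_ += n;
        if (offset_ == buf_.size()) {
            buf_.clear();
            offset_ = 0;
        } else if (offset_ > buf_.size() / 2) {
            // Compact only when the dead prefix dominates, so copying stays amortized O(1)/byte.
            buf_.erase(0, offset_);
            offset_ = 0;
        }
    }

private:
    std::string buf_;
    std::size_t offset_ = 0;
    std::size_t high_water_;
    std::size_t hard_cap_;
};
```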
[2026-06-02] M33: PR #97 squash-merged (commit fe8679a); CI passed all 6 jobs and Codex review found no major issues. M33 delivered deterministic pipeline scheduling perturbation, opt-in repeated concurrency stress, and docs framing TSan/perturbation/stress as evidence rather than proof.
[2026-06-02] M33: started after M32 (#96) squash-merged (commit f122ee8). Scope: advanced concurrency validation only — deterministic scheduling perturbation and/or longer stress modes, stronger concurrency methodology docs, opt-in long-running/Linux checks where appropriate. Do not claim proof; TSan and stress tests remain dynamic evidence over executed schedules.
[2026-06-02] M33: added deterministic PipelinePerturbation yield hooks to the threaded pipeline and a regression test that compares perturbed pipeline output against the single-threaded reference across seeded property flows, queue capacities, and per-stage yield patterns.
[2026-06-02] M33: added make concurrency-stress / scripts/concurrency_stress.sh as an opt-in repeated concurrency-label test loop. Normal CI remains non-flaky; longer local/Linux runs are documented through explicit knobs rather than hidden in the default gate.
[2026-06-02] M33: local verification passed: make check 189/189, make asan 189/189, make tsan 19/19 concurrency tests, QSL_CONCURRENCY_STRESS_LOOPS=2 make concurrency-stress 2/2 loops, bash -n scripts/concurrency_stress.sh, and git diff --check.
[2026-06-02] M33: opened draft PR #97 and triggered @codex review; do not merge from the automation side.
[2026-06-02] M33: PR #97 CI passed all 6 jobs (build-test, sanitizers, thread-sanitizer, determinism, differential-sweep, ocaml-verifier) and Codex review found no major issues.
[2026-06-02] M32: PR #96 squash-merged (commit f122ee8); Codex review found no major issues and CI passed all jobs. M32 delivered PMR-backed order-book node allocation, an engine-level storage benchmark, docs/ADR, and issue #95 for the later intrusive/custom-node OrderPool<Capacity> storage path now handled by feat/close-storage-flow-tcp-followups.
[2026-06-02] M32: started after M31 (#93) squash-merged (commit b7926ac). Corrected scope: integrate pool-backed order-book node allocation using PMR, informed by the M28 allocator experiment, and measure baseline-vs-pool-backed engine-level workloads rather than another allocator microbenchmark. Direct OrderPool<Capacity> order-book integration was deferred because the then-current std::list<Order> design allocated implementation-defined list nodes, not bare engine::Order objects; the later #95 follow-up handles the intrusive/custom-node path separately. Matching must remain deterministic (replay/differential tests stay green); no storage-architecture claim beyond what the committed measured artifact supports; document even a negative result.
[2026-06-02] M32: implemented the scoped PMR path: OrderBook::Storage::{Baseline,Pooled}, per-book std::pmr::unsynchronized_pool_resource, PMR list/map/unordered_map node allocation, and MatchingEngine(OrderBook::Storage) propagation via books_.try_emplace(id, book_storage_). Baseline remains the default.
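A minimal sketch of per-book PMR node allocation, assuming a bid-only book for brevity; `PooledBook` is hypothetical. It relies on `polymorphic_allocator` uses-allocator construction, so the `pmr::list` created inside the map also draws its nodes from the same pool, and on declaration order: the pool is declared first so it is constructed before, and destroyed after, the containers that use it.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <map>
#include <memory_resource>
#include <unordered_map>

struct PooledBook {
    struct Order {
        std::uint64_t id;
        std::int64_t price_ticks;
        std::uint64_t qty;
    };
    using Level = std::pmr::list<Order>;

    // Single-threaded pool: one book, one owner, no locking.
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::map<std::int64_t, Level> bids{&pool};
    std::pmr::unordered_map<std::uint64_t, Level::iterator> by_id{&pool};

    void add_bid(const Order& o) {
        Level& level = bids[o.price_ticks];  // map node from `pool`; the list inherits `pool`
        level.push_back(o);                  // list node from `pool`
        by_id[o.id] = std::prev(level.end());
    }
};
```

Because the containers are constructed with `&pool`, swapping storage modes changes only where nodes come from, not the container algorithms, which is what lets the equivalence test compare event streams directly.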
[2026-06-02] M32: added a generated-flow equivalence invariant test proving baseline and pooled storage produce identical per-command event streams, final EngineSnapshot, and last_seq; the test includes non-vacuity guards for real trades and resting liquidity.
[2026-06-02] M32: added engine-level storage benchmarking (make bench-storage, qsl-bench storage, scripts/run_storage_benchmarks.sh, results/pool_backed_storage.txt). This compares baseline engine storage against PMR pooled node allocation; it is not an isolated allocator-only benchmark and not a production-latency claim.
[2026-06-02] M31: PR #93 squash-merged (commit b7926ac); Codex auto-reviewed on open and reacted 👍 with no findings (docs-only, deliberately honest/conservative). The external-review request was then opened as GitHub issue #94 (labels backlog/documentation/help wanted), satisfying the "review request issue opened" alternative in the M31 DoD; review_feedback.md still records that no external feedback has been received yet.
[2026-06-02] M31: started after M30 (#92) squash-merged (commit 3f88a1f). Scope is the external-review package only: a review-request checklist, a review-feedback template, an optional issue template, and a small README link. Non-negotiable: no fabricated endorsements and no claim that review has happened until it has — review_feedback.md states plainly that no external review has occurred yet.
[2026-06-02] M30: PR #92 squash-merged (commit 3f88a1f) after Codex review converged clean ("Didn't find any major issues") across four rounds of P2 fixes, each verified (happy + negative paths) in containerized Linux and on macOS before re-requesting review: (1) profile_gateway_io.sh fails loudly (no artifact) when strace captures no syscall summary; (2) socket_stress.sh reports loss as published − received (captures end-of-burst tail drops the interior sequence-gap counter misses); (3) profile_gateway_io.sh Pass 2 launches the gateway under strace (launch form) so tracing works under Yama ptrace_scope=1 without CAP_SYS_PTRACE, stopping the traced gateway with SIGTERM/SIGKILL (never strace) so the -c report is flushed; (4) both scripts fail loudly (no artifact) on port conflicts / helper failures.
[2026-06-02] M30: started after PR #91 (post-M29 docs sync) squash-merged to main (commit 86443f0). Scope is Linux kernel/socket-path profiling + socket hardening only; M31–M41 are not implemented here. Also corrected the stale HANDOFF.md "Current handoff" block (it still said "M29 is PR #89 and should land"; the review bot flagged this on #91, which merged before the comment was addressed).
[2026-06-02] M30: added an optional SO_RCVBUF knob to UdpFeedClient(port, recv_buffer_bytes=0) that requests the buffer and reads back the kernel-granted size via getsockopt; qsl-mdfeed subscribe gains [rcvbuf_bytes], qsl-mdfeed publish gains [orders], and the subscriber idle-breaks after a few empty receives so burst experiments terminate promptly. Backward-compatible (defaulted param); unit-tested for monotonic effective-size growth.
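A sketch of the request-then-read-back pattern, assuming a POSIX socket; `set_rcvbuf` is a hypothetical helper, not the UdpFeedClient API. The read-back matters because the granted size is not the requested one: Linux clamps the request to `net.core.rmem_max` and then doubles it, and macOS can reject requests above its socket-buffer limit.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Request a receive buffer and return the size the kernel actually granted, or -1 on error.
inline int set_rcvbuf(int fd, int requested_bytes) {
    if (::setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested_bytes, sizeof requested_bytes) != 0) {
        return -1;
    }
    int effective = 0;
    socklen_t len = sizeof effective;
    if (::getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0) return -1;
    return effective;
}
```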
[2026-06-02] M30: scripts/socket_stress.sh (make socket-stress) is portable (Linux+macOS) — UDP burst + receive-buffer experiment over loopback, multi-trial. Measured on this macOS host: a 2 KiB buffer drops datagrams under a ~16.9k-datagram burst while the OS default (~768 KiB) and an 8 MiB buffer lose nothing. Loss is reported as published − received (per-trial, varies run-to-run), not a fixed number.
[2026-06-02] M30 (Codex review on PR #92, two rounds): (1) fixed profile_gateway_io.sh to capture strace's exit status and require a real syscall summary, failing loudly (exit 3, no artifact) when strace cannot attach — previously a blocked-ptrace host could write a successful-looking artifact with an empty syscall section (commit 551a9c5). (2) fixed socket_stress.sh to report loss as published − received instead of only the SequenceTracker gap count: an end-of-burst tail drop has no later datagram to reveal a gap, so the gap counter alone undercounts loss; the interior gap count is kept as a secondary signal. Both were P2 findings.
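A worked example of why the interior gap counter undercounts: suppose sequences 1 to 8 were published and 1, 2, 3, 5, 6 arrived. The hole at 4 is revealed by 5, but 7 and 8 are tail drops with no later datagram to expose them. `interior_gaps` and `loss` are hypothetical helpers, and the sketch assumes in-order, duplicate-free delivery.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts missing sequence numbers between consecutive received datagrams only.
inline std::uint64_t interior_gaps(const std::vector<std::uint64_t>& received_in_order) {
    std::uint64_t gaps = 0;
    for (std::size_t i = 1; i < received_in_order.size(); ++i) {
        if (received_in_order[i] > received_in_order[i - 1] + 1) {
            gaps += received_in_order[i] - received_in_order[i - 1] - 1;
        }
    }
    return gaps;
}

// Loss as published - received also covers drops after the last received datagram.
inline std::uint64_t loss(std::uint64_t published, std::uint64_t received) {
    return published >= received ? published - received : 0;
}
```

Here the gap counter reports 1 while published − received reports 3.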
[2026-06-02] M30: scripts/profile_gateway_io.sh (make profile-io) is Linux-only (skips with exit 2 elsewhere, like the M29 perf scripts). It backgrounds the gateway directly (owns its PID), reads rusage from /proc/<pid>/{stat,status} (Pass 1) and attaches strace -f -c (Pass 2) — no GNU time / pkill dependency, with SIGKILL escalation so it cannot hang. The committed results/socket_profile_loopback.txt was generated in containerized Linux (Docker, --cap-add=SYS_PTRACE): user-space matching CPU fell below the 10 ms tick while the measurable CPU was the kernel/socket path, with ~1 voluntary context switch per connection and a syscall mix of exactly accept/read/sendto/close.
[2026-06-02] M30: both socket artifacts are loopback-only, constrained-environment evidence (ADR 0008), same honesty policy as ADR 0007 for perf; the gateway profile carries Dirty tree: yes because it was generated mid-development in a container — regenerate on a clean Linux checkout for a clean-tree version. No NIC/driver/real-network or production-capacity claim is made.
[2026-06-02] M30: deferred the optional epoll adapter to M34 (multi-client pressure to M35) and io_uring to discuss-only. Rationale: epoll is Linux-only and cannot be compiled or tested on the macOS development host, and committing untested platform-specific code violates the no-untested-C++ bar; M30 profiles and hardens the existing one-connection-at-a-time gateway instead of rewriting it.
[2026-06-01] M29: started after M28 merged (PR #88, squash commit 03b4d9a). M29 scope is Linux perf evidence only: scripts/docs/artifacts for perf stat and perf record/report; no engine optimization and no M30 socket profiling.
[2026-06-01] M29: make perf-stat and make perf-record fail before building on non-Linux hosts; Linux scripts capture hardware/kernel/compiler/perf/build/git/dirty-tree metadata and keep generated M29 result files out of the dirty-tree calculation.
[2026-06-01] M29: local verification on macOS passed make check (186/186). Docker Desktop Linux can run perf and qsl-bench, but does not expose hardware PMU counters; committed artifacts must keep that caveat visible instead of substituting unsupported counter values.
[2026-06-01] M29 review follow-up: Docker preflight (perf stat -e cycles,instructions,branches,branch-misses,cache-references,cache-misses -- true) reports all requested hardware counters as <not supported> on LinuxKit, so PR #89 artifacts are constrained-environment validation, not full hardware-PMU evidence.
[2026-06-01] M29: full Linux hardware PMU evidence is backlogged as follow-up issue #90 because current macOS/Docker Desktop environments do not expose the required counters. PR #89 lands the workflow and constrained validation only.
[2026-06-02] M29 review fix: perf scripts now run qsl-bench outside perf first so QSL_PERF_ALLOW_PARTIAL=1 cannot hide benchmark failures. perf record artifacts must clear a minimum sample count before being labeled hot-symbol profiles; otherwise they are labeled constrained/insufficient-sample validation.
[2026-06-02] M29 review fix: raised the default perf record sample frequency to 2000 Hz so the short default benchmark harness can realistically clear the 100-sample hot-profile floor on hosts where sampling is permitted.
[2026-06-02] M29 review fix: perf_record.sh now parses perf's abbreviated sample counts (K/M suffixes and comma separators), so a valid # Samples: 2K ... report is counted as 2000 samples rather than 2.
[2026-06-02] M29 review fix: dirty-tree metadata now excludes generated perf artifacts only when their output paths are inside the repository; external absolute paths such as /tmp/report.txt are never passed to Git pathspecs, and dirty-check failures abort instead of recording Dirty tree: no.
[2026-06-02] M29 review fix: perf docs now state that default make perf-stat / make perf-record artifacts profile only the default qsl-bench suite; differential (qsl-bench diff) and allocator (qsl-bench pool) workloads require separate explicit perf runs before supporting hotspot conclusions.
[2026-06-02] M29 documentation sync: PR #89 currently contains a reproducible Linux perf workflow with Linux-only tooling, metadata-rich artifacts, dirty-tree handling, PMU preflight/validation logic, constrained-environment validation, and CI validation; it does not contain real hardware PMU evidence.
[2026-06-02] M29 documentation sync: issue #90 tracks full Linux hardware PMU evidence generation. Current committed artifacts were generated in Docker Desktop Linux where hardware PMU counters and sampling were unavailable/permission-limited, so the repository must not claim real PMU evidence yet.
[2026-06-02] M29 documentation sync: TSan coverage is dynamic-analysis evidence over executed schedules, not a correctness proof over all possible thread interleavings; advanced concurrency validation is added as M33.
[2026-06-02] M29 documentation sync: M28 allocator evidence did not alter order-book storage architecture; pool-backed order-book integration is added as M32 for engine-level memory-architecture evaluation.
[2026-06-02] M29 documentation sync: roadmap extended through M41 (pool-backed storage, advanced concurrency validation, epoll gateway, multi-client socket pressure, NUMA, lock-free ingress, persistence, recovery benchmarking, DPDK, NIC offload study) with priority order #90, M30, M31, M32, M33.
[2026-06-01] M28: added a fixed-capacity OrderPool<Capacity> for engine::Order; exhaustion returns nullptr, releases are validated, and there is no silent heap fallback.
[2026-06-01] M28: added an isolated allocator benchmark path (qsl-bench pool / make bench-allocator) comparing raw operator new/placement construction against pool acquire/release, with full hardware/compiler/build/commit/dirty-tree metadata in results/allocator_experiment.txt.
[2026-06-01] M28: kept order-book storage unchanged; the pool is an allocation experiment for future storage decisions, not a semantic refactor or an end-to-end engine latency claim.
[2026-06-01] M28 review fix: changed OrderPool<Capacity> from default-constructed std::array<engine::Order> slots to raw aligned storage with explicit std::construct_at on acquire and std::destroy_at on release/reset/destructor, preserving per-acquire object lifetime while keeping exhaustion explicit and avoiding heap fallback.
[2026-06-01] M27: started after M26 merged (PR #86, squash commit 8ec4967). Reset the env-designated branch to origin/main rather than create a feat/m27-* branch (the environment forbids pushing to a different branch); the branch now carries only M27.
[2026-06-01] M27: ThreadSanitizer is a separate sanitizer preset because -fsanitize=thread is incompatible with the ASan preset's -fsanitize=address; cmake/Sanitizers.cmake errors if both are enabled at once. Gave the three tests/concurrency/ executables the concurrency label so make tsan runs only the genuinely multithreaded tests (TSan on single-threaded tests adds nothing). TSan is a data-race correctness gate, not a performance tool; no benchmark numbers are collected under it.
[2026-06-01] M26: addressed a Codex review on PR #86 — the "backpressure occurred (spins >= 1)" assertions were timing-dependent (std::this_thread::yield() can let the consumer keep pace, so a spin count of 0 is legitimately possible on a fast/lightly-loaded run). Replaced them with a deterministic barrier: added an optional PipelineProbe (live spin counters the pipeline bumps as backpressure happens) and a gated-consumer test that blocks the downstream stage, waits on the probe until BOTH queues have provably back-pressured, then releases — and made the lag/saturation tests correctness-only. Confirmed non-flaky over 50 repeated runs; make check 182/182, make asan 182/182.
[2026-06-01] M26: reconciled stale PROGRESS state — M25 was already merged (PR #85, commit 9360364 on main), not "ready for PR"; M24 (#84) + M25 (#85) confirmed merged, satisfying the M26 prerequisites.
[2026-06-01] M26: the threaded pipeline is a new header-only qsl::concurrency::ThreadedPipeline<InboundCapacity, OutboundCapacity> (include/qsl/concurrency/pipeline.hpp). The engine thread is the sole owner of MatchingEngine+OrderGateway, so the concurrency boundary is a value hand-off and deterministic matching semantics are unchanged (no engine code modified).
[2026-06-01] M26: the downstream publisher/log stage consumes self-contained ProcessedCommand records and never reads the engine — the M6 publisher derives top-of-book from the engine, which would race the engine thread, so keeping the sink engine-free makes "publisher lag cannot corrupt engine state" structurally true, not merely tested.
[2026-06-01] M26: both hand-off queues use the lossless spin/yield policy (orders + event-log/feed records never dropped) and drain-then-stop shutdown via per-stage atomic<bool> done-flags; rings live on run()'s stack and are joined before return (SPSC lifetime bracket). Determinism is proven by asserting the threaded result equals a single-threaded reference and a replay of the concurrently-written command log, across seeds and queue capacities (2..4096) — capacity/timing change only backpressure, never the result.
[2026-06-01] M26: branch-name deviation — the managed remote environment mandates development on claude/serene-fermi-rhuFJ and forbids pushing to a different branch, so M26 keeps the milestone intent (one branch, one PR titled feat: add threaded gateway-engine-feed pipeline prototype) but not the feat/mNN-slug branch name.
[2026-06-01] M25: split the concurrency docs — kept the high-level model (ownership lifecycle, visibility, backpressure, shutdown, limits) in docs/concurrency_model.md and moved the C++ memory-model deep dive (ordering table, happens-before proof both directions, wait-free-by-construction argument) into a new docs/memory_ordering.md, cross-linked, to avoid one drifting from the other.
[2026-06-01] M25: justified the "wait-free per operation" / "lock-free" claim by construction (bounded step count, no loops, no CAS) rather than dropping it, and explicitly scoped it to the queue op — a caller spinning on backpressure is application-level and not wait-free.
[2026-06-01] M25: introduced tests/concurrency/ (separate from tests/unit/) for sustained stress + backpressure/shutdown evidence; deferred dynamic data-race detection (ThreadSanitizer) to M27 per the roadmap rather than adding a TSan preset now.
[2026-06-01] M25: qualified the wait-free claim further so it only covers payload types with bounded, non-blocking copy/move assignment; SpscRing remains SPSC-only and the queue protocol proof does not overpromise for arbitrary T.
[2026-05-31] Cut GitHub-only v0.1.0 release after the release-readiness gate; no packages published.
[2026-05-31] Promoted old issues into the Phase III/IV roadmap (issue → milestone): #24 → M24, #26 → M26, #27 → M27, #25 → M28, #32 → M29 — instead of leaving them as loose backlog.
[2026-05-31] Added Phase IV milestones M29–M31 for Linux perf/socket profiling and external review signal (with M28 memory-pool experiment closing Phase III).
[2026-05-31] Deferred generic product items #28–#31 and #33 (realistic flow model, FIX adapter, dashboard, Docker, Pages) until after the systems-credibility arc; #32 (perf/flamegraph) is promoted to M29.
[2026-05-29] Target Jane Street SWE first and Linux Engineering second; avoid optimizing for IT Operations because it weakens software-engineering signal.
[2026-05-29] Preserve the C++20 exchange simulator as the core project; add OCaml only as a replay-verifier subproject, not as a replacement for the engine.
[2026-05-29] Add Linux performance and socket gateway documentation requirements to strengthen Linux Engineering fit.
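The M28 entries describe OrderPool<Capacity> as raw aligned storage in which acquire begins an object lifetime and release/reset/destructor end it, with explicit exhaustion (nullptr, no heap fallback) and validated releases. A minimal sketch of that shape follows, assuming a simplified stand-in Order (the real engine::Order is richer) and a hypothetical index free list plus live-slot bitset for release validation; it is not the repository's implementation. It uses placement new, the operation that C++20 std::construct_at wraps, so the sketch also compiles as C++17.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

// Simplified stand-in for engine::Order (hypothetical fields).
struct Order {
    Order(std::uint64_t id_in, std::int64_t price_in, std::int64_t qty_in)
        : id(id_in), price(price_in), qty(qty_in) {}
    std::uint64_t id;
    std::int64_t price;
    std::int64_t qty;
};

// Fixed-capacity pool over raw aligned storage; never falls back to the heap.
template <std::size_t Capacity>
class OrderPool {
public:
    OrderPool() { refill_free_list(); }
    ~OrderPool() { reset(); }
    OrderPool(const OrderPool&) = delete;
    OrderPool& operator=(const OrderPool&) = delete;

    // Starts an Order lifetime in a free slot; returns nullptr when exhausted.
    template <class... Args>
    Order* acquire(Args&&... args) {
        if (free_count_ == 0) return nullptr;
        const std::size_t slot = free_[--free_count_];
        live_.set(slot);
        // Same effect as std::construct_at(slot pointer, args...) in C++20.
        return ::new (slot_address(slot)) Order(std::forward<Args>(args)...);
    }

    // Ends the lifetime; rejects foreign, misaligned, or already-released pointers.
    bool release(Order* p) {
        const auto base = reinterpret_cast<std::uintptr_t>(storage_);
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        if (addr < base || addr >= base + sizeof(storage_)) return false;
        if ((addr - base) % sizeof(Order) != 0) return false;
        const std::size_t slot = (addr - base) / sizeof(Order);
        if (!live_.test(slot)) return false;  // double release
        std::destroy_at(p);
        live_.reset(slot);
        free_[free_count_++] = slot;
        return true;
    }

    // Destroys every live Order and returns all slots to the free list.
    void reset() {
        for (std::size_t slot = 0; slot < Capacity; ++slot) {
            if (live_.test(slot)) {
                std::destroy_at(std::launder(static_cast<Order*>(slot_address(slot))));
            }
        }
        live_.reset();
        refill_free_list();
    }

    std::size_t available() const { return free_count_; }

private:
    void* slot_address(std::size_t slot) { return storage_ + slot * sizeof(Order); }

    void refill_free_list() {
        for (std::size_t i = 0; i < Capacity; ++i) free_[i] = Capacity - 1 - i;
        free_count_ = Capacity;
    }

    alignas(Order) std::byte storage_[Capacity * sizeof(Order)];
    std::size_t free_[Capacity];
    std::size_t free_count_ = 0;
    std::bitset<Capacity> live_;
};
```

Because release() checks the address range, slot alignment, and live bit, a stray pointer or a double release returns false instead of corrupting the free list, and a full pool reports exhaustion rather than silently allocating.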

Resume framing notes

SWE title

Quant Systems Lab — C++20 Exchange Simulator + OCaml Replay Verifier

Linux title

Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator

Next action remains

There is no active milestone. v0.2.1 is the current release, on top of v0.2.0 (PR #127 ded6e80) and v0.1.0. The v0.2.1 content is squash-merged to main: the Codex resume-anchor sweep (PR #129), the perf flamegraph #32 (PR #134, superseding the auto-closed #130), the FIX text adapter #29 (PR #131), and the version-bump release PR (#133), with v0.2.1 tagged on the release merge commit. The committed perf artifacts remain partial hardware PMU evidence from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence.

Highest-value remaining work is non-code and gated: issue #94 (independent external review) and issue #90 (full cache-PMU evidence). Issue #90 needs a PMU microarchitecture that exposes cache counters to Linux (x86_64, or an ARM server core); do not relabel partial artifacts as full evidence. Do not pick up either issue from automation; the human drives both.

After each squash merge, return to this file and update state factually. If benchmark numbers are not measured, write not measured. Do not guess. Nobody is impressed by imaginary throughput.

Additive deep-testing roadmap replacing old optional M15

The old optional M15 (Jane Street application polish) is removed and replaced by technical milestones M15–M20. The purpose is to add actual depth rather than recruiter-facing decoration.

M15 exports normalized command streams and final C++ snapshots.
M16 implements an independent OCaml replay engine.
M17 compares C++ and OCaml final snapshots in differential tests.
M18 adds seeded property-based command generation.
M19 adds shrinking and minimal failing fixture export.
M20 documents the differential testing architecture.

Decision log additions:

[2026-05-30] Removed optional Jane Street application-polish milestone because recruiter-facing polish is lower signal than technical depth.
[2026-05-30] Added M15–M20 to turn the OCaml verifier into independent replay/differential-testing infrastructure.
[2026-05-30] Property-based generation and shrinking are now the intended final “deep idea” layer: the repo should test market-state systems, not merely implement one.
[Roadmap] Repository hygiene is deferred until after Phase II differential testing; no CODE_OF_CONDUCT.md or GOVERNANCE.md unless the project becomes multi-maintainer.
[Roadmap] Packaging is intentionally deferred; the repo is a technical artifact, not an installable product.
[Roadmap] A GitHub v0.1.0 release is optional and may follow only after the release-readiness audit.
[Roadmap] M22 is the Phase II equivalent of M13: final README/demo/docs/readiness polish with conservative claims and reproducible checks.
[M16] OCaml replay_engine.ml is an independent immutable matching engine (price-time, GTC/IOC/market/cancel/modify, gateway risk, active-lifetime ids) that replays the M15 command stream and computes its own snapshot — it does not read the C++ events/snapshot during replay.
[M16] Sequence numbers count emitted events (accept + trades; cancel; modify + trades) so the OCaml last_seq matches the C++ engine; snapshot includes every registered symbol (mirrors try_emplace).
[M16] On the committed stream_seed7.txt, the OCaml-computed snapshot matches the C++ snapshot exactly (last_seq 47, trades 13, per-symbol best bid/ask + counts); the automated C++-vs-OCaml equality check in CI is deferred to M17.
[M17] Differential test compares the OCaml-computed snapshot against the C++ snapshot embedded in each fixture (best bid/ask, level aggregates, order counts, last_seq, trade count) via canonical snapshot_to_lines, printing a readable computed-vs-expected diff on mismatch; runs under the existing ocaml-verifier CI job (no new job).
[M17] Added a C++ qsl-export-stream ioc scenario (hand-built IOC + market + partial-maker) so the differential test covers IOC, which the GTC-only synthetic flow never exercises; OCaml reproduces it exactly (last_seq 9, trades 3).
[M17] A deliberately corrupted-snapshot fixture (stream_bad_snapshot.txt) is asserted to be detected as a mismatch, proving the check fails on divergence.
[M18] Added generate_property_flow (C++) + qsl-export-stream prop <seed> producing enriched seeded streams (IOC, invalid price/qty, duplicate/reused/unknown ids, cancel/modify active+inactive, market, multi-symbol); committed prop_seed1..8.txt (later expanded to prop_seed1..50 in #49). C++ and OCaml snapshots agree on all 8 seeds, exercising every reject reason plus real trades.
[M18] The differential test discovers all prop_*.txt via Sys.readdir and checks snapshot equality + a no-crossed-book invariant per fixture, reporting the failing seed.
[M18] Restored the M17 parser-rejection check (bad_snapshot_level_symbol.txt / expect_parse_error) that the M18 differential-test rewrite had dropped; this also cleared a warning-as-error (unused value) that had broken the OCaml build.
[M18] Broadened negative coverage (stream_bad_lastseq.txt, stream_bad_orders.txt) and added a golden fixture-regeneration guard (scripts/check_fixtures.sh, make check-fixtures, CI build-test step) so committed fixtures must match current C++ output — closing the M17-review gap where the differential could compare OCaml against a stale C++ snapshot.
[M19] Added a deterministic, greedy command-stream shrinker (replay::shrink): chunk removal, single-command removal, and field simplification (lower quantities), iterated to a fixed point, preserving a pluggable failure predicate; count_trades is a gateway-driven predicate helper.
[M19] Demonstrated against an artificial "produces a trade" predicate (the engines currently agree, so there is no real divergence to shrink); the shrinker is predicate-agnostic and a divergence predicate plugs in unchanged.
[M19] qsl-export-stream shrink <seed> writes a minimized differential fixture + shrink report; committed shrunk_seed1.txt (123→5 commands) is golden-checked and replayed by the OCaml differential test. Limitations (greedy/not-globally-minimal, qty-only field simplification, no symbol/id renumbering) documented in docs/differential_testing.md.
[M20] Finalized docs/differential_testing.md with a top-level architecture overview, a mermaid pipeline diagram, a minimized-fixture example, and an explicit "what this proves / does not prove" section (agreement over tested seeds; not formal verification; shared-assumption risk acknowledged).
[M20] Added docs/property_testing.md (generator coverage + delta-debug shrinker + determinism/golden + honest limits); added a README "Differential testing (OCaml)" <60s section + diagram links and conservative differential-testing résumé bullets. Docs-only; no code change.
[M21] Added MIT LICENSE (Copyright (c) 2026 Moustafa Nasr), CONTRIBUTING.md (branch-per-milestone workflow + checks + no fabricated perf), SECURITY.md (no bounty; qsl-gateway/qsl-mdfeed unauthenticated + loopback-only; honest systems-lab-not-production), CHANGELOG.md ([Unreleased] M3–M20 history), and README links.
[M21] Deliberately omitted CODE_OF_CONDUCT.md and GOVERNANCE.md (single-maintainer; no community-process theater) and skipped SUPPORT.md; no packaging and no release (deferred to optional M23).
[M22] Release-readiness audit recorded in docs/release_readiness.md: verified make check/asan/check-fixtures/dune runtest/demo/bench all green, all README + doc-to-doc links resolve, no overclaiming (forbidden phrases appear only as negations or avoid-lists), benchmark language measured/synthetic/reproducible, differential vocabulary distinct. make bench was rerun to confirm reproduction; committed results/latest.txt retained (single-machine variance). No GitHub release created (that is the optional, human-approved M23).
[M20–M22 re-finalization] After the #34–#51 backlog landed, re-ran the Phase II finalization milestones against the current state: M20 differential/property docs + Mermaid diagrams + ADRs refreshed (PR #78); M21 CONTRIBUTING checks (added check-manifest/determinism/bench-diff/divergence-demo) and per-issue branch naming; M22 release_readiness.md re-audited — make check 157/157, asan 157/157, check-fixtures/check-manifest clean, determinism byte-identical across gcc/clang over all 50 property seeds, divergence-demo OK, dune runtest 5 suites, demo clean; benchmarks reproduce and committed results/latest.txt + results/differential.txt retained.
[M17] Snapshot parsing validates per-level symbol ownership so malformed embedded C++ snapshots cannot be normalized into equality.
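The M19 entries describe a deterministic greedy shrinker: chunk removal, single-command removal, and quantity lowering, repeated to a fixed point while a pluggable failure predicate keeps holding. The sketch below illustrates that loop under stated assumptions: Command, Stream, Predicate, and shrink are hypothetical names (the repository's entry point is replay::shrink), quantity lowering is assumed to be repeated halving, and chunk size 1 doubles as single-command removal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical command shape; real replay commands also carry side, price, symbol, etc.
struct Command {
    std::uint64_t id;
    std::int64_t qty;
};

using Stream = std::vector<Command>;
// Returns true while the stream still reproduces the failure being minimized.
using Predicate = std::function<bool(const Stream&)>;

// Greedy shrink to a fixed point; every accepted step keeps fails(s) true.
Stream shrink(Stream s, const Predicate& fails) {
    bool progress = true;
    while (progress) {
        progress = false;
        // Chunk removal with halving chunk sizes; chunk == 1 is single-command removal.
        const std::size_t first_chunk = s.size() / 2 > 0 ? s.size() / 2 : 1;
        for (std::size_t chunk = first_chunk; chunk >= 1; chunk /= 2) {
            std::size_t start = 0;
            while (start + chunk <= s.size()) {
                Stream candidate = s;
                candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(start),
                                candidate.begin() + static_cast<std::ptrdiff_t>(start + chunk));
                if (fails(candidate)) {
                    s = std::move(candidate);  // keep the smaller stream, retry the same offset
                    progress = true;
                } else {
                    start += chunk;  // this chunk is needed; move past it
                }
            }
        }
        // Field simplification: halve each quantity while the failure persists.
        for (std::size_t i = 0; i < s.size(); ++i) {
            while (s[i].qty > 1) {
                Stream candidate = s;
                candidate[i].qty /= 2;
                if (!fails(candidate)) break;
                s = std::move(candidate);
                progress = true;
            }
        }
    }
    return s;
}
```

Each accepted candidate keeps the predicate true and strictly shortens the stream or lowers a quantity, so the loop terminates; as the M19 limitations note, the result is locally minimal under these moves, not globally minimal.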
