From 587778dd8453c236e44524b7151e4466cf784cc3 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 02:23:23 -0400 Subject: [PATCH 01/22] docs: sync resume anchors and PMU claims to v0.2.0 (Codex #127/#128 follow-up) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolve the Codex review findings left on `main` by PRs #127 and #128: - PROGRESS.md: remove the stale "Next action remains" block that still steered /resume to the merged PR #125 on `perf/linux-host-artifact-refresh`; replace with the v0.2.0 between-releases state (no active milestone; #94/#90 gated). - AGENTS.md: bring it into sync with CLAUDE.md's v0.2.0 partial-PMU reframe. The constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no longer label the artifacts "constrained Docker validation"; the stale `perf/linux-host-artifact-refresh` follow-up line is updated (also in CLAUDE.md) to the released state. - docs/perf_analysis.md: narrow the PMU claim so it no longer implies the Apple Blizzard (E-core) PMU carries live counts. The `apple_blizzard_pmu/...` rows read `<not counted>` in results/perf_stat_linux.txt because the single-threaded benchmark stays on the Avalanche P-cores — expected scheduling, not a counter. Docs/memory only; no code or artifacts changed. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 47 +++++++++++++++++++++++++++---------------- CLAUDE.md | 12 ++++++----- PROGRESS.md | 31 ++++++++++++++++++++-------- docs/perf_analysis.md | 12 +++++++---- 4 files changed, 68 insertions(+), 34 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 429bd1e..3b303db 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -159,7 +159,12 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. 
-- M29 perf artifacts are constrained-environment evidence until issue #90 is completed. +- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) + Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but + `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual + is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it + (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either + "full PMU evidence" or "constrained Docker validation". - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it. @@ -1144,10 +1149,14 @@ aesthetic product work before M24-M49 unless the human explicitly changes priori The correct claim after this arc is: > "correctness-first deterministic exchange-systems lab with measured concurrency, allocator, -> constrained Linux perf workflow, and socket-profiling evidence." +> bare-metal partial-PMU Linux perf, and socket-profiling evidence." -Do not claim real hardware PMU evidence until issue #90 is completed on a bare-metal or -PMU-capable Linux target. Current M29 artifacts are constrained-environment validation only. +Real hardware PMU evidence now exists on a bare-metal Apple M2 (aarch64) Fedora Asahi host — +`cycles`/`instructions`/`branches`/`branch-misses` are genuine counters. Do not claim *full* PMU +evidence (the Apple Silicon PMU does not expose `cache-references`/`cache-misses`), and do not call +the current artifacts "constrained Docker validation" either: they are **partial hardware PMU +evidence**. Issue #90's residual is the cache-counter set, which needs a PMU microarchitecture that +exposes it (x86_64, or an ARM server core). 
The incorrect claims remain forbidden: @@ -1185,18 +1194,20 @@ M29 currently means: - Metadata-rich profiling artifacts exist. - Dirty-tree handling exists. - PMU preflight/validation exists. -- Constrained-environment validation exists. +- Bare-metal partial-PMU validation exists. - CI validation exists. - The workflow is reproducible. -M29 does **not** currently mean real hardware PMU evidence has been captured. The committed -artifacts were generated in a constrained Docker Desktop Linux environment where hardware PMU -counters and sampling were unavailable or permission-limited. Do not claim real PMU evidence at -this time. +M29 now means **partial hardware PMU evidence**: the committed artifacts were regenerated on a +bare-metal Apple M2 (aarch64) Fedora Asahi host (`systemd-detect-virt` reports `none`), where +`perf stat` reads genuine `cycles`/`instructions`/`branches`/`branch-misses` counters off the Apple +Avalanche P-core PMU. They are no longer "constrained Docker validation" — but they are not *full* +PMU evidence either, because the Apple Silicon PMU does not expose `cache-references`/`cache-misses`. -Issue #90 tracks full PMU-backed evidence generation on a bare-metal Linux host or a Linux VM/server -with real `perf_event` hardware counter access. Treat this as: problem identified -> limitation -documented -> follow-up issue created -> acceptance criteria defined. This is intentional +Issue #90 now tracks only that residual: a *full* counter set (including cache events) requires a +PMU microarchitecture that exposes those events to Linux (e.g. an x86_64 Intel/AMD host, or an ARM +server core such as Graviton/Ampere) — not "more bare metal." Treat this as: problem identified -> +limitation documented -> follow-up issue created -> acceptance criteria defined. This is intentional engineering transparency, not a repo deficiency. 
## Dynamic-analysis limits @@ -1241,8 +1252,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency networking study. -Issue #90 remains the full hardware-PMU evidence debt. Issues #99 and #110 were addressed by PR -#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical -review request and remains one of the highest remaining credibility signals. PR #124 completed M49; -the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on -Fedora Asahi without adding new networking claims. +Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues +#99 and #110 were addressed by PR #111. Issues #95, #28, and #26 were addressed by PR #112. Issue +#94 is the external technical review request and remains one of the highest remaining credibility +signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal +Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There +is no active milestone; the highest-value remaining work is non-code and gated on #94 (external +review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/CLAUDE.md b/CLAUDE.md index 13aad01..5c52266 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1196,8 +1196,10 @@ M45 exchange-grade persistence prototype; M46 recovery benchmarking; M47 contigu storage and cache-locality study; M48 DPDK research/prototype; M49 NIC offload and low-latency networking study. -Issue #90 remains the full hardware-PMU evidence debt. Issues #99 and #110 were addressed by PR -#111. Issues #95, #28, and #26 were addressed by PR #112. Issue #94 is the external technical -review request and remains one of the highest remaining credibility signals. 
PR #124 completed M49; -the current follow-up branch `perf/linux-host-artifact-refresh` refreshes Linux host artifacts on -Fedora Asahi without adding new networking claims. +Issue #90 remains the full hardware-PMU evidence debt (the cache-counter set specifically). Issues +#99 and #110 were addressed by PR #111. Issues #95, #28, and #26 were addressed by PR #112. Issue +#94 is the external technical review request and remains one of the highest remaining credibility +signals. PR #124 completed M49, PR #125 (d9094df) refreshed the Linux host artifacts on bare-metal +Fedora Asahi, and `v0.2.0` was released (PR #127 ded6e80; resume-anchor sync PR #128 ae93545). There +is no active milestone; the highest-value remaining work is non-code and gated on #94 (external +review) and #90 (full cache-PMU evidence on a PMU-capable microarchitecture). diff --git a/PROGRESS.md b/PROGRESS.md index 5f06551..070fece 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -350,6 +350,18 @@ Lower priority: supersedes CodeRabbit's PR #126, whose generated tests covered only trimming and were based on `d8c16b2` (where `qsl_publish_artifact` does not yet exist, so #126 could not merge before #125); #126 was closed as superseded. Do not merge from automation; the human squash-merges PR #125. +- [2026-06-21] Codex-followup resume-anchor sync (`docs/codex-resume-anchor-sync`). 
Resolved the + Codex review findings left on `main` by PRs #127 and #128: (1) removed PROGRESS.md's stale + "Next action remains" block that still pointed `/resume` at the merged PR #125 on + `perf/linux-host-artifact-refresh`, replacing it with the v0.2.0 between-releases state (#94/#90 + gated); (2) brought AGENTS.md into sync with CLAUDE.md's v0.2.0 partial-PMU reframe — the + constraints bullet, the "correct claim" block, and the "M29 perf evidence status" subsection no + longer call the artifacts "constrained Docker validation," and the stale + `perf/linux-host-artifact-refresh` follow-up line was updated in both AGENTS.md and CLAUDE.md to + the released state; (3) narrowed docs/perf_analysis.md so it no longer implies the Apple Blizzard + (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in + `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores. + Docs/memory only; no code or artifacts changed. - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. 
Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. @@ -756,14 +768,17 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator ## Next action remains -Current action is the Linux host artifact refresh PR #125 on `perf/linux-host-artifact-refresh`: -wait for human review / CI and do not merge from automation. M49 (PR #124) is already merged to -`main` as d8c16b2. The refreshed artifacts are host-specific Linux evidence — partial Apple PMU -counters (cycles/instructions/branches/branch-misses) with cache-reference/cache-miss counters -unsupported — not NIC-offload, latency, or full hardware-PMU evidence. 
- -Issue #90 remains the evidence debt for full Linux hardware PMU artifacts (cache counters). Work it -only on a PMU-capable Linux host; do not relabel constrained or partial artifacts as full evidence. +There is no active milestone. `v0.2.0` is released (PR #127 ded6e80, tag on ded6e80, marked Latest; +resume-anchor sync PR #128 ae93545). M0–M49, the Linux host artifact refresh (PR #125, d9094df), and +the v0.2.0 release are all merged to `main`. The committed perf artifacts are **partial hardware PMU +evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real +cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by +the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence. + +Highest-value remaining work is non-code and gated: issue #94 (independent external review) and +issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache +counters to Linux (x86_64, or an ARM server core); do not relabel partial artifacts as full +evidence. Do not work either from automation; the human drives them. After each squash merge, return to this file and update state factually. If benchmark numbers are not measured, write `not measured`. Do not guess. Nobody is impressed by imaginary throughput. diff --git a/docs/perf_analysis.md b/docs/perf_analysis.md index 8cb1ccc..3ab3881 100644 --- a/docs/perf_analysis.md +++ b/docs/perf_analysis.md @@ -11,10 +11,14 @@ CI validation, and a reproducible command path. The committed artifacts are now generated on a **bare-metal Linux host** — an Apple MacBook Air (M2, aarch64) running Fedora Asahi Remix, directly on the hardware (`systemd-detect-virt` reports -`none`, no `hypervisor` CPU flag). `perf stat` reads **real hardware counters** off the Apple -Avalanche (P-core) and Blizzard (E-core) PMUs: `cycles`, `instructions`, `branches`, and -`branch-misses` are live. 
The artifact is therefore classified **partial hardware PMU evidence**, -not constrained-environment validation: the counters that are present are real, not emulated. +`none`, no `hypervisor` CPU flag). On this heterogeneous SoC `perf` opens each event against both +PMU instances — the Apple Avalanche (P-core) and Blizzard (E-core) PMUs — but the single-threaded +benchmark is scheduled on the performance cores, so **the Avalanche counters carry the real +counts**: `cycles`, `instructions`, `branches`, and `branch-misses` are live there. The +corresponding `apple_blizzard_pmu/...` rows read `<not counted>` in `results/perf_stat_linux.txt` +because the workload never ran on the E-cores — that is expected scheduling behavior, not a missing +counter. The artifact is therefore classified **partial hardware PMU evidence**, not +constrained-environment validation: the counters that are present are real, not emulated. The residual gap is specific and is what issue #90 now tracks: the Apple Silicon PMU, as exposed by the current Asahi kernel driver, does **not** implement the generic `cache-references` / From 0c3b401ff4bccc3c9565d6689db4caed1cb47b53 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 02:34:36 -0400 Subject: [PATCH 02/22] perf: add flamegraph generator and make target (#32) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue #32. The perf stat/record text workflow already existed; this renders a perf call-graph flamegraph. - scripts/flamegraph.sh: records `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench and writes results/flamegraph.svg plus a results/flamegraph.txt provenance/classification companion (top folded stacks). Mirrors perf_record.sh: Linux-only, reuses qsl_common.sh provenance + qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for constrained hosts. 
DWARF call graphs unwind correctly despite the Release `bench` preset omitting frame pointers. - scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse + SVG renderer, so the artifact is reproducible from the repo without vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by name, colors a pure function of the name, no RNG/timestamps in the drawn body. - tests/shell/test_flamegraph.sh: CTest-registered (python3-only, skips cleanly if absent) — folding (offset/dso stripping, perf-order reversal, comm-at-base, count aggregation, sortedness), SVG well-formedness, XML escaping, determinism, empty-input handling. - docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md, AGENTS.md), MILESTONES.md backlog, PROGRESS.md log. `make check` 242/242. Full hardware cache-PMU evidence stays in #90. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 1 + CLAUDE.md | 1 + MILESTONES.md | 4 +- Makefile | 9 +- PROGRESS.md | 16 ++ docs/perf_analysis.md | 34 +++- results/README.md | 6 + scripts/flamegraph.py | 306 +++++++++++++++++++++++++++++++++ scripts/flamegraph.sh | 238 +++++++++++++++++++++++++ tests/CMakeLists.txt | 7 + tests/shell/test_flamegraph.sh | 137 +++++++++++++++ 11 files changed, 755 insertions(+), 4 deletions(-) create mode 100755 scripts/flamegraph.py create mode 100755 scripts/flamegraph.sh create mode 100644 tests/shell/test_flamegraph.sh diff --git a/AGENTS.md b/AGENTS.md index 3b303db..63cae6e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -383,6 +383,7 @@ Keep this synchronized with the Makefile. 
- `make bench-recovery` — run M46 recovery benchmarking (full-replay restart vs book rebuild) - `make perf-stat` — run Linux `perf stat` workflow where supported - `make perf-record` — run Linux `perf record/report` workflow where supported +- `make flamegraph` — render a Linux `perf` call-graph flamegraph (SVG) where supported - `make numa-study` — run Linux CPU-affinity / scheduler-migration / NUMA-locality study where supported - `make false-sharing-study` — run benchmark-only packed-vs-padded SPSC cursor contention study - `make profile-io` — run Linux syscall/socket-path profiling where supported diff --git a/CLAUDE.md b/CLAUDE.md index 5c52266..ef85ff0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -383,6 +383,7 @@ Keep this synchronized with the Makefile. - `make bench-recovery` — run M46 recovery benchmarking (full-replay restart vs book rebuild) - `make perf-stat` — run Linux `perf stat` workflow where supported - `make perf-record` — run Linux `perf record/report` workflow where supported +- `make flamegraph` — render a Linux `perf` call-graph flamegraph (SVG) where supported - `make numa-study` — run Linux CPU-affinity / scheduler-migration / NUMA-locality study where supported - `make false-sharing-study` — run benchmark-only packed-vs-padded SPSC cursor contention study - `make profile-io` — run Linux syscall/socket-path profiling where supported diff --git a/MILESTONES.md b/MILESTONES.md index 01d240c..c32aec8 100644 --- a/MILESTONES.md +++ b/MILESTONES.md @@ -484,7 +484,9 @@ Do not pull backlog items into earlier PRs. - FIX-like text protocol adapter. (#29) - Web dashboard for visualization. (#30) - Docker packaging. (#31) -- Perf/flamegraph docs. (#32) +- Perf/flamegraph docs. (#32) — **done**: `make flamegraph` renders a perf call-graph flamegraph + via the dependency-free `scripts/flamegraph.py` (`results/flamegraph.svg` + `.txt`), unit-tested in + `tests/shell/test_flamegraph.sh`. Full hardware cache-PMU evidence stays in #90. 
- GitHub Pages documentation site. (#33) ### Differential-testing follow-ups (prioritized) diff --git a/Makefile b/Makefile index 426e0bb..8c2e932 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -.PHONY: configure build test check fmt fmt-check tidy bench bench-diff bench-allocator bench-storage bench-recovery perf-stat perf-record numa-study false-sharing-study profile-io socket-stress socket-load dpdk-check nic-offload-check crash-recovery concurrency-stress asan tsan demo check-fixtures check-manifest determinism divergence-demo clean +.PHONY: configure build test check fmt fmt-check tidy bench bench-diff bench-allocator bench-storage bench-recovery perf-stat perf-record flamegraph numa-study false-sharing-study profile-io socket-stress socket-load dpdk-check nic-offload-check crash-recovery concurrency-stress asan tsan demo check-fixtures check-manifest determinism divergence-demo clean BUILD_DIR := build/dev @@ -63,6 +63,13 @@ perf-record: cmake --build --preset bench --target qsl-bench QSL_BENCH_BIN=build/bench/qsl-bench bash scripts/perf_record.sh +# Issue #32: render a perf call-graph flamegraph (SVG) from the benchmark harness. Linux-only. +flamegraph: + @test "$$(uname -s)" = "Linux" || { echo "error: make flamegraph requires Linux perf; current OS is $$(uname -s)." >&2; exit 2; } + cmake --preset bench + cmake --build --preset bench --target qsl-bench + QSL_BENCH_BIN=build/bench/qsl-bench bash scripts/flamegraph.sh + # M43: CPU-affinity / scheduler-migration / NUMA locality study. Linux-only. numa-study: @if test "$$(uname -s)" != "Linux"; then \ diff --git a/PROGRESS.md b/PROGRESS.md index 070fece..7293498 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -362,6 +362,22 @@ Lower priority: (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `<not counted>` in `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores. Docs/memory only; no code or artifacts changed. 
+- [2026-06-21] Issue #32 flamegraph profiling artifact (`perf/flamegraph-artifact`, stacked on the + Codex-followup branch). Added `make flamegraph` → `scripts/flamegraph.sh`, which records + `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on `qsl-bench` and renders + `results/flamegraph.svg` (+ `results/flamegraph.txt` provenance/classification companion). The + fold + SVG render live in `scripts/flamegraph.py`, a dependency-free stdlib-only stackcollapse + + flamegraph renderer (no vendored Perl FlameGraph toolkit), deterministic by design (frames sorted + by name; colors a pure function of the name; no RNG/timestamps in the drawn body). DWARF call + graphs are used because the Release `bench` preset omits frame pointers; application symbols + (`OrderBook::add_limit`, `MatchingEngine::new_limit`, the replay path, …) still resolve from the + symtab. Added `tests/shell/test_flamegraph.sh` (CTest-registered, python3-only, skips cleanly if + absent) covering folding (offset/dso stripping, perf-order reversal, comm-at-base, count + aggregation, sortedness), SVG well-formedness, XML escaping, determinism, and empty-input + handling; `make check` 242/242. The committed `results/flamegraph.svg`/`.txt` were generated on + the bare-metal Fedora Asahi host (aarch64) from the clean committed tree (`Dirty inputs: no`). + This is a software cpu-clock sampling hot-symbol profile, not a latency/throughput claim; full + hardware cache-PMU evidence stays in #90. Do not merge from automation; human squash-merges. - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. 
A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. 
diff --git a/docs/perf_analysis.md b/docs/perf_analysis.md index 3ab3881..7400f02 100644 --- a/docs/perf_analysis.md +++ b/docs/perf_analysis.md @@ -55,6 +55,30 @@ default is intentional: many CI, VM, and container environments do not expose ha to unprivileged processes, and the benchmark harness is short enough that a lower frequency can miss the minimum sample count needed for meaningful hot-symbol ordering. +Render a flamegraph (issue #32): + +```bash +make flamegraph +``` + +This runs `scripts/flamegraph.sh`, which records call-graph samples +(`perf record --call-graph dwarf -F 4000 -g -e cpu-clock`), folds them, and renders an SVG to +`results/flamegraph.svg` plus a text companion `results/flamegraph.txt` (provenance, classification, +and the top folded stacks). DWARF call graphs are used so stacks unwind correctly even though the +`bench` (Release) preset omits frame pointers — the application symbols (`OrderBook::add_limit`, +`MatchingEngine::new_limit`, the replay path, …) resolve from the symbol table without changing the +optimization level under measurement. + +The folding and SVG rendering live in `scripts/flamegraph.py`, a dependency-free Python script +(standard library only) that reimplements the `stackcollapse` + flamegraph data model rather than +vendoring Brendan Gregg's Perl toolkit, so the artifact is reproducible from this repository alone. +The renderer is deterministic — frames are sorted by name and colors are a pure function of the +frame name (no RNG, no timestamps in the drawn body) — and is unit-tested in +`tests/shell/test_flamegraph.sh` (registered with CTest, runs under `make check`). Frame width is +proportional to on-CPU samples; this is a software cpu-clock sampling profile for **hot-symbol +investigation**, not a latency or throughput measurement. Set `QSL_FLAMEGRAPH_EVENT=cycles` to +sample the hardware PMU cycles event instead, where the host exposes it. 
+ ## Required Environment Both scripts are Linux-only and fail before running on non-Linux hosts. `perf stat` also fails @@ -113,8 +137,14 @@ counters, permission-limited sampling, or a sample report that is explicitly mar - `results/perf_report_linux.txt` records benchmark output, `perf record` stderr, and `perf report --stdio` output. It is useful as a hot-symbol profile only when `No samples: no`, `Insufficient samples: no`, and `Sample count` is at least `Minimum samples for hot profile`. -- `build/perf/qsl-bench.perf.data` is generated by `make perf-record` and is intentionally not - committed; it is host-specific binary profiler data. +- `results/flamegraph.svg` is the rendered flamegraph from `make flamegraph`; `results/flamegraph.txt` + is its provenance/classification companion (and lists the top folded stacks). Treat frame widths as + a hot-symbol guide only when the `.txt` reports a `flamegraph (...)` `Artifact:` and a `Sample + count` at least `Minimum samples for hot profile`; a `constrained-environment validation` label + means sampling did not capture enough stacks to trust. +- `build/perf/qsl-bench.perf.data` and `build/perf/qsl-bench.flame.data` are generated by + `make perf-record` / `make flamegraph` and are intentionally not committed; they are host-specific + binary profiler data. Each artifact includes hardware, kernel, compiler, perf version, build type, dataset, command, event set, and source-digest provenance. The `Source digest` is the authoritative source identity; diff --git a/results/README.md b/results/README.md index 49bd9d2..0f8b7aa 100644 --- a/results/README.md +++ b/results/README.md @@ -23,6 +23,12 @@ Benchmark results produced by `make bench` and scripts under `scripts/`. - `perf_report_linux.txt` — Linux `perf record/report` hot-symbol output for the benchmark harness (`make perf-record`). 
It is useful as a hot-symbol profile only when the file says `No samples: no`, `Insufficient samples: no`, and the sample count meets the reported minimum. +- `flamegraph.svg` / `flamegraph.txt` — Linux `perf` call-graph flamegraph (`make flamegraph`, + issue #32) rendered by the dependency-free `scripts/flamegraph.py`. The `.svg` is the visual + (frame width ∝ on-CPU samples) with provenance in a leading XML comment; the `.txt` carries + provenance, the `Artifact:` classification, and the top folded stacks. It is a software cpu-clock + sampling profile for hot-symbol investigation, not a latency/throughput claim — trust frame widths + only when the `.txt` reports a `flamegraph (...)` artifact with enough samples. - `numa_affinity_study.txt` — Linux CPU-affinity / scheduler-migration / NUMA-locality study output (`make numa-study`). It must self-classify as `full-linux-numa`, `linux-constrained`, or `unsupported-host`; only `full-linux-numa` is full NUMA evidence. diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py new file mode 100755 index 0000000..966d0c7 --- /dev/null +++ b/scripts/flamegraph.py @@ -0,0 +1,306 @@ +#!/usr/bin/env python3 +"""Self-contained flamegraph generator for QSL perf profiles. + +Reads `perf script` output on stdin, folds it into collapsed stacks +(stackcollapse), and renders a deterministic SVG flamegraph on stdout. + +This is intentionally dependency-free (Python standard library only) so the +profiling artifact is reproducible from the repository alone, without vendoring +Brendan Gregg's Perl FlameGraph toolkit. The data model is identical: a +"collapsed stack" is `root;...;leafcount`, and the flamegraph is a +proportional, sorted, recursive layout of those stacks. 
+ +Modes: + flamegraph.py perf script (stdin) -> SVG (stdout) + flamegraph.py --collapse-only perf script (stdin) -> collapsed stacks (stdout) + flamegraph.py --from-collapsed collapsed stacks (stdin) -> SVG (stdout) + +The rendering is deterministic: frames are sorted by name, and colors are a pure +function of the frame name (no RNG, no timestamps in the drawn body). The driver +script (scripts/flamegraph.sh) records run provenance separately so the SVG stays +reproducible for a given input. +""" + +from __future__ import annotations + +import argparse +import html +import re +import sys +import zlib + +# perf-script stack frame line: leading whitespace, hex address, symbol, "(dso)". +# C++ symbols contain spaces and parentheses, so the dso is taken as the final +# parenthesized group and the symbol is everything between the address and it. +_FRAME_RE = re.compile(r"^\s+(?P<addr>[0-9a-fA-F]+)\s+(?P<rest>.*\S)\s*$") +_OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$") + + +def _clean_symbol(rest: str) -> str: + """Turn a perf-script frame body into a folded frame name. + + Drops the trailing `(dso)` and the `+0xoffset`, matching stackcollapse-perf. + """ + # Strip the final "(...)" dso group if present (balanced at end of line). 
+    if rest.endswith(")"):
+        depth = 0
+        for i in range(len(rest) - 1, -1, -1):
+            if rest[i] == ")":
+                depth += 1
+            elif rest[i] == "(":
+                depth -= 1
+                if depth == 0:
+                    rest = rest[:i].rstrip()
+                    break
+    rest = _OFFSET_RE.sub("", rest).strip()
+    if not rest:
+        return "[unknown]"
+    return rest
+
+
+def fold_perf_script(lines) -> dict[str, int]:
+    """Collapse `perf script` output into {stack_string: sample_count}."""
+    folded: dict[str, int] = {}
+    comm = ""
+    stack: list[str] = []
+
+    def flush() -> None:
+        nonlocal stack, comm
+        if stack:
+            frames = list(reversed(stack))
+            if comm:
+                frames.insert(0, comm)
+            key = ";".join(frames)
+            folded[key] = folded.get(key, 0) + 1
+        stack = []
+
+    for raw in lines:
+        line = raw.rstrip("\n")
+        if not line.strip():
+            flush()
+            comm = ""
+            continue
+        if line[0].isspace():
+            m = _FRAME_RE.match(line)
+            if m:
+                stack.append(_clean_symbol(m.group("rest")))
+            continue
+        # Header line: "comm pid timestamp: period event:" -> capture comm.
+        flush()
+        comm = line.split()[0]
+    flush()
+    return folded
+
+
+def parse_collapsed(lines) -> dict[str, int]:
+    """Parse pre-collapsed `stack count` lines."""
+    folded: dict[str, int] = {}
+    for raw in lines:
+        line = raw.rstrip("\n")
+        if not line.strip():
+            continue
+        stack, _, count = line.rpartition(" ")
+        if not stack:
+            stack, _, count = line.rpartition("\t")
+        try:
+            n = int(count)
+        except ValueError:
+            continue
+        folded[stack] = folded.get(stack, 0) + n
+    return folded
+
+
+class _Node:
+    __slots__ = ("name", "value", "children")
+
+    def __init__(self, name: str) -> None:
+        self.name = name
+        self.value = 0
+        self.children: dict[str, _Node] = {}
+
+
+def build_tree(folded: dict[str, int], root_name: str) -> _Node:
+    root = _Node(root_name)
+    for stack, count in folded.items():
+        root.value += count
+        node = root
+        for frame in stack.split(";"):
+            if not frame:
+                continue
+            child = node.children.get(frame)
+            if child is None:
+                child = _Node(frame)
+                node.children[frame] = child
+            child.value += count
+            node = child
+    return root
+
+
+def _color(name: str) -> str:
+    """Deterministic warm 'hot' palette derived purely from the frame name."""
+    h = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF
+    r = 205 + (h % 51)
+    g = (h >> 8) % 231
+    b = (h >> 16) % 56
+    return f"rgb({r},{g},{b})"
+
+
+def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None:
+    out.append((node, depth, x))
+    cursor = x
+    for name in sorted(node.children):
+        child = node.children[name]
+        _layout(child, depth + 1, cursor, total, out)
+        cursor += child.value
+
+
+def render_svg(
+    root: _Node,
+    *,
+    title: str,
+    subtitle: str,
+    width: int = 1200,
+    frame_height: int = 16,
+    min_px: float = 0.1,
+    countname: str = "samples",
+) -> str:
+    total = root.value or 1
+    placed: list = []
+    _layout(root, 0, 0, total, placed)
+    max_depth = max((d for _, d, _ in placed), default=0)
+
+    pad_top = 54
+    pad_bottom = 16
+    side = 10
+    plot_width = width - 2 * side
+    height = pad_top + (max_depth + 1) * frame_height + pad_bottom
+
+    def px(samples: int) -> float:
+        return samples / total * plot_width
+
+    parts: list[str] = []
+    parts.append(
+        f'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
+        f'<svg version="1.1" width="{width}" height="{height}" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 {width} {height}" font-family="Verdana,Helvetica,sans-serif" font-size="12">'
+    )
+    parts.append(
+        '<style>.frame:hover{stroke:#000;stroke-width:0.5} .hl{stroke:#000;stroke-width:1}</style>'
+    )
+    parts.append(_SEARCH_JS)
+    parts.append(f'<rect width="{width}" height="{height}" fill="#f8f8f8"/>')
+    parts.append(
+        f'<text x="{width // 2}" y="24" text-anchor="middle" font-size="17" font-weight="bold">{html.escape(title)}</text>'
+    )
+    parts.append(
+        f'<text x="{width // 2}" y="40" text-anchor="middle" fill="#555">'
+        f'{html.escape(subtitle)}</text>'
+    )
+    parts.append(
+        f'<text id="qsl-search" x="{width - side}" y="24" text-anchor="end" fill="#990000" onclick="qslSearch()" style="cursor:pointer">Search</text>'
+    )
+    parts.append(
+        f'<text id="qsl-detail" x="{side}" y="{height - 4}" fill="#333"> </text>'
+    )
+
+    for node, depth, x in placed:
+        w = px(node.value)
+        if w < min_px:
+            continue
+        x_px = side + px(x)
+        y = pad_top + (max_depth - depth) * frame_height
+        pct = node.value / total * 100.0
+        label = node.name
+        # Approx 7px per char at this font; reserve 6px padding.
+        maxchars = int((w - 6) / 7)
+        text = ""
+        if maxchars >= 3:
+            text = label if len(label) <= maxchars else label[: maxchars - 2] + ".."
+        tip = f"{label} ({node.value} {countname}, {pct:.2f}%)"
+        parts.append(f'<g class="func" data-name="{html.escape(label)}">')
+        parts.append(f"<title>{html.escape(tip)}</title>")
+        parts.append(
+            f'<rect x="{x_px:.1f}" y="{y}" width="{w:.1f}" height="{frame_height - 1}" fill="{_color(label)}" class="frame"/>'
+        )
+        if text:
+            parts.append(
+                f'<text x="{x_px + 3:.1f}" y="{y + frame_height - 4}">{html.escape(text)}</text>'
+            )
+        parts.append("</g>")
+
+    parts.append("</svg>\n")
+    return "".join(parts)
+
+
+# Minimal, self-contained search affordance (highlight matches, report % of
+# matched samples). No external assets; deterministic; no zoom to keep the
+# artifact robust across renderers.
+_SEARCH_JS = (
+    '<script type="text/ecmascript"><![CDATA[\n'
+    "function qslSearch(){\n"
+    " var term=prompt('Search frame (regex):','');\n"
+    " var detail=document.getElementById('qsl-detail');\n"
+    " var gs=document.getElementsByClassName('func');\n"
+    " var i;\n"
+    " if(!term){for(i=0;i<gs.length;i++){gs[i].getElementsByTagName('rect')[0].classList.remove('hl');}if(detail)detail.textContent=' ';return;}\n"
+    " var re;try{re=new RegExp(term);}catch(e){return;}\n"
+    " for(i=0;i<gs.length;i++){var r=gs[i].getElementsByTagName('rect')[0];\n"
+    "  if(re.test(gs[i].getAttribute('data-name'))){r.classList.add('hl');}\n"
+    "  else{r.classList.remove('hl');}}\n"
+    " if(detail)detail.textContent='Search: '+term;\n"
+    "}\n"
+    "]]></script>"
+)
+
+
+def main(argv=None) -> int:
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("--collapse-only", action="store_true",
+                    help="emit collapsed stacks instead of SVG")
+    ap.add_argument("--from-collapsed", action="store_true",
+                    help="read collapsed stacks instead of perf script output")
+    ap.add_argument("--title", default="QSL Flame Graph")
+    ap.add_argument("--subtitle", default="")
+    ap.add_argument("--countname", default="samples")
+    ap.add_argument("--root-name", default="all")
+    ap.add_argument("--width", type=int, default=1200)
+    args = ap.parse_args(argv)
+
+    if args.from_collapsed:
+        folded = parse_collapsed(sys.stdin)
+    else:
+        folded = fold_perf_script(sys.stdin)
+
+    if args.collapse_only:
+        for stack in sorted(folded):
+            sys.stdout.write(f"{stack} {folded[stack]}\n")
+        return 0
+
+    if not folded:
+        sys.stderr.write("flamegraph.py: no stacks parsed from input\n")
+        return 1
+
+    root = build_tree(folded, args.root_name)
+    sys.stdout.write(
+        render_svg(
+            root,
+            title=args.title,
+            subtitle=args.subtitle,
+            width=args.width,
+            countname=args.countname,
+        )
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/scripts/flamegraph.sh b/scripts/flamegraph.sh
new file mode 100755
index 0000000..7324f64
--- /dev/null
+++ b/scripts/flamegraph.sh
@@ -0,0 +1,238 @@
+#!/usr/bin/env bash
+# Generate a Linux perf flamegraph from the benchmark harness.
+# +# Records call-graph samples with `perf record --call-graph dwarf`, folds them +# with scripts/flamegraph.py (a dependency-free stackcollapse + SVG renderer), +# and writes: +# results/flamegraph.svg -- the visual flamegraph (provenance embedded as a +# leading XML comment + a visible subtitle) +# results/flamegraph.txt -- provenance + classification + top folded stacks +# +# Defaults to software cpu-clock sampling so the artifact stays a portable +# hot-symbol *investigation* aid, not a latency/throughput claim. This is the +# missing-flamegraph follow-up tracked by issue #32 (the perf stat/record text +# workflow already exists; full hardware-PMU cache evidence stays in #90). +set -euo pipefail + +cd "$(dirname "$0")/.." +# shellcheck source=scripts/qsl_common.sh +source scripts/qsl_common.sh + +BIN="${QSL_BENCH_BIN:-build/bench/qsl-bench}" +OUT_SVG="${QSL_FLAMEGRAPH_SVG:-results/flamegraph.svg}" +OUT_TXT="${QSL_FLAMEGRAPH_TXT:-results/flamegraph.txt}" +DATA="${QSL_FLAMEGRAPH_DATA:-build/perf/qsl-bench.flame.data}" +EVENT="${QSL_FLAMEGRAPH_EVENT:-cpu-clock}" +FREQ="${QSL_FLAMEGRAPH_FREQ:-4000}" +CALLGRAPH="${QSL_FLAMEGRAPH_CALLGRAPH:-dwarf}" +MIN_SAMPLES="${QSL_FLAMEGRAPH_MIN_SAMPLES:-200}" +TOP_STACKS="${QSL_FLAMEGRAPH_TOP_STACKS:-15}" +BUILD_DIR="$(dirname "$BIN")" +PROVENANCE_SCOPE="flamegraph-benchmark" +PROVENANCE_INPUTS=( + Makefile + CMakeLists.txt + CMakePresets.json + cmake + include + src + apps/qsl-bench + benchmarks + scripts/flamegraph.sh + scripts/flamegraph.py + scripts/qsl_common.sh +) + +perf_version_line() { + perf --version 2>&1 | head -1 || true +} + +parse_sample_count_token() { + awk -v raw="$1" ' + BEGIN { + gsub(/,/, "", raw) + suffix = substr(raw, length(raw), 1) + mult = 1 + if (suffix == "K" || suffix == "k") { mult = 1000; raw = substr(raw, 1, length(raw) - 1) } + else if (suffix == "M" || suffix == "m") { mult = 1000000; raw = substr(raw, 1, length(raw) - 1) } + if (raw ~ /^[0-9]+([.][0-9]+)?$/) printf "%d\n", raw * mult + }' +} + 
+qsl_require_linux "scripts/flamegraph.sh" "perf" + +if ! command -v perf >/dev/null 2>&1; then + echo "error: perf not found. Install linux perf tooling for this kernel." >&2 + exit 2 +fi +if ! command -v python3 >/dev/null 2>&1; then + echo "error: python3 is required to render the flamegraph." >&2 + exit 2 +fi +if [[ ! -x "$BIN" ]]; then + echo "error: $BIN not found; build the benchmark preset first (make flamegraph)." >&2 + exit 1 +fi + +mkdir -p "$(dirname "$OUT_SVG")" "$(dirname "$DATA")" + +BENCH_OUT="$(mktemp)" +RECORD_BENCH_OUT="$(mktemp)" +RECORD_ERR="$(mktemp)" +SCRIPT_OUT="$(mktemp)" +SCRIPT_ERR="$(mktemp)" +FOLDED="$(mktemp)" +SVG_TMP="$(mktemp)" +TXT_TMP="$(mktemp)" +trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$SVG_TMP" "$TXT_TMP"' EXIT + +# Fail fast if the benchmark itself is broken (partial mode must not mask this). +BENCH_STATUS=0 +"$BIN" >"$BENCH_OUT" 2>&1 || BENCH_STATUS=$? +if [[ "$BENCH_STATUS" -ne 0 ]]; then + echo "error: benchmark command failed before perf record (status $BENCH_STATUS); partial mode cannot override this." >&2 + cat "$BENCH_OUT" >&2 + exit 4 +fi + +RECORD_STATUS=0 +perf record --call-graph "$CALLGRAPH" -F "$FREQ" -g -e "$EVENT" -o "$DATA" -- "$BIN" \ + >"$RECORD_BENCH_OUT" 2>"$RECORD_ERR" || RECORD_STATUS=$? + +SCRIPT_STATUS=0 +if [[ "$RECORD_STATUS" -eq 0 ]]; then + perf script -i "$DATA" >"$SCRIPT_OUT" 2>"$SCRIPT_ERR" || SCRIPT_STATUS=$? +fi + +PERF_LIMITATION=no +if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ + "$RECORD_ERR" "$SCRIPT_ERR"; then + PERF_LIMITATION=yes +fi + +SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) 
samples\).*/\1/p' "$RECORD_ERR" | head -1)" +SAMPLE_COUNT="$(parse_sample_count_token "$SAMPLE_TOKEN")" +[[ -z "$SAMPLE_COUNT" ]] && SAMPLE_COUNT=0 + +# Fold to collapsed stacks for the text summary and as an SVG precondition. +STACK_COUNT=0 +if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then + python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true + STACK_COUNT="$(wc -l <"$FOLDED" | tr -d ' ')" +fi + +INSUFFICIENT_SAMPLES=no +if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$SAMPLE_COUNT" -lt "$MIN_SAMPLES" ]]; then + INSUFFICIENT_SAMPLES=yes +fi + +ARTIFACT_TYPE="flamegraph ($EVENT software sampling hot-symbol profile)" +if [[ "$EVENT" == "cycles" ]]; then + ARTIFACT_TYPE="flamegraph (cycles hardware-PMU sampling hot-symbol profile)" +fi +if [[ "$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0 || "$STACK_COUNT" -eq 0 ]]; then + ARTIFACT_TYPE="constrained-environment validation (partial; no clean sample report)" +elif [[ "$INSUFFICIENT_SAMPLES" == "yes" ]]; then + ARTIFACT_TYPE="constrained-environment validation (partial; insufficient samples for hot-symbol conclusions)" +fi + +PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")" +HOST="$(uname -s) $(uname -m)" +DATE="$(qsl_utc_timestamp)" +SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE" + +# Render the SVG (deterministic for a fixed folded input + fixed subtitle). +if [[ "$STACK_COUNT" -gt 0 ]]; then + { + echo '' + # Keep the delimiters on their own lines and squeeze any "--" + # out of the interior: a double hyphen is illegal inside an XML comment. + echo "" + # Drop the renderer's own XML declaration; we emitted ours above. 
+ python3 scripts/flamegraph.py \ + --title "QSL Matching-Engine Flame Graph (qsl-bench)" \ + --subtitle "$SUBTITLE" \ + --countname "$EVENT samples" \ + --from-collapsed <"$FOLDED" | tail -n +2 + } >"$SVG_TMP" + qsl_publish_artifact "$SVG_TMP" "$OUT_SVG" +fi + +# Text companion: provenance + classification + top folded stacks (human/queryable). +{ + echo "Command: make flamegraph" + echo "Artifact: $ARTIFACT_TYPE" + echo "Hardware: $(uname -m)" + echo "OS: $(uname -s) $(uname -r)" + echo "CPU: $(qsl_cpu_model)" + echo "Compiler: $(qsl_build_compiler_version "$BUILD_DIR")" + echo "Perf: $(perf_version_line)" + echo "Perf paranoid: $(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)" + echo "Build type: $(qsl_build_type "$BUILD_DIR")" + echo "$PROVENANCE" + echo "Benchmark binary: $BIN" + echo "Dataset: qsl-bench default synthetic benchmark suite" + echo "Call graph: $CALLGRAPH" + echo "Record event: $EVENT" + echo "Sample freq: $FREQ Hz" + echo "Sample count: $SAMPLE_COUNT" + echo "Folded stacks: $STACK_COUNT" + echo "Minimum samples for hot profile: $MIN_SAMPLES" + echo "Insufficient samples: $INSUFFICIENT_SAMPLES" + echo "Record status: $RECORD_STATUS" + echo "Script status: $SCRIPT_STATUS" + echo "Perf access limitation: $PERF_LIMITATION" + echo "Flamegraph SVG: $(qsl_repo_relative_or_empty "$OUT_SVG")" + echo "Perf data: $DATA (generated, not intended for commit)" + echo + if [[ "$ARTIFACT_TYPE" == flamegraph* ]]; then + echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol" + echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock" + echo "latency or throughput, and is hardware/kernel/compiler/build dependent." + else + echo "Caveat: constrained/partial perf validation, not a hot-symbol flamegraph. Treat" + echo "frame widths as unusable until sampling succeeds and Sample count meets the" + echo "Minimum samples for hot profile." 
+ fi + echo + echo "Top $TOP_STACKS folded stacks (count stack):" + if [[ -s "$FOLDED" ]]; then + # The final awk limits to $TOP_STACKS rows by reading all input (NR<=top) + # rather than `head`, so `sort` is never sent SIGPIPE under `pipefail`. + awk '{ n=$NF; $NF=""; sub(/[[:space:]]+$/,""); printf "%s\t%s\n", n, $0 }' "$FOLDED" | + sort -t"$(printf '\t')" -k1,1nr | + awk -F"$(printf '\t')" -v top="$TOP_STACKS" 'NR<=top { printf "%8d %s\n", $1, $2 }' + else + echo " (none)" + fi + echo + echo "Benchmark output:" + cat "$BENCH_OUT" +} >"$TXT_TMP" +qsl_publish_artifact "$TXT_TMP" "$OUT_TXT" +echo "wrote $OUT_TXT" +[[ "$STACK_COUNT" -gt 0 ]] && echo "wrote $OUT_SVG" + +if [[ ("$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0) && "$PERF_LIMITATION" != "yes" ]]; then + echo "error: perf record/script failed for a reason other than a perf access limitation." >&2 + exit 3 +fi +if [[ "$STACK_COUNT" -eq 0 || "$INSUFFICIENT_SAMPLES" == "yes" ]]; then + if [[ "${QSL_PERF_ALLOW_PARTIAL:-0}" != "1" ]]; then + echo "error: flamegraph did not capture enough samples for a clean profile." >&2 + echo " Re-run on Linux with perf sampling access, or set QSL_PERF_ALLOW_PARTIAL=1" >&2 + echo " only when intentionally documenting a constrained environment." >&2 + exit 3 + fi +fi diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 4e95e46..cb617a9 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -89,6 +89,13 @@ add_test( NAME qsl_common_publish_artifact COMMAND bash "${CMAKE_CURRENT_LIST_DIR}/shell/test_qsl_common.sh") +# Shell unit tests for the dependency-free flamegraph renderer (scripts/flamegraph.py: +# perf-script folding + deterministic SVG rendering) behind `make flamegraph` (#32). +# Portable: needs only python3 (skips cleanly if absent); does not require perf. 
+add_test( + NAME qsl_flamegraph_render + COMMAND bash "${CMAKE_CURRENT_LIST_DIR}/shell/test_flamegraph.sh") + if(EXISTS "/dev/full") add_test( NAME qsl_replay_generate_append_failure diff --git a/tests/shell/test_flamegraph.sh b/tests/shell/test_flamegraph.sh new file mode 100644 index 0000000..2ba305d --- /dev/null +++ b/tests/shell/test_flamegraph.sh @@ -0,0 +1,137 @@ +#!/usr/bin/env bash +# Unit tests for scripts/flamegraph.py — the dependency-free stackcollapse + SVG +# renderer behind `make flamegraph` (issue #32). +# +# The shell driver (scripts/flamegraph.sh) needs Linux `perf`, which CI does not +# have, so these tests exercise the deterministic, portable core instead: +# 1. `perf script` output folds into correct collapsed stacks (innermost-first +# perf order reversed to root-first, comm at the base, dso + "+0xoffset" +# stripped, C++ symbols with spaces/parens preserved). +# 2. identical stacks aggregate their counts. +# 3. collapsed output is sorted and deterministic. +# 4. the SVG render is well-formed, escapes XML metacharacters, contains the +# expected frames, and is byte-identical across runs (no RNG, no timestamps). +# 5. empty input is handled (exit 1 for SVG, empty for --collapse-only). +# +# Registered with CTest (see tests/CMakeLists.txt); runs under `make check`. +# Run directly: bash tests/shell/test_flamegraph.sh + +set -uo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)" +FG="$REPO_ROOT/scripts/flamegraph.py" + +if ! 
command -v python3 >/dev/null 2>&1; then + echo "SKIP: python3 not found; flamegraph renderer tests skipped" + exit 0 +fi + +PASS=0 +FAIL=0 + +expect_eq() { + local name="$1" expected="$2" actual="$3" + if [[ "$actual" == "$expected" ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n expected: %q\n actual: %q\n' "$name" "$expected" "$actual" + FAIL=$((FAIL + 1)) + fi +} + +expect_contains() { + local name="$1" needle="$2" haystack="$3" + if [[ "$haystack" == *"$needle"* ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n missing: %q\n' "$name" "$needle" + FAIL=$((FAIL + 1)) + fi +} + +expect_not_contains() { + local name="$1" needle="$2" haystack="$3" + if [[ "$haystack" != *"$needle"* ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n unexpected: %q\n' "$name" "$needle" + FAIL=$((FAIL + 1)) + fi +} + +# Build a synthetic `perf script` block. Frame lines must start with a TAB; the +# header line for each sample must start in column 0. +TAB=$'\t' +make_perf_script() { + printf '%s\n' \ + "qsl-bench 100 1.0: 1000 cpu-clock:u:" \ + "${TAB}415cd0 qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side)+0x310 (/path/qsl-bench)" \ + "${TAB}402887 main+0x127 (/path/qsl-bench)" \ + "" \ + "qsl-bench 100 2.0: 1000 cpu-clock:u:" \ + "${TAB}415cd0 qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side)+0x300 (/path/qsl-bench)" \ + "${TAB}402887 main+0x100 (/path/qsl-bench)" \ + "" \ + "qsl-bench 100 3.0: 1000 cpu-clock:u:" \ + "${TAB}aaaa cfree+0x5 (/usr/lib64/libc.so.6)" \ + "${TAB}402887 main+0x10 (/path/qsl-bench)" \ + "" +} + +# --- Folding (stackcollapse) ------------------------------------------------ + +FOLDED="$(make_perf_script | python3 "$FG" --collapse-only)" + +# Innermost-first perf order is reversed to root-first, comm prepended, dso and +# "+0xoffset" stripped. 
The two add_limit samples (different offsets) collapse to
+# one stack with count 2.
+expect_contains "add_limit stack folds with comm at base, offset+dso stripped, count 2" \
+  'qsl-bench;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side) 2' \
+  "$FOLDED"
+expect_contains "libc leaf folds to one sample" \
+  'qsl-bench;main;cfree 1' \
+  "$FOLDED"
+expect_not_contains "dso paths are stripped from frames" "/usr/lib64/libc.so.6" "$FOLDED"
+expect_not_contains "raw +0x offsets are stripped from frames" "+0x" "$FOLDED"
+
+# Collapsed output is sorted (deterministic) and stable across runs.
+FOLDED2="$(make_perf_script | python3 "$FG" --collapse-only)"
+expect_eq "collapse-only is deterministic" "$FOLDED" "$FOLDED2"
+SORTED="$(printf '%s\n' "$FOLDED" | LC_ALL=C sort)"
+expect_eq "collapse-only output is sorted" "$SORTED" "$FOLDED"
+
+# --- SVG rendering ----------------------------------------------------------
+
+SVG="$(make_perf_script | python3 "$FG" --title "T" --subtitle "S")"
+expect_contains "svg has XML declaration" '<?xml version="1.0"' "$SVG"
+expect_contains "svg carries the title" '>T' "$SVG"
+expect_contains "svg renders the add_limit frame" 'add_limit' "$SVG"
+expect_contains "svg renders rect frames" 'class="frame"' "$SVG"
+
+# Deterministic: byte-identical across two renders of the same input.
+SVG2="$(make_perf_script | python3 "$FG" --title "T" --subtitle "S")"
+expect_eq "svg render is deterministic" "$SVG" "$SVG2"
+
+# XML metacharacters in frame names are escaped, not emitted raw.
+ESC_SVG="$(printf 'bench;a<b>&c 3\n' | python3 "$FG" --from-collapsed)"
+expect_contains "frame names are XML-escaped" '&lt;b&gt;&amp;c' "$ESC_SVG"
+expect_not_contains "raw unescaped angle bracket is not emitted in a frame title" 'a<b>' "$ESC_SVG"
+
+# --- Empty input ------------------------------------------------------------
+
+EMPTY_COLLAPSE="$(printf '' | python3 "$FG" --collapse-only)"
+expect_eq "empty input yields empty collapse" "" "$EMPTY_COLLAPSE"
+
+printf '' | python3 "$FG" >/dev/null 2>&1
+rc=$?
+expect_eq "empty input fails SVG render with exit 1" "1" "$rc"
+
+# --- Summary ----------------------------------------------------------------
+
+printf '\nResults: %d passed, %d failed\n' "$PASS" "$FAIL"
+[[ "$FAIL" -eq 0 ]]
From beec2d0c115c9b53cfd07784335149b652de414d Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 02:37:31 -0400
Subject: [PATCH 03/22] perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph`
from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi
host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot
paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the
dispatch_storage cancel path, decode_new_order, the gateway Session path,
replay::generate_flow). Software cpu-clock sampling hot-symbol profile, not a
latency/throughput claim; full hardware cache-PMU evidence stays in #90.
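
A minimal sketch, assuming collapsed stacks in the `stack count` form that
`flamegraph.py --collapse-only` emits, of how a frame's inclusive share of
samples can be read off the folded output. The helper name `inclusive_share`
and the sample stacks are hypothetical; neither is part of the repository or
taken from results/flamegraph.txt.

```python
def inclusive_share(collapsed_lines, frame):
    """Fraction of all samples whose stack contains `frame` (hypothetical helper)."""
    total = 0
    hits = 0
    for line in collapsed_lines:
        # The count is the last space-separated token; C++ frame names may
        # themselves contain spaces, so split from the right.
        stack, _, count = line.rstrip("\n").rpartition(" ")
        if not stack:
            continue
        try:
            n = int(count)
        except ValueError:
            continue
        total += n
        # Membership test counts each sample once, even if the frame recurses.
        if frame in stack.split(";"):
            hits += n
    return hits / total if total else 0.0


# Synthetic stacks made up for illustration only.
sample = [
    "qsl-bench;main;add_limit 2",
    "qsl-bench;main;cancel 1",
    "qsl-bench;main;cancel;cfree 1",
]
print(f"{inclusive_share(sample, 'cancel'):.2%}")
```

As with the flamegraph itself, such a share describes on-CPU samples only and
says nothing about latency or throughput.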
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --- results/flamegraph.svg | 31 ++++++++++++++++++++++ results/flamegraph.txt | 58 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100644 results/flamegraph.svg create mode 100644 results/flamegraph.txt diff --git a/results/flamegraph.svg b/results/flamegraph.svg new file mode 100644 index 0000000..2281d5b --- /dev/null +++ b/results/flamegraph.svg @@ -0,0 +1,31 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!-- +QSL flamegraph provenance + Provenance version: 1 + Git commit (informational): 0c3b401 + Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242 + Source digest scope: flamegraph-benchmark + Dirty inputs: no + Generated output: results/flamegraph.svg + Date: 2026-06-21T06:36:51Z + Command: make flamegraph + Artifact: flamegraph (cpu-clock software sampling hot-symbol profile) + Record: perf record [call-graph dwarf | -F 4000 | -g | -e cpu-clock] + Samples: 397 | Folded stacks: 171 + Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is + not a latency or throughput measurement and is hardware/build dependent. 
+--> +<svg version="1.1" width="1200" height="310" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 1200 310" font-family="Verdana,Helvetica,sans-serif" font-size="12"><style>.frame:hover{stroke:#000;stroke-width:0.5} .hl{stroke:#000;stroke-width:1}</style><script type="text/ecmascript"><![CDATA[ +function qslSearch(){ + var term=prompt('Search frame (regex):',''); + var detail=document.getElementById('qsl-detail'); + var gs=document.getElementsByClassName('func'); + var i; + if(!term){for(i=0;i<gs.length;i++){gs[i].getElementsByTagName('rect')[0].classList.remove('hl');}if(detail)detail.textContent=' ';return;} + var re;try{re=new RegExp(term);}catch(e){return;} + for(i=0;i<gs.length;i++){var r=gs[i].getElementsByTagName('rect')[0]; + if(re.test(gs[i].getAttribute('data-name'))){r.classList.add('hl');} + else{r.classList.remove('hl');}} + if(detail)detail.textContent='Search: '+term; +} +]]></script><rect width="1200" height="310" fill="#f8f8f8"/><text x="600" y="24" text-anchor="middle" font-size="17" font-weight="bold">QSL Matching-Engine Flame Graph (qsl-bench)</text><text x="600" y="40" text-anchor="middle" fill="#555">flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 397 samples | 171 stacks | 2026-06-21T06:36:51Z</text><text id="qsl-search" x="1190" y="24" text-anchor="end" fill="#990000" onclick="qslSearch()" style="cursor:pointer">Search</text><text id="qsl-detail" x="10" y="306" fill="#333"> </text><g class="func" data-name="all"><title>all (397 cpu-clock samples, 100.00%)allqsl-bench (397 cpu-clock samples, 100.00%)qsl-bench[unknown] (300 cpu-clock samples, 75.57%)[unknown][unknown] (278 cpu-clock samples, 70.03%)[unknown][unknown] (221 cpu-clock samples, 55.67%)[unknown][unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 
0.76%)[unknown] (1 cpu-clock samples, 0.25%)check_match (1 cpu-clock samples, 0.25%)do_lookup_x (2 cpu-clock samples, 0.50%)__libc_start_call_main (218 cpu-clock samples, 54.91%)__libc_start_call_mainmain (218 cpu-clock samples, 54.91%)mainqsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (30 cpu-clock samples, 7.56%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (21 cpu-clock samples, 5.29%)qsl::e..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.27%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, 
std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (10 cpu-clock samples, 2.52%)s..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (5 cpu-clock samples, 1.26%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (2 cpu-clock samples, 0.50%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned 
long) (23 cpu-clock samples, 5.79%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (22 cpu-clock samples, 5.54%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.27%)qs..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.76%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (3 cpu-clock samples, 
0.76%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (46 cpu-clock samples, 11.59%)qsl::gateway::Se..qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.52%)q..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (3 cpu-clock samples, 0.76%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Fill const&) (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (28 cpu-clock samples, 7.05%)qsl::gate..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..operator new(unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, 
unsigned int, qsl::core::TimeInForce) (9 cpu-clock samples, 2.27%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (3 cpu-clock samples, 0.76%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (6 cpu-clock samples, 1.51%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (5 cpu-clock samples, 1.26%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (15 cpu-clock samples, 3.78%)qsl..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (4 cpu-clock samples, 
1.01%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (32 cpu-clock samples, 8.06%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (5 cpu-clock samples, 
1.26%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (4 cpu-clock samples, 1.01%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, 
unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.27%)qs..__memcpy_generic (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.51%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.76%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 
0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (3 cpu-clock samples, 0.76%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) 
const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (16 cpu-clock samples, 4.03%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::contains(unsigned long) const (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (32 cpu-clock samples, 
8.06%)qsl::repla..qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (30 cpu-clock samples, 7.56%)qsl::repl..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.76%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.50%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.50%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 2.02%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 
0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.01%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (4 cpu-clock samples, 1.01%)malloc@plt (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (2 cpu-clock samples, 0.50%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (11 cpu-clock samples, 2.77%)q..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (9 cpu-clock samples, 2.27%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)_mid_memalign (3 cpu-clock samples, 0.76%)__posix_memalign 
(2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.01%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (13 cpu-clock samples, 3.27%)qs..[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (9 cpu-clock samples, 2.27%)malloc (5 cpu-clock samples, 1.26%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)operator new(unsigned long) (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)posix_memalign@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (4 cpu-clock samples, 1.01%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.01%)std::__detail::_Map_base<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)cfree@GLIBC_2.17 (6 cpu-clock samples, 1.51%)main (16 cpu-clock samples, 4.03%)main[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..operator new(unsigned long) (11 cpu-clock samples, 2.77%)o..malloc (9 cpu-clock samples, 2.27%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (4 cpu-clock samples, 1.01%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc@plt (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned 
int, qsl::core::TimeInForce) (6 cpu-clock samples, 1.51%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (10 cpu-clock samples, 2.52%)[..[unknown] (10 cpu-clock samples, 2.52%)[..[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (4 cpu-clock samples, 1.01%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (17 cpu-clock samples, 4.28%)qsl:..free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)operator delete(void*, unsigned long, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 
cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (3 cpu-clock samples, 0.76%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.76%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (3 cpu-clock samples, 0.76%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, 
true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (6 cpu-clock samples, 1.51%)free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t)@plt (5 cpu-clock samples, 1.26%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
new file mode 100644
index 0000000..4560ad7
--- /dev/null
+++ b/results/flamegraph.txt
@@ -0,0 +1,58 @@
+Command: make flamegraph
+Artifact: flamegraph (cpu-clock software sampling hot-symbol profile)
+Hardware: aarch64
+OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
+CPU: Avalanche-M2
+Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
+Perf: perf version 6.19.14-400.asahi.fc44.aarch64
+Perf paranoid: 2
+Build type: Release
+Provenance version: 1
+Git commit (informational): 0c3b401
+Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242
+Source digest scope: flamegraph-benchmark
+Dirty inputs: no
+Generated output: results/flamegraph.svg
+Date: 2026-06-21T06:36:51Z
+Benchmark binary: build/bench/qsl-bench
+Dataset: qsl-bench default synthetic benchmark suite
+Call graph: dwarf
+Record event: cpu-clock
+Sample freq: 4000 Hz
+Sample count: 397
+Folded stacks: 171
+Minimum samples for hot profile: 200
+Insufficient samples: no
+Record status: 0
+Script status: 0
+Perf access limitation: no
+Flamegraph SVG: results/flamegraph.svg
+Perf data: build/perf/qsl-bench.flame.data (generated, not intended for commit)
+
+Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol
+investigation. Frame width is proportional to on-CPU samples, not wall-clock
+latency or throughput, and is hardware/kernel/compiler/build dependent.
+ +Top 15 folded stacks (count stack): + 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) + 9 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 + 7 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main + 6 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt + 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::protocol::decode_new_order(std::span) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 5 qsl-bench;operator new(unsigned long);malloc@plt + 5 qsl-bench;std::pair > 
> >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&);operator new(unsigned long, std::align_val_t)@plt + 5 qsl-bench;[unknown];qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&);[unknown];[unknown];operator new(unsigned long);malloc + 5 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);[unknown];[unknown];[unknown];__posix_memalign;malloc + 5 qsl-bench;[unknown];[unknown];qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long);[unknown];[unknown];operator new(unsigned long);malloc + +Benchmark output: +order_book add/mod/cancel 200000 ops 151.3 ns/op 6607667 ops/sec +protocol encode+decode 500000 ops 21.8 ns/op 45829279 ops/sec +gateway session (fill) 200000 ops 132.3 ns/op 7556487 ops/sec +matching engine flow 5004 items 104.7 ns/item 9553139 items/sec +replay command log 5004 items 115.1 ns/item 8690129 items/sec From 59f0fc32271e242b6f62bb9580e3596a541f1415 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 02:52:37 -0400 Subject: [PATCH 04/22] feat: add FIX-like text protocol adapter (#29) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a human-readable `tag=value` text protocol alongside the binary codec, mapping the same internal message structs (issue #29, reprioritized by the human from the backlog). - include/qsl/protocol/fix.hpp + src/protocol/fix.cpp: SOH-framed `tag=value` adapter with genuine FIX framing — 8 BeginString / 9 BodyLength / 35 MsgType / ... / 10 mod-256 CheckSum — for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder, OrderCancelRequest (35=F) -> CancelOrder. 
Decoding is total, deterministic, and noexcept (fixed field table, std::from_chars, std::string_view; no heap on the decode path) and reports every malformed input through a FixError taxonomy mirroring the binary codec's DecodeError. Documented, deliberate simplifications: Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, so NewOrder<->FIX is a lossless bijection like the binary codec (price is never a float). - tests/unit/test_fix_protocol.cpp: mirrors the binary required tests and adds a cross-codec equivalence test (binary and FIX decode the same order to identical structs across all Side x OrdType x TIF), a byte-pinned fixture, and rejection of malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized messages. - docs/fix_protocol.md (+ pointer from docs/binary_protocol.md); MILESTONES.md and PROGRESS.md updated. make check 260/260; make asan 260/260 (the adapter parses untrusted text). Closes #29. 
Co-Authored-By: Claude Opus 4.8 --- CMakeLists.txt | 3 +- MILESTONES.md | 15 +- PROGRESS.md | 18 ++ docs/binary_protocol.md | 6 + docs/fix_protocol.md | 95 ++++++++ include/qsl/protocol/fix.hpp | 110 +++++++++ src/protocol/fix.cpp | 369 +++++++++++++++++++++++++++++++ tests/CMakeLists.txt | 2 +- tests/unit/test_fix_protocol.cpp | 268 ++++++++++++++++++++++ 9 files changed, 879 insertions(+), 7 deletions(-) create mode 100644 docs/fix_protocol.md create mode 100644 include/qsl/protocol/fix.hpp create mode 100644 src/protocol/fix.cpp create mode 100644 tests/unit/test_fix_protocol.cpp diff --git a/CMakeLists.txt b/CMakeLists.txt index 54b3098..383a4e6 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -10,7 +10,8 @@ include(cmake/ProjectOptions.cmake) include(cmake/CompilerWarnings.cmake) include(cmake/Sanitizers.cmake) -add_library(qsl_core src/core/types.cpp src/protocol/codec.cpp src/engine/order_book.cpp +add_library(qsl_core src/core/types.cpp src/protocol/codec.cpp src/protocol/fix.cpp + src/engine/order_book.cpp src/engine/matching_engine.cpp src/engine/risk.cpp src/gateway/order_gateway.cpp src/feed/market_data.cpp src/feed/publisher.cpp src/replay/event_log.cpp src/replay/command.cpp diff --git a/MILESTONES.md b/MILESTONES.md index c32aec8..34ac385 100644 --- a/MILESTONES.md +++ b/MILESTONES.md @@ -469,10 +469,11 @@ Sequential, dependency-ordered. **Build them in order.** Each milestone is one f > perf/flamegraph). M25 (memory-ordering evidence), M30 (socket profiling/hardening), and M31 > (external review) are new milestones with no prior issue. PR #112 closed > the remaining tractable systems items **#26** (portable TCP serving beyond one-connection-at-a-time -> accept) and **#28** (realistic synthetic order-flow model). 
The genuinely **deferred** product/API -> items remain **#29** (FIX adapter), **#30** (web dashboard), **#31** (Docker packaging), and -> **#33** (Pages site) — do not start them before the Phase III/IV systems roadmap unless the human -> explicitly reprioritizes. +> accept) and **#28** (realistic synthetic order-flow model). The human later reprioritized two +> backlog items, now **done**: **#32** (perf/flamegraph) and **#29** (FIX-like text protocol +> adapter). The genuinely **deferred** product/API items remain **#30** (web dashboard), **#31** +> (Docker packaging), and **#33** (Pages site) — do not start them before the Phase III/IV systems +> roadmap unless the human explicitly reprioritizes. Do not pull backlog items into earlier PRs. @@ -481,7 +482,11 @@ Do not pull backlog items into earlier PRs. - Multithreaded gateway and market data pipeline, plus portable threaded TCP serving follow-up. (#26) - ThreadSanitizer coverage. (#27) - More realistic synthetic order-flow model. (#28) -- FIX-like text protocol adapter. (#29) +- FIX-like text protocol adapter. (#29) — **done**: `tag=value` SOH-framed adapter + (`include/qsl/protocol/fix.hpp`, `src/protocol/fix.cpp`) over the same internal structs as the + binary codec, with genuine FIX BeginString/BodyLength/CheckSum framing for NewOrderSingle (35=D) + and OrderCancelRequest (35=F). Cross-codec equivalence + malformed-input rejection tested in + `tests/unit/test_fix_protocol.cpp`; docs in `docs/fix_protocol.md`. - Web dashboard for visualization. (#30) - Docker packaging. (#31) - Perf/flamegraph docs. (#32) — **done**: `make flamegraph` renders a perf call-graph flamegraph diff --git a/PROGRESS.md b/PROGRESS.md index 7293498..f7c953c 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -378,6 +378,24 @@ Lower priority: the bare-metal Fedora Asahi host (aarch64) from the clean committed tree (`Dirty inputs: no`). 
This is a software cpu-clock sampling hot-symbol profile, not a latency/throughput claim; full hardware cache-PMU evidence stays in #90. Do not merge from automation; human squash-merges. +- [2026-06-21] Issue #29 FIX-like text protocol adapter (`feat/fix-text-protocol-adapter`, stacked + on the flamegraph branch). Added `include/qsl/protocol/fix.hpp` + `src/protocol/fix.cpp`: a + `tag=value` SOH-framed adapter over the SAME internal structs as the binary codec, with genuine + FIX framing (8 BeginString / 9 BodyLength / 35 MsgType / … / 10 mod-256 CheckSum) for the + client→gateway order path — NewOrderSingle (35=D)→`NewOrder` and OrderCancelRequest + (35=F)→`CancelOrder`. Decoding is total/deterministic/`noexcept` (fixed field table, + `std::from_chars`, `string_view`; no heap on decode) and reports every malformed input through a + `FixError` taxonomy mirroring the binary `DecodeError`. Documented, deliberate simplifications: + Symbol (55) carries the numeric SymbolId; Price (44) carries integer ticks and is always present, + making `NewOrder↔FIX` a lossless bijection like the binary codec (never float for price). + `tests/unit/test_fix_protocol.cpp` mirrors the binary required tests and adds a **cross-codec + equivalence** test (binary and FIX decode the same order to identical structs across all + Side×OrdType×TIF), a byte-pinned fixture (checksum 164 / body-length 50), and rejection of + malformed framing / unsupported BeginString / unknown-or-wrong MsgType / BodyLength mismatch / + CheckSum mismatch / missing field / invalid field / invalid enum / out-of-range / oversized. Docs + in `docs/fix_protocol.md` (+ pointer from `docs/binary_protocol.md`). `make check` 260/260 and + `make asan` 260/260 clean (the parser handles untrusted text). Closes #29. Do not merge from + automation; human squash-merges. 
- [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. 
M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. diff --git a/docs/binary_protocol.md b/docs/binary_protocol.md index 2d2a827..1e1c948 100644 --- a/docs/binary_protocol.md +++ b/docs/binary_protocol.md @@ -81,3 +81,9 @@ a byte stream belong to the TCP/session layer (M9), not the codec. The wire format is pinned by a byte-fixture test (`tests/unit/test_protocol.cpp`) so any accidental change to field order or byte order fails the build. + +## Text alternative + +A human-readable, FIX-like `tag=value` adapter over the same internal message structs lives +alongside this binary codec — see [fix_protocol.md](fix_protocol.md). Both decode the same order to +identical structs, which the tests assert directly. diff --git a/docs/fix_protocol.md b/docs/fix_protocol.md new file mode 100644 index 0000000..5ecbf14 --- /dev/null +++ b/docs/fix_protocol.md @@ -0,0 +1,95 @@ +# FIX-like Text Protocol Adapter + +A human-readable `tag=value` text protocol alongside the [binary protocol](binary_protocol.md), +mapping the **same internal message structs**. Implemented in `include/qsl/protocol/fix.hpp` and +`src/protocol/fix.cpp`; tested in `tests/unit/test_fix_protocol.cpp`. This is the optional FIX +adapter tracked by issue #29. + +It is **FIX-like**, not a full FIX engine: it implements the genuine FIX framing and the +client→gateway order path, deliberately scoped to what mirrors the binary codec. + +## Why it exists + +Real venues speak FIX as well as binary protocols. Showing a second, independently-validated wire +format over one internal model demonstrates a clean protocol boundary: the engine does not care +which encoding produced a `NewOrder`. The strongest invariant the tests assert is that a binary +frame and a FIX message for the same order **decode to identical internal structs**. 
+ +## Framing + +A message is a sequence of `tag=value` fields, each terminated by the **SOH** byte (`0x01`). The +adapter uses the standard FIX envelope: + +```text +8=FIX.4.2 | 9= | 35= | | 10= | +``` + +(`|` denotes SOH.) + +- **`8` BeginString** must be `FIX.4.2`; anything else is `UnsupportedBeginString`. +- **`9` BodyLength** is the byte count from the field after tag 9 through the SOH before tag 10. + A mismatch is `BodyLengthMismatch`. +- **`10` CheckSum** is the mod-256 sum of every byte up to the SOH before tag 10, formatted as + exactly three zero-padded digits. A mismatch is `ChecksumMismatch`. + +## Messages + +### NewOrderSingle (`35=D`) → `NewOrder` + +| Tag | FIX name | Internal field | Encoding | +|-----|---------------|----------------|----------| +| 34 | MsgSeqNum | sequence no. | carried like the binary header `seq_no` (validated, not stored in the body struct) | +| 11 | ClOrdID | `order_id` | decimal | +| 55 | Symbol | `symbol` | decimal `SymbolId` (see simplifications) | +| 54 | Side | `side` | `1`=Buy, `2`=Sell | +| 38 | OrderQty | `quantity` | decimal | +| 40 | OrdType | `type` | `1`=Market, `2`=Limit | +| 44 | Price | `price` | integer ticks (see simplifications) | +| 59 | TimeInForce | `tif` | `1`=GTC, `3`=IOC | + +### OrderCancelRequest (`35=F`) → `CancelOrder` + +| Tag | FIX name | Internal field | Notes | +|-----|--------------|----------------|-------| +| 34 | MsgSeqNum | sequence no. 
| as above | +| 41 | OrigClOrdID | `order_id` | the order being cancelled | +| 11 | ClOrdID | — | required by FIX; echoes `order_id` (the cancel request id is not modelled) | +| 55 | Symbol | `symbol` | decimal `SymbolId` | + +## Deliberate simplifications + +These are documented departures from strict FIX, chosen so the adapter stays a deterministic, +lossless map onto the simulator's internal model: + +- **Symbol (tag 55) carries the numeric `SymbolId`** in decimal, not a ticker string — the engine + keys on `SymbolId`, so mapping to a string table would only add a lossy layer. +- **Price (tag 44) carries integer ticks and is always present**, including market orders. The + project never represents price as a float, and `NewOrder` always has a `price` field; carrying it + losslessly makes `NewOrder ↔ FIX` a true bijection over the internal struct, exactly like the + binary codec. (Strict FIX uses a decimal price and forbids tag 44 on market orders.) + +## Error model + +Decoding is total and deterministic: it never throws, allocates nothing on the decode path (a +fixed field table, `std::from_chars`, `std::string_view`), and reports every malformed input via +`FixError` rather than undefined behavior — mirroring the binary codec's `DecodeError` discipline. + +`FixError`: `None`, `Malformed`, `UnsupportedBeginString`, `UnknownMsgType`, `MissingField`, +`InvalidField`, `BodyLengthMismatch`, `ChecksumMismatch`, `InvalidEnumValue`, `OutOfRange`. 
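As an illustration of the BodyLength and CheckSum rules from the Framing section, a minimal sketch might look like the following. It is not the adapter's code: `fix_checksum`, `frame_sketch`, and `kSohByte` are hypothetical names chosen for this example, and it assumes the body is already a run of SOH-terminated fields starting at tag 35.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical names for illustration; only the framing rules come from this doc.
constexpr char kSohByte = '\x01';

// Mod-256 sum of every byte in `bytes`.
unsigned fix_checksum(std::string_view bytes) {
    unsigned sum = 0;
    for (const char c : bytes) {
        sum += static_cast<unsigned char>(c);
    }
    return sum % 256u;
}

// Wrap a body (fields from tag 35 onward, each SOH-terminated) in the
// 8/9/.../10 envelope described above.
std::string frame_sketch(std::string_view body) {
    std::string msg = "8=FIX.4.2";
    msg += kSohByte;
    // BodyLength: byte count from the field after tag 9 through the SOH before tag 10.
    msg += "9=" + std::to_string(body.size());
    msg += kSohByte;
    msg += body;
    // CheckSum covers every byte so far and is written as three zero-padded digits.
    char digits[4];
    std::snprintf(digits, sizeof digits, "%03u", fix_checksum(msg));
    msg += "10=";
    msg += digits;
    msg += kSohByte;
    return msg;
}
```

Passing a body such as `35=D` followed by SOH yields a message whose tag 9 value is the body's byte count and whose tag 10 value is always exactly three digits. A decoder can recompute the sum over everything before tag 10 and compare.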
+ +## Determinism and testing + +`tests/unit/test_fix_protocol.cpp` mirrors the binary codec's required tests and adds FIX-specific +ones: + +- round-trip for NewOrderSingle and OrderCancelRequest; +- **cross-codec equivalence**: binary and FIX decode the same order to identical structs across all + Side × OrdType × TIF combinations; +- a **byte-pinned fixture** (`8=FIX.4.2|9=50|35=D|…|10=164|`) so any change to field order or the + checksum/body-length computation fails the build; +- rejection of malformed framing, unsupported BeginString, unknown/wrong MsgType, BodyLength + mismatch, CheckSum mismatch, missing required fields, non-numeric fields, invalid enum codes, + out-of-range integers, and oversized messages; +- signed/extreme `int64` price and `uint64` id/seq round-trips. + +The adapter is also covered by the ASan/UBSan preset (`make asan`), since it parses untrusted text. diff --git a/include/qsl/protocol/fix.hpp b/include/qsl/protocol/fix.hpp new file mode 100644 index 0000000..7c15562 --- /dev/null +++ b/include/qsl/protocol/fix.hpp @@ -0,0 +1,110 @@ +#pragma once + +// FIX-like text protocol adapter (issue #29). +// +// A human-readable `tag=value` wire format alongside the binary codec +// (qsl/protocol/codec.hpp), mapping the same internal message structs. It is +// "FIX-like": it uses genuine FIX framing — SOH-delimited tag=value fields, the +// 8/9/35/.../10 envelope, a BodyLength (tag 9) and a mod-256 CheckSum (tag 10) — +// for the client->gateway order path: NewOrderSingle (35=D) -> NewOrder and +// OrderCancelRequest (35=F) -> CancelOrder. +// +// Deliberate, documented simplifications for a deterministic simulator (see +// docs/fix_protocol.md): +// * Symbol (tag 55) carries the numeric SymbolId in decimal, not a ticker +// string — the internal model keys on SymbolId. +// * Price (tag 44) carries integer ticks, never a decimal/float, and is always +// present (including market orders). 
This keeps NewOrder<->FIX a lossless +// bijection over the internal struct, exactly like the binary codec, so a +// binary frame and a FIX message for the same order decode to identical +// structs. Prices are never floating point. +// +// Decoding is total and deterministic: it never throws, does not allocate, and +// reports every malformed input through FixError rather than undefined behavior, +// mirroring the binary codec's DecodeError discipline. Encoding allocates only +// the returned string. + +#include "qsl/protocol/messages.hpp" + +#include +#include + +namespace qsl::protocol::fix { + +// FIX field separator (SOH, 0x01) and the supported BeginString (tag 8). +inline constexpr char kSoh = '\x01'; +inline constexpr std::string_view kBeginString = "FIX.4.2"; +// Defensive upper bound on a single message; order messages are small. +inline constexpr std::size_t kMaxMessageLen = 1024; + +// FIX MsgType (tag 35) values this adapter handles. +inline constexpr char kMsgNewOrderSingle = 'D'; +inline constexpr char kMsgOrderCancelRequest = 'F'; + +// Deterministic decode outcomes for malformed/invalid FIX text. Extends the +// binary codec's error taxonomy with FIX-envelope-specific failures. 
+enum class FixError : std::uint8_t { + None = 0, + Malformed, // not tag=value/SOH framed, or empty/oversized + UnsupportedBeginString, // tag 8 != kBeginString + UnknownMsgType, // tag 35 absent, or not the expected message type + MissingField, // a required tag is absent + InvalidField, // a value failed integer parsing / is empty + BodyLengthMismatch, // tag 9 (BodyLength) != the actual body byte count + ChecksumMismatch, // tag 10 (CheckSum) != the computed mod-256 sum + InvalidEnumValue, // Side/OrdType/TimeInForce code is not recognized + OutOfRange, // a parsed integer does not fit its target field +}; + +[[nodiscard]] constexpr const char *to_string(FixError e) noexcept { + switch (e) { + case FixError::None: + return "None"; + case FixError::Malformed: + return "Malformed"; + case FixError::UnsupportedBeginString: + return "UnsupportedBeginString"; + case FixError::UnknownMsgType: + return "UnknownMsgType"; + case FixError::MissingField: + return "MissingField"; + case FixError::InvalidField: + return "InvalidField"; + case FixError::BodyLengthMismatch: + return "BodyLengthMismatch"; + case FixError::ChecksumMismatch: + return "ChecksumMismatch"; + case FixError::InvalidEnumValue: + return "InvalidEnumValue"; + case FixError::OutOfRange: + return "OutOfRange"; + } + return "Unknown"; +} + +template <typename T> struct FixDecodeResult { + FixError error{FixError::None}; + T value{}; + + [[nodiscard]] bool ok() const noexcept { return error == FixError::None; } +}; + +// Encode internal order structs to a complete FIX-like message string (a single +// allocation, framed with BeginString/BodyLength/CheckSum). `seq` is carried in +// MsgSeqNum (tag 34), mirroring the binary frame's header sequence number. +[[nodiscard]] std::string encode(const NewOrder &msg, SeqNo seq); +[[nodiscard]] std::string encode(const CancelOrder &msg, SeqNo seq); + +// Decode and validate a complete FIX-like message into the internal struct. 
+[[nodiscard]] FixDecodeResult<NewOrder> decode_new_order(std::string_view msg) noexcept; +[[nodiscard]] FixDecodeResult<CancelOrder> decode_cancel_order(std::string_view msg) noexcept; + +// Validate the FIX envelope (8/9/.../10) and return the MsgType (tag 35) so a +// dispatcher can route to the right typed decoder. +[[nodiscard]] FixDecodeResult<char> peek_msg_type(std::string_view msg) noexcept; + +// Validate the envelope and return MsgSeqNum (tag 34). Useful for verifying that +// the sequence number round-trips, since the typed decoders return only the body. +[[nodiscard]] FixDecodeResult<SeqNo> peek_seq(std::string_view msg) noexcept; + +} // namespace qsl::protocol::fix diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp new file mode 100644 index 0000000..b97e4a8 --- /dev/null +++ b/src/protocol/fix.cpp @@ -0,0 +1,369 @@ +#include "qsl/protocol/fix.hpp" + +#include "qsl/core/types.hpp" + +#include +#include +#include +#include + +namespace qsl::protocol::fix { + +namespace { + +// FIX tags this adapter reads/writes. +enum Tag : unsigned { + kTagBeginString = 8, + kTagBodyLength = 9, + kTagCheckSum = 10, + kTagClOrdID = 11, + kTagMsgSeqNum = 34, + kTagMsgType = 35, + kTagOrderQty = 38, + kTagOrdType = 40, + kTagOrigClOrdID = 41, + kTagPrice = 44, + kTagSide = 54, + kTagSymbol = 55, + kTagTimeInForce = 59, +}; + +constexpr std::size_t kMaxFields = 32; + +struct Field { + unsigned tag{0}; + std::string_view value{}; + std::size_t start{0}; // byte offset of the field within the message +}; + +struct Parsed { + std::array<Field, kMaxFields> fields{}; + std::size_t count{0}; +}; + +// Parse an unsigned/signed integer, requiring the whole view to be consumed. 
+template [[nodiscard]] bool parse_int(std::string_view sv, Int &out) noexcept { + if (sv.empty()) { + return false; + } + const char *first = sv.data(); + const char *last = sv.data() + sv.size(); + const auto res = std::from_chars(first, last, out); + return res.ec == std::errc() && res.ptr == last; +} + +[[nodiscard]] const Field *find_field(const Parsed &p, unsigned tag) noexcept { + for (std::size_t i = 0; i < p.count; ++i) { + if (p.fields[i].tag == tag) { + return &p.fields[i]; + } + } + return nullptr; +} + +// Validate the FIX envelope: SOH-delimited tag=value framing, the 8/9/.../10 +// ordering, BodyLength (tag 9), and the mod-256 CheckSum (tag 10). On success the +// field table is filled; business fields are looked up by the typed decoders. +[[nodiscard]] FixError parse_envelope(std::string_view msg, Parsed &out) noexcept { + if (msg.empty() || msg.size() > kMaxMessageLen) { + return FixError::Malformed; + } + + std::size_t pos = 0; + while (pos < msg.size()) { + const std::size_t field_start = pos; + const std::size_t soh = msg.find(kSoh, pos); + if (soh == std::string_view::npos) { + return FixError::Malformed; // a field is not SOH-terminated + } + const std::size_t eq = msg.find('=', pos); + if (eq == std::string_view::npos || eq >= soh) { + return FixError::Malformed; // no '=' within the field + } + const std::string_view tag_sv = msg.substr(field_start, eq - field_start); + const std::string_view val_sv = msg.substr(eq + 1, soh - (eq + 1)); + unsigned tag = 0; + if (!parse_int(tag_sv, tag)) { + return FixError::Malformed; // non-numeric tag + } + if (out.count >= kMaxFields) { + return FixError::Malformed; // too many fields + } + out.fields[out.count++] = Field{tag, val_sv, field_start}; + pos = soh + 1; + } + + if (out.count < 3) { + return FixError::Malformed; + } + const Field &f_begin = out.fields[0]; + const Field &f_len = out.fields[1]; + const Field &f_csum = out.fields[out.count - 1]; + if (f_begin.tag != kTagBeginString || f_len.tag != 
kTagBodyLength || + f_csum.tag != kTagCheckSum) { + return FixError::Malformed; + } + if (f_begin.value != kBeginString) { + return FixError::UnsupportedBeginString; + } + + // BodyLength counts the bytes from the first field after tag 9 through the + // SOH preceding tag 10, i.e. [fields[2].start, checksum_field.start). + const std::size_t body_start = out.fields[2].start; + const std::size_t checksum_start = f_csum.start; + std::size_t body_len = 0; + if (!parse_int(f_len.value, body_len)) { + return FixError::InvalidField; + } + if (body_len != checksum_start - body_start) { + return FixError::BodyLengthMismatch; + } + + // CheckSum is the mod-256 sum of every byte up to the SOH before tag 10, + // formatted as exactly three digits. + unsigned declared = 0; + if (f_csum.value.size() != 3 || !parse_int(f_csum.value, declared)) { + return FixError::InvalidField; + } + unsigned sum = 0; + for (std::size_t i = 0; i < checksum_start; ++i) { + sum += static_cast(msg[i]); + } + if ((sum & 0xFFu) != declared) { + return FixError::ChecksumMismatch; + } + return FixError::None; +} + +// Extract a required integer field; map absence/format/overflow to structured +// errors. A value that does not fit the target field is OutOfRange (distinct from +// a non-numeric InvalidField). +template +[[nodiscard]] FixError require_int(const Parsed &p, unsigned tag, Int &out) noexcept { + const Field *f = find_field(p, tag); + if (f == nullptr) { + return FixError::MissingField; + } + if (f->value.empty()) { + return FixError::InvalidField; + } + const char *first = f->value.data(); + const char *last = f->value.data() + f->value.size(); + const auto res = std::from_chars(first, last, out); + if (res.ec == std::errc::result_out_of_range) { + return FixError::OutOfRange; + } + if (res.ec != std::errc() || res.ptr != last) { + return FixError::InvalidField; + } + return FixError::None; +} + +// Require a single-character coded enum field (e.g. Side, OrdType, TIF). 
+[[nodiscard]] FixError require_code(const Parsed &p, unsigned tag, char &out) noexcept { + const Field *f = find_field(p, tag); + if (f == nullptr) { + return FixError::MissingField; + } + if (f->value.size() != 1) { + return FixError::InvalidEnumValue; + } + out = f->value.front(); + return FixError::None; +} + +void append_field(std::string &dst, unsigned tag, std::string_view value) { + dst += std::to_string(tag); + dst += '='; + dst += value; + dst += kSoh; +} + +void append_field(std::string &dst, unsigned tag, std::uint64_t value) { + append_field(dst, tag, std::to_string(value)); +} + +// Wrap an already-built body (the fields from tag 35 onward) in the FIX +// envelope: prepend 8/9 and append the computed CheckSum (tag 10). +[[nodiscard]] std::string frame(const std::string &body) { + std::string head; + head += "8="; + head += kBeginString; + head += kSoh; + head += "9="; + head += std::to_string(body.size()); + head += kSoh; + head += body; + + unsigned sum = 0; + for (const char c : head) { + sum += static_cast(c); + } + const unsigned cs = sum % 256u; // FIX CheckSum is the mod-256 byte sum... + char csum[4]; + // ...formatted as exactly three zero-padded digits (0..255). + csum[0] = static_cast('0' + ((cs / 100) % 10)); + csum[1] = static_cast('0' + ((cs / 10) % 10)); + csum[2] = static_cast('0' + (cs % 10)); + csum[3] = '\0'; + + head += "10="; + head += csum; + head += kSoh; + return head; +} + +} // namespace + +std::string encode(const NewOrder &msg, SeqNo seq) { + std::string body; + append_field(body, kTagMsgType, std::string_view(&kMsgNewOrderSingle, 1)); + append_field(body, kTagMsgSeqNum, seq); + append_field(body, kTagClOrdID, msg.order_id); + append_field(body, kTagSymbol, static_cast(msg.symbol)); + append_field(body, kTagSide, msg.side == Side::Buy ? "1" : "2"); + append_field(body, kTagOrderQty, static_cast(msg.quantity)); + append_field(body, kTagOrdType, msg.type == OrderType::Market ? 
"1" : "2"); + // Price is integer ticks (never a float) and is always present, so the FIX + // and binary encodings are both lossless bijections over NewOrder. + append_field(body, kTagPrice, std::to_string(static_cast(msg.price))); + append_field(body, kTagTimeInForce, msg.tif == TimeInForce::GTC ? "1" : "3"); + return frame(body); +} + +std::string encode(const CancelOrder &msg, SeqNo seq) { + std::string body; + append_field(body, kTagMsgType, std::string_view(&kMsgOrderCancelRequest, 1)); + append_field(body, kTagMsgSeqNum, seq); + // OrigClOrdID identifies the order to cancel; ClOrdID is the (required) id of + // the cancel request itself. CancelOrder carries only one id, so both echo it. + append_field(body, kTagOrigClOrdID, msg.order_id); + append_field(body, kTagClOrdID, msg.order_id); + append_field(body, kTagSymbol, static_cast(msg.symbol)); + return frame(body); +} + +FixDecodeResult decode_new_order(std::string_view msg) noexcept { + Parsed p; + if (const FixError e = parse_envelope(msg, p); e != FixError::None) { + return {e, {}}; + } + const Field *type = find_field(p, kTagMsgType); + if (type == nullptr || type->value.size() != 1 || type->value.front() != kMsgNewOrderSingle) { + return {FixError::UnknownMsgType, {}}; + } + + NewOrder out{}; + SeqNo seq = 0; // standard header field (tag 34); validated but not stored. 
+ if (const FixError e = require_int(p, kTagMsgSeqNum, seq); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_int(p, kTagClOrdID, out.order_id); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_int(p, kTagSymbol, out.symbol); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_int(p, kTagOrderQty, out.quantity); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_int(p, kTagPrice, out.price); e != FixError::None) { + return {e, {}}; + } + + char side = 0; + char ord_type = 0; + char tif = 0; + if (const FixError e = require_code(p, kTagSide, side); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_code(p, kTagOrdType, ord_type); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_code(p, kTagTimeInForce, tif); e != FixError::None) { + return {e, {}}; + } + + switch (side) { + case '1': + out.side = Side::Buy; + break; + case '2': + out.side = Side::Sell; + break; + default: + return {FixError::InvalidEnumValue, {}}; + } + switch (ord_type) { + case '1': + out.type = OrderType::Market; + break; + case '2': + out.type = OrderType::Limit; + break; + default: + return {FixError::InvalidEnumValue, {}}; + } + switch (tif) { + case '1': + out.tif = TimeInForce::GTC; + break; + case '3': + out.tif = TimeInForce::IOC; + break; + default: + return {FixError::InvalidEnumValue, {}}; + } + return {FixError::None, out}; +} + +FixDecodeResult decode_cancel_order(std::string_view msg) noexcept { + Parsed p; + if (const FixError e = parse_envelope(msg, p); e != FixError::None) { + return {e, {}}; + } + const Field *type = find_field(p, kTagMsgType); + if (type == nullptr || type->value.size() != 1 || + type->value.front() != kMsgOrderCancelRequest) { + return {FixError::UnknownMsgType, {}}; + } + + CancelOrder out{}; + SeqNo seq = 0; + if (const FixError e = require_int(p, kTagMsgSeqNum, seq); e != FixError::None) { + 
return {e, {}}; + } + if (const FixError e = require_int(p, kTagOrigClOrdID, out.order_id); e != FixError::None) { + return {e, {}}; + } + if (const FixError e = require_int(p, kTagSymbol, out.symbol); e != FixError::None) { + return {e, {}}; + } + return {FixError::None, out}; +} + +FixDecodeResult peek_msg_type(std::string_view msg) noexcept { + Parsed p; + if (const FixError e = parse_envelope(msg, p); e != FixError::None) { + return {e, {}}; + } + const Field *type = find_field(p, kTagMsgType); + if (type == nullptr || type->value.size() != 1) { + return {FixError::UnknownMsgType, {}}; + } + return {FixError::None, type->value.front()}; +} + +FixDecodeResult peek_seq(std::string_view msg) noexcept { + Parsed p; + if (const FixError e = parse_envelope(msg, p); e != FixError::None) { + return {e, {}}; + } + SeqNo seq = 0; + if (const FixError e = require_int(p, kTagMsgSeqNum, seq); e != FixError::None) { + return {e, {}}; + } + return {FixError::None, seq}; +} + +} // namespace qsl::protocol::fix diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index cb617a9..45669f1 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -15,7 +15,7 @@ foreach(t test_smoke test_types test_clock test_protocol test_order_book test_ma test_risk_gateway test_market_data test_event_log test_replay test_session test_tcp_gateway test_epoll_gateway test_md_feed test_invariants test_fuzz_protocol test_fixture_export test_shrink test_oracle_selftest test_reject_coverage test_spsc_ring - test_order_pool) + test_order_pool test_fix_protocol) add_executable(${t} unit/${t}.cpp) target_link_libraries(${t} PRIVATE qsl_core qsl_warnings Catch2::Catch2WithMain Threads::Threads) catch_discover_tests(${t}) diff --git a/tests/unit/test_fix_protocol.cpp b/tests/unit/test_fix_protocol.cpp new file mode 100644 index 0000000..4b9194d --- /dev/null +++ b/tests/unit/test_fix_protocol.cpp @@ -0,0 +1,268 @@ +#include "qsl/protocol/codec.hpp" +#include "qsl/protocol/fix.hpp" + +#include 
+#include +#include +#include +#include +#include +#include +#include + +using namespace qsl::protocol; + +namespace { + +constexpr char SOH = '\x01'; + +NewOrder sample_new_order() { + return NewOrder{/*order_id=*/1, /*symbol=*/2, /*price=*/12345, /*quantity=*/10, + Side::Buy, OrderType::Limit, TimeInForce::GTC}; +} + +// Build a complete FIX message from a body (the fields from tag 35 onward), +// computing BodyLength (tag 9) and the mod-256 CheckSum (tag 10). Lets a test +// construct messages with missing/invalid body fields that encode() never emits. +std::string wrap(const std::string &body) { + std::string head = "8="; + head += std::string(fix::kBeginString) + SOH; + head += "9=" + std::to_string(body.size()) + SOH; + head += body; + unsigned sum = 0; + for (const char c : head) { + sum += static_cast(c); + } + const unsigned mod = sum % 256u; + char cs[4]; + cs[0] = static_cast('0' + ((mod / 100) % 10)); + cs[1] = static_cast('0' + ((mod / 10) % 10)); + cs[2] = static_cast('0' + (mod % 10)); + cs[3] = '\0'; + head += "10="; + head += cs; + head += SOH; + return head; +} + +std::string field(unsigned tag, std::string_view value) { + return std::to_string(tag) + "=" + std::string(value) + SOH; +} + +void require_same(const NewOrder &a, const NewOrder &b) { + REQUIRE(a.order_id == b.order_id); + REQUIRE(a.symbol == b.symbol); + REQUIRE(a.price == b.price); + REQUIRE(a.quantity == b.quantity); + REQUIRE(a.side == b.side); + REQUIRE(a.type == b.type); + REQUIRE(a.tif == b.tif); +} + +} // namespace + +TEST_CASE("FIX NewOrder encode/decode round-trips", "[fix]") { + const NewOrder in = sample_new_order(); + const std::string msg = fix::encode(in, /*seq=*/7); + + const auto type = fix::peek_msg_type(msg); + REQUIRE(type.ok()); + REQUIRE(type.value == fix::kMsgNewOrderSingle); + + const auto seq = fix::peek_seq(msg); + REQUIRE(seq.ok()); + REQUIRE(seq.value == 7); + + const auto out = fix::decode_new_order(msg); + REQUIRE(out.ok()); + require_same(out.value, in); 
+} + +TEST_CASE("FIX CancelOrder encode/decode round-trips", "[fix]") { + const CancelOrder in{/*order_id=*/42, /*symbol=*/3}; + const std::string msg = fix::encode(in, /*seq=*/99); + + REQUIRE(fix::peek_msg_type(msg).value == fix::kMsgOrderCancelRequest); + REQUIRE(fix::peek_seq(msg).value == 99); + + const auto out = fix::decode_cancel_order(msg); + REQUIRE(out.ok()); + REQUIRE(out.value.order_id == 42); + REQUIRE(out.value.symbol == 3); +} + +TEST_CASE("FIX and binary codecs decode to identical NewOrder structs", "[fix]") { + // The strong invariant: two independent wire formats, one internal model. + for (const Side side : {Side::Buy, Side::Sell}) { + for (const OrderType type : {OrderType::Limit, OrderType::Market}) { + for (const TimeInForce tif : {TimeInForce::GTC, TimeInForce::IOC}) { + NewOrder in = sample_new_order(); + in.side = side; + in.type = type; + in.tif = tif; + + const std::vector<std::byte> bin = encode(in, /*seq=*/7); + const std::string fixmsg = fix::encode(in, /*seq=*/7); + + const auto bin_out = decode_new_order({bin.data(), bin.size()}); + const auto fix_out = fix::decode_new_order(fixmsg); + REQUIRE(bin_out.ok()); + REQUIRE(fix_out.ok()); + require_same(bin_out.value, fix_out.value); + } + } + } +} + +TEST_CASE("FIX side/ord-type/tif codes map both directions", "[fix]") { + NewOrder in = sample_new_order(); + in.side = Side::Sell; + in.type = OrderType::Market; + in.tif = TimeInForce::IOC; + const std::string msg = fix::encode(in, 1); + // 54=2 (Sell), 40=1 (Market), 59=3 (IOC).
+ REQUIRE(msg.find(field(54, "2")) != std::string::npos); + REQUIRE(msg.find(field(40, "1")) != std::string::npos); + REQUIRE(msg.find(field(59, "3")) != std::string::npos); + + const auto out = fix::decode_new_order(msg); + REQUIRE(out.ok()); + require_same(out.value, in); +} + +TEST_CASE("FIX deterministic fixture pins the wire format", "[fix]") { + const std::string msg = fix::encode(sample_new_order(), /*seq=*/7); + // Built with explicit SOH so the byte sequence (and the pinned BodyLength 50 + // and CheckSum 164) are unambiguous — a "\x01..." literal would greedily + // swallow the following digits into one hex escape. + const std::string S(1, SOH); + const std::string expected = "8=FIX.4.2" + S + "9=50" + S + "35=D" + S + "34=7" + S + "11=1" + + S + "55=2" + S + "54=1" + S + "38=10" + S + "40=2" + S + + "44=12345" + S + "59=1" + S + "10=164" + S; + REQUIRE(msg == expected); +} + +TEST_CASE("FIX malformed framing rejects deterministically", "[fix]") { + REQUIRE(fix::decode_new_order("").error == fix::FixError::Malformed); + REQUIRE(fix::decode_new_order("not fix at all").error == fix::FixError::Malformed); + // A field with no '=' before its SOH. + REQUIRE(fix::decode_new_order(std::string("8=FIX.4.2") + SOH + "garbage" + SOH).error == + fix::FixError::Malformed); + // A non-numeric tag. + REQUIRE(fix::decode_new_order(std::string("8=FIX.4.2") + SOH + "x=1" + SOH).error == + fix::FixError::Malformed); + // Last field is not the checksum (tag 10). 
+ REQUIRE(fix::decode_new_order(std::string("8=FIX.4.2") + SOH + "9=0" + SOH).error == + fix::FixError::Malformed); +} + +TEST_CASE("FIX oversized message rejects", "[fix]") { + std::string body = field(35, "D"); + body += field(34, "1"); + body += "55="; + body += std::string(fix::kMaxMessageLen, '9'); + body += SOH; + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::Malformed); +} + +TEST_CASE("FIX unsupported BeginString rejects", "[fix]") { + std::string msg = fix::encode(sample_new_order(), 1); + const auto pos = msg.find("FIX.4.2"); + REQUIRE(pos != std::string::npos); + msg.replace(pos, 7, "FIX.4.4"); // same width keeps BodyLength valid + const auto out = fix::decode_new_order(msg); + REQUIRE(out.error == fix::FixError::UnsupportedBeginString); +} + +TEST_CASE("FIX unknown / wrong message type rejects", "[fix]") { + // An unknown MsgType. + const std::string unknown = wrap(field(35, "X") + field(34, "1")); + REQUIRE(fix::peek_msg_type(unknown).value == 'X'); + REQUIRE(fix::decode_new_order(unknown).error == fix::FixError::UnknownMsgType); + + // A valid NewOrder decoded as a cancel rejects on type. + const std::string neworder = fix::encode(sample_new_order(), 1); + REQUIRE(fix::decode_cancel_order(neworder).error == fix::FixError::UnknownMsgType); +} + +TEST_CASE("FIX body-length mismatch rejects", "[fix]") { + std::string msg = fix::encode(sample_new_order(), 1); + const auto pos = msg.find("9=50"); + REQUIRE(pos != std::string::npos); + msg[pos + 3] = '1'; // declared 50 -> 51, actual body unchanged + REQUIRE(fix::decode_new_order(msg).error == fix::FixError::BodyLengthMismatch); +} + +TEST_CASE("FIX checksum mismatch rejects", "[fix]") { + std::string msg = fix::encode(sample_new_order(), 1); + REQUIRE(msg.size() >= 4); + char &last_digit = msg[msg.size() - 2]; // the final digit before the trailing SOH + last_digit = (last_digit == '9') ? 
'0' : static_cast<char>(last_digit + 1); + REQUIRE(fix::decode_new_order(msg).error == fix::FixError::ChecksumMismatch); +} + +TEST_CASE("FIX missing required field rejects", "[fix]") { + // A NewOrder body lacking Symbol (tag 55). + std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(54, "1") + + field(38, "10") + field(40, "2") + field(44, "100") + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::MissingField); +} + +TEST_CASE("FIX invalid integer field rejects", "[fix]") { + std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(55, "2") + + field(54, "1") + field(38, "abc") + field(40, "2") + field(44, "100") + + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::InvalidField); +} + +TEST_CASE("FIX invalid enum code rejects", "[fix]") { + std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(55, "2") + + field(54, "9") + field(38, "10") + field(40, "2") + field(44, "100") + + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::InvalidEnumValue); +} + +TEST_CASE("FIX signed price round-trips including int64 extremes", "[fix]") { + for (const Price p : + {Price{-1}, std::numeric_limits<Price>::min(), std::numeric_limits<Price>::max()}) { + NewOrder in = sample_new_order(); + in.price = p; + const auto out = fix::decode_new_order(fix::encode(in, /*seq=*/5)); + REQUIRE(out.ok()); + REQUIRE(out.value.price == p); + } +} + +TEST_CASE("FIX overflowing a field reports OutOfRange", "[fix]") { + // Symbol (tag 55) is uint32; a value past its max is OutOfRange, not InvalidField.
+ std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(55, "4294967296") + + field(54, "1") + field(38, "10") + field(40, "2") + field(44, "100") + + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::OutOfRange); +} + +TEST_CASE("FIX large order id and seq round-trip", "[fix]") { + NewOrder in = sample_new_order(); + in.order_id = std::numeric_limits::max(); + const SeqNo seq = std::numeric_limits::max(); + const std::string msg = fix::encode(in, seq); + REQUIRE(fix::peek_seq(msg).value == seq); + REQUIRE(fix::decode_new_order(msg).value.order_id == in.order_id); +} + +TEST_CASE("FIX errors stringify deterministically", "[fix]") { + using fix::FixError; + using fix::to_string; + REQUIRE(std::string_view{to_string(FixError::None)} == "None"); + REQUIRE(std::string_view{to_string(FixError::Malformed)} == "Malformed"); + REQUIRE(std::string_view{to_string(FixError::UnsupportedBeginString)} == + "UnsupportedBeginString"); + REQUIRE(std::string_view{to_string(FixError::UnknownMsgType)} == "UnknownMsgType"); + REQUIRE(std::string_view{to_string(FixError::MissingField)} == "MissingField"); + REQUIRE(std::string_view{to_string(FixError::InvalidField)} == "InvalidField"); + REQUIRE(std::string_view{to_string(FixError::BodyLengthMismatch)} == "BodyLengthMismatch"); + REQUIRE(std::string_view{to_string(FixError::ChecksumMismatch)} == "ChecksumMismatch"); + REQUIRE(std::string_view{to_string(FixError::InvalidEnumValue)} == "InvalidEnumValue"); + REQUIRE(std::string_view{to_string(FixError::OutOfRange)} == "OutOfRange"); + REQUIRE(std::string_view{to_string(static_cast(255))} == "Unknown"); +} From 872600ad18b65e15259fe382c0cdbe6a7e1b9e2f Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 08:54:49 -0400 Subject: [PATCH 05/22] perf: harden flamegraph collapsed-stack parsing (Codex review) Address two Codex review findings in scripts/flamegraph.py::parse_collapsed: - 
Prefer a tab separator when present so a tab-separated folded line whose stack contains spaces (C++ signatures) splits on the trailing count instead of splitting on an interior space and being silently dropped. - Ignore non-positive sample counts, so hand-crafted --from-collapsed input with 0/negative counts cannot render a misleading SVG (all-non-positive input now fails with exit 1 via the existing empty-folded guard). Adds test coverage in tests/shell/test_flamegraph.sh (19/19). Co-Authored-By: Claude Opus 4.8 --- scripts/flamegraph.py | 17 +++++++++++++---- tests/shell/test_flamegraph.sh | 16 ++++++++++++++++ 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py index 966d0c7..3af5110 100755 --- a/scripts/flamegraph.py +++ b/scripts/flamegraph.py @@ -93,19 +93,28 @@ def flush() -> None: def parse_collapsed(lines) -> dict[str, int]: - """Parse pre-collapsed `stack count` lines.""" + """Parse pre-collapsed `stack<sep>count` lines. + + The canonical folded separator is a space, but a tab is tolerated. Tab is + preferred when present so a stack containing spaces (C++ signatures) still + splits on the trailing count rather than on an interior space. Non-positive + counts are ignored.
+ """ folded: dict[str, int] = {} for raw in lines: line = raw.rstrip("\n") if not line.strip(): continue - stack, _, count = line.rpartition(" ") - if not stack: - stack, _, count = line.rpartition("\t") + sep = "\t" if "\t" in line else " " + stack, found, count = line.rpartition(sep) + if not found: + continue try: n = int(count) except ValueError: continue + if n <= 0: + continue folded[stack] = folded.get(stack, 0) + n return folded diff --git a/tests/shell/test_flamegraph.sh b/tests/shell/test_flamegraph.sh index 2ba305d..585ba34 100644 --- a/tests/shell/test_flamegraph.sh +++ b/tests/shell/test_flamegraph.sh @@ -122,6 +122,22 @@ ESC_SVG="$(printf 'bench;a&c 3\n' | python3 "$FG" --from-collapsed)" expect_contains "frame names are XML-escaped" '<b>&c' "$ESC_SVG" expect_not_contains "raw unescaped angle bracket is not emitted in a frame title" 'a<b>' "$ESC_SVG" +# --- Collapsed input parsing ------------------------------------------------ + +# A tab-separated stack that itself contains spaces must split on the count, not +# on an interior space. +TAB_COLLAPSED="$(printf 'main;foo(unsigned int)\t7\n' | python3 "$FG" --from-collapsed --collapse-only)" +expect_eq "tab-separated collapsed line keeps its count" \ + 'main;foo(unsigned int) 7' "$TAB_COLLAPSED" + +# Non-positive counts are ignored; a stack with only such counts yields nothing. +NONPOS="$(printf 'a;b 0\nc;d -3\n' | python3 "$FG" --from-collapsed --collapse-only)" +expect_eq "non-positive collapsed counts are dropped" "" "$NONPOS" + +printf 'a;b 0\n' | python3 "$FG" --from-collapsed >/dev/null 2>&1 +rc=$? 
+expect_eq "all-non-positive collapsed input fails SVG with exit 1" "1" "$rc" + # --- Empty input ------------------------------------------------------------ EMPTY_COLLAPSE="$(printf '' | python3 "$FG" --collapse-only)" From 0201d54e593456add8e08ea881a5b14e36b273fa Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 20:49:32 -0400 Subject: [PATCH 06/22] perf: regenerate flamegraph artifact after parser hardening flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt from the clean tree to keep the Source digest consistent (423 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --- results/flamegraph.svg | 10 ++++----- results/flamegraph.txt | 48 +++++++++++++++++++++--------------------- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/results/flamegraph.svg b/results/flamegraph.svg index 2281d5b..fc87dda 100644 --- a/results/flamegraph.svg +++ b/results/flamegraph.svg @@ -2,16 +2,16 @@ <!-- QSL flamegraph provenance Provenance version: 1 - Git commit (informational): 0c3b401 - Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242 + Git commit (informational): 872600a + Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0 Source digest scope: flamegraph-benchmark Dirty inputs: no Generated output: results/flamegraph.svg - Date: 2026-06-21T06:36:51Z + Date: 2026-06-21T12:54:50Z Command: make flamegraph Artifact: flamegraph (cpu-clock software sampling hot-symbol profile) Record: perf record [call-graph dwarf | -F 4000 | -g | -e cpu-clock] - Samples: 397 | Folded stacks: 171 + Samples: 423 | Folded stacks: 163 Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is not a latency or throughput measurement and is hardware/build dependent. 
--> @@ -28,4 +28,4 @@ function qslSearch(){ else{r.classList.remove('hl');}} if(detail)detail.textContent='Search: '+term; } -]]></script><rect width="1200" height="310" fill="#f8f8f8"/><text x="600" y="24" text-anchor="middle" font-size="17" font-weight="bold">QSL Matching-Engine Flame Graph (qsl-bench)</text><text x="600" y="40" text-anchor="middle" fill="#555">flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 397 samples | 171 stacks | 2026-06-21T06:36:51Z</text><text id="qsl-search" x="1190" y="24" text-anchor="end" fill="#990000" onclick="qslSearch()" style="cursor:pointer">Search</text><text id="qsl-detail" x="10" y="306" fill="#333"> </text><g class="func" data-name="all"><title>all (397 cpu-clock samples, 100.00%)allqsl-bench (397 cpu-clock samples, 100.00%)qsl-bench[unknown] (300 cpu-clock samples, 75.57%)[unknown][unknown] (278 cpu-clock samples, 70.03%)[unknown][unknown] (221 cpu-clock samples, 55.67%)[unknown][unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (1 cpu-clock samples, 0.25%)check_match (1 cpu-clock samples, 0.25%)do_lookup_x (2 cpu-clock samples, 0.50%)__libc_start_call_main (218 cpu-clock samples, 54.91%)__libc_start_call_mainmain (218 cpu-clock samples, 54.91%)mainqsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (30 cpu-clock samples, 7.56%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, 
qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (21 cpu-clock samples, 5.29%)qsl::e..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.27%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (10 cpu-clock samples, 2.52%)s..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (5 cpu-clock samples, 1.26%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (2 cpu-clock samples, 0.50%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned long) (23 cpu-clock samples, 5.79%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (22 cpu-clock samples, 5.54%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.27%)qs..cfree@GLIBC_2.17 (3 cpu-clock samples, 
0.76%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (46 cpu-clock samples, 11.59%)qsl::gateway::Se..qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.52%)q..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::gateway::(anonymous 
namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (3 cpu-clock samples, 0.76%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Fill const&) (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (28 cpu-clock samples, 7.05%)qsl::gate..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..operator new(unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (9 cpu-clock samples, 2.27%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (1 cpu-clock samples, 
0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (3 cpu-clock samples, 0.76%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (6 cpu-clock samples, 1.51%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (5 cpu-clock samples, 1.26%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (15 cpu-clock samples, 3.78%)qsl..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (4 cpu-clock samples, 1.01%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (32 cpu-clock samples, 8.06%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (4 cpu-clock samples, 1.01%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock 
samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.27%)qs..__memcpy_generic (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore 
const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 
1.51%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.76%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (3 cpu-clock samples, 0.76%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (16 cpu-clock samples, 4.03%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::contains(unsigned long) const (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, 
std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (32 cpu-clock samples, 8.06%)qsl::repla..qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (30 cpu-clock samples, 7.56%)qsl::repl..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.76%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.50%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.50%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 2.02%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long) (1 cpu-clock samples, 
0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.01%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) 
(1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (4 cpu-clock samples, 1.01%)malloc@plt (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (2 cpu-clock samples, 0.50%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (11 cpu-clock samples, 2.77%)q..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (9 cpu-clock samples, 2.27%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)_mid_memalign (3 cpu-clock samples, 0.76%)__posix_memalign (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.01%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (13 cpu-clock samples, 3.27%)qs..[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (9 cpu-clock samples, 2.27%)malloc (5 cpu-clock samples, 1.26%)operator delete(void*)@plt 
(1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)operator new(unsigned long) (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)posix_memalign@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 
0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (4 cpu-clock samples, 1.01%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.01%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)cfree@GLIBC_2.17 (6 cpu-clock samples, 1.51%)main (16 cpu-clock samples, 4.03%)main[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..operator new(unsigned long) (11 cpu-clock samples, 2.77%)o..malloc (9 cpu-clock samples, 2.27%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (4 cpu-clock samples, 1.01%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc@plt (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (6 cpu-clock samples, 1.51%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (10 cpu-clock samples, 2.52%)[..[unknown] (10 cpu-clock samples, 
2.52%)[..[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (4 cpu-clock samples, 1.01%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (17 cpu-clock samples, 4.28%)qsl:..free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)operator delete(void*, unsigned long, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)cfree@GLIBC_2.17 (2 cpu-clock samples, 
0.50%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (3 cpu-clock samples, 0.76%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.76%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (3 cpu-clock samples, 0.76%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (6 cpu-clock samples, 1.51%)free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t)@plt (5 cpu-clock samples, 1.26%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.25%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 423 samples | 163 stacks | 2026-06-21T12:54:50ZSearch all (423 cpu-clock samples, 100.00%)allqsl-bench (423 cpu-clock samples, 100.00%)qsl-bench[unknown] (343 cpu-clock samples, 81.09%)[unknown][unknown] (324 cpu-clock samples, 76.60%)[unknown][unknown] (286 cpu-clock samples, 67.61%)[unknown]__libc_start_call_main (286 cpu-clock samples, 
67.61%)__libc_start_call_mainmain (286 cpu-clock samples, 67.61%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (47 cpu-clock samples, 11.11%)qsl::engine::Or..qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (33 cpu-clock samples, 7.80%)qsl::engin..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (14 cpu-clock samples, 3.31%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (11 cpu-clock samples, 2.60%)s..std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (9 cpu-clock samples, 
2.13%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (5 cpu-clock samples, 1.18%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (2 cpu-clock samples, 0.47%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (30 cpu-clock samples, 7.09%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (30 cpu-clock samples, 7.09%)decltype(..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (17 cpu-clock samples, 4.02%)qsl..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, 
std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.71%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (23 cpu-clock samples, 5.44%)qsl::e..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (61 cpu-clock samples, 14.42%)qsl::gateway::Session..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (15 cpu-clock samples, 3.55%)qsl..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (7 cpu-clock samples, 1.65%)__memcpy_generic (1 cpu-clock samples, 0.24%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 
0.47%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (36 cpu-clock samples, 8.51%)qsl::gatewa..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (7 cpu-clock samples, 1.65%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (24 cpu-clock samples, 5.67%)qsl::e..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.84%)q..__memcpy_generic (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte 
const, 18446744073709551615ul>) (5 cpu-clock samples, 1.18%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (2 cpu-clock samples, 0.47%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (28 cpu-clock samples, 6.62%)qsl::pro..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (5 cpu-clock samples, 1.18%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 6.15%)qsl::re..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.24%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 4.02%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (6 cpu-clock samples, 1.42%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (4 cpu-clock samples, 0.95%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.47%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, 
qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (17 cpu-clock samples, 4.02%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::contains(unsigned long) const (6 cpu-clock samples, 1.42%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.71%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.47%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (36 cpu-clock samples, 8.51%)qsl::replay..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (34 cpu-clock samples, 8.04%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::cancel(unsigned long) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (8 cpu-clock samples, 1.89%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.42%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 
cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (15 cpu-clock samples, 3.55%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.07%)qs..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.89%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (6 cpu-clock samples, 1.42%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.36%)q..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (6 cpu-clock samples, 1.42%)[unknown] (4 cpu-clock samples, 0.95%)_mid_memalign (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 
0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (2 cpu-clock samples, 0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (2 cpu-clock samples, 0.47%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.36%)q..[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (3 cpu-clock samples, 0.71%)free@plt (2 cpu-clock samples, 0.47%)operator delete(void*)@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.60%)q..[unknown] (11 cpu-clock samples, 2.60%)[..[unknown] (11 cpu-clock samples, 2.60%)[..cfree@GLIBC_2.17 (8 cpu-clock samples, 1.89%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (2 cpu-clock samples, 0.47%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 
0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)[unknown] (2 cpu-clock samples, 0.47%)[unknown] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock 
samples, 0.24%)__libc_start_call_main (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.65%)_start (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (1 cpu-clock samples, 0.24%)do_lookup_x (1 cpu-clock samples, 0.24%)dl_relocate_ld (1 cpu-clock samples, 0.24%)_dl_lookup_symbol_x (2 cpu-clock samples, 0.47%)_dl_new_hash (1 cpu-clock samples, 0.24%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.24%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.24%)elf_machine_rela (1 cpu-clock samples, 0.24%)resolve_map (1 cpu-clock samples, 0.24%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (19 cpu-clock samples, 4.49%)decl..[unknown] (19 cpu-clock samples, 4.49%)[unk..[unknown] (19 cpu-clock samples, 4.49%)[unk..cfree@GLIBC_2.17 (19 cpu-clock samples, 4.49%)cfre..main (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (3 cpu-clock samples, 0.71%)operator delete(void*)@plt (1 cpu-clock samples, 
0.24%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc@plt (5 cpu-clock samples, 1.18%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.47%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..[unknown] (9 cpu-clock samples, 2.13%)[unknown] (9 cpu-clock samples, 2.13%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (2 cpu-clock samples, 0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (4 cpu-clock samples, 0.95%)malloc (3 cpu-clock samples, 0.71%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.71%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 
1.18%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)memcpy@plt (1 cpu-clock samples, 0.24%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (5 cpu-clock samples, 1.18%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > 
>::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
index 4560ad7..0cbec7f 100644
--- a/results/flamegraph.txt
+++ b/results/flamegraph.txt
@@ -8,19 +8,19 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type: Release
 Provenance version: 1
-Git commit (informational): 0c3b401
-Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242
+Git commit (informational): 872600a
+Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0
 Source digest scope: flamegraph-benchmark
 Dirty inputs: no
 Generated output: results/flamegraph.svg
-Date: 2026-06-21T06:36:51Z
+Date: 2026-06-21T12:54:50Z
 Benchmark binary: build/bench/qsl-bench
 Dataset: qsl-bench default synthetic benchmark suite
 Call graph: dwarf
 Record event: cpu-clock
 Sample freq: 4000 Hz
-Sample count: 397
-Folded stacks: 171
+Sample count: 423
+Folded stacks: 163
 Minimum samples for hot profile: 200
 Insufficient samples: no
 Record status: 0
@@ -34,25 +34,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent.
 Top 15 folded stacks (count stack):
- 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span)
- 9 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc
+ 28 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span)
+ 23 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
+ 19 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
+ 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&)
+ 10 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+ 9 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ 8 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17
+ 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
  7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17
- 7 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17
- 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main
- 6 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
- 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt
- 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt
- 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::protocol::decode_new_order(std::span)
- 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- 5 qsl-bench;operator new(unsigned long);malloc@plt
- 5 qsl-bench;std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&);operator new(unsigned long, std::align_val_t)@plt
- 5 qsl-bench;[unknown];qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&);[unknown];[unknown];operator new(unsigned long);malloc
- 5 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);[unknown];[unknown];[unknown];__posix_memalign;malloc
- 5 qsl-bench;[unknown];[unknown];qsl::gateway::(anonymous
namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector >&, unsigned long);[unknown];[unknown];operator new(unsigned long);malloc + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::has_symbol(unsigned int) const + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const Benchmark output: -order_book add/mod/cancel 200000 ops 151.3 ns/op 6607667 ops/sec -protocol encode+decode 500000 ops 21.8 ns/op 45829279 ops/sec -gateway session (fill) 200000 ops 132.3 ns/op 7556487 ops/sec -matching engine flow 5004 items 104.7 ns/item 9553139 items/sec -replay command log 5004 items 115.1 ns/item 8690129 items/sec +order_book add/mod/cancel 200000 ops 140.7 ns/op 7107229 ops/sec +protocol encode+decode 500000 ops 21.0 ns/op 47719996 ops/sec +gateway session (fill) 200000 ops 129.6 ns/op 7715309 ops/sec +matching engine flow 5004 items 102.3 ns/item 9773521 items/sec +replay command log 5004 items 111.8 ns/item 8946368 items/sec From 3e4c8e3cc7b34d200d62903b126e603a3f8215b8 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 20:51:59 
-0400
Subject: [PATCH 07/22] fix: enforce FIX-required ClOrdID on OrderCancelRequest (Codex review)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

decode_cancel_order validated only OrigClOrdID (41) and Symbol (55), so a
35=F message missing ClOrdID (tag 11) — which FIX requires and encode()
emits — was accepted. Validate tag 11 (present and numeric) on decode
without storing it, keeping decode symmetric with encode.

Adds a rejection test and clarifies the docs/fix_protocol.md note for tag 11.

make check 261/261; make asan 261/261.

Co-Authored-By: Claude Opus 4.8
---
 docs/fix_protocol.md             | 2 +-
 src/protocol/fix.cpp             | 7 +++++++
 tests/unit/test_fix_protocol.cpp | 6 ++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/fix_protocol.md b/docs/fix_protocol.md
index 5ecbf14..d25c8ce 100644
--- a/docs/fix_protocol.md
+++ b/docs/fix_protocol.md
@@ -53,7 +53,7 @@ adapter uses the standard FIX envelope:
 |-----|--------------|----------------|-------|
 | 34 | MsgSeqNum | sequence no. | as above |
 | 41 | OrigClOrdID | `order_id` | the order being cancelled |
-| 11 | ClOrdID | — | required by FIX; echoes `order_id` (the cancel request id is not modelled) |
+| 11 | ClOrdID | — | required by FIX; validated on decode, echoes `order_id` on encode (no separate cancel-request id is modelled) |
 | 55 | Symbol | `symbol` | decimal `SymbolId` |

 ## Deliberate simplifications
diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp
index b97e4a8..b63de16 100644
--- a/src/protocol/fix.cpp
+++ b/src/protocol/fix.cpp
@@ -336,6 +336,13 @@ FixDecodeResult decode_cancel_order(std::string_view msg) noexcept
     if (const FixError e = require_int(p, kTagOrigClOrdID, out.order_id); e != FixError::None) {
         return {e, {}};
     }
+    // ClOrdID (tag 11) is required by FIX on an OrderCancelRequest. CancelOrder
+    // does not model a separate cancel-request id, so it is validated (present
+    // and numeric) but not stored — keeping decode symmetric with encode.
+    OrderId clord_id = 0;
+    if (const FixError e = require_int(p, kTagClOrdID, clord_id); e != FixError::None) {
+        return {e, {}};
+    }
     if (const FixError e = require_int(p, kTagSymbol, out.symbol); e != FixError::None) {
         return {e, {}};
     }
diff --git a/tests/unit/test_fix_protocol.cpp b/tests/unit/test_fix_protocol.cpp
index 4b9194d..489e5b4 100644
--- a/tests/unit/test_fix_protocol.cpp
+++ b/tests/unit/test_fix_protocol.cpp
@@ -208,6 +208,12 @@ TEST_CASE("FIX missing required field rejects", "[fix]") {
     REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::MissingField);
 }

+TEST_CASE("FIX cancel without required ClOrdID rejects", "[fix]") {
+    // OrderCancelRequest lacking ClOrdID (tag 11), which FIX requires.
+    std::string body = field(35, "F") + field(34, "1") + field(41, "42") + field(55, "3");
+    REQUIRE(fix::decode_cancel_order(wrap(body)).error == fix::FixError::MissingField);
+}
+
 TEST_CASE("FIX invalid integer field rejects", "[fix]") {
     std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(55, "2") +
                        field(54, "1") + field(38, "abc") + field(40, "2") + field(44, "100") +

From 52de5b8cbaed5fb79ac0a14304d6e0b4070a0dd4 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 21:13:08 -0400
Subject: [PATCH 08/22] refactor: improve flamegraph.py code health (CodeScene gate)

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg:
Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road /
nested complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the
  parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was
  7) and extract _append_chrome, _frame_svg, _truncate; geometry constants
  (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass
  carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8
---
 scripts/flamegraph.py | 228 ++++++++++++++++++++++++++----------------
 1 file changed, 142 insertions(+), 86 deletions(-)

diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py
index 3af5110..96accb8 100755
--- a/scripts/flamegraph.py
+++ b/scripts/flamegraph.py
@@ -28,6 +28,12 @@
 import re
 import sys
 import zlib
+from dataclasses import dataclass
+
+# SVG layout constants (pixels).
+_SIDE = 10  # left/right margin
+_PAD_TOP = 54  # space above the frames for title/subtitle
+_PAD_BOTTOM = 16  # space below the frames for the detail line

 # perf-script stack frame line: leading whitespace, hex address, symbol, "(dso)".
 # C++ symbols contain spaces and parentheses, so the dso is taken as the final
@@ -58,38 +64,59 @@ def _clean_symbol(rest: str) -> str:
     return rest


-def fold_perf_script(lines) -> dict[str, int]:
-    """Collapse `perf script` output into {stack_string: sample_count}."""
-    folded: dict[str, int] = {}
-    comm = ""
-    stack: list[str] = []
-
-    def flush() -> None:
-        nonlocal stack, comm
-        if stack:
-            frames = list(reversed(stack))
-            if comm:
-                frames.insert(0, comm)
+class _Folder:
+    """Accumulates `perf script` samples into collapsed {stack: count} pairs.
+
+    Keeping the per-line state transitions as small methods keeps the parsing
+    loop flat (one if/elif/else) instead of a deeply nested block.
+    """
+
+    def __init__(self) -> None:
+        self.folded: dict[str, int] = {}
+        self._comm = ""
+        self._stack: list[str] = []
+
+    def start_sample(self, header: str) -> None:
+        # Header line: "comm pid timestamp: period event:". Finalize any prior
+        # sample (perf usually separates with a blank line, but not always).
+        self._flush()
+        self._comm = header.split()[0]
+
+    def add_frame(self, line: str) -> None:
+        m = _FRAME_RE.match(line)
+        if m:
+            self._stack.append(_clean_symbol(m.group("rest")))
+
+    def end_sample(self) -> None:
+        self._flush()
+        self._comm = ""
+
+    def _flush(self) -> None:
+        if self._stack:
+            frames = list(reversed(self._stack))  # perf prints leaf-first
+            if self._comm:
+                frames.insert(0, self._comm)
             key = ";".join(frames)
-            folded[key] = folded.get(key, 0) + 1
-        stack = []
+            self.folded[key] = self.folded.get(key, 0) + 1
+        self._stack = []
+
+    def result(self) -> dict[str, int]:
+        self._flush()
+        return self.folded
+

+def fold_perf_script(lines) -> dict[str, int]:
+    """Collapse `perf script` output into {stack_string: sample_count}."""
+    folder = _Folder()
     for raw in lines:
         line = raw.rstrip("\n")
         if not line.strip():
-            flush()
-            comm = ""
-            continue
-        if line[0].isspace():
-            m = _FRAME_RE.match(line)
-            if m:
-                stack.append(_clean_symbol(m.group("rest")))
-            continue
-        # Header line: "comm pid timestamp: period event:" -> capture comm.
- flush() - comm = line.split()[0] - flush() - return folded + folder.end_sample() + elif line[0].isspace(): + folder.add_frame(line) + else: + folder.start_sample(line) + return folder.result() def parse_collapsed(lines) -> dict[str, int]: @@ -163,31 +190,34 @@ def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None: cursor += child.value -def render_svg( - root: _Node, - *, - title: str, - subtitle: str, - width: int = 1200, - frame_height: int = 16, - min_px: float = 0.1, - countname: str = "samples", -) -> str: - total = root.value or 1 - placed: list = [] - _layout(root, 0, 0, total, placed) - max_depth = max((d for _, d, _ in placed), default=0) +@dataclass +class FlameOptions: + """Styling/labelling knobs for an SVG render.""" - pad_top = 54 - pad_bottom = 16 - side = 10 - plot_width = width - 2 * side - height = pad_top + (max_depth + 1) * frame_height + pad_bottom + title: str = "QSL Flame Graph" + subtitle: str = "" + countname: str = "samples" + width: int = 1200 + frame_height: int = 16 + min_px: float = 0.1 - def px(samples: int) -> float: - return samples / total * plot_width - parts: list[str] = [] +@dataclass +class _Canvas: + """Derived geometry passed to per-frame rendering.""" + + total: int + max_depth: int + height: int + plot_width: int + frame_height: int + min_px: float + countname: str + + +def _append_chrome(parts: list, opts: FlameOptions, height: int) -> None: + """Append the static page furniture: SVG root, style, title, controls.""" + width = opts.width parts.append( f'\n' f' float: parts.append(f'') parts.append( f'{html.escape(title)}' + f'font-size="17" font-weight="bold">{html.escape(opts.title)}' ) parts.append( f'' - f'{html.escape(subtitle)}' + f'{html.escape(opts.subtitle)}' ) parts.append( - f'Search' ) parts.append( - f' ' + f' ' ) - for node, depth, x in placed: - w = px(node.value) - if w < min_px: - continue - x_px = side + px(x) - y = pad_top + (max_depth - depth) * frame_height - pct = node.value / 
total * 100.0 - label = node.name - # Approx 7px per char at this font; reserve 6px padding. - maxchars = int((w - 6) / 7) - text = "" - if maxchars >= 3: - text = label if len(label) <= maxchars else label[: maxchars - 2] + ".." - tip = f"{label} ({node.value} {countname}, {pct:.2f}%)" - parts.append(f'') - parts.append(f"{html.escape(tip)}") - parts.append( - f'' + +def _truncate(label: str, width_px: float) -> str: + """Fit a label into a frame, ~7px/char with 6px padding (else nothing).""" + maxchars = int((width_px - 6) / 7) + if maxchars < 3: + return "" + return label if len(label) <= maxchars else label[: maxchars - 2] + ".." + + +def _frame_svg(c: _Canvas, node: _Node, depth: int, x: int) -> str: + """Render one frame's group, or "" when narrower than the cutoff.""" + w = node.value / c.total * c.plot_width + if w < c.min_px: + return "" + x_px = _SIDE + x / c.total * c.plot_width + y = _PAD_TOP + (c.max_depth - depth) * c.frame_height + pct = node.value / c.total * 100.0 + tip = f"{node.name} ({node.value} {c.countname}, {pct:.2f}%)" + out = [ + f'', + f"{html.escape(tip)}", + f'', + ] + text = _truncate(node.name, w) + if text: + out.append( + f'{html.escape(text)}' ) - if text: - parts.append( - f'{html.escape(text)}' - ) - parts.append("") + out.append("") + return "".join(out) + +def render_svg(root: _Node, opts: FlameOptions | None = None) -> str: + opts = opts or FlameOptions() + total = root.value or 1 + placed: list = [] + _layout(root, 0, 0, total, placed) + max_depth = max((d for _, d, _ in placed), default=0) + height = _PAD_TOP + (max_depth + 1) * opts.frame_height + _PAD_BOTTOM + canvas = _Canvas( + total=total, + max_depth=max_depth, + height=height, + plot_width=opts.width - 2 * _SIDE, + frame_height=opts.frame_height, + min_px=opts.min_px, + countname=opts.countname, + ) + + parts: list[str] = [] + _append_chrome(parts, opts, height) + for node, depth, x in placed: + parts.append(_frame_svg(canvas, node, depth, x)) parts.append("\n") 
return "".join(parts) @@ -299,15 +357,13 @@ def main(argv=None) -> int: return 1 root = build_tree(folded, args.root_name) - sys.stdout.write( - render_svg( - root, - title=args.title, - subtitle=args.subtitle, - width=args.width, - countname=args.countname, - ) + opts = FlameOptions( + title=args.title, + subtitle=args.subtitle, + countname=args.countname, + width=args.width, ) + sys.stdout.write(render_svg(root, opts)) return 0 From d4be2daf640d442415ed3b315ce7a0a564cdccdd Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 21:16:15 -0400 Subject: [PATCH 09/22] perf: regenerate flamegraph artifact after code-health refactor flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 --- results/flamegraph.svg | 12 +++++------ results/flamegraph.txt | 48 +++++++++++++++++++++--------------------- 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/results/flamegraph.svg b/results/flamegraph.svg index fc87dda..7882ae3 100644 --- a/results/flamegraph.svg +++ b/results/flamegraph.svg @@ -2,20 +2,20 @@ -QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 423 samples | 163 stacks | 2026-06-21T12:54:50ZSearch all (423 cpu-clock samples, 100.00%)allqsl-bench (423 cpu-clock samples, 100.00%)qsl-bench[unknown] (343 cpu-clock samples, 81.09%)[unknown][unknown] (324 cpu-clock samples, 76.60%)[unknown][unknown] (286 cpu-clock samples, 67.61%)[unknown]__libc_start_call_main (286 cpu-clock samples, 67.61%)__libc_start_call_mainmain (286 cpu-clock samples, 67.61%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (47 cpu-clock samples, 11.11%)qsl::engine::Or..qsl::engine::OrderBook::match_baseline(qsl::core::Side, 
qsl::engine::OrderBook::MatchContext&) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (33 cpu-clock samples, 7.80%)qsl::engin..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (14 cpu-clock samples, 3.31%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (11 cpu-clock samples, 2.60%)s..std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (9 cpu-clock samples, 2.13%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, 
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (5 cpu-clock samples, 1.18%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (2 cpu-clock samples, 0.47%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (30 cpu-clock samples, 7.09%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (30 cpu-clock samples, 7.09%)decltype(..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (17 cpu-clock samples, 4.02%)qsl..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, 
std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.71%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (23 cpu-clock samples, 5.44%)qsl::e..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (61 cpu-clock samples, 14.42%)qsl::gateway::Session..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (15 cpu-clock samples, 3.55%)qsl..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (7 cpu-clock samples, 1.65%)__memcpy_generic (1 cpu-clock samples, 0.24%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 0.47%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (36 cpu-clock samples, 8.51%)qsl::gatewa..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::has_symbol(unsigned 
int) const (7 cpu-clock samples, 1.65%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (24 cpu-clock samples, 5.67%)qsl::e..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.84%)q..__memcpy_generic (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (5 cpu-clock samples, 1.18%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (2 cpu-clock samples, 
0.47%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (28 cpu-clock samples, 6.62%)qsl::pro..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (5 cpu-clock samples, 1.18%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 6.15%)qsl::re..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 4.02%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (6 cpu-clock samples, 1.42%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (4 cpu-clock samples, 0.95%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.47%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (17 cpu-clock samples, 
4.02%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::contains(unsigned long) const (6 cpu-clock samples, 1.42%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.71%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.47%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long 
const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (36 cpu-clock samples, 8.51%)qsl::replay..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (34 cpu-clock samples, 8.04%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::cancel(unsigned long) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock 
samples, 0.95%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (8 cpu-clock samples, 1.89%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.42%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, 
qsl::core::TimeInForce) (15 cpu-clock samples, 3.55%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.07%)qs..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.89%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (6 cpu-clock samples, 1.42%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned 
long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.36%)q..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (6 cpu-clock samples, 1.42%)[unknown] (4 cpu-clock samples, 0.95%)_mid_memalign (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (2 cpu-clock samples, 
0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (2 cpu-clock samples, 0.47%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.36%)q..[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (3 cpu-clock samples, 0.71%)free@plt (2 cpu-clock samples, 0.47%)operator delete(void*)@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.60%)q..[unknown] (11 cpu-clock samples, 2.60%)[..[unknown] (11 cpu-clock samples, 2.60%)[..cfree@GLIBC_2.17 (8 cpu-clock samples, 1.89%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (2 cpu-clock samples, 0.47%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)[unknown] (2 cpu-clock 
samples, 0.47%)[unknown] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)__libc_start_call_main (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.65%)_start (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock 
samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (1 cpu-clock samples, 0.24%)do_lookup_x (1 cpu-clock samples, 0.24%)dl_relocate_ld (1 cpu-clock samples, 0.24%)_dl_lookup_symbol_x (2 cpu-clock samples, 0.47%)_dl_new_hash (1 cpu-clock samples, 0.24%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.24%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.24%)elf_machine_rela (1 cpu-clock samples, 0.24%)resolve_map (1 cpu-clock samples, 0.24%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (19 cpu-clock samples, 4.49%)decl..[unknown] (19 cpu-clock samples, 4.49%)[unk..[unknown] (19 cpu-clock samples, 4.49%)[unk..cfree@GLIBC_2.17 (19 cpu-clock samples, 4.49%)cfre..main (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (3 cpu-clock samples, 0.71%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc@plt (5 cpu-clock samples, 1.18%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)[unknown] (1 cpu-clock samples, 
0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.47%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..[unknown] (9 cpu-clock samples, 2.13%)[unknown] (9 cpu-clock samples, 2.13%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (2 cpu-clock samples, 0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (4 cpu-clock samples, 0.95%)malloc (3 cpu-clock samples, 0.71%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.71%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.18%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, 
std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)memcpy@plt (1 cpu-clock samples, 0.24%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (5 cpu-clock samples, 1.18%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock 
samples, 0.24%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 402 samples | 164 stacks | 2026-06-22T01:13:09ZSearch all (402 cpu-clock samples, 100.00%)allqsl-bench (402 cpu-clock samples, 100.00%)qsl-bench[unknown] (322 cpu-clock samples, 80.10%)[unknown][unknown] (296 cpu-clock samples, 73.63%)[unknown][unknown] (245 cpu-clock samples, 60.95%)[unknown][unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_dl_cache_libcmp (1 cpu-clock samples, 0.25%)check_match (2 cpu-clock samples, 0.50%)strcmp (1 cpu-clock samples, 0.25%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.25%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.25%)elf_machine_rela (1 cpu-clock samples, 0.25%)resolve_map (1 cpu-clock samples, 0.25%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.25%)__libc_start_call_main (241 cpu-clock samples, 59.95%)__libc_start_call_mainmain (241 cpu-clock samples, 59.95%)maincfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (43 cpu-clock samples, 10.70%)qsl::engine::Or..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, 
qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (8 cpu-clock samples, 1.99%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (26 cpu-clock samples, 6.47%)qsl::eng..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (12 cpu-clock samples, 2.99%)st..std::_Hashtable<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::cancel(unsigned long) (33 cpu-clock samples, 8.21%)qsl::engin..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (33 cpu-clock samples, 8.21%)decltype(a..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (18 cpu-clock samples, 4.48%)qsl:..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (6 cpu-clock samples, 1.49%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, 
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.75%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (11 cpu-clock samples, 2.74%)q..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (54 cpu-clock samples, 13.43%)qsl::gateway::Sessi..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (54 cpu-clock samples, 13.43%)qsl::gateway::Sessi..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (47 cpu-clock samples, 11.69%)qsl::gateway::Se..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (11 cpu-clock samples, 2.74%)q..qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Fill const&) (3 cpu-clock samples, 0.75%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned 
int, qsl::core::TimeInForce) (29 cpu-clock samples, 7.21%)qsl::gate..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (19 cpu-clock samples, 4.73%)qsl::..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.74%)q..__memcpy_generic (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (8 cpu-clock samples, 1.99%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.50%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (4 cpu-clock samples, 1.00%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.25%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (16 cpu-clock samples, 3.98%)qsl..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (2 cpu-clock samples, 0.50%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, 
qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 6.47%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.23%)qs..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 
0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.00%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (15 cpu-clock samples, 3.73%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 
1.49%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.00%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (4 cpu-clock samples, 1.00%)qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 
0.25%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (34 cpu-clock samples, 8.46%)qsl::replay..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (28 cpu-clock samples, 6.97%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::cancel(unsigned long) (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.75%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.75%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, 
std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (1 cpu-clock samples, 0.25%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 
2.49%)q..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.00%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, 
std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (7 cpu-clock samples, 1.74%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.75%)std::_Rb_tree<unsigned int, std::pair<unsigned int const, qsl::engine::OrderBook>, std::_Select1st<std::pair<unsigned int const, qsl::engine::OrderBook> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, qsl::engine::OrderBook> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned int const, qsl::engine::OrderBook> >*) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::~OrderBook() (1 cpu-clock samples, 0.25%)std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_erase(std::_Rb_tree_node<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > >*) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (5 cpu-clock samples, 1.24%)malloc@plt (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)_mid_memalign (3 cpu-clock samples, 0.75%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.00%)__posix_memalign (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (14 cpu-clock samples, 3.48%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (6 cpu-clock samples, 1.49%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (5 cpu-clock samples, 1.24%)malloc (5 cpu-clock samples, 1.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (12 cpu-clock samples, 2.99%)qs..[unknown] (10 cpu-clock samples, 2.49%)[..[unknown] (10 cpu-clock samples, 2.49%)[..cfree@GLIBC_2.17 (4 cpu-clock samples, 1.00%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (5 cpu-clock samples, 1.24%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.99%)qs..[unknown] (12 cpu-clock samples, 2.99%)[u..[unknown] (12 cpu-clock 
samples, 2.99%)[u..cfree@GLIBC_2.17 (6 cpu-clock samples, 1.49%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)main (3 cpu-clock samples, 0.75%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)posix_memalign@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 
0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.49%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)__libc_malloc2 (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.75%)__posix_memalign (2 cpu-clock samples, 0.50%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.24%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.00%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)cfree@GLIBC_2.17 (5 cpu-clock samples, 1.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)cfree@GLIBC_2.17 (9 cpu-clock samples, 2.24%)main (17 cpu-clock samples, 4.23%)main[unknown] (12 cpu-clock samples, 2.99%)[u..[unknown] (12 cpu-clock samples, 2.99%)[u..operator new(unsigned long) (12 cpu-clock samples, 2.99%)op..malloc (8 cpu-clock samples, 1.99%)operator delete(void*)@plt (3 cpu-clock samples, 0.75%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc@plt (3 cpu-clock samples, 0.75%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, 
qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.23%)qs..[unknown] (6 cpu-clock samples, 1.49%)[unknown] (6 cpu-clock samples, 1.49%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (4 cpu-clock samples, 1.00%)malloc (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (16 cpu-clock samples, 3.98%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (7 cpu-clock samples, 1.74%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (4 cpu-clock samples, 1.00%)malloc (4 cpu-clock samples, 1.00%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) 
[clone .isra.0] (1 cpu-clock samples, 0.25%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (2 cpu-clock samples, 0.50%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.75%)free@plt (2 cpu-clock samples, 
0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
index 0cbec7f..b0d8682 100644
--- a/results/flamegraph.txt
+++ b/results/flamegraph.txt
@@ -8,19 +8,19 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type: Release
 Provenance version: 1
-Git commit (informational): 872600a
-Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0
+Git commit (informational): 52de5b8
+Source digest: sha256:75c1d53ba776085cb43ed6c600692286ab547ec20c9dc7a2018a56c222673f3c
 Source digest scope: flamegraph-benchmark
 Dirty inputs: no
 Generated output: results/flamegraph.svg
-Date: 2026-06-21T12:54:50Z
+Date: 2026-06-22T01:13:09Z
 Benchmark binary: build/bench/qsl-bench
 Dataset: qsl-bench default synthetic benchmark suite
 Call graph: dwarf
 Record event: cpu-clock
 Sample freq: 4000 Hz
-Sample count: 423
-Folded stacks: 163
+Sample count: 402
+Folded stacks: 164
 Minimum samples for hot profile: 200
 Insufficient samples: no
 Record status: 0
@@ -34,25 +34,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent.
Top 15 folded stacks (count stack): - 28 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) - 23 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - 19 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 - 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) - 10 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - 9 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - 8 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::has_symbol(unsigned int) const - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main - 6 
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 16 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) + 12 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) + 9 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 8 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + 6 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 6 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 6 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const Benchmark output: -order_book add/mod/cancel 200000 ops 140.7 ns/op 7107229 ops/sec -protocol encode+decode 500000 ops 21.0 ns/op 47719996 ops/sec -gateway session (fill) 200000 ops 129.6 ns/op 7715309 ops/sec -matching engine flow 5004 items 102.3 ns/item 9773521 items/sec -replay command log 5004 items 111.8 ns/item 8946368 items/sec +order_book add/mod/cancel 200000 ops 133.5 ns/op 7487925 ops/sec +protocol encode+decode 500000 ops 20.7 ns/op 48254784 ops/sec +gateway session (fill) 200000 ops 128.0 ns/op 7812016 ops/sec +matching engine flow 5004 items 102.3 ns/item 9773237 items/sec +replay command log 5004 items 112.3 ns/item 8905762 items/sec From 9c68039a8d90993dc3b5b23bff73b714bff1b393 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 
2026 21:25:12 -0400
Subject: [PATCH 10/22] refactor: reduce decode_new_order complexity in fix.cpp (CodeScene gate)

CodeScene's delta gate scored src/protocol/fix.cpp at 8.02 (decode_new_order:
Complex Method / Complex Conditional / Overall Code Complexity). Restructure
without changing behavior:

- Extract map_side / map_ord_type / map_tif (the enum-code switches) and
  expect_msg_type (the tag-35 check) as small helpers.
- Add a FieldReader accumulator that reads required integer/coded fields and
  short-circuits on the first error, so decode_new_order and
  decode_cancel_order become a flat fluent chain plus a single error check
  instead of a long if-return ladder with three inline switches.

make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions.

Co-Authored-By: Claude Opus 4.8
---
 src/protocol/fix.cpp | 192 +++++++++++++++++++++++++------------------
 1 file changed, 112 insertions(+), 80 deletions(-)

diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp
index b63de16..c9a292b 100644
--- a/src/protocol/fix.cpp
+++ b/src/protocol/fix.cpp
@@ -172,6 +172,93 @@ template
     return FixError::None;
 }
 
+// Map FIX coded-enum characters to internal enums (Side 1/2, OrdType 1/2,
+// TIF 1/3). An unrecognized code is InvalidEnumValue.
+[[nodiscard]] FixError map_side(char c, Side &out) noexcept {
+    switch (c) {
+    case '1':
+        out = Side::Buy;
+        return FixError::None;
+    case '2':
+        out = Side::Sell;
+        return FixError::None;
+    default:
+        return FixError::InvalidEnumValue;
+    }
+}
+
+[[nodiscard]] FixError map_ord_type(char c, OrderType &out) noexcept {
+    switch (c) {
+    case '1':
+        out = OrderType::Market;
+        return FixError::None;
+    case '2':
+        out = OrderType::Limit;
+        return FixError::None;
+    default:
+        return FixError::InvalidEnumValue;
+    }
+}
+
+[[nodiscard]] FixError map_tif(char c, TimeInForce &out) noexcept {
+    switch (c) {
+    case '1':
+        out = TimeInForce::GTC;
+        return FixError::None;
+    case '3':
+        out = TimeInForce::IOC;
+        return FixError::None;
+    default:
+        return FixError::InvalidEnumValue;
+    }
+}
+
+// Confirm MsgType (tag 35) is present, single-character, and the expected type.
+[[nodiscard]] FixError expect_msg_type(const Parsed &p, char expected) noexcept {
+    const Field *type = find_field(p, kTagMsgType);
+    if (type == nullptr || type->value.size() != 1 || type->value.front() != expected) {
+        return FixError::UnknownMsgType;
+    }
+    return FixError::None;
+}
+
+// Reads required fields and short-circuits on the first error, so the typed
+// decoders stay a flat chain instead of a long if-return ladder.
+class FieldReader {
+  public:
+    explicit FieldReader(const Parsed &p) noexcept : p_(p) {}
+
+    template <class Int> FieldReader &integer(unsigned tag, Int &out) noexcept {
+        if (err_ == FixError::None) {
+            err_ = require_int(p_, tag, out);
+        }
+        return *this;
+    }
+    FieldReader &side(unsigned tag, Side &out) noexcept { return coded(tag, out, map_side); }
+    FieldReader &ord_type(unsigned tag, OrderType &out) noexcept {
+        return coded(tag, out, map_ord_type);
+    }
+    FieldReader &tif(unsigned tag, TimeInForce &out) noexcept { return coded(tag, out, map_tif); }
+
+    [[nodiscard]] FixError error() const noexcept { return err_; }
+
+  private:
+    template <class Enum, class Map> FieldReader &coded(unsigned tag, Enum &out, Map map) noexcept {
+        if (err_ != FixError::None) {
+            return *this;
+        }
+        char code = 0;
+        err_ = require_code(p_, tag, code);
+        if (err_ == FixError::None) {
+            err_ = map(code, out);
+        }
+        return *this;
+    }
+
+    const Parsed &p_;
+    FixError err_{FixError::None};
+};
+
 void append_field(std::string &dst, unsigned tag, std::string_view value) {
     dst += std::to_string(tag);
     dst += '=';
@@ -248,72 +335,25 @@ FixDecodeResult decode_new_order(std::string_view msg) noexcept {
     if (const FixError e = parse_envelope(msg, p); e != FixError::None) {
         return {e, {}};
     }
-    const Field *type = find_field(p, kTagMsgType);
-    if (type == nullptr || type->value.size() != 1 || type->value.front() != kMsgNewOrderSingle) {
-        return {FixError::UnknownMsgType, {}};
-    }
-
-    NewOrder out{};
-    SeqNo seq = 0; // standard header field (tag 34); validated but not stored.
-    if (const FixError e = require_int(p, kTagMsgSeqNum, seq); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagClOrdID, out.order_id); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagSymbol, out.symbol); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagOrderQty, out.quantity); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagPrice, out.price); e != FixError::None) {
+    if (const FixError e = expect_msg_type(p, kMsgNewOrderSingle); e != FixError::None) {
         return {e, {}};
     }
 
-    char side = 0;
-    char ord_type = 0;
-    char tif = 0;
-    if (const FixError e = require_code(p, kTagSide, side); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_code(p, kTagOrdType, ord_type); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_code(p, kTagTimeInForce, tif); e != FixError::None) {
+    NewOrder out{};
+    SeqNo seq = 0; // tag 34 (standard header); validated but not stored.
+    const FixError e = FieldReader(p)
+                           .integer(kTagMsgSeqNum, seq)
+                           .integer(kTagClOrdID, out.order_id)
+                           .integer(kTagSymbol, out.symbol)
+                           .integer(kTagOrderQty, out.quantity)
+                           .integer(kTagPrice, out.price)
+                           .side(kTagSide, out.side)
+                           .ord_type(kTagOrdType, out.type)
+                           .tif(kTagTimeInForce, out.tif)
+                           .error();
+    if (e != FixError::None) {
         return {e, {}};
     }
-
-    switch (side) {
-    case '1':
-        out.side = Side::Buy;
-        break;
-    case '2':
-        out.side = Side::Sell;
-        break;
-    default:
-        return {FixError::InvalidEnumValue, {}};
-    }
-    switch (ord_type) {
-    case '1':
-        out.type = OrderType::Market;
-        break;
-    case '2':
-        out.type = OrderType::Limit;
-        break;
-    default:
-        return {FixError::InvalidEnumValue, {}};
-    }
-    switch (tif) {
-    case '1':
-        out.tif = TimeInForce::GTC;
-        break;
-    case '3':
-        out.tif = TimeInForce::IOC;
-        break;
-    default:
-        return {FixError::InvalidEnumValue, {}};
-    }
     return {FixError::None, out};
 }
 
@@ -322,28 +362,20 @@ FixDecodeResult decode_cancel_order(std::string_view msg) noexcept
     if (const FixError e = parse_envelope(msg, p); e != FixError::None) {
         return {e, {}};
     }
-    const Field *type = find_field(p, kTagMsgType);
-    if (type == nullptr || type->value.size() != 1 ||
-        type->value.front() != kMsgOrderCancelRequest) {
-        return {FixError::UnknownMsgType, {}};
+    if (const FixError e = expect_msg_type(p, kMsgOrderCancelRequest); e != FixError::None) {
+        return {e, {}};
     }
 
     CancelOrder out{};
-    SeqNo seq = 0;
-    if (const FixError e = require_int(p, kTagMsgSeqNum, seq); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagOrigClOrdID, out.order_id); e != FixError::None) {
-        return {e, {}};
-    }
-    // ClOrdID (tag 11) is required by FIX on an OrderCancelRequest. CancelOrder
-    // does not model a separate cancel-request id, so it is validated (present
-    // and numeric) but not stored — keeping decode symmetric with encode.
-    OrderId clord_id = 0;
-    if (const FixError e = require_int(p, kTagClOrdID, clord_id); e != FixError::None) {
-        return {e, {}};
-    }
-    if (const FixError e = require_int(p, kTagSymbol, out.symbol); e != FixError::None) {
+    SeqNo seq = 0;        // tag 34
+    OrderId clord_id = 0; // tag 11 (ClOrdID): required by FIX, validated but not stored.
+    const FixError e = FieldReader(p)
+                           .integer(kTagMsgSeqNum, seq)
+                           .integer(kTagOrigClOrdID, out.order_id)
+                           .integer(kTagClOrdID, clord_id)
+                           .integer(kTagSymbol, out.symbol)
+                           .error();
+    if (e != FixError::None) {
         return {e, {}};
     }
     return {FixError::None, out};

From 4aec1d0b81d40a799d9c64077d87b57ed81ef230 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 21:28:01 -0400
Subject: [PATCH 11/22] refactor: flatten flamegraph.py remaining complexity (CodeScene)

Clear the last two CodeScene flags on scripts/flamegraph.py:

- _clean_symbol: replace the balanced-paren dso scan (a deep nested loop)
  with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso
  strings never contain parens, so a non-nested " (...)$" match is exact and
  won't strip a C++ signature's own parentheses.
- _layout: drop the unused `total` parameter (5 args -> 4).

Output unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8
---
 scripts/flamegraph.py | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py
index 96accb8..a9cc7f3 100755
--- a/scripts/flamegraph.py
+++ b/scripts/flamegraph.py
@@ -40,6 +40,10 @@
 # parenthesized group and the symbol is everything between the address and it.
 _FRAME_RE = re.compile(r"^\s+(?P[0-9a-fA-F]+)\s+(?P.*\S)\s*$")
 _OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$")
+# Trailing " (dso)" group. perf prints a space before the dso, and dso strings
+# (paths or "[unknown]") never contain parens, so a non-nested match is exact and
+# avoids stripping a C++ signature's own "(...)" (which has no preceding space).
+_DSO_RE = re.compile(r"\s+\([^()]*\)$")
 
 
 def _clean_symbol(rest: str) -> str:
@@ -47,21 +51,9 @@ def _clean_symbol(rest: str) -> str:
 
     Drops the trailing `(dso)` and the `+0xoffset`, matching stackcollapse-perf.
     """
-    # Strip the final "(...)" dso group if present (balanced at end of line).
-    if rest.endswith(")"):
-        depth = 0
-        for i in range(len(rest) - 1, -1, -1):
-            if rest[i] == ")":
-                depth += 1
-            elif rest[i] == "(":
-                depth -= 1
-                if depth == 0:
-                    rest = rest[:i].rstrip()
-                    break
+    rest = _DSO_RE.sub("", rest)
     rest = _OFFSET_RE.sub("", rest).strip()
-    if not rest:
-        return "[unknown]"
-    return rest
+    return rest if rest else "[unknown]"
 
 
 class _Folder:
@@ -181,12 +173,13 @@ def _color(name: str) -> str:
     return f"rgb({r},{g},{b})"
 
 
-def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None:
+def _layout(node: _Node, depth: int, x: int, out: list) -> None:
+    """Pre-order walk assigning each node a (depth, x-offset-in-samples)."""
     out.append((node, depth, x))
     cursor = x
     for name in sorted(node.children):
         child = node.children[name]
-        _layout(child, depth + 1, cursor, total, out)
+        _layout(child, depth + 1, cursor, out)
         cursor += child.value
 
 
@@ -286,7 +279,7 @@ def render_svg(root: _Node, opts: FlameOptions | None = None) -> str:
     opts = opts or FlameOptions()
     total = root.value or 1
     placed: list = []
-    _layout(root, 0, 0, total, placed)
+    _layout(root, 0, 0, placed)
     max_depth = max((d for _, d, _ in placed), default=0)
     height = _PAD_TOP + (max_depth + 1) * opts.frame_height + _PAD_BOTTOM
     canvas = _Canvas(

From 3905059ebade2a924e1fdb20d04b1aca512d8943 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 21:29:36 -0400
Subject: [PATCH 12/22] perf: regenerate flamegraph artifact after
complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). Co-Authored-By: Claude Opus 4.8 --- results/flamegraph.svg | 12 +++++----- results/flamegraph.txt | 50 +++++++++++++++++++++--------------------- 2 files changed, 31 insertions(+), 31 deletions(-) diff --git a/results/flamegraph.svg b/results/flamegraph.svg index 7882ae3..378b45b 100644 --- a/results/flamegraph.svg +++ b/results/flamegraph.svg @@ -2,20 +2,20 @@ -QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 402 samples | 164 stacks | 2026-06-22T01:13:09ZSearch all (402 cpu-clock samples, 100.00%)allqsl-bench (402 cpu-clock samples, 100.00%)qsl-bench[unknown] (322 cpu-clock samples, 80.10%)[unknown][unknown] (296 cpu-clock samples, 73.63%)[unknown][unknown] (245 cpu-clock samples, 60.95%)[unknown][unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_dl_cache_libcmp (1 cpu-clock samples, 0.25%)check_match (2 cpu-clock samples, 0.50%)strcmp (1 cpu-clock samples, 0.25%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.25%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.25%)elf_machine_rela (1 cpu-clock samples, 0.25%)resolve_map (1 cpu-clock samples, 0.25%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.25%)__libc_start_call_main (241 cpu-clock samples, 59.95%)__libc_start_call_mainmain (241 cpu-clock samples, 59.95%)maincfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (43 cpu-clock samples, 10.70%)qsl::engine::Or..decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (8 cpu-clock samples, 1.99%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (26 cpu-clock samples, 6.47%)qsl::eng..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.24%)std::__detail::_Map_base<unsigned 
long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (12 cpu-clock samples, 2.99%)st..std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::cancel(unsigned long) (33 cpu-clock samples, 8.21%)qsl::engin..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (33 cpu-clock samples, 8.21%)decltype(a..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (18 cpu-clock samples, 4.48%)qsl:..cfree@GLIBC_2.17 (3 
cpu-clock samples, 0.75%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (6 cpu-clock samples, 1.49%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.75%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (11 cpu-clock samples, 2.74%)q..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (54 cpu-clock samples, 13.43%)qsl::gateway::Sessi..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (54 cpu-clock samples, 13.43%)qsl::gateway::Sessi..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (47 cpu-clock samples, 11.69%)qsl::gateway::Se..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (11 cpu-clock samples, 2.74%)q..qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, 
std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Fill const&) (3 cpu-clock samples, 0.75%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (29 cpu-clock samples, 7.21%)qsl::gate..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (19 cpu-clock samples, 4.73%)qsl::..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.74%)q..__memcpy_generic (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (8 cpu-clock samples, 1.99%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.50%)qsl::protocol::decode_new_order(std::span<std::byte const, 
18446744073709551615ul>) (4 cpu-clock samples, 1.00%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.25%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (16 cpu-clock samples, 3.98%)qsl..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (2 cpu-clock samples, 0.50%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 6.47%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 
0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.23%)qs..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.00%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 
0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (15 cpu-clock samples, 3.73%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 1.49%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.00%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (4 cpu-clock samples, 1.00%)qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 
0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, 
std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (34 cpu-clock samples, 8.46%)qsl::replay..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (28 cpu-clock samples, 6.97%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::cancel(unsigned long) (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 
0.75%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.75%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> 
>::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (1 cpu-clock samples, 0.25%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.49%)q..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.00%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (7 cpu-clock samples, 1.74%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.75%)std::_Rb_tree<unsigned int, std::pair<unsigned int const, qsl::engine::OrderBook>, std::_Select1st<std::pair<unsigned int const, qsl::engine::OrderBook> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, qsl::engine::OrderBook> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned int const, qsl::engine::OrderBook> >*) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::~OrderBook() (1 cpu-clock samples, 0.25%)std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_erase(std::_Rb_tree_node<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >*) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (5 cpu-clock samples, 1.24%)malloc@plt (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)_mid_memalign (3 cpu-clock samples, 0.75%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.00%)__posix_memalign (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (14 cpu-clock samples, 3.48%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (6 cpu-clock samples, 1.49%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (5 cpu-clock samples, 1.24%)malloc (5 cpu-clock samples, 1.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (12 cpu-clock samples, 2.99%)qs..[unknown] (10 cpu-clock samples, 2.49%)[..[unknown] (10 cpu-clock samples, 2.49%)[..cfree@GLIBC_2.17 (4 
cpu-clock samples, 1.00%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (5 cpu-clock samples, 1.24%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.99%)qs..[unknown] (12 cpu-clock samples, 2.99%)[u..[unknown] (12 cpu-clock samples, 2.99%)[u..cfree@GLIBC_2.17 (6 cpu-clock samples, 1.49%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)main (3 cpu-clock samples, 0.75%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 
cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)posix_memalign@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.49%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)__libc_malloc2 (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.75%)__posix_memalign (2 cpu-clock samples, 0.50%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.24%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.00%)operator delete(void*)@plt (1 
cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)cfree@GLIBC_2.17 (5 cpu-clock samples, 1.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)cfree@GLIBC_2.17 (9 cpu-clock samples, 2.24%)main (17 cpu-clock samples, 4.23%)main[unknown] (12 cpu-clock samples, 
2.99%)[u..[unknown] (12 cpu-clock samples, 2.99%)[u..operator new(unsigned long) (12 cpu-clock samples, 2.99%)op..malloc (8 cpu-clock samples, 1.99%)operator delete(void*)@plt (3 cpu-clock samples, 0.75%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc@plt (3 cpu-clock samples, 0.75%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.23%)qs..[unknown] (6 cpu-clock samples, 1.49%)[unknown] (6 cpu-clock samples, 1.49%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (4 cpu-clock samples, 1.00%)malloc (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (16 cpu-clock samples, 3.98%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (7 cpu-clock samples, 1.74%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (4 cpu-clock samples, 1.00%)malloc (4 cpu-clock samples, 1.00%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 
0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (1 cpu-clock samples, 0.25%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (2 cpu-clock samples, 0.50%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long 
const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.75%)free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 416 samples | 165 stacks | 2026-06-22T01:28:01ZSearch all (416 cpu-clock samples, 100.00%)allqsl-bench (416 cpu-clock samples, 100.00%)qsl-bench[unknown] (335 cpu-clock samples, 80.53%)[unknown][unknown] (317 cpu-clock samples, 76.20%)[unknown][unknown] (276 cpu-clock samples, 66.35%)[unknown][unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (2 cpu-clock samples, 0.48%)do_lookup_x (2 cpu-clock samples, 0.48%)_dl_lookup_symbol_x (1 cpu-clock samples, 0.24%)_dl_new_hash (1 cpu-clock samples, 0.24%)__libc_start_call_main (273 cpu-clock samples, 65.62%)__libc_start_call_mainmain (273 cpu-clock samples, 65.62%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (39 cpu-clock samples, 9.38%)qsl::engine:..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) 
const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (30 cpu-clock samples, 7.21%)qsl::engi..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.12%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (12 cpu-clock samples, 2.88%)st..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, 
std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (8 cpu-clock samples, 1.92%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (2 cpu-clock samples, 0.48%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (42 cpu-clock samples, 10.10%)qsl::engine::O..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (41 cpu-clock samples, 9.86%)decltype(auto..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (19 
cpu-clock samples, 4.57%)qsl:..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (2 cpu-clock samples, 0.48%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.24%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (13 cpu-clock samples, 3.12%)qs..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (74 cpu-clock samples, 17.79%)qsl::gateway::Session::on_b..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (73 cpu-clock samples, 17.55%)qsl::gateway::Session::on_..qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (68 cpu-clock samples, 16.35%)qsl::gateway::Session::p..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (20 cpu-clock samples, 4.81%)qsl::..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)qsl::gateway::(anonymous 
namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (10 cpu-clock samples, 2.40%)q..__memcpy_generic (4 cpu-clock samples, 0.96%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.24%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 0.48%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (34 cpu-clock samples, 8.17%)qsl::gatew..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (3 cpu-clock samples, 0.72%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (25 cpu-clock samples, 6.01%)qsl::en..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.37%)qs..__memcpy_generic (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 
0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (9 cpu-clock samples, 2.16%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (2 cpu-clock samples, 0.48%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (5 cpu-clock samples, 1.20%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.72%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (14 cpu-clock samples, 3.37%)qs..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (6 cpu-clock samples, 1.44%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (29 cpu-clock samples, 6.97%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::cancel(unsigned long) (2 cpu-clock samples, 0.48%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (4 cpu-clock samples, 0.96%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.48%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 
0.48%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.48%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 4.09%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.88%)qs..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (6 cpu-clock samples, 1.44%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (4 cpu-clock samples, 0.96%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 0.96%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, 
std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.72%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (18 cpu-clock samples, 4.33%)qsl:..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (8 cpu-clock samples, 1.92%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (4 cpu-clock samples, 0.96%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (33 cpu-clock samples, 7.93%)qsl::repla..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, 
qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (28 cpu-clock samples, 6.73%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (9 cpu-clock samples, 2.16%)qsl::engine::OrderBook::can_apply_modify(unsigned long, long, unsigned int) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (5 cpu-clock samples, 1.20%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 
0.72%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (15 cpu-clock samples, 3.61%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 1.92%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long 
const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned 
long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (4 cpu-clock samples, 0.96%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.40%)q..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (7 cpu-clock samples, 1.68%)[unknown] (5 cpu-clock samples, 1.20%)_mid_memalign (5 cpu-clock samples, 1.20%)__posix_memalign (2 cpu-clock samples, 0.48%)_mid_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.72%)__posix_memalign (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.40%)q..[unknown] (9 cpu-clock samples, 2.16%)[unknown] (9 cpu-clock samples, 2.16%)[unknown] (8 cpu-clock samples, 1.92%)[unknown] (3 cpu-clock samples, 0.72%)_mid_memalign (3 cpu-clock samples, 0.72%)__posix_memalign (5 cpu-clock samples, 1.20%)malloc (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (9 cpu-clock samples, 2.16%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.48%)operator new(unsigned long) (3 cpu-clock 
samples, 0.72%)malloc (2 cpu-clock samples, 0.48%)free@plt (3 cpu-clock samples, 0.72%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc@plt (3 cpu-clock samples, 0.72%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 
0.48%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (4 cpu-clock samples, 0.96%)operator new(unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 0.96%)__libc_start_call_main (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.68%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (11 cpu-clock samples, 2.64%)d..[unknown] (11 cpu-clock samples, 2.64%)[..[unknown] (11 cpu-clock samples, 2.64%)[..cfree@GLIBC_2.17 (11 cpu-clock samples, 2.64%)c..main (14 cpu-clock samples, 3.37%)main[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (9 cpu-clock samples, 2.16%)malloc (6 cpu-clock samples, 1.44%)free@plt 
(1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.48%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 0.96%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (20 cpu-clock samples, 4.81%)qsl::..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (13 cpu-clock samples, 3.12%)[u..[unknown] (9 cpu-clock samples, 2.16%)_mid_memalign (9 cpu-clock samples, 2.16%)__posix_memalign (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.48%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 
0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (4 cpu-clock samples, 0.96%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.48%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc (3 cpu-clock samples, 0.72%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.48%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%) diff --git a/results/flamegraph.txt b/results/flamegraph.txt index b0d8682..ea163fb 100644 --- a/results/flamegraph.txt +++ b/results/flamegraph.txt @@ -8,19 +8,19 @@ Perf: perf 
version 6.19.14-400.asahi.fc44.aarch64 Perf paranoid: 2 Build type: Release Provenance version: 1 -Git commit (informational): 52de5b8 -Source digest: sha256:75c1d53ba776085cb43ed6c600692286ab547ec20c9dc7a2018a56c222673f3c +Git commit (informational): 4aec1d0 +Source digest: sha256:619c700c4c9b872ffd42e0b4145d73f06548f971c50b2158398a7722b3d5f41a Source digest scope: flamegraph-benchmark Dirty inputs: no Generated output: results/flamegraph.svg -Date: 2026-06-22T01:13:09Z +Date: 2026-06-22T01:28:01Z Benchmark binary: build/bench/qsl-bench Dataset: qsl-bench default synthetic benchmark suite Call graph: dwarf Record event: cpu-clock Sample freq: 4000 Hz -Sample count: 402 -Folded stacks: 164 +Sample count: 416 +Folded stacks: 165 Minimum samples for hot profile: 200 Insufficient samples: no Record status: 0 @@ -34,25 +34,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent. Top 15 folded stacks (count stack): - 16 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) - 12 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - 9 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 - 8 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) - 6 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign - 6 
qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);[unknown];[unknown];[unknown];[unknown];_mid_memalign - 6 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const + 20 
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) + 13 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) + 11 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> 
std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) + 9 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const + 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, 
qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 6 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) Benchmark output: -order_book add/mod/cancel 200000 ops 133.5 ns/op 7487925 ops/sec -protocol encode+decode 500000 ops 20.7 ns/op 48254784 ops/sec -gateway session (fill) 200000 ops 128.0 ns/op 7812016 ops/sec -matching engine flow 5004 items 102.3 ns/item 9773237 items/sec -replay command log 5004 items 112.3 ns/item 8905762 items/sec +order_book add/mod/cancel 200000 ops 132.9 ns/op 7523640 ops/sec +protocol encode+decode 500000 ops 19.8 ns/op 50418890 ops/sec +gateway session (fill) 200000 ops 127.6 ns/op 7838397 ops/sec +matching engine flow 5004 items 102.5 ns/item 9759934 items/sec +replay command log 5004 items 111.8 ns/item 8943232 items/sec From 68fe197521c6ce60a8eb796d185fcdc4540d6ba2 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 21:34:56 -0400 Subject: [PATCH 13/22] refactor: remove decoder 
duplication and split parse_envelope (CodeScene) Clear the remaining CodeScene flags on src/protocol/fix.cpp (Code Duplication, Complex Method, Complex Conditional): - Extract a decode_typed skeleton (validate envelope -> confirm MsgType -> fill body via FieldReader -> error check) so decode_new_order and decode_cancel_order collapse to just their field maps, removing the duplicated prologue/epilogue. - Split parse_envelope into tokenize / check_envelope_shape / verify_length_and_checksum, each a small single-purpose function, and fold the 8/9/10 ordering check into a named bool. Behavior unchanged: make check 261/261, make asan 261/261, FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 --- src/protocol/fix.cpp | 145 ++++++++++++++++++++++--------------------- 1 file changed, 74 insertions(+), 71 deletions(-) diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp index c9a292b..f6b8976 100644 --- a/src/protocol/fix.cpp +++ b/src/protocol/fix.cpp @@ -61,66 +61,60 @@ template [[nodiscard]] bool parse_int(std::string_view sv, Int &out) return nullptr; } -// Validate the FIX envelope: SOH-delimited tag=value framing, the 8/9/.../10 -// ordering, BodyLength (tag 9), and the mod-256 CheckSum (tag 10). On success the -// field table is filled; business fields are looked up by the typed decoders. -[[nodiscard]] FixError parse_envelope(std::string_view msg, Parsed &out) noexcept { - if (msg.empty() || msg.size() > kMaxMessageLen) { - return FixError::Malformed; - } - +// Split the message into SOH-delimited tag=value fields. Malformed framing +// (missing SOH, missing '=', non-numeric tag, too many fields) is rejected. 
+[[nodiscard]] FixError tokenize(std::string_view msg, Parsed &out) noexcept { std::size_t pos = 0; while (pos < msg.size()) { const std::size_t field_start = pos; const std::size_t soh = msg.find(kSoh, pos); if (soh == std::string_view::npos) { - return FixError::Malformed; // a field is not SOH-terminated + return FixError::Malformed; // field not SOH-terminated } const std::size_t eq = msg.find('=', pos); if (eq == std::string_view::npos || eq >= soh) { return FixError::Malformed; // no '=' within the field } - const std::string_view tag_sv = msg.substr(field_start, eq - field_start); - const std::string_view val_sv = msg.substr(eq + 1, soh - (eq + 1)); unsigned tag = 0; - if (!parse_int(tag_sv, tag)) { + if (!parse_int(msg.substr(field_start, eq - field_start), tag)) { return FixError::Malformed; // non-numeric tag } if (out.count >= kMaxFields) { return FixError::Malformed; // too many fields } - out.fields[out.count++] = Field{tag, val_sv, field_start}; + out.fields[out.count++] = Field{tag, msg.substr(eq + 1, soh - (eq + 1)), field_start}; pos = soh + 1; } + return FixError::None; +} - if (out.count < 3) { +// Confirm the 8 / 9 / ... / 10 field ordering and the supported BeginString. +[[nodiscard]] FixError check_envelope_shape(const Parsed &p) noexcept { + if (p.count < 3) { return FixError::Malformed; } - const Field &f_begin = out.fields[0]; - const Field &f_len = out.fields[1]; - const Field &f_csum = out.fields[out.count - 1]; - if (f_begin.tag != kTagBeginString || f_len.tag != kTagBodyLength || - f_csum.tag != kTagCheckSum) { + const Field &begin = p.fields[0]; + const bool ordered = begin.tag == kTagBeginString && p.fields[1].tag == kTagBodyLength && + p.fields[p.count - 1].tag == kTagCheckSum; + if (!ordered) { return FixError::Malformed; } - if (f_begin.value != kBeginString) { - return FixError::UnsupportedBeginString; - } + return begin.value == kBeginString ? 
FixError::None : FixError::UnsupportedBeginString; +} - // BodyLength counts the bytes from the first field after tag 9 through the - // SOH preceding tag 10, i.e. [fields[2].start, checksum_field.start). - const std::size_t body_start = out.fields[2].start; +// Verify BodyLength (tag 9) against the actual body span and the mod-256 +// CheckSum (tag 10) against the sum of every byte before the tag-10 field. +[[nodiscard]] FixError verify_length_and_checksum(std::string_view msg, const Parsed &p) noexcept { + const Field &f_csum = p.fields[p.count - 1]; + // BodyLength spans [fields[2].start, checksum_field.start). const std::size_t checksum_start = f_csum.start; std::size_t body_len = 0; - if (!parse_int(f_len.value, body_len)) { + if (!parse_int(p.fields[1].value, body_len)) { return FixError::InvalidField; } - if (body_len != checksum_start - body_start) { + if (body_len != checksum_start - p.fields[2].start) { return FixError::BodyLengthMismatch; } - - // CheckSum is the mod-256 sum of every byte up to the SOH before tag 10, - // formatted as exactly three digits. unsigned declared = 0; if (f_csum.value.size() != 3 || !parse_int(f_csum.value, declared)) { return FixError::InvalidField; @@ -129,10 +123,22 @@ template [[nodiscard]] bool parse_int(std::string_view sv, Int &out) for (std::size_t i = 0; i < checksum_start; ++i) { sum += static_cast(msg[i]); } - if ((sum & 0xFFu) != declared) { - return FixError::ChecksumMismatch; + return (sum & 0xFFu) == declared ? FixError::None : FixError::ChecksumMismatch; +} + +// Validate the FIX envelope and fill the field table; business fields are then +// looked up by the typed decoders. 
+[[nodiscard]] FixError parse_envelope(std::string_view msg, Parsed &out) noexcept { + if (msg.empty() || msg.size() > kMaxMessageLen) { + return FixError::Malformed; + } + if (const FixError e = tokenize(msg, out); e != FixError::None) { + return e; } - return FixError::None; + if (const FixError e = check_envelope_shape(out); e != FixError::None) { + return e; + } + return verify_length_and_checksum(msg, out); } // Extract a required integer field; map absence/format/overflow to structured @@ -330,55 +336,52 @@ std::string encode(const CancelOrder &msg, SeqNo seq) { return frame(body); } -FixDecodeResult decode_new_order(std::string_view msg) noexcept { +// Shared typed-decode skeleton: validate the envelope, confirm MsgType, then let +// `fill` read the body fields through a FieldReader (which short-circuits on the +// first error). Keeps the two public decoders to just their field maps. +template +[[nodiscard]] FixDecodeResult decode_typed(std::string_view msg, char expected, + Fill fill) noexcept { Parsed p; if (const FixError e = parse_envelope(msg, p); e != FixError::None) { return {e, {}}; } - if (const FixError e = expect_msg_type(p, kMsgNewOrderSingle); e != FixError::None) { + if (const FixError e = expect_msg_type(p, expected); e != FixError::None) { return {e, {}}; } - - NewOrder out{}; - SeqNo seq = 0; // tag 34 (standard header); validated but not stored. 
- const FixError e = FieldReader(p) - .integer(kTagMsgSeqNum, seq) - .integer(kTagClOrdID, out.order_id) - .integer(kTagSymbol, out.symbol) - .integer(kTagOrderQty, out.quantity) - .integer(kTagPrice, out.price) - .side(kTagSide, out.side) - .ord_type(kTagOrdType, out.type) - .tif(kTagTimeInForce, out.tif) - .error(); - if (e != FixError::None) { - return {e, {}}; + T out{}; + FieldReader reader(p); + fill(reader, out); + if (reader.error() != FixError::None) { + return {reader.error(), {}}; } return {FixError::None, out}; } -FixDecodeResult decode_cancel_order(std::string_view msg) noexcept { - Parsed p; - if (const FixError e = parse_envelope(msg, p); e != FixError::None) { - return {e, {}}; - } - if (const FixError e = expect_msg_type(p, kMsgOrderCancelRequest); e != FixError::None) { - return {e, {}}; - } +FixDecodeResult decode_new_order(std::string_view msg) noexcept { + return decode_typed(msg, kMsgNewOrderSingle, [](FieldReader &r, NewOrder &o) { + SeqNo seq = 0; // tag 34 (standard header); validated but not stored. + r.integer(kTagMsgSeqNum, seq) + .integer(kTagClOrdID, o.order_id) + .integer(kTagSymbol, o.symbol) + .integer(kTagOrderQty, o.quantity) + .integer(kTagPrice, o.price) + .side(kTagSide, o.side) + .ord_type(kTagOrdType, o.type) + .tif(kTagTimeInForce, o.tif); + }); +} - CancelOrder out{}; - SeqNo seq = 0; // tag 34 - OrderId clord_id = 0; // tag 11 (ClOrdID): required by FIX, validated but not stored. - const FixError e = FieldReader(p) - .integer(kTagMsgSeqNum, seq) - .integer(kTagOrigClOrdID, out.order_id) - .integer(kTagClOrdID, clord_id) - .integer(kTagSymbol, out.symbol) - .error(); - if (e != FixError::None) { - return {e, {}}; - } - return {FixError::None, out}; +FixDecodeResult decode_cancel_order(std::string_view msg) noexcept { + return decode_typed( + msg, kMsgOrderCancelRequest, [](FieldReader &r, CancelOrder &o) { + SeqNo seq = 0; // tag 34 + OrderId clord_id = 0; // tag 11 (ClOrdID): required by FIX, validated but not stored. 
+ r.integer(kTagMsgSeqNum, seq) + .integer(kTagOrigClOrdID, o.order_id) + .integer(kTagClOrdID, clord_id) + .integer(kTagSymbol, o.symbol); + }); } FixDecodeResult peek_msg_type(std::string_view msg) noexcept { From cb8f99ce9e05cec294102142b1b3205efbec31ec Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 21:40:38 -0400 Subject: [PATCH 14/22] refactor: table-driven enum maps + simpler msg-type check (CodeScene) Clear the last two CodeScene advisory flags on src/protocol/fix.cpp: - Code Duplication: the three near-identical map_side/map_ord_type/map_tif switches are replaced by one generic FieldReader::coded(tag, out, table) that looks a code up in a small constexpr {code, enum} table; the decoder supplies the per-enum tables. No per-enum mapping duplication remains. - Complex Conditional: expect_msg_type's 3-term `||` is split so each branch has a single operator. make check 261/261; make asan 261/261; FIX tests 19 cases / 140 assertions. Co-Authored-By: Claude Opus 4.8 --- src/protocol/fix.cpp | 87 +++++++++++++++----------------------------- 1 file changed, 30 insertions(+), 57 deletions(-) diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp index f6b8976..a8c75ec 100644 --- a/src/protocol/fix.cpp +++ b/src/protocol/fix.cpp @@ -6,6 +6,7 @@ #include #include #include +#include namespace qsl::protocol::fix { @@ -178,54 +179,13 @@ template return FixError::None; } -// Map FIX coded-enum characters to internal enums (Side 1/2, OrdType 1/2, -// TIF 1/3). An unrecognized code is InvalidEnumValue. 
-[[nodiscard]] FixError map_side(char c, Side &out) noexcept { - switch (c) { - case '1': - out = Side::Buy; - return FixError::None; - case '2': - out = Side::Sell; - return FixError::None; - default: - return FixError::InvalidEnumValue; - } -} - -[[nodiscard]] FixError map_ord_type(char c, OrderType &out) noexcept { - switch (c) { - case '1': - out = OrderType::Market; - return FixError::None; - case '2': - out = OrderType::Limit; - return FixError::None; - default: - return FixError::InvalidEnumValue; - } -} - -[[nodiscard]] FixError map_tif(char c, TimeInForce &out) noexcept { - switch (c) { - case '1': - out = TimeInForce::GTC; - return FixError::None; - case '3': - out = TimeInForce::IOC; - return FixError::None; - default: - return FixError::InvalidEnumValue; - } -} - // Confirm MsgType (tag 35) is present, single-character, and the expected type. [[nodiscard]] FixError expect_msg_type(const Parsed &p, char expected) noexcept { const Field *type = find_field(p, kTagMsgType); - if (type == nullptr || type->value.size() != 1 || type->value.front() != expected) { + if (type == nullptr || type->value.size() != 1) { return FixError::UnknownMsgType; } - return FixError::None; + return type->value.front() == expected ? FixError::None : FixError::UnknownMsgType; } // Reads required fields and short-circuits on the first error, so the typed @@ -240,27 +200,34 @@ class FieldReader { } return *this; } - FieldReader &side(unsigned tag, Side &out) noexcept { return coded(tag, out, map_side); } - FieldReader &ord_type(unsigned tag, OrderType &out) noexcept { - return coded(tag, out, map_ord_type); - } - FieldReader &tif(unsigned tag, TimeInForce &out) noexcept { return coded(tag, out, map_tif); } - - [[nodiscard]] FixError error() const noexcept { return err_; } - private: - template FieldReader &coded(unsigned tag, Enum &out, Map map) noexcept { + // Read a single-character coded field and map it via a {code, enum} table + // (Side 1/2, OrdType 1/2, TIF 1/3). 
One generic method covers every enum, so + // there is no per-enum mapping duplication. An unknown code is InvalidEnumValue. + template + FieldReader &coded(unsigned tag, Enum &out, + const std::array, N> &table) noexcept { if (err_ != FixError::None) { return *this; } char code = 0; err_ = require_code(p_, tag, code); - if (err_ == FixError::None) { - err_ = map(code, out); + if (err_ != FixError::None) { + return *this; + } + for (const auto &entry : table) { + if (entry.first == code) { + out = entry.second; + return *this; + } } + err_ = FixError::InvalidEnumValue; return *this; } + [[nodiscard]] FixError error() const noexcept { return err_; } + + private: const Parsed &p_; FixError err_{FixError::None}; }; @@ -360,15 +327,21 @@ template FixDecodeResult decode_new_order(std::string_view msg) noexcept { return decode_typed(msg, kMsgNewOrderSingle, [](FieldReader &r, NewOrder &o) { + static constexpr std::array, 2> sides{ + {{'1', Side::Buy}, {'2', Side::Sell}}}; + static constexpr std::array, 2> types{ + {{'1', OrderType::Market}, {'2', OrderType::Limit}}}; + static constexpr std::array, 2> tifs{ + {{'1', TimeInForce::GTC}, {'3', TimeInForce::IOC}}}; SeqNo seq = 0; // tag 34 (standard header); validated but not stored. 
r.integer(kTagMsgSeqNum, seq) .integer(kTagClOrdID, o.order_id) .integer(kTagSymbol, o.symbol) .integer(kTagOrderQty, o.quantity) .integer(kTagPrice, o.price) - .side(kTagSide, o.side) - .ord_type(kTagOrdType, o.type) - .tif(kTagTimeInForce, o.tif); + .coded(kTagSide, o.side, sides) + .coded(kTagOrdType, o.type, types) + .coded(kTagTimeInForce, o.tif, tifs); }); } From d630fbdc558156ddb5df653a2c308c53cec16cca Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 21:59:12 -0400 Subject: [PATCH 15/22] =?UTF-8?q?release:=20v0.2.1=20=E2=80=94=20FIX=20tex?= =?UTF-8?q?t=20protocol=20adapter,=20perf=20flamegraph,=20anchor=20sweep?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bump the project version 0.2.0 -> 0.2.1 and record the release. - CMakeLists.txt: project VERSION 0.2.1. - CHANGELOG.md: new [0.2.1] section — FIX-like text protocol adapter (#29), `make flamegraph` + dependency-free renderer (#32), and the Codex resume-anchor / PMU-claim consistency sweep (#127/#128 follow-up); [Unreleased] reset. - PROGRESS.md / HANDOFF.md: release and resume anchors brought to the v0.2.1 released state; #29 and #32 marked done. No code or benchmark artifacts change in this release PR. On squash-merge, tag `v0.2.1` on the merge commit and publish the GitHub release. make check 261/261. Co-Authored-By: Claude Opus 4.8 --- CHANGELOG.md | 35 ++++++++++++++++++++++++++ CMakeLists.txt | 2 +- HANDOFF.md | 45 +++++++++++++++++++--------------- PROGRESS.md | 66 +++++++++++++++++++++++++++++++------------------- 4 files changed, 103 insertions(+), 45 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 813bef7..98f8da8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,41 @@ All notable changes to this project. 
The format is loosely based on _Nothing yet._ +## [0.2.1] - 2026-06-21 + +Two backlog items — reprioritized by the maintainer and delivered — plus a resume-anchor and +perf-evidence consistency sweep. Same honesty bar as prior releases: a deterministic C++20 exchange +simulator and cross-language differential-testing harness — **not** a production exchange, no +real-market connectivity, no latency or profitability claims, and not formal verification. + +### Added + +- **FIX-like text protocol adapter (#29).** A human-readable `tag=value` (SOH-framed) codec + (`include/qsl/protocol/fix.hpp`, `src/protocol/fix.cpp`) over the **same internal message structs** + as the binary codec, with genuine FIX framing — BeginString (8) / BodyLength (9) / MsgType (35) / + … / mod-256 CheckSum (10) — for the client→gateway order path: NewOrderSingle (`35=D`) → `NewOrder` + and OrderCancelRequest (`35=F`) → `CancelOrder`. Decoding is total, deterministic, and `noexcept` + (fixed field table, `std::from_chars`, `std::string_view`; no heap on the decode path) and reports + every malformed input through a `FixError` taxonomy mirroring the binary codec's `DecodeError`. + Covered by `tests/unit/test_fix_protocol.cpp`, including a **cross-codec equivalence** test (binary + and FIX decode the same order to identical structs) and a byte-pinned fixture; documented in + `docs/fix_protocol.md`. Prices stay integer ticks and Symbol carries the numeric `SymbolId` + (documented simplifications, never floating-point price). +- **`make flamegraph` (#32).** Renders a Linux `perf` call-graph flamegraph + (`results/flamegraph.svg` + a provenance/classification `results/flamegraph.txt`) from the + benchmark harness via `scripts/flamegraph.py` — a dependency-free (stdlib-only) stackcollapse + SVG + renderer (deterministic; unit-tested in `tests/shell/test_flamegraph.sh`), so the artifact is + reproducible from the repo without vendoring the Perl FlameGraph toolkit. 
The committed artifact is + a software cpu-clock sampling **hot-symbol profile** from the bare-metal Fedora Asahi host — not a + latency/throughput claim; full hardware cache-PMU evidence stays in issue #90. + +### Changed + +- Synced the `/resume` anchors and perf-evidence wording to the released `v0.2.0` state and narrowed + an overstated Apple **Blizzard** (E-core) PMU claim — those rows read `` because the + single-threaded benchmark stays on the Avalanche P-cores (Codex follow-up to PRs #127/#128): + `PROGRESS.md`, `AGENTS.md`/`CLAUDE.md` agreement, and `docs/perf_analysis.md`. + ## [0.2.0] - 2026-06-21 Quant Systems Lab v0.2.0 — the Phase III/IV systems arc (M24–M49: a bounded SPSC queue and threaded diff --git a/CMakeLists.txt b/CMakeLists.txt index 383a4e6..19fc3b1 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -1,5 +1,5 @@ cmake_minimum_required(VERSION 3.24) -project(quant-systems-lab VERSION 0.2.0 LANGUAGES CXX) +project(quant-systems-lab VERSION 0.2.1 LANGUAGES CXX) set(CMAKE_CXX_STANDARD 20) set(CMAKE_CXX_STANDARD_REQUIRED ON) diff --git a/HANDOFF.md b/HANDOFF.md index 0bee945..0c949cc 100644 --- a/HANDOFF.md +++ b/HANDOFF.md @@ -16,7 +16,8 @@ command lists, roadmap state, non-overclaiming rules, and benchmark rules. --- ## Current handoff -The repo is released at `v0.2.0` (tag on ded6e80, marked Latest), after `v0.1.0`. M0–M49 are +The repo's current release is `v0.2.1` (tagged on the release-PR merge commit, marked Latest), after +`v0.2.0` (ded6e80) and `v0.1.0`. M0–M49 are merged. PR #101 (40f9249) and PR #102 (7092423) synchronized project-memory files after M35. PR #103 (0f2ceb7) inserted the repository-health refactor phase **M36–M42** and shifted the original networking/persistence roadmap after those @@ -28,8 +29,12 @@ refactors. PR #113 extended the future roadmap to **M43–M49**. 
M36–M42 lande PR #123 (c643b62), and PR #124 (d8c16b2), with M45B provenance migration in PR #116 (b9ea27a) and the M47 storage diagnosis follow-up in PR #122 (548cb68). The Linux host artifact refresh landed as PR #125 (d9094df), and the **v0.2.0 release** — a bare-metal Linux evidence refresh, the -partial-PMU reframe, and a full documentation staleness sweep — landed as PR #127 (ded6e80). There -is no active milestone; the project is between releases. +partial-PMU reframe, and a full documentation staleness sweep — landed as PR #127 (ded6e80). The +**v0.2.1 release** then adds two reprioritized backlog items and a consistency sweep: a Codex +resume-anchor/PMU sweep (PR #129), a perf call-graph flamegraph + `make flamegraph` (PR #130, +issue #32), the FIX-like text protocol adapter (PR #131, issue #29), and the version-bump release +PR — merged in that order, with `v0.2.1` tagged on the release merge commit. There is no active +milestone; the project is between releases. Background — Linux perf evidence (merged, now bare-metal partial PMU): @@ -65,32 +70,34 @@ git pull --ff-only git log --oneline -10 gh pr list --state open git tag -l -gh release view v0.2.0 +gh release view v0.2.1 ``` Current state: -- latest synced main baseline: `ded6e80` (PR #127, v0.2.0 release) +- latest synced main baseline: `ded6e80` (PR #127, v0.2.0); the `v0.2.1` baseline is the release-PR + merge commit, after PRs #129/#130/#131 - current active branch, if active: none (work lands via scoped PRs from `main`) -- current active status: `v0.2.0` released. The bare-metal Linux evidence refresh, the - partial-PMU reframe (real Apple PMU counters; cache counters unsupported), and a full - documentation staleness sweep are merged. All 15 `results/*.txt` are bare-metal with - `Dirty inputs: no` and no MAC leaks; `make check` 241/241 and `make asan` 241/241; README and - recruiting benchmark numbers match `results/latest.txt`. 
No active milestone -- release tag: `v0.2.0` (Latest), after `v0.1.0` +- current active status: `v0.2.1` is the current release on top of `v0.2.0`. It adds the FIX-like + text protocol adapter (#29), `make flamegraph` + a bare-metal flamegraph artifact (#32), and a + Codex resume-anchor/PMU consistency sweep. `make check` 261/261 and `make asan` 261/261 on the + bare-metal Apple M2 Fedora Asahi host; both new code files pass the CI CodeScene Code Health gate. + No active milestone +- release tag: `v0.2.1` (Latest, tagged on the release-PR merge commit), after `v0.2.0` and `v0.1.0` - open follow-up issue: #90 — narrowed to the full cache-counter PMU set; the bare-metal Apple host provides real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss support -- issues #95, #28, and #26 were closed by PR #112 +- issues #95, #28, and #26 were closed by PR #112; issues #32 and #29 were closed by PR #130 and + PR #131 (now part of `v0.2.1`) - open review request issue: #94 -- legacy backlog still open: #29 and #32 ### Next milestone -There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), and the v0.2.0 release -(PR #127) are merged. The highest-value remaining work is non-code and externally gated: issue #94 -(independent external review — needs a human reviewer) and issue #90 (full cache-counter PMU -evidence — needs a PMU microarchitecture that exposes cache events). Low-signal backlog: #32 -(flamegraph) and #29 (FIX adapter). Do not invent a new milestone without an explicit human request. +There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), the v0.2.0 release +(PR #127), and the v0.2.1 content (PRs #129/#130/#131 + release PR) are merged. The highest-value +remaining work is non-code and externally gated: issue #94 (independent external review — needs a +human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that +exposes cache events). 
The #32 (flamegraph) and #29 (FIX adapter) backlog items are now done. Do not +invent a new milestone without an explicit human request. ### Phase III / IV purpose @@ -100,7 +107,7 @@ studies, advanced concurrency validation, event-driven gateway architecture, mul pressure, NUMA/affinity and scheduler-migration studies, ingress memory ordering and false-sharing evidence, persistence/recovery benchmarking, and late-stage low-latency networking research. -Current priority order (post-v0.2.0): +Current priority order (post-v0.2.1): 1. Issue #94 — independent external technical review remains the single highest credibility gap (human-gated; cannot be self-certified). diff --git a/PROGRESS.md b/PROGRESS.md index f7c953c..f5d937c 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -20,27 +20,27 @@ Do not rely on prior chat memory. ## Current state -- **Active milestone:** none — `v0.2.0` released; project is between releases -- **Status:** ☑ `v0.2.0` published (Phase III/IV systems arc + bare-metal evidence refresh) +- **Active milestone:** none — `v0.2.1` released; project is between releases +- **Status:** ☑ `v0.2.1` published (FIX-like text protocol adapter #29, perf flamegraph #32, and a + resume-anchor/PMU consistency sweep) on top of `v0.2.0` - **Active branch:** none (work lands via scoped PRs from `main`) - **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124, - d8c16b2), then the Linux host artifact refresh (PR #125, d9094df) and the v0.2.0 release - (PR #127, ded6e80) -- **Last completed docs sync:** v0.2.0 documentation staleness sweep (PR #127): perf evidence - reframed as bare-metal partial PMU, release-readiness rewritten, every doc read and brought current -- **Release:** `v0.1.0` (tag on 9857e1a) and `v0.2.0` (tag on ded6e80, marked Latest) published as - GitHub-only releases; no packages published -- **`make check` passing:** yes — `make check` 241/241 and `make asan` 241/241 on the bare-metal - Apple M2 (aarch64) Fedora 
Asahi host on 2026-06-21 -- **Last action:** prepared and released `v0.2.0`. Reframed the perf evidence from "constrained - Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real - cycles/instructions/branches/branch-misses; cache-references/cache-misses unsupported by the Apple - Silicon PMU), with a new three-way `perf_stat.sh` classifier and a reframed issue #90. Regenerated - all 15 `results/*.txt` on bare metal (`Dirty inputs: no`, MAC-leak grep clean), bumped the project - version to 0.2.0, swept every doc for staleness (release-readiness rewritten to 241 tests / six CI - jobs; architecture/socket/storage/OCaml framing corrected), verified all six mermaid diagrams, and - synced README/recruiting benchmark numbers to `results/latest.txt` (~87/16/110/98/110 ns). - PR #127 squash-merged to `main` as ded6e80; `v0.2.0` tagged and published. + d8c16b2); since then `v0.2.0` (PR #127, ded6e80) and the `v0.2.1` content: Codex resume-anchor + sweep (PR #129), perf flamegraph #32 (PR #130), and the FIX text adapter #29 (PR #131) +- **Last completed docs sync:** v0.2.1 release prep (this PR): version bump + CHANGELOG `[0.2.1]` + and resume/release anchors brought current +- **Release:** `v0.1.0` (tag on 9857e1a), `v0.2.0` (tag on ded6e80), and `v0.2.1` (tag created on the + squash-merge of the release PR, marked Latest) published as GitHub-only releases; no packages + published +- **`make check` passing:** yes — `make check` 261/261 and `make asan` 261/261 on the bare-metal + Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 (includes the v0.2.1 FIX-adapter and flamegraph + renderer tests) +- **Last action:** delivered the `v0.2.1` content as scoped PRs and prepared this version-bump + release. Two reprioritized backlog items — the FIX-like text protocol adapter (#29) and the perf + call-graph flamegraph (#32) — plus the Codex resume-anchor/PMU consistency sweep (#127/#128 + follow-up). 
Ran Codex as an independent reviewer and fixed its findings; brought every touched file + through the CodeScene Code Health gate (table-driven enum maps, a `decode_typed` skeleton, split + `parse_envelope`, flattened `flamegraph.py`). `make check`/`make asan` 261/261. - **Next action:** no active milestone. Highest-value remaining work is non-code and gated: issue #94 (independent external review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g. @@ -396,6 +396,21 @@ Lower priority: in `docs/fix_protocol.md` (+ pointer from `docs/binary_protocol.md`). `make check` 260/260 and `make asan` 260/260 clean (the parser handles untrusted text). Closes #29. Do not merge from automation; human squash-merges. +- [2026-06-21] Post-review code-health pass on #130/#131 after Codex + the CI CodeScene Code Health + gate flagged `flamegraph.py` and `fix.cpp` below the 10.0 health bar. `flamegraph.py`: bundled + render args into a `FlameOptions` dataclass + extracted `_append_chrome`/`_frame_svg`, flattened + `fold_perf_script` into a `_Folder`, replaced the nested dso scan with a regex, and dropped an + unused `_layout` arg. `fix.cpp`: table-driven enum maps via `FieldReader::coded`, a `decode_typed` + skeleton to remove decoder duplication, and `parse_envelope` split into + tokenize/check-shape/verify-length-checksum. Behavior unchanged (`make check`/`make asan` 261/261); + both PRs' CodeScene gate now passes. Also fixed three Codex findings (cancel `ClOrdID` enforcement; + flamegraph tab/non-positive collapsed parsing). The local CodeScene MCP token is expired, so the + authoritative gate is the CI `CodeScene Code Health Review` check. 
+- [2026-06-21] Prepared the `v0.2.1` release (`docs/v0.2.1-release`, stacked on the FIX adapter PR): + bumped `CMakeLists.txt` to 0.2.1, added the CHANGELOG `[0.2.1]` section (FIX adapter #29, perf + flamegraph #32, resume-anchor/PMU sweep), and brought the PROGRESS/HANDOFF release anchors current. + No code or benchmark artifacts change in the release PR itself. On squash-merge the human tags + `v0.2.1` on the merge commit and publishes the GitHub release. Do not merge from automation. - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained. A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). 
Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. @@ -802,12 +817,13 @@ Quant Systems Lab — Linux Systems + Exchange Infrastructure Simulator ## Next action remains -There is no active milestone. `v0.2.0` is released (PR #127 ded6e80, tag on ded6e80, marked Latest; -resume-anchor sync PR #128 ae93545). M0–M49, the Linux host artifact refresh (PR #125, d9094df), and -the v0.2.0 release are all merged to `main`. The committed perf artifacts are **partial hardware PMU -evidence** from this bare-metal Apple M2 (aarch64) Fedora Asahi host — real -cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters unsupported by -the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence. +There is no active milestone. `v0.2.1` is the current release, on top of `v0.2.0` (PR #127 ded6e80) +and `v0.1.0`. The `v0.2.1` content lands as the Codex resume-anchor sweep (PR #129), the perf +flamegraph #32 (PR #130), the FIX text adapter #29 (PR #131), and the version-bump release PR; the +human squash-merges those in order and tags `v0.2.1` on the release merge commit. 
The committed perf +artifacts remain **partial hardware PMU evidence** from this bare-metal Apple M2 (aarch64) Fedora +Asahi host — real cycles/instructions/branches/branch-misses with cache-reference/cache-miss counters +unsupported by the Apple Silicon PMU — not NIC-offload, latency, or full hardware-PMU evidence. Highest-value remaining work is non-code and gated: issue #94 (independent external review) and issue #90 (full cache-PMU evidence). Issue #90 needs a PMU **microarchitecture** that exposes cache From dfa4da28d3b5ca94efd33c066aeb9725b0de1464 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 22:13:47 -0400 Subject: [PATCH 16/22] docs: record resume-anchor sync in PROGRESS current-state (Codex #129) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The decision-log entry and bottom "Next action remains" block already covered this follow-up, but the top `## Current state` bullets — the first thing `/resume` reads — still presented the v0.2.0 release (PR #127) as the "Last action". A resuming agent could therefore miss that this resume-anchor / PMU sync already happened and duplicate it. Record the sync as the current "Last action" and "Last completed docs sync", demoting the v0.2.0 release detail to a "Prior action" bullet. Docs-only. Co-Authored-By: Claude Opus 4.8 --- PROGRESS.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/PROGRESS.md b/PROGRESS.md index 070fece..5888c0e 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -26,14 +26,22 @@ Do not rely on prior chat memory. 
- **Last completed milestone:** M49 — NIC offload and low-latency networking study (PR #124, d8c16b2), then the Linux host artifact refresh (PR #125, d9094df) and the v0.2.0 release (PR #127, ded6e80) -- **Last completed docs sync:** v0.2.0 documentation staleness sweep (PR #127): perf evidence - reframed as bare-metal partial PMU, release-readiness rewritten, every doc read and brought current +- **Last completed docs sync:** resume-anchor + PMU-claim sync (`docs/codex-resume-anchor-sync`, + this PR) resolving the Codex findings left on `main` by PRs #127/#128. Prior sweep: v0.2.0 + documentation staleness sweep (PR #127): perf evidence reframed as bare-metal partial PMU, + release-readiness rewritten, every doc read and brought current - **Release:** `v0.1.0` (tag on 9857e1a) and `v0.2.0` (tag on ded6e80, marked Latest) published as GitHub-only releases; no packages published - **`make check` passing:** yes — `make check` 241/241 and `make asan` 241/241 on the bare-metal Apple M2 (aarch64) Fedora Asahi host on 2026-06-21 -- **Last action:** prepared and released `v0.2.0`. Reframed the perf evidence from "constrained - Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real +- **Last action:** resume-anchor + PMU-claim sync on `docs/codex-resume-anchor-sync` resolving the + Codex review findings left on `main` by PRs #127/#128 — removed PROGRESS's stale "Next action + remains" block that still pointed `/resume` at the merged PR #125, brought AGENTS.md in line with + CLAUDE.md's v0.2.0 partial-PMU reframe (no more "constrained Docker validation" wording), and + narrowed docs/perf_analysis.md so the Apple Blizzard (E-core) PMU rows are not implied to carry + live counts. Docs/memory only; no code or artifacts changed (`make check` still 241/241). +- **Prior action (v0.2.0 release):** prepared and released `v0.2.0`. 
Reframed the perf evidence from + "constrained Docker validation" to **partial hardware PMU evidence** on a bare-metal Apple M2 (real cycles/instructions/branches/branch-misses; cache-references/cache-misses unsupported by the Apple Silicon PMU), with a new three-way `perf_stat.sh` classifier and a reframed issue #90. Regenerated all 15 `results/*.txt` on bare metal (`Dirty inputs: no`, MAC-leak grep clean), bumped the project From 31070b17677186aff151aff9402b6655a1f59723 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 22:18:17 -0400 Subject: [PATCH 17/22] perf: harden flamegraph.sh classification + sample gating (Codex #130) Address five Codex review findings on the flamegraph driver: 1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit. 2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report. 3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts. 4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact. 5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc. 
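Finding 3 above can be sketched in Python as a minimal illustration. The script itself does this with `sed` and `awk`, and its `parse_sample_count_token` helper is not shown in this patch, so the K/M suffix handling here is an assumption. The function names `perf_estimate`, `folded_total`, and `insufficient_samples` are hypothetical.

```python
import re

# Same token shape the patch's sed expression accepts: "(N samples)" or "(~N samples)".
_SAMPLES_RE = re.compile(r"\(~?([0-9][0-9.,]*[KkMm]?) samples\)")
# Assumed suffix scaling; the script's parse_sample_count_token is not shown in the patch.
_SUFFIX_SCALE = {"k": 1_000, "m": 1_000_000}


def perf_estimate(record_stderr: str) -> int:
    """Return perf record's self-reported sample estimate, or 0 when absent or unparsable."""
    match = _SAMPLES_RE.search(record_stderr)
    if match is None:
        return 0
    token = match.group(1).replace(",", "")
    scale = _SUFFIX_SCALE.get(token[-1].lower(), 1)
    if scale != 1:
        token = token[:-1]
    try:
        return int(float(token) * scale)
    except ValueError:
        return 0


def folded_total(folded_text: str) -> int:
    """Sum the trailing count of each collapsed-stack line ("frame;frame N")."""
    total = 0
    for line in folded_text.splitlines():
        parts = line.rsplit(None, 1)
        # Lines without a numeric trailing field are skipped in this sketch.
        if len(parts) == 2 and parts[1].isdigit():
            total += int(parts[1])
    return total


def insufficient_samples(folded_text: str, min_samples: int) -> bool:
    """Gate on the folded total, not on perf record's own estimate."""
    return folded_total(folded_text) < min_samples
```

The estimate is kept only for reporting; the gate reads the folded stacks because they are what the renderer actually draws.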
Co-Authored-By: Claude Opus 4.8
---
 scripts/flamegraph.sh | 76 +++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 20 deletions(-)

diff --git a/scripts/flamegraph.sh b/scripts/flamegraph.sh
index 7324f64..3d7dbfa 100755
--- a/scripts/flamegraph.sh
+++ b/scripts/flamegraph.sh
@@ -82,9 +82,10 @@ RECORD_ERR="$(mktemp)"
 SCRIPT_OUT="$(mktemp)"
 SCRIPT_ERR="$(mktemp)"
 FOLDED="$(mktemp)"
+COLLAPSE_ERR="$(mktemp)"
 SVG_TMP="$(mktemp)"
 TXT_TMP="$(mktemp)"
-trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$SVG_TMP" "$TXT_TMP"' EXIT
+trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$COLLAPSE_ERR" "$SVG_TMP" "$TXT_TMP"' EXIT
 
 # Fail fast if the benchmark itself is broken (partial mode must not mask this).
 BENCH_STATUS=0
@@ -105,31 +106,52 @@ if [[ "$RECORD_STATUS" -eq 0 ]]; then
 fi
 
 PERF_LIMITATION=no
-if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
+# `zero-sized data` is how `perf script` reports a no-sample capture; classify it
+# as a perf limitation here exactly as scripts/perf_record.sh does, so the
+# documented constrained-host (QSL_PERF_ALLOW_PARTIAL=1) path works instead of
+# tripping the unexpected-failure exit.
+if grep -Eiq 'zero-sized data|No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \
     "$RECORD_ERR" "$SCRIPT_ERR"; then
     PERF_LIMITATION=yes
 fi
 
-SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)"
-SAMPLE_COUNT="$(parse_sample_count_token "$SAMPLE_TOKEN")"
-[[ -z "$SAMPLE_COUNT" ]] && SAMPLE_COUNT=0
+# perf record prints its sample summary as "(N samples)" or, on some versions,
+# "(~N samples)" — and that count is only its own estimate. Accept the optional
+# `~` so the token is not dropped, but keep this value informational; the sample
+# gate below uses the authoritative folded total, not this estimate.
+SAMPLE_TOKEN="$(sed -nE 's/.*\(~?([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)"
+PERF_EST_SAMPLES="$(parse_sample_count_token "$SAMPLE_TOKEN")"
+[[ -z "$PERF_EST_SAMPLES" ]] && PERF_EST_SAMPLES=0
 
-# Fold to collapsed stacks for the text summary and as an SVG precondition.
+# Fold to collapsed stacks for the text summary and as an SVG precondition. A
+# nonzero COLLAPSE_STATUS means the renderer/parser itself failed (a generator
+# regression), which is handled as an unexpected failure below — never masked as
+# a perf sampling limitation. FOLDED_SAMPLES is the real sample total carried by
+# the folded stacks (sum of trailing counts), the authoritative gate input.
 STACK_COUNT=0
+FOLDED_SAMPLES=0
+COLLAPSE_STATUS=0
 if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then
-    python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true
+    python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>"$COLLAPSE_ERR" ||
+        COLLAPSE_STATUS=$?
     STACK_COUNT="$(wc -l <"$FOLDED" | tr -d ' ')"
+    FOLDED_SAMPLES="$(awk '{ s += $NF } END { printf "%d\n", s + 0 }' "$FOLDED")"
 fi
 
 INSUFFICIENT_SAMPLES=no
-if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$SAMPLE_COUNT" -lt "$MIN_SAMPLES" ]]; then
+if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$COLLAPSE_STATUS" -eq 0 &&
+    "$FOLDED_SAMPLES" -lt "$MIN_SAMPLES" ]]; then
     INSUFFICIENT_SAMPLES=yes
 fi
 
-ARTIFACT_TYPE="flamegraph ($EVENT software sampling hot-symbol profile)"
-if [[ "$EVENT" == "cycles" ]]; then
-    ARTIFACT_TYPE="flamegraph (cycles hardware-PMU sampling hot-symbol profile)"
-fi
+# Describe the sampling source once so every label/caveat (artifact type, SVG
+# comment, text companion) stays consistent: software timers vs a hardware PMU
+# event. cpu-clock/task-clock are software; cycles/instructions/etc. are PMU.
+case "$EVENT" in
+cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;;
+*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;;
+esac
+ARTIFACT_TYPE="flamegraph ($SAMPLE_KIND hot-symbol profile)"
 if [[ "$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0 || "$STACK_COUNT" -eq 0 ]]; then
     ARTIFACT_TYPE="constrained-environment validation (partial; no clean sample report)"
 elif [[ "$INSUFFICIENT_SAMPLES" == "yes" ]]; then
@@ -139,7 +161,7 @@ fi
 PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")"
 HOST="$(uname -s) $(uname -m)"
 DATE="$(qsl_utc_timestamp)"
-SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE"
+SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${FOLDED_SAMPLES} samples | ${STACK_COUNT} stacks | $DATE"
 
 # Render the SVG (deterministic for a fixed folded input + fixed subtitle).
 if [[ "$STACK_COUNT" -gt 0 ]]; then
@@ -154,9 +176,9 @@ if [[ "$STACK_COUNT" -gt 0 ]]; then
             echo " Command: make flamegraph"
             echo " Artifact: $ARTIFACT_TYPE"
             echo " Record: perf record [call-graph $CALLGRAPH | -F $FREQ | -g | -e $EVENT]"
-            echo " Samples: $SAMPLE_COUNT | Folded stacks: $STACK_COUNT"
-            echo " Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is"
-            echo " not a latency or throughput measurement and is hardware/build dependent."
+            echo " Samples (folded): $FOLDED_SAMPLES | perf record estimate: $PERF_EST_SAMPLES | Folded stacks: $STACK_COUNT"
+            echo " Caveat: $SAMPLE_KIND shows on-CPU time by symbol; it is not a latency"
+            echo " or throughput measurement and is hardware/build dependent."
         } | sed 's/--/- -/g'
         echo "-->"
         # Drop the renderer's own XML declaration; we emitted ours above.
@@ -167,6 +189,11 @@ if [[ "$STACK_COUNT" -gt 0 ]]; then
             --from-collapsed <"$FOLDED" | tail -n +2
     } >"$SVG_TMP"
     qsl_publish_artifact "$SVG_TMP" "$OUT_SVG"
+else
+    # No clean folded stacks. Remove any prior SVG so a constrained rerun cannot
+    # leave a previous host's flamegraph beside a .txt that says there is no
+    # sample report — which could be committed as if the two still matched.
+    rm -f "$OUT_SVG"
 fi
 
 # Text companion: provenance + classification + top folded stacks (human/queryable).
@@ -186,7 +213,8 @@ fi
     echo "Call graph: $CALLGRAPH"
     echo "Record event: $EVENT"
     echo "Sample freq: $FREQ Hz"
-    echo "Sample count: $SAMPLE_COUNT"
+    echo "Sample count (folded total): $FOLDED_SAMPLES"
+    echo "Sample count (perf record est.): $PERF_EST_SAMPLES"
     echo "Folded stacks: $STACK_COUNT"
     echo "Minimum samples for hot profile: $MIN_SAMPLES"
     echo "Insufficient samples: $INSUFFICIENT_SAMPLES"
@@ -197,13 +225,13 @@ fi
     echo "Perf data: $DATA (generated, not intended for commit)"
     echo
     if [[ "$ARTIFACT_TYPE" == flamegraph* ]]; then
-        echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol"
+        echo "Caveat: this flamegraph is a $SAMPLE_KIND profile for hot-symbol"
         echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock"
         echo "latency or throughput, and is hardware/kernel/compiler/build dependent."
     else
         echo "Caveat: constrained/partial perf validation, not a hot-symbol flamegraph. Treat"
-        echo "frame widths as unusable until sampling succeeds and Sample count meets the"
-        echo "Minimum samples for hot profile."
+        echo "frame widths as unusable until sampling succeeds and the folded sample total"
+        echo "meets the Minimum samples for hot profile."
     fi
     echo
     echo "Top $TOP_STACKS folded stacks (count stack):"
@@ -224,6 +252,14 @@ qsl_publish_artifact "$TXT_TMP" "$OUT_TXT"
 echo "wrote $OUT_TXT"
 [[ "$STACK_COUNT" -gt 0 ]] && echo "wrote $OUT_SVG"
 
+# A renderer/parser failure (perf script succeeded but flamegraph.py errored) is
+# a generator bug, not a perf sampling limitation — fail hard so partial mode
+# cannot publish a Python/parser regression as a constrained-environment artifact.
+if [[ "$SCRIPT_STATUS" -eq 0 && "$COLLAPSE_STATUS" -ne 0 ]]; then
+    echo "error: flamegraph.py --collapse-only failed (status $COLLAPSE_STATUS); this is a renderer/parser failure, not a perf limitation, and partial mode cannot mask it." >&2
+    cat "$COLLAPSE_ERR" >&2
+    exit 4
+fi
 if [[ ("$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0) && "$PERF_LIMITATION" != "yes" ]]; then
     echo "error: perf record/script failed for a reason other than a perf access limitation." >&2
     exit 3

From 06b76759216bd8f1e4937105749f8bdeb325683e Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 22:18:39 -0400
Subject: [PATCH 18/22] perf: regenerate flamegraph artifact after classification hardening

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded
samples / 159 stacks, classified `flamegraph (software cpu-clock sampling
hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened
scripts/flamegraph.sh; the .txt reports both the folded total and perf record's
estimate.
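The classification named in this message can be illustrated with a small Python sketch of the rules in `scripts/flamegraph.sh`. It is a hedged restatement, not the script itself: the function names are hypothetical, and the insufficient-samples branch is left out because its label is not shown in the patch.

```python
import re

# Alternation copied from the patched grep -Eiq expression (case-insensitive).
_LIMITATION_RE = re.compile(
    r"zero-sized data|No samples|failed to open|Permission denied|"
    r"Operation not permitted|perf_event_open|not supported|"
    r"Operation not supported|perf not found for kernel|linux-tools",
    re.IGNORECASE,
)

# The two software timer events the patch singles out; anything else is
# labelled as a hardware-PMU event, as in the script's `case` fallthrough.
_SOFTWARE_EVENTS = {"cpu-clock", "task-clock"}


def is_perf_limitation(stderr_text: str) -> bool:
    """True when perf's stderr matches a known access or no-sample limitation."""
    return _LIMITATION_RE.search(stderr_text) is not None


def sample_kind(event: str) -> str:
    """Describe the sampling source once, so every label and caveat agrees."""
    if event in _SOFTWARE_EVENTS:
        return f"software {event} sampling"
    return f"{event} hardware-PMU sampling"


def artifact_type(event: str, record_ok: bool, script_ok: bool, stack_count: int) -> str:
    """Label the artifact; the insufficient-samples case is omitted in this sketch."""
    if not record_ok or not script_ok or stack_count == 0:
        return "constrained-environment validation (partial; no clean sample report)"
    return f"flamegraph ({sample_kind(event)} hot-symbol profile)"


if __name__ == "__main__":
    print(artifact_type("cpu-clock", True, True, 159))
```

Deriving the label from one `sample_kind` value is what keeps the artifact type, the SVG comment, and the text companion from drifting apart when `QSL_FLAMEGRAPH_EVENT` selects a different event.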
Co-Authored-By: Claude Opus 4.8
---
 results/flamegraph.svg | 16 ++++++-------
 results/flamegraph.txt | 53 +++++++++++++++++++++---------------------
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/results/flamegraph.svg b/results/flamegraph.svg
index 378b45b..80466d2 100644
--- a/results/flamegraph.svg
+++ b/results/flamegraph.svg
@@ -2,18 +2,18 @@
[SVG hunk body: the markup was stripped in extraction, leaving only concatenated text nodes. Recoverable content:
 title: QSL Matching-Engine Flame Graph (qsl-bench)
 subtitle: flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 416 samples | 165 stacks | 2026-06-22T01:28:01Z
 frame tooltips of the form "<symbol> (N cpu-clock samples, P%)", including:
   all: 416 (100.00%)
   [unknown]: 335 (80.53%)
   main: 273 (65.62%)
   qsl::gateway::Session::on_bytes: 74 (17.79%)
   qsl::engine::OrderBook::cancel: 42 (10.10%)
   qsl::engine::OrderBook::add_limit: 39 (9.38%)
   qsl::replay::apply: 29 (6.97%)
 The remaining per-frame labels are not reproduced here.]
qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.88%)qs..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (6 cpu-clock samples, 1.44%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (4 cpu-clock samples, 0.96%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 0.96%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.72%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (18 cpu-clock samples, 4.33%)qsl:..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (8 cpu-clock samples, 1.92%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.48%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (4 cpu-clock samples, 0.96%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore 
const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (33 cpu-clock samples, 7.93%)qsl::repla..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (28 cpu-clock samples, 6.73%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.24%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (9 cpu-clock samples, 2.16%)qsl::engine::OrderBook::can_apply_modify(unsigned long, long, unsigned int) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (5 cpu-clock samples, 1.20%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.72%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock 
samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (15 cpu-clock samples, 3.61%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 1.92%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock 
samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (4 cpu-clock samples, 0.96%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.40%)q..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (7 cpu-clock samples, 1.68%)[unknown] (5 cpu-clock samples, 1.20%)_mid_memalign (5 cpu-clock samples, 1.20%)__posix_memalign (2 cpu-clock samples, 0.48%)_mid_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.72%)__posix_memalign (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.40%)q..[unknown] (9 cpu-clock samples, 2.16%)[unknown] (9 cpu-clock samples, 2.16%)[unknown] (8 cpu-clock samples, 1.92%)[unknown] (3 cpu-clock samples, 0.72%)_mid_memalign (3 cpu-clock samples, 0.72%)__posix_memalign (5 cpu-clock samples, 1.20%)malloc (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (9 cpu-clock samples, 2.16%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.48%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc (2 cpu-clock samples, 0.48%)free@plt (3 cpu-clock samples, 0.72%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.68%)[unknown] (7 
cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc@plt (3 cpu-clock samples, 0.72%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 0.48%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, 
std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (4 cpu-clock samples, 0.96%)operator new(unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 0.96%)__libc_start_call_main (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.68%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (11 cpu-clock samples, 2.64%)d..[unknown] (11 cpu-clock samples, 2.64%)[..[unknown] (11 cpu-clock samples, 2.64%)[..cfree@GLIBC_2.17 (11 cpu-clock samples, 2.64%)c..main (14 cpu-clock samples, 3.37%)main[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (9 cpu-clock samples, 2.16%)malloc (6 cpu-clock samples, 1.44%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.48%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 
cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 0.96%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (20 cpu-clock samples, 4.81%)qsl::..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (13 cpu-clock samples, 3.12%)[u..[unknown] (9 cpu-clock samples, 2.16%)_mid_memalign (9 cpu-clock samples, 2.16%)__posix_memalign (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.48%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, 
qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (4 cpu-clock samples, 0.96%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.48%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc (3 cpu-clock samples, 0.72%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.48%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (software cpu-clock sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 329 samples | 159 stacks | 2026-06-22T02:18:23ZSearch all (329 cpu-clock samples, 100.00%)allqsl-bench (329 cpu-clock samples, 100.00%)qsl-bench[unknown] (251 cpu-clock samples, 76.29%)[unknown][unknown] (237 cpu-clock samples, 72.04%)[unknown][unknown] (201 cpu-clock samples, 61.09%)[unknown][unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 
0.61%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (1 cpu-clock samples, 0.30%)do_lookup_x (1 cpu-clock samples, 0.30%)_dl_lookup_symbol_x (1 cpu-clock samples, 0.30%)_dl_new_hash (1 cpu-clock samples, 0.30%)__libc_start_call_main (199 cpu-clock samples, 60.49%)__libc_start_call_mainmain (199 cpu-clock samples, 60.49%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (20 cpu-clock samples, 6.08%)qsl::en..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (13 cpu-clock samples, 3.95%)qsl..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.22%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, 
std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.91%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (7 cpu-clock samples, 2.13%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (3 cpu-clock samples, 0.91%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 
0.30%)qsl::engine::OrderBook::cancel(unsigned long) (18 cpu-clock samples, 5.47%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (18 cpu-clock samples, 5.47%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.95%)qsl..std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (4 cpu-clock samples, 1.22%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.30%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.30%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.91%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.61%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 
18446744073709551615ul>) (56 cpu-clock samples, 17.02%)qsl::gateway::Session::on..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (56 cpu-clock samples, 17.02%)qsl::gateway::Session::on..qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (53 cpu-clock samples, 16.11%)qsl::gateway::Session::p..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (13 cpu-clock samples, 3.95%)qsl..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.91%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.52%)__memcpy_generic (3 cpu-clock samples, 0.91%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 0.61%)operator new(unsigned long) (1 cpu-clock samples, 0.30%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (33 cpu-clock samples, 10.03%)qsl::gateway::..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.61%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (4 cpu-clock samples, 1.22%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (16 cpu-clock samples, 4.86%)qsl::..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (3 cpu-clock 
samples, 0.91%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.30%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.30%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.91%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.91%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.30%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (15 cpu-clock samples, 4.56%)qsl:..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (1 cpu-clock samples, 0.30%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (33 cpu-clock samples, 10.03%)qsl::replay::a..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (4 cpu-clock samples, 1.22%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.91%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.91%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.61%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.30%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (5 cpu-clock samples, 1.52%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (5 cpu-clock samples, 1.52%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 
0.91%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.61%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 5.17%)qsl::..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 3.34%)qs..qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (7 cpu-clock samples, 2.13%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.61%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (2 cpu-clock 
samples, 0.61%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.91%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.91%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.61%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.30%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (3 cpu-clock samples, 0.91%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.61%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (18 cpu-clock samples, 5.47%)qsl::r..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (11 cpu-clock samples, 3.34%)qs..qsl::engine::OrderBook::contains(unsigned long) const (5 cpu-clock samples, 1.52%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (2 cpu-clock samples, 0.61%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.30%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.30%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (34 cpu-clock samples, 10.33%)qsl::replay::r..operator delete(void*, unsigned long) (1 cpu-clock samples, 0.30%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 7.90%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (3 cpu-clock samples, 0.91%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.30%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.30%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (19 cpu-clock samples, 5.78%)qsl::e..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 5.17%)qsl::..operator delete(void*, unsigned long) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.22%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.30%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, 
unsigned int) (11 cpu-clock samples, 3.34%)qs..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (7 cpu-clock samples, 2.13%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (4 cpu-clock samples, 1.22%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (2 cpu-clock samples, 0.61%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 
0.91%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.30%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (3 cpu-clock samples, 0.91%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.61%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.30%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.91%)operator new(unsigned long) (5 cpu-clock samples, 1.52%)malloc@plt (5 cpu-clock samples, 1.52%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.61%)posix_memalign@plt (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.22%)[unknown] (4 cpu-clock samples, 1.22%)[unknown] (4 cpu-clock samples, 1.22%)[unknown] (2 cpu-clock samples, 
0.61%)__posix_memalign (2 cpu-clock samples, 0.61%)malloc (2 cpu-clock samples, 0.61%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.61%)__posix_memalign (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 2.13%)[unknown] (5 cpu-clock samples, 1.52%)[unknown] (5 cpu-clock samples, 1.52%)[unknown] (5 cpu-clock samples, 1.52%)[unknown] (1 cpu-clock samples, 0.30%)_mid_memalign (1 cpu-clock samples, 0.30%)__posix_memalign (4 cpu-clock samples, 1.22%)malloc (3 cpu-clock samples, 0.91%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.61%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 3.04%)qs..[unknown] (9 cpu-clock samples, 2.74%)[..[unknown] (9 cpu-clock samples, 2.74%)[..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.91%)operator new(unsigned long) (6 cpu-clock samples, 1.82%)malloc (4 cpu-clock samples, 1.22%)operator delete(void*)@plt (1 cpu-clock samples, 0.30%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 2.43%)q..[unknown] (8 cpu-clock samples, 2.43%)[..[unknown] (8 cpu-clock samples, 2.43%)[..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)operator new(unsigned long) (7 cpu-clock samples, 2.13%)malloc (4 cpu-clock samples, 1.22%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)main (1 cpu-clock samples, 0.30%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)operator new(unsigned long) (1 cpu-clock samples, 0.30%)malloc@plt (1 cpu-clock samples, 0.30%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.30%)posix_memalign@plt (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)[unknown] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)_mid_memalign (1 cpu-clock samples, 0.30%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.30%)__posix_memalign (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)[unknown] (1 cpu-clock samples, 0.30%)_mid_memalign (1 
cpu-clock samples, 0.30%)__posix_memalign (2 cpu-clock samples, 0.61%)malloc (1 cpu-clock samples, 0.30%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (3 cpu-clock samples, 0.91%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 0.61%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.61%)free@plt (1 cpu-clock samples, 0.30%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.30%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.30%)__libc_start_call_main (9 cpu-clock samples, 2.74%)_..[unknown] (9 cpu-clock samples, 2.74%)[..[unknown] (9 cpu-clock samples, 2.74%)[..[unknown] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)unlink_chunk.isra.0 (1 cpu-clock samples, 0.30%)cfree@GLIBC_2.17 (8 cpu-clock samples, 2.43%)c..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 1.22%)[unknown] (4 
cpu-clock samples, 1.22%)[unknown] (4 cpu-clock samples, 1.22%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.22%)main (11 cpu-clock samples, 3.34%)main[unknown] (5 cpu-clock samples, 1.52%)[unknown] (5 cpu-clock samples, 1.52%)[unknown] (1 cpu-clock samples, 0.30%)_int_free_merge_chunk (1 cpu-clock samples, 0.30%)operator new(unsigned long) (4 cpu-clock samples, 1.22%)malloc (4 cpu-clock samples, 1.22%)free@plt (2 cpu-clock samples, 0.61%)operator delete(void*)@plt (3 cpu-clock samples, 0.91%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.30%)operator new(unsigned long) (4 cpu-clock samples, 1.22%)malloc@plt (4 cpu-clock samples, 1.22%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 2.43%)q..[unknown] (3 cpu-clock samples, 0.91%)[unknown] (3 cpu-clock samples, 0.91%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.61%)operator new(unsigned long) (1 cpu-clock samples, 0.30%)malloc (1 cpu-clock samples, 0.30%)free@plt (1 cpu-clock samples, 0.30%)operator delete(void*)@plt (1 cpu-clock samples, 0.30%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.30%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.61%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.30%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 3.65%)qsl..[unknown] (10 cpu-clock samples, 3.04%)[u..[unknown] (10 cpu-clock samples, 3.04%)[u..[unknown] (7 cpu-clock samples, 2.13%)[unknown] (1 cpu-clock samples, 0.30%)_mid_memalign (1 cpu-clock samples, 0.30%)__posix_memalign (6 cpu-clock samples, 1.82%)malloc (4 cpu-clock samples, 1.22%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.91%)__posix_memalign (2 cpu-clock samples, 0.61%)memcpy@plt 
(1 cpu-clock samples, 0.30%)operator delete(void*)@plt (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (11 cpu-clock samples, 3.34%)qs..operator delete(void*, std::align_val_t)@plt (5 cpu-clock samples, 1.52%)operator delete(void*, unsigned long, std::align_val_t)@plt (5 cpu-clock samples, 1.52%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@plt (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.30%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.30%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (1 cpu-clock samples, 0.30%)operator delete(void*)@plt (1 cpu-clock samples, 0.30%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (3 cpu-clock samples, 0.91%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 0.61%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.61%)memcpy@plt (1 cpu-clock samples, 0.30%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.30%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.30%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (1 cpu-clock samples, 0.30%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.30%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)[unknown] (1 cpu-clock samples, 0.30%)operator new(unsigned long) (1 cpu-clock samples, 
0.30%)malloc (1 cpu-clock samples, 0.30%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (1 cpu-clock samples, 0.30%)memcpy@plt (1 cpu-clock samples, 0.30%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (7 cpu-clock samples, 2.13%)free@plt (2 cpu-clock samples, 0.61%)operator delete(void*, unsigned long, std::align_val_t)@plt (5 cpu-clock samples, 1.52%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.61%)free@plt (1 cpu-clock samples, 0.30%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, 
std::_Rb_tree_node_base&)@plt (1 cpu-clock samples, 0.30%) diff --git a/results/flamegraph.txt b/results/flamegraph.txt index ea163fb..4969a22 100644 --- a/results/flamegraph.txt +++ b/results/flamegraph.txt @@ -1,5 +1,5 @@ Command: make flamegraph -Artifact: flamegraph (cpu-clock software sampling hot-symbol profile) +Artifact: flamegraph (software cpu-clock sampling hot-symbol profile) Hardware: aarch64 OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k CPU: Avalanche-M2 @@ -8,19 +8,20 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64 Perf paranoid: 2 Build type: Release Provenance version: 1 -Git commit (informational): 4aec1d0 -Source digest: sha256:619c700c4c9b872ffd42e0b4145d73f06548f971c50b2158398a7722b3d5f41a +Git commit (informational): 31070b1 +Source digest: sha256:6aa521e6295a99f9dbf7dee9e5bcef04e93174ed12c3e8de9b991a8bfc14c809 Source digest scope: flamegraph-benchmark Dirty inputs: no Generated output: results/flamegraph.svg -Date: 2026-06-22T01:28:01Z +Date: 2026-06-22T02:18:23Z Benchmark binary: build/bench/qsl-bench Dataset: qsl-bench default synthetic benchmark suite Call graph: dwarf Record event: cpu-clock Sample freq: 4000 Hz -Sample count: 416 -Folded stacks: 165 +Sample count (folded total): 329 +Sample count (perf record est.): 329 +Folded stacks: 159 Minimum samples for hot profile: 200 Insufficient samples: no Record status: 0 @@ -34,25 +35,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent. 
Top 15 folded stacks (count stack): - 20 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) - 13 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - 11 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 - 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned 
int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) - 9 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const - 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 - 7 
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 6 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, 
unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const + 8 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 5 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt + 5 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt + 5 qsl-bench;std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, 
std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*);operator delete(void*, unsigned long, std::align_val_t)@plt + 5 qsl-bench;[unknown];[unknown];operator new(unsigned long);malloc@plt + 5 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::OrderBook::contains(unsigned long) const + 4 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 4 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 4 qsl-bench;operator new(unsigned long);malloc@plt Benchmark output: -order_book add/mod/cancel 200000 ops 132.9 ns/op 7523640 ops/sec -protocol encode+decode 500000 ops 19.8 ns/op 50418890 ops/sec -gateway session (fill) 200000 ops 127.6 ns/op 7838397 ops/sec -matching engine flow 5004 items 102.5 ns/item 9759934 items/sec -replay command log 5004 items 111.8 ns/item 8943232 items/sec +order_book add/mod/cancel 200000 ops 132.8 ns/op 7531861 ops/sec +protocol encode+decode 500000 ops 20.5 ns/op 48773893 ops/sec +gateway session (fill) 200000 ops 127.4 ns/op 7848348 ops/sec +matching engine flow 5004 items 101.6 ns/item 9840697 items/sec +replay command log 5004 items 112.0 ns/item 8928265 items/sec From f213ee71655eb1daf5fbc545167e61dc44c0b140 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 22:24:42 -0400 Subject: [PATCH 19/22] fix: enforce FIX envelope MsgType position + reject duplicate tags (Codex #131) Two Codex review findings on the FIX adapter's parser strictness: - check_envelope_shape now requires MsgType (35) as the first body field, immediately 
after BodyLength, so a non-standard header like 8/9/34/35/.../10 is rejected as Malformed instead of decoding via a first-match scan. - tokenize now rejects any repeated tag. This adapter maps each business tag exactly once (no repeating groups), so a duplicate such as `55=2` then `55=999` is an ambiguous/malformed frame rather than a silently-ignored later value. Adds a deterministic rejection test for each. (The earlier ClOrdID-required finding was already resolved by 3e4c8e3 and is covered by an existing test.) make check 263/263, make asan 263/263. Co-Authored-By: Claude Opus 4.8 --- src/protocol/fix.cpp | 12 +++++++++++- tests/unit/test_fix_protocol.cpp | 19 +++++++++++++++++++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/src/protocol/fix.cpp b/src/protocol/fix.cpp index a8c75ec..8d05efd 100644 --- a/src/protocol/fix.cpp +++ b/src/protocol/fix.cpp @@ -83,19 +83,29 @@ template [[nodiscard]] bool parse_int(std::string_view sv, Int &out) if (out.count >= kMaxFields) { return FixError::Malformed; // too many fields } + if (find_field(out, tag) != nullptr) { + // This adapter maps each business tag exactly once (no repeating + // groups), so a repeated tag is an ambiguous/malformed frame rather + // than a silently-ignored later value (e.g. 55=2 then 55=999). + return FixError::Malformed; // duplicate tag + } out.fields[out.count++] = Field{tag, msg.substr(eq + 1, soh - (eq + 1)), field_start}; pos = soh + 1; } return FixError::None; } -// Confirm the 8 / 9 / ... / 10 field ordering and the supported BeginString. +// Confirm the standard 8 / 9 / 35 / ... / 10 envelope: BeginString, BodyLength, +// MsgType as the first body field, CheckSum last, and a supported BeginString. 
[[nodiscard]] FixError check_envelope_shape(const Parsed &p) noexcept { if (p.count < 3) { return FixError::Malformed; } const Field &begin = p.fields[0]; + // MsgType (35) must be the first body field, immediately after BodyLength, so + // a frame like 8/9/34/35/.../10 is rejected rather than decoded. const bool ordered = begin.tag == kTagBeginString && p.fields[1].tag == kTagBodyLength && + p.fields[2].tag == kTagMsgType && p.fields[p.count - 1].tag == kTagCheckSum; if (!ordered) { return FixError::Malformed; diff --git a/tests/unit/test_fix_protocol.cpp b/tests/unit/test_fix_protocol.cpp index 489e5b4..1a6f5d2 100644 --- a/tests/unit/test_fix_protocol.cpp +++ b/tests/unit/test_fix_protocol.cpp @@ -156,6 +156,25 @@ TEST_CASE("FIX malformed framing rejects deterministically", "[fix]") { fix::FixError::Malformed); } +TEST_CASE("FIX MsgType must be the first body field", "[fix]") { + // 8/9/34/35/.../10 — every required NewOrder field is present, but MsgType + // (35) does not immediately follow BodyLength. A first-match scan would still + // decode this; the standard envelope requires 35 first, so it is malformed. + std::string body = field(34, "1") + field(35, "D") + field(11, "1") + field(55, "2") + + field(54, "1") + field(38, "10") + field(40, "2") + field(44, "100") + + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::Malformed); +} + +TEST_CASE("FIX duplicate tag rejects deterministically", "[fix]") { + // Symbol (55) repeated. First-wins parsing would silently take 2 and ignore + // 999; with no repeating groups, the frame is ambiguous and rejected. 
+ std::string body = field(35, "D") + field(34, "1") + field(11, "1") + field(55, "2") + + field(55, "999") + field(54, "1") + field(38, "10") + field(40, "2") + + field(44, "100") + field(59, "1"); + REQUIRE(fix::decode_new_order(wrap(body)).error == fix::FixError::Malformed); +} + TEST_CASE("FIX oversized message rejects", "[fix]") { std::string body = field(35, "D"); body += field(34, "1"); From 2abb9ca890cebe9e918db4a03e2fbc87185c11d3 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 22:24:42 -0400 Subject: [PATCH 20/22] docs: drop delivered #29 from open-backlog anchors (Codex #131) The decision log marks #29 (FIX adapter) closed by this PR, but the current-state / resume anchors in PROGRESS.md and HANDOFF.md still listed it as open backlog, so /resume could send the next session to re-implement work this PR just added. Remove #29 from those backlog lists and note it as delivered in this PR. (#32 stays listed here; the v0.2.1 release PR removes it as part of its release sweep.) Co-Authored-By: Claude Opus 4.8 --- HANDOFF.md | 10 ++++++---- PROGRESS.md | 5 +++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/HANDOFF.md b/HANDOFF.md index 0bee945..ba1dafa 100644 --- a/HANDOFF.md +++ b/HANDOFF.md @@ -82,15 +82,16 @@ Current state: provides real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss support - issues #95, #28, and #26 were closed by PR #112 - open review request issue: #94 -- legacy backlog still open: #29 and #32 +- legacy backlog still open: #32 (#29 delivered in this PR, `feat/fix-text-protocol-adapter`) ### Next milestone There is no active milestone. M0–M49, the Linux artifact refresh (PR #125), and the v0.2.0 release (PR #127) are merged. 
The highest-value remaining work is non-code and externally gated: issue #94 (independent external review — needs a human reviewer) and issue #90 (full cache-counter PMU -evidence — needs a PMU microarchitecture that exposes cache events). Low-signal backlog: #32 -(flamegraph) and #29 (FIX adapter). Do not invent a new milestone without an explicit human request. +evidence — needs a PMU microarchitecture that exposes cache events). #29 (FIX-like text protocol +adapter) is delivered in this PR; low-signal backlog: #32 (flamegraph). Do not invent a new +milestone without an explicit human request. ### Phase III / IV purpose @@ -107,7 +108,8 @@ Current priority order (post-v0.2.0): 2. Issue #90 — full cache-counter PMU evidence. The bare-metal Apple host gives real cycles/instructions/branches/branch-misses but no cache-reference/cache-miss counters, so this needs a PMU microarchitecture that exposes cache events (x86_64, or an ARM server core). -3. Low-signal backlog only after the above: #32 (flamegraph), #29 (FIX adapter). +3. Low-signal backlog only after the above: #32 (flamegraph). #29 (FIX adapter) is delivered in + this PR (`feat/fix-text-protocol-adapter`). ### Forbidden shortcuts diff --git a/PROGRESS.md b/PROGRESS.md index dca11dc..751757e 100644 --- a/PROGRESS.md +++ b/PROGRESS.md @@ -52,7 +52,8 @@ Do not rely on prior chat memory. - **Next action:** no active milestone. Highest-value remaining work is non-code and gated: issue #94 (independent external review — needs a human reviewer) and issue #90 (full cache-counter PMU evidence — needs a PMU microarchitecture that exposes cache events, e.g. - x86_64). Low-signal backlog: #32 (flamegraph), #29 (FIX adapter). + x86_64). #29 (FIX-like text protocol adapter) is delivered in this PR + (`feat/fix-text-protocol-adapter`). Low-signal backlog: #32 (flamegraph). 
- **Blockers:** issue #90 is now a *cache-counter* PMU gap, not a host-access gap — this bare-metal Apple M2 exposes real `cycles`/`instructions`/`branches`/`branch-misses` but its PMU does not implement `cache-references`/`cache-misses`; closing it needs a PMU microarchitecture that exposes @@ -60,7 +61,7 @@ Do not rely on prior chat memory. review (human-gated). Hardware NIC/offload latency measurement still requires suitable wired NIC hardware, driver support, timestamping/offload/RSS access, and a measured packet workload; the current `wld0` Wi-Fi capability observation is not NIC-offload latency evidence. Legacy backlog - still includes #32 and #29. Issues #95, #28, and #26 were closed by PR #112. + still includes #32 (#29 delivered in this PR). Issues #95, #28, and #26 were closed by PR #112. --- From 4a2aa67dd70769faef37a8a9cf983b302e0742c7 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Mon, 22 Jun 2026 12:28:33 -0400 Subject: [PATCH 21/22] docs: scope partial-PMU claim to perf-stat; perf-record is a software profile (Codex #129) The constraints bullet labeled all perf artifacts as partial hardware PMU evidence, but only results/perf_stat_linux.txt carries real PMU counters (cycles/instructions/branches/branch-misses). results/perf_report_linux.txt is a software cpu-clock sampling profile, not PMU evidence. Scope the claim to the perf-stat artifact and call out perf-record separately, identically in AGENTS.md and CLAUDE.md so the two memories stay in sync. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 14 ++++++++------ CLAUDE.md | 14 ++++++++------ 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 3b303db..0a7abc4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -159,12 +159,14 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. 
-- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) - Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but - `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. Issue #90's residual - is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it - (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either - "full PMU evidence" or "constrained Docker validation". +- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence** + from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real + `cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are + unsupported by the Apple Silicon PMU. Issue #90's residual is the cache-counter set specifically, + which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal + alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The + `perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock + sampling** profile, not PMU evidence. - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it. diff --git a/CLAUDE.md b/CLAUDE.md index 5c52266..5fbdf6c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -159,12 +159,14 @@ Known constraints: - The gateway and feed are loopback-only, unauthenticated simulator surfaces. - The core engine cannot depend on wall-clock time or floating-point prices. -- Perf artifacts are now **partial hardware PMU evidence** from a bare-metal Apple M2 (aarch64) - Fedora Asahi host: real `cycles`/`instructions`/`branches`/`branch-misses`, but - `cache-references`/`cache-misses` are unsupported by the Apple Silicon PMU. 
Issue #90's residual - is the cache-counter set specifically, which needs a PMU microarchitecture that exposes it - (x86_64, or an ARM server core) — bare metal alone is not enough. Do not relabel these as either - "full PMU evidence" or "constrained Docker validation". +- The `perf stat` artifact (`results/perf_stat_linux.txt`) is now **partial hardware PMU evidence** + from a bare-metal Apple M2 (aarch64) Fedora Asahi host: real + `cycles`/`instructions`/`branches`/`branch-misses`, but `cache-references`/`cache-misses` are + unsupported by the Apple Silicon PMU. Issue #90's residual is the cache-counter set specifically, + which needs a PMU microarchitecture that exposes it (x86_64, or an ARM server core) — bare metal + alone is not enough. Do not relabel it "full PMU evidence" or "constrained Docker validation". The + `perf record` hot-symbol report (`results/perf_report_linux.txt`) is a **software cpu-clock + sampling** profile, not PMU evidence. - Issue #94 external review remains one of the highest remaining credibility signals; do not imply independent review has happened until `docs/review_feedback.md` records it. From 5093beb518180a53ecda8a8180aa12011f3bcad8 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Mon, 22 Jun 2026 12:40:30 -0400 Subject: [PATCH 22/22] docs: embed the flamegraph as a visible image in the README MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The flamegraph artifact, generator, provenance companion, and docs already existed but no page actually displayed the SVG — it was only referenced by filename. Embed the rendered results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence), names the hot frames, and links the provenance .txt and docs/perf_analysis.md. 
Co-Authored-By: Claude Opus 4.8 --- README.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/README.md b/README.md index 4332532..d1bb19d 100644 --- a/README.md +++ b/README.md @@ -109,6 +109,23 @@ Reproduce with `make bench` (numbers will differ by machine). The differential-t [`results/differential.txt`](results/differential.txt) — kept separate so it does not disturb the core numbers above. +### Flamegraph + +Where on-CPU time goes in the `qsl-bench` synthetic suite, rendered by `make flamegraph` +(`scripts/flamegraph.sh` → the dependency-free `scripts/flamegraph.py` — no external FlameGraph +toolchain): + +[![qsl-bench cpu-clock flamegraph](results/flamegraph.svg)](results/flamegraph.svg) + +This is a **software cpu-clock sampling** hot-symbol profile, **not** PMU evidence: frame width is +proportional to on-CPU samples (329 folded across 159 stacks on this run), not wall-clock latency or +throughput, and it is hardware/kernel/compiler/build dependent. The hot frames are protocol +`decode_new_order`, gateway session framing, `MatchingEngine::new_limit`, and order-book +cancel/allocation. Provenance and classification are in +[`results/flamegraph.txt`](results/flamegraph.txt); methodology in +[docs/perf_analysis.md](docs/perf_analysis.md). GitHub renders the SVG statically; download the raw +file for interactive zoom and search. + ## Limitations - **Synthetic and local.** No real market data, no real venue connectivity, no order types