From 0c3b401ff4bccc3c9565d6689db4caed1cb47b53 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 02:34:36 -0400
Subject: [PATCH 01/11] perf: add flamegraph generator and make target (#32)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add `make flamegraph`, the missing-flamegraph follow-up tracked by issue
#32. The perf stat/record text workflow already existed; this renders a
perf call-graph flamegraph.

- scripts/flamegraph.sh: records
  `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on qsl-bench
  and writes results/flamegraph.svg plus a results/flamegraph.txt
  provenance/classification companion (top folded stacks). Mirrors
  perf_record.sh: Linux-only, reuses qsl_common.sh provenance +
  qsl_publish_artifact, and honours QSL_PERF_ALLOW_PARTIAL for
  constrained hosts. DWARF call graphs unwind correctly even though the
  Release `bench` preset omits frame pointers.
- scripts/flamegraph.py: dependency-free (stdlib-only) stackcollapse +
  SVG renderer, so the artifact is reproducible from the repo without
  vendoring the Perl FlameGraph toolkit. Deterministic: frames sorted by
  name, colors a pure function of the name, no RNG/timestamps in the
  drawn body.
- tests/shell/test_flamegraph.sh: CTest-registered (python3-only; skips
  cleanly if absent). Covers folding (offset/dso stripping, perf-order
  reversal, comm-at-base, count aggregation, sortedness), SVG
  well-formedness, XML escaping, determinism, and empty-input handling.
- Docs (perf_analysis.md, results/README.md), command lists (CLAUDE.md,
  AGENTS.md), the MILESTONES.md backlog, and the PROGRESS.md log updated.

`make check` 242/242. Full hardware cache-PMU evidence stays in #90.
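
For reviewers, here is a minimal sketch of the fold data model described
above. It is not the committed scripts/flamegraph.py code. The helper names
(`clean_frame`, `fold_samples`, `collapsed_lines`, `inclusive_counts`) and
the `(comm, innermost-first frames)` input shape are illustrative
assumptions. The sketch performs these steps:

- reverses perf-order frames to root-first;
- places the comm at the base;
- strips the trailing `(dso)` and `+0xoffset`;
- aggregates identical stacks.

A frame's drawn width is the inclusive sample total of its path prefix.

```python
"""Sketch of the stackcollapse data model behind `make flamegraph`.

Illustrative only: helper names and the (comm, frames) input shape are
assumptions, not the API of scripts/flamegraph.py.
"""
import re

# A trailing "+0x<hex>" instruction offset, as `perf script` prints it.
OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$")


def clean_frame(body: str) -> str:
    """Strip the final "(dso)" group and any "+0xoffset" from one frame body.

    Assumes every frame carries a trailing "(dso)" group, as perf script emits.
    """
    if body.endswith(")"):
        depth = 0
        # Walk back to the "(" that balances the final ")"; C++ signatures
        # such as "f(int, char)" contain their own parentheses.
        for i in range(len(body) - 1, -1, -1):
            if body[i] == ")":
                depth += 1
            elif body[i] == "(":
                depth -= 1
                if depth == 0:
                    body = body[:i].rstrip()
                    break
    body = OFFSET_RE.sub("", body).strip()
    return body or "[unknown]"


def fold_samples(samples):
    """Fold (comm, innermost-first frames) samples into {root-first stack: count}."""
    folded = {}
    for comm, frames in samples:
        stack = [clean_frame(f) for f in reversed(frames)]  # root first
        if comm:
            stack.insert(0, comm)  # the command name sits at the base
        key = ";".join(stack)
        folded[key] = folded.get(key, 0) + 1
    return folded


def collapsed_lines(folded):
    """Render 'stack count' lines, sorted by stack for deterministic output."""
    return [f"{stack} {folded[stack]}" for stack in sorted(folded)]


def inclusive_counts(folded):
    """Samples under each root-first prefix (a frame's width is proportional to this)."""
    totals = {}
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            prefix = ";".join(frames[:depth])
            totals[prefix] = totals.get(prefix, 0) + count
    return totals


samples = [
    ("qsl-bench", ["OrderBook::add_limit(int)+0x310 (/path/qsl-bench)", "main+0x127 (/path/qsl-bench)"]),
    ("qsl-bench", ["OrderBook::add_limit(int)+0x300 (/path/qsl-bench)", "main+0x100 (/path/qsl-bench)"]),
    ("qsl-bench", ["cfree+0x5 (/usr/lib64/libc.so.6)", "main+0x10 (/path/qsl-bench)"]),
]
folded = fold_samples(samples)
for line in collapsed_lines(folded):
    print(line)
```

The two add_limit samples differ only in offset, so they collapse to one
stack with count 2. Counts and ordering are pure functions of the input,
which is what keeps the collapsed text reproducible run to run.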
Co-Authored-By: Claude Opus 4.8
---
 AGENTS.md                      |   1 +
 CLAUDE.md                      |   1 +
 MILESTONES.md                  |   4 +-
 Makefile                       |   9 +-
 PROGRESS.md                    |  16 ++
 docs/perf_analysis.md          |  34 +++-
 results/README.md              |   6 +
 scripts/flamegraph.py          | 306 +++++++++++++++++++++++++++++++++
 scripts/flamegraph.sh          | 238 +++++++++++++++++++++++++
 tests/CMakeLists.txt           |   7 +
 tests/shell/test_flamegraph.sh | 137 +++++++++++++++
 11 files changed, 755 insertions(+), 4 deletions(-)
 create mode 100755 scripts/flamegraph.py
 create mode 100755 scripts/flamegraph.sh
 create mode 100644 tests/shell/test_flamegraph.sh

diff --git a/AGENTS.md b/AGENTS.md
index 3b303db..63cae6e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -383,6 +383,7 @@ Keep this synchronized with the Makefile.
 - `make bench-recovery` — run M46 recovery benchmarking (full-replay restart vs book rebuild)
 - `make perf-stat` — run Linux `perf stat` workflow where supported
 - `make perf-record` — run Linux `perf record/report` workflow where supported
+- `make flamegraph` — render a Linux `perf` call-graph flamegraph (SVG) where supported
 - `make numa-study` — run Linux CPU-affinity / scheduler-migration / NUMA-locality study where supported
 - `make false-sharing-study` — run benchmark-only packed-vs-padded SPSC cursor contention study
 - `make profile-io` — run Linux syscall/socket-path profiling where supported
diff --git a/CLAUDE.md b/CLAUDE.md
index 5c52266..ef85ff0 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -383,6 +383,7 @@ Keep this synchronized with the Makefile.
 - `make bench-recovery` — run M46 recovery benchmarking (full-replay restart vs book rebuild)
 - `make perf-stat` — run Linux `perf stat` workflow where supported
 - `make perf-record` — run Linux `perf record/report` workflow where supported
+- `make flamegraph` — render a Linux `perf` call-graph flamegraph (SVG) where supported
 - `make numa-study` — run Linux CPU-affinity / scheduler-migration / NUMA-locality study where supported
 - `make false-sharing-study` — run benchmark-only packed-vs-padded SPSC cursor contention study
 - `make profile-io` — run Linux syscall/socket-path profiling where supported
diff --git a/MILESTONES.md b/MILESTONES.md
index 01d240c..c32aec8 100644
--- a/MILESTONES.md
+++ b/MILESTONES.md
@@ -484,7 +484,9 @@ Do not pull backlog items into earlier PRs.
 - FIX-like text protocol adapter. (#29)
 - Web dashboard for visualization. (#30)
 - Docker packaging. (#31)
-- Perf/flamegraph docs. (#32)
+- Perf/flamegraph docs. (#32) — **done**: `make flamegraph` renders a perf call-graph flamegraph
+  via the dependency-free `scripts/flamegraph.py` (`results/flamegraph.svg` + `.txt`), unit-tested in
+  `tests/shell/test_flamegraph.sh`. Full hardware cache-PMU evidence stays in #90.
 - GitHub Pages documentation site. (#33)
 
 ### Differential-testing follow-ups (prioritized)
diff --git a/Makefile b/Makefile
index 426e0bb..8c2e932 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-.PHONY: configure build test check fmt fmt-check tidy bench bench-diff bench-allocator bench-storage bench-recovery perf-stat perf-record numa-study false-sharing-study profile-io socket-stress socket-load dpdk-check nic-offload-check crash-recovery concurrency-stress asan tsan demo check-fixtures check-manifest determinism divergence-demo clean
+.PHONY: configure build test check fmt fmt-check tidy bench bench-diff bench-allocator bench-storage bench-recovery perf-stat perf-record flamegraph numa-study false-sharing-study profile-io socket-stress socket-load dpdk-check nic-offload-check crash-recovery concurrency-stress asan tsan demo check-fixtures check-manifest determinism divergence-demo clean
 
 BUILD_DIR := build/dev
 
@@ -63,6 +63,13 @@ perf-record:
 	cmake --build --preset bench --target qsl-bench
 	QSL_BENCH_BIN=build/bench/qsl-bench bash scripts/perf_record.sh
 
+# Issue #32: render a perf call-graph flamegraph (SVG) from the benchmark harness. Linux-only.
+flamegraph:
+	@test "$$(uname -s)" = "Linux" || { echo "error: make flamegraph requires Linux perf; current OS is $$(uname -s)." >&2; exit 2; }
+	cmake --preset bench
+	cmake --build --preset bench --target qsl-bench
+	QSL_BENCH_BIN=build/bench/qsl-bench bash scripts/flamegraph.sh
+
 # M43: CPU-affinity / scheduler-migration / NUMA locality study. Linux-only.
 numa-study:
 	@if test "$$(uname -s)" != "Linux"; then \
diff --git a/PROGRESS.md b/PROGRESS.md
index 070fece..7293498 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -362,6 +362,22 @@ Lower priority:
   (E-core) PMU carries live counts — the `apple_blizzard_pmu/...` rows read `` in `results/perf_stat_linux.txt` because the single-threaded benchmark stays on the Avalanche P-cores. Docs/memory only; no code or artifacts changed.
+- [2026-06-21] Issue #32 flamegraph profiling artifact (`perf/flamegraph-artifact`, stacked on the
+  Codex-followup branch). Added `make flamegraph` → `scripts/flamegraph.sh`, which records
+  `perf record --call-graph dwarf -F 4000 -g -e cpu-clock` on `qsl-bench` and renders
+  `results/flamegraph.svg` (+ `results/flamegraph.txt` provenance/classification companion). The
+  fold + SVG render live in `scripts/flamegraph.py`, a dependency-free stdlib-only stackcollapse +
+  flamegraph renderer (no vendored Perl FlameGraph toolkit), deterministic by design (frames sorted
+  by name; colors a pure function of the name; no RNG/timestamps in the drawn body). DWARF call
+  graphs are used because the Release `bench` preset omits frame pointers; application symbols
+  (`OrderBook::add_limit`, `MatchingEngine::new_limit`, the replay path, …) still resolve from the
+  symtab. Added `tests/shell/test_flamegraph.sh` (CTest-registered, python3-only, skips cleanly if
+  absent) covering folding (offset/dso stripping, perf-order reversal, comm-at-base, count
+  aggregation, sortedness), SVG well-formedness, XML escaping, determinism, and empty-input
+  handling; `make check` 242/242. The committed `results/flamegraph.svg`/`.txt` were generated on
+  the bare-metal Fedora Asahi host (aarch64) from the clean committed tree (`Dirty inputs: no`).
+  This is a software cpu-clock sampling hot-symbol profile, not a latency/throughput claim; full
+  hardware cache-PMU evidence stays in #90. Do not merge from automation; human squash-merges.
 - [2026-06-03] M35: implemented a multi-client TCP connection-scaling load test (`scripts/socket_load.sh`, `make socket-load`, Linux-only) driving N concurrent `qsl-client`s against the portable TCP and epoll (M34) gateways; `results/socket_load_summary.txt` is Docker-generated and constrained.
A `/code-review` (3 finder agents) caught and fixed real measurement-integrity bugs before the PR: a failed trial's `wall=0` no longer poisons the reported best (only trials whose gateway served count toward the min); the `completed` column reports the WORST per-trial completion, not the last, so partial/total trial failures are surfaced rather than masked; a per-client `timeout` bounds a hang if the gateway dies; and `QSL_LOAD_TRIALS` is validated. Post-PR hardening uses fresh monotonic ports per gateway start, retries transient startup/serve failures on new ports, and refuses to write a partial artifact unless `QSL_LOAD_ALLOW_PARTIAL=1` is set intentionally; the refreshed artifact records `Dirty tree: no`. The scaling-shape claim remains constrained to loopback connection setup, not a demonstrated production-capacity advantage for either transport. Deferred follow-up: a shared `scripts/lib` to remove the dirty-tree / `wait_ready` / gateway-stop duplication across the three socket scripts. - [2026-06-03] M35: started after M34 (#98) squash-merged (commit 9e3750b). Scope: multi-client load / socket-pressure testing of the gateway/feed path (TCP/UDP stress, socket-buffer pressure, connection scaling, backpressure) building on M34's epoll multi-client path and M30's socket tooling. Constraints: scripts/tests document load shape + environment; results must distinguish kernel/socket pressure from user-space engine cost; no production-capacity claims (honest constrained-environment framing, like M29/M30). - [2026-06-04] M35: PR #100 squash-merged to `main` as a86b701 after all CI jobs and review checks were green. M35 is now landed; original M36 NUMA remains deferred until the repository-health refactor analysis is completed or explicitly skipped by the human. 
diff --git a/docs/perf_analysis.md b/docs/perf_analysis.md
index 3ab3881..7400f02 100644
--- a/docs/perf_analysis.md
+++ b/docs/perf_analysis.md
@@ -55,6 +55,30 @@ default is intentional: many CI, VM, and container environments do not expose ha
 to unprivileged processes, and the benchmark harness is short enough that a lower frequency can
 miss the minimum sample count needed for meaningful hot-symbol ordering.
 
+Render a flamegraph (issue #32):
+
+```bash
+make flamegraph
+```
+
+This runs `scripts/flamegraph.sh`, which records call-graph samples
+(`perf record --call-graph dwarf -F 4000 -g -e cpu-clock`), folds them, and renders an SVG to
+`results/flamegraph.svg` plus a text companion `results/flamegraph.txt` (provenance, classification,
+and the top folded stacks). DWARF call graphs are used so stacks unwind correctly even though the
+`bench` (Release) preset omits frame pointers — the application symbols (`OrderBook::add_limit`,
+`MatchingEngine::new_limit`, the replay path, …) resolve from the symbol table without changing the
+optimization level under measurement.
+
+The folding and SVG rendering live in `scripts/flamegraph.py`, a dependency-free Python script
+(standard library only) that reimplements the `stackcollapse` + flamegraph data model rather than
+vendoring Brendan Gregg's Perl toolkit, so the artifact is reproducible from this repository alone.
+The renderer is deterministic — frames are sorted by name and colors are a pure function of the
+frame name (no RNG, no timestamps in the drawn body) — and is unit-tested in
+`tests/shell/test_flamegraph.sh` (registered with CTest, runs under `make check`). Frame width is
+proportional to on-CPU samples; this is a software cpu-clock sampling profile for **hot-symbol
+investigation**, not a latency or throughput measurement. Set `QSL_FLAMEGRAPH_EVENT=cycles` to
+sample the hardware PMU cycles event instead, where the host exposes it.
+
 ## Required Environment
 
 Both scripts are Linux-only and fail before running on non-Linux hosts. `perf stat` also fails
@@ -113,8 +137,14 @@ counters, permission-limited sampling, or a sample report that is explicitly mar
 - `results/perf_report_linux.txt` records benchmark output, `perf record` stderr, and `perf report
   --stdio` output. It is useful as a hot-symbol profile only when `No samples: no`,
   `Insufficient samples: no`, and `Sample count` is at least `Minimum samples for hot profile`.
-- `build/perf/qsl-bench.perf.data` is generated by `make perf-record` and is intentionally not
-  committed; it is host-specific binary profiler data.
+- `results/flamegraph.svg` is the rendered flamegraph from `make flamegraph`; `results/flamegraph.txt`
+  is its provenance/classification companion (and lists the top folded stacks). Treat frame widths as
+  a hot-symbol guide only when the `.txt` reports a `flamegraph (...)` `Artifact:` and a `Sample
+  count` at least `Minimum samples for hot profile`; a `constrained-environment validation` label
+  means sampling did not capture enough stacks to trust.
+- `build/perf/qsl-bench.perf.data` and `build/perf/qsl-bench.flame.data` are generated by
+  `make perf-record` / `make flamegraph` and are intentionally not committed; they are host-specific
+  binary profiler data.
 
 Each artifact includes hardware, kernel, compiler, perf version, build type, dataset, command, event set, and source-digest provenance. The `Source digest` is the authoritative source identity;
diff --git a/results/README.md b/results/README.md
index 49bd9d2..0f8b7aa 100644
--- a/results/README.md
+++ b/results/README.md
@@ -23,6 +23,12 @@ Benchmark results produced by `make bench` and scripts under `scripts/`.
 - `perf_report_linux.txt` — Linux `perf record/report` hot-symbol output for the benchmark harness (`make perf-record`).
It is useful as a hot-symbol profile only when the file says `No samples: no`, `Insufficient samples: no`, and the sample count meets the reported minimum. +- `flamegraph.svg` / `flamegraph.txt` — Linux `perf` call-graph flamegraph (`make flamegraph`, + issue #32) rendered by the dependency-free `scripts/flamegraph.py`. The `.svg` is the visual + (frame width ∝ on-CPU samples) with provenance in a leading XML comment; the `.txt` carries + provenance, the `Artifact:` classification, and the top folded stacks. It is a software cpu-clock + sampling profile for hot-symbol investigation, not a latency/throughput claim — trust frame widths + only when the `.txt` reports a `flamegraph (...)` artifact with enough samples. - `numa_affinity_study.txt` — Linux CPU-affinity / scheduler-migration / NUMA-locality study output (`make numa-study`). It must self-classify as `full-linux-numa`, `linux-constrained`, or `unsupported-host`; only `full-linux-numa` is full NUMA evidence. diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py new file mode 100755 index 0000000..966d0c7 --- /dev/null +++ b/scripts/flamegraph.py @@ -0,0 +1,306 @@ +#!/usr/bin/env python3 +"""Self-contained flamegraph generator for QSL perf profiles. + +Reads `perf script` output on stdin, folds it into collapsed stacks +(stackcollapse), and renders a deterministic SVG flamegraph on stdout. + +This is intentionally dependency-free (Python standard library only) so the +profiling artifact is reproducible from the repository alone, without vendoring +Brendan Gregg's Perl FlameGraph toolkit. The data model is identical: a +"collapsed stack" is `root;...;leafcount`, and the flamegraph is a +proportional, sorted, recursive layout of those stacks. 
+ +Modes: + flamegraph.py perf script (stdin) -> SVG (stdout) + flamegraph.py --collapse-only perf script (stdin) -> collapsed stacks (stdout) + flamegraph.py --from-collapsed collapsed stacks (stdin) -> SVG (stdout) + +The rendering is deterministic: frames are sorted by name, and colors are a pure +function of the frame name (no RNG, no timestamps in the drawn body). The driver +script (scripts/flamegraph.sh) records run provenance separately so the SVG stays +reproducible for a given input. +""" + +from __future__ import annotations + +import argparse +import html +import re +import sys +import zlib + +# perf-script stack frame line: leading whitespace, hex address, symbol, "(dso)". +# C++ symbols contain spaces and parentheses, so the dso is taken as the final +# parenthesized group and the symbol is everything between the address and it. +_FRAME_RE = re.compile(r"^\s+(?P[0-9a-fA-F]+)\s+(?P.*\S)\s*$") +_OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$") + + +def _clean_symbol(rest: str) -> str: + """Turn a perf-script frame body into a folded frame name. + + Drops the trailing `(dso)` and the `+0xoffset`, matching stackcollapse-perf. + """ + # Strip the final "(...)" dso group if present (balanced at end of line). 
+ if rest.endswith(")"): + depth = 0 + for i in range(len(rest) - 1, -1, -1): + if rest[i] == ")": + depth += 1 + elif rest[i] == "(": + depth -= 1 + if depth == 0: + rest = rest[:i].rstrip() + break + rest = _OFFSET_RE.sub("", rest).strip() + if not rest: + return "[unknown]" + return rest + + +def fold_perf_script(lines) -> dict[str, int]: + """Collapse `perf script` output into {stack_string: sample_count}.""" + folded: dict[str, int] = {} + comm = "" + stack: list[str] = [] + + def flush() -> None: + nonlocal stack, comm + if stack: + frames = list(reversed(stack)) + if comm: + frames.insert(0, comm) + key = ";".join(frames) + folded[key] = folded.get(key, 0) + 1 + stack = [] + + for raw in lines: + line = raw.rstrip("\n") + if not line.strip(): + flush() + comm = "" + continue + if line[0].isspace(): + m = _FRAME_RE.match(line) + if m: + stack.append(_clean_symbol(m.group("rest"))) + continue + # Header line: "comm pid timestamp: period event:" -> capture comm. + flush() + comm = line.split()[0] + flush() + return folded + + +def parse_collapsed(lines) -> dict[str, int]: + """Parse pre-collapsed `stack count` lines.""" + folded: dict[str, int] = {} + for raw in lines: + line = raw.rstrip("\n") + if not line.strip(): + continue + stack, _, count = line.rpartition(" ") + if not stack: + stack, _, count = line.rpartition("\t") + try: + n = int(count) + except ValueError: + continue + folded[stack] = folded.get(stack, 0) + n + return folded + + +class _Node: + __slots__ = ("name", "value", "children") + + def __init__(self, name: str) -> None: + self.name = name + self.value = 0 + self.children: dict[str, _Node] = {} + + +def build_tree(folded: dict[str, int], root_name: str) -> _Node: + root = _Node(root_name) + for stack, count in folded.items(): + root.value += count + node = root + for frame in stack.split(";"): + if not frame: + continue + child = node.children.get(frame) + if child is None: + child = _Node(frame) + node.children[frame] = child + child.value 
+= count + node = child + return root + + +def _color(name: str) -> str: + """Deterministic warm 'hot' palette derived purely from the frame name.""" + h = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF + r = 205 + (h % 51) + g = (h >> 8) % 231 + b = (h >> 16) % 56 + return f"rgb({r},{g},{b})" + + +def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None: + out.append((node, depth, x)) + cursor = x + for name in sorted(node.children): + child = node.children[name] + _layout(child, depth + 1, cursor, total, out) + cursor += child.value + + +def render_svg( + root: _Node, + *, + title: str, + subtitle: str, + width: int = 1200, + frame_height: int = 16, + min_px: float = 0.1, + countname: str = "samples", +) -> str: + total = root.value or 1 + placed: list = [] + _layout(root, 0, 0, total, placed) + max_depth = max((d for _, d, _ in placed), default=0) + + pad_top = 54 + pad_bottom = 16 + side = 10 + plot_width = width - 2 * side + height = pad_top + (max_depth + 1) * frame_height + pad_bottom + + def px(samples: int) -> float: + return samples / total * plot_width + + parts: list[str] = [] + parts.append( + f'\n' + f'' + ) + parts.append( + '' + ) + parts.append(_SEARCH_JS) + parts.append(f'') + parts.append( + f'{html.escape(title)}' + ) + parts.append( + f'' + f'{html.escape(subtitle)}' + ) + parts.append( + f'Search' + ) + parts.append( + f' ' + ) + + for node, depth, x in placed: + w = px(node.value) + if w < min_px: + continue + x_px = side + px(x) + y = pad_top + (max_depth - depth) * frame_height + pct = node.value / total * 100.0 + label = node.name + # Approx 7px per char at this font; reserve 6px padding. + maxchars = int((w - 6) / 7) + text = "" + if maxchars >= 3: + text = label if len(label) <= maxchars else label[: maxchars - 2] + ".." 
+ tip = f"{label} ({node.value} {countname}, {pct:.2f}%)" + parts.append(f'') + parts.append(f"{html.escape(tip)}") + parts.append( + f'' + ) + if text: + parts.append( + f'{html.escape(text)}' + ) + parts.append("") + + parts.append("\n") + return "".join(parts) + + +# Minimal, self-contained search affordance (highlight matches, report % of +# matched samples). No external assets; deterministic; no zoom to keep the +# artifact robust across renderers. +_SEARCH_JS = ( + "" +) + + +def main(argv=None) -> int: + ap = argparse.ArgumentParser(description=__doc__) + ap.add_argument("--collapse-only", action="store_true", + help="emit collapsed stacks instead of SVG") + ap.add_argument("--from-collapsed", action="store_true", + help="read collapsed stacks instead of perf script output") + ap.add_argument("--title", default="QSL Flame Graph") + ap.add_argument("--subtitle", default="") + ap.add_argument("--countname", default="samples") + ap.add_argument("--root-name", default="all") + ap.add_argument("--width", type=int, default=1200) + args = ap.parse_args(argv) + + if args.from_collapsed: + folded = parse_collapsed(sys.stdin) + else: + folded = fold_perf_script(sys.stdin) + + if args.collapse_only: + for stack in sorted(folded): + sys.stdout.write(f"{stack} {folded[stack]}\n") + return 0 + + if not folded: + sys.stderr.write("flamegraph.py: no stacks parsed from input\n") + return 1 + + root = build_tree(folded, args.root_name) + sys.stdout.write( + render_svg( + root, + title=args.title, + subtitle=args.subtitle, + width=args.width, + countname=args.countname, + ) + ) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/flamegraph.sh b/scripts/flamegraph.sh new file mode 100755 index 0000000..7324f64 --- /dev/null +++ b/scripts/flamegraph.sh @@ -0,0 +1,238 @@ +#!/usr/bin/env bash +# Generate a Linux perf flamegraph from the benchmark harness. 
+# +# Records call-graph samples with `perf record --call-graph dwarf`, folds them +# with scripts/flamegraph.py (a dependency-free stackcollapse + SVG renderer), +# and writes: +# results/flamegraph.svg -- the visual flamegraph (provenance embedded as a +# leading XML comment + a visible subtitle) +# results/flamegraph.txt -- provenance + classification + top folded stacks +# +# Defaults to software cpu-clock sampling so the artifact stays a portable +# hot-symbol *investigation* aid, not a latency/throughput claim. This is the +# missing-flamegraph follow-up tracked by issue #32 (the perf stat/record text +# workflow already exists; full hardware-PMU cache evidence stays in #90). +set -euo pipefail + +cd "$(dirname "$0")/.." +# shellcheck source=scripts/qsl_common.sh +source scripts/qsl_common.sh + +BIN="${QSL_BENCH_BIN:-build/bench/qsl-bench}" +OUT_SVG="${QSL_FLAMEGRAPH_SVG:-results/flamegraph.svg}" +OUT_TXT="${QSL_FLAMEGRAPH_TXT:-results/flamegraph.txt}" +DATA="${QSL_FLAMEGRAPH_DATA:-build/perf/qsl-bench.flame.data}" +EVENT="${QSL_FLAMEGRAPH_EVENT:-cpu-clock}" +FREQ="${QSL_FLAMEGRAPH_FREQ:-4000}" +CALLGRAPH="${QSL_FLAMEGRAPH_CALLGRAPH:-dwarf}" +MIN_SAMPLES="${QSL_FLAMEGRAPH_MIN_SAMPLES:-200}" +TOP_STACKS="${QSL_FLAMEGRAPH_TOP_STACKS:-15}" +BUILD_DIR="$(dirname "$BIN")" +PROVENANCE_SCOPE="flamegraph-benchmark" +PROVENANCE_INPUTS=( + Makefile + CMakeLists.txt + CMakePresets.json + cmake + include + src + apps/qsl-bench + benchmarks + scripts/flamegraph.sh + scripts/flamegraph.py + scripts/qsl_common.sh +) + +perf_version_line() { + perf --version 2>&1 | head -1 || true +} + +parse_sample_count_token() { + awk -v raw="$1" ' + BEGIN { + gsub(/,/, "", raw) + suffix = substr(raw, length(raw), 1) + mult = 1 + if (suffix == "K" || suffix == "k") { mult = 1000; raw = substr(raw, 1, length(raw) - 1) } + else if (suffix == "M" || suffix == "m") { mult = 1000000; raw = substr(raw, 1, length(raw) - 1) } + if (raw ~ /^[0-9]+([.][0-9]+)?$/) printf "%d\n", raw * mult + }' +} + 
+qsl_require_linux "scripts/flamegraph.sh" "perf" + +if ! command -v perf >/dev/null 2>&1; then + echo "error: perf not found. Install linux perf tooling for this kernel." >&2 + exit 2 +fi +if ! command -v python3 >/dev/null 2>&1; then + echo "error: python3 is required to render the flamegraph." >&2 + exit 2 +fi +if [[ ! -x "$BIN" ]]; then + echo "error: $BIN not found; build the benchmark preset first (make flamegraph)." >&2 + exit 1 +fi + +mkdir -p "$(dirname "$OUT_SVG")" "$(dirname "$DATA")" + +BENCH_OUT="$(mktemp)" +RECORD_BENCH_OUT="$(mktemp)" +RECORD_ERR="$(mktemp)" +SCRIPT_OUT="$(mktemp)" +SCRIPT_ERR="$(mktemp)" +FOLDED="$(mktemp)" +SVG_TMP="$(mktemp)" +TXT_TMP="$(mktemp)" +trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$SVG_TMP" "$TXT_TMP"' EXIT + +# Fail fast if the benchmark itself is broken (partial mode must not mask this). +BENCH_STATUS=0 +"$BIN" >"$BENCH_OUT" 2>&1 || BENCH_STATUS=$? +if [[ "$BENCH_STATUS" -ne 0 ]]; then + echo "error: benchmark command failed before perf record (status $BENCH_STATUS); partial mode cannot override this." >&2 + cat "$BENCH_OUT" >&2 + exit 4 +fi + +RECORD_STATUS=0 +perf record --call-graph "$CALLGRAPH" -F "$FREQ" -g -e "$EVENT" -o "$DATA" -- "$BIN" \ + >"$RECORD_BENCH_OUT" 2>"$RECORD_ERR" || RECORD_STATUS=$? + +SCRIPT_STATUS=0 +if [[ "$RECORD_STATUS" -eq 0 ]]; then + perf script -i "$DATA" >"$SCRIPT_OUT" 2>"$SCRIPT_ERR" || SCRIPT_STATUS=$? +fi + +PERF_LIMITATION=no +if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ + "$RECORD_ERR" "$SCRIPT_ERR"; then + PERF_LIMITATION=yes +fi + +SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) 
samples\).*/\1/p' "$RECORD_ERR" | head -1)" +SAMPLE_COUNT="$(parse_sample_count_token "$SAMPLE_TOKEN")" +[[ -z "$SAMPLE_COUNT" ]] && SAMPLE_COUNT=0 + +# Fold to collapsed stacks for the text summary and as an SVG precondition. +STACK_COUNT=0 +if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then + python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true + STACK_COUNT="$(wc -l <"$FOLDED" | tr -d ' ')" +fi + +INSUFFICIENT_SAMPLES=no +if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$SAMPLE_COUNT" -lt "$MIN_SAMPLES" ]]; then + INSUFFICIENT_SAMPLES=yes +fi + +ARTIFACT_TYPE="flamegraph ($EVENT software sampling hot-symbol profile)" +if [[ "$EVENT" == "cycles" ]]; then + ARTIFACT_TYPE="flamegraph (cycles hardware-PMU sampling hot-symbol profile)" +fi +if [[ "$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0 || "$STACK_COUNT" -eq 0 ]]; then + ARTIFACT_TYPE="constrained-environment validation (partial; no clean sample report)" +elif [[ "$INSUFFICIENT_SAMPLES" == "yes" ]]; then + ARTIFACT_TYPE="constrained-environment validation (partial; insufficient samples for hot-symbol conclusions)" +fi + +PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")" +HOST="$(uname -s) $(uname -m)" +DATE="$(qsl_utc_timestamp)" +SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE" + +# Render the SVG (deterministic for a fixed folded input + fixed subtitle). +if [[ "$STACK_COUNT" -gt 0 ]]; then + { + echo '' + # Keep the delimiters on their own lines and squeeze any "--" + # out of the interior: a double hyphen is illegal inside an XML comment. + echo "" + # Drop the renderer's own XML declaration; we emitted ours above. 
+ python3 scripts/flamegraph.py \ + --title "QSL Matching-Engine Flame Graph (qsl-bench)" \ + --subtitle "$SUBTITLE" \ + --countname "$EVENT samples" \ + --from-collapsed <"$FOLDED" | tail -n +2 + } >"$SVG_TMP" + qsl_publish_artifact "$SVG_TMP" "$OUT_SVG" +fi + +# Text companion: provenance + classification + top folded stacks (human/queryable). +{ + echo "Command: make flamegraph" + echo "Artifact: $ARTIFACT_TYPE" + echo "Hardware: $(uname -m)" + echo "OS: $(uname -s) $(uname -r)" + echo "CPU: $(qsl_cpu_model)" + echo "Compiler: $(qsl_build_compiler_version "$BUILD_DIR")" + echo "Perf: $(perf_version_line)" + echo "Perf paranoid: $(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)" + echo "Build type: $(qsl_build_type "$BUILD_DIR")" + echo "$PROVENANCE" + echo "Benchmark binary: $BIN" + echo "Dataset: qsl-bench default synthetic benchmark suite" + echo "Call graph: $CALLGRAPH" + echo "Record event: $EVENT" + echo "Sample freq: $FREQ Hz" + echo "Sample count: $SAMPLE_COUNT" + echo "Folded stacks: $STACK_COUNT" + echo "Minimum samples for hot profile: $MIN_SAMPLES" + echo "Insufficient samples: $INSUFFICIENT_SAMPLES" + echo "Record status: $RECORD_STATUS" + echo "Script status: $SCRIPT_STATUS" + echo "Perf access limitation: $PERF_LIMITATION" + echo "Flamegraph SVG: $(qsl_repo_relative_or_empty "$OUT_SVG")" + echo "Perf data: $DATA (generated, not intended for commit)" + echo + if [[ "$ARTIFACT_TYPE" == flamegraph* ]]; then + echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol" + echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock" + echo "latency or throughput, and is hardware/kernel/compiler/build dependent." + else + echo "Caveat: constrained/partial perf validation, not a hot-symbol flamegraph. Treat" + echo "frame widths as unusable until sampling succeeds and Sample count meets the" + echo "Minimum samples for hot profile." 
+ fi + echo + echo "Top $TOP_STACKS folded stacks (count stack):" + if [[ -s "$FOLDED" ]]; then + # The final awk limits to $TOP_STACKS rows by reading all input (NR<=top) + # rather than `head`, so `sort` is never sent SIGPIPE under `pipefail`. + awk '{ n=$NF; $NF=""; sub(/[[:space:]]+$/,""); printf "%s\t%s\n", n, $0 }' "$FOLDED" | + sort -t"$(printf '\t')" -k1,1nr | + awk -F"$(printf '\t')" -v top="$TOP_STACKS" 'NR<=top { printf "%8d %s\n", $1, $2 }' + else + echo " (none)" + fi + echo + echo "Benchmark output:" + cat "$BENCH_OUT" +} >"$TXT_TMP" +qsl_publish_artifact "$TXT_TMP" "$OUT_TXT" +echo "wrote $OUT_TXT" +[[ "$STACK_COUNT" -gt 0 ]] && echo "wrote $OUT_SVG" + +if [[ ("$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0) && "$PERF_LIMITATION" != "yes" ]]; then + echo "error: perf record/script failed for a reason other than a perf access limitation." >&2 + exit 3 +fi +if [[ "$STACK_COUNT" -eq 0 || "$INSUFFICIENT_SAMPLES" == "yes" ]]; then + if [[ "${QSL_PERF_ALLOW_PARTIAL:-0}" != "1" ]]; then + echo "error: flamegraph did not capture enough samples for a clean profile." >&2 + echo " Re-run on Linux with perf sampling access, or set QSL_PERF_ALLOW_PARTIAL=1" >&2 + echo " only when intentionally documenting a constrained environment." >&2 + exit 3 + fi +fi diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 4e95e46..cb617a9 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -89,6 +89,13 @@ add_test( NAME qsl_common_publish_artifact COMMAND bash "${CMAKE_CURRENT_LIST_DIR}/shell/test_qsl_common.sh") +# Shell unit tests for the dependency-free flamegraph renderer (scripts/flamegraph.py: +# perf-script folding + deterministic SVG rendering) behind `make flamegraph` (#32). +# Portable: needs only python3 (skips cleanly if absent); does not require perf. 
+add_test( + NAME qsl_flamegraph_render + COMMAND bash "${CMAKE_CURRENT_LIST_DIR}/shell/test_flamegraph.sh") + if(EXISTS "/dev/full") add_test( NAME qsl_replay_generate_append_failure diff --git a/tests/shell/test_flamegraph.sh b/tests/shell/test_flamegraph.sh new file mode 100644 index 0000000..2ba305d --- /dev/null +++ b/tests/shell/test_flamegraph.sh @@ -0,0 +1,137 @@ +#!/usr/bin/env bash +# Unit tests for scripts/flamegraph.py — the dependency-free stackcollapse + SVG +# renderer behind `make flamegraph` (issue #32). +# +# The shell driver (scripts/flamegraph.sh) needs Linux `perf`, which CI does not +# have, so these tests exercise the deterministic, portable core instead: +# 1. `perf script` output folds into correct collapsed stacks (innermost-first +# perf order reversed to root-first, comm at the base, dso + "+0xoffset" +# stripped, C++ symbols with spaces/parens preserved). +# 2. identical stacks aggregate their counts. +# 3. collapsed output is sorted and deterministic. +# 4. the SVG render is well-formed, escapes XML metacharacters, contains the +# expected frames, and is byte-identical across runs (no RNG, no timestamps). +# 5. empty input is handled (exit 1 for SVG, empty for --collapse-only). +# +# Registered with CTest (see tests/CMakeLists.txt); runs under `make check`. +# Run directly: bash tests/shell/test_flamegraph.sh + +set -uo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)" +FG="$REPO_ROOT/scripts/flamegraph.py" + +if ! 
command -v python3 >/dev/null 2>&1; then + echo "SKIP: python3 not found; flamegraph renderer tests skipped" + exit 0 +fi + +PASS=0 +FAIL=0 + +expect_eq() { + local name="$1" expected="$2" actual="$3" + if [[ "$actual" == "$expected" ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n expected: %q\n actual: %q\n' "$name" "$expected" "$actual" + FAIL=$((FAIL + 1)) + fi +} + +expect_contains() { + local name="$1" needle="$2" haystack="$3" + if [[ "$haystack" == *"$needle"* ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n missing: %q\n' "$name" "$needle" + FAIL=$((FAIL + 1)) + fi +} + +expect_not_contains() { + local name="$1" needle="$2" haystack="$3" + if [[ "$haystack" != *"$needle"* ]]; then + printf 'PASS: %s\n' "$name" + PASS=$((PASS + 1)) + else + printf 'FAIL: %s\n unexpected: %q\n' "$name" "$needle" + FAIL=$((FAIL + 1)) + fi +} + +# Build a synthetic `perf script` block. Frame lines must start with a TAB; the +# header line for each sample must start in column 0. +TAB=$'\t' +make_perf_script() { + printf '%s\n' \ + "qsl-bench 100 1.0: 1000 cpu-clock:u:" \ + "${TAB}415cd0 qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side)+0x310 (/path/qsl-bench)" \ + "${TAB}402887 main+0x127 (/path/qsl-bench)" \ + "" \ + "qsl-bench 100 2.0: 1000 cpu-clock:u:" \ + "${TAB}415cd0 qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side)+0x300 (/path/qsl-bench)" \ + "${TAB}402887 main+0x100 (/path/qsl-bench)" \ + "" \ + "qsl-bench 100 3.0: 1000 cpu-clock:u:" \ + "${TAB}aaaa cfree+0x5 (/usr/lib64/libc.so.6)" \ + "${TAB}402887 main+0x10 (/path/qsl-bench)" \ + "" +} + +# --- Folding (stackcollapse) ------------------------------------------------ + +FOLDED="$(make_perf_script | python3 "$FG" --collapse-only)" + +# Innermost-first perf order is reversed to root-first, comm prepended, dso and +# "+0xoffset" stripped. 
The two add_limit samples (different offsets) collapse to +# one stack with count 2. +expect_contains "add_limit stack folds with comm at base, offset+dso stripped, count 2" \ + 'qsl-bench;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side) 2' \ + "$FOLDED" +expect_contains "libc leaf folds to one sample" \ + 'qsl-bench;main;cfree 1' \ + "$FOLDED" +expect_not_contains "dso paths are stripped from frames" "/usr/lib64/libc.so.6" "$FOLDED" +expect_not_contains "raw +0x offsets are stripped from frames" "+0x" "$FOLDED" + +# Collapsed output is sorted (deterministic) and stable across runs. +FOLDED2="$(make_perf_script | python3 "$FG" --collapse-only)" +expect_eq "collapse-only is deterministic" "$FOLDED" "$FOLDED2" +SORTED="$(printf '%s\n' "$FOLDED" | LC_ALL=C sort)" +expect_eq "collapse-only output is sorted" "$SORTED" "$FOLDED" + +# --- SVG rendering ---------------------------------------------------------- + +SVG="$(make_perf_script | python3 "$FG" --title "T" --subtitle "S")" +expect_contains "svg has XML declaration" '<?xml version="1.0"' "$SVG" +expect_contains "svg carries the title" '>T' "$SVG" +expect_contains "svg renders the add_limit frame" 'add_limit' "$SVG" +expect_contains "svg renders rect frames" 'class="frame"' "$SVG" + +# Deterministic: byte-identical across two renders of the same input. +SVG2="$(make_perf_script | python3 "$FG" --title "T" --subtitle "S")" +expect_eq "svg render is deterministic" "$SVG" "$SVG2" + +# XML metacharacters in frame names are escaped, not emitted raw.
+ESC_SVG="$(printf 'bench;a<b>&c 3\n' | python3 "$FG" --from-collapsed)" +expect_contains "frame names are XML-escaped" '&lt;b&gt;&amp;c' "$ESC_SVG" +expect_not_contains "raw unescaped angle bracket is not emitted in a frame title" 'a<b>' "$ESC_SVG" + +# --- Empty input ------------------------------------------------------------ + +EMPTY_COLLAPSE="$(printf '' | python3 "$FG" --collapse-only)" +expect_eq "empty input yields empty collapse" "" "$EMPTY_COLLAPSE" + +printf '' | python3 "$FG" >/dev/null 2>&1 +rc=$? +expect_eq "empty input fails SVG render with exit 1" "1" "$rc" + +# --- Summary ---------------------------------------------------------------- + +printf '\nResults: %d passed, %d failed\n' "$PASS" "$FAIL" +[[ "$FAIL" -eq 0 ]] From beec2d0c115c9b53cfd07784335149b652de414d Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 02:37:31 -0400 Subject: [PATCH 02/11] perf: add generated flamegraph artifact on bare-metal Fedora Asahi (#32) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit results/flamegraph.svg + results/flamegraph.txt generated by `make flamegraph` from the clean committed tree on the bare-metal Apple M2 (aarch64) Fedora Asahi host: 397 cpu-clock samples, 171 folded stacks, `Dirty inputs: no`. The hot paths resolve to real engine symbols (OrderBook::modify/cancel/add_limit, the dispatch_storage cancel path, decode_new_order, the gateway Session path, replay::generate_flow). Software cpu-clock sampling hot-symbol profile — not a latency/throughput claim; full hardware cache-PMU evidence stays in #90.
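The "171 folded stacks" figure counts distinct collapsed stacks. The minimal Python sketch below illustrates that folding step. It is written against the behaviour pinned by tests/shell/test_flamegraph.sh: innermost-first perf order is reversed to root-first, the comm sits at the base, the "+0xoffset" and "(dso)" suffixes are stripped, identical stacks are aggregated, and the output is sorted. It is not the code in scripts/flamegraph.py. `frame_symbol`, `fold` and `render_collapsed` are hypothetical names. The sketch also assumes that each sample counts once (the sample period is ignored) and that the comm is the first whitespace-separated token of a sample header line.

```python
from collections import Counter
from collections.abc import Iterable

_HEX = set("0123456789abcdefABCDEF")


def frame_symbol(line: str) -> str:
    """Reduce one perf-script frame line to its bare symbol.

    "<TAB>415cd0 ns::f(int)+0x310 (/path/bin)" -> "ns::f(int)"
    """
    _addr, _, rest = line.strip().partition(" ")  # drop the leading hex address
    rest = rest.strip()
    if rest.endswith(")") and " (" in rest:
        rest = rest.rpartition(" (")[0]  # drop the trailing " (dso)" (assumed last)
    head, sep, tail = rest.rpartition("+0x")
    if sep and tail and set(tail) <= _HEX:
        rest = head  # drop the "+0xoffset" suffix
    return rest or "[unknown]"


def fold(lines: Iterable[str]) -> Counter:
    """Collapse perf-script samples into {"comm;root;...;leaf": sample_count}."""
    counts: Counter = Counter()
    comm, frames = None, []

    def flush() -> None:
        # Reads the current comm/frames bindings of the enclosing loop.
        if comm is not None and frames:
            # perf lists the innermost frame first; flamegraphs want root first.
            counts[";".join([comm, *reversed(frames)])] += 1

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():  # blank line ends a sample
            flush()
            comm, frames = None, []
        elif line[0] in " \t":  # indented line: one stack frame
            frames.append(frame_symbol(line))
        else:  # column-0 line: a new sample header
            flush()
            comm, frames = line.split()[0], []
    flush()
    return counts


def render_collapsed(counts: Counter) -> list[str]:
    """Deterministic "stack count" lines, sorted by stack string."""
    return [f"{stack} {counts[stack]}" for stack in sorted(counts)]
```

On the synthetic three-sample input used by the shell test, this sketch yields `qsl-bench;main;cfree 1` followed by the add_limit stack with count 2. Real `perf script` output only folds the same way if the stated assumptions about header and frame layout hold.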
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --- results/flamegraph.svg | 31 ++++++++++++++++++++++ results/flamegraph.txt | 58 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100644 results/flamegraph.svg create mode 100644 results/flamegraph.txt diff --git a/results/flamegraph.svg b/results/flamegraph.svg new file mode 100644 index 0000000..2281d5b --- /dev/null +++ b/results/flamegraph.svg @@ -0,0 +1,31 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!-- +QSL flamegraph provenance + Provenance version: 1 + Git commit (informational): 0c3b401 + Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242 + Source digest scope: flamegraph-benchmark + Dirty inputs: no + Generated output: results/flamegraph.svg + Date: 2026-06-21T06:36:51Z + Command: make flamegraph + Artifact: flamegraph (cpu-clock software sampling hot-symbol profile) + Record: perf record [call-graph dwarf | -F 4000 | -g | -e cpu-clock] + Samples: 397 | Folded stacks: 171 + Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is + not a latency or throughput measurement and is hardware/build dependent. 
+--> +<svg version="1.1" width="1200" height="310" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 1200 310" font-family="Verdana,Helvetica,sans-serif" font-size="12"><style>.frame:hover{stroke:#000;stroke-width:0.5} .hl{stroke:#000;stroke-width:1}</style><script type="text/ecmascript"><![CDATA[ +function qslSearch(){ + var term=prompt('Search frame (regex):',''); + var detail=document.getElementById('qsl-detail'); + var gs=document.getElementsByClassName('func'); + var i; + if(!term){for(i=0;i<gs.length;i++){gs[i].getElementsByTagName('rect')[0].classList.remove('hl');}if(detail)detail.textContent=' ';return;} + var re;try{re=new RegExp(term);}catch(e){return;} + for(i=0;i<gs.length;i++){var r=gs[i].getElementsByTagName('rect')[0]; + if(re.test(gs[i].getAttribute('data-name'))){r.classList.add('hl');} + else{r.classList.remove('hl');}} + if(detail)detail.textContent='Search: '+term; +} +]]></script><rect width="1200" height="310" fill="#f8f8f8"/><text x="600" y="24" text-anchor="middle" font-size="17" font-weight="bold">QSL Matching-Engine Flame Graph (qsl-bench)</text><text x="600" y="40" text-anchor="middle" fill="#555">flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 397 samples | 171 stacks | 2026-06-21T06:36:51Z</text><text id="qsl-search" x="1190" y="24" text-anchor="end" fill="#990000" onclick="qslSearch()" style="cursor:pointer">Search</text><text id="qsl-detail" x="10" y="306" fill="#333"> </text><g class="func" data-name="all"><title>all (397 cpu-clock samples, 100.00%)allqsl-bench (397 cpu-clock samples, 100.00%)qsl-bench[unknown] (300 cpu-clock samples, 75.57%)[unknown][unknown] (278 cpu-clock samples, 70.03%)[unknown][unknown] (221 cpu-clock samples, 55.67%)[unknown][unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 
0.76%)[unknown] (1 cpu-clock samples, 0.25%)check_match (1 cpu-clock samples, 0.25%)do_lookup_x (2 cpu-clock samples, 0.50%)__libc_start_call_main (218 cpu-clock samples, 54.91%)__libc_start_call_mainmain (218 cpu-clock samples, 54.91%)mainqsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (30 cpu-clock samples, 7.56%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (21 cpu-clock samples, 5.29%)qsl::e..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.27%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, 
std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (10 cpu-clock samples, 2.52%)s..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (5 cpu-clock samples, 1.26%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (2 cpu-clock samples, 0.50%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned 
long) (23 cpu-clock samples, 5.79%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (22 cpu-clock samples, 5.54%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.27%)qs..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.76%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (3 cpu-clock samples, 
0.76%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (56 cpu-clock samples, 14.11%)qsl::gateway::Sessio..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (46 cpu-clock samples, 11.59%)qsl::gateway::Se..qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.52%)q..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (3 cpu-clock samples, 0.76%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::Fill const&) (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (28 cpu-clock samples, 7.05%)qsl::gate..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..operator new(unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, 
unsigned int, qsl::core::TimeInForce) (9 cpu-clock samples, 2.27%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (3 cpu-clock samples, 0.76%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (2 cpu-clock samples, 0.50%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (6 cpu-clock samples, 1.51%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (5 cpu-clock samples, 1.26%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (15 cpu-clock samples, 3.78%)qsl..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (4 cpu-clock samples, 
1.01%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (32 cpu-clock samples, 8.06%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (5 cpu-clock samples, 
1.26%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (4 cpu-clock samples, 1.01%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, 
unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.27%)qs..__memcpy_generic (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.51%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.76%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 
0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::contains(unsigned long) const (3 cpu-clock samples, 0.76%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) 
const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (16 cpu-clock samples, 4.03%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::contains(unsigned long) const (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (32 cpu-clock samples, 
8.06%)qsl::repla..qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (30 cpu-clock samples, 7.56%)qsl::repl..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.76%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.50%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.50%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (18 cpu-clock samples, 4.53%)qsl:..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 2.02%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.76%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 
0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (4 cpu-clock samples, 1.01%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (4 cpu-clock samples, 1.01%)malloc@plt (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (2 cpu-clock samples, 0.50%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (11 cpu-clock samples, 2.77%)q..[unknown] (9 cpu-clock samples, 2.27%)[unknown] (9 cpu-clock samples, 2.27%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)_mid_memalign (3 cpu-clock samples, 0.76%)__posix_memalign 
(2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.01%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (13 cpu-clock samples, 3.27%)qs..[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (9 cpu-clock samples, 2.27%)malloc (5 cpu-clock samples, 1.26%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (14 cpu-clock samples, 3.53%)[un..[unknown] (14 cpu-clock samples, 3.53%)[un..cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)operator new(unsigned long) (7 cpu-clock samples, 1.76%)malloc (5 cpu-clock samples, 1.26%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)posix_memalign@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc (5 cpu-clock samples, 1.26%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (5 cpu-clock samples, 1.26%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (4 cpu-clock samples, 1.01%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.01%)std::__detail::_Map_base<unsigned long, 
std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)[unknown] (7 cpu-clock samples, 1.76%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.76%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)[unknown] (6 cpu-clock samples, 1.51%)cfree@GLIBC_2.17 (6 cpu-clock samples, 1.51%)main (16 cpu-clock samples, 4.03%)main[unknown] (11 cpu-clock samples, 2.77%)[..[unknown] (11 cpu-clock samples, 2.77%)[..operator new(unsigned long) (11 cpu-clock samples, 2.77%)o..malloc (9 cpu-clock samples, 2.27%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (4 cpu-clock samples, 1.01%)operator new(unsigned long) (5 cpu-clock samples, 1.26%)malloc@plt (5 cpu-clock samples, 1.26%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned 
int, qsl::core::TimeInForce) (6 cpu-clock samples, 1.51%)[unknown] (4 cpu-clock samples, 1.01%)[unknown] (4 cpu-clock samples, 1.01%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (14 cpu-clock samples, 3.53%)qsl..[unknown] (10 cpu-clock samples, 2.52%)[..[unknown] (10 cpu-clock samples, 2.52%)[..[unknown] (5 cpu-clock samples, 1.26%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t) (5 cpu-clock samples, 1.26%)__posix_memalign (4 cpu-clock samples, 1.01%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (17 cpu-clock samples, 4.28%)qsl:..free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)operator delete(void*, unsigned long, std::align_val_t)@plt (6 cpu-clock samples, 1.51%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@plt (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 
cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (3 cpu-clock samples, 0.76%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.76%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)operator new(unsigned long) (2 cpu-clock samples, 0.50%)malloc (2 cpu-clock samples, 0.50%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (3 cpu-clock samples, 0.76%)memcpy@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, 
true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (6 cpu-clock samples, 1.51%)free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 1.01%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.26%)operator new(unsigned long, std::align_val_t)@plt (5 cpu-clock samples, 1.26%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
new file mode 100644
index 0000000..4560ad7
--- /dev/null
+++ b/results/flamegraph.txt
@@ -0,0 +1,58 @@
+Command: make flamegraph
+Artifact: flamegraph (cpu-clock software sampling hot-symbol profile)
+Hardware: aarch64
+OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
+CPU: Avalanche-M2
+Compiler: c++ (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2)
+Perf: perf version 6.19.14-400.asahi.fc44.aarch64
+Perf paranoid: 2
+Build type: Release
+Provenance version: 1
+Git commit (informational): 0c3b401
+Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242
+Source digest scope: flamegraph-benchmark
+Dirty inputs: no
+Generated output: results/flamegraph.svg
+Date: 2026-06-21T06:36:51Z
+Benchmark binary: build/bench/qsl-bench
+Dataset: qsl-bench default synthetic benchmark suite
+Call graph: dwarf
+Record event: cpu-clock
+Sample freq: 4000 Hz
+Sample count: 397
+Folded stacks: 171
+Minimum samples for hot profile: 200
+Insufficient samples: no
+Record status: 0
+Script status: 0
+Perf access limitation: no
+Flamegraph SVG: results/flamegraph.svg
+Perf data: build/perf/qsl-bench.flame.data (generated, not intended for commit)
+
+Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol
+investigation. Frame width is proportional to on-CPU samples, not wall-clock
+latency or throughput, and is hardware/kernel/compiler/build dependent.
+
+Top 15 folded stacks (count stack):
+ 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)
+ 9 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc
+ 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17
+ 7 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17
+ 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main
+ 6 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
+ 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt
+ 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 5 qsl-bench;operator new(unsigned long);malloc@plt
+ 5 qsl-bench;std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&);operator new(unsigned long, std::align_val_t)@plt
+ 5 qsl-bench;[unknown];qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&);[unknown];[unknown];operator new(unsigned long);malloc
+ 5 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);[unknown];[unknown];[unknown];__posix_memalign;malloc
+ 5 qsl-bench;[unknown];[unknown];qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);[unknown];[unknown];operator new(unsigned long);malloc
+
+Benchmark output:
+order_book add/mod/cancel 200000 ops 151.3 ns/op 6607667 ops/sec
+protocol encode+decode 500000 ops 21.8 ns/op 45829279 ops/sec
+gateway session (fill) 200000 ops 132.3 ns/op 7556487 ops/sec
+matching engine flow 5004 items 104.7 ns/item 9553139 items/sec
+replay command log 5004 items 115.1 ns/item 8690129 items/sec

From 872600ad18b65e15259fe382c0cdbe6a7e1b9e2f Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 08:54:49 -0400
Subject: [PATCH 03/11] perf: harden flamegraph collapsed-stack parsing (Codex review)

Address two Codex review findings in scripts/flamegraph.py::parse_collapsed:

- Prefer a tab separator when present, so a tab-separated folded line whose
  stack contains spaces (C++ signatures) splits on the trailing count rather
  than on an interior space, which previously made the count unparseable and
  silently dropped the line.
- Ignore non-positive sample counts, so hand-crafted --from-collapsed input
  with zero or negative counts cannot render a misleading SVG (input whose
  counts are all non-positive now fails with exit 1 via the existing
  empty-folded guard).

Adds test coverage in tests/shell/test_flamegraph.sh (19/19).
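
The separator bug in the first bullet can be reproduced with the two loop bodies from the parse_collapsed diff. This is a standalone sketch: parse_old and parse_new are illustrative copies of the before/after logic, not functions exported by scripts/flamegraph.py.

```python
"""Contrast the pre- and post-fix folded-line splitting in parse_collapsed.

Both functions are standalone copies of the loop logic in the
scripts/flamegraph.py diff, written here for illustration only.
"""


def parse_old(lines):
    folded = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        # Old rule: split on the last space; fall back to tab only if none.
        stack, _, count = line.rpartition(" ")
        if not stack:
            stack, _, count = line.rpartition("\t")
        try:
            n = int(count)
        except ValueError:
            continue  # a mis-split line is silently dropped here
        folded[stack] = folded.get(stack, 0) + n
    return folded


def parse_new(lines):
    folded = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        # New rule: prefer tab whenever the line contains one.
        sep = "\t" if "\t" in line else " "
        stack, found, count = line.rpartition(sep)
        if not found:
            continue
        try:
            n = int(count)
        except ValueError:
            continue
        if n <= 0:
            continue  # non-positive counts are ignored
        folded[stack] = folded.get(stack, 0) + n
    return folded


lines = ["main;foo(unsigned int)\t7\n", "a;b 0\n", "c;d -3\n", "x;y 2\n"]
print(parse_old(lines))  # {'a;b': 0, 'c;d': -3, 'x;y': 2}
print(parse_new(lines))  # {'main;foo(unsigned int)': 7, 'x;y': 2}
```

Under the old rule the tab-separated line splits at the space inside "unsigned int", leaving "int)\t7" as the count, which int() rejects, so the 7 samples vanish while the zero and negative counts are kept.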

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 scripts/flamegraph.py          | 17 +++++++++++++----
 tests/shell/test_flamegraph.sh | 16 ++++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py
index 966d0c7..3af5110 100755
--- a/scripts/flamegraph.py
+++ b/scripts/flamegraph.py
@@ -93,19 +93,28 @@ def flush() -> None:
 
 
 def parse_collapsed(lines) -> dict[str, int]:
-    """Parse pre-collapsed `stack count` lines."""
+    """Parse pre-collapsed `stackcount` lines.
+
+    The canonical folded separator is a space, but a tab is tolerated. Tab is
+    preferred when present so a stack containing spaces (C++ signatures) still
+    splits on the trailing count rather than on an interior space. Non-positive
+    counts are ignored.
+    """
     folded: dict[str, int] = {}
     for raw in lines:
         line = raw.rstrip("\n")
         if not line.strip():
             continue
-        stack, _, count = line.rpartition(" ")
-        if not stack:
-            stack, _, count = line.rpartition("\t")
+        sep = "\t" if "\t" in line else " "
+        stack, found, count = line.rpartition(sep)
+        if not found:
+            continue
         try:
             n = int(count)
         except ValueError:
             continue
+        if n <= 0:
+            continue
         folded[stack] = folded.get(stack, 0) + n
     return folded
 
diff --git a/tests/shell/test_flamegraph.sh b/tests/shell/test_flamegraph.sh
index 2ba305d..585ba34 100644
--- a/tests/shell/test_flamegraph.sh
+++ b/tests/shell/test_flamegraph.sh
@@ -122,6 +122,22 @@ ESC_SVG="$(printf 'bench;a&c 3\n' | python3 "$FG" --from-collapsed)"
 expect_contains "frame names are XML-escaped" '<b>&c' "$ESC_SVG"
 expect_not_contains "raw unescaped angle bracket is not emitted in a frame title" 'a<b>' "$ESC_SVG"
 
+# --- Collapsed input parsing ------------------------------------------------
+
+# A tab-separated stack that itself contains spaces must split on the count, not
+# on an interior space.
+TAB_COLLAPSED="$(printf 'main;foo(unsigned int)\t7\n' | python3 "$FG" --from-collapsed --collapse-only)"
+expect_eq "tab-separated collapsed line keeps its count" \
+  'main;foo(unsigned int) 7' "$TAB_COLLAPSED"
+
+# Non-positive counts are ignored; a stack with only such counts yields nothing.
+NONPOS="$(printf 'a;b 0\nc;d -3\n' | python3 "$FG" --from-collapsed --collapse-only)"
+expect_eq "non-positive collapsed counts are dropped" "" "$NONPOS"
+
+printf 'a;b 0\n' | python3 "$FG" --from-collapsed >/dev/null 2>&1
+rc=$?
+expect_eq "all-non-positive collapsed input fails SVG with exit 1" "1" "$rc"
+
 # --- Empty input ------------------------------------------------------------
 
 EMPTY_COLLAPSE="$(printf '' | python3 "$FG" --collapse-only)"

From 0201d54e593456add8e08ea881a5b14e36b273fa Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 20:49:32 -0400
Subject: [PATCH 04/11] perf: regenerate flamegraph artifact after parser hardening

flamegraph.py is a provenance input, so regenerate results/flamegraph.svg + .txt
from the clean tree to keep the Source digest consistent (423 samples, Dirty
inputs: no).
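
Why editing flamegraph.py forces a regeneration can be sketched with a content digest. This patch does not show how qsl_common.sh actually computes the Source digest; the hypothetical source_digest() below assumes a SHA-256 over each input's relative path and bytes, visited in sorted order, and the file names and contents are placeholders.

```python
"""Hypothetical provenance digest sketch (not qsl_common.sh's real scheme)."""
import hashlib
import tempfile
from pathlib import Path


def source_digest(root: Path, inputs: list[str]) -> str:
    """Hash each input's relative path and contents in sorted order (assumed scheme)."""
    h = hashlib.sha256()
    for rel in sorted(inputs):
        name = rel.encode()
        data = (root / rel).read_bytes()
        # Length-prefix both fields so adjacent path/content boundaries are unambiguous.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return "sha256:" + h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts").mkdir()
    (root / "scripts" / "flamegraph.py").write_text("def parse_collapsed(): ...\n")
    (root / "scripts" / "flamegraph.sh").write_text("#!/bin/sh\n")
    inputs = ["scripts/flamegraph.py", "scripts/flamegraph.sh"]

    before = source_digest(root, inputs)
    # Editing any provenance input (here, the parser) changes the digest, so an
    # artifact stamped with `before` no longer matches the tree.
    (root / "scripts" / "flamegraph.py").write_text("def parse_collapsed(): pass\n")
    after = source_digest(root, inputs)
    # Re-hashing an unchanged tree is deterministic.
    stable = after == source_digest(root, inputs)

print(before != after)  # True
print(stable)  # True
```

Under this assumed scheme, an artifact whose recorded digest differs from the current tree's digest is stale, which is why the SVG and TXT were regenerated after the parser change.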
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --- results/flamegraph.svg | 10 ++++----- results/flamegraph.txt | 48 +++++++++++++++++++++--------------------- 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/results/flamegraph.svg b/results/flamegraph.svg index 2281d5b..fc87dda 100644 --- a/results/flamegraph.svg +++ b/results/flamegraph.svg @@ -2,16 +2,16 @@ <!-- QSL flamegraph provenance Provenance version: 1 - Git commit (informational): 0c3b401 - Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242 + Git commit (informational): 872600a + Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0 Source digest scope: flamegraph-benchmark Dirty inputs: no Generated output: results/flamegraph.svg - Date: 2026-06-21T06:36:51Z + Date: 2026-06-21T12:54:50Z Command: make flamegraph Artifact: flamegraph (cpu-clock software sampling hot-symbol profile) Record: perf record [call-graph dwarf | -F 4000 | -g | -e cpu-clock] - Samples: 397 | Folded stacks: 171 + Samples: 423 | Folded stacks: 163 Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is not a latency or throughput measurement and is hardware/build dependent. 
--> @@ -28,4 +28,4 @@ function qslSearch(){ else{r.classList.remove('hl');}} if(detail)detail.textContent='Search: '+term; } -]]></script><rect width="1200" height="310" fill="#f8f8f8"/><text x="600" y="24" text-anchor="middle" font-size="17" font-weight="bold">QSL Matching-Engine Flame Graph (qsl-bench)</text><text x="600" y="40" text-anchor="middle" fill="#555">flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 397 samples | 171 stacks | 2026-06-21T06:36:51Z</text><text id="qsl-search" x="1190" y="24" text-anchor="end" fill="#990000" onclick="qslSearch()" style="cursor:pointer">Search</text><text id="qsl-detail" x="10" y="306" fill="#333"> </text><g class="func" data-name="all"><title>all (397 cpu-clock samples, 100.00%)allqsl-bench (397 cpu-clock samples, 100.00%)qsl-bench[unknown] (300 cpu-clock samples, 75.57%)[unknown][unknown] (278 cpu-clock samples, 70.03%)[unknown][unknown] (221 cpu-clock samples, 55.67%)[unknown][unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (3 cpu-clock samples, 0.76%)[unknown] (1 cpu-clock samples, 0.25%)check_match (1 cpu-clock samples, 0.25%)do_lookup_x (2 cpu-clock samples, 0.50%)__libc_start_call_main (218 cpu-clock samples, 54.91%)__libc_start_call_mainmain (218 cpu-clock samples, 54.91%)mainqsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (30 cpu-clock samples, 7.56%)qsl::engi..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, 
qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (4 cpu-clock samples, 1.01%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (21 cpu-clock samples, 5.29%)qsl::e..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.27%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (10 cpu-clock samples, 2.52%)s..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (5 cpu-clock samples, 1.26%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (5 cpu-clock samples, 1.26%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (2 cpu-clock samples, 0.50%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)std::pmr::(anonymous namespace)::newdel_res_t::do_allocate(unsigned long, unsigned long) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::cancel(unsigned long) (23 cpu-clock samples, 5.79%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (22 cpu-clock samples, 5.54%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.27%)qs..cfree@GLIBC_2.17 (3 cpu-clock samples, 
[SVG flame graph text residue omitted: frame tooltips and truncated labels (C++ call stacks with cpu-clock sample counts) from the preceding graph.]
+]]>
[SVG flame graph: "QSL Matching-Engine Flame Graph (qsl-bench)". Subtitle: flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 423 samples | 163 stacks | 2026-06-21T12:54:50Z. Root frame "all" (423 cpu-clock samples, 100.00%). Per-frame tooltips and labels omitted.]
std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.71%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (23 cpu-clock samples, 5.44%)qsl::e..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (67 cpu-clock samples, 15.84%)qsl::gateway::Session::..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (61 cpu-clock samples, 14.42%)qsl::gateway::Session..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (15 cpu-clock samples, 3.55%)qsl..cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (7 cpu-clock samples, 1.65%)__memcpy_generic (1 cpu-clock samples, 0.24%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 
0.47%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (36 cpu-clock samples, 8.51%)qsl::gatewa..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (7 cpu-clock samples, 1.65%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (24 cpu-clock samples, 5.67%)qsl::e..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.84%)q..__memcpy_generic (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte 
const, 18446744073709551615ul>) (5 cpu-clock samples, 1.18%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.71%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (2 cpu-clock samples, 0.47%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (28 cpu-clock samples, 6.62%)qsl::pro..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (5 cpu-clock samples, 1.18%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (26 cpu-clock samples, 6.15%)qsl::re..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::cancel(unsigned long) (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (1 cpu-clock samples, 0.24%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (17 cpu-clock samples, 4.02%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.71%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (6 cpu-clock samples, 1.42%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (4 cpu-clock samples, 0.95%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::contains(unsigned long) const (2 cpu-clock samples, 0.47%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, 
qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (17 cpu-clock samples, 4.02%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::contains(unsigned long) const (6 cpu-clock samples, 1.42%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.71%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.47%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (36 cpu-clock samples, 8.51%)qsl::replay..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (34 cpu-clock samples, 8.04%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::cancel(unsigned long) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (8 cpu-clock samples, 1.89%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.42%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 
cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (15 cpu-clock samples, 3.55%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.07%)qs..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.89%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (6 cpu-clock samples, 1.42%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.36%)q..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (6 cpu-clock samples, 1.42%)[unknown] (4 cpu-clock samples, 0.95%)_mid_memalign (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 
0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (2 cpu-clock samples, 0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (2 cpu-clock samples, 0.47%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.36%)q..[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (3 cpu-clock samples, 0.71%)free@plt (2 cpu-clock samples, 0.47%)operator delete(void*)@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.60%)q..[unknown] (11 cpu-clock samples, 2.60%)[..[unknown] (11 cpu-clock samples, 2.60%)[..cfree@GLIBC_2.17 (8 cpu-clock samples, 1.89%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (2 cpu-clock samples, 0.47%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 
[remaining flamegraph.svg frame labels and tooltips omitted: the SVG markup was stripped during text extraction]
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
index 4560ad7..0cbec7f 100644
--- a/results/flamegraph.txt
+++ b/results/flamegraph.txt
@@ -8,19 +8,19 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type: Release
 Provenance version: 1
-Git commit (informational): 0c3b401
-Source digest: sha256:0d8061b5c92b9a8a1f3bffd14a340e733f28674b14d5716c2eaa6bdb00b31242
+Git commit (informational): 872600a
+Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0
 Source digest scope: flamegraph-benchmark
 Dirty inputs: no
 Generated output: results/flamegraph.svg
-Date: 2026-06-21T06:36:51Z
+Date: 2026-06-21T12:54:50Z
 Benchmark binary: build/bench/qsl-bench
 Dataset: qsl-bench default synthetic benchmark suite
 Call graph: dwarf
 Record event: cpu-clock
 Sample freq: 4000 Hz
-Sample count: 397
-Folded stacks: 171
+Sample count: 423
+Folded stacks: 163
 Minimum samples for hot profile: 200
 Insufficient samples: no
 Record status: 0
@@ -34,25 +34,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock
 latency or throughput, and is hardware/kernel/compiler/build dependent.

 Top 15 folded stacks (count stack):
- 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)
- 9 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc
+ 28 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)
+ 23 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
+ 19 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
+ 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&)
+ 10 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
+ 9 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
+ 8 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17
+ 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
  7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17
- 7 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17
- 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main
- 6 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
- 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt
- 6 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt
- 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>)
- 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- 5 qsl-bench;operator new(unsigned long);malloc@plt
- 5 qsl-bench;std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&);operator new(unsigned long, std::align_val_t)@plt
- 5 qsl-bench;[unknown];qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&);[unknown];[unknown];operator new(unsigned long);malloc
- 5 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);[unknown];[unknown];[unknown];__posix_memalign;malloc
- 5 qsl-bench;[unknown];[unknown];qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);[unknown];[unknown];operator new(unsigned long);malloc
+ 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::has_symbol(unsigned int) const
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const

 Benchmark output:
-order_book add/mod/cancel 200000 ops 151.3 ns/op 6607667 ops/sec
-protocol encode+decode 500000 ops 21.8 ns/op 45829279 ops/sec
-gateway session (fill) 200000 ops 132.3 ns/op 7556487 ops/sec
-matching engine flow 5004 items 104.7 ns/item 9553139 items/sec
-replay command log 5004 items 115.1 ns/item 8690129 items/sec
+order_book add/mod/cancel 200000 ops 140.7 ns/op 7107229 ops/sec
+protocol encode+decode 500000 ops 21.0 ns/op 47719996 ops/sec
+gateway session (fill) 200000 ops 129.6 ns/op 7715309 ops/sec
+matching engine flow 5004 items 102.3 ns/item 9773521 items/sec
+replay command log 5004 items 111.8 ns/item 8946368 items/sec

From 52de5b8cbaed5fb79ac0a14304d6e0b4070a0dd4 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 21:13:08 -0400
Subject: [PATCH 05/11] refactor: improve flamegraph.py code health (CodeScene gate)

CodeScene's delta gate scored scripts/flamegraph.py at 7.81 (render_svg: Large Method + Excess Arguments + complexity; fold_perf_script: Bumpy Road / nested complexity). Restructure without changing output:

- fold_perf_script: move per-line state into a small _Folder helper so the parsing loop is a flat if/elif/else instead of a nested block.
- render_svg: bundle styling knobs into a FlameOptions dataclass (2 args, was 7) and extract _append_chrome, _frame_svg, _truncate; geometry constants (_SIDE/_PAD_TOP/_PAD_BOTTOM) hoisted to module scope and a _Canvas dataclass carries derived geometry.

Emitted SVG/collapsed bytes are unchanged; tests/shell/test_flamegraph.sh 19/19.

Co-Authored-By: Claude Opus 4.8
---
 scripts/flamegraph.py | 228 ++++++++++++++++++++++++++----------------
 1 file changed, 142 insertions(+), 86 deletions(-)

diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py
index 3af5110..96accb8 100755
--- a/scripts/flamegraph.py
+++ b/scripts/flamegraph.py
@@ -28,6 +28,12 @@
 import re
 import sys
 import zlib
+from dataclasses import dataclass
+
+# SVG layout constants (pixels).
+_SIDE = 10  # left/right margin
+_PAD_TOP = 54  # space above the frames for title/subtitle
+_PAD_BOTTOM = 16  # space below the frames for the detail line

 # perf-script stack frame line: leading whitespace, hex address, symbol, "(dso)".
 # C++ symbols contain spaces and parentheses, so the dso is taken as the final
@@ -58,38 +64,59 @@ def _clean_symbol(rest: str) -> str:
     return rest


-def fold_perf_script(lines) -> dict[str, int]:
-    """Collapse `perf script` output into {stack_string: sample_count}."""
-    folded: dict[str, int] = {}
-    comm = ""
-    stack: list[str] = []
-
-    def flush() -> None:
-        nonlocal stack, comm
-        if stack:
-            frames = list(reversed(stack))
-            if comm:
-                frames.insert(0, comm)
+class _Folder:
+    """Accumulates `perf script` samples into collapsed {stack: count} pairs.
+
+    Keeping the per-line state transitions as small methods keeps the parsing
+    loop flat (one if/elif/else) instead of a deeply nested block.
+    """
+
+    def __init__(self) -> None:
+        self.folded: dict[str, int] = {}
+        self._comm = ""
+        self._stack: list[str] = []
+
+    def start_sample(self, header: str) -> None:
+        # Header line: "comm pid timestamp: period event:". Finalize any prior
+        # sample (perf usually separates with a blank line, but not always).
+        self._flush()
+        self._comm = header.split()[0]
+
+    def add_frame(self, line: str) -> None:
+        m = _FRAME_RE.match(line)
+        if m:
+            self._stack.append(_clean_symbol(m.group("rest")))
+
+    def end_sample(self) -> None:
+        self._flush()
+        self._comm = ""
+
+    def _flush(self) -> None:
+        if self._stack:
+            frames = list(reversed(self._stack))  # perf prints leaf-first
+            if self._comm:
+                frames.insert(0, self._comm)
             key = ";".join(frames)
-            folded[key] = folded.get(key, 0) + 1
-            stack = []
+            self.folded[key] = self.folded.get(key, 0) + 1
+            self._stack = []
+
+    def result(self) -> dict[str, int]:
+        self._flush()
+        return self.folded
+

+def fold_perf_script(lines) -> dict[str, int]:
+    """Collapse `perf script` output into {stack_string: sample_count}."""
+    folder = _Folder()
     for raw in lines:
         line = raw.rstrip("\n")
         if not line.strip():
-            flush()
-            comm = ""
-            continue
-        if line[0].isspace():
-            m = _FRAME_RE.match(line)
-            if m:
-                stack.append(_clean_symbol(m.group("rest")))
-            continue
-        # Header line: "comm pid timestamp: period event:" -> capture comm.
-        flush()
-        comm = line.split()[0]
-    flush()
-    return folded
+            folder.end_sample()
+        elif line[0].isspace():
+            folder.add_frame(line)
+        else:
+            folder.start_sample(line)
+    return folder.result()


 def parse_collapsed(lines) -> dict[str, int]:
@@ -163,31 +190,34 @@ def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None:
         cursor += child.value


-def render_svg(
-    root: _Node,
-    *,
-    title: str,
-    subtitle: str,
-    width: int = 1200,
-    frame_height: int = 16,
-    min_px: float = 0.1,
-    countname: str = "samples",
-) -> str:
-    total = root.value or 1
-    placed: list = []
-    _layout(root, 0, 0, total, placed)
-    max_depth = max((d for _, d, _ in placed), default=0)
+@dataclass
+class FlameOptions:
+    """Styling/labelling knobs for an SVG render."""

-    pad_top = 54
-    pad_bottom = 16
-    side = 10
-    plot_width = width - 2 * side
-    height = pad_top + (max_depth + 1) * frame_height + pad_bottom
+    title: str = "QSL Flame Graph"
+    subtitle: str = ""
+    countname: str = "samples"
+    width: int = 1200
+    frame_height: int = 16
+    min_px: float = 0.1

-    def px(samples: int) -> float:
-        return samples / total * plot_width

-    parts: list[str] = []
+@dataclass
+class _Canvas:
+    """Derived geometry passed to per-frame rendering."""
+
+    total: int
+    max_depth: int
+    height: int
+    plot_width: int
+    frame_height: int
+    min_px: float
+    countname: str
+
+
+def _append_chrome(parts: list, opts: FlameOptions, height: int) -> None:
+    """Append the static page furniture: SVG root, style, title, controls."""
+    width = opts.width
     parts.append(
         f'\n'
         f' float:
     parts.append(f'')
     parts.append(
         f'{html.escape(title)}'
+        f'font-size="17" font-weight="bold">{html.escape(opts.title)}'
     )
     parts.append(
         f''
-        f'{html.escape(subtitle)}'
+        f'{html.escape(opts.subtitle)}'
     )
     parts.append(
-        f'Search'
     )
     parts.append(
-        f' '
+        f' '
     )
-    for node, depth, x in placed:
-        w = px(node.value)
-        if w < min_px:
-            continue
-        x_px = side + px(x)
-        y = pad_top + (max_depth - depth) * frame_height
-        pct = node.value / total * 100.0
-        label = node.name
-        # Approx 7px per char at this font; reserve 6px padding.
-        maxchars = int((w - 6) / 7)
-        text = ""
-        if maxchars >= 3:
-            text = label if len(label) <= maxchars else label[: maxchars - 2] + ".."
-        tip = f"{label} ({node.value} {countname}, {pct:.2f}%)"
-        parts.append(f'')
-        parts.append(f"{html.escape(tip)}")
-        parts.append(
-            f''
+
+
+def _truncate(label: str, width_px: float) -> str:
+    """Fit a label into a frame, ~7px/char with 6px padding (else nothing)."""
+    maxchars = int((width_px - 6) / 7)
+    if maxchars < 3:
+        return ""
+    return label if len(label) <= maxchars else label[: maxchars - 2] + ".."
+
+
+def _frame_svg(c: _Canvas, node: _Node, depth: int, x: int) -> str:
+    """Render one frame's group, or "" when narrower than the cutoff."""
+    w = node.value / c.total * c.plot_width
+    if w < c.min_px:
+        return ""
+    x_px = _SIDE + x / c.total * c.plot_width
+    y = _PAD_TOP + (c.max_depth - depth) * c.frame_height
+    pct = node.value / c.total * 100.0
+    tip = f"{node.name} ({node.value} {c.countname}, {pct:.2f}%)"
+    out = [
+        f'',
+        f"{html.escape(tip)}",
+        f'',
+    ]
+    text = _truncate(node.name, w)
+    if text:
+        out.append(
+            f'{html.escape(text)}'
         )
-        if text:
-            parts.append(
-                f'{html.escape(text)}'
-            )
-        parts.append("")
+    out.append("")
+    return "".join(out)
+

+def render_svg(root: _Node, opts: FlameOptions | None = None) -> str:
+    opts = opts or FlameOptions()
+    total = root.value or 1
+    placed: list = []
+    _layout(root, 0, 0, total, placed)
+    max_depth = max((d for _, d, _ in placed), default=0)
+    height = _PAD_TOP + (max_depth + 1) * opts.frame_height + _PAD_BOTTOM
+    canvas = _Canvas(
+        total=total,
+        max_depth=max_depth,
+        height=height,
+        plot_width=opts.width - 2 * _SIDE,
+        frame_height=opts.frame_height,
+        min_px=opts.min_px,
+        countname=opts.countname,
+    )
+
+    parts: list[str] = []
+    _append_chrome(parts, opts, height)
+    for node, depth, x in placed:
+        parts.append(_frame_svg(canvas, node, depth, x))
     parts.append("\n")
     return "".join(parts)

@@ -299,15 +357,13 @@ def main(argv=None) -> int:
         return 1

     root = build_tree(folded, args.root_name)
-    sys.stdout.write(
-        render_svg(
-            root,
-            title=args.title,
-            subtitle=args.subtitle,
-            width=args.width,
-            countname=args.countname,
-        )
+    opts = FlameOptions(
+        title=args.title,
+        subtitle=args.subtitle,
+        countname=args.countname,
+        width=args.width,
     )
+    sys.stdout.write(render_svg(root, opts))
     return 0



From d4be2daf640d442415ed3b315ce7a0a564cdccdd Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 21:16:15 -0400
Subject: [PATCH 06/11] perf: regenerate flamegraph artifact after code-health refactor

flamegraph.py is a provenance input; regenerate results/flamegraph.svg + .txt from the clean tree (402 samples, Dirty inputs: no).

Co-Authored-By: Claude Opus 4.8
---
 results/flamegraph.svg | 12 +++++------
 results/flamegraph.txt | 48 +++++++++++++++++++++---------------------
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/results/flamegraph.svg b/results/flamegraph.svg
index fc87dda..7882ae3 100644
--- a/results/flamegraph.svg
+++ b/results/flamegraph.svg
@@ -2,20 +2,20 @@
[flamegraph.svg frame labels and tooltips omitted: the SVG markup was stripped during text extraction. Title of the removed version: "QSL Matching-Engine Flame Graph (qsl-bench)"; subtitle: "flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 423 samples | 163 stacks | 2026-06-21T12:54:50Z"]
4.02%)qsl..qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::contains(unsigned long) const (6 cpu-clock samples, 1.42%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.71%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (2 cpu-clock samples, 0.47%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long 
const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (36 cpu-clock samples, 8.51%)qsl::replay..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (34 cpu-clock samples, 8.04%)qsl::repla..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (6 cpu-clock samples, 1.42%)qsl::engine::OrderBook::cancel(unsigned long) (5 cpu-clock samples, 1.18%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock 
samples, 0.95%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (8 cpu-clock samples, 1.89%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.42%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)decltype(auto) 
qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, 
qsl::core::TimeInForce) (15 cpu-clock samples, 3.55%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.07%)qs..__memcpy_generic (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.36%)q..qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.89%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (6 cpu-clock samples, 1.42%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned 
long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (4 cpu-clock samples, 0.95%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (4 cpu-clock samples, 0.95%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.24%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.36%)q..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (10 cpu-clock samples, 2.36%)[..[unknown] (6 cpu-clock samples, 1.42%)[unknown] (4 cpu-clock samples, 0.95%)_mid_memalign (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 0.95%)__posix_memalign (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (2 cpu-clock samples, 
0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (2 cpu-clock samples, 0.47%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (10 cpu-clock samples, 2.36%)q..[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (3 cpu-clock samples, 0.71%)free@plt (2 cpu-clock samples, 0.47%)operator delete(void*)@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (11 cpu-clock samples, 2.60%)q..[unknown] (11 cpu-clock samples, 2.60%)[..[unknown] (11 cpu-clock samples, 2.60%)[..cfree@GLIBC_2.17 (8 cpu-clock samples, 1.89%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (2 cpu-clock samples, 0.47%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.47%)[unknown] (2 cpu-clock 
samples, 0.47%)[unknown] (2 cpu-clock samples, 0.47%)operator new(unsigned long) (2 cpu-clock samples, 0.47%)malloc (2 cpu-clock samples, 0.47%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (7 cpu-clock samples, 1.65%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (3 cpu-clock samples, 0.71%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)__posix_memalign (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)__libc_start_call_main (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.65%)_start (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock 
samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (3 cpu-clock samples, 0.71%)[unknown] (1 cpu-clock samples, 0.24%)do_lookup_x (1 cpu-clock samples, 0.24%)dl_relocate_ld (1 cpu-clock samples, 0.24%)_dl_lookup_symbol_x (2 cpu-clock samples, 0.47%)_dl_new_hash (1 cpu-clock samples, 0.24%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.24%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.24%)elf_machine_rela (1 cpu-clock samples, 0.24%)resolve_map (1 cpu-clock samples, 0.24%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (19 cpu-clock samples, 4.49%)decl..[unknown] (19 cpu-clock samples, 4.49%)[unk..[unknown] (19 cpu-clock samples, 4.49%)[unk..cfree@GLIBC_2.17 (19 cpu-clock samples, 4.49%)cfre..main (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (3 cpu-clock samples, 0.71%)malloc (3 cpu-clock samples, 0.71%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc@plt (5 cpu-clock samples, 1.18%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (5 cpu-clock samples, 1.18%)[unknown] (1 cpu-clock samples, 
0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.47%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.36%)q..[unknown] (9 cpu-clock samples, 2.13%)[unknown] (9 cpu-clock samples, 2.13%)[unknown] (6 cpu-clock samples, 1.42%)[unknown] (2 cpu-clock samples, 0.47%)_mid_memalign (2 cpu-clock samples, 0.47%)__posix_memalign (4 cpu-clock samples, 0.95%)malloc (3 cpu-clock samples, 0.71%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.71%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.18%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (2 cpu-clock samples, 0.47%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, 
std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.18%)[unknown] (4 cpu-clock samples, 0.95%)[unknown] (4 cpu-clock samples, 0.95%)cfree@GLIBC_2.17 (4 cpu-clock samples, 0.95%)memcpy@plt (1 cpu-clock samples, 0.24%)qsl::protocol::encode(qsl::protocol::Ack const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)[unknown] (7 cpu-clock samples, 1.65%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.47%)operator new(unsigned long) (5 cpu-clock samples, 1.18%)malloc (5 cpu-clock samples, 1.18%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.47%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock 
samples, 0.24%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 402 samples | 164 stacks | 2026-06-22T01:13:09ZSearch all (402 cpu-clock samples, 100.00%)allqsl-bench (402 cpu-clock samples, 100.00%)qsl-bench[unknown] (322 cpu-clock samples, 80.10%)[unknown][unknown] (296 cpu-clock samples, 73.63%)[unknown][unknown] (245 cpu-clock samples, 60.95%)[unknown][unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_dl_cache_libcmp (1 cpu-clock samples, 0.25%)check_match (2 cpu-clock samples, 0.50%)strcmp (1 cpu-clock samples, 0.25%)_dl_relocate_object_no_relro (1 cpu-clock samples, 0.25%)elf_dynamic_do_Rela (1 cpu-clock samples, 0.25%)elf_machine_rela (1 cpu-clock samples, 0.25%)resolve_map (1 cpu-clock samples, 0.25%)dl_symbol_visibility_binds_local_p (1 cpu-clock samples, 0.25%)__libc_start_call_main (241 cpu-clock samples, 59.95%)__libc_start_call_mainmain (241 cpu-clock samples, 59.95%)maincfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (43 cpu-clock samples, 10.70%)qsl::engine::Or..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, 
qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (8 cpu-clock samples, 1.99%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (26 cpu-clock samples, 6.47%)qsl::eng..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (5 cpu-clock samples, 1.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (12 cpu-clock samples, 2.99%)st..std::_Hashtable<unsigned long, 
[results/flamegraph.svg diff body omitted: SVG markup lost in extraction; only flattened per-frame titles of the form `function (N cpu-clock samples, P%)` and their truncated labels survived]
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
index 0cbec7f..b0d8682 100644
--- a/results/flamegraph.txt
+++ b/results/flamegraph.txt
@@ -8,19 +8,19 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type: Release
 Provenance version: 1
-Git commit (informational): 872600a
-Source digest: sha256:211e5835552616102fbe44d8f10dfa7cb6a4b35495dca98243bc87d37c45cfb0
+Git commit (informational): 52de5b8
+Source digest: sha256:75c1d53ba776085cb43ed6c600692286ab547ec20c9dc7a2018a56c222673f3c
 Source digest scope: flamegraph-benchmark
 Dirty inputs: no
 Generated output: results/flamegraph.svg
-Date: 2026-06-21T12:54:50Z
+Date: 2026-06-22T01:13:09Z
 Benchmark binary: build/bench/qsl-bench
 Dataset: qsl-bench default synthetic benchmark suite
 Call graph: dwarf
 Record event: cpu-clock
 Sample freq: 4000 Hz
-Sample count: 423
-Folded stacks: 163
+Sample count: 402
+Folded stacks: 164
 Minimum samples for hot profile: 200
 Insufficient samples: no
 Record status: 0
@@ -34,25 +34,25 @@ investigation.
 Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent.
Top 15 folded stacks (count stack): - 28 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) - 23 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - 19 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 - 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&) - 10 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - 9 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) - 8 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::has_symbol(unsigned int) const - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main - 6 
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 16 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span) + 12 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) + 9 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 8 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, 
qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&) + 6 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 6 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 6 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const Benchmark output: -order_book add/mod/cancel 200000 ops 140.7 ns/op 7107229 ops/sec -protocol encode+decode 500000 ops 21.0 ns/op 47719996 ops/sec -gateway session (fill) 200000 ops 129.6 ns/op 7715309 ops/sec -matching engine flow 5004 items 102.3 ns/item 9773521 items/sec -replay command log 5004 items 111.8 ns/item 8946368 items/sec +order_book add/mod/cancel 200000 ops 133.5 ns/op 7487925 ops/sec +protocol encode+decode 500000 ops 20.7 ns/op 48254784 ops/sec +gateway session (fill) 200000 ops 128.0 ns/op 7812016 ops/sec +matching engine flow 5004 items 102.3 ns/item 9773237 items/sec +replay command log 5004 items 112.3 ns/item 8905762 items/sec From 4aec1d0b81d40a799d9c64077d87b57ed81ef230 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 
2026 21:28:01 -0400 Subject: [PATCH 07/11] refactor: flatten flamegraph.py remaining complexity (CodeScene) Clear the last two CodeScene flags on scripts/flamegraph.py: - _clean_symbol: replace the balanced-paren dso scan (a deep nested loop) with a flat regex _DSO_RE. perf prints a space before the "(dso)" and dso strings never contain parens, so a non-nested " (...)$" match is exact and won't strip a C++ signature's own parentheses. - _layout: drop the unused `total` parameter (5 args -> 4). Output unchanged; tests/shell/test_flamegraph.sh 19/19. Co-Authored-By: Claude Opus 4.8 --- scripts/flamegraph.py | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/scripts/flamegraph.py b/scripts/flamegraph.py index 96accb8..a9cc7f3 100755 --- a/scripts/flamegraph.py +++ b/scripts/flamegraph.py @@ -40,6 +40,10 @@ # parenthesized group and the symbol is everything between the address and it. _FRAME_RE = re.compile(r"^\s+(?P[0-9a-fA-F]+)\s+(?P.*\S)\s*$") _OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$") +# Trailing " (dso)" group. perf prints a space before the dso, and dso strings +# (paths or "[unknown]") never contain parens, so a non-nested match is exact and +# avoids stripping a C++ signature's own "(...)" (which has no preceding space). +_DSO_RE = re.compile(r"\s+\([^()]*\)$") def _clean_symbol(rest: str) -> str: @@ -47,21 +51,9 @@ def _clean_symbol(rest: str) -> str: Drops the trailing `(dso)` and the `+0xoffset`, matching stackcollapse-perf. """ - # Strip the final "(...)" dso group if present (balanced at end of line). 
- if rest.endswith(")"): - depth = 0 - for i in range(len(rest) - 1, -1, -1): - if rest[i] == ")": - depth += 1 - elif rest[i] == "(": - depth -= 1 - if depth == 0: - rest = rest[:i].rstrip() - break + rest = _DSO_RE.sub("", rest) rest = _OFFSET_RE.sub("", rest).strip() - if not rest: - return "[unknown]" - return rest + return rest if rest else "[unknown]" class _Folder: @@ -181,12 +173,13 @@ def _color(name: str) -> str: return f"rgb({r},{g},{b})" -def _layout(node: _Node, depth: int, x: int, total: int, out: list) -> None: +def _layout(node: _Node, depth: int, x: int, out: list) -> None: + """Pre-order walk assigning each node a (depth, x-offset-in-samples).""" out.append((node, depth, x)) cursor = x for name in sorted(node.children): child = node.children[name] - _layout(child, depth + 1, cursor, total, out) + _layout(child, depth + 1, cursor, out) cursor += child.value @@ -286,7 +279,7 @@ def render_svg(root: _Node, opts: FlameOptions | None = None) -> str: opts = opts or FlameOptions() total = root.value or 1 placed: list = [] - _layout(root, 0, 0, total, placed) + _layout(root, 0, 0, placed) max_depth = max((d for _, d, _ in placed), default=0) height = _PAD_TOP + (max_depth + 1) * opts.frame_height + _PAD_BOTTOM canvas = _Canvas( From 3905059ebade2a924e1fdb20d04b1aca512d8943 Mon Sep 17 00:00:00 2001 From: nasr <156965421+div0rce@users.noreply.github.com> Date: Sun, 21 Jun 2026 21:29:36 -0400 Subject: [PATCH 08/11] perf: regenerate flamegraph artifact after complexity flattening Provenance input changed; regenerate from clean tree (416 samples, Dirty inputs: no). 
Co-Authored-By: Claude Opus 4.8 --- results/flamegraph.svg | 12 +++++----- results/flamegraph.txt | 50 +++++++++++++++++++++--------------------- 2 files changed, 31 insertions(+), 31 deletions(-) diff --git a/results/flamegraph.svg b/results/flamegraph.svg index 7882ae3..378b45b 100644 --- a/results/flamegraph.svg +++ b/results/flamegraph.svg @@ -2,20 +2,20 @@ -QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 402 samples | 164 stacks | 2026-06-22T01:13:09Z
[flattened flamegraph.svg frame labels omitted: per-frame symbol, cpu-clock sample count, and percentage]
std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::_Rb_tree_decrement(std::_Rb_tree_node_base*) (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (34 cpu-clock samples, 8.46%)qsl::replay..__memcpy_generic (1 cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (28 cpu-clock samples, 6.97%)qsl::rep..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::cancel(unsigned long) (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (3 cpu-clock samples, 0.75%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned 
int) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (6 cpu-clock samples, 1.49%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (4 cpu-clock samples, 1.00%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (3 cpu-clock samples, 0.75%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, 
std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned 
long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (1 cpu-clock samples, 0.25%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (10 cpu-clock samples, 2.49%)q..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.74%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.25%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.00%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (2 cpu-clock samples, 0.50%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.25%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (7 cpu-clock samples, 1.74%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::add_market(unsigned long, qsl::core::Side, unsigned int) (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.25%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.75%)std::_Rb_tree<unsigned int, std::pair<unsigned int const, qsl::engine::OrderBook>, std::_Select1st<std::pair<unsigned int const, qsl::engine::OrderBook> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, qsl::engine::OrderBook> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned int const, qsl::engine::OrderBook> >*) [clone .isra.0] (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::~OrderBook() (1 cpu-clock samples, 0.25%)std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_erase(std::_Rb_tree_node<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >*) (1 cpu-clock samples, 0.25%)operator new(unsigned long) (5 cpu-clock samples, 1.24%)malloc@plt (5 cpu-clock samples, 1.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (8 cpu-clock samples, 1.99%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)_mid_memalign (3 cpu-clock samples, 0.75%)__posix_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (4 cpu-clock samples, 1.00%)__posix_memalign (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (14 cpu-clock samples, 3.48%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (6 cpu-clock samples, 1.49%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (5 cpu-clock samples, 1.24%)malloc (5 cpu-clock samples, 1.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (12 cpu-clock samples, 2.99%)qs..[unknown] (10 cpu-clock samples, 2.49%)[..[unknown] (10 cpu-clock samples, 2.49%)[..cfree@GLIBC_2.17 (4 cpu-clock samples, 1.00%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (5 
cpu-clock samples, 1.24%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (12 cpu-clock samples, 2.99%)qs..[unknown] (12 cpu-clock samples, 2.99%)[u..[unknown] (12 cpu-clock samples, 2.99%)[u..cfree@GLIBC_2.17 (6 cpu-clock samples, 1.49%)operator new(unsigned long) (6 cpu-clock samples, 1.49%)malloc (4 cpu-clock samples, 1.00%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)main (3 cpu-clock samples, 0.75%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 
0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)posix_memalign@plt (2 cpu-clock samples, 0.50%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)_mid_memalign (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (6 cpu-clock samples, 1.49%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (2 cpu-clock samples, 0.50%)[unknown] (1 cpu-clock samples, 0.25%)__libc_malloc2 (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)__posix_memalign (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.75%)__posix_memalign (2 cpu-clock samples, 0.50%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (5 cpu-clock samples, 1.24%)[unknown] (4 cpu-clock samples, 1.00%)[unknown] (4 cpu-clock samples, 1.00%)cfree@GLIBC_2.17 (4 cpu-clock samples, 1.00%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)qsl::replay::apply(qsl::engine::MatchingEngine&, 
std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)operator new(unsigned long) (1 cpu-clock samples, 0.25%)malloc (1 cpu-clock samples, 0.25%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (2 cpu-clock samples, 0.50%)operator new(unsigned long, std::align_val_t)@plt (2 cpu-clock samples, 0.50%)__libc_start_call_main (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)[unknown] (5 cpu-clock samples, 1.24%)cfree@GLIBC_2.17 (5 cpu-clock samples, 1.24%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)[unknown] (9 cpu-clock samples, 2.24%)cfree@GLIBC_2.17 (9 cpu-clock samples, 2.24%)main (17 cpu-clock samples, 4.23%)main[unknown] (12 cpu-clock samples, 2.99%)[u..[unknown] (12 cpu-clock samples, 2.99%)[u..operator new(unsigned 
long) (12 cpu-clock samples, 2.99%)op..malloc (8 cpu-clock samples, 1.99%)operator delete(void*)@plt (3 cpu-clock samples, 0.75%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.50%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc@plt (3 cpu-clock samples, 0.75%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (13 cpu-clock samples, 3.23%)qs..[unknown] (6 cpu-clock samples, 1.49%)[unknown] (6 cpu-clock samples, 1.49%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.50%)operator new(unsigned long) (4 cpu-clock samples, 1.00%)malloc (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (3 cpu-clock samples, 0.75%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (16 cpu-clock samples, 3.98%)qsl..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (13 cpu-clock samples, 3.23%)[u..[unknown] (11 cpu-clock samples, 2.74%)[..[unknown] (7 cpu-clock samples, 1.74%)[unknown] (1 cpu-clock samples, 0.25%)_int_malloc (1 cpu-clock samples, 0.25%)_mid_memalign (6 cpu-clock samples, 1.49%)__posix_memalign (4 cpu-clock samples, 1.00%)malloc (4 cpu-clock samples, 1.00%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.50%)__posix_memalign (2 cpu-clock samples, 0.50%)free@plt (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (1 cpu-clock samples, 0.25%)std::__detail::_List_node_base::_M_unhook()@plt (1 cpu-clock samples, 0.25%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, 
long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)[unknown] (1 cpu-clock samples, 0.25%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.25%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (1 cpu-clock samples, 0.25%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (4 cpu-clock samples, 1.00%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.75%)memcpy@plt (1 cpu-clock samples, 0.25%)qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (2 cpu-clock samples, 0.50%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.50%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)[unknown] (3 cpu-clock samples, 0.75%)operator new(unsigned long) (3 cpu-clock samples, 0.75%)malloc (3 cpu-clock samples, 0.75%)qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (1 cpu-clock samples, 0.25%)operator delete(void*)@plt (1 cpu-clock samples, 0.25%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.75%)free@plt (2 cpu-clock samples, 0.50%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.25%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 416 samples | 165 stacks | 2026-06-22T01:28:01ZSearch all (416 cpu-clock samples, 100.00%)allqsl-bench (416 cpu-clock samples, 100.00%)qsl-bench[unknown] (335 cpu-clock samples, 80.53%)[unknown][unknown] (317 cpu-clock samples, 76.20%)[unknown][unknown] (276 cpu-clock samples, 66.35%)[unknown][unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (2 cpu-clock samples, 0.48%)do_lookup_x (2 cpu-clock samples, 0.48%)_dl_lookup_symbol_x (1 cpu-clock samples, 0.24%)_dl_new_hash (1 cpu-clock samples, 0.24%)__libc_start_call_main (273 cpu-clock samples, 65.62%)__libc_start_call_mainmain (273 cpu-clock samples, 65.62%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (39 cpu-clock samples, 9.38%)qsl::engine:..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, 
qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (30 cpu-clock samples, 7.21%)qsl::engi..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (13 cpu-clock samples, 3.12%)qs..std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (12 cpu-clock samples, 2.88%)st..std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, 
[results/flamegraph.svg body omitted: flattened per-frame labels of the form "function (N cpu-clock samples, P%)"]
diff --git a/results/flamegraph.txt b/results/flamegraph.txt index b0d8682..ea163fb 100644 --- a/results/flamegraph.txt +++ b/results/flamegraph.txt @@ -8,19 +8,19 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64 Perf paranoid: 2 Build type: Release Provenance version: 1 -Git commit (informational): 52de5b8 -Source digest: sha256:75c1d53ba776085cb43ed6c600692286ab547ec20c9dc7a2018a56c222673f3c
+Git commit (informational): 4aec1d0 +Source digest: sha256:619c700c4c9b872ffd42e0b4145d73f06548f971c50b2158398a7722b3d5f41a Source digest scope: flamegraph-benchmark Dirty inputs: no Generated output: results/flamegraph.svg -Date: 2026-06-22T01:13:09Z +Date: 2026-06-22T01:28:01Z Benchmark binary: build/bench/qsl-bench Dataset: qsl-bench default synthetic benchmark suite Call graph: dwarf Record event: cpu-clock Sample freq: 4000 Hz -Sample count: 402 -Folded stacks: 164 +Sample count: 416 +Folded stacks: 165 Minimum samples for hot profile: 200 Insufficient samples: no Record status: 0 @@ -34,25 +34,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock latency or throughput, and is hardware/kernel/compiler/build dependent. Top 15 folded stacks (count stack): - 16 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) - 12 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] - 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) - 9 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 - 8 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc - 8 
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) - 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) - 6 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign - 6 qsl-bench;[unknown];[unknown];qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);[unknown];[unknown];[unknown];[unknown];_mid_memalign - 6 qsl-bench;[unknown];[unknown];qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned 
long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];cfree@GLIBC_2.17 - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) - 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const + 20 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] + 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) + 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) + 13 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) + 11 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17 + 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) + 9 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, 
qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>);qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) + 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const + 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17 + 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) + 7
qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) + 6 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc + 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
 Benchmark output:
-order_book add/mod/cancel 200000 ops 133.5 ns/op 7487925 ops/sec
-protocol encode+decode 500000 ops 20.7 ns/op 48254784 ops/sec
-gateway session (fill) 200000 ops 128.0 ns/op 7812016 ops/sec
-matching engine flow 5004 items 102.3 ns/item 9773237 items/sec
-replay command log 5004 items 112.3 ns/item 8905762 items/sec
+order_book add/mod/cancel 200000 ops 132.9 ns/op 7523640 ops/sec
+protocol encode+decode 500000 ops 19.8 ns/op 50418890 ops/sec
+gateway session (fill) 200000 ops 127.6 ns/op 7838397 ops/sec
+matching engine flow 5004 items 102.5 ns/item 9759934 items/sec
+replay command log 5004 items 111.8 ns/item 8943232 items/sec

From 31070b17677186aff151aff9402b6655a1f59723 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 22:18:17 -0400
Subject: [PATCH 09/11] perf: harden flamegraph.sh classification + sample gating (Codex #130)

Address five Codex review findings on the flamegraph driver:

1. Classify `zero-sized data` (perf script's no-sample report) as a perf limitation, matching scripts/perf_record.sh, so the documented QSL_PERF_ALLOW_PARTIAL=1 constrained-host path works instead of tripping the unexpected-failure exit.
2. Remove any prior results/flamegraph.svg when a partial run captures no folded stacks, so a constrained rerun cannot leave a previous host's SVG beside a .txt that says there is no sample report.
3. Accept perf's `(~N samples)` estimate marker (optional `~`), and base the minimum-sample gate on the authoritative folded sample total rather than perf record's self-described estimate. Report both counts.
4. Capture flamegraph.py --collapse-only's exit status instead of `|| true`; a renderer/parser failure now exits 4 (unmaskable) rather than being published as a constrained-environment artifact.
5. Derive the sampling-kind label/caveat from the selected event (software cpu-clock/task-clock vs hardware-PMU) so the artifact type, SVG comment, and text companion stay consistent for QSL_FLAMEGRAPH_EVENT=cycles etc.

Co-Authored-By: Claude Opus 4.8
---
 scripts/flamegraph.sh | 76 +++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 20 deletions(-)

diff --git a/scripts/flamegraph.sh b/scripts/flamegraph.sh
index 7324f64..3d7dbfa 100755
--- a/scripts/flamegraph.sh
+++ b/scripts/flamegraph.sh
@@ -82,9 +82,10 @@ RECORD_ERR="$(mktemp)"
 SCRIPT_OUT="$(mktemp)"
 SCRIPT_ERR="$(mktemp)"
 FOLDED="$(mktemp)"
+COLLAPSE_ERR="$(mktemp)"
 SVG_TMP="$(mktemp)"
 TXT_TMP="$(mktemp)"
-trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$SVG_TMP" "$TXT_TMP"' EXIT
+trap 'rm -f "$BENCH_OUT" "$RECORD_BENCH_OUT" "$RECORD_ERR" "$SCRIPT_OUT" "$SCRIPT_ERR" "$FOLDED" "$COLLAPSE_ERR" "$SVG_TMP" "$TXT_TMP"' EXIT
 
 # Fail fast if the benchmark itself is broken (partial mode must not mask this).
BENCH_STATUS=0 @@ -105,31 +106,52 @@ if [[ "$RECORD_STATUS" -eq 0 ]]; then fi PERF_LIMITATION=no -if grep -Eiq 'No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ +# `zero-sized data` is how `perf script` reports a no-sample capture; classify it +# as a perf limitation here exactly as scripts/perf_record.sh does, so the +# documented constrained-host (QSL_PERF_ALLOW_PARTIAL=1) path works instead of +# tripping the unexpected-failure exit. +if grep -Eiq 'zero-sized data|No samples|failed to open|Permission denied|Operation not permitted|perf_event_open|not supported|Operation not supported|perf not found for kernel|linux-tools' \ "$RECORD_ERR" "$SCRIPT_ERR"; then PERF_LIMITATION=yes fi -SAMPLE_TOKEN="$(sed -nE 's/.*\(([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)" -SAMPLE_COUNT="$(parse_sample_count_token "$SAMPLE_TOKEN")" -[[ -z "$SAMPLE_COUNT" ]] && SAMPLE_COUNT=0 +# perf record prints its sample summary as "(N samples)" or, on some versions, +# "(~N samples)" — and that count is only its own estimate. Accept the optional +# `~` so the token is not dropped, but keep this value informational; the sample +# gate below uses the authoritative folded total, not this estimate. +SAMPLE_TOKEN="$(sed -nE 's/.*\(~?([0-9][0-9.,]*[KkMm]?) samples\).*/\1/p' "$RECORD_ERR" | head -1)" +PERF_EST_SAMPLES="$(parse_sample_count_token "$SAMPLE_TOKEN")" +[[ -z "$PERF_EST_SAMPLES" ]] && PERF_EST_SAMPLES=0 -# Fold to collapsed stacks for the text summary and as an SVG precondition. +# Fold to collapsed stacks for the text summary and as an SVG precondition. A +# nonzero COLLAPSE_STATUS means the renderer/parser itself failed (a generator +# regression), which is handled as an unexpected failure below — never masked as +# a perf sampling limitation. 
FOLDED_SAMPLES is the real sample total carried by +# the folded stacks (sum of trailing counts), the authoritative gate input. STACK_COUNT=0 +FOLDED_SAMPLES=0 +COLLAPSE_STATUS=0 if [[ "$SCRIPT_STATUS" -eq 0 && -s "$SCRIPT_OUT" ]]; then - python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>/dev/null || true + python3 scripts/flamegraph.py --collapse-only <"$SCRIPT_OUT" >"$FOLDED" 2>"$COLLAPSE_ERR" || + COLLAPSE_STATUS=$? STACK_COUNT="$(wc -l <"$FOLDED" | tr -d ' ')" + FOLDED_SAMPLES="$(awk '{ s += $NF } END { printf "%d\n", s + 0 }' "$FOLDED")" fi INSUFFICIENT_SAMPLES=no -if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$SAMPLE_COUNT" -lt "$MIN_SAMPLES" ]]; then +if [[ "$RECORD_STATUS" -eq 0 && "$SCRIPT_STATUS" -eq 0 && "$COLLAPSE_STATUS" -eq 0 && + "$FOLDED_SAMPLES" -lt "$MIN_SAMPLES" ]]; then INSUFFICIENT_SAMPLES=yes fi -ARTIFACT_TYPE="flamegraph ($EVENT software sampling hot-symbol profile)" -if [[ "$EVENT" == "cycles" ]]; then - ARTIFACT_TYPE="flamegraph (cycles hardware-PMU sampling hot-symbol profile)" -fi +# Describe the sampling source once so every label/caveat (artifact type, SVG +# comment, text companion) stays consistent: software timers vs a hardware PMU +# event. cpu-clock/task-clock are software; cycles/instructions/etc. are PMU. 
+case "$EVENT" in +cpu-clock | task-clock) SAMPLE_KIND="software $EVENT sampling" ;; +*) SAMPLE_KIND="$EVENT hardware-PMU sampling" ;; +esac +ARTIFACT_TYPE="flamegraph ($SAMPLE_KIND hot-symbol profile)" if [[ "$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0 || "$STACK_COUNT" -eq 0 ]]; then ARTIFACT_TYPE="constrained-environment validation (partial; no clean sample report)" elif [[ "$INSUFFICIENT_SAMPLES" == "yes" ]]; then @@ -139,7 +161,7 @@ fi PROVENANCE="$(qsl_emit_provenance "$PROVENANCE_SCOPE" "$OUT_SVG" "${PROVENANCE_INPUTS[@]}")" HOST="$(uname -s) $(uname -m)" DATE="$(qsl_utc_timestamp)" -SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${SAMPLE_COUNT} samples | ${STACK_COUNT} stacks | $DATE" +SUBTITLE="$ARTIFACT_TYPE | $HOST | $EVENT @ ${FREQ}Hz | ${FOLDED_SAMPLES} samples | ${STACK_COUNT} stacks | $DATE" # Render the SVG (deterministic for a fixed folded input + fixed subtitle). if [[ "$STACK_COUNT" -gt 0 ]]; then @@ -154,9 +176,9 @@ if [[ "$STACK_COUNT" -gt 0 ]]; then echo " Command: make flamegraph" echo " Artifact: $ARTIFACT_TYPE" echo " Record: perf record [call-graph $CALLGRAPH | -F $FREQ | -g | -e $EVENT]" - echo " Samples: $SAMPLE_COUNT | Folded stacks: $STACK_COUNT" - echo " Caveat: software cpu-clock sampling shows on-CPU time by symbol; it is" - echo " not a latency or throughput measurement and is hardware/build dependent." + echo " Samples (folded): $FOLDED_SAMPLES | perf record estimate: $PERF_EST_SAMPLES | Folded stacks: $STACK_COUNT" + echo " Caveat: $SAMPLE_KIND shows on-CPU time by symbol; it is not a latency" + echo " or throughput measurement and is hardware/build dependent." } | sed 's/--/- -/g' echo "-->" # Drop the renderer's own XML declaration; we emitted ours above. @@ -167,6 +189,11 @@ if [[ "$STACK_COUNT" -gt 0 ]]; then --from-collapsed <"$FOLDED" | tail -n +2 } >"$SVG_TMP" qsl_publish_artifact "$SVG_TMP" "$OUT_SVG" +else + # No clean folded stacks. 
Remove any prior SVG so a constrained rerun cannot + # leave a previous host's flamegraph beside a .txt that says there is no + # sample report — which could be committed as if the two still matched. + rm -f "$OUT_SVG" fi # Text companion: provenance + classification + top folded stacks (human/queryable). @@ -186,7 +213,8 @@ fi echo "Call graph: $CALLGRAPH" echo "Record event: $EVENT" echo "Sample freq: $FREQ Hz" - echo "Sample count: $SAMPLE_COUNT" + echo "Sample count (folded total): $FOLDED_SAMPLES" + echo "Sample count (perf record est.): $PERF_EST_SAMPLES" echo "Folded stacks: $STACK_COUNT" echo "Minimum samples for hot profile: $MIN_SAMPLES" echo "Insufficient samples: $INSUFFICIENT_SAMPLES" @@ -197,13 +225,13 @@ fi echo "Perf data: $DATA (generated, not intended for commit)" echo if [[ "$ARTIFACT_TYPE" == flamegraph* ]]; then - echo "Caveat: this flamegraph is a software cpu-clock sampling profile for hot-symbol" + echo "Caveat: this flamegraph is a $SAMPLE_KIND profile for hot-symbol" echo "investigation. Frame width is proportional to on-CPU samples, not wall-clock" echo "latency or throughput, and is hardware/kernel/compiler/build dependent." else echo "Caveat: constrained/partial perf validation, not a hot-symbol flamegraph. Treat" - echo "frame widths as unusable until sampling succeeds and Sample count meets the" - echo "Minimum samples for hot profile." + echo "frame widths as unusable until sampling succeeds and the folded sample total" + echo "meets the Minimum samples for hot profile." fi echo echo "Top $TOP_STACKS folded stacks (count stack):" @@ -224,6 +252,14 @@ qsl_publish_artifact "$TXT_TMP" "$OUT_TXT" echo "wrote $OUT_TXT" [[ "$STACK_COUNT" -gt 0 ]] && echo "wrote $OUT_SVG" +# A renderer/parser failure (perf script succeeded but flamegraph.py errored) is +# a generator bug, not a perf sampling limitation — fail hard so partial mode +# cannot publish a Python/parser regression as a constrained-environment artifact. 
+if [[ "$SCRIPT_STATUS" -eq 0 && "$COLLAPSE_STATUS" -ne 0 ]]; then + echo "error: flamegraph.py --collapse-only failed (status $COLLAPSE_STATUS); this is a renderer/parser failure, not a perf limitation, and partial mode cannot mask it." >&2 + cat "$COLLAPSE_ERR" >&2 + exit 4 +fi if [[ ("$RECORD_STATUS" -ne 0 || "$SCRIPT_STATUS" -ne 0) && "$PERF_LIMITATION" != "yes" ]]; then echo "error: perf record/script failed for a reason other than a perf access limitation." >&2 exit 3

From 06b76759216bd8f1e4937105749f8bdeb325683e Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Sun, 21 Jun 2026 22:18:39 -0400
Subject: [PATCH 10/11] perf: regenerate flamegraph artifact after classification hardening

Bare-metal Apple M2 (aarch64) Fedora Asahi, cpu-clock @ 4000Hz: 329 folded samples / 159 stacks, classified `flamegraph (software cpu-clock sampling hot-symbol profile)`, `Dirty inputs: no`. Source digest now covers the hardened scripts/flamegraph.sh; the .txt reports both the folded total and perf record's estimate.
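For reviewers, here is a minimal Python sketch of the two counts the .txt now reports. It assumes the collapsed-stack format `frame;...;leaf COUNT` that the `awk '{ s += $NF }'` sum in flamegraph.sh relies on. `folded_totals`, `parse_estimate`, and `classify` are hypothetical names, not part of the repo. `parse_estimate` in particular only guesses at what `parse_sample_count_token` does, because that helper is not shown in this series.

```python
import re

# Same pattern as the sed expression in scripts/flamegraph.sh: an optional "~",
# then digits with optional "." or "," and an optional K/M suffix.
SAMPLE_RE = re.compile(r"\(~?([0-9][0-9.,]*[KkMm]?) samples\)")


def folded_totals(folded_text):
    """Return (folded_samples, stack_count) for collapsed stacks.

    Frames can contain spaces ("unsigned long"), so the count is the last
    whitespace-separated field, like awk's $NF. Empty lines are skipped, an
    approximation of `wc -l` on well-formed folded output.
    """
    samples = 0
    stacks = 0
    for line in folded_text.splitlines():
        if not line.strip():
            continue
        samples += int(line.split()[-1])
        stacks += 1
    return samples, stacks


def parse_estimate(token):
    # Hypothetical stand-in for parse_sample_count_token: drop commas and
    # scale a K/M suffix. Returns 0 when perf printed no token.
    if not token:
        return 0
    scale = {"k": 1_000, "m": 1_000_000}.get(token[-1].lower(), 1)
    digits = token[:-1] if scale != 1 else token
    return round(float(digits.replace(",", "")) * scale)


def classify(folded_text, record_stderr, min_samples):
    samples, stacks = folded_totals(folded_text)
    match = SAMPLE_RE.search(record_stderr)  # first "(N samples)" marker
    estimate = parse_estimate(match.group(1) if match else "")
    # The gate uses the authoritative folded total; perf's figure is informational.
    return {
        "folded_samples": samples,
        "stacks": stacks,
        "perf_estimate": estimate,
        "insufficient": samples < min_samples,
    }


if __name__ == "__main__":
    # Illustrative, simplified stacks and an illustrative minimum of 100 samples.
    folded = (
        "qsl-bench;main;qsl::engine::OrderBook::cancel(unsigned long) 14\n"
        "qsl-bench;main;qsl::protocol::decode_new_order(std::span) 13\n"
    )
    err = "[ perf record: Captured and wrote 0.1 MB perf.data (~30 samples) ]"
    print(classify(folded, err, min_samples=100))
    # -> {'folded_samples': 27, 'stacks': 2, 'perf_estimate': 30, 'insufficient': True}
```

Keeping the estimate separate from the gate input mirrors the change in PATCH 09: a perf build that prints `(~N samples)` or no marker at all can no longer flip the "Insufficient samples" classification.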
Co-Authored-By: Claude Opus 4.8
---
 results/flamegraph.svg | 16 ++++++-------
 results/flamegraph.txt | 53 +++++++++++++++++++++---------------------
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/results/flamegraph.svg b/results/flamegraph.svg
index 378b45b..80466d2 100644
--- a/results/flamegraph.svg
+++ b/results/flamegraph.svg
@@ -2,18 +2,18 @@
[flamegraph.svg hunk body not recoverable: the SVG markup was stripped in extraction, leaving only frame-label text. Surviving title: "QSL Matching-Engine Flame Graph (qsl-bench)"; subtitle: "flamegraph (cpu-clock software sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 416 samples | 165 stacks | 2026-06-22T01:28:01Z".]
samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (15 cpu-clock samples, 3.61%)qsl..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (8 cpu-clock samples, 1.92%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, 
std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::less<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_market(unsigned int, unsigned long, qsl::core::Side, unsigned int) (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock 
samples, 0.24%)qsl::replay::decode_command(std::span<std::byte const, 18446744073709551615ul>) (4 cpu-clock samples, 0.96%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (10 cpu-clock samples, 2.40%)q..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (7 cpu-clock samples, 1.68%)[unknown] (5 cpu-clock samples, 1.20%)_mid_memalign (5 cpu-clock samples, 1.20%)__posix_memalign (2 cpu-clock samples, 0.48%)_mid_memalign (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (3 cpu-clock samples, 0.72%)__posix_memalign (3 cpu-clock samples, 0.72%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (10 cpu-clock samples, 2.40%)q..[unknown] (9 cpu-clock samples, 2.16%)[unknown] (9 cpu-clock samples, 2.16%)[unknown] (8 cpu-clock samples, 1.92%)[unknown] (3 cpu-clock samples, 0.72%)_mid_memalign (3 cpu-clock samples, 0.72%)__posix_memalign (5 cpu-clock samples, 1.20%)malloc (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (9 cpu-clock samples, 2.16%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)cfree@GLIBC_2.17 (2 cpu-clock samples, 0.48%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc (2 cpu-clock samples, 0.48%)free@plt (3 cpu-clock samples, 0.72%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (7 cpu-clock samples, 1.68%)[unknown] (7 
cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc@plt (3 cpu-clock samples, 0.72%)operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (2 cpu-clock samples, 0.48%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (5 cpu-clock samples, 1.20%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (1 cpu-clock samples, 0.24%)_mid_memalign (1 cpu-clock samples, 0.24%)__posix_memalign (2 cpu-clock samples, 0.48%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (2 cpu-clock samples, 0.48%)[unknown] (1 cpu-clock samples, 0.24%)[unknown] (1 cpu-clock samples, 0.24%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, 
std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (4 cpu-clock samples, 0.96%)operator new(unsigned long, std::align_val_t)@plt (4 cpu-clock samples, 0.96%)__libc_start_call_main (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)[unknown] (7 cpu-clock samples, 1.68%)cfree@GLIBC_2.17 (7 cpu-clock samples, 1.68%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (11 cpu-clock samples, 2.64%)d..[unknown] (11 cpu-clock samples, 2.64%)[..[unknown] (11 cpu-clock samples, 2.64%)[..cfree@GLIBC_2.17 (11 cpu-clock samples, 2.64%)c..main (14 cpu-clock samples, 3.37%)main[unknown] (10 cpu-clock samples, 2.40%)[..[unknown] (10 cpu-clock samples, 2.40%)[..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (9 cpu-clock samples, 2.16%)malloc (6 cpu-clock samples, 1.44%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (2 cpu-clock samples, 0.48%)operator new(unsigned long) (5 cpu-clock samples, 1.20%)malloc@plt (5 cpu-clock samples, 1.20%)operator new(unsigned long, std::align_val_t) (1 
cpu-clock samples, 0.24%)posix_memalign@plt (1 cpu-clock samples, 0.24%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (4 cpu-clock samples, 0.96%)[unknown] (2 cpu-clock samples, 0.48%)[unknown] (2 cpu-clock samples, 0.48%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (20 cpu-clock samples, 4.81%)qsl::..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (17 cpu-clock samples, 4.09%)[unk..[unknown] (13 cpu-clock samples, 3.12%)[u..[unknown] (9 cpu-clock samples, 2.16%)_mid_memalign (9 cpu-clock samples, 2.16%)__posix_memalign (4 cpu-clock samples, 0.96%)malloc (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.24%)operator new(unsigned long) (1 cpu-clock samples, 0.24%)malloc (1 cpu-clock samples, 0.24%)operator new(unsigned long, std::align_val_t) (2 cpu-clock samples, 0.48%)__posix_memalign (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*)@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.48%)free@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long, std::align_val_t)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::fill_front_order(std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&, long, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, 
qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (4 cpu-clock samples, 0.96%)free@plt (1 cpu-clock samples, 0.24%)memcpy@plt (1 cpu-clock samples, 0.24%)operator new(unsigned long)@plt (2 cpu-clock samples, 0.48%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)[unknown] (3 cpu-clock samples, 0.72%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)[unknown] (6 cpu-clock samples, 1.44%)cfree@GLIBC_2.17 (3 cpu-clock samples, 0.72%)operator new(unsigned long) (3 cpu-clock samples, 0.72%)malloc (3 cpu-clock samples, 0.72%)qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector<qsl::replay::LogRecord, std::allocator<qsl::replay::LogRecord> > const&) (2 cpu-clock samples, 0.48%)memcpy@plt (1 cpu-clock samples, 0.24%)operator delete(void*, unsigned long)@plt (1 cpu-clock samples, 0.24%) +]]>QSL Matching-Engine Flame Graph (qsl-bench)flamegraph (software cpu-clock sampling hot-symbol profile) | Linux aarch64 | cpu-clock @ 4000Hz | 329 samples | 159 stacks | 2026-06-22T02:18:23ZSearch all (329 cpu-clock samples, 100.00%)allqsl-bench (329 cpu-clock samples, 100.00%)qsl-bench[unknown] (251 cpu-clock samples, 76.29%)[unknown][unknown] (237 cpu-clock samples, 72.04%)[unknown][unknown] (201 cpu-clock samples, 61.09%)[unknown][unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 
0.61%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (2 cpu-clock samples, 0.61%)[unknown] (1 cpu-clock samples, 0.30%)do_lookup_x (1 cpu-clock samples, 0.30%)_dl_lookup_symbol_x (1 cpu-clock samples, 0.30%)_dl_new_hash (1 cpu-clock samples, 0.30%)__libc_start_call_main (199 cpu-clock samples, 60.49%)__libc_start_call_mainmain (199 cpu-clock samples, 60.49%)maincfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (20 cpu-clock samples, 6.08%)qsl::en..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}>(qsl::engine::OrderBook::contains(unsigned long) const::{lambda()#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::IntrusiveStore const&)#1}&&, qsl::engine::OrderBook::contains(unsigned long) const::{lambda(qsl::engine::OrderBook::ContiguousStore const&)#1}&&) const [clone .isra.0] (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int) (13 cpu-clock samples, 3.95%)qsl..operator new(unsigned long, std::align_val_t) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long) (4 cpu-clock samples, 1.22%)std::pair<std::_Rb_tree_iterator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, bool> std::_Rb_tree<long, std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >, 
std::_Select1st<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > >, std::greater<long>, std::pmr::polymorphic_allocator<std::pair<long const, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > > > >::_M_emplace_unique<long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> > >(long&, std::__cxx11::list<qsl::engine::Order, std::pmr::polymorphic_allocator<qsl::engine::Order> >&&) (3 cpu-clock samples, 0.91%)std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (1 cpu-clock samples, 0.30%)std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](unsigned long const&) (7 cpu-clock samples, 2.13%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*, unsigned long) (3 cpu-clock samples, 0.91%)std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const (1 cpu-clock samples, 
0.30%)qsl::engine::OrderBook::cancel(unsigned long) (18 cpu-clock samples, 5.47%)qsl::e..decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (18 cpu-clock samples, 5.47%)declty..qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (13 cpu-clock samples, 3.95%)qsl..std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (4 cpu-clock samples, 1.22%)std::__detail::_List_node_base::_M_unhook() (1 cpu-clock samples, 0.30%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.30%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (3 cpu-clock samples, 0.91%)cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (2 cpu-clock samples, 0.61%)qsl::gateway::Session::on_bytes(std::span<std::byte const, 
18446744073709551615ul>) (56 cpu-clock samples, 17.02%)qsl::gateway::Session::on..qsl::gateway::Session::on_bytes(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (56 cpu-clock samples, 17.02%)qsl::gateway::Session::on..qsl::gateway::Session::process_frame(std::span<std::byte const, 18446744073709551615ul>, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (53 cpu-clock samples, 16.11%)qsl::gateway::Session::p..cfree@GLIBC_2.17 (1 cpu-clock samples, 0.30%)qsl::gateway::(anonymous namespace)::emit_result(unsigned long, qsl::gateway::GatewayResult const&, std::vector<std::byte, std::allocator<std::byte> >&, unsigned long) (13 cpu-clock samples, 3.95%)qsl..cfree@GLIBC_2.17 (3 cpu-clock samples, 0.91%)qsl::gateway::(anonymous namespace)::append(std::vector<std::byte, std::allocator<std::byte> >&, std::vector<std::byte, std::allocator<std::byte> > const&, unsigned long) [clone .isra.0] (5 cpu-clock samples, 1.52%)__memcpy_generic (3 cpu-clock samples, 0.91%)qsl::protocol::encode(qsl::protocol::Fill const&) (2 cpu-clock samples, 0.61%)operator new(unsigned long) (1 cpu-clock samples, 0.30%)qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (33 cpu-clock samples, 10.03%)qsl::gateway::..qsl::engine::MatchingEngine::can_store_limit(unsigned int, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.61%)qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const (4 cpu-clock samples, 1.22%)qsl::engine::MatchingEngine::has_symbol(unsigned int) const (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (16 cpu-clock samples, 4.86%)qsl::..qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) (3 cpu-clock 
samples, 0.91%)qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&) (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.30%)qsl::engine::OrderBook::can_store_limit(qsl::core::Side, long, unsigned int, qsl::core::TimeInForce) const (2 cpu-clock samples, 0.61%)qsl::engine::OrderBook::contains(unsigned long) const (1 cpu-clock samples, 0.30%)qsl::engine::check_limit(qsl::engine::RiskConfig const&, qsl::core::Side, long, unsigned int) (1 cpu-clock samples, 0.30%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.91%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (3 cpu-clock samples, 0.91%)qsl::protocol::decode_header(std::span<std::byte const, 18446744073709551615ul>) (1 cpu-clock samples, 0.30%)qsl::protocol::decode_new_order(std::span<std::byte const, 18446744073709551615ul>) (15 cpu-clock samples, 4.56%)qsl:..qsl::protocol::encode(qsl::protocol::NewOrder const&, unsigned long) (1 cpu-clock samples, 0.30%)qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant<qsl::replay::RegisterSymbol, qsl::replay::NewLimit, qsl::replay::NewMarket, qsl::replay::Cancel, qsl::replay::Modify> const&) (33 cpu-clock samples, 10.03%)qsl::replay::a..qsl::engine::MatchingEngine::cancel(unsigned int, unsigned long) (4 cpu-clock samples, 1.22%)qsl::engine::OrderBook::cancel(unsigned long) (3 cpu-clock samples, 0.91%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned 
long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 0.91%)qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&) (2 cpu-clock samples, 0.61%)std::pmr::(anonymous namespace)::newdel_res_t::do_deallocate(void*, unsigned long, unsigned long) (1 cpu-clock samples, 0.30%)std::_Hashtable<unsigned long, std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, std::pmr::polymorphic_allocator<std::pair<unsigned long const, qsl::engine::OrderBook::Locator> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<unsigned long const, qsl::engine::OrderBook::Locator>, false>*) (1 cpu-clock samples, 0.30%)qsl::engine::MatchingEngine::modify(unsigned int, unsigned long, long, unsigned int) (5 cpu-clock samples, 1.52%)qsl::engine::OrderBook::modify(unsigned long, long, unsigned int) (5 cpu-clock samples, 1.52%)decltype(auto) qsl::engine::OrderBook::dispatch_storage<qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}>(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0] (3 cpu-clock samples, 
[results/flamegraph.svg content omitted: flattened per-frame tooltips of the form `frame (N cpu-clock samples, P%)`]
diff --git a/results/flamegraph.txt b/results/flamegraph.txt
index ea163fb..4969a22 100644
--- a/results/flamegraph.txt
+++ b/results/flamegraph.txt
@@ -1,5 +1,5 @@
 Command: make flamegraph
-Artifact: flamegraph (cpu-clock software sampling hot-symbol profile)
+Artifact: flamegraph (software cpu-clock sampling hot-symbol profile)
 Hardware: aarch64
 OS: Linux 6.19.14-400.asahi.fc44.aarch64+16k
 CPU: Avalanche-M2
@@ -8,19 +8,20 @@ Perf: perf version 6.19.14-400.asahi.fc44.aarch64
 Perf paranoid: 2
 Build type: Release
 Provenance version: 1
-Git commit (informational): 4aec1d0
-Source digest: sha256:619c700c4c9b872ffd42e0b4145d73f06548f971c50b2158398a7722b3d5f41a
+Git commit (informational): 31070b1
+Source digest: sha256:6aa521e6295a99f9dbf7dee9e5bcef04e93174ed12c3e8de9b991a8bfc14c809
 Source digest scope: flamegraph-benchmark
 Dirty inputs: no
 Generated output: results/flamegraph.svg
-Date: 2026-06-22T01:28:01Z
+Date: 2026-06-22T02:18:23Z
 Benchmark binary: build/bench/qsl-bench
 Dataset: qsl-bench default synthetic benchmark suite
 Call graph: dwarf
 Record event: cpu-clock
 Sample freq: 4000 Hz
-Sample count: 416
-Folded stacks: 165
+Sample count (folded total): 329
+Sample count (perf record est.): 329
+Folded stacks: 159
 Minimum samples for hot profile: 200
 Insufficient samples: no
 Record status: 0
@@ -34,25 +35,25 @@ investigation. Frame width is proportional to on-CPU samples, not wall-clock
 latency or throughput, and is hardware/kernel/compiler/build dependent.
 
 Top 15 folded stacks (count stack):
- 20 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0]
- 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
- 14 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span)
- 13 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::modify(unsigned long, long, unsigned int)
- 11 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
- 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);qsl::engine::OrderBook::level_for[abi:cxx11](qsl::core::Side, long);std::pair > > >, bool> std::_Rb_tree > >, std::_Select1st > > >, std::greater, std::pmr::polymorphic_allocator > > > >::_M_emplace_unique > >(long&, std::__cxx11::list >&&)
- 9 qsl-bench;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);[unknown];[unknown];[unknown];[unknown];_mid_memalign
- 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::match_baseline(qsl::core::Side, qsl::engine::OrderBook::MatchContext&)
- 8 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const
- 7 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17
- 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int)
- 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::replay(qsl::engine::MatchingEngine&, std::vector > const&);qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
- 6 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc
- 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::add_limit(unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::OrderBook::rest(unsigned long, qsl::core::Side, long, unsigned int);std::__detail::_Map_base, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits, true>::operator[](unsigned long const&)
+ 15 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::protocol::decode_new_order(std::span)
+ 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 11 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::MatchingEngine::contains(unsigned int, unsigned long) const
+ 8 qsl-bench;__libc_start_call_main;[unknown];[unknown];cfree@GLIBC_2.17
+ 7 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::engine::OrderBook::cancel(unsigned long);decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&)
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::gateway::Session::on_bytes(std::span);qsl::gateway::Session::on_bytes(std::span, std::vector >&, unsigned long);qsl::gateway::Session::process_frame(std::span, std::vector >&, unsigned long);qsl::gateway::OrderGateway::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 6 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::apply(qsl::engine::MatchingEngine&, std::variant const&);qsl::engine::MatchingEngine::new_limit(unsigned int, unsigned long, qsl::core::Side, long, unsigned int, qsl::core::TimeInForce)
+ 5 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, std::align_val_t)@plt
+ 5 qsl-bench;qsl::engine::OrderBook::erase_resting_order(qsl::engine::OrderBook::Locator const&);operator delete(void*, unsigned long, std::align_val_t)@plt
+ 5 qsl-bench;std::_Hashtable, std::pmr::polymorphic_allocator >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node, false>*);operator delete(void*, unsigned long, std::align_val_t)@plt
+ 5 qsl-bench;[unknown];[unknown];operator new(unsigned long);malloc@plt
+ 5 qsl-bench;[unknown];[unknown];[unknown];__libc_start_call_main;main;qsl::replay::generate_flow(unsigned long, unsigned int, unsigned long);qsl::engine::OrderBook::contains(unsigned long) const
+ 4 qsl-bench;decltype(auto) qsl::engine::OrderBook::dispatch_storage(qsl::engine::OrderBook::cancel(unsigned long)::{lambda()#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::IntrusiveStore&)#1}&&, qsl::engine::OrderBook::cancel(unsigned long)::{lambda(qsl::engine::OrderBook::ContiguousStore&)#1}&&) [clone .isra.0];[unknown];[unknown];cfree@GLIBC_2.17
+ 4 qsl-bench;main;[unknown];[unknown];operator new(unsigned long);malloc
+ 4 qsl-bench;operator new(unsigned long);malloc@plt
 
 Benchmark output:
-order_book add/mod/cancel 200000 ops 132.9 ns/op 7523640 ops/sec
-protocol encode+decode 500000 ops 19.8 ns/op 50418890 ops/sec
-gateway session (fill) 200000 ops 127.6 ns/op 7838397 ops/sec
-matching engine flow 5004 items 102.5 ns/item 9759934 items/sec
-replay command log 5004 items 111.8 ns/item 8943232 items/sec
+order_book add/mod/cancel 200000 ops 132.8 ns/op 7531861 ops/sec
+protocol encode+decode 500000 ops 20.5 ns/op 48773893 ops/sec
+gateway session (fill) 200000 ops 127.4 ns/op 7848348 ops/sec
+matching engine flow 5004 items 101.6 ns/item 9840697 items/sec
+replay command log 5004 items 112.0 ns/item 8928265 items/sec

From 5093beb518180a53ecda8a8180aa12011f3bcad8 Mon Sep 17 00:00:00 2001
From: nasr <156965421+div0rce@users.noreply.github.com>
Date: Mon, 22 Jun 2026 12:40:30 -0400
Subject: [PATCH 11/11] docs: embed the flamegraph as a visible image in the README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The flamegraph artifact, generator, provenance companion, and docs already existed but
no page actually displayed the SVG; it was only referenced by filename. Embed the rendered
results/flamegraph.svg as a visible image under the Benchmarks section, with a caption that
classifies it honestly as a software cpu-clock sampling hot-symbol profile (not PMU evidence),
names the hot frames, and links the provenance .txt and docs/perf_analysis.md.

Co-Authored-By: Claude Opus 4.8
---
 README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/README.md b/README.md
index 4332532..d1bb19d 100644
--- a/README.md
+++ b/README.md
@@ -109,6 +109,23 @@ Reproduce with `make bench` (numbers will differ by machine). The differential-t
 [`results/differential.txt`](results/differential.txt) — kept separate so it does not disturb the
 core numbers above.
 
+### Flamegraph
+
+Where on-CPU time goes in the `qsl-bench` synthetic suite, rendered by `make flamegraph`
+(`scripts/flamegraph.sh` → the dependency-free `scripts/flamegraph.py`, with no external FlameGraph
+toolchain):
+
+[![qsl-bench cpu-clock flamegraph](results/flamegraph.svg)](results/flamegraph.svg)
+
+This is a **software cpu-clock sampling** hot-symbol profile, **not** PMU evidence: frame width is
+proportional to on-CPU samples (329 folded across 159 stacks on this run), not wall-clock latency or
+throughput, and it is hardware/kernel/compiler/build dependent. The hot frames are protocol
+`decode_new_order`, gateway session framing, `MatchingEngine::new_limit`, and order-book
+cancel/allocation. Provenance and classification are in
+[`results/flamegraph.txt`](results/flamegraph.txt); methodology in
+[docs/perf_analysis.md](docs/perf_analysis.md). GitHub renders the SVG statically; download the raw
+file for interactive zoom and search.
+
 ## Limitations
 
 - **Synthetic and local.** No real market data, no real venue connectivity, no order types
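The caption and provenance text say that frame width is proportional to on-CPU samples. The SVG tooltips report each frame as `(N cpu-clock samples, P%)`; for example, 34 samples out of the 329-sample folded total renders as 10.33%. The stdlib-only Python sketch below shows that arithmetic, assuming Brendan Gregg's folded-stack line format (`frame;frame;leaf COUNT`). It is a hypothetical illustration, not the code in `scripts/flamegraph.py`. The names `parse_folded`, `inclusive_counts`, `tooltip`, and `top_stacks` are invented for this example, and the tie-breaking rule in `top_stacks` is an assumption that may not match the repo's ordering.

```python
"""Folded-stack arithmetic behind flamegraph frame widths (illustrative sketch)."""
from collections import Counter


def parse_folded(lines):
    """Parse 'frame;frame;...;leaf COUNT' lines into {stack tuple: samples}."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field. C++ frame names may
        # themselves contain spaces, so split once from the right.
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def inclusive_counts(stacks):
    """Samples under each call-path prefix, i.e. each drawn frame's width."""
    widths = Counter()
    for frames, count in stacks.items():
        for depth in range(1, len(frames) + 1):
            widths[frames[:depth]] += count
    return widths


def tooltip(path, widths, total, event="cpu-clock"):
    """Label a frame as 'name (N <event> samples, P%)', P relative to the folded total."""
    n = widths[path]
    return f"{path[-1]} ({n} {event} samples, {100.0 * n / total:.2f}%)"


def top_stacks(stacks, k):
    """Top-k stacks by count; ties broken by stack text (an assumption)."""
    ranked = sorted(stacks.items(), key=lambda item: (-item[1], ";".join(item[0])))
    return [(count, ";".join(frames)) for frames, count in ranked[:k]]


if __name__ == "__main__":
    # Toy input: shortened frame names and made-up counts, not real profile data.
    folded = [
        "qsl-bench;main;decode_new_order 15",
        "qsl-bench;main;new_limit 11",
        "qsl-bench;main;new_limit;add_limit 6",
        "qsl-bench;operator new(unsigned long);malloc 8",
    ]
    stacks = parse_folded(folded)
    total = sum(stacks.values())  # the "folded total" sample count (40 here)
    widths = inclusive_counts(stacks)
    # prints: new_limit (17 cpu-clock samples, 42.50%)
    print(tooltip(("qsl-bench", "main", "new_limit"), widths, total))
    for count, stack in top_stacks(stacks, 2):
        print(count, stack)
```

A frame's width is its inclusive count: the 11 samples that stop in `new_limit` plus the 6 that continue into `add_limit`. This is why a parent frame is always at least as wide as the sum of its children.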