This repository implements a small, deterministic C++ limit-order-book engine for LOBSTER-style message data. The parser and replay code operate on the LOBSTER six-column message schema, but the checked-in CSVs are tiny synthetic/reduced fixtures for reproducibility, not full proprietary LOBSTER distributions. The repo includes:
- typed CSV ingestion for LOBSTER message rows
- order lifecycle processing for add, cancel, and execute events
- aggregated bid/ask levels plus order-ID lookup
- two price-level backends: `std::map` and a flat sorted `std::vector`
- rolling analytics and CSV export after every processed message
- optional post-replay prediction summary reporting by message horizon
- deterministic C++ and Python integration tests
- replay benchmark tooling and a hand-maintained benchmark reproducibility note
- `include/lob/`: public headers for parsing, order book state, replay, and analytics
- `src/`: parser, order book engine, replay, analytics, and CLI entrypoint
- `tests/`: C++ unit tests plus Python integration coverage for the CMake workflow
- `benchmark/`: replay benchmark harness
- `data/`: checked-in small sample datasets used for deterministic tests and reproducible benchmark captures
- `report/`: benchmark and methodology notes
From a fresh clone, run the build, verifier, and benchmark commands below in order. Start with a clean temporary build directory instead of an in-repo build tree:
```shell
build_dir="$(mktemp -d "${TMPDIR:-/tmp}/lob-engine-build.XXXXXX")"
cmake -S . -B "$build_dir" -DCMAKE_BUILD_TYPE=Release
cmake --build "$build_dir" --config Release
```

Run the CMake/CTest verifier from that build directory, then run the existing Python test suite from the repo root:
```shell
ctest --test-dir "$build_dir" --output-on-failure -C Release
python -m pytest tests -q --tb=short
```

`ctest` runs the three C++ test executables plus the `lob_benchmark_smoke` path. `python -m pytest tests -q --tb=short` configures and reuses a separate `.cmake-test-build/` directory under the repo root; that directory and the analytics CSVs produced there are ignored local test artifacts.
Replay a dataset and print final top-of-book state:
```shell
"$build_dir/lob_engine" data/AAPL_sample_messages.csv --backend both --depth 10 --repeat 5
```

Export analytics rows after every processed message:
```shell
"$build_dir/lob_engine" \
  data/AAPL_sample_messages.csv \
  --backend both \
  --analytics-out "$build_dir/analytics.csv" \
  --trade-window-messages 1000 \
  --realized-vol-window-seconds 300
```

If `--backend both` is selected, the CLI writes one CSV per backend by suffixing the output path.
Emit a separate prediction summary after replay without changing the analytics CSV rows:
```shell
"$build_dir/lob_engine" \
  data/AAPL_sample_messages.csv \
  --backend map \
  --analytics-out "$build_dir/analytics.csv" \
  --prediction-report-out "$build_dir/prediction_report.csv" \
  --prediction-horizons 100,500
```

`--prediction-report-out` requires `--prediction-horizons`. If both flags are omitted, prediction work stays disabled.
Each processed message produces a row with:
- `timestamp`
- `best_bid`, `best_ask`, `spread`, `mid`
- `bid_depth_{1,5,10}`, `ask_depth_{1,5,10}`
- `order_imbalance`
- `rolling_vwap`
- `trade_flow_imbalance`
- `rolling_realized_vol`
The default rolling windows match the project objective:
- trailing `1000` messages for trade-based metrics
- trailing `300` seconds for realized volatility
Prediction reporting is a separate CSV keyed by message horizon. For each row `t`, the label is the sign of the first non-zero mid-price move found in `t+1 ... t+H` relative to the mid at `t`. Rows with an invalid current mid, or with no non-zero future move inside the horizon, are skipped. The report includes labeled sample counts, up/down move counts, the hit rate of `sign(order_imbalance_top5)` on non-zero-signal rows, and an information coefficient computed as the Pearson correlation between the raw top-5 imbalance value and the future move sign. Zero-signal rows stay in the labeled sample and the IC calculation but increment `skipped_zero_signal`, so they are excluded from the hit-rate denominator.
Two backends are implemented behind the same `OrderBook` interface:

- `map`
  - sorted levels via `std::map`
  - stable `O(log n)` insert/update/remove at the level container
- `flat`
  - sorted levels in a binary-searched `std::vector`
  - better cache locality on shallow books
  - more expensive interior insert/erase at larger active depth
Deterministic parity tests assert that both backends produce identical book snapshots after each message in the shared test sequences.
The checked-in benchmark numbers below come from the existing lob_benchmark replay harness in Release mode. The timer still covers replay/book updates, not CSV export. Analytics correctness and backend parity stay covered by test_analytics, and the CLI analytics path now shares the same derived book reserve hints plus pre-sized rolling buffers.
Exact hot-path allocation changes in this branch:
- derive `expected_orders` from the peak active-order count in the parsed message stream before replay instead of hard-coding `messages.size()`
- derive `expected_levels_per_side` from the peak active bid/ask level count instead of hard-coding `64`
- pre-size the rolling trade window and realized-vol sample buffer in analytics, and retain that capacity across `AnalyticsEngine::reset()`
- construct `AnalyticsRow` values in place during replay instead of pushing a temporary row object per message
Measurement method used for the recorded table:
- baseline tree: clean `origin/main` checkout at `d627b73`
- optimized tree: this worktree after the reserve/buffer changes listed above
- build: `cmake -S . -B "$build_dir" -DCMAKE_BUILD_TYPE=Release && cmake --build "$build_dir" --config Release`
- warmup: one untimed `taskset -c 0 "$build_dir/lob_benchmark" --dataset data/AAPL_sample_messages.csv --backend both --reserve on --depth 5 --repeat 10000`
- measured commands: the four `taskset -c 0 "$build_dir/lob_benchmark" --dataset ... --backend both --reserve on --depth 5 --repeat 100000` invocations listed below
- host: Linux 6.8.0-106-generic, `g++ 13.3.0`, AMD EPYC-Rome processor, benchmark process pinned to CPU 0
These four commands are the recorded measurement step:
```shell
taskset -c 0 "$build_dir/lob_benchmark" --dataset data/AAPL_sample_messages.csv --backend both --reserve on --depth 5 --repeat 100000
taskset -c 0 "$build_dir/lob_benchmark" --dataset data/MSFT_sample_messages.csv --backend both --reserve on --depth 5 --repeat 100000
taskset -c 0 "$build_dir/lob_benchmark" --dataset data/NVDA_sample_messages.csv --backend both --reserve on --depth 5 --repeat 100000
taskset -c 0 "$build_dir/lob_benchmark" --dataset data/TSLA_sample_messages.csv --backend both --reserve on --depth 5 --repeat 100000
```

On the optimized tree, `lob_benchmark` now prints the derived reserve hints alongside each run. For the four checked-in ticker fixtures, the derived replay hints are `expected_orders=3` and `expected_levels_per_side=3`.
Recorded throughput on this host:
| Dataset | Backend | Baseline elapsed ms | Baseline msgs/sec | Optimized elapsed ms | Optimized msgs/sec | Delta |
|---|---|---|---|---|---|---|
| AAPL | `map` | 39.272 | 50,926,679.688 | 32.665 | 61,228,168.484 | +20.23% |
| AAPL | `flat_vector` | 36.477 | 54,829,135.007 | 30.653 | 65,245,476.645 | +19.00% |
| MSFT | `map` | 34.837 | 57,409,888.578 | 33.477 | 59,743,302.149 | +4.06% |
| MSFT | `flat_vector` | 36.481 | 54,822,944.367 | 30.856 | 64,816,866.749 | +18.23% |
| NVDA | `map` | 37.224 | 53,729,301.089 | 33.445 | 59,799,408.267 | +11.30% |
| NVDA | `flat_vector` | 37.173 | 53,801,878.832 | 32.955 | 60,689,226.916 | +12.80% |
| TSLA | `map` | 34.614 | 57,780,646.523 | 33.194 | 60,252,569.734 | +4.28% |
| TSLA | `flat_vector` | 36.451 | 54,868,240.915 | 31.245 | 64,010,909.507 | +16.66% |
Post-optimization backend comparison on these fixtures:
- `flat_vector` remains faster than `map` on all four reduced ticker fixtures after the change: +6.56% on AAPL, +8.49% on MSFT, +1.49% on NVDA, and +6.24% on TSLA
- the margin stays narrow because these fixtures top out at three active orders and three active levels per side after malformed rows are dropped
- these numbers are local measurements on tiny synthetic fixtures; do not generalize them to deeper books or proprietary full-session datasets
`--reserve on` enables:

- auto-derived `unordered_map::reserve()` sizing for order lookup
- auto-derived vector capacity reservation for the flat backend price-level storage
This is the bounded hot-path allocation reduction implemented in the repo. Throughput numbers are host-dependent and should be treated as local measurements on the checked-in reduced fixtures, not as publishable claims about full vendor datasets. See report/benchmark_report.md for the exact datasets and commands used for reproducible reruns.
The repo ships five checked-in reproducibility fixtures:
- `AAPL_sample_messages.csv`
- `MSFT_sample_messages.csv`
- `NVDA_sample_messages.csv`
- `TSLA_sample_messages.csv`
- `sample_messages.csv`
The four ticker-named files are 25-line reduced fixtures with 20 valid messages plus 5 intentionally malformed rows each. `sample_messages.csv` is a legacy generic fixture with the same contents as `AAPL_sample_messages.csv`, kept because the parser and Python integration tests reference it directly.
These files are intentionally tiny and deterministic so the build, tests, and benchmark workflow can run on a fresh clone without external data dependencies. They are suitable for correctness checks and relative replay comparisons, not production-grade market simulation or claims about full vendor data.
This codebase gives a compact environment for validating:
- message parsing assumptions
- order-book state transitions
- top-of-book and depth analytics
- replay throughput tradeoffs between container choices
- how much simple preallocation changes replay performance on shallow books
It is intentionally small enough to audit but still structured like a real research prototype: deterministic tests, reproducible build flow, benchmark tooling, and clear documentation.