16 changes: 16 additions & 0 deletions README.md
@@ -7,6 +7,7 @@ This repository implements a small, deterministic C++ limit-order-book engine fo
- aggregated bid/ask levels plus order-ID lookup
- two price-level backends: `std::map` and flat sorted `std::vector`
- rolling analytics and CSV export after every processed message
- optional post-replay prediction summary reporting by message horizon
- deterministic C++ and Python integration tests
- replay benchmark tooling and a hand-maintained benchmark reproducibility note

@@ -61,6 +62,19 @@ Export analytics rows after every processed message:

If `--backend both` is selected, the CLI writes one CSV per backend by suffixing the output path.

Emit a separate prediction summary after replay without changing the analytics CSV rows:

```bash
"$build_dir/lob_engine" \
data/AAPL_sample_messages.csv \
--backend map \
--analytics-out "$build_dir/analytics.csv" \
--prediction-report-out "$build_dir/prediction_report.csv" \
--prediction-horizons 100,500
```

`--prediction-report-out` requires `--prediction-horizons`. If both flags are omitted, prediction work stays disabled.
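
The flag rule above can be sketched in C++ as follows. This is a minimal illustration, not the CLI's actual parser: `PredictionSettings`, `parse_horizons`, and `resolve_prediction_flags` are hypothetical names. The README does not say what happens when `--prediction-horizons` is given without `--prediction-report-out`; the sketch assumes that case stays disabled, which matches `prediction_reporting_enabled()` in `include/lob/analytics.hpp`.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical result type for this sketch; not part of the project's API.
struct PredictionSettings {
    std::string report_out;
    std::vector<std::size_t> horizons;
};

// Parses a comma-separated list such as "100,500" into positive horizons.
// Throws std::invalid_argument on empty tokens, non-numeric text, or zero.
std::vector<std::size_t> parse_horizons(std::string_view text) {
    std::vector<std::size_t> horizons;
    for (;;) {
        const std::size_t comma = text.find(',');
        const std::string_view token = text.substr(0, comma);
        const char* const first = token.data();
        const char* const last = first + token.size();
        std::size_t value = 0;
        const auto [ptr, ec] = std::from_chars(first, last, value);
        if (ec != std::errc{} || ptr != last || value == 0) {
            throw std::invalid_argument("invalid prediction horizon: " + std::string(token));
        }
        horizons.push_back(value);
        if (comma == std::string_view::npos) {
            break;
        }
        text.remove_prefix(comma + 1);
    }
    return horizons;
}

// Returns settings when prediction reporting should run, std::nullopt when it stays disabled.
std::optional<PredictionSettings> resolve_prediction_flags(
    const std::optional<std::string>& report_out,
    const std::optional<std::string>& horizons_arg) {
    if (!report_out.has_value()) {
        // Both flags omitted is documented as disabled. Horizons without an
        // output path is an assumption of this sketch: also disabled.
        return std::nullopt;
    }
    if (!horizons_arg.has_value()) {
        throw std::invalid_argument("--prediction-report-out requires --prediction-horizons");
    }
    PredictionSettings settings{*report_out, parse_horizons(*horizons_arg)};
    assert(!settings.horizons.empty());  // parse_horizons throws instead of returning an empty list
    return settings;
}
```

Returning `std::nullopt` for the disabled case keeps the replay path free of prediction work unless both flags are present.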

## Analytics

Each processed message produces a row with:
@@ -78,6 +92,8 @@ The default rolling windows match the project objective:
- trailing `1000` messages for trade-based metrics
- trailing `300` seconds for realized volatility

Prediction reporting is a separate CSV keyed by message horizon `H`. For each row `t`, the label is the sign of the first non-zero mid-price move found in rows `t+1 ... t+H`, measured relative to the mid-price at `t`. Rows with an invalid current mid-price, or with no non-zero future move inside the horizon, are skipped. The report includes labeled sample counts, up/down move counts, the hit rate of `sign(order_imbalance_top5)` on non-zero-signal rows, and the information coefficient (IC), computed as the Pearson correlation between the raw top-5 imbalance value and the future move sign. Zero-signal rows stay in the labeled sample and the IC calculation, but each one increments `skipped_zero_signal` and is excluded from the hit-rate denominator.
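
The labeling and scoring rules in the paragraph above can be sketched as follows. This is a simplified illustration, not the project's `summarize_prediction_horizons`: `Snapshot`, `HorizonSummary`, `first_move_sign`, and `summarize_horizon` are hypothetical names. The imbalance is a plain `double` here, and future rows without a valid mid-price are passed over during the search; both choices are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for PredictionSnapshot.
struct Snapshot {
    std::optional<double> mid_price;
    double imbalance_top5{0.0};
};

struct HorizonSummary {
    std::size_t labeled_rows{0};
    std::size_t skipped_no_valid_mid{0};
    std::size_t skipped_no_future_move{0};
    std::size_t skipped_zero_signal{0};
    std::size_t up_moves{0};
    std::size_t down_moves{0};
    std::size_t correct{0};
    std::size_t incorrect{0};
    double hit_rate{0.0};
    double information_coefficient{0.0};
};

// Sign of the first non-zero mid-price move in rows t+1 .. t+horizon, or 0 if none.
// Precondition: rows[t].mid_price holds a value.
int first_move_sign(const std::vector<Snapshot>& rows, std::size_t t, std::size_t horizon) {
    const double base = *rows[t].mid_price;
    const std::size_t last = std::min(rows.size() - 1, t + horizon);
    for (std::size_t k = t + 1; k <= last; ++k) {
        if (!rows[k].mid_price) continue;  // assumption: skip invalid future mids
        const double diff = *rows[k].mid_price - base;
        if (diff > 0.0) return 1;
        if (diff < 0.0) return -1;
    }
    return 0;
}

HorizonSummary summarize_horizon(const std::vector<Snapshot>& rows, std::size_t horizon) {
    assert(horizon > 0);
    HorizonSummary s;
    double sum_x = 0.0, sum_y = 0.0, sum_xx = 0.0, sum_yy = 0.0, sum_xy = 0.0;
    for (std::size_t t = 0; t < rows.size(); ++t) {
        if (!rows[t].mid_price) { ++s.skipped_no_valid_mid; continue; }
        const int label = first_move_sign(rows, t, horizon);
        if (label == 0) { ++s.skipped_no_future_move; continue; }
        ++s.labeled_rows;
        if (label > 0) { ++s.up_moves; } else { ++s.down_moves; }
        // Every labeled row feeds the IC sums, including zero-signal rows.
        const double x = rows[t].imbalance_top5;
        const double y = static_cast<double>(label);
        sum_x += x; sum_y += y; sum_xx += x * x; sum_yy += y * y; sum_xy += x * y;
        // Zero-signal rows are counted but left out of the hit-rate denominator.
        if (x == 0.0) { ++s.skipped_zero_signal; continue; }
        const int predicted = x > 0.0 ? 1 : -1;
        if (predicted == label) { ++s.correct; } else { ++s.incorrect; }
    }
    const std::size_t scored = s.correct + s.incorrect;
    if (scored > 0) {
        s.hit_rate = static_cast<double>(s.correct) / static_cast<double>(scored);
    }
    const double n = static_cast<double>(s.labeled_rows);
    const double var_x = n * sum_xx - sum_x * sum_x;
    const double var_y = n * sum_yy - sum_y * sum_y;
    if (s.labeled_rows > 1 && var_x > 0.0 && var_y > 0.0) {
        // Pearson correlation between raw imbalance and move sign.
        s.information_coefficient = (n * sum_xy - sum_x * sum_y) / std::sqrt(var_x * var_y);
    }
    return s;
}
```

In this sketch the hit rate and IC default to `0.0` when they are undefined (no scored rows, or zero variance in either series). Whether the project reports such cases the same way is not stated in the README.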

## Backends

Two backends are implemented behind the same `OrderBook` interface:
241 changes: 241 additions & 0 deletions include/lob/analytics.hpp
@@ -1,21 +1,221 @@
#pragma once

#include <cstddef>
#include <initializer_list>
#include <limits>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

#include "lob/order_book.hpp"
#include "lob/types.hpp"

namespace lob {

struct OptionalStringSetting {
OptionalStringSetting() = default;
OptionalStringSetting(std::nullopt_t) noexcept {}
OptionalStringSetting(const std::optional<std::string>& text)
: value_(text) {}
OptionalStringSetting(std::optional<std::string>&& text) noexcept
: value_(std::move(text)) {}
OptionalStringSetting(const std::string& text)
: value_(text) {}
OptionalStringSetting(std::string&& text) noexcept
: value_(std::move(text)) {}
OptionalStringSetting(const char* text)
: value_(text == nullptr ? std::optional<std::string>{} : std::optional<std::string>{text}) {}

OptionalStringSetting& operator=(std::nullopt_t) noexcept {
value_.reset();
return *this;
}

OptionalStringSetting& operator=(const std::optional<std::string>& text) {
value_ = text;
return *this;
}

OptionalStringSetting& operator=(std::optional<std::string>&& text) noexcept {
value_ = std::move(text);
return *this;
}

OptionalStringSetting& operator=(const std::string& text) {
value_ = text;
return *this;
}

OptionalStringSetting& operator=(std::string&& text) noexcept {
value_ = std::move(text);
return *this;
}

OptionalStringSetting& operator=(const char* text) {
value_ = text == nullptr ? std::optional<std::string>{} : std::optional<std::string>{text};
return *this;
}

bool has_value() const noexcept {
return value_.has_value();
}

bool empty() const noexcept {
return !value_.has_value() || value_->empty();
}

void reset() noexcept {
value_.reset();
}

const std::string& value() const {
return value_.value();
}

std::string value_or(std::string default_value) const {
return value_.value_or(std::move(default_value));
}

const std::string& operator*() const {
return value();
}

std::string& operator*() {
return value_.value();
}

const std::string* operator->() const {
return &value();
}

std::string* operator->() {
return &value_.value();
}

explicit operator bool() const noexcept {
return value_.has_value();
}

friend bool operator==(const OptionalStringSetting& lhs, std::nullopt_t) noexcept {
return !lhs.value_.has_value();
}

friend bool operator==(std::nullopt_t, const OptionalStringSetting& rhs) noexcept {
return rhs == std::nullopt;
}

friend bool operator!=(const OptionalStringSetting& lhs, std::nullopt_t) noexcept {
return !(lhs == std::nullopt);
}

friend bool operator!=(std::nullopt_t, const OptionalStringSetting& rhs) noexcept {
return !(rhs == std::nullopt);
}

friend bool operator==(const OptionalStringSetting& lhs, const std::string& rhs) {
return lhs.value_ == rhs;
}

friend bool operator==(const std::string& lhs, const OptionalStringSetting& rhs) {
return rhs == lhs;
}

friend bool operator!=(const OptionalStringSetting& lhs, const std::string& rhs) {
return !(lhs == rhs);
}

friend bool operator!=(const std::string& lhs, const OptionalStringSetting& rhs) {
return !(rhs == lhs);
}

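friend bool operator==(const OptionalStringSetting& lhs, const char* rhs) {
// A null pointer means "unset", consistent with the const char* constructor and assignment.
return rhs == nullptr ? !lhs.value_.has_value() : lhs.value_ == std::string(rhs);
}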

friend bool operator==(const char* lhs, const OptionalStringSetting& rhs) {
return rhs == lhs;
}

friend bool operator!=(const OptionalStringSetting& lhs, const char* rhs) {
return !(lhs == rhs);
}

friend bool operator!=(const char* lhs, const OptionalStringSetting& rhs) {
return !(rhs == lhs);
}

friend bool operator==(const OptionalStringSetting& lhs, const std::optional<std::string>& rhs) {
return lhs.value_ == rhs;
}

friend bool operator==(const std::optional<std::string>& lhs, const OptionalStringSetting& rhs) {
return rhs == lhs;
}

friend bool operator!=(const OptionalStringSetting& lhs, const std::optional<std::string>& rhs) {
return !(lhs == rhs);
}

friend bool operator!=(const std::optional<std::string>& lhs, const OptionalStringSetting& rhs) {
return !(rhs == lhs);
}

private:
std::optional<std::string> value_{};
};

struct AnalyticsConfig {
std::size_t trade_window_messages{1000};
double realized_vol_window_seconds{300.0};
std::size_t depth_levels{10};
std::size_t expected_messages{0};
std::vector<std::size_t> prediction_horizons{};
OptionalStringSetting prediction_report_out{};
std::vector<int> prediction_horizons_messages{};

bool prediction_report_output_enabled() const noexcept {
return prediction_report_out.has_value() && !prediction_report_out.empty();
}

std::vector<std::size_t> resolved_prediction_horizons() const {
const bool use_message_horizons = !prediction_horizons_messages.empty();
std::vector<std::size_t> horizons;

if (use_message_horizons) {
horizons.reserve(prediction_horizons_messages.size());
for (const int horizon : prediction_horizons_messages) {
if (horizon > 0) {
horizons.push_back(static_cast<std::size_t>(horizon));
}
}
return horizons;
}

horizons.reserve(prediction_horizons.size());
for (const std::size_t horizon : prediction_horizons) {
if (horizon > 0 &&
horizon <= static_cast<std::size_t>(std::numeric_limits<int>::max())) {
horizons.push_back(horizon);
}
}
return horizons;
}

std::vector<int> resolved_prediction_horizons_messages() const {
const std::vector<std::size_t> resolved_horizons = resolved_prediction_horizons();
std::vector<int> horizons;
horizons.reserve(resolved_horizons.size());
for (const std::size_t horizon : resolved_horizons) {
horizons.push_back(static_cast<int>(horizon));
}
return horizons;
}

bool prediction_reporting_enabled() const {
return prediction_report_output_enabled() && !resolved_prediction_horizons().empty();
}
};

struct AnalyticsRow {
@@ -36,6 +236,29 @@ struct AnalyticsRow {
std::optional<double> rolling_realized_vol;
};

struct PredictionSnapshot {
std::size_t message_index{0};
std::optional<double> mid_price;
std::optional<double> order_imbalance_top5;
};

struct PredictionSummaryRow {
std::size_t horizon_messages{0};
std::size_t total_rows_seen{0};
std::size_t eligible_rows_with_valid_mid{0};
std::size_t labeled_rows{0};
std::size_t skipped_no_valid_mid{0};
std::size_t skipped_no_future_move_within_horizon{0};
std::size_t skipped_zero_signal{0};
std::size_t up_moves{0};
std::size_t down_moves{0};
std::size_t correct_predictions{0};
std::size_t incorrect_predictions{0};
double hit_rate{0.0};
double information_coefficient{0.0};
double coverage_vs_total{0.0};
};

class AnalyticsEngine {
public:
explicit AnalyticsEngine(AnalyticsConfig config = {});
@@ -64,4 +287,22 @@ std::vector<AnalyticsRow> replay_with_analytics(

void write_analytics_csv(const std::vector<AnalyticsRow>& rows, const std::string& output_path);

std::vector<PredictionSnapshot> collect_prediction_snapshots(const std::vector<AnalyticsRow>& rows);

std::vector<PredictionSummaryRow> summarize_prediction_horizons(
const std::vector<PredictionSnapshot>& snapshots,
const std::vector<std::size_t>& horizons);

std::vector<PredictionSummaryRow> summarize_prediction_horizons(
const std::vector<PredictionSnapshot>& snapshots,
const std::vector<int>& horizons);

std::vector<PredictionSummaryRow> summarize_prediction_horizons(
const std::vector<PredictionSnapshot>& snapshots,
std::initializer_list<int> horizons);

void write_prediction_report_csv(
const std::vector<PredictionSummaryRow>& rows,
const std::string& output_path);

} // namespace lob
21 changes: 21 additions & 0 deletions report/benchmark_report.md
@@ -43,6 +43,27 @@ ctest --test-dir "$build_dir" --output-on-failure -C Release
python -m pytest tests -q --tb=short
```

## Prediction Reporting Feature Gate

The new prediction labeling/reporting path is outside the replay-only benchmark timer and remains optional. The core `lob_benchmark` command is still the replay hot-path check:

```bash
taskset -c 0 "$build_dir/lob_benchmark" --dataset data/AAPL_sample_messages.csv --backend both --reserve on --depth 5 --repeat 100000
```

To exercise the same dataset through the normal replay CLI with prediction reporting disabled versus enabled:

```bash
"$build_dir/lob_engine" data/AAPL_sample_messages.csv --backend map --analytics-out "$build_dir/analytics.csv"
"$build_dir/lob_engine" data/AAPL_sample_messages.csv --backend map --analytics-out "$build_dir/analytics.csv" --prediction-report-out "$build_dir/prediction_report.csv" --prediction-horizons 100
```

Expected behavior:

- without prediction flags, the CLI emits the existing analytics CSV only
- with prediction flags, the analytics CSV stays unchanged and a separate prediction report CSV is added
- any extra work is feature-gated to the prediction-enabled CLI path; the replay-only benchmark command above remains valid and unchanged
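
One way to check the second expectation is to compare the analytics CSV from a run without prediction flags against the one from a run with them, byte for byte. The sketch below assumes the two runs write to different `--analytics-out` paths (the commands above reuse one path, so it would need changing). `streams_identical` and `files_identical` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <iterator>
#include <stdexcept>
#include <string>

// Returns true when both streams yield exactly the same bytes.
bool streams_identical(std::istream& a, std::istream& b) {
    assert(&a != &b && "expected two distinct streams");
    std::istreambuf_iterator<char> it_a(a);
    std::istreambuf_iterator<char> it_b(b);
    const std::istreambuf_iterator<char> end;
    while (it_a != end && it_b != end) {
        if (*it_a != *it_b) {
            return false;
        }
        ++it_a;
        ++it_b;
    }
    // Identical only if both streams ran out at the same time.
    return it_a == end && it_b == end;
}

// Opens both files in binary mode and compares their contents.
bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) {
        throw std::runtime_error("could not open a CSV file for comparison");
    }
    return streams_identical(a, b);
}
```

A byte-level comparison is stricter than a row-level one: it also catches differences in line endings or numeric formatting.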

## Measurement methodology

- baseline variant: clean `origin/main` tree at commit `d627b73`