Skip to content

Add predictive signal report with IC#13

Merged
Aroesler1 merged 1 commit into
mainfrom
codex/repo_feature_build-205-20260406-010810
Apr 6, 2026
Merged

Add predictive signal report with IC#13
Aroesler1 merged 1 commit into
mainfrom
codex/repo_feature_build-205-20260406-010810

Conversation

@Aroesler1
Copy link
Copy Markdown
Owner

Implements the predictive-signal milestone by adding top-5 order-imbalance prediction reporting with configurable horizons, hit rate, and information coefficient, while preserving the existing analytics CSV path.

Validation run locally:

  • cmake -S . -B build_verify -DCMAKE_BUILD_TYPE=Release
  • cmake --build build_verify -j2
  • ctest --test-dir build_verify --output-on-failure -C Release
  • pytest tests/test_parser.py -q

Copilot AI review requested due to automatic review settings April 6, 2026 01:45
@Aroesler1 Aroesler1 merged commit f424e06 into main Apr 6, 2026
6 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional post-replay “prediction report” pipeline that evaluates the directional usefulness of the existing top-5 order-imbalance signal across configurable message horizons, while keeping the existing analytics CSV export unchanged.

Changes:

  • Extend analytics/public API with prediction snapshot + summary types, horizon resolution, and CSV writer (including hit rate + information coefficient).
  • Add CLI flags (--prediction-report-out, --prediction-horizons) to generate per-backend prediction report CSVs alongside (unchanged) analytics CSVs.
  • Expand C++ and Python integration tests to validate report outputs, config behavior, and CLI validation errors; update docs/benchmark notes to describe the feature gate.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
include/lob/analytics.hpp Adds prediction reporting config fields + new public structs/APIs for prediction summaries and report writing.
src/analytics.cpp Implements labeling, hit-rate + running Pearson IC calculation, snapshot collection, and prediction report CSV output.
src/main.cpp Adds CLI parsing/validation for prediction flags and emits prediction report output per backend.
tests/test_analytics.cpp Adds unit tests for prediction config resolution, labeling rules, IC behavior, and CSV writer output.
tests/test_parser.py Extends integration test to verify prediction report files exist and analytics CSV output remains unchanged.
README.md Documents prediction reporting semantics, flags, and label/metric definitions.
report/benchmark_report.md Notes prediction reporting is feature-gated and outside the replay-only benchmark timing path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main.cpp
Comment on lines 1 to 10
#include <array>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <limits>
#include <optional>
#include <string>
#include <vector>
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::max is used in this file (e.g., when setting analytics_config.depth_levels), but <algorithm> is not included. Relying on transitive includes is non-portable; add #include <algorithm> to ensure this compiles consistently across standard library implementations.

Copilot uses AI. Check for mistakes.
Comment thread src/analytics.cpp
Comment on lines +171 to +199
std::vector<PredictionSummaryRow> summarize_prediction_horizons_impl(
const std::vector<PredictionSnapshot>& snapshots,
const std::vector<std::size_t>& horizons) {
std::vector<PredictionSummaryRow> summaries;
summaries.reserve(horizons.size());

for (const std::size_t horizon : horizons) {
if (horizon == 0) {
throw std::invalid_argument("Prediction horizons must be positive");
}

PredictionSummaryRow summary;
RunningPearsonCorrelation information_coefficient;
summary.horizon_messages = horizon;
summary.total_rows_seen = snapshots.size();

for (std::size_t index = 0; index < snapshots.size(); ++index) {
const PredictionSnapshot& snapshot = snapshots[index];
if (!snapshot.mid_price.has_value()) {
++summary.skipped_no_valid_mid;
continue;
}

++summary.eligible_rows_with_valid_mid;
const std::optional<int> label = first_future_label(snapshots, index, horizon);
if (!label.has_value()) {
++summary.skipped_no_future_move_within_horizon;
continue;
}
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summarize_prediction_horizons_impl calls first_future_label(...) for every row, and first_future_label scans up to horizon steps ahead. This makes runtime O(N * horizon) per horizon (and O(N * sum(horizons)) overall), which can get expensive on large datasets/horizons. Consider precomputing the first non-zero future move direction once (or using a two-pointer / next-nonzero index cache) and reusing it across horizons to keep the prediction-report path scalable.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_analytics.cpp
assert(almost_equal(summary.coverage_vs_total, 0.0));
}

void test_prediction_zero_signal_is_skipped_instead_of_labeled() {
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name suggests zero-signal rows are skipped instead of labeled, but the assertions expect labeled_rows == 2 and skipped_zero_signal == 1 (i.e., the row is labeled but excluded from the hit-rate denominator). Renaming this test to reflect the actual behavior would make the intent clearer and avoid confusion with the report semantics.

Suggested change
void test_prediction_zero_signal_is_skipped_instead_of_labeled() {
void test_prediction_zero_signal_is_excluded_from_hit_rate_but_tracked_in_summary() {

Copilot uses AI. Check for mistakes.
@Aroesler1 Aroesler1 deleted the codex/repo_feature_build-205-20260406-010810 branch April 6, 2026 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants