
Advanced Forensic Validation Sprint complete #31

@USTungsten


Advanced Forensic Validation Sprint — Complete

Commit: 1c18b98
Tests: 561 passing (+49 new)

Delivered

Replay subsystem (forensics/replay.py)

  • execute_replay(case_dir, source_run_id, engine_version) re-parses the evidence, re-runs the plugins, diffs findings/hypotheses against the source run, persists replay_{id}.json to exports/, and writes an audit entry.
  • States: EXACT_MATCH / EXPECTED_DRIFT / UNEXPECTED_DRIFT / INCOMPATIBLE. No synthetic or fake replays: missing evidence returns INCOMPATIBLE with a reason.
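A minimal sketch of the state classification and the no-synthetic-replay rule described above. The function name and state names come from the sprint notes; the result dataclass, internals, and the `evidence/` subdirectory layout are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

class ReplayState(Enum):
    EXACT_MATCH = "EXACT_MATCH"
    EXPECTED_DRIFT = "EXPECTED_DRIFT"
    UNEXPECTED_DRIFT = "UNEXPECTED_DRIFT"
    INCOMPATIBLE = "INCOMPATIBLE"

@dataclass
class ReplayResult:  # hypothetical result shape
    state: ReplayState
    reason: str = ""

def execute_replay(case_dir, source_run_id, engine_version):
    """Sketch: re-run analysis for a prior run and classify the drift."""
    evidence = Path(case_dir) / "evidence"  # assumed layout
    # No synthetic replays: missing evidence yields INCOMPATIBLE with a reason.
    if not evidence.exists():
        return ReplayResult(
            ReplayState.INCOMPATIBLE,
            reason=f"evidence missing for source run {source_run_id}",
        )
    # ... re-parse evidence, re-run plugins, diff findings/hypotheses,
    # persist replay_{id}.json to exports/, write audit entry (omitted) ...
    return ReplayResult(ReplayState.EXACT_MATCH)
```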

Run model + diff engine (forensics/models.py, forensics/diff.py)

  • AnalysisRun extended with parser/plugin/tuning provenance (backward-compat defaults, from_dict drops unknown keys).
  • compare_runs(case_dir, run_a, run_b) returns RunComparison covering findings, plugin executions, hypotheses, and tuning-profile changes.
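The backward-compat behavior called out above (defaulted provenance fields, `from_dict` dropping unknown keys) can be sketched like this; the specific field names are illustrative, not the exact model:

```python
from dataclasses import dataclass, field, fields

@dataclass
class AnalysisRun:
    run_id: str
    # Provenance fields added this sprint, with backward-compatible
    # defaults so pre-existing run files still load without them.
    parser_version: str = "unknown"
    plugin_versions: dict = field(default_factory=dict)
    tuning_profile_id: str = "default"

    @classmethod
    def from_dict(cls, data: dict) -> "AnalysisRun":
        # Drop unknown keys so runs written by a newer engine still
        # deserialize on older code instead of raising TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```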

Tuning profile system (forensics/tuning.py)

  • TuningProfile.default() with per-plugin AnalyzerConfigProfile + ThresholdSet for all 11 plugins; threshold values mirror analyzer source constants.
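A rough shape for the profile hierarchy named above (TuningProfile, AnalyzerConfigProfile, ThresholdSet). The plugin names and empty threshold values are placeholders; the real default covers all 11 plugins and mirrors the analyzer source constants.

```python
from dataclasses import dataclass, field

@dataclass
class ThresholdSet:
    # Real values mirror the analyzer source constants.
    values: dict = field(default_factory=dict)

@dataclass
class AnalyzerConfigProfile:
    plugin_name: str
    enabled: bool = True
    thresholds: ThresholdSet = field(default_factory=ThresholdSet)

@dataclass
class TuningProfile:
    profiles: dict = field(default_factory=dict)

    @classmethod
    def default(cls) -> "TuningProfile":
        # Illustrative plugin names only; the actual list has 11 entries.
        plugin_names = ["vibration", "altitude"]
        return cls(profiles={
            name: AnalyzerConfigProfile(plugin_name=name)
            for name in plugin_names
        })
```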

Validation corpus + harness (validation/, tests/corpus/)

  • Seeded 3 cases: CORPUS-normal-flight, CORPUS-crash, CORPUS-vibration-crash.
  • run_validation() parses evidence, runs all registered plugins via trust policy, compares to expected/should-not-find lists.
  • compute_quality_report() produces per-analyzer TP/FP/FN plus precision/recall; numbers are derived from real runs only.
  • scripts/validate_corpus.py: a standalone CLI for CI.
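The per-analyzer quality math behind compute_quality_report() reduces to standard precision/recall over TP/FP/FN counts. A sketch, assuming a simple input shape (analyzer name mapped to counts from real validation runs):

```python
def compute_quality_report(counts_by_analyzer: dict) -> dict:
    """Per-analyzer TP/FP/FN plus precision and recall.

    `counts_by_analyzer` maps analyzer name -> {"tp": int, "fp": int,
    "fn": int}; the input shape is an assumption for illustration.
    """
    report = {}
    for analyzer, c in counts_by_analyzer.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        # Guard against division by zero when an analyzer fired nothing.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        report[analyzer] = {"tp": tp, "fp": fp, "fn": fn,
                            "precision": precision, "recall": recall}
    return report
```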

API routes

  • POST /api/cases/{id}/runs/{run_id}/replay
  • GET /api/cases/{id}/runs/{run_id}/replay-verification
  • POST /api/cases/{id}/compare-runs
  • GET /api/cases/{id}/tuning-profile
  • POST /api/validation/run + GET /api/validation/results
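For scripting against these routes, a small stdlib-only client sketch; the paths are taken verbatim from the list above, while the base URL and the helper names are assumptions:

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # assumed dev-server address

def replay_url(base: str, case_id: str, run_id: str) -> str:
    # Route path copied from the API list above.
    return f"{base}/api/cases/{case_id}/runs/{run_id}/replay"

def trigger_replay(case_id: str, run_id: str) -> dict:
    """POST to the replay endpoint and return the decoded JSON body."""
    req = urllib.request.Request(replay_url(BASE, case_id, run_id),
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```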

GUI updates (web/static/index.html, vanilla JS, >=12px)

  • Run selector shows crit/warn counts, plugin count, tuning profile, REPLAY badge.
  • Replay button + color-coded replay result panel.
  • Compare Runs panel inside Exports.
  • Validation tab with per-case PASS/FAIL and per-analyzer quality table.
  • Tuning Profile badge on Plugins tab.

Tests added (49)

  • tests/test_validation/test_replay.py
  • tests/test_validation/test_diff_engine.py
  • tests/test_validation/test_tuning.py
  • tests/test_validation/test_corpus.py
  • tests/test_web/test_replay_routes.py

Deferred intentionally

  • Real hypothesis count wiring in harness (stub set to 0).
  • Full threshold override plumbing from tuning_profile.json into plugin execution (profile is persisted per run but plugins still read source constants).
  • Corpus growth beyond the 3 seed cases: the hook-up point is ready; the dataset is not.

Recommended next sprint

  1. Wire TuningProfile overrides into plugin runtime (replace hard-coded constants with profile lookups).
  2. Grow corpus to 20+ cases spanning all analyzer categories; enable regression gating in CI.
  3. Implement true hypothesis replay by persisting hypotheses alongside findings per run.
  4. Add /api/validation/history so the Validation tab can chart precision/recall over time.
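Item 1 above (replacing hard-coded constants with profile lookups) could take roughly this shape; the constant names and profile dict layout are hypothetical, and plugins today still read the module-level constants directly:

```python
# Hard-coded analyzer constants as they exist today (names hypothetical).
DEFAULT_THRESHOLDS = {
    "vibration_g_warn": 2.0,
    "vibration_g_crit": 4.0,
}

def get_threshold(profile: dict, plugin: str, key: str) -> float:
    """Proposed lookup: consult the active tuning profile first, then
    fall back to the source constant the plugin uses today."""
    overrides = (profile or {}).get(plugin, {})
    return overrides.get(key, DEFAULT_THRESHOLDS[key])
```

With this in place, persisting tuning_profile.json per run (already done) and loading it at plugin execution time closes the gap noted under "Deferred intentionally".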
