Advanced Forensic Validation Sprint — Complete
Commit: 1c18b98
Tests: 561 passing (+49 new)
Delivered
Replay subsystem (forensics/replay.py)
- execute_replay(case_dir, source_run_id, engine_version) re-parses evidence, re-runs plugins, diffs findings/hypotheses, persists replay_{id}.json to exports/, and writes an audit entry.
- States: EXACT_MATCH / EXPECTED_DRIFT / UNEXPECTED_DRIFT / INCOMPATIBLE. No synthetic or fake replays; missing evidence returns INCOMPATIBLE with a reason.
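The state classification can be sketched roughly as follows. `ReplayState` and `classify_replay` are hypothetical names (the real logic lives inside `execute_replay`); the decision order mirrors the description above:

```python
from enum import Enum

class ReplayState(Enum):
    EXACT_MATCH = "exact_match"
    EXPECTED_DRIFT = "expected_drift"
    UNEXPECTED_DRIFT = "unexpected_drift"
    INCOMPATIBLE = "incompatible"

def classify_replay(original: set, replayed: set,
                    evidence_present: bool, engine_changed: bool) -> ReplayState:
    """Classify a replay outcome from diffed finding sets (illustrative)."""
    if not evidence_present:
        # No fabricated replays: without the original evidence we refuse to guess.
        return ReplayState.INCOMPATIBLE
    if original == replayed:
        return ReplayState.EXACT_MATCH
    # A diff is "expected" only when the engine version changed between runs.
    return ReplayState.EXPECTED_DRIFT if engine_changed else ReplayState.UNEXPECTED_DRIFT
```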
Run model + diff engine (forensics/models.py, forensics/diff.py)
- AnalysisRun extended with parser/plugin/tuning provenance (backward-compatible defaults; from_dict drops unknown keys).
- compare_runs(case_dir, run_a, run_b) returns a RunComparison covering findings, plugin executions, hypotheses, and tuning-profile changes.
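The findings portion of such a comparison reduces to set arithmetic over stable finding identifiers. A minimal sketch, assuming identifier-based comparison; `FindingDiff` and `diff_findings` are illustrative names, not the actual `forensics/diff.py` API:

```python
from dataclasses import dataclass, field

@dataclass
class FindingDiff:
    added: set = field(default_factory=set)      # present only in run B
    removed: set = field(default_factory=set)    # present only in run A
    unchanged: set = field(default_factory=set)  # present in both runs

def diff_findings(run_a: set, run_b: set) -> FindingDiff:
    """Set-based diff of finding identifiers between two runs (illustrative)."""
    return FindingDiff(
        added=run_b - run_a,
        removed=run_a - run_b,
        unchanged=run_a & run_b,
    )
```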
Tuning profile system (forensics/tuning.py)
- TuningProfile.default() provides a per-plugin AnalyzerConfigProfile + ThresholdSet for all 11 plugins; threshold values mirror the analyzer source constants.
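The shape of such a profile can be sketched as below. The class layout follows the names above, but the plugin keys and threshold numbers are placeholders, not the real analyzer constants:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThresholdSet:
    warn: float
    critical: float

@dataclass(frozen=True)
class AnalyzerConfigProfile:
    enabled: bool
    thresholds: ThresholdSet

def default_profile() -> dict:
    """Hypothetical default profile keyed by plugin name (placeholder values)."""
    return {
        "vibration": AnalyzerConfigProfile(True, ThresholdSet(warn=30.0, critical=60.0)),
        "battery": AnalyzerConfigProfile(True, ThresholdSet(warn=3.5, critical=3.3)),
    }
```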
Validation corpus + harness (validation/, tests/corpus/)
- Seeded 3 cases: CORPUS-normal-flight, CORPUS-crash, CORPUS-vibration-crash.
- run_validation() parses evidence, runs all registered plugins via the trust policy, and compares results against expected/should-not-find lists.
- compute_quality_report() produces per-analyzer TP/FP/FN plus precision/recall; numbers are derived from real runs only.
- scripts/validate_corpus.py: standalone CLI for CI.
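The per-analyzer metrics reduce to standard precision/recall over the TP/FP/FN counts. A hedged sketch (`quality_metrics` is an illustrative helper, not the harness API), guarding against empty denominators:

```python
def quality_metrics(tp: int, fp: int, fn: int) -> dict:
    """Per-analyzer precision/recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of findings raised, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of findings expected, how many were raised
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```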
API routes
- POST /api/cases/{id}/runs/{run_id}/replay
- GET /api/cases/{id}/runs/{run_id}/replay-verification
- POST /api/cases/{id}/compare-runs
- GET /api/cases/{id}/tuning-profile
- POST /api/validation/run + GET /api/validation/results
GUI updates (web/static/index.html, vanilla JS, >=12px)
- Run selector shows crit/warn counts, plugin count, tuning profile, REPLAY badge.
- Replay button + color-coded replay result panel.
- Compare Runs panel inside Exports.
- Validation tab with per-case PASS/FAIL and per-analyzer quality table.
- Tuning Profile badge on Plugins tab.
Tests added (49)
- tests/test_validation/test_replay.py
- tests/test_validation/test_diff_engine.py
- tests/test_validation/test_tuning.py
- tests/test_validation/test_corpus.py
- tests/test_web/test_replay_routes.py
Deferred intentionally
- Real hypothesis count wiring in the harness (stub set to 0).
- Full threshold-override plumbing from tuning_profile.json into plugin execution (the profile is persisted per run, but plugins still read source constants).
- Corpus growth beyond the 3 seed cases; the hook-up point is ready, the dataset is not.
Recommended next sprint
- Wire TuningProfile overrides into the plugin runtime (replace hard-coded constants with profile lookups).
- Grow the corpus to 20+ cases spanning all analyzer categories; enable regression gating in CI.
- Implement true hypothesis replay by persisting hypotheses alongside findings per run.
- Add /api/validation/history so the Validation tab can chart precision/recall over time.
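One way the first item could look: thresholds resolve through the active profile first and fall back to the source constant, so behavior is unchanged when no override exists. Names and values here are illustrative, not the project's actual constants:

```python
# Hypothetical fallback table standing in for the analyzers' hard-coded constants.
SOURCE_CONSTANTS = {"vibration.warn": 30.0, "vibration.critical": 60.0}

def resolve_threshold(profile_overrides: dict, key: str) -> float:
    """Prefer the active tuning profile's value; fall back to the source constant."""
    return profile_overrides.get(key, SOURCE_CONSTANTS[key])
```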