Summary
Review finding supersession almost always reports "superseded 0" because different review modes write to different task_group values for the same spec. Supersession matches on (spec_name, task_group), so findings from pre-review, audit-review, and verification accumulate in parallel silos that never invalidate each other. Later coding sessions receive the union of all silos, including stale, already-resolved findings.
Root Cause
_supersede_active_records in review_store.py:127 filters by exact (spec_name, task_group) match. But different review modes use structurally different task_group values for the same spec:
| Source | Node ID pattern | task_group | Table |
| --- | --- | --- | --- |
| Pre-review | spec:0:reviewer:pre-review | "0" | review_findings |
| Audit-review | spec:{N}:reviewer:audit-review | "" (hardcoded) | review_findings |
| Verifier | spec:{M}:verifier | "{M}" | verification_results |
These three silos never intersect, so supersession never fires across review types.
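The partitioning can be reproduced with a minimal in-memory sketch. The `Finding` record and the `supersede_active` and `insert_review` helpers are hypothetical stand-ins, not the actual review_store.py code; they only mimic the exact (spec_name, task_group) match described above, and they fold both tables into one list for brevity.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for a stored review record."""
    spec_name: str
    task_group: str
    session_id: str
    superseded: bool = False


def supersede_active(store: list, spec_name: str, task_group: str) -> int:
    """Mark active records with the exact same (spec_name, task_group) as superseded."""
    count = 0
    for rec in store:
        if (not rec.superseded
                and rec.spec_name == spec_name
                and rec.task_group == task_group):
            rec.superseded = True
            count += 1
    return count


def insert_review(store: list, spec_name: str, task_group: str, session_id: str) -> int:
    """Supersede matching records, append the new one, return the superseded count."""
    superseded = supersede_active(store, spec_name, task_group)
    store.append(Finding(spec_name, task_group, session_id))
    return superseded


store = []
counts = [
    insert_review(store, "spec_x", "0", "pre-review"),    # pre-review silo
    insert_review(store, "spec_x", "", "audit-review"),   # audit silo (hardcoded "")
    insert_review(store, "spec_x", "4", "verifier"),      # verifier silo
]
active = [rec for rec in store if not rec.superseded]
print(counts, len(active))
```

Each insert supersedes nothing because no two modes share a task_group, so all three records stay active for the same spec.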
Why "superseded 0" is nearly universal
- Each (spec, task_group) pair typically has only one review session per run, so there is nothing to supersede
- Cross-run supersession only fires when the exact same (spec, task_group) is re-reviewed in a later run
- Cross-mode supersession is structurally impossible due to task_group partitioning
The audit-review task_group="" hardcode
auditor_output.py:175 hardcodes task_group="", discarding the real group number from the audit-review node_id. Even if supersession were broadened, audit findings exist in a phantom silo that matches nothing:
    finding = ReviewFinding(
        ...
        task_group="",  # <-- always empty string
        session_id=f"{spec_name}:audit:{attempt}",
        ...
    )
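One way to recover the group number is to parse it out of the node ID. The helper below is a hypothetical sketch, not existing agent_fox code; it assumes the audit-review node ID has exactly the shape `spec:{N}:reviewer:audit-review` shown in the table above, and it splits from the right so that a spec name containing a colon would still parse.

```python
def task_group_from_node_id(node_id: str) -> str:
    """Return the group number N from 'spec:{N}:reviewer:audit-review'.

    Hypothetical helper; the node ID layout is taken from the issue text.
    """
    parts = node_id.rsplit(":", 3)
    if len(parts) != 4 or parts[2] != "reviewer":
        raise ValueError(f"unexpected reviewer node ID: {node_id!r}")
    group = parts[1]
    if not group.isdigit():
        raise ValueError(f"non-numeric task group in node ID: {node_id!r}")
    return group


print(task_group_from_node_id("110_hunt_dedup_and_ignore:3:reviewer:audit-review"))
```

The result could then be passed as task_group when constructing the finding, in place of the empty string.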
The retrieval amplification problem
fox_provider._query_reviews() (fox_provider.py:122) calls query_active_findings(conn, spec_name) without a task_group filter — returning all active findings across all silos. A coder session for spec X receives the union of every pre-review, audit, and coverage_regression finding ever inserted, regardless of whether the issues were already addressed.
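The difference between unscoped and scoped retrieval can be sketched as follows. `query_active` here is a hypothetical in-memory function, not the real `query_active_findings`; the `task_groups` parameter is an assumed extension, and which groups count as relevant for a given coder session is a design question this issue leaves open.

```python
from typing import Iterable, Optional


def query_active(findings: list, spec_name: str,
                 task_groups: Optional[Iterable[str]] = None) -> list:
    """Return active findings for a spec, optionally restricted to some task groups."""
    wanted = None if task_groups is None else set(task_groups)
    return [
        f for f in findings
        if f["spec_name"] == spec_name
        and f["active"]
        and (wanted is None or f["task_group"] in wanted)
    ]


findings = [
    {"spec_name": "spec_x", "task_group": "0", "active": True, "source": "pre-review"},
    {"spec_name": "spec_x", "task_group": "", "active": True, "source": "audit-review"},
    {"spec_name": "spec_y", "task_group": "0", "active": True, "source": "pre-review"},
]

unscoped = query_active(findings, "spec_x")        # current behaviour: every silo
scoped = query_active(findings, "spec_x", {"0"})   # only the chosen group(s)
print(len(unscoped), len(scoped))
```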
Evidence
DuckDB analysis from hack/.agent-fox/knowledge.duckdb:
- 12 (spec, task_group) combinations in review_findings, all using task_group="0" (pre-review only in this DB)
- Only 1 spec had any superseded records: 110_hunt_dedup_and_ignore (19 superseded, 7 active), the only spec reviewed in two separate runs
- Every other spec: 0 superseded records
- Pre-review findings always at task_group="0", verification verdicts at "4", "5", "6", "8": 100% mismatch across all 7 specs that have both
Affected Code
| Location | Code | Issue |
| --- | --- | --- |
| agent_fox/knowledge/review_store.py:127 | _supersede_active_records | Supersedes by exact (spec_name, task_group): correct logic but wrong granularity |
| agent_fox/session/auditor_output.py:175 | task_group="" | Hardcoded empty string, creating phantom silo |
| agent_fox/knowledge/fox_provider.py:122 | query_active_findings(conn, spec_name) | Retrieves all task_groups, returning stale + current findings |
| agent_fox/graph/injection.py:429 | Pre-review always at group 0 | Structural: pre-review is always spec:0 |
| agent_fox/graph/injection.py:331 | Audit-review at group N | Structural: audit-review uses test group number |
Fix Direction
Two complementary fixes:
- Fix audit-review task_group: auditor_output.py:175 should use the real group number from the node_id instead of hardcoding "".
- Scope supersession to spec-level: Either supersede all active findings for a spec regardless of task_group when any new review runs, or have fox_provider filter by the relevant task_group when retrieving findings for a specific coder session.
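The spec-level option could look roughly like the sketch below. It is an illustration under assumptions rather than a proposed patch: `supersede_spec` and the dict-based records are hypothetical, and a real implementation would operate on the DuckDB tables instead of a Python list.

```python
def supersede_spec(store: list, spec_name: str) -> int:
    """Mark every active record for spec_name as superseded, ignoring task_group."""
    count = 0
    for rec in store:
        if rec["spec_name"] == spec_name and rec["active"]:
            rec["active"] = False
            count += 1
    return count


store = [
    {"spec_name": "spec_x", "task_group": "0", "active": True},   # pre-review
    {"spec_name": "spec_x", "task_group": "", "active": True},    # audit-review
    {"spec_name": "spec_x", "task_group": "4", "active": True},   # later group
    {"spec_name": "spec_y", "task_group": "0", "active": True},   # unrelated spec
]

superseded = supersede_spec(store, "spec_x")
still_active = [rec["spec_name"] for rec in store if rec["active"]]
print(superseded, still_active)
```

The trade-off is that a new review in one mode would also retire findings from other modes that it may not have re-checked, which is why filtering at retrieval time is listed as the alternative.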