Supersession never fires — task_group partitioning creates isolated silos per review mode #548

@mickume

Summary

Review finding supersession almost always reports "superseded 0" because different review modes write different task_group values for the same spec. Supersession matches on (spec_name, task_group), so findings from pre-review, audit-review, and verification accumulate in parallel silos that never invalidate each other. Later coding sessions receive the union of all silos, including stale, already-resolved findings.

Root Cause

_supersede_active_records in review_store.py:127 filters by exact (spec_name, task_group) match. But different review modes use structurally different task_group values for the same spec:

| Source | Node ID pattern | task_group | Table |
|---|---|---|---|
| Pre-review | spec:0:reviewer:pre-review | "0" | review_findings |
| Audit-review | spec:{N}:reviewer:audit-review | "" (hardcoded) | review_findings |
| Verifier | spec:{M}:verifier | "{M}" | verification_results |

These three silos never intersect, so supersession never fires across review types.
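
The silo behavior can be reproduced with a small in-memory model. This is only a sketch, not the code in review_store.py: the Finding class, the insert_with_exact_supersession helper, and the session labels are hypothetical, and the verifier is left out because it writes to a separate table.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    spec_name: str
    task_group: str
    session_id: str
    superseded: bool = False


def insert_with_exact_supersession(store, new):
    """Supersede active records with the same (spec_name, task_group), then insert."""
    superseded = 0
    for old in store:
        if (not old.superseded
                and old.spec_name == new.spec_name
                and old.task_group == new.task_group):
            old.superseded = True
            superseded += 1
    store.append(new)
    return superseded


store = []
counts = [
    # Pre-review writes task_group "0".
    insert_with_exact_supersession(store, Finding("spec_a", "0", "run1:pre-review")),
    # Audit-review writes task_group "" and so matches nothing.
    insert_with_exact_supersession(store, Finding("spec_a", "", "run1:audit")),
    # Only a later pre-review of the same spec hits the "0" silo.
    insert_with_exact_supersession(store, Finding("spec_a", "0", "run2:pre-review")),
]
active = [f.session_id for f in store if not f.superseded]
print(counts)  # [0, 0, 1]
print(active)  # ['run1:audit', 'run2:pre-review']
```

In this model the audit finding stays active indefinitely, because no later write ever uses the same task_group as a different review mode.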

Why "superseded 0" is nearly universal

  1. Each (spec, task_group) pair typically has only one review session per run — nothing to supersede
  2. Cross-run supersession only fires when the exact same (spec, task_group) is re-reviewed in a later run
  3. Cross-mode supersession is structurally impossible due to task_group partitioning

The audit-review task_group="" hardcode

auditor_output.py:175 hardcodes task_group="", discarding the real group number from the audit-review node_id. Even if supersession were broadened, audit findings exist in a phantom silo that matches nothing:

finding = ReviewFinding(
    ...
    task_group="",   # <-- always empty string
    session_id=f"{spec_name}:audit:{attempt}",
    ...
)
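
One way to recover the real group is to parse it from the node_id. The helper below is a hypothetical sketch: the name audit_task_group is invented, and it assumes the node_id has exactly the shape spec:{N}:reviewer:audit-review shown in the table above, with the spec name in the first position.

```python
def audit_task_group(node_id: str) -> str:
    """Return the group number N from '<spec>:<N>:reviewer:audit-review'."""
    # Split from the right so the last three fields are always group, role, mode.
    parts = node_id.rsplit(":", 3)
    if len(parts) != 4 or parts[2:] != ["reviewer", "audit-review"]:
        raise ValueError(f"not an audit-review node_id: {node_id!r}")
    group = parts[1]
    if not group.isdigit():
        raise ValueError(f"non-numeric task group in node_id: {node_id!r}")
    return group


print(audit_task_group("110_hunt_dedup_and_ignore:3:reviewer:audit-review"))  # 3
```

Raising on an unexpected shape, rather than falling back to "", avoids silently recreating the phantom silo.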

The retrieval amplification problem

fox_provider._query_reviews() (fox_provider.py:122) calls query_active_findings(conn, spec_name) without a task_group filter — returning all active findings across all silos. A coder session for spec X receives the union of every pre-review, audit, and coverage_regression finding ever inserted, regardless of whether the issues were already addressed.
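
The difference between an unfiltered and a filtered lookup can be sketched as follows. Everything here is an assumption for illustration: sqlite3 stands in for DuckDB so the example runs with the standard library only, the column names (description, superseded) are guesses, and query_active_findings_sketch is not the real query_active_findings.

```python
import sqlite3


def query_active_findings_sketch(conn, spec_name, task_group=None):
    """Return active findings for a spec, optionally restricted to one task_group."""
    sql = ("SELECT task_group, description FROM review_findings "
           "WHERE spec_name = ? AND superseded = 0")
    params = [spec_name]
    if task_group is not None:
        sql += " AND task_group = ?"
        params.append(task_group)
    sql += " ORDER BY task_group, description"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_findings "
             "(spec_name TEXT, task_group TEXT, description TEXT, superseded INTEGER)")
conn.executemany("INSERT INTO review_findings VALUES (?, ?, ?, ?)", [
    ("spec_x", "0", "pre-review: missing edge case", 0),
    ("spec_x", "", "audit: weak assertion", 0),
    ("spec_x", "0", "pre-review: old finding", 1),
    ("spec_y", "0", "other spec", 0),
])

unfiltered = query_active_findings_sketch(conn, "spec_x")
filtered = query_active_findings_sketch(conn, "spec_x", task_group="0")
print(len(unfiltered))  # 2
print(filtered)         # [('0', 'pre-review: missing edge case')]
```

Without the filter, the coder session sees both silos for spec_x; with it, only the group it is actually working on.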

Evidence

DuckDB analysis from hack/.agent-fox/knowledge.duckdb:

  • 12 (spec, task_group) combinations in review_findings — all use task_group="0" (pre-review only in this DB)
  • Only 1 spec had any superseded records: 110_hunt_dedup_and_ignore (19 superseded, 7 active) — the only spec reviewed in two separate runs
  • Every other spec: 0 superseded records
  • Pre-review findings always at task_group="0", verification verdicts at "4", "5", "6", "8": 100% mismatch across all 7 specs that have both
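
Counts like these can be produced with a per-silo aggregate. The sketch below is not the query that was actually run: it uses sqlite3 in place of DuckDB, assumes a hypothetical integer superseded column, and seeds synthetic rows. Only the 19/7 split for 110_hunt_dedup_and_ignore mirrors the numbers above; hypothetical_spec is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_findings "
             "(spec_name TEXT, task_group TEXT, superseded INTEGER)")

# Synthetic rows: 19 superseded + 7 active for the one re-reviewed spec,
# plus a made-up spec that was reviewed only once.
rows = ([("110_hunt_dedup_and_ignore", "0", 1)] * 19
        + [("110_hunt_dedup_and_ignore", "0", 0)] * 7
        + [("hypothetical_spec", "0", 0)] * 3)
conn.executemany("INSERT INTO review_findings VALUES (?, ?, ?)", rows)

summary = conn.execute(
    "SELECT spec_name, task_group, "
    "       SUM(superseded) AS superseded_count, "
    "       SUM(1 - superseded) AS active_count "
    "FROM review_findings "
    "GROUP BY spec_name, task_group "
    "ORDER BY spec_name, task_group"
).fetchall()

for row in summary:
    print(row)
```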

Affected Code

| File | Line | Issue |
|---|---|---|
| agent_fox/knowledge/review_store.py:127 | _supersede_active_records | Supersedes by exact (spec_name, task_group): correct logic but wrong granularity |
| agent_fox/session/auditor_output.py:175 | task_group="" | Hardcoded empty string, creating phantom silo |
| agent_fox/knowledge/fox_provider.py:122 | query_active_findings(conn, spec_name) | Retrieves all task_groups, returning stale + current findings |
| agent_fox/graph/injection.py:429 | Pre-review always at group 0 | Structural: pre-review is always spec:0 |
| agent_fox/graph/injection.py:331 | Audit-review at group N | Structural: audit-review uses test group number |

Fix Direction

Two complementary fixes:

  1. Fix audit-review task_group: auditor_output.py:175 should use the real group number from the node_id instead of hardcoding "".

  2. Scope supersession to spec-level: Either supersede all active findings for a spec regardless of task_group when any new review runs, or have fox_provider filter by the relevant task_group when retrieving findings for a specific coder session.
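
The first option in fix 2 (spec-level supersession) could look roughly like this. It is a sketch over plain dicts rather than the real store: supersede_spec_level and the superseded_by field are hypothetical names, and the caller is assumed to run it before inserting the new review's findings.

```python
def supersede_spec_level(findings, spec_name, new_session_id):
    """Mark every active finding for spec_name as superseded, ignoring task_group."""
    count = 0
    for f in findings:
        if f["spec_name"] == spec_name and f["superseded_by"] is None:
            f["superseded_by"] = new_session_id
            count += 1
    return count


findings = [
    {"spec_name": "spec_a", "task_group": "0", "superseded_by": None},
    {"spec_name": "spec_a", "task_group": "3", "superseded_by": None},
    {"spec_name": "spec_b", "task_group": "0", "superseded_by": None},
]

superseded = supersede_spec_level(findings, "spec_a", "spec_a:audit:2")
still_active = [f for f in findings if f["superseded_by"] is None]
print(superseded)         # 2
print(len(still_active))  # 1
```

A trade-off worth weighing: this also retires findings from other review modes that the new review may not have re-examined, which is why filtering at retrieval time is listed as the alternative.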
