Input materialization over-matches references, causing warning storms and materialization of irrelevant files #48

@danshapiro

Description

Problem

Input materialization appears to over-classify arbitrary tokens as path/glob references, producing high warning volume and broad materialization of irrelevant files.

In run 01KJR25QS6RY52D7VS2ZRAAXMK, the behavior is consistent across all stages rather than isolated to one node.

Why this matters

  • Warning noise hides actionable diagnostics.
  • Over-materialized inputs inflate context and can degrade convergence quality.
  • The system spends time parsing and expanding candidates that are clearly not file references.
  • Irrelevant artifacts in materialized inputs increase nondeterminism.

Evidence

Run artifacts:

  • ~/.local/state/kilroy/attractor/runs/01KJR25QS6RY52D7VS2ZRAAXMK/progress.ndjson
  • Stage manifests under .../01KJR25QS6RY52D7VS2ZRAAXMK/*/inputs_manifest.json

Observed counts from this run:

  • input_materialization_warning events: 1397 (= 11 × 127, consistent with 11 stages each emitting the per-stage warning count below)
  • Per-stage manifests consistently show:
    • warnings=127
    • resolved_files≈2308
    • discovered_references=3388

Representative warnings include clearly non-path tokens:

  • expand input glob "you are wielding ([ch]) [weapon": syntax error in pattern
  • expand input glob "DEFAULT_TOOL_LIMITS[tool_name": syntax error in pattern

Materialized file sets include irrelevant artifact paths such as:

  • .worktrees/rogue-logs/worktree/.cargo-target/...

Relevant code to inspect:

  • internal/attractor/engine/input_reference_scan.go
  • internal/attractor/engine/input_materialization.go

Likely hotspots:

  • deterministicInputReferenceScanner token capture breadth
  • glob classification for tokens containing *?[
  • artifact-path filtering (isLikelyArtifactInputPath), which currently does not exclude .worktrees
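For the third hotspot, a segment-based check is one way to exclude .worktrees by default. This is a hypothetical stand-in, not the engine's isLikelyArtifactInputPath: the exclusion set is an assumption, with .worktrees and .cargo-target taken from the paths observed in this run and .git and node_modules added only as illustrative examples.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// defaultArtifactDirs is a hypothetical exclusion set. ".worktrees" and
// ".cargo-target" come from paths seen in this run; ".git" and
// "node_modules" are illustrative additions.
var defaultArtifactDirs = map[string]bool{
	".git":          true,
	".worktrees":    true,
	".cargo-target": true,
	"node_modules":  true,
}

// isArtifactPath reports whether any slash-separated segment of p names an
// excluded directory. Hypothetical; not the engine's isLikelyArtifactInputPath.
func isArtifactPath(p string) bool {
	clean := path.Clean(filepath.ToSlash(p))
	for _, seg := range strings.Split(clean, "/") {
		if defaultArtifactDirs[seg] {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		".worktrees/feature-x/src/lib.rs",      // hypothetical worktree copy
		"crates/app/.cargo-target/debug/app.d", // hypothetical build cache
		"internal/attractor/engine/input_materialization.go",
	} {
		fmt.Printf("%-52s artifact=%v\n", p, isArtifactPath(p))
	}
}
```

Matching on every segment rather than only a leading prefix also catches build caches nested inside a worktree or crate, such as a .cargo-target directory several levels deep.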

Steps to reproduce / observe

  1. Inspect run 01KJR25QS6RY52D7VS2ZRAAXMK and count input_materialization_warning in progress.ndjson.
  2. Compare warnings/resolved_files/discovered_references across all stage manifests.
  3. Sample warning payloads and verify that many candidates are non-reference text fragments.
  4. Inspect resolved_files for .worktrees/** and build-cache artifacts.

Scope boundaries

This issue is about reference scanning/classification/materialization precision.

This issue is not:

  • A provider/model routing issue.
  • A one-off suppression of warnings without improving classification quality.
  • A project-specific workaround.

Potential directions (non-prescriptive)

  • Tighten scanner heuristics for structured/unstructured token extraction.
  • Require stronger validity checks before classifying a token as a glob.
  • Expand default artifact exclusions to include .worktrees/**.
  • Add counters separating candidate extraction, rejected candidates, failed expansions, and accepted references.
  • Add regression tests for known bad-token patterns and artifact contamination.

Definition of done

  • Reproducing this run shape no longer yields warning storms from non-reference tokens.
  • .worktrees/** artifacts are not materialized by default.
  • Warning volume is materially reduced without dropping legitimate reference discovery.
  • Regression tests cover bad-token and artifact-path cases.

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
