feat: bash/web/grep dedup, diff-aware re-read, post-compact recovery, structured-config sections, auto-redirect #2
Merged
Adds three large optimisation surfaces that previously left tokens on the table:

1. Bash output interception. A new PostToolUse(Bash) hook persists large stdout/stderr to disk under `data_dir() / bash_outputs` and records the command in the session cache. On a repeat invocation the pre-Bash hint suggests `token-goat bash-output <id>` (with `--head`, `--tail`, and `--grep` slicers) instead of re-running. Disk store is 16 MB-capped with oldest-first eviction; per-file outputs above 2 MB are tail-preserved with a truncation marker. New CLI: `bash-output`, `bash-history`.
2. Diff-aware re-read. `post_read` writes a per-session content snapshot (256 KB / 150 entries per session) so a re-read after a Write/Edit produces a unified diff hint instead of either a stale "already read" suggestion (suppressed by the existing edited-after-read guard) or a full file re-Read. The diff is bounded to 4 KB and only fires when the realised saving clears ~250 tokens; below that the existing hint path runs. Stats record both realised savings (`diff_hint`) and the hint's injection cost (`diff_hint_overhead`) for honest accounting.
3. TOML/YAML/JSON section extraction. `.toml`, `.yaml`, `.yml` are now indexed (no new third-party dependency: line scanners plus the stdlib tomllib are sufficient). Pretty-printed JSON gains depth-1 section detection so the existing `token-goat section` flow works on `pyproject.toml::tool.ruff`, `deploy.yaml::spec`, and `package.json::scripts` without falling back to a full Read.

Wires the new hook into both Claude Code's settings.json PostToolUse and the Codex config.toml hooks block. Background worker now sweeps stale snapshot directories (24h) and re-evicts the bash-output store on startup. `reset_session` removes per-session snapshots.

Tests: six new test modules cover the storage layer, hint builders, hook integration, end-to-end pre/post sequencing, CLI surface, and the new language extractors.
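The gating described for the diff-aware re-read (diff bounded to 4 KB, hint only when the saving clears roughly 250 tokens) could look something like the sketch below. This is an illustration only, not the project's code: `build_diff_hint`, `estimate_tokens`, and the four-characters-per-token heuristic are assumptions introduced here.

```python
import difflib
from typing import Optional

MAX_DIFF_BYTES = 4 * 1024   # "bounded to 4 KB" from the commit message
MIN_SAVED_TOKENS = 250      # "~250 tokens" from the commit message
CHARS_PER_TOKEN = 4         # assumption: crude token estimate, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (hypothetical helper)."""
    return len(text) // CHARS_PER_TOKEN


def build_diff_hint(snapshot: str, current: str, path: str) -> Optional[str]:
    """Return a unified diff if it is small and clearly cheaper than a re-read."""
    diff = "".join(
        difflib.unified_diff(
            snapshot.splitlines(keepends=True),
            current.splitlines(keepends=True),
            fromfile=f"{path} (snapshot)",
            tofile=f"{path} (current)",
        )
    )
    if not diff:
        return None  # nothing changed, so there is nothing to diff
    if len(diff.encode("utf-8")) > MAX_DIFF_BYTES:
        return None  # diff too large; fall back to the normal hint path
    saved = estimate_tokens(current) - estimate_tokens(diff)
    if saved < MIN_SAVED_TOKENS:
        return None  # saving too small to justify injecting the diff
    return diff
```

Returning `None` stands in for "let the existing hint path run", which keeps the caller's fallback explicit.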
…rdening
Six follow-up surfaces on the bash-cache / diff-hint / config-section work
that landed in the previous commit:
1. Compaction manifest: a new "Commands Run" section surfaces the most
recent meaningful Bash invocations with cmd preview, exit code, byte
size, and the cache ID (`token-goat bash-output <id>`) so the
test/build context that drives the next turn survives compaction.
`event_count` now includes `bash_history` so a session whose only
activity is a cached test run still clears `min_events`.
2. `bash-output --json` now emits `numbered_lines` (`[{lineno, text}]`
anchored to the original body) plus `total_lines`, mirroring the
surgical-read response shape so agents can follow up with positional
slicers that map back to the on-disk file.
3. Stats source buckets: `diff_hint` / `diff_hint_overhead` now land in
the existing `hint` bucket, and a new `bash` bucket (orange in the
fancy renderer) catches `bash_dedup_hint*` and `bash_output_cached`,
so the new mechanisms get a first-class line in `token-goat stats`
instead of falling into the `other` catch-all.
4. INI / CFG / .env indexer: `[name]` headers in `.ini`/`.cfg` files
become Sections (so `token-goat section setup.cfg::metadata` works);
`.env` and `.envrc` index each `KEY=value` assignment as a symbol.
Parser gains a basename-keyed dispatch table alongside the existing
suffix table — `.env` has no Path.suffix and would otherwise be
silently skipped.
5. PostToolUse Bash payload hardening: `_extract_bash_response` now
tolerates every documented shape — dict-with-named-fields (Claude
Code), MCP CallToolResult content arrays, bare-string blobs,
top-level flattening, `tool_result`/`response` aliases, `returncode`
and string-typed `exit_code` variants. Each is covered by a
dedicated test.
6. `bash_cache.evict_old_entries` removes body + sidecar pairs together
and runs an orphan-sidecar sweep at the end, so a manual `rm` of a
body or a write race can no longer leave .json metadata files
accumulating forever.
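As a rough illustration of item 6, the sketch below evicts body and sidecar files together, oldest first, and then sweeps sidecars whose body is gone. The on-disk layout (`<id>.txt` bodies with `<id>.json` sidecars) and the name `evict_pairs` are assumptions made for the example; the real `bash_cache.evict_old_entries` may differ.

```python
from pathlib import Path


def evict_pairs(store: Path, cap_bytes: int) -> list[str]:
    """Evict oldest bodies (with their sidecars) until the store fits the cap."""
    # Assumed layout: "<id>.txt" holds the output body, "<id>.json" its metadata.
    bodies = sorted(store.glob("*.txt"), key=lambda p: (p.stat().st_mtime, p.name))
    total = sum(p.stat().st_size for p in bodies)
    evicted: list[str] = []
    for body in bodies:
        if total <= cap_bytes:
            break
        total -= body.stat().st_size
        body.unlink(missing_ok=True)
        # Remove the sidecar in the same step so the pair never drifts apart.
        body.with_suffix(".json").unlink(missing_ok=True)
        evicted.append(body.stem)
    # Orphan sweep: sidecars whose body was removed manually or lost to a race.
    for sidecar in store.glob("*.json"):
        if not sidecar.with_suffix(".txt").exists():
            sidecar.unlink(missing_ok=True)
    return evicted
```

Running the orphan sweep on every eviction pass means a stray sidecar lives at most until the next pass, whatever created it.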
Tests: four new test modules (post-bash payload variants, INI/.env
extractor, compaction bash section, stats bucket mapping) plus
extensions to test_bash_cache.py and test_bash_cli.py. 342 targeted
tests pass; lint clean; mypy adds zero new errors over baseline.
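The basename-keyed dispatch from item 4 can be pictured with the sketch below. The table and function names (`BY_SUFFIX`, `BY_BASENAME`, `pick_extractor`) and the two toy extractors are hypothetical; only the underlying fact, that `Path(".env").suffix` is empty, comes from the standard library.

```python
from pathlib import Path
from typing import Callable, Optional

Extractor = Callable[[str], list[str]]


def extract_ini_sections(text: str) -> list[str]:
    """Collect `[name]` headers from INI-style text (toy scanner)."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            names.append(line[1:-1].strip())
    return names


def extract_env_keys(text: str) -> list[str]:
    """Collect KEY names from `KEY=value` lines (toy scanner)."""
    keys = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):  # common in .envrc files
            key = key[len("export "):].strip()
        keys.append(key)
    return keys


BY_SUFFIX: dict[str, Extractor] = {".ini": extract_ini_sections, ".cfg": extract_ini_sections}
BY_BASENAME: dict[str, Extractor] = {".env": extract_env_keys, ".envrc": extract_env_keys}


def pick_extractor(path: Path) -> Optional[Extractor]:
    """Try the basename table first, then fall back to the suffix table."""
    return BY_BASENAME.get(path.name) or BY_SUFFIX.get(path.suffix.lower())
```

Checking the basename first lets dotfiles win without any special-casing in the suffix table.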
Two flaky-on-Windows patterns in the snapshot test module:
1. `test_post_read_captures_snapshot` wrote the source with
`write_text("def x(): pass\n")` and asserted the snapshot
bytes equalled `b"def x(): pass\n"`. On Windows `write_text`
expands `\n` to `\r\n` on disk, so the byte-equality
assertion fails even though the snapshot store is round-trip
correct. Switched to `write_bytes` and compared against
`src.read_bytes()` so the test reflects the snapshot contract
(verbatim disk-byte capture) on every platform.
2. `test_eviction_keeps_per_session_under_cap` wrote five files in
rapid succession and asserted f0 was evicted while f4 survived.
Windows' clock-tick cache can stamp multiple of those writes with
identical mtimes, making the eviction order non-deterministic.
Set explicit ascending mtimes via `os.utime` after each store
so the sort key is unambiguous.
Both fixes are pure test-side and do not change runtime behaviour.
Linux suite remains green (85 / 85 in the touched modules).
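Both test-side fixes rely on standard-library behaviour that is easy to demonstrate in isolation. The sketch below is not taken from the test module; `write_with_mtime` and `oldest_first` are names invented for the illustration.

```python
import os
from pathlib import Path


def write_with_mtime(path: Path, data: bytes, mtime: float) -> None:
    """Write exact bytes, then pin the modification time explicitly."""
    path.write_bytes(data)          # bytes mode: no newline translation on any platform
    os.utime(path, (mtime, mtime))  # (atime, mtime); removes dependence on clock resolution


def oldest_first(directory: Path) -> list[str]:
    """List file names ordered by mtime, the sort key an oldest-first eviction would use."""
    return [p.name for p in sorted(directory.iterdir(), key=lambda p: p.stat().st_mtime)]
```

With explicit, strictly ascending mtimes the ordering no longer depends on how quickly the files were written or on the filesystem's timestamp granularity.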
…s, auto-redirect

Six surfaces land in this commit, all aligned with the existing hint/cache/extractor patterns from the previous feature rounds:

1. Post-compaction recovery hint. SessionStart now detects `source == "compact"` and emits a one-shot additionalContext block listing the most recently-read files plus the cached Bash outputs (`token-goat bash-output <id>`) and WebFetch responses (`token-goat web-output <id>`) from the *pre*-compaction session. The cache is intentionally preserved across the compact so the recovery hint has data to draw from; every other source value still resets the cache.
2. Grep dedup hint. Repeat `Grep` invocations with the same `(pattern, path)` pair within the staleness window now produce a "ran ~Ns ago and matched N lines" advisory. Same mechanism as the bash and web dedup hints, pointed at the existing `session.greps` history, with no new disk store. Install matcher widened to `Read|Grep|Bash` so the hint actually reaches the wire (it also fixes a pre-existing gap where pre-Bash dedup ran only under Codex).
3. WebFetch result cache. New PostToolUse(WebFetch) hook persists non-image response bodies under `data_dir() / "web_outputs"` and records the `(url_sha → output_id)` mapping in the session cache. Pre-fetch hook dedupes repeat URLs with a hint pointing at `token-goat web-output <id>`. Two new CLI commands surface the cache: `web-output` (head/tail/grep slicers + `numbered_lines` in JSON mode, mirroring `bash-output`) and `web-history`. Disk store is 32 MB-capped with oldest-first eviction + paired sidecar cleanup + orphan-sidecar sweep.
4. Dockerfile section extractor. `Dockerfile`, `Containerfile`, and `*.dockerfile` now produce one Section per `FROM` build stage so `token-goat section Dockerfile::builder` extracts a single stage. Multi-stage builds resolve by `AS <name>` alias; unnamed stages fall back to the image reference. Registered via the basename table (dotfile-style dispatch already used for `.env`/`.envrc`).
5. `token-goat doctor` cache visibility. New "Caches" section reports size + file count + oldest-entry age for `bash_outputs/`, `web_outputs/`, and `session_snapshots/`. Each row warns when the directory has grown more than 10% over its byte cap.
6. Close-match auto-redirect on `token-goat symbol`. Zero results + exactly one close match at high confidence (difflib ratio >= 0.85) triggers a transparent re-run against the candidate. Output carries a `redirected_from` field in JSON and a `(redirected from: ...)` marker in plain-text so the substitution is auditable. `--strict` opts out. The DB symbol-name pool now surfaces via `_project_symbol_pool` / `_global_symbol_pool` so the close-match suggestions list and the auto-redirect lookup hit the DB exactly once per command.

Stats surface: new `web` source bucket (yellow in the fancy renderer) catches `web_*` kinds; `grep_dedup_hint*` lands in the existing `hint` bucket because it prevents Read-equivalent bursts.

Tests: five new test modules (`test_grep_dedup`, `test_web_cache`, `test_post_compact_recovery`, `test_dockerfile_extractor`, `test_auto_redirect`) plus extensions to existing payload-shape coverage. 493 targeted tests pass; lint clean; mypy adds zero new errors over baseline.

Docs: CHANGELOG entry covers all six surfaces; README "What changes" table extended with three new rows; CLI table gains `web-output` / `web-history`; Claude Code CLAUDE.md, skill SKILL.md, and Codex AGENTS.md routing tables updated to mention the new commands and flags.
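The auto-redirect rule in item 6 (exactly one candidate with a difflib ratio of at least 0.85) is small enough to sketch directly. `pick_redirect` and its arguments are hypothetical names; the only real API used is `difflib.SequenceMatcher`.

```python
import difflib
from typing import Optional

REDIRECT_THRESHOLD = 0.85  # ratio quoted in the commit message


def pick_redirect(query: str, pool: list[str]) -> Optional[str]:
    """Return the single high-confidence close match, or None if absent or ambiguous."""
    close = [
        name
        for name in sorted(set(pool))  # de-duplicate so repeated names count once
        if difflib.SequenceMatcher(None, query, name).ratio() >= REDIRECT_THRESHOLD
    ]
    # Redirect only when the choice is unambiguous; otherwise leave it to the caller.
    return close[0] if len(close) == 1 else None
```

Requiring exactly one candidate is what keeps the redirect transparent: with two near-identical names the caller would fall back to listing suggestions rather than guessing.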
…CI verbose

Two changes to make the next CI run easier to diagnose:

1. test_dockerfile_extractor.py: the basename-dispatch tests built the file path off the raw `tmp_path` while the project root used `canonicalize(tmp_path)`. On Windows `canonicalize` lower-cases the drive letter; `Path.relative_to` is case-sensitive (string compare) even though the underlying NTFS isn't, so the two paths could mismatch and `index_file` would silently return None. Build the file path off the already-canonicalised root so both sides agree.
2. .github/workflows/ci.yml: switched the test step from `pytest` to `pytest -rfE --tb=short` so failure tracebacks land in the action log without needing artefact downloads, and so multi-failure runs stay readable.

Both changes are test/CI-only; runtime behaviour is unchanged.
CI runs on windows-2022 with Python 3.13. Failures are only visible to people who can authenticate against GitHub Actions logs; remote contributors and tools that can read PR comments but not Action logs end up blind.

The new ``Surface failure summary to PR`` step:

- Runs only when the prior test step failed AND the workflow was triggered by a PR event (push-only workflows are unaffected, and there's no PR to attach the comment to anyway).
- Tail-trims the captured pytest output to 80 lines so even a multi-failure run fits inside GitHub's 65 KB per-comment cap.
- Uses ``GITHUB_TOKEN`` (already in scope for any PR workflow) so no additional secrets are required.

The Test step itself is unchanged in command and meaning; it just pipes through ``tee pytest.log`` so the on-failure step has the output to read. ``pipefail`` keeps the original exit code through the tee.
Symptom: the previous CI workflow used ``pipefail`` + ``tee pytest.log`` to capture pytest output for later PR-comment surfacing. On the Windows runner this combination hangs intermittently: git-bash's ``tee`` over a pipe blocks even after pytest exits, so the test step never completes and the whole workflow eventually times out at 6h.

Fix: drop the pipe entirely. Pytest output is redirected straight to ``pytest.log`` via ``>``; a follow-up always-runs step ``cat``s the log to the action's standard output so the in-line CI log still contains the test output (the prior contract). The on-failure PR comment step keeps its body source (tail -80 of pytest.log) but now swallows ``gh pr comment`` errors via ``|| true`` so a push-event run (which has no PR to attach to) doesn't fail the step.
Two prior attempts to capture pytest output hung intermittently:

1. ``shell: bash`` + ``tee pytest.log``: git-bash on Windows deadlocked the pipe even after pytest exited.
2. ``shell: bash`` + ``> pytest.log``: also hung, suggesting the shell wrapper itself (not just tee) was the root cause on the Windows runner.

Switch to PowerShell (the default Windows shell) with ``Tee-Object``, which is the native equivalent and runs reliably. ``$LASTEXITCODE`` preserves pytest's exit code through the pipe; the failure-summary step then reads ``pytest.log`` with ``Get-Content -Tail`` and posts the slice as a PR comment.
The previous workflow used a PowerShell ``@"..."@`` here-string whose interior was written at column 0 to satisfy PowerShell's no-leading-whitespace requirement. That collided with the YAML block-scalar indent rule: the YAML scanner saw the backtick fences at column 0 as outside the ``run: |`` block and bailed with "found character '\`' that cannot start any token".

Replaced with line-by-line ``Add-Content`` so every line of the PowerShell script can be indented consistently and the YAML scanner treats the whole block as one ``run`` value. Functional outcome (80-line tail posted as a PR comment on failure) is unchanged.
Two prior attempts to capture pytest output for PR comment surfacing hung the test job on the Windows runner:

1. ``shell: bash`` + ``| tee pytest.log``: git-bash pipe deadlocked.
2. ``Tee-Object -FilePath pytest.log`` (PowerShell): same symptom, so the deadlock is not bash-specific; the issue is the pipe on Windows holding pytest's output side open after the child exits, which keeps the parent step alive indefinitely.

Falls back to the simplest possible command, ``uv run pytest -rfE --tb=short``, without any redirection. ``-rfE`` already surfaces failed-test tracebacks at the end of the action log, which is sufficient detail for anyone with action-log access. Remote contributors who don't have that access will still see ``FAILED <test_name>`` lines in any pasted log; that's the same level of detail the on-failure PR-comment step was meant to provide.
Symptom: the Windows test job intermittently hangs for 30+ minutes, and can run until the global 6-hour runner timeout, without the test step ever reporting completion.

Root cause: several hook tests (test_hooks_session and the new test_post_compact_recovery) trigger ``worker.ensure_running()``, which spawns ``pythonw.exe -m token_goat.cli worker --daemon`` as a detached background process via ``DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP``. Those flags are honoured by Windows itself, but GitHub Actions wraps every step in a Win32 job object and tracks every descendant, so the daemon's infinite ``run_daemon`` loop holds the step open even after pytest exits cleanly. On the local Windows boxes this is invisible (the daemon detaches and the test process terminates); under CI it bricks the runner.

Fix: an ``if: always()`` cleanup step uses ``Get-CimInstance`` to locate any ``token_goat worker --daemon`` processes left running and force-stops them. It runs after the test step regardless of pass / fail so the runner can finish the workflow promptly in either case. The kill is safe because the worker daemon is *intended* to be ephemeral in CI: it has no on-disk state worth preserving across the run, and the next CI invocation starts fresh anyway.
…daemon

Suppresses the spawn step in ``worker.ensure_running()`` when the env var is set to a truthy value. The watchdog branch (PID + heartbeat check + reap-hung) still runs so unit tests exercising that path behave the same; only the actual ``subprocess.Popen`` is skipped.

Why this matters: GitHub Actions on Windows wraps each step in a Win32 job object that tracks every descendant. A detached worker daemon's infinite ``run_daemon`` loop holds the action runner step open until the global six-hour timeout fires, even after pytest exits cleanly. On local Windows boxes this is invisible because the daemon detaches and the test process terminates; under CI it bricks the runner.

CI workflow sets ``TOKEN_GOAT_NO_WORKER_SPAWN=1`` on the test step so the worker spawn is skipped during pytest. Production paths (``token-goat install`` → SessionStart hook fires → ensure_running) keep the default behaviour because the env var is unset there.
Three textual conflicts plus one semantic one between this branch's feature work and the bash-output-compression feature that landed on ``main`` while this PR was open. Resolutions:

* **install.py** PreToolUse matcher: ``main`` widened the matcher to ``Read|Bash`` so the bash-compress wrapper can intercept noisy commands; this branch widened it to ``Read|Grep|Bash`` for the Grep dedup hint. Final value: ``Read|Grep|Bash`` with a comment explaining both reasons.
* **CHANGELOG.md**: kept both ``[Unreleased]`` blocks: compression first, then this branch's eleven surfaces, then the shared ``Changed`` / ``Fixed`` sections.
* **README.md** "What changes" table: kept both sets of rows (compression filters first, then the dedup / recovery / auto-redirect / structured-config rows from this branch).
* **hooks_read.py / cli.py**: auto-merged. The resulting ``pre_read`` dispatch is bash-dedup → bash-read-equivalent → bash-compress → CONTINUE for Bash, grep-dedup → CONTINUE for Grep, and the existing Read / Glob branches untouched. This ordering is intentional: a session-cache hit is the cheapest signal and short-circuits before we ever look at the command shape.
* **test_bash_dedup_hint.py**: the negative-assertion cases used ``pytest -v src/`` and ``make build``, which are now compressible commands. The bash-compress wrapper fires before pre_read's fall-through, so the ``hookSpecificOutput not in result`` assertion broke. Switched to ``find`` / ``echo`` / ``head``, which are deliberately not in main's filter list, so the only reason for ``pre_read`` to emit a hookSpecificOutput is a real dedup hit. Comment in the test module docstring records the invariant.

636 tests pass; ruff clean; mypy adds zero new errors over baseline.
Earlier this round the check landed first in ``worker.ensure_running`` (broke 2 spawn-respawn unit tests via the mocked ``spawn_detached``), then in ``spawn_detached`` itself (broke 3 tests that test that function's body via mocked ``Popen``), then in ``worker_daemon.run_daemon`` (broke 9 tests that drive ``run_daemon`` directly to verify its main loop). Each tier of unit tests bypasses one more layer of the spawn chain.

The clean break: put the check in ``cli.cmd_worker`` instead. This is the function the subprocess command ``pythonw -m token_goat.cli worker --daemon`` actually invokes when the spawn chain runs end-to-end. Direct unit tests of ``ensure_running``, ``spawn_detached``, and ``run_daemon`` all skip this entry point (they call the lower-level functions directly), so they remain unaffected. Only the real-spawn path (``ensure_running`` → ``spawn_detached`` → ``Popen`` → child process loads ``cli.py`` → ``cmd_worker`` → early-exit) sees the env var.

Env-var propagation: ``subprocess.Popen`` inherits the parent's env by default, so a test step that sets ``TOKEN_GOAT_NO_WORKER_SPAWN=1`` on the workflow step propagates the var to every child process spawned from it. The child ``pythonw -m token_goat.cli worker --daemon`` loads, reaches ``cmd_worker``, sees the var, and returns immediately without entering the heartbeat loop. GitHub Actions on Windows can then close the job object's descendants cleanly and the step completes.

96 / 96 worker + worker_daemon tests pass; 2610 / 2614 of the network-independent suite passes (the remaining 4 are pre-existing tree-sitter offline failures unrelated to this change). Lint clean; mypy adds zero new errors over baseline.
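A minimal sketch of the entry-point guard described above, assuming a simplified signature: the real ``cli.cmd_worker`` is not shown in this PR text, so ``cmd_worker_sketch``, ``spawn_suppressed``, the injected ``run_daemon`` callable, and the exact set of truthy strings are all assumptions for illustration.

```python
import os
from typing import Callable, Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: what counts as "truthy"


def spawn_suppressed(environ: Mapping[str, str] = os.environ) -> bool:
    """True when TOKEN_GOAT_NO_WORKER_SPAWN is set to a truthy value."""
    return environ.get("TOKEN_GOAT_NO_WORKER_SPAWN", "").strip().lower() in _TRUTHY


def cmd_worker_sketch(
    run_daemon: Callable[[], None],
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Hypothetical CLI entry point: exit early instead of entering the daemon loop."""
    if spawn_suppressed(environ):
        return 0  # child process ends immediately, so nothing lingers in the job object
    run_daemon()
    return 0
```

Because the guard sits at the CLI entry point, unit tests that call the lower-level functions directly never pass through it, which matches the reasoning in the commit message.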
Adds eleven distinct token-saving surfaces across three rounds of work, all aligned with the existing hook/cache/extractor architecture. Every surface ships with tests, docs, and CHANGELOG entries; lint clean, mypy adds zero new errors over baseline, 493 / 493 + 3 skipped local tests pass.

Round 1: three large optimisation surfaces

* **Bash output interception.** A new `PostToolUse(Bash)` hook persists large stdout/stderr to disk under `data_dir() / "bash_outputs"` and records the command in the session cache. On a repeat invocation the pre-Bash hint suggests `token-goat bash-output <id>` (with `--head N`, `--tail N`, `--grep PATTERN` slicers) instead of re-running. Disk store is 16 MB-capped with oldest-first eviction; outputs above 2 MB are tail-preserved with a truncation marker. New CLI: `bash-output`, `bash-history`.
* **Diff-aware re-read.** `post_read` writes a per-session content snapshot (256 KB / 150 entries per session cap) so a follow-up `Read` after a `Write`/`Edit`/`MultiEdit` is answered with a unified diff hint instead of a stale "already read" suggestion or a full file re-Read. The diff is bounded to 4 KB and only fires when the realised saving clears ~250 tokens; below that the existing session-cache hint path runs. Stats record both realised savings (`diff_hint`) and the hint's injection cost (`diff_hint_overhead`) for honest accounting.
* **TOML / YAML / JSON / INI / .env / Dockerfile section extraction.** `token-goat section pyproject.toml::tool.ruff` (and equivalents for `.yaml` / `.yml` / `.json` / `.ini` / `.cfg` / `.env` / `.envrc` / `Dockerfile` / `Containerfile`) now extract a single table/key block instead of forcing a full-file read. Line-scanner implementations only, with no new third-party dependencies. Parser dispatcher gained a basename-keyed table so dotfiles with empty `Path.suffix` (`.env`, `.envrc`, `Dockerfile`) resolve correctly.

Round 2: polish + hardening

* **Compaction manifest "Commands Run" section.** The PreCompact manifest surfaces the most recent meaningful Bash invocations (cmd preview · exit code · byte size · cache ID) so the test/build context that drives the next agent turn survives compaction. `event_count` now includes `bash_history` so a session whose only activity is a cached test run still clears `min_events`.
* **`bash-output --json` line numbers.** JSON shape gained `numbered_lines: [{lineno, text}]` (1-based, anchored to the original body) plus `total_lines`, mirroring the surgical-read response shape. Agents can `--head` / `--tail` / `--grep` filter and still map back to positions in the original output.
* **Hardened PostToolUse Bash payload extraction.** `_extract_bash_response` now tolerates every documented Bash result shape: dict-with-named-fields (Claude Code), MCP `CallToolResult` content arrays, bare-string blobs, top-level flattening (no `tool_response` wrapper), `tool_result` / `response` aliases, `returncode` and string-typed `exit_code` variants. Each variant is covered by a dedicated regression test.
* **Sidecar coupling.** `bash_cache.evict_old_entries` removes body + sidecar pairs together and runs an orphan-sidecar sweep at the end, so a manual `rm` of a body or a write race no longer leaves `.json` metadata files accumulating forever.

Round 3: six follow-up surfaces

* **Post-compaction recovery hint.** `SessionStart` detects `source == "compact"` and emits a one-shot `additionalContext` block listing the most recently-read files plus the cached Bash outputs (`token-goat bash-output <id>`) and WebFetch responses (`token-goat web-output <id>`) from the pre-compaction session. The cache is intentionally preserved across the compact so the recovery hint has data to draw from; every other source value still resets the cache.
* **Grep dedup hint.** Repeat `Grep` invocations with the same `(pattern, path)` pair within the staleness window now produce a "ran ~Ns ago and matched N lines" advisory. Same mechanism as the bash and web dedup hints, pointed at the existing `session.greps` history, with no new disk store. Install matcher widened to `Read|Grep|Bash` so the hint actually reaches the wire (also fixes a pre-existing gap where pre-Bash dedup ran only under Codex).
* **WebFetch result cache.** New `PostToolUse(WebFetch)` hook persists non-image response bodies under `data_dir() / "web_outputs"` and records the `(url_sha → output_id)` mapping in the session cache. Pre-fetch hook dedupes repeat URLs with a hint pointing at `token-goat web-output <id>`. Two new CLI commands surface the cache: `web-output` (head/tail/grep slicers + `numbered_lines` in JSON mode) and `web-history`. Disk store is 32 MB-capped with oldest-first eviction + paired sidecar cleanup + orphan-sidecar sweep.
* **Doctor cache visibility + close-match auto-redirect.** `token-goat doctor` gains a "Caches" section reporting size + file count + oldest-entry age for `bash_outputs/`, `web_outputs/`, and `session_snapshots/`. `token-goat symbol` automatically retries with the unambiguous close-match candidate when difflib ratio ≥ 0.85 and there's exactly one such candidate; output carries a `redirected_from` field in JSON and a `(redirected from: ...)` marker in plain-text so the substitution is auditable. `--strict` opts out.

Infrastructure

* `web` source bucket (yellow in the fancy renderer) catches `web_*` kinds; `grep_dedup_hint*` lands in the existing `hint` bucket.
* `cleanup_on_startup` now sweeps stale snapshot directories (24h) and re-evicts the bash + web caches.
* `reset_session` removes per-session snapshots.
* `~/.codex/config.toml` Bash matcher routed to the new `post-bash` hook (was `post-read`, which had no Bash branch).
* `worker.ensure_running()` honours `TOKEN_GOAT_NO_WORKER_SPAWN=1` to skip the detached-daemon spawn under CI. Otherwise the daemon's infinite loop holds the GitHub Actions Windows step open until the global 6-hour timeout fires, because the runner tracks every descendant via a Win32 job object.

Test plan

* `ruff check src/ tests/`: clean
* `mypy src/`: 9 pre-existing Windows-only `winreg` errors; no new errors
* `pytest -q -s -p no:cacheprovider` across the 30 touched test modules: 493 passed, 3 skipped
* Five new test modules (`test_grep_dedup`, `test_web_cache`, `test_post_compact_recovery`, `test_dockerfile_extractor`, `test_auto_redirect`) plus ten other new modules (`test_bash_cache`, `test_bash_cli`, `test_bash_dedup_hint`, `test_post_bash_payloads`, `test_snapshots`, `test_diff_hint_integration`, `test_languages_config`, `test_ini_extractor`, `test_stats_buckets`, `test_compact_bash`)

Docs

* `CHANGELOG.md` `[Unreleased]` covers all eleven surfaces
* `README.md` "What changes" table extended with five new rows; CLI table gains `web-output`, `web-history`, `bash-output`, `bash-history`
* `~/.claude/CLAUDE.md`, `~/.claude/skills/token-goat/SKILL.md`, and `~/.codex/AGENTS.md` routing tables (written by `token-goat install`) updated to mention the new commands and flags