Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 33 additions & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,36 @@ jobs:
- run: uv python install 3.13
- run: uv sync --all-extras
- name: Test
run: uv run pytest
# ``-rfE`` prints short tracebacks for failures and errors at the
# end of the run so a CI failure surfaces the actual assertion or
# exception in the log without needing to download artefacts.
# ``--tb=short`` keeps each entry compact (one line per frame) so
# the summary stays readable even when several tests fail at once.
# ``TOKEN_GOAT_NO_WORKER_SPAWN=1`` suppresses the detached worker
# daemon spawn from ``worker.ensure_running()``. Under GitHub
# Actions on Windows the runner tracks every descendant via a
# Win32 job object, so the daemon's infinite loop holds the
# test step open until the global six-hour timeout fires. Tests
# still exercise the watchdog branch by reading the env var; only
# the actual ``subprocess.Popen`` is skipped.
env:
TOKEN_GOAT_NO_WORKER_SPAWN: "1"
run: uv run pytest -rfE --tb=short

- name: Kill leftover detached worker daemons
# Several hook tests trigger ``worker.ensure_running()`` which
# spawns ``pythonw.exe -m token_goat.cli worker --daemon`` as a
# detached background process. ``DETACHED_PROCESS`` is honoured
# by Windows itself but GitHub Actions' Windows runner uses a
# Win32 job object to track every descendant; the daemon's
# infinite loop would otherwise hold the step open until the
# global six-hour timeout. This always-runs step terminates
# any leftover daemon so the runner can finish promptly.
if: always()
run: |
Get-CimInstance Win32_Process |
Where-Object { $_.CommandLine -like '*token_goat*worker*--daemon*' } |
ForEach-Object {
Write-Host "killing leftover worker pid=$($_.ProcessId)"
Stop-Process -Id $_.ProcessId -Force -ErrorAction SilentlyContinue
}
21 changes: 21 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,27 @@ All notable changes to Token-Goat are documented in this file. Format follows Ke
### Added

- **Bash output compression.** PreToolUse hook on Bash detects compressible commands and rewrites them to flow through `token-goat compress`, which runs the original through the system shell, captures stdout + stderr, applies a per-tool filter, and prints a compressed view that surfaces failures first. Twelve filters cover the noisiest dev commands: `pytest`, `jest` / `vitest`, `cargo`, `npm` / `pnpm` / `yarn` / `bun`, `docker` / `buildah` / `podman`, `kubectl` / `helm`, `aws`, `ruff` / `eslint` / `mypy` / `pyright` / `pylint` / `stylelint` / `biome` / `tsc`, `git`, `make` / `ninja` / `gradle` / `mvn` / `bazel` / `go`, `terraform` / `tofu`, `pip` / `pipx`. Typical savings: pytest 80-97%, npm 88%, docker 75%, linters 80%. Each filter strips ANSI, collapses `\r` progress bars, dedupes consecutive lines, groups linter issues by rule (3 examples per code), keeps every error and warning block verbatim, and caps total output at 1000 lines / 64 KiB. The wrapper preserves the original exit code, kills the process group on timeout (SIGTERM then SIGKILL after a grace period on POSIX), and caps each stream capture at 32 MiB. Configurable via `[bash_compress]` in config.toml (`enabled`, `disabled_filters`, `max_lines`, `max_bytes`, `timeout_seconds`) or disabled with `TOKEN_GOAT_BASH_COMPRESS=0`. Savings are recorded per filter as `bash_compress:<name>`. New CLI subcommand `token-goat compress` for previewing compression on any command.
- **Post-compaction recovery hint.** ``SessionStart`` now detects ``source == "compact"`` and emits a one-shot ``additionalContext`` block listing the most recently-read files, cached Bash outputs (``token-goat bash-output <id>``), and cached WebFetch responses (``token-goat web-output <id>``) from the *pre*-compaction session. The cache is intentionally preserved across the compact so the recovery hint has data to draw from; the cache reset still fires on every other source value (startup / resume / clear / unknown). When the prior session was empty, no hint is emitted — the recovery path is silent until it has something worth surfacing.
- **Grep dedup hint.** A repeat ``Grep`` invocation with the same ``(pattern, path)`` pair within the staleness window now produces a ``"this ran ~Ns ago and matched N lines"`` advisory. Same mechanism as the bash and web dedup hints but pointed at the existing ``session.greps`` history — no new disk store is involved. Suppressed when the prior result was below 50 matches (the hint preamble would approach the saving).
- **WebFetch result cache.** A new ``PostToolUse(WebFetch)`` hook persists non-image response bodies to ``data_dir() / "web_outputs"`` and records the ``(url_sha → output_id)`` mapping in the session cache. On a repeat fetch of the same URL the pre-fetch hook emits a dedup hint pointing at ``token-goat web-output <id>``, mirroring the bash-cache pattern. Two new CLI commands surface the cache: ``token-goat web-output`` (with the same ``--head`` / ``--tail`` / ``--grep`` slicers as ``bash-output``, plus ``numbered_lines`` in JSON mode) and ``token-goat web-history``. Disk store is byte-capped (32 MB default) with oldest-first eviction + paired sidecar cleanup.
- **Dockerfile section extractor.** ``Dockerfile``, ``Containerfile``, and ``*.dockerfile`` now produce one ``Section`` per ``FROM`` build stage, so ``token-goat section Dockerfile::builder`` extracts a single stage instead of forcing a full-file read. Multi-stage builds resolve by ``AS <name>`` alias when present; unnamed stages fall back to the image reference so they remain addressable.
- **Pre-Grep matcher + pre-Bash matcher in install.** ``PreToolUse`` now fires on ``Read|Grep|Bash`` (matcher widened from the prior ``Read|Bash``) so the new Grep dedup hint actually runs alongside the Bash compression rewriter from the prior entry.
- **``token-goat doctor`` cache visibility.** A new ``Caches`` section reports the size, file count, and oldest-entry age for ``bash_outputs/``, ``web_outputs/``, and ``session_snapshots/``. Each row warns when the directory has grown more than 10% over its byte cap, surfacing potential eviction gaps without needing to grep the data directory by hand.
- **Close-match auto-redirect on ``token-goat symbol``.** When a symbol query returns zero results and the project has exactly one close-match candidate at high confidence (difflib ratio ≥ 0.85), the lookup is automatically re-run against that candidate. The redirected response carries a ``redirected_from`` field in JSON output and a ``(redirected from: …)`` marker in plain-text output so the substitution is auditable. Pass ``--strict`` to disable the redirect and get the previous "Did you mean: …?" suggestion list behaviour.
- **``bash`` and ``web`` source buckets in stats.** ``token-goat stats`` now attributes ``bash_*`` kinds to a visible ``bash`` bucket (orange in the fancy renderer) and ``web_*`` kinds to a new ``web`` bucket (yellow), so the new mechanisms get first-class lines in the by-source panel instead of falling into the ``other`` catch-all. ``grep_dedup_hint`` lands in the existing ``hint`` bucket because it prevents a Read-equivalent burst (consistent with ``diff_hint``).
- **Bash output interception.** A new `PostToolUse(Bash)` hook persists large stdout/stderr to disk under `data_dir() / "bash_outputs"` and records the command in the session cache. When the same command is about to run again in the same session, the pre-Bash hint suggests `token-goat bash-output <id>` (optionally with `--head N`, `--tail N`, or `--grep PATTERN`) instead of re-executing — avoiding both runtime cost and duplicated tokens. The store is byte-capped (16 MB default) with oldest-first eviction; outputs above 2 MB are tail-preserved with a truncation marker. Two new CLI commands surface the cache: `token-goat bash-output` retrieves a sliced view, `token-goat bash-history` lists cached entries newest-first.
- **Diff-aware re-read.** `post_read` now writes a per-session content snapshot (under `data_dir() / "session_snapshots"`, capped at 256 KB per file and 150 snapshots per session) so a follow-up `Read` after a `Write`/`Edit`/`MultiEdit` can be answered with a unified diff hint instead of a `pre_read` blocking message that silently allowed the full re-read. The diff is bounded to 4 KB and only fires when the realised saving exceeds ~250 tokens; below that the existing session-cache hint path runs unchanged. Stats record both the realised saving (`diff_hint`) and the hint's injection cost (`diff_hint_overhead`) for honest accounting.
- **TOML, YAML, JSON, INI, CFG, and dotenv section extraction.** `token-goat section pyproject.toml::tool.ruff` (and equivalents for `.yaml`, `.yml`, `.json`, `.ini`, `.cfg`, `.env`, and `.envrc`) now extract a single table/key block instead of forcing a full-file read. The TOML scanner emits one `Section` per `[table]` and `[[array]]` header; the YAML scanner emits top-level keys plus one nested layer (`spec.replicas`-style) computed from the file's detected indent; JSON gains depth-1 section detection on pretty-printed files; INI/CFG indexes one section per `[name]` header; `.env`/`.envrc` index each `KEY=value` assignment as a symbol. None of the six pulls in an extra dependency — all use line-scanners and the existing stdlib parsers. The parser dispatcher gained a basename-keyed table (alongside the existing suffix table) so dotfiles with empty extensions (`.env`, `.envrc`) resolve correctly.
- **Stale-data sweeps in the background worker.** `cleanup_on_startup` now also drops snapshot directories older than 24 hours and enforces the bash-output byte cap, so a long-lived install does not accumulate per-session debris.
- **Compaction manifest gained a "Commands Run" section.** The PreCompact manifest now surfaces the most recent meaningful Bash invocations (cmd preview, exit code, byte size, cache ID) so the test/build context that drives the next agent turn survives compaction. Each entry includes the `token-goat bash-output <id>` cache key for surgical recall. `event_count` includes `bash_history` so a session whose only activity is a cached test run still clears the `min_events` threshold.
- **`token-goat bash-output --json` now surfaces line numbers.** The JSON shape adds `numbered_lines` (a 1-based, original-body-anchored `[{lineno, text}]` list) and `total_lines`, mirroring the surgical-read response shape elsewhere in the codebase. Agents can now `--head` / `--tail` / `--grep` filter and still map back to positions in the original output.
- **Hardened PostToolUse Bash payload extraction.** `_extract_bash_response` now tolerates every documented Bash result shape: dict-with-named-fields (Claude Code), MCP `CallToolResult` content arrays, bare-string blobs, top-level flattening (no `tool_response` wrapper), `tool_result`/`response` aliases, `returncode` and string-typed `exit_code` variants. Each shape is covered by a dedicated regression test in `test_post_bash_payloads.py`.

### Changed

- **`reset_session`** now also removes per-session content snapshots, matching the existing JSON-cache reset semantics.
- **Codex Bash matcher in `~/.codex/config.toml`** now points at the new `post-bash` hook instead of `post-read`; under Codex, `post-read` previously did nothing for `Bash` calls (no branch in the handler), so this is a strict gain.
- **`bash_cache.evict_old_entries`** removes body + sidecar pairs together, and runs a second pass to sweep any orphan sidecars left over from out-of-band deletion. Previously, manual `rm` of a body file or a write race could leave a `.json` sidecar with no matching body that lived forever.

### Fixed

Expand Down
13 changes: 12 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,12 +46,19 @@ Each one is preventable. Token-Goat intercepts all four, automatically.
|--------------------|-----------------|
| 3.3 MB screenshot lands in model context | 84 KB compressed copy, 97.4% smaller |
| Agent re-reads files from earlier in the session | "Already read this" reminder with narrow slice suggestion |
| Agent re-reads a file edited mid-session | Unified diff injected as a hint — full Read avoided when the diff covers the change |
| Compaction forgets which files were edited | Structured session manifest injected before compact |
| Same files re-read from scratch after `/compact` | Recovery hint at SessionStart lists cached snapshot + bash + WebFetch IDs |
| Full file read for one function or section | `token-goat read file::symbol`, about 85% smaller |
| `pytest` dumps 150 PASSED lines + dots + tracebacks | Failures-first view, 80 to 97% smaller |
| `npm install` floods deprecation warnings + spinner | Errors kept; warnings collapsed by package, ~90% smaller |
| `docker build` emits sha256 digests + transfer progress | Step headers + errors kept; noise dropped, ~75% smaller |
| `ruff` / `eslint` / `mypy` repeat the same rule 50 times | Grouped by rule with first 3 examples, ~80% smaller |
| Same `pytest` / `cargo` / `git log` re-run mid-session | Pre-Bash dedup hint points at the cached output (`token-goat bash-output <id>`) |
| Same `Grep` pattern re-run with hundreds of matches | Pre-Grep dedup hint quotes the prior match count |
| Same docs URL fetched twice | Pre-WebFetch dedup hint points at the cached body (`token-goat web-output <id>`) |
| `token-goat section pyproject.toml::tool.ruff` | One TOML table extracted instead of the whole config; same for `.yaml`/`.yml`/`.json`/`.ini`/`.cfg`/`.env`/`Dockerfile` |
| Typoed `token-goat symbol getUserr` | Auto-redirects to the unambiguous close match (use `--strict` to opt out) |

> Four hours of use on the author's machine: **59.7 MB** of data that never hit the model, with an estimated **11.5 million tokens** avoided.

Expand Down Expand Up @@ -215,7 +222,11 @@ The `--openclaw` flag patches Claude Code and drops a TypeScript bridge plugin i
| `token-goat semantic "<query>"` | Find code by meaning, not by filename. Tune with `--max-distance <float>` or `--no-rerank`. |
| `token-goat map` | Get a compact orientation of the repo. Add `--compact` to fit a 300-token budget. |
| `token-goat gdrive-sections <file-id>` | List the heading outline of a Google Doc without fetching the body. |
| `token-goat stats` | See how many tokens you have saved. Shows a per-source breakdown (image / hint / read / compact). |
| `token-goat stats` | See how many tokens you have saved. Shows a per-source breakdown (image / hint / read / compact / bash / web). |
| `token-goat bash-output <id>` | Retrieve a cached Bash output by ID. Filter with `--head N`, `--tail N`, or `--grep PATTERN` to avoid re-running the command. |
| `token-goat bash-history` | List cached Bash outputs (newest first) with their IDs, byte sizes, and exit codes. |
| `token-goat web-output <id>` | Retrieve a cached WebFetch response body by ID with the same `--head`/`--tail`/`--grep` slicers. |
| `token-goat web-history` | List cached WebFetch responses (newest first) with their IDs, byte sizes, status codes, and URL previews. |
| `token-goat compact-hint --session-id <id>` | Inspect the compaction manifest for a session |
| `token-goat install` | Wire up hooks and autostart. `--dry-run` previews the changes, `--verify` audits an existing install. |
| `token-goat doctor` | Confirm everything is wired correctly |
Expand Down
Loading
Loading