feat(sbx): Docker Sandboxes mixin kit (v1, shim tier) #303
Brainstormed design for the sub-plan that closes the Phase 1 Simple Query loop on top of 04b₂'s upstream wiring. Settles single-driver half-duplex flow, per-frame demux for result counters, RFQ-byte deny gating, parse-all-before-forward semantics, the §8 DBEvent schema extension, redaction-invariant statement_digest, and the real-pgx spine integration test scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 tasks: Normalize on Parser, source spans on ClassifiedStatement, DBEvent §8 sub-structs, Server scaffolding (MaxQueryBytes, atomic policy, per-dialect classifier map), connState extensions + RFQ-byte capture, simpleQueryLoop scaffold, frame budget cap, upstreamread demux + counters, deny synth helpers, eventbuilder with redaction + digest + sibling tagging, allow/deny handleQuery paths, handshake wiring + approve config-load warning, real-pgx spine test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread RawStmt.StmtLocation + StmtLen through classifyWithBackend to populate the SourceStart / SourceEnd fields added in Task 1. Handles pg_query's behavior of returning StmtLen=0 for trailing statements (use end-of-input boundary). Skip leading whitespace to get actual statement boundaries (needed for redaction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
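The boundary handling described in this commit can be sketched as follows. This is a minimal illustration, not the code in classifyWithBackend: the `span` helper and its parameters are hypothetical names, and it assumes the reported location and length are byte offsets into the original input.

```go
package main

import "fmt"

// span returns the [start, end) byte range of one statement inside sql.
// loc and length stand in for the StmtLocation / StmtLen values the commit
// mentions; length == 0 is treated as "runs to end of input", which is the
// trailing-statement case. Assumes 0 <= loc <= len(sql).
func span(sql string, loc, length int) (start, end int) {
	start, end = loc, loc+length
	if length == 0 || end > len(sql) {
		end = len(sql)
	}
	// Skip leading whitespace so the span starts at the statement itself.
	for start < end && isSpace(sql[start]) {
		start++
	}
	return start, end
}

func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

func main() {
	sql := "SELECT 1;  SELECT 2"
	s, e := span(sql, 9, 0) // trailing statement reported with length 0
	fmt.Printf("%d %d %q\n", s, e, sql[s:e])
}
```

Trimming the leading whitespace matters for redaction because the span is later used to cut the statement text out of the original input.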
Adds Normalize(sql) (string, error) method to the Parser interface, with:
- normalize_linux.go: CGO backend using pg_query.Normalize()
- normalize_other.go: WASM fallback using pgquery_wasm.Normalize()
The Normalize method returns SQL with all literal values replaced by $N placeholders, for use in statement_digest and parameters_redacted tiers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add EventTLS, EventDecision, EventResult, EventTxContext, EventPredicates struct types per spec §8. Extend DBEvent with these five fields (tls, decision, result, tx_context, predicates). Supports JSON round-trip with nullable integer fields (RowsReturned, RowsAffected). Add two new tests: TestDBEvent_Extended_RoundTrip validates round-trip of all new fields; TestDBEvent_Extended_RowsNull verifies null serialization of pointer fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
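The nullable-integer behaviour can be illustrated with a cut-down struct. This is a sketch only: the JSON tag names below are assumptions, and the real EventResult likely carries more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventResult is a hypothetical, reduced version of the struct named in the
// commit. Pointer fields without omitempty serialize as JSON null when nil.
type EventResult struct {
	RowsReturned *int64 `json:"rows_returned"`
	RowsAffected *int64 `json:"rows_affected"`
}

// encode marshals r and returns the JSON text (or an error description).
func encode(r EventResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// decode parses JSON back into an EventResult; null becomes a nil pointer.
func decode(s string) (EventResult, error) {
	var r EventResult
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	n := int64(3)
	fmt.Println(encode(EventResult{RowsReturned: &n}))
	fmt.Println(encode(EventResult{}))
}
```

Using pointers keeps "zero rows" distinguishable from "not applicable", which a plain integer field cannot express.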
- Config.MaxQueryBytes: caps 'Q' frame body; defaults to 1 MiB when zero
(applied in both the sentinel and normal New() paths).
- classifiers.go: buildClassifierMap constructs one Parser per distinct
dialect; New() rejects unknown dialect strings at construction time.
- Server.policyPtr (atomic.Pointer[RuleSet]) + SetPolicy/policy methods
enable hot-swap policy updates without lock contention.
- Fixes TestServer_StartTwice_ReturnsError: added missing Dialect field
("postgres") so it survives the new dialect validation gate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
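A rough sketch of the hot-swap pattern named above, using the standard library's generic `atomic.Pointer` (Go 1.19+). The `RuleSet` body is a placeholder; only the policyPtr, SetPolicy, and policy names come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RuleSet is a placeholder; the real type holds the parsed policy.
type RuleSet struct{ Name string }

// Server keeps the active policy behind an atomic pointer so readers never
// take a lock and writers swap the whole rule set in one step.
type Server struct {
	policyPtr atomic.Pointer[RuleSet]
}

// SetPolicy publishes a new rule set to all subsequent readers.
func (s *Server) SetPolicy(rs *RuleSet) { s.policyPtr.Store(rs) }

// policy returns the rule set that is current at the time of the call.
func (s *Server) policy() *RuleSet { return s.policyPtr.Load() }

func main() {
	var s Server
	s.SetPolicy(&RuleSet{Name: "v1"})
	snapshot := s.policy() // a query in flight keeps using this snapshot
	s.SetPolicy(&RuleSet{Name: "v2"})
	fmt.Println(snapshot.Name, s.policy().Name)
}
```

A caller that loads the pointer once per query sees a consistent rule set for that query even if the policy is swapped mid-flight.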
Design for a `kind: mixin` kit hosted at docker/sbx-kit/ that installs AgentSH into any Docker Sandbox at creation and routes the agent's command-level activity through a coding-agent-tuned policy. v1 ships the shim tier only; LD_PRELOAD and ptrace tiers are parked behind a forward-compatible tier label written to /run/agentsh/tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Realign §6/§7/§9/§10/§11 against actual codebase: the daemon is `agentsh server`, not `serve`; no `--user-config` flag exists so the bootstrap merges baked template + user override into /etc/agentsh/policies/default.yaml on each start; package paths match nfpms conventions (/usr/bin, /usr/lib/agentsh/shims, /usr/share/agentsh, /usr/share/doc/agentsh). curl|sh redirect downgraded to audit-only because agentsh-fetch doesn't exist yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11-task TDD plan implementing the spec at docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md. Tasks 1-2 land the coding-agent policy + merge helper; tasks 3-5 build out cmd/agentsh-sbx-bootstrap step-by-step (merge → daemon spawn → tier probe); task 6 packages the new artifacts via .goreleaser.yml; tasks 7-9 ship the policy reference, install.sh, and the kit tree itself; tasks 10-11 wire release publishing and gate on end-to-end build + manual sandbox matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…in kit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…package-caches Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix 1: move allow-package-caches before allow-home so the narrower rule is reachable (first-match-wins; allow-home /home/** was shadowing it)
- Fix 2: remove dead audit-curl-pipe-to-shell command rule; the shellc-opaque-script layer already blocks curl|sh; add comment noting a v1.1 agentsh-fetch redirect will replace it
- Fix 3: add @job to deny-signal-agentsh signals list so SIGSTOP cannot be used to pause the daemon unmonitored
- Fix 4: rename allow-package-installers to audit-package-installers and change decision from allow to audit (valid decision per pkg/types; engine handles it in CheckCommand)
- Fix 5: add TestAgentPolicies_CodingAgent to anchor the coding-agent template shape with floor assertions (>=9 file rules, >=2 cmd rules, >=3 signal rules)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
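The shadowing problem behind Fix 1 can be reproduced with a toy evaluator. This is a simplified sketch: it matches on path prefixes rather than the engine's glob patterns, and the cache path is an invented example.

```go
package main

import (
	"fmt"
	"strings"
)

// fileRule is a hypothetical, simplified rule: a name and a path prefix.
type fileRule struct {
	Name, Prefix string
}

// firstMatch returns the name of the first rule whose prefix matches path,
// or "" when nothing matches. Later rules are never consulted.
func firstMatch(rules []fileRule, path string) string {
	for _, r := range rules {
		if strings.HasPrefix(path, r.Prefix) {
			return r.Name
		}
	}
	return ""
}

func main() {
	home := fileRule{"allow-home", "/home/"}
	caches := fileRule{"allow-package-caches", "/home/agent/.cache/"}
	path := "/home/agent/.cache/pip/wheel"

	fmt.Println(firstMatch([]fileRule{home, caches}, path)) // broad rule first
	fmt.Println(firstMatch([]fileRule{caches, home}, path)) // narrow rule first
}
```

With the broad rule first, the narrower rule is dead code; placing it first makes it reachable without changing what the broad rule covers.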
…le formatting
- Update audit-package-installers description from stale "allow with audit" wording to "audit-log all package manager invocations" to match its actual decision
- Restore the full two-line doc comment for loadAgentDefaultEngine (first line was swallowed by the TestAgentPolicies_CodingAgent insertion block, leaving an orphaned continuation line above the function)
- Remove extra blank line between TestAgentPolicies_CodingAgent and loadAgentDefaultEngine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the base+overlay merge semantics for cmd/agentsh-sbx-bootstrap: overlay wins on name collision (replacement in-place), unknown overlay rules append in declared order. Covers FileRules, NetworkRules, CommandRules, UnixRules, and SignalRules. Base metadata (Version, Name, Description, ResourceLimits, EnvPolicy, Audit) is always preserved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MergeOverlay now merges DnsRedirectRules and ConnectRedirectRules by name (same pattern as FileRules/NetworkRules/CommandRules/UnixRules/SignalRules); previously overlay-provided redirect rules were silently discarded.
- Rewrote the MergeOverlay doc comment to enumerate all non-rule fields preserved from base, call out the shallow-copy/aliasing trap, and list which rule kinds are merged vs. base-wins.
- Extended TestMergeOverlay_PreservesAllRuleKinds to cover UnixRules, DnsRedirectRules, and ConnectRedirectRules.
- Added TestMergeOverlay_EmptyNameOverlayAppends to exercise the anonymous-rule append path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
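The merge semantics described in these two commits (replace in place on name collision, append unknown and anonymous rules in declared order) can be sketched for a single rule kind. The `rule` type and `mergeByName` helper are hypothetical; the real MergeOverlay applies the same pattern per rule kind.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for any of the named rule kinds.
type rule struct{ Name, Decision string }

// mergeByName returns a new slice: overlay rules replace base rules with the
// same name at the base rule's position, everything else is appended in the
// order the overlay declares it. Rules with an empty name always append.
func mergeByName(base, overlay []rule) []rule {
	out := make([]rule, len(base))
	copy(out, base) // copy so the caller's base slice is not modified
	index := make(map[string]int, len(out))
	for i, r := range out {
		if r.Name != "" {
			index[r.Name] = i
		}
	}
	for _, r := range overlay {
		if i, ok := index[r.Name]; ok && r.Name != "" {
			out[i] = r
			continue
		}
		out = append(out, r)
		if r.Name != "" {
			index[r.Name] = len(out) - 1
		}
	}
	return out
}

func main() {
	base := []rule{{"allow-home", "allow"}, {"deny-sudo", "deny"}}
	overlay := []rule{{"deny-sudo", "audit"}, {"allow-tmp", "allow"}, {"", "deny"}}
	for _, r := range mergeByName(base, overlay) {
		fmt.Printf("%q %s\n", r.Name, r.Decision)
	}
}
```

Preserving position matters under first-match-wins evaluation: replacing a rule where it stood keeps its precedence relative to the other base rules.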
Adds the bootstrap binary's startup policy-merge phase: reads the baked coding-agent template, optionally overlays /home/agent/.agentsh/policy.yaml via policy.MergeOverlay, and atomically writes the result to /etc/agentsh/policies/default.yaml. Missing/unparseable overlay is non-fatal (logs to stderr, falls back to template); missing template is fatal (exits 1). Five TDD tests cover the no-overlay, with-overlay, bad-overlay fallback, missing-template error, and atomic-write paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
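One common way to get the atomic-write property mentioned here is to write a temporary file in the destination directory and rename it over the target. The sketch below shows that technique under the assumption that both paths are on the same filesystem; it is not necessarily how the bootstrap binary implements it, and `atomicWrite` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file next to path, then renames it into
// place so readers see either the old or the new content, never a partial file.
func atomicWrite(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo overwrites the same target twice in a scratch directory and returns
// what a reader sees afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	target := filepath.Join(dir, "default.yaml")
	for _, v := range []string{"v1", "v2"} {
		if err := atomicWrite(target, []byte(v), 0o644); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(target)
	return string(b), err
}

func main() {
	fmt.Println(demo())
}
```

Creating the temp file in the target's own directory is what keeps the rename on one filesystem.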
…ntion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ror surfacing
- Add --daemon-log flag; pass *daemonLog to spawnDaemon instead of hardcoded defaultDaemonLog (Fix 1).
- Add clarifying comment on defaultBootstrapLog reserving the path for Task 5 / installers (Fix 1).
- Replace goroutine logF.Close() with synchronous close after cmd.Start (Fix 2).
- Quote sock path in fake-daemon test script to handle spaces (Fix 3).
- Surface non-ENOENT os.Stat errors in waitForSocket with a wrapped message instead of silently retrying (Fix 4).
- Add TestWaitForSocket_NonExistError to cover the new fast-fail branch via ENOTDIR (Fix 5).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
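The retry-versus-fast-fail split from Fix 4 might look roughly like this. It is an illustrative sketch: the signature, the `errTimeout` sentinel, and the polling interval are assumptions, and it only checks that the path exists, not that it is a socket.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// errTimeout is a hypothetical sentinel for "the socket never appeared".
var errTimeout = errors.New("timed out waiting for socket")

// waitForSocket polls path until it exists. A "does not exist" error means
// keep waiting; any other stat error is returned immediately, wrapped.
func waitForSocket(path string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		_, err := os.Stat(path)
		switch {
		case err == nil:
			return nil
		case !errors.Is(err, fs.ErrNotExist):
			return fmt.Errorf("stat %s: %w", path, err)
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// demo exercises both branches: a missing path (retry, then timeout) and a
// path whose parent is a regular file (ENOTDIR on Linux, fails at once).
func demo() (missing, notDir error) {
	dir, err := os.MkdirTemp("", "sock-demo")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	missing = waitForSocket(filepath.Join(dir, "agentsh.sock"), 30*time.Millisecond, 5*time.Millisecond)
	plain := filepath.Join(dir, "plain")
	if err := os.WriteFile(plain, nil, 0o644); err != nil {
		return missing, err
	}
	notDir = waitForSocket(filepath.Join(plain, "agentsh.sock"), time.Second, 5*time.Millisecond)
	return missing, notDir
}

func main() {
	a, b := demo()
	fmt.Println(a)
	fmt.Println(b)
}
```

Failing fast on errors such as ENOTDIR avoids burning the whole timeout on a condition that polling cannot fix.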
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…be test Replace direct *exec.ExitError type assertion with errors.As, move defaultShimDir to the package-level const block with the other defaults, and make TestProbeShimTier_RejectsRealCurl fatal on any error that isn't a clean exit-1 (curl not found) so real probe failures can't be masked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Detects dpkg/rpm/apk, resolves the latest release tag via the GitHub API (sed-only, no jq), and downloads the GoReleaser-produced artifact. Alpine (apk) falls back to the tar.gz archive since GoReleaser nfpms produces no .apk. AGENTSH_DRY_RUN=1 prints all actions without executing them; smoke-tested by install-agentsh_test.sh (shellcheck clean). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ch to real .apk GoReleaser nfpms now includes apk format so Alpine Linux gets a proper package with all mixin-kit artifacts (/usr/share/agentsh/coding-agent.template.yaml, /usr/lib/agentsh/shims/*, /etc/agentsh/config.yaml, etc.). The install script's apk branch is reverted from the tar.gz workaround to downloading the real .apk via `apk add --allow-untrusted`; the test assertion is updated to match the .apk URL pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spec.yaml, initFiles, startup, SKILL.md, policy override stub, coding-agent-smoke.sh, and a Go structural test for spec.yaml. All 6 spec_test.go functions pass; go test ./... -short clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds docker/sbx-kit/tests/run-e2e.sh — exercises the kit's mechanics against the public docker/sandbox-templates:shell-docker image with no sbx CLI required. Builds binaries on the host, mounts them into the sandbox-template container, simulates `sbx run --kit` install layout (binaries, policy template, shim symlinks, profile.d, environment.d, files/ tree, user override), runs agentsh-sbx-bootstrap, and verifies: tier file = shim; curl resolves under /usr/lib/agentsh/shims/; merged policy contains baked rule + appended override + replace-by-name paths; SKILL.md and override stub present. Also fixes a real bug surfaced by the E2E: probeShimTier used `/bin/sh -c '. /etc/profile.d/agentsh.sh 2>/dev/null || true; ...'`, which aborts before reaching `|| true` because bash-as-/bin/sh runs in POSIX mode where errors in the special builtin `.` exit the shell. Switched to `[ -r ... ] && . ...` which is portable across bash-POSIX, dash, and busybox sh. Without this fix every real sandbox would have recorded tier=none. `make sbx-e2e` runs the harness. Documented in docker/sbx-kit/README.md with the explicit list of what it does/doesn't verify (in-sandbox enforcement remains gated on a real `sbx run` against a tagged release). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
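The guarded-source fix described above can be sketched from the Go side as a small command builder. The helper names are hypothetical and the real probeShimTier runs a different probe command; the sketch also assumes the profile path contains no shell metacharacters.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// probeScript builds a /bin/sh script that sources profile only when it is
// readable, then runs probe. The `[ -r f ] && . f` form never invokes the
// special builtin `.` on a missing file, so the shell reaches probe.
func probeScript(profile, probe string) string {
	return fmt.Sprintf("[ -r %s ] && . %s; %s", profile, profile, probe)
}

// runProbe executes the script with /bin/sh and returns its trimmed stdout.
func runProbe(profile string) (string, error) {
	out, err := exec.Command("/bin/sh", "-c", probeScript(profile, "echo reached")).Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	// The profile path below is deliberately absent to exercise the guard.
	out, err := runProbe("/nonexistent/agentsh-profile.sh")
	fmt.Println(out, err)
}
```

The readability test sidesteps the shell-specific question of what happens when `.` fails, which is why it behaves the same under bash in POSIX mode, dash, and busybox sh.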
Added a container-simulated E2E test ( It boots a real Surfaced a real bug while writing it: Out of scope for this E2E (still gated on a real 🤖 Generated with Claude Code
Adds a follow-on design that turns the Docker Sandboxes mixin kit from "AgentSH alongside the agent" into "AgentSH owns the agent's lifecycle": wrapper symlinks at /usr/local/bin/<agent> route claude/opencode/gemini/codex/cursor launches through `agentsh wrap`, giving full exec-pipeline interception, a coherent session, and a session report. Fail-CLOSED deviation from the parent spec §7: if agentsh wrap cannot engage cleanly (binary missing, tier != shim, etc.), the wrapper exits non-zero and refuses to launch the agent. Operators choosing this kit choose enforcement-mandatory semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7-task TDD plan implementing the spec at docs/superpowers/specs/2026-05-11-sbx-agent-wrap.md. Tasks 1-2 land the wrapper + installer shell scripts with shell-driven tests (5+6 cases respectively, FAKE_ROOT test hook). Task 3 packages them via .goreleaser.yml. Task 4 wires the installer into the kit's spec.yaml install step. Task 5 extends docker/sbx-kit/tests/run-e2e.sh to assert wrap engages end-to-end with a fake agentsh binary. Task 6 updates the Go structural test for the new two-entry install block. Task 7 documents fail-closed semantics + the limitation that absolute-path entrypoints bypass the wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…osed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FAKE_ROOT is now only honored when AGENTSH_TEST=1 is also set, closing a path-substitution surface for sandboxed agents with env-var control.
- Fix 2: add comment above command -v explaining fail-closed behaviour for shell-function-named agentsh.
- Fix 3: run_wrap_no_agentsh uses an empty tempdir for PATH instead of /usr/bin:/bin, so the test is not sensitive to host agentsh installation.
- Fix 4: stderr message for missing real binary now includes the "refusing to launch $name" tail for grep-friendliness.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
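The FAKE_ROOT gating is implemented in the wrapper's shell script; the same decision is sketched here in Go purely to make the logic explicit. The `effectiveRoot` and `tierFile` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// effectiveRoot returns the path prefix to use for test fixtures. FAKE_ROOT
// is honored only when AGENTSH_TEST=1 is also set; otherwise it is ignored,
// so setting FAKE_ROOT alone cannot redirect the wrapper's lookups.
func effectiveRoot(getenv func(string) string) string {
	if getenv("AGENTSH_TEST") != "1" {
		return ""
	}
	return getenv("FAKE_ROOT")
}

// tierFile returns where the tier label would be read from.
func tierFile(getenv func(string) string) string {
	return effectiveRoot(getenv) + "/run/agentsh/tier"
}

func main() {
	fmt.Println(tierFile(os.Getenv))
}
```

Requiring two variables does not stop an agent that controls its whole environment, but it narrows accidental or single-variable substitution, which is the surface the commit describes.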
… on install Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…; strengthen idempotency
- Fix 1: three-way check in installer: silently skip symlinks that already point at WRAP (idempotent re-run); only warn for genuinely conflicting entries (regular file or symlink pointing elsewhere).
- Fix 2: Test 4b: foreign-symlink conflict scenario; verifies pre-existing symlink pointing to /opt/vendor/bin/claude is left untouched with warning.
- Fix 3: Test 6 uses capture_state() (filename + readlink target) instead of find|sort; also asserts second-run output is completely silent.
- Fix 4: FAKE_ROOT absolute-path hazard documented in header comment block.
- Fix 5: "wrapped $agent" success message routed to stderr for stream consistency with all other installer messages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two nfpms.contents entries so the W1/W2 auto-wrap harness scripts ship in every .deb/.rpm/.apk at /usr/lib/agentsh/ with mode 0755. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Check 8 side-loads agent-wrap.sh and install-agent-wrappers.sh into the container at the production layout, puts a fake agentsh on PATH that prints a recognizable marker, installs a fake /usr/bin/claude, runs the installer, and asserts the wrap chain fires (with args preserved) when claude is invoked from a login shell. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added auto-wrap of the agent harness via After this PR, the kit doesn't just install AgentSH alongside the agent — it owns the agent's lifecycle. After Fail-CLOSED deviation from parent spec §7 — explicit and documented. If the wrapper runs and AgentSH can't engage (binary missing, tier ≠ shim, tier file missing), it exits non-zero and refuses to launch the agent. This kit's purpose is enforcement; running unenforced is not a supported state. The parent spec's "never bricks the sandbox" still governs the bootstrap; this section governs agent launch time. What ships:
Verification (worktree local):
Out of scope per spec §3: no env-var opt-in (auto-wrap is the default), no manual 🤖 Generated with Claude Code
v1 assumed /usr/local/bin precedes real agent install locations in PATH.
Probing docker/sandbox-templates:opencode revealed opencode lives at
/usr/local/share/npm-global/bin/opencode, which precedes /usr/local/bin
in PATH. The v1 wrapper would never have fired against any real agent kit.
v2 switches to move-aside-and-replace at the discovered binary location:
discover via command -v, rename to .real, drop a symlink to agent-wrap
at the original path. Wrapper now derives the real binary from ${0}.real
instead of a fixed /usr/bin/<name>.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch real-binary resolution from the v1 fixed-path /usr/bin/<name> to
${0}.real so the wrapper works regardless of where the agent binary lives
(e.g. /usr/local/share/npm-global/bin/opencode). FAKE_ROOT test hook is
kept but now only gates the tier_file path, not the real-binary path.
Tests rewritten to place the fake binary at <symlink>.real.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-v, move-aside
Replace v1 fixed-path probe+symlink design with v2 move-aside-and-replace:
- Discover each agent via `command -v` (scoped through _AGENT_PATH / FAKE_TEST_PATH) so the wrapper lands at the exact path the agent kit installed the binary.
- Rename discovered binary to <path>.real, symlink original location → agent-wrap.
- FAKE_ROOT and FAKE_TEST_PATH both gated behind AGENTSH_TEST=1.
- Test harness fully rewritten: 6 cases covering no agents, one agent, multiple agents, foreign .real conflict, missing wrap, and idempotency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
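The installer itself is a shell script; the move-aside-and-replace step is sketched below in Go to show the order of operations. The `moveAside` helper is hypothetical, and refusing when `<path>.real` already exists is an assumed conflict policy, not a confirmed detail of the installer.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// moveAside renames agentPath to agentPath+".real" and leaves a symlink to
// wrapPath at the original location. Re-running it is a no-op.
func moveAside(agentPath, wrapPath string) error {
	// Idempotent re-run: the path already points at the wrapper.
	if target, err := os.Readlink(agentPath); err == nil && target == wrapPath {
		return nil
	}
	realPath := agentPath + ".real"
	if _, err := os.Lstat(realPath); err == nil {
		return fmt.Errorf("%s already exists; refusing to overwrite", realPath)
	} else if !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	if err := os.Rename(agentPath, realPath); err != nil {
		return err
	}
	if err := os.Symlink(wrapPath, agentPath); err != nil {
		os.Rename(realPath, agentPath) // best-effort rollback
		return err
	}
	return nil
}

// demo runs moveAside twice in a scratch directory and reports the layout.
func demo() (bool, string, error) {
	dir, err := os.MkdirTemp("", "wrap-demo")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)
	wrap := filepath.Join(dir, "agent-wrap")
	agent := filepath.Join(dir, "opencode")
	if err := os.WriteFile(wrap, []byte("wrapper"), 0o755); err != nil {
		return false, "", err
	}
	if err := os.WriteFile(agent, []byte("real agent"), 0o755); err != nil {
		return false, "", err
	}
	for i := 0; i < 2; i++ {
		if err := moveAside(agent, wrap); err != nil {
			return false, "", err
		}
	}
	target, err := os.Readlink(agent)
	if err != nil {
		return false, "", err
	}
	body, err := os.ReadFile(agent + ".real")
	return target == wrap, string(body), err
}

func main() {
	fmt.Println(demo())
}
```

Because the symlink sits at the exact path the agent kit installed, the wrapper fires regardless of PATH order, and the wrapper can find the original binary by appending `.real` to its own invocation path.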
…de template Replace Section 8's fake-claude stub with a live check against docker/sandbox-templates:opencode. Pulls the image, side-loads a stub agentsh + the real agent-wrap and installer, then verifies the move-aside-and-replace layout and the full wrap-chain invocation. Load-bearing assertion: opencode lives at /usr/local/share/npm-global/bin/opencode (not /usr/local/bin/opencode) because the image's PATH puts npm-global first — this is the exact bug the v1 installer would have missed. Section 8 now fails if that path changes. SKIPs cleanly on image-pull failure so CI without hub access is not broken. Sections 1–7 (shell-docker kit mechanics) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the v1 PATH-precedence language and the absolute-path-entrypoint caveat (move-aside doesn't depend on PATH order). Replace the /usr/local/bin/<agent> reference with the parameterized "<original agent path>" so the docs match what the installer actually does — drops a symlink at wherever `command -v` finds the agent (e.g. /usr/local/share/npm-global/bin/opencode for the opencode template). Add the uninstall caveat: the installer renames files the agent kit shipped, so clean recovery requires restoring <path>.real -> <path>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ates Section 8 was opencode-only after the W5-redo. Refactor it into a parameterized check_real_agent() function and run it against every publicly available docker/sandbox-templates: image (opencode, gemini, codex). claude isn't published, so it's not tested. This pins the load-bearing assertion — that all three real-agent templates install their binaries at /usr/local/share/npm-global/bin/<agent>, which precedes /usr/local/bin in PATH — across the entire fleet, not just one example. Locks down the v1 design bug from regressing. Also widens the top-level cleanup trap to remove any per-agent containers left behind on early exit, replacing in-loop trap chaining that broke shell quoting at script end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Ships a Docker Sandboxes mixin kit at `docker/sbx-kit/` so AgentSH can be installed into any sandbox at creation:
- … `/usr/lib/agentsh/shims/` on PATH). LD_PRELOAD and ptrace tiers are parked behind forward-compatible labels in `/run/agentsh/tier`.
- … `/home/**` and `/root/`), self-protects `/etc/agentsh/`, `/usr/lib/agentsh/`, etc., soft-deletes workspace files (recoverable), denies sudo/su/doas, blocks signals to PID 1 and `agentsh*` (including `@job` to prevent SIGSTOP freezing the daemon), audits package installers. Outbound network controls intentionally left to the Docker Sandbox proxy.
- … `/workspace/.claude/skills/agentsh/SKILL.md` and a human-facing reference at `/usr/share/doc/agentsh/policy-reference.md`. Users extend by writing to `/home/agent/.agentsh/policy.yaml`; bootstrap merges on each start.

New code surface
- `configs/policies/coding-agent.yaml`
- `internal/policy/merge.go`: `MergeOverlay`, position-preserving rule merge (7 rule kinds)
- `cmd/agentsh-sbx-bootstrap/`: `agentsh server` → wait for socket → probe shim tier → write `/run/agentsh/tier`
- `scripts/install-agentsh.sh`
- `docker/sbx-kit/`: `spec.yaml`, README, SKILL.md, override stub, smoke test, Go structural test
- `docs/policy-reference.md`: `/usr/share/doc/agentsh/`

Packaging (`.goreleaser.yml`)
- `sbx-bootstrap-linux` build for `/usr/bin/agentsh-sbx-bootstrap`
- `/usr/lib/agentsh/shims/` (bash, sh, curl, wget, pip, pip3, npm, node, git, python, python3, rm)
- `/usr/share/agentsh/coding-agent.template.yaml`
- `/usr/share/doc/agentsh/policy-reference.md`
- `apk` added to nfpms formats so Alpine sandboxes get a real package
- `install.sh` published as a release asset via `release.extra_files`

Spec + plan
- `docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md`
- `docs/superpowers/plans/2026-05-11-docker-sandboxes-mixin-kit.md`

Test plan
Verified locally:
- `go build ./...` clean
- `GOOS=windows go build ./...` clean
- `go test ./internal/policy/... ./cmd/agentsh-sbx-bootstrap/... ./docker/sbx-kit/...` all green (new tests added in each)
- `./scripts/install-agentsh_test.sh` passes (5 dry-run scenarios)
- `goreleaser check` clean
- `go test ./...` clean (one flake in `internal/store/watchtower/transport`, pre-existing per known-flake notes; passes on retry)

Deferred to a live Docker Sandboxes environment (no automated CI for v1):
- `sbx run claude --kit git+https://github.com/canyonroad/agentsh.git#dir=docker/sbx-kit&ref=feature/docker-sbx-mixin-kit`: verify `cat /run/agentsh/tier` returns `shim`, run `docker/sbx-kit/tests/coding-agent-smoke.sh`
- … `opencode` and `gemini` agent kits
- `install.sh` resolves at `https://github.com/canyonroad/agentsh/releases/latest/download/install.sh` once a release is tagged on this repo (note: the kit's `spec.yaml` currently curls from `erans/agentsh` per the design; update the URL if you'd prefer canyonroad as the primary)

Notes for reviewers
- `spec.yaml` curls `install.sh` from `github.com/erans/agentsh`. If the canyonroad fork should be canonical, swap that URL (and adjust the test's expected hostname accordingly).
- `release.extra_files: install.sh` will only attach the file once a real release tag fires the workflow; `goreleaser release --snapshot --skip=publish` couldn't be fully exercised locally because the arm64 cross-compiler isn't present, but `goreleaser check` validates the config.

🤖 Generated with Claude Code