feat(sbx): Docker Sandboxes mixin kit (v1, shim tier) by erans · Pull Request #303 · canyonroad/agentsh

erans · 2026-05-11T20:08:07Z

Summary

Ships a Docker Sandboxes mixin kit at docker/sbx-kit/ so AgentSH can be installed into any sandbox at creation:

sbx run <agent> --kit git+https://github.com/erans/agentsh.git#dir=docker/sbx-kit

v1 enforcement tier: shim only (subprocess-exec interception via /usr/lib/agentsh/shims/ on PATH). LD_PRELOAD and ptrace tiers are parked behind forward-compatible labels in /run/agentsh/tier.
Coding-agent-tuned policy: denies credential paths (.ssh/.aws/.gnupg/.kube/.docker/.netrc/gcloud/gh/git-credentials in both /home/** and /root/), self-protects /etc/agentsh/, /usr/lib/agentsh/, etc., soft-deletes workspace files (recoverable), denies sudo/su/doas, blocks signals to PID 1 and agentsh* (including @job to prevent SIGSTOP freezing the daemon), audits package installers. Outbound network controls intentionally left to the Docker Sandbox proxy.
Self-teaching: drops a Claude Code SKILL at /workspace/.claude/skills/agentsh/SKILL.md and a human-facing reference at /usr/share/doc/agentsh/policy-reference.md. Users extend by writing to /home/agent/.agentsh/policy.yaml — bootstrap merges on each start.
Fail-open semantics throughout: broken overlay → fall back to template; daemon doesn't start → tier=none + loud log; never bricks the sandbox.

New code surface

| Path | Role |
| --- | --- |
| `configs/policies/coding-agent.yaml` | Baked-in policy template |
| `internal/policy/merge.go` | `MergeOverlay`: position-preserving rule merge (7 rule kinds) |
| `cmd/agentsh-sbx-bootstrap/` | New Go binary: merge policy → spawn `agentsh server` → wait for socket → probe shim tier → write `/run/agentsh/tier` |
| `scripts/install-agentsh.sh` | dpkg/rpm/apk installer (uses GitHub Releases API, no jq) |
| `docker/sbx-kit/` | The mixin kit: `spec.yaml`, README, SKILL.md, override stub, smoke test, Go structural test |
| `docs/policy-reference.md` | Grammar reference packaged into `/usr/share/doc/agentsh/` |

Packaging (`.goreleaser.yml`)

New sbx-bootstrap-linux build for /usr/bin/agentsh-sbx-bootstrap
12 shim symlinks under /usr/lib/agentsh/shims/ (bash, sh, curl, wget, pip, pip3, npm, node, git, python, python3, rm)
Policy template at /usr/share/agentsh/coding-agent.template.yaml
Policy reference at /usr/share/doc/agentsh/policy-reference.md
apk added to nfpms formats so Alpine sandboxes get a real package
install.sh published as a release asset via release.extra_files

Spec + plan

Design: docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md
Plan: docs/superpowers/plans/2026-05-11-docker-sandboxes-mixin-kit.md

Test plan

Verified locally:

go build ./... clean
GOOS=windows go build ./... clean
go test ./internal/policy/... ./cmd/agentsh-sbx-bootstrap/... ./docker/sbx-kit/... all green (new tests added in each)
./scripts/install-agentsh_test.sh passes (5 dry-run scenarios)
goreleaser check clean
Bootstrap smoke ran locally with fake daemon: shim tier detected, tier file written, fail-open behaviour confirmed
Full go test ./... clean (one flake in internal/store/watchtower/transport — pre-existing per known-flake notes; passes on retry)

Deferred to a live Docker Sandboxes environment (no automated CI for v1):

sbx run claude --kit git+https://github.com/canyonroad/agentsh.git#dir=docker/sbx-kit&ref=feature/docker-sbx-mixin-kit — verify cat /run/agentsh/tier returns shim, run docker/sbx-kit/tests/coding-agent-smoke.sh
Same against opencode and gemini agent kits
Confirm install.sh resolves at https://github.com/canyonroad/agentsh/releases/latest/download/install.sh once a release is tagged on this repo (note: the kit's spec.yaml currently curls from erans/agentsh per the design — update the URL if you'd prefer canyonroad as the primary)

Notes for reviewers

The kit's spec.yaml curls install.sh from github.com/erans/agentsh. If the canyonroad fork should be canonical, swap that URL (and adjust the test's expected hostname accordingly).
release.extra_files: install.sh will only attach the file once a real release tag fires the workflow; goreleaser release --snapshot --skip=publish couldn't be fully exercised locally because the arm64 cross-compiler isn't present, but goreleaser check validates the config.
Out of scope (parked, per spec §13): LD_PRELOAD tier, ptrace tier, OCI registry publishing of the kit itself, submission to docker/sbx-kits-contrib.

🤖 Generated with Claude Code

Brainstormed design for the sub-plan that closes the Phase 1 Simple Query loop on top of 04b₂'s upstream wiring. Settles single-driver half-duplex flow, per-frame demux for result counters, RFQ-byte deny gating, parse-all-before-forward semantics, the §8 DBEvent schema extension, redaction-invariant statement_digest, and the real-pgx spine integration test scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

15 tasks: Normalize on Parser, source spans on ClassifiedStatement, DBEvent §8 sub-structs, Server scaffolding (MaxQueryBytes, atomic policy, per-dialect classifier map), connState extensions + RFQ-byte capture, simpleQueryLoop scaffold, frame budget cap, upstreamread demux + counters, deny synth helpers, eventbuilder with redaction + digest + sibling tagging, allow/deny handleQuery paths, handshake wiring + approve config-load warning, real-pgx spine test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Thread RawStmt.StmtLocation + StmtLen through classifyWithBackend to populate the SourceStart / SourceEnd fields added in Task 1. Handles pg_query's behavior of returning StmtLen=0 for trailing statements (use end-of-input boundary). Skip leading whitespace to get actual statement boundaries (needed for redaction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
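The span handling in this commit message can be sketched roughly as below. This is only an illustration, not the PR's `classifyWithBackend`: `stmtSpan` is a hypothetical helper, and it assumes the location and length are byte offsets into the original query string.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// stmtSpan turns a (location, length) pair in the style of
// RawStmt.StmtLocation / StmtLen into byte offsets [start, end) in input.
// Hypothetical helper for illustration only.
func stmtSpan(input string, loc, length int) (start, end int) {
	if loc < 0 {
		loc = 0
	}
	if loc > len(input) {
		loc = len(input)
	}
	end = loc + length
	// A zero length marks a trailing statement: it runs to end of input.
	if length <= 0 || end > len(input) {
		end = len(input)
	}
	start = loc
	// Skip leading whitespace one rune at a time so that multi-byte
	// runes are never split.
	for start < end {
		r, size := utf8.DecodeRuneInString(input[start:end])
		if !unicode.IsSpace(r) {
			break
		}
		start += size
	}
	return start, end
}

func main() {
	sql := "SELECT 1;  SELECT 'é'"
	s1, e1 := stmtSpan(sql, 0, 8) // first statement, explicit length
	s2, e2 := stmtSpan(sql, 9, 0) // trailing statement, length reported as 0
	fmt.Printf("%q %q\n", sql[s1:e1], sql[s2:e2])
}
```

Trimming the leading whitespace matters for redaction because the recorded span should start at the statement's first token, not at the separator that precedes it.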

Adds Normalize(sql) (string, error) method to the Parser interface, with:

- normalize_linux.go: CGO backend using pg_query.Normalize()
- normalize_other.go: WASM fallback using pgquery_wasm.Normalize()

The Normalize method returns SQL with all literal values replaced by $N placeholders, for use in statement_digest and parameters_redacted tiers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add EventTLS, EventDecision, EventResult, EventTxContext, EventPredicates struct types per spec §8. Extend DBEvent with these five fields (tls, decision, result, tx_context, predicates). Supports JSON round-trip with nullable integer fields (RowsReturned, RowsAffected). Add two new tests: TestDBEvent_Extended_RoundTrip validates round-trip of all new fields; TestDBEvent_Extended_RowsNull verifies null serialization of pointer fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
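A small sketch of why pointer fields give a null-preserving round trip. The `EventResult` name and the `RowsReturned` / `RowsAffected` fields come from the commit message; the JSON tag names and the `encode` / `decode` helpers are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventResult mirrors the shape described in the commit message.
// The JSON tag names here are assumed, not taken from the PR.
type EventResult struct {
	RowsReturned *int64 `json:"rows_returned"`
	RowsAffected *int64 `json:"rows_affected"`
}

// encode marshals r; a nil pointer field is emitted as JSON null.
func encode(r EventResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

// decode unmarshals s; JSON null leaves the pointer field nil.
func decode(s string) (EventResult, error) {
	var r EventResult
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	n := int64(3)
	out := encode(EventResult{RowsReturned: &n})
	fmt.Println(out)

	back, err := decode(out)
	fmt.Println(*back.RowsReturned, back.RowsAffected == nil, err)
}
```

With a plain `int64` field, "no rows reported" and "zero rows" would both serialize as `0`; the pointer keeps the two cases distinct.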

- Config.MaxQueryBytes: caps 'Q' frame body; defaults to 1 MiB when zero (applied in both the sentinel and normal New() paths).
- classifiers.go: buildClassifierMap constructs one Parser per distinct dialect; New() rejects unknown dialect strings at construction time.
- Server.policyPtr (atomic.Pointer[RuleSet]) + SetPolicy/policy methods enable hot-swap policy updates without lock contention.
- Fixes TestServer_StartTwice_ReturnsError: added missing Dialect field ("postgres") so it survives the new dialect validation gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
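The hot-swap and default-cap behaviour listed above might look roughly like this. The type and method names follow the commit message, but the bodies, the `RuleSet` contents, and the `defaultMaxQueryBytes` constant are assumptions for illustration; `atomic.Pointer` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RuleSet is a stand-in; the real type is not shown in the PR text.
type RuleSet struct{ Name string }

// defaultMaxQueryBytes is an assumed constant name for the 1 MiB default.
const defaultMaxQueryBytes = 1 << 20

type Config struct{ MaxQueryBytes int }

type Server struct {
	maxQueryBytes int
	policyPtr     atomic.Pointer[RuleSet]
}

// New applies the 1 MiB default when the configured cap is zero.
func New(cfg Config) *Server {
	s := &Server{maxQueryBytes: cfg.MaxQueryBytes}
	if s.maxQueryBytes == 0 {
		s.maxQueryBytes = defaultMaxQueryBytes
	}
	return s
}

// SetPolicy publishes a new rule set; readers see either the old or the
// new pointer, never a partially updated one.
func (s *Server) SetPolicy(rs *RuleSet) { s.policyPtr.Store(rs) }

// policy returns the current rule set, or nil if none has been set.
func (s *Server) policy() *RuleSet { return s.policyPtr.Load() }

func main() {
	s := New(Config{})
	s.SetPolicy(&RuleSet{Name: "v1"})
	s.SetPolicy(&RuleSet{Name: "v2"}) // hot swap, no mutex involved
	fmt.Println(s.maxQueryBytes, s.policy().Name)
}
```

A connection handler would typically call `policy()` once per query and use that snapshot throughout, so a concurrent swap cannot change the rules mid-decision.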

Design for a `kind: mixin` kit hosted at docker/sbx-kit/ that installs AgentSH into any Docker Sandbox at creation and routes the agent's command-level activity through a coding-agent-tuned policy. v1 ships the shim tier only; LD_PRELOAD and ptrace tiers are parked behind a forward-compatible tier label written to /run/agentsh/tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Realign §6/§7/§9/§10/§11 against actual codebase: the daemon is `agentsh server`, not `serve`; no `--user-config` flag exists so the bootstrap merges baked template + user override into /etc/agentsh/policies/default.yaml on each start; package paths match nfpms conventions (/usr/bin, /usr/lib/agentsh/shims, /usr/share/agentsh, /usr/share/doc/agentsh). curl|sh redirect downgraded to audit-only because agentsh-fetch doesn't exist yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

11-task TDD plan implementing the spec at docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md. Tasks 1-2 land the coding-agent policy + merge helper; tasks 3-5 build out cmd/agentsh-sbx-bootstrap step-by-step (merge → daemon spawn → tier probe); task 6 packages the new artifacts via .goreleaser.yml; tasks 7-9 ship the policy reference, install.sh, and the kit tree itself; tasks 10-11 wire release publishing and gate on end-to-end build + manual sandbox matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…in kit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…package-caches Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@job

- Fix 1: move allow-package-caches before allow-home so the narrower rule is reachable (first-match-wins; allow-home /home/** was shadowing it)
- Fix 2: remove dead audit-curl-pipe-to-shell command rule; the shellc-opaque-script layer already blocks curl|sh; add comment noting a v1.1 agentsh-fetch redirect will replace it
- Fix 3: add @job to deny-signal-agentsh signals list so SIGSTOP cannot be used to pause the daemon unmonitored
- Fix 4: rename allow-package-installers to audit-package-installers and change decision from allow to audit (valid decision per pkg/types; engine handles it in CheckCommand)
- Fix 5: add TestAgentPolicies_CodingAgent to anchor the coding-agent template shape with floor assertions (>=9 file rules, >=2 cmd rules, >=3 signal rules)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…le formatting

- Update audit-package-installers description from stale "allow with audit" wording to "audit-log all package manager invocations" to match its actual decision
- Restore the full two-line doc comment for loadAgentDefaultEngine (first line was swallowed by the TestAgentPolicies_CodingAgent insertion block, leaving an orphaned continuation line above the function)
- Remove extra blank line between TestAgentPolicies_CodingAgent and loadAgentDefaultEngine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements the base+overlay merge semantics for cmd/agentsh-sbx-bootstrap: overlay wins on name collision (replacement in-place), unknown overlay rules append in declared order. Covers FileRules, NetworkRules, CommandRules, UnixRules, and SignalRules. Base metadata (Version, Name, Description, ResourceLimits, EnvPolicy, Audit) is always preserved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- MergeOverlay now merges DnsRedirectRules and ConnectRedirectRules by name (same pattern as FileRules/NetworkRules/CommandRules/UnixRules/SignalRules); previously overlay-provided redirect rules were silently discarded.
- Rewrote the MergeOverlay doc comment to enumerate all non-rule fields preserved from base, call out the shallow-copy/aliasing trap, and list which rule kinds are merged vs. base-wins.
- Extended TestMergeOverlay_PreservesAllRuleKinds to cover UnixRules, DnsRedirectRules, and ConnectRedirectRules.
- Added TestMergeOverlay_EmptyNameOverlayAppends to exercise the anonymous-rule append path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
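The merge semantics described in these two commits (replace in place on name collision, append unknown and unnamed rules in declared order) can be illustrated with a single rule kind. This is a sketch, not the code in `internal/policy/merge.go`: the `Rule` type and `mergeByName` are hypothetical, and how duplicate names within one overlay are handled is an assumption.

```go
package main

import "fmt"

// Rule is a simplified stand-in for one policy rule kind.
type Rule struct {
	Name     string
	Decision string
}

// mergeByName returns base with overlay applied:
//   - an overlay rule whose name matches a base rule replaces it in place,
//     so the rule keeps its position in a first-match-wins list;
//   - any other overlay rule (including one with an empty name) is
//     appended in the order it was declared.
//
// The result is a fresh slice, so base is not mutated.
func mergeByName(base, overlay []Rule) []Rule {
	out := make([]Rule, len(base), len(base)+len(overlay))
	copy(out, base)

	index := make(map[string]int, len(base))
	for i, r := range out {
		if r.Name == "" {
			continue
		}
		if _, seen := index[r.Name]; !seen {
			index[r.Name] = i
		}
	}

	for _, r := range overlay {
		if r.Name != "" {
			if i, ok := index[r.Name]; ok {
				out[i] = r // replacement in place
				continue
			}
			index[r.Name] = len(out)
		}
		out = append(out, r)
	}
	return out
}

func main() {
	base := []Rule{{"deny-credentials", "deny"}, {"allow-home", "allow"}}
	overlay := []Rule{{"allow-home", "audit"}, {"allow-scratch", "allow"}, {"", "deny"}}
	for _, r := range mergeByName(base, overlay) {
		fmt.Printf("%q -> %s\n", r.Name, r.Decision)
	}
}
```

Preserving position is what lets a user override tighten or loosen an existing rule without accidentally moving it behind a broader rule that would shadow it.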

Adds the bootstrap binary's startup policy-merge phase: reads the baked coding-agent template, optionally overlays /home/agent/.agentsh/policy.yaml via policy.MergeOverlay, and atomically writes the result to /etc/agentsh/policies/default.yaml. Missing/unparseable overlay is non-fatal (logs to stderr, falls back to template); missing template is fatal (exits 1). Five TDD tests cover the no-overlay, with-overlay, bad-overlay fallback, missing-template error, and atomic-write paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ntion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ror surfacing

- Add --daemon-log flag; pass *daemonLog to spawnDaemon instead of hardcoded defaultDaemonLog (Fix 1).
- Add clarifying comment on defaultBootstrapLog reserving the path for Task 5 / installers (Fix 1).
- Replace goroutine logF.Close() with synchronous close after cmd.Start (Fix 2).
- Quote sock path in fake-daemon test script to handle spaces (Fix 3).
- Surface non-ENOENT os.Stat errors in waitForSocket with a wrapped message instead of silently retrying (Fix 4).
- Add TestWaitForSocket_NonExistError to cover the new fast-fail branch via ENOTDIR (Fix 5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…be test Replace direct *exec.ExitError type assertion with errors.As, move defaultShimDir to the package-level const block with the other defaults, and make TestProbeShimTier_RejectsRealCurl fatal on any error that isn't a clean exit-1 (curl not found) so real probe failures can't be masked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Detects dpkg/rpm/apk, resolves the latest release tag via the GitHub API (sed-only, no jq), and downloads the GoReleaser-produced artifact. Alpine (apk) falls back to the tar.gz archive since GoReleaser nfpms produces no .apk. AGENTSH_DRY_RUN=1 prints all actions without executing them; smoke-tested by install-agentsh_test.sh (shellcheck clean). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ch to real .apk GoReleaser nfpms now includes apk format so Alpine Linux gets a proper package with all mixin-kit artifacts (/usr/share/agentsh/coding-agent.template.yaml, /usr/lib/agentsh/shims/*, /etc/agentsh/config.yaml, etc.). The install script's apk branch is reverted from the tar.gz workaround to downloading the real .apk via `apk add --allow-untrusted`; the test assertion is updated to match the .apk URL pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

spec.yaml, initFiles, startup, SKILL.md, policy override stub, coding-agent-smoke.sh, and a Go structural test for spec.yaml. All 6 spec_test.go functions pass; go test ./... -short clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds docker/sbx-kit/tests/run-e2e.sh — exercises the kit's mechanics against the public docker/sandbox-templates:shell-docker image with no sbx CLI required. Builds binaries on the host, mounts them into the sandbox-template container, simulates `sbx run --kit` install layout (binaries, policy template, shim symlinks, profile.d, environment.d, files/ tree, user override), runs agentsh-sbx-bootstrap, and verifies: tier file = shim; curl resolves under /usr/lib/agentsh/shims/; merged policy contains baked rule + appended override + replace-by-name paths; SKILL.md and override stub present. Also fixes a real bug surfaced by the E2E: probeShimTier used `/bin/sh -c '. /etc/profile.d/agentsh.sh 2>/dev/null || true; ...'`, which aborts before reaching `|| true` because bash-as-/bin/sh runs in POSIX mode where errors in the special builtin `.` exit the shell. Switched to `[ -r ... ] && . ...` which is portable across bash-POSIX, dash, and busybox sh. Without this fix every real sandbox would have recorded tier=none. `make sbx-e2e` runs the harness. Documented in docker/sbx-kit/README.md with the explicit list of what it does/doesn't verify (in-sandbox enforcement remains gated on a real `sbx run` against a tagged release). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erans · 2026-05-11T20:32:40Z

Added a container-simulated E2E test (make sbx-e2e or docker/sbx-kit/tests/run-e2e.sh) — commit 935f7e7.

It boots a real docker/sandbox-templates:shell-docker container (the same image the actual Docker Sandboxes agent kits derive from), lays down the post-install state sbx run --kit would produce, runs agentsh-sbx-bootstrap, and verifies 7 checks: tier file = shim, curl resolves under the shim dir, merged policy contains the baked rule + appended override + replace-by-name overlay paths, SKILL.md and override stub present. No sbx CLI required; runs on any host with Docker + Go.

Surfaced a real bug while writing it: probeShimTier used . /etc/profile.d/agentsh.sh 2>/dev/null || true which silently aborts the probe in bash-POSIX mode (the . is a special builtin and its failure exits the shell before || true runs). Switched to [ -r ... ] && . .... Without this, every real sandbox would have recorded tier=none.
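A rough sketch of the guarded probe, under stated assumptions: the comment elides the rest of the shell command, so the `command -v curl` tail, the `probeScript` helper, and the `underShimDir` check below are illustrative guesses at the shape, not the contents of `probeShimTier`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// probeScript builds a /bin/sh -c script that sources profile only when it
// is readable. Guarding with [ -r ] avoids invoking the special builtin "."
// on a missing file, which would exit a POSIX-mode shell outright.
// The "command -v curl" tail is an assumption for illustration.
func probeScript(profile string) string {
	return fmt.Sprintf("[ -r %[1]s ] && . %[1]s; command -v curl", profile)
}

// underShimDir reports whether resolved lies strictly inside shimDir.
func underShimDir(resolved, shimDir string) bool {
	rel, err := filepath.Rel(shimDir, filepath.Clean(resolved))
	if err != nil {
		return false
	}
	if rel == "." || rel == ".." {
		return false
	}
	return !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(probeScript("/etc/profile.d/agentsh.sh"))
	fmt.Println(underShimDir("/usr/lib/agentsh/shims/curl", "/usr/lib/agentsh/shims"))
	fmt.Println(underShimDir("/usr/bin/curl", "/usr/lib/agentsh/shims"))
}
```

Comparing with `filepath.Rel` rather than a bare string prefix avoids treating a sibling such as `/usr/lib/agentsh/shims-old/curl` as inside the shim directory.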

Out of scope for this E2E (still gated on a real sbx run against a tagged release): the install.sh download path and in-sandbox enforcement of deny/audit/soft_delete (that needs agentsh server with libseccomp). Documented in the kit README.

🤖 Generated with Claude Code

Adds a follow-on design that turns the Docker Sandboxes mixin kit from "AgentSH alongside the agent" into "AgentSH owns the agent's lifecycle": wrapper symlinks at /usr/local/bin/<agent> route claude/opencode/gemini/codex/cursor launches through `agentsh wrap`, giving full exec-pipeline interception, a coherent session, and a session report. Fail-CLOSED deviation from the parent spec §7: if agentsh wrap cannot engage cleanly (binary missing, tier != shim, etc.), the wrapper exits non-zero and refuses to launch the agent. Operators choosing this kit choose enforcement-mandatory semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

7-task TDD plan implementing the spec at docs/superpowers/specs/2026-05-11-sbx-agent-wrap.md. Tasks 1-2 land the wrapper + installer shell scripts with shell-driven tests (5+6 cases respectively, FAKE_ROOT test hook). Task 3 packages them via .goreleaser.yml. Task 4 wires the installer into the kit's spec.yaml install step. Task 5 extends docker/sbx-kit/tests/run-e2e.sh to assert wrap engages end-to-end with a fake agentsh binary. Task 6 updates the Go structural test for the new two-entry install block. Task 7 documents fail-closed semantics + the limitation that absolute-path entrypoints bypass the wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…osed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- FAKE_ROOT is now only honored when AGENTSH_TEST=1 is also set, closing a path-substitution surface for sandboxed agents with env-var control.
- Fix 2: add comment above command -v explaining fail-closed behaviour for shell-function-named agentsh.
- Fix 3: run_wrap_no_agentsh uses an empty tempdir for PATH instead of /usr/bin:/bin, so the test is not sensitive to host agentsh installation.
- Fix 4: stderr message for missing real binary now includes the "refusing to launch $name" tail for grep-friendliness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… on install Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…; strengthen idempotency

- Fix 1: three-way check in installer: silently skip symlinks that already point at WRAP (idempotent re-run); only warn for genuinely conflicting entries (regular file or symlink pointing elsewhere).
- Fix 2: Test 4b, a foreign-symlink conflict scenario; verifies pre-existing symlink pointing to /opt/vendor/bin/claude is left untouched with warning.
- Fix 3: Test 6 uses capture_state() (filename + readlink target) instead of find|sort; also asserts second-run output is completely silent.
- Fix 4: FAKE_ROOT absolute-path hazard documented in header comment block.
- Fix 5: "wrapped $agent" success message routed to stderr for stream consistency with all other installer messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add two nfpms.contents entries so the W1/W2 auto-wrap harness scripts ship in every .deb/.rpm/.apk at /usr/lib/agentsh/ with mode 0755. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Check 8 side-loads agent-wrap.sh and install-agent-wrappers.sh into the container at the production layout, puts a fake agentsh on PATH that prints a recognizable marker, installs a fake /usr/bin/claude, runs the installer, and asserts the wrap chain fires (with args preserved) when claude is invoked from a login shell. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

erans · 2026-05-11T21:45:52Z

Added auto-wrap of the agent harness via agentsh wrap — 9 new commits (5c2ddafa…14139053) implementing the design at docs/superpowers/specs/2026-05-11-sbx-agent-wrap.md.

After this PR, the kit doesn't just install AgentSH alongside the agent; it owns the agent's lifecycle. Once install.sh finishes, a second install step runs install-agent-wrappers.sh, which probes /usr/bin for known agents (claude, opencode, gemini, codex, cursor) and creates /usr/local/bin/<agent> symlinks pointing at /usr/lib/agentsh/agent-wrap. PATH precedence makes the agent kit's exec claude resolve to the wrapper, which then execs agentsh wrap -- /usr/bin/claude "$@". The result is a single coherent session, full exec-pipeline interception of every subprocess, and a session report on exit.

Fail-CLOSED deviation from parent spec §7 — explicit and documented. If the wrapper runs and AgentSH can't engage (binary missing, tier ≠ shim, tier file missing), it exits non-zero and refuses to launch the agent. This kit's purpose is enforcement; running unenforced is not a supported state. The parent spec's "never bricks the sandbox" still governs the bootstrap; this section governs agent launch time.

What ships:

packaging/agent-wrap.sh — the wrapper script (POSIX sh, FAKE_ROOT gated behind AGENTSH_TEST=1 so a sandboxed process can't redirect path resolution).
packaging/install-agent-wrappers.sh — idempotent installer with silent-skip for already-correctly-wrapped symlinks, foreign-symlink-conflict detection.
5 + 7 shell tests covering wrapper and installer (both shellcheck-clean).
.goreleaser.yml packages both at /usr/lib/agentsh/ mode 0755.
docker/sbx-kit/spec.yaml adds the second install command.
E2E test grows from 7 to 8 checks: the new check side-loads the wrapper and a fake agentsh into the sandbox-template container and asserts the wrap chain fires with args preserved.
docker/sbx-kit/spec_test.go asserts both install commands.
docker/sbx-kit/README.md gains a "Behavior: agent harness runs under agentsh wrap" section with the fail-closed deviation note + known limitations (absolute-path entrypoints bypass the wrapper; install-time failures pass through).
docs/policy-reference.md table grows two rows.

Verification (worktree local):

go test ./docker/sbx-kit/... ./cmd/agentsh-sbx-bootstrap/... ./internal/policy/... — all green
packaging/agent-wrap_test.sh — 5/5
packaging/install-agent-wrappers_test.sh — 7/7
scripts/install-agentsh_test.sh — OK
bash docker/sbx-kit/tests/run-e2e.sh — 8/8 pass

Out of scope per spec §3: no env-var opt-in (auto-wrap is the default), no manual agentsh wrap SKILL guidance (the harness is already wrapped), no fix for absolute-path entrypoints (documented as a known limitation).

🤖 Generated with Claude Code

v1 assumed /usr/local/bin precedes real agent install locations in PATH. Probing docker/sandbox-templates:opencode revealed opencode lives at /usr/local/share/npm-global/bin/opencode, which precedes /usr/local/bin in PATH. The v1 wrapper would never have fired against any real agent kit. v2 switches to move-aside-and-replace at the discovered binary location: discover via command -v, rename to .real, drop a symlink to agent-wrap at the original path. Wrapper now derives the real binary from ${0}.real instead of a fixed /usr/bin/<name>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switch real-binary resolution from the v1 fixed-path /usr/bin/<name> to ${0}.real so the wrapper works regardless of where the agent binary lives (e.g. /usr/local/share/npm-global/bin/opencode). FAKE_ROOT test hook is kept but now only gates the tier_file path, not the real-binary path. Tests rewritten to place the fake binary at <symlink>.real. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-v, move-aside

Replace v1 fixed-path probe+symlink design with v2 move-aside-and-replace:

- Discover each agent via `command -v` (scoped through _AGENT_PATH / FAKE_TEST_PATH) so the wrapper lands at the exact path the agent kit installed the binary.
- Rename discovered binary to <path>.real, symlink original location → agent-wrap.
- FAKE_ROOT and FAKE_TEST_PATH both gated behind AGENTSH_TEST=1.
- Test harness fully rewritten: 6 cases covering no agents, one agent, multiple agents, foreign .real conflict, missing wrap, and idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…de template Replace Section 8's fake-claude stub with a live check against docker/sandbox-templates:opencode. Pulls the image, side-loads a stub agentsh + the real agent-wrap and installer, then verifies the move-aside-and-replace layout and the full wrap-chain invocation. Load-bearing assertion: opencode lives at /usr/local/share/npm-global/bin/opencode (not /usr/local/bin/opencode) because the image's PATH puts npm-global first — this is the exact bug the v1 installer would have missed. Section 8 now fails if that path changes. SKIPs cleanly on image-pull failure so CI without hub access is not broken. Sections 1–7 (shell-docker kit mechanics) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the v1 PATH-precedence language and the absolute-path-entrypoint caveat (move-aside doesn't depend on PATH order). Replace the /usr/local/bin/<agent> reference with the parameterized "<original agent path>" so the docs match what the installer actually does — drops a symlink at wherever `command -v` finds the agent (e.g. /usr/local/share/npm-global/bin/opencode for the opencode template). Add the uninstall caveat: the installer renames files the agent kit shipped, so clean recovery requires restoring <path>.real -> <path>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ates Section 8 was opencode-only after the W5-redo. Refactor it into a parameterized check_real_agent() function and run it against every publicly available docker/sandbox-templates: image (opencode, gemini, codex). claude isn't published, so it's not tested. This pins the load-bearing assertion — that all three real-agent templates install their binaries at /usr/local/share/npm-global/bin/<agent>, which precedes /usr/local/bin in PATH — across the entire fleet, not just one example. Locks down the v1 design bug from regressing. Also widens the top-level cleanup trap to remove any per-agent containers left behind on early exit, replacing in-loop trap chaining that broke shell quoting at script end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erans and others added 30 commits May 11, 2026 08:24

db: effects — add SourceStart/SourceEnd to ClassifiedStatement

71ed971

db: classify/postgres — UTF-8-safe whitespace skip + trailing-stmt test

5baf6f8

policy: add coding-agent template baked into the Docker Sandboxes mix…

df440a5

…in kit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

policy: cover /root paths in coding-agent deny-credentials and allow-…

df87cb5

…package-caches Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bootstrap: use errors.Is(err, os.ErrNotExist) to match codebase conve…

9bbdff0

…ntion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bootstrap: spawn agentsh server and wait for socket

7507d6f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bootstrap: shim-tier probe + /run/agentsh/tier writer

2186ed0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: policy-reference.md packaged with the kit for in-sandbox use

a91e037

release: package sbx-bootstrap binary, shim symlinks, policy template

03c41d2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

release: publish install.sh as a release asset for the sbx mixin kit

2a5705b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

erans and others added 11 commits May 11, 2026 14:19

packaging: agent-wrap.sh - engage `agentsh wrap` on launch (fail-cl…

5c2ddaf

…osed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

packaging: install-agent-wrappers.sh — symlink /usr/local/bin/<agent>…

2842cfd

… on install Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

release: package agent-wrap.sh and install-agent-wrappers.sh

bea2c6c

Add two nfpms.contents entries so the W1/W2 auto-wrap harness scripts ship in every .deb/.rpm/.apk at /usr/lib/agentsh/ with mode 0755. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sbx: wire install-agent-wrappers.sh into spec.yaml install step

6d49697

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sbx: extend spec_test.go to assert the second install command

26481d8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: document agent-wrap behavior + fail-closed deviation

1413905

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

erans and others added 6 commits May 11, 2026 15:06


feat(sbx): Docker Sandboxes mixin kit (v1, shim tier) #303
erans wants to merge 47 commits into main from feature/docker-sbx-mixin-kit
