feat(sbx): Docker Sandboxes mixin kit (v1, shim tier)#303

Open
erans wants to merge 47 commits into main from feature/docker-sbx-mixin-kit

Collaborator

erans commented May 11, 2026

Summary

Ships a Docker Sandboxes mixin kit at docker/sbx-kit/ so AgentSH can be installed into any sandbox at creation:

sbx run <agent> --kit git+https://github.com/erans/agentsh.git#dir=docker/sbx-kit
  • v1 enforcement tier: shim only (subprocess-exec interception via /usr/lib/agentsh/shims/ on PATH). LD_PRELOAD and ptrace tiers are parked behind forward-compatible labels in /run/agentsh/tier.
  • Coding-agent-tuned policy: denies credential paths (.ssh/.aws/.gnupg/.kube/.docker/.netrc/gcloud/gh/git-credentials in both /home/** and /root/), self-protects /etc/agentsh/, /usr/lib/agentsh/, etc., soft-deletes workspace files (recoverable), denies sudo/su/doas, blocks signals to PID 1 and agentsh* (including @job to prevent SIGSTOP freezing the daemon), audits package installers. Outbound network controls intentionally left to the Docker Sandbox proxy.
  • Self-teaching: drops a Claude Code SKILL at /workspace/.claude/skills/agentsh/SKILL.md and a human-facing reference at /usr/share/doc/agentsh/policy-reference.md. Users extend by writing to /home/agent/.agentsh/policy.yaml — bootstrap merges on each start.
  • Fail-open semantics throughout: broken overlay → fall back to template; daemon doesn't start → tier=none + loud log; never bricks the sandbox.

New code surface

| Path | Role |
| --- | --- |
| configs/policies/coding-agent.yaml | Baked-in policy template |
| internal/policy/merge.go | MergeOverlay: position-preserving rule merge (7 rule kinds) |
| cmd/agentsh-sbx-bootstrap/ | New Go binary: merge policy → spawn agentsh server → wait for socket → probe shim tier → write /run/agentsh/tier |
| scripts/install-agentsh.sh | dpkg/rpm/apk installer (uses GitHub Releases API, no jq) |
| docker/sbx-kit/ | The mixin kit: spec.yaml, README, SKILL.md, override stub, smoke test, Go structural test |
| docs/policy-reference.md | Grammar reference packaged into /usr/share/doc/agentsh/ |

Packaging (.goreleaser.yml)

  • New sbx-bootstrap-linux build for /usr/bin/agentsh-sbx-bootstrap
  • 12 shim symlinks under /usr/lib/agentsh/shims/ (bash, sh, curl, wget, pip, pip3, npm, node, git, python, python3, rm)
  • Policy template at /usr/share/agentsh/coding-agent.template.yaml
  • Policy reference at /usr/share/doc/agentsh/policy-reference.md
  • apk added to nfpms formats so Alpine sandboxes get a real package
  • install.sh published as a release asset via release.extra_files

Spec + plan

  • Design: docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md
  • Plan: docs/superpowers/plans/2026-05-11-docker-sandboxes-mixin-kit.md

Test plan

Verified locally:

  • go build ./... clean
  • GOOS=windows go build ./... clean
  • go test ./internal/policy/... ./cmd/agentsh-sbx-bootstrap/... ./docker/sbx-kit/... all green (new tests added in each)
  • ./scripts/install-agentsh_test.sh passes (5 dry-run scenarios)
  • goreleaser check clean
  • Bootstrap smoke ran locally with fake daemon: shim tier detected, tier file written, fail-open behaviour confirmed
  • Full go test ./... clean (one flake in internal/store/watchtower/transport — pre-existing per known-flake notes; passes on retry)

Deferred to a live Docker Sandboxes environment (no automated CI for v1):

  • sbx run claude --kit git+https://github.com/canyonroad/agentsh.git#dir=docker/sbx-kit&ref=feature/docker-sbx-mixin-kit — verify cat /run/agentsh/tier returns shim, run docker/sbx-kit/tests/coding-agent-smoke.sh
  • Same against opencode and gemini agent kits
  • Confirm install.sh resolves at https://github.com/canyonroad/agentsh/releases/latest/download/install.sh once a release is tagged on this repo (note: the kit's spec.yaml currently curls from erans/agentsh per the design — update the URL if you'd prefer canyonroad as the primary)

Notes for reviewers

  • The kit's spec.yaml curls install.sh from github.com/erans/agentsh. If the canyonroad fork should be canonical, swap that URL (and adjust the test's expected hostname accordingly).
  • release.extra_files: install.sh will only attach the file once a real release tag fires the workflow; goreleaser release --snapshot --skip=publish couldn't be fully exercised locally because the arm64 cross-compiler isn't present, but goreleaser check validates the config.
  • Out of scope (parked, per spec §13): LD_PRELOAD tier, ptrace tier, OCI registry publishing of the kit itself, submission to docker/sbx-kits-contrib.

🤖 Generated with Claude Code

erans and others added 30 commits May 11, 2026 08:24
Brainstormed design for the sub-plan that closes the Phase 1 Simple
Query loop on top of 04b₂'s upstream wiring. Settles single-driver
half-duplex flow, per-frame demux for result counters, RFQ-byte deny
gating, parse-all-before-forward semantics, the §8 DBEvent schema
extension, redaction-invariant statement_digest, and the real-pgx
spine integration test scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 tasks: Normalize on Parser, source spans on ClassifiedStatement,
DBEvent §8 sub-structs, Server scaffolding (MaxQueryBytes, atomic
policy, per-dialect classifier map), connState extensions + RFQ-byte
capture, simpleQueryLoop scaffold, frame budget cap, upstreamread
demux + counters, deny synth helpers, eventbuilder with redaction +
digest + sibling tagging, allow/deny handleQuery paths, handshake
wiring + approve config-load warning, real-pgx spine test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread RawStmt.StmtLocation + StmtLen through classifyWithBackend to populate
the SourceStart / SourceEnd fields added in Task 1. Handles pg_query's behavior
of returning StmtLen=0 for trailing statements (use end-of-input boundary). Skip
leading whitespace to get actual statement boundaries (needed for redaction).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Normalize(sql) (string, error) method to the Parser interface, with:
- normalize_linux.go: CGO backend using pg_query.Normalize()
- normalize_other.go: WASM fallback using pgquery_wasm.Normalize()

The Normalize method returns SQL with all literal values replaced by $N
placeholders, for use in statement_digest and parameters_redacted tiers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add EventTLS, EventDecision, EventResult, EventTxContext, EventPredicates
struct types per spec §8. Extend DBEvent with these five fields (tls,
decision, result, tx_context, predicates). Supports JSON round-trip with
nullable integer fields (RowsReturned, RowsAffected). Add two new tests:
TestDBEvent_Extended_RoundTrip validates round-trip of all new fields;
TestDBEvent_Extended_RowsNull verifies null serialization of pointer fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Config.MaxQueryBytes: caps 'Q' frame body; defaults to 1 MiB when zero
  (applied in both the sentinel and normal New() paths).
- classifiers.go: buildClassifierMap constructs one Parser per distinct
  dialect; New() rejects unknown dialect strings at construction time.
- Server.policyPtr (atomic.Pointer[RuleSet]) + SetPolicy/policy methods
  enable hot-swap policy updates without lock contention.
- Fixes TestServer_StartTwice_ReturnsError: added missing Dialect field
  ("postgres") so it survives the new dialect validation gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
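
The hot-swap described in the commit above can be sketched as follows. This is a minimal illustration, assuming a simplified `RuleSet` with a single field; the names `policyPtr`, `SetPolicy`, and `policy` come from the commit message, while everything else is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RuleSet is a hypothetical stand-in for the real policy type.
type RuleSet struct {
	Name string
}

// Server keeps the active policy behind an atomic pointer so readers
// never take a lock.
type Server struct {
	policyPtr atomic.Pointer[RuleSet]
}

// SetPolicy publishes a new rule set; callers that already loaded the
// previous pointer keep using it until they finish.
func (s *Server) SetPolicy(rs *RuleSet) { s.policyPtr.Store(rs) }

// policy returns the currently published rule set (nil before the
// first SetPolicy call).
func (s *Server) policy() *RuleSet { return s.policyPtr.Load() }

func main() {
	var s Server
	s.SetPolicy(&RuleSet{Name: "v1"})
	inFlight := s.policy() // a query in flight captures v1
	s.SetPolicy(&RuleSet{Name: "v2"})
	fmt.Println(inFlight.Name, s.policy().Name) // prints "v1 v2"
}
```

A request loads the pointer once and evaluates against that snapshot, so a concurrent swap cannot change the rules mid-decision.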
Design for a `kind: mixin` kit hosted at docker/sbx-kit/ that installs
AgentSH into any Docker Sandbox at creation and routes the agent's
command-level activity through a coding-agent-tuned policy. v1 ships
the shim tier only; LD_PRELOAD and ptrace tiers are parked behind a
forward-compatible tier label written to /run/agentsh/tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Realign §6/§7/§9/§10/§11 against actual codebase: the daemon is
`agentsh server`, not `serve`; no `--user-config` flag exists so the
bootstrap merges baked template + user override into
/etc/agentsh/policies/default.yaml on each start; package paths
match nfpms conventions (/usr/bin, /usr/lib/agentsh/shims,
/usr/share/agentsh, /usr/share/doc/agentsh). curl|sh redirect
downgraded to audit-only because agentsh-fetch doesn't exist yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11-task TDD plan implementing the spec at
docs/superpowers/specs/2026-05-11-docker-sandboxes-mixin-kit-design.md.
Tasks 1-2 land the coding-agent policy + merge helper; tasks 3-5 build
out cmd/agentsh-sbx-bootstrap step-by-step (merge → daemon spawn →
tier probe); task 6 packages the new artifacts via .goreleaser.yml;
tasks 7-9 ship the policy reference, install.sh, and the kit tree
itself; tasks 10-11 wire release publishing and gate on end-to-end
build + manual sandbox matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…in kit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…package-caches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix 1: move allow-package-caches before allow-home so the narrower
  rule is reachable (first-match-wins; allow-home /home/** was shadowing it)
- Fix 2: remove dead audit-curl-pipe-to-shell command rule; the
  shellc-opaque-script layer already blocks curl|sh; add comment noting
  a v1.1 agentsh-fetch redirect will replace it
- Fix 3: add @job to deny-signal-agentsh signals list so SIGSTOP
  cannot be used to pause the daemon unmonitored
- Fix 4: rename allow-package-installers to audit-package-installers
  and change decision from allow to audit (valid decision per pkg/types;
  engine handles it in CheckCommand)
- Fix 5: add TestAgentPolicies_CodingAgent to anchor the coding-agent
  template shape with floor assertions (>=9 file rules, >=2 cmd rules,
  >=3 signal rules)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le formatting

- Update audit-package-installers description from stale "allow with audit" wording
  to "audit-log all package manager invocations" to match its actual decision
- Restore the full two-line doc comment for loadAgentDefaultEngine (first line was
  swallowed by the TestAgentPolicies_CodingAgent insertion block, leaving an orphaned
  continuation line above the function)
- Remove extra blank line between TestAgentPolicies_CodingAgent and loadAgentDefaultEngine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the base+overlay merge semantics for cmd/agentsh-sbx-bootstrap:
overlay wins on name collision (replacement in-place), unknown overlay rules
append in declared order. Covers FileRules, NetworkRules, CommandRules,
UnixRules, and SignalRules. Base metadata (Version, Name, Description,
ResourceLimits, EnvPolicy, Audit) is always preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MergeOverlay now merges DnsRedirectRules and ConnectRedirectRules by
  name (same pattern as FileRules/NetworkRules/CommandRules/UnixRules/
  SignalRules); previously overlay-provided redirect rules were silently
  discarded.
- Rewrote the MergeOverlay doc comment to enumerate all non-rule fields
  preserved from base, call out the shallow-copy/aliasing trap, and list
  which rule kinds are merged vs. base-wins.
- Extended TestMergeOverlay_PreservesAllRuleKinds to cover UnixRules,
  DnsRedirectRules, and ConnectRedirectRules.
- Added TestMergeOverlay_EmptyNameOverlayAppends to exercise the
  anonymous-rule append path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
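
The merge semantics from the two commits above (overlay replaces in place on name collision, everything else appends in declared order, unnamed rules always append) can be sketched generically. This is not the MergeOverlay implementation; `Rule` and `mergeByName` are hypothetical names, and the real rule kinds carry more fields.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified rule.
type Rule struct {
	Name     string
	Decision string
}

// mergeByName returns base with overlay applied: a named overlay rule
// replaces the base rule of the same name at its original position, and
// every other overlay rule (new or unnamed) is appended in declared order.
func mergeByName(base, overlay []Rule) []Rule {
	out := make([]Rule, len(base)) // fresh slice so base is not aliased
	copy(out, base)
	index := make(map[string]int, len(out))
	for i, r := range out {
		if r.Name != "" {
			index[r.Name] = i
		}
	}
	for _, r := range overlay {
		if i, ok := index[r.Name]; ok {
			out[i] = r // keep the base position, take the overlay body
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	base := []Rule{{"deny-ssh", "deny"}, {"allow-home", "allow"}}
	overlay := []Rule{{"allow-home", "audit"}, {"allow-tmp", "allow"}}
	for _, r := range mergeByName(base, overlay) {
		fmt.Println(r.Name, r.Decision)
	}
}
```

Position matters because the engine is first-match-wins: replacing in place keeps an overridden rule ahead of the broader rules that follow it.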
Adds the bootstrap binary's startup policy-merge phase: reads the
baked coding-agent template, optionally overlays /home/agent/.agentsh/
policy.yaml via policy.MergeOverlay, and atomically writes the result
to /etc/agentsh/policies/default.yaml. Missing/unparseable overlay is
non-fatal (logs to stderr, falls back to template); missing template is
fatal (exits 1). Five TDD tests cover the no-overlay, with-overlay,
bad-overlay fallback, missing-template error, and atomic-write paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
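
A minimal sketch of the two behaviours this commit describes: an atomic write via temp file plus rename, and a fail-open overlay load where only a missing template is fatal. The helper names (`writeAtomic`, `resolvePolicy`, `mergeFunc`) are hypothetical, and the merge step is passed in as a function so the example stays self-contained.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file next to path and renames it into
// place, so a reader sees either the old file or the complete new one.
func writeAtomic(path string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".policy-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// mergeFunc is a hypothetical hook standing in for the real YAML merge.
type mergeFunc func(template, overlay []byte) ([]byte, error)

// resolvePolicy fails only when the template is unreadable. A missing or
// unmergeable overlay is logged to stderr and the template is used as-is.
func resolvePolicy(templatePath, overlayPath string, merge mergeFunc) ([]byte, error) {
	tmpl, err := os.ReadFile(templatePath)
	if err != nil {
		return nil, fmt.Errorf("read template: %w", err)
	}
	overlay, err := os.ReadFile(overlayPath)
	if err == nil {
		var merged []byte
		if merged, err = merge(tmpl, overlay); err == nil {
			return merged, nil
		}
	}
	fmt.Fprintf(os.Stderr, "overlay not applied, using template: %v\n", err)
	return tmpl, nil
}

// demo runs both helpers in a temp dir with no overlay present and returns
// the bytes that ended up in the output file.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "policy-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	tmplPath := filepath.Join(dir, "template.yaml")
	if err := os.WriteFile(tmplPath, []byte("version: 1\n"), 0o644); err != nil {
		return "", err
	}
	concat := func(t, o []byte) ([]byte, error) { return append(t, o...), nil }
	data, err := resolvePolicy(tmplPath, filepath.Join(dir, "missing.yaml"), concat)
	if err != nil {
		return "", err
	}
	out := filepath.Join(dir, "default.yaml")
	if err := writeAtomic(out, data, 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(out)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Printf("%q %v\n", got, err)
}
```

The temp file is created in the destination directory on purpose: a rename is only atomic within one filesystem.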
…ntion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ror surfacing

- Add --daemon-log flag; pass *daemonLog to spawnDaemon instead of
  hardcoded defaultDaemonLog (Fix 1).
- Add clarifying comment on defaultBootstrapLog reserving the path for
  Task 5 / installers (Fix 1).
- Replace goroutine logF.Close() with synchronous close after cmd.Start
  (Fix 2).
- Quote sock path in fake-daemon test script to handle spaces (Fix 3).
- Surface non-ENOENT os.Stat errors in waitForSocket with a wrapped
  message instead of silently retrying (Fix 4).
- Add TestWaitForSocket_NonExistError to cover the new fast-fail branch
  via ENOTDIR (Fix 5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
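
Fix 4 and Fix 5 can be illustrated with a polling loop that retries only on "not exist" and returns any other stat error immediately. This is a sketch under assumptions: the signature, the `errTimeout` sentinel, and the helpers are hypothetical, and the loop checks only that the path exists, not that it is a socket.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// errTimeout is a hypothetical sentinel marking the deadline case.
var errTimeout = errors.New("timed out waiting for socket")

// waitForSocket polls until path exists or the timeout elapses. "Not exist"
// means the daemon has not created the socket yet, so polling continues;
// any other stat error (ENOTDIR, EACCES, ...) will not fix itself and is
// returned at once with context.
func waitForSocket(path string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		_, err := os.Stat(path)
		if err == nil {
			return nil
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return fmt.Errorf("stat %s: %w", path, err)
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%w: %s", errTimeout, path)
		}
		time.Sleep(interval)
	}
}

// isTimeout reports whether err came from the deadline branch.
func isTimeout(err error) bool { return errors.Is(err, errTimeout) }

// demoNotDir reproduces the ENOTDIR case: the parent of the socket path
// is a regular file, so stat fails with something other than ENOENT.
func demoNotDir() error {
	f, err := os.CreateTemp("", "not-a-dir")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	f.Close()
	sock := filepath.Join(f.Name(), "agentsh.sock")
	return waitForSocket(sock, 5*time.Second, 10*time.Millisecond)
}

func main() {
	start := time.Now()
	err := demoNotDir()
	fmt.Println(err)
	fmt.Println("failed fast:", time.Since(start) < time.Second)
}
```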
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…be test

Replace direct *exec.ExitError type assertion with errors.As, move
defaultShimDir to the package-level const block with the other defaults,
and make TestProbeShimTier_RejectsRealCurl fatal on any error that isn't
a clean exit-1 (curl not found) so real probe failures can't be masked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
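
The reason for preferring `errors.As` over a direct type assertion is that the assertion stops matching once the error has been wrapped. A minimal sketch, with hypothetical helper names `exitCode` and `runWrapped`, assuming `/bin/sh` is available:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitCode extracts the child's exit status from err. errors.As walks the
// wrap chain, so it still matches after fmt.Errorf("...: %w", err), where
// a direct err.(*exec.ExitError) assertion would not.
func exitCode(err error) (int, bool) {
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		return ee.ExitCode(), true
	}
	return 0, false
}

// runWrapped runs a shell snippet and wraps any failure with context.
func runWrapped(script string) error {
	if err := exec.Command("/bin/sh", "-c", script).Run(); err != nil {
		return fmt.Errorf("probe failed: %w", err)
	}
	return nil
}

func main() {
	err := runWrapped("exit 3")
	_, direct := err.(*exec.ExitError) // false: err is the wrapper
	code, ok := exitCode(err)
	fmt.Println(direct, code, ok)
}
```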
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Detects dpkg/rpm/apk, resolves the latest release tag via the GitHub
API (sed-only, no jq), and downloads the GoReleaser-produced artifact.
Alpine (apk) falls back to the tar.gz archive since GoReleaser nfpms
produces no .apk. AGENTSH_DRY_RUN=1 prints all actions without
executing them; smoke-tested by install-agentsh_test.sh (shellcheck clean).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ch to real .apk

GoReleaser nfpms now includes apk format so Alpine Linux gets a proper
package with all mixin-kit artifacts (/usr/share/agentsh/coding-agent.template.yaml,
/usr/lib/agentsh/shims/*, /etc/agentsh/config.yaml, etc.). The install
script's apk branch is reverted from the tar.gz workaround to downloading
the real .apk via `apk add --allow-untrusted`; the test assertion is
updated to match the .apk URL pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
spec.yaml, initFiles, startup, SKILL.md, policy override stub,
coding-agent-smoke.sh, and a Go structural test for spec.yaml.
All 6 spec_test.go functions pass; go test ./... -short clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds docker/sbx-kit/tests/run-e2e.sh — exercises the kit's mechanics
against the public docker/sandbox-templates:shell-docker image with no
sbx CLI required. Builds binaries on the host, mounts them into the
sandbox-template container, simulates `sbx run --kit` install layout
(binaries, policy template, shim symlinks, profile.d, environment.d,
files/ tree, user override), runs agentsh-sbx-bootstrap, and verifies:
tier file = shim; curl resolves under /usr/lib/agentsh/shims/; merged
policy contains baked rule + appended override + replace-by-name paths;
SKILL.md and override stub present.

Also fixes a real bug surfaced by the E2E: probeShimTier used
`/bin/sh -c '. /etc/profile.d/agentsh.sh 2>/dev/null || true; ...'`,
which aborts before reaching `|| true` because bash-as-/bin/sh runs in
POSIX mode where errors in the special builtin `.` exit the shell.
Switched to `[ -r ... ] && . ...` which is portable across bash-POSIX,
dash, and busybox sh. Without this fix every real sandbox would have
recorded tier=none.

`make sbx-e2e` runs the harness. Documented in docker/sbx-kit/README.md
with the explicit list of what it does/doesn't verify (in-sandbox
enforcement remains gated on a real `sbx run` against a tagged release).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator Author

erans commented May 11, 2026

Added a container-simulated E2E test (make sbx-e2e or docker/sbx-kit/tests/run-e2e.sh) — commit 935f7e7.

It boots a real docker/sandbox-templates:shell-docker container (the same image the actual Docker Sandboxes agent kits derive from), lays down the post-install state sbx run --kit would produce, runs agentsh-sbx-bootstrap, and verifies 7 checks: tier file = shim, curl resolves under the shim dir, merged policy contains the baked rule + appended override + replace-by-name overlay paths, SKILL.md and override stub present. No sbx CLI required; runs on any host with Docker + Go.

Writing it surfaced a real bug: probeShimTier used `. /etc/profile.d/agentsh.sh 2>/dev/null || true`, which silently aborts the probe in bash-POSIX mode (`.` is a special builtin, and its failure exits the shell before `|| true` runs). Switched to `[ -r ... ] && . ...`. Without this fix, every real sandbox would have recorded tier=none.
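
A sketch of how a Go caller might drive the guarded form of the probe. The real probeShimTier is not shown in this PR page, so the script text, the helper names (`resolveVia`, `underShimDir`), and the choice of `command -v` as the resolution step are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// probeScript guards the source with a readability test. In POSIX mode a
// failing `.` is a special-builtin error that exits a non-interactive
// shell, so `. file || true` never reaches `|| true`.
const probeScript = `[ -r "$1" ] && . "$1"; command -v "$2"`

// resolveVia returns the path a POSIX shell resolves name to after
// sourcing profile when it is readable. Paths travel as positional
// parameters, so no quoting of the script text is needed.
func resolveVia(profile, name string) (string, error) {
	out, err := exec.Command("/bin/sh", "-c", probeScript, "sh", profile, name).Output()
	if err != nil {
		return "", fmt.Errorf("probe %s: %w", name, err)
	}
	return strings.TrimSpace(string(out)), nil
}

// underShimDir reports whether resolved sits inside shimDir.
func underShimDir(resolved, shimDir string) bool {
	return strings.HasPrefix(resolved, strings.TrimSuffix(shimDir, "/")+"/")
}

func main() {
	path, err := resolveVia("/etc/profile.d/agentsh.sh", "curl")
	if err != nil {
		fmt.Println("tier=none:", err)
		return
	}
	if underShimDir(path, "/usr/lib/agentsh/shims") {
		fmt.Println("tier=shim")
	} else {
		fmt.Println("tier=none")
	}
}
```

With a missing profile the `[ -r ... ]` test simply fails and the shell moves on to the next command, which is the behaviour the original `|| true` was meant to provide.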

Out of scope for this E2E (still gated on a real sbx run against a tagged release): the install.sh download path and in-sandbox enforcement of deny/audit/soft_delete (that needs agentsh server with libseccomp). Documented in the kit README.

🤖 Generated with Claude Code

erans and others added 11 commits May 11, 2026 14:19
Adds a follow-on design that turns the Docker Sandboxes mixin kit from
"AgentSH alongside the agent" into "AgentSH owns the agent's lifecycle":
wrapper symlinks at /usr/local/bin/<agent> route claude/opencode/gemini/
codex/cursor launches through `agentsh wrap`, giving full exec-pipeline
interception, a coherent session, and a session report.

Fail-CLOSED deviation from the parent spec §7: if agentsh wrap cannot
engage cleanly (binary missing, tier != shim, etc.), the wrapper exits
non-zero and refuses to launch the agent. Operators choosing this kit
choose enforcement-mandatory semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7-task TDD plan implementing the spec at
docs/superpowers/specs/2026-05-11-sbx-agent-wrap.md. Tasks 1-2 land
the wrapper + installer shell scripts with shell-driven tests
(5+6 cases respectively, FAKE_ROOT test hook). Task 3 packages them
via .goreleaser.yml. Task 4 wires the installer into the kit's
spec.yaml install step. Task 5 extends docker/sbx-kit/tests/run-e2e.sh
to assert wrap engages end-to-end with a fake agentsh binary. Task 6
updates the Go structural test for the new two-entry install block.
Task 7 documents fail-closed semantics + the limitation that
absolute-path entrypoints bypass the wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…osed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FAKE_ROOT is now only honored when AGENTSH_TEST=1 is also set, closing
  a path-substitution surface for sandboxed agents with env-var control.
- Fix 2: add comment above command -v explaining fail-closed behaviour for
  shell-function-named agentsh.
- Fix 3: run_wrap_no_agentsh uses an empty tempdir for PATH instead of
  /usr/bin:/bin, so the test is not sensitive to host agentsh installation.
- Fix 4: stderr message for missing real binary now includes the
  "refusing to launch $name" tail for grep-friendliness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… on install

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…; strengthen idempotency

- Fix 1: three-way check in installer — silently skip symlinks that already
  point at WRAP (idempotent re-run); only warn for genuinely conflicting
  entries (regular file or symlink pointing elsewhere).
- Fix 2: Test 4b — foreign-symlink conflict scenario; verifies pre-existing
  symlink pointing to /opt/vendor/bin/claude is left untouched with warning.
- Fix 3: Test 6 uses capture_state() (filename + readlink target) instead of
  find|sort; also asserts second-run output is completely silent.
- Fix 4: FAKE_ROOT absolute-path hazard documented in header comment block.
- Fix 5: "wrapped $agent" success message routed to stderr for stream
  consistency with all other installer messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two nfpms.contents entries so the W1/W2 auto-wrap harness scripts
ship in every .deb/.rpm/.apk at /usr/lib/agentsh/ with mode 0755.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Check 8 side-loads agent-wrap.sh and install-agent-wrappers.sh into the
container at the production layout, puts a fake agentsh on PATH that
prints a recognizable marker, installs a fake /usr/bin/claude, runs the
installer, and asserts the wrap chain fires (with args preserved) when
claude is invoked from a login shell.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator Author

erans commented May 11, 2026

Added auto-wrap of the agent harness via agentsh wrap — 9 new commits (5c2ddafa…14139053) implementing the design at docs/superpowers/specs/2026-05-11-sbx-agent-wrap.md.

With this change, the kit doesn't just install AgentSH alongside the agent; it owns the agent's lifecycle. After install.sh finishes, a second install step runs install-agent-wrappers.sh, which probes /usr/bin for known agents (claude, opencode, gemini, codex, cursor) and creates /usr/local/bin/<agent> symlinks pointing at /usr/lib/agentsh/agent-wrap. PATH precedence makes the agent kit's `exec claude` resolve to the wrapper, which then execs `agentsh wrap -- /usr/bin/claude "$@"`. The result is a single coherent session, full exec-pipeline interception of every subprocess, and a session report on exit.

Fail-CLOSED deviation from parent spec §7 — explicit and documented. If the wrapper runs and AgentSH can't engage (binary missing, tier ≠ shim, tier file missing), it exits non-zero and refuses to launch the agent. This kit's purpose is enforcement; running unenforced is not a supported state. The parent spec's "never bricks the sandbox" still governs the bootstrap; this section governs agent launch time.

What ships:

  • packaging/agent-wrap.sh — the wrapper script (POSIX sh, FAKE_ROOT gated behind AGENTSH_TEST=1 so a sandboxed process can't redirect path resolution).
  • packaging/install-agent-wrappers.sh — idempotent installer with silent-skip for already-correctly-wrapped symlinks, foreign-symlink-conflict detection.
  • 5 + 7 shell tests covering wrapper and installer (both shellcheck-clean).
  • .goreleaser.yml packages both at /usr/lib/agentsh/ mode 0755.
  • docker/sbx-kit/spec.yaml adds the second install command.
  • E2E test grows from 7 to 8 checks: the new check side-loads the wrapper and a fake agentsh into the sandbox-template container and asserts the wrap chain fires with args preserved.
  • docker/sbx-kit/spec_test.go asserts both install commands.
  • docker/sbx-kit/README.md gains a "Behavior: agent harness runs under agentsh wrap" section with the fail-closed deviation note + known limitations (absolute-path entrypoints bypass the wrapper; install-time failures pass through).
  • docs/policy-reference.md table grows two rows.

Verification (worktree local):

  • go test ./docker/sbx-kit/... ./cmd/agentsh-sbx-bootstrap/... ./internal/policy/... — all green
  • packaging/agent-wrap_test.sh — 5/5
  • packaging/install-agent-wrappers_test.sh — 7/7
  • scripts/install-agentsh_test.sh — OK
  • bash docker/sbx-kit/tests/run-e2e.sh — 8/8 pass

Out of scope per spec §3: no env-var opt-in (auto-wrap is the default), no manual agentsh wrap SKILL guidance (the harness is already wrapped), no fix for absolute-path entrypoints (documented as a known limitation).

🤖 Generated with Claude Code

erans and others added 6 commits May 11, 2026 15:06
v1 assumed /usr/local/bin precedes real agent install locations in PATH.
Probing docker/sandbox-templates:opencode revealed opencode lives at
/usr/local/share/npm-global/bin/opencode, which precedes /usr/local/bin
in PATH. The v1 wrapper would never have fired against any real agent kit.

v2 switches to move-aside-and-replace at the discovered binary location:
discover via command -v, rename to .real, drop a symlink to agent-wrap
at the original path. Wrapper now derives the real binary from ${0}.real
instead of a fixed /usr/bin/<name>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch real-binary resolution from the v1 fixed-path /usr/bin/<name> to
${0}.real so the wrapper works regardless of where the agent binary lives
(e.g. /usr/local/share/npm-global/bin/opencode). FAKE_ROOT test hook is
kept but now only gates the tier_file path, not the real-binary path.
Tests rewritten to place the fake binary at <symlink>.real.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-v, move-aside

Replace v1 fixed-path probe+symlink design with v2 move-aside-and-replace:
- Discover each agent via `command -v` (scoped through _AGENT_PATH / FAKE_TEST_PATH)
  so the wrapper lands at the exact path the agent kit installed the binary.
- Rename discovered binary to <path>.real, symlink original location → agent-wrap.
- FAKE_ROOT and FAKE_TEST_PATH both gated behind AGENTSH_TEST=1.
- Test harness fully rewritten: 6 cases covering no agents, one agent, multiple
  agents, foreign .real conflict, missing wrap, and idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
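
The move-aside-and-replace step can be sketched as below. The shipped installer is a shell script; this Go version only illustrates the rename-then-symlink sequence and the idempotency and conflict checks named in the commit, and `wrapAgent` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// wrapAgent moves the binary at agentPath to agentPath+".real" and leaves
// a symlink to wrapPath in its place. It is a no-op when the symlink
// already points at wrapPath, and it refuses to clobber an existing
// ".real" file.
func wrapAgent(agentPath, wrapPath string) error {
	if target, err := os.Readlink(agentPath); err == nil && target == wrapPath {
		return nil // already wrapped: idempotent re-run
	}
	moved := agentPath + ".real"
	if _, err := os.Lstat(moved); err == nil {
		return fmt.Errorf("%s already exists; leaving %s untouched", moved, agentPath)
	}
	if err := os.Rename(agentPath, moved); err != nil {
		return err
	}
	if err := os.Symlink(wrapPath, agentPath); err != nil {
		os.Rename(moved, agentPath) // best-effort rollback
		return err
	}
	return nil
}

// demo wraps a fake agent twice in a temp dir and returns the symlink
// target together with the expected wrapper path.
func demo() (target, wrap string, err error) {
	dir, err := os.MkdirTemp("", "wrap-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	agent := filepath.Join(dir, "opencode")
	wrap = filepath.Join(dir, "agent-wrap")
	for _, p := range []string{agent, wrap} {
		if err := os.WriteFile(p, []byte("#!/bin/sh\n"), 0o755); err != nil {
			return "", "", err
		}
	}
	for i := 0; i < 2; i++ { // the second pass must be a no-op
		if err := wrapAgent(agent, wrap); err != nil {
			return "", "", err
		}
	}
	if _, err := os.Stat(agent + ".real"); err != nil {
		return "", "", err
	}
	target, err = os.Readlink(agent)
	return target, wrap, err
}

func main() {
	target, wrap, err := demo()
	fmt.Println(target == wrap, err)
}
```

Because the symlink lands at the exact path the agent kit installed, the wrapper fires regardless of PATH order, and the wrapper can find the original binary by appending `.real` to its own invocation path.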
…de template

Replace Section 8's fake-claude stub with a live check against
docker/sandbox-templates:opencode. Pulls the image, side-loads a stub
agentsh + the real agent-wrap and installer, then verifies the
move-aside-and-replace layout and the full wrap-chain invocation.

Load-bearing assertion: opencode lives at
/usr/local/share/npm-global/bin/opencode (not /usr/local/bin/opencode)
because the image's PATH puts npm-global first — this is the exact bug
the v1 installer would have missed. Section 8 now fails if that path
changes. SKIPs cleanly on image-pull failure so CI without hub access is
not broken. Sections 1–7 (shell-docker kit mechanics) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the v1 PATH-precedence language and the absolute-path-entrypoint
caveat (move-aside doesn't depend on PATH order). Replace the
/usr/local/bin/<agent> reference with the parameterized "<original agent
path>" so the docs match what the installer actually does — drops a
symlink at wherever `command -v` finds the agent (e.g.
/usr/local/share/npm-global/bin/opencode for the opencode template).

Add the uninstall caveat: the installer renames files the agent kit
shipped, so clean recovery requires restoring <path>.real -> <path>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ates

Section 8 was opencode-only after the W5-redo. Refactor it into a
parameterized check_real_agent() function and run it against every
publicly available docker/sandbox-templates: image (opencode, gemini,
codex). claude isn't published, so it's not tested.

This pins the load-bearing assertion — that all three real-agent templates
install their binaries at /usr/local/share/npm-global/bin/<agent>, which
precedes /usr/local/bin in PATH — across the entire fleet, not just one
example. Locks down the v1 design bug from regressing.

Also widens the top-level cleanup trap to remove any per-agent containers
left behind on early exit, replacing in-loop trap chaining that broke
shell quoting at script end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>