Add sandboxed agent runtime package#1220

Closed
Connoropolous wants to merge 6 commits into main from codex/agent-runtime-sandbox-harnesses

@Connoropolous
Contributor

@Connoropolous Connoropolous commented May 15, 2026

Summary

  • Add a new cyrus-agent-runtime package for unified agent sessions across harnesses and sandbox providers.
  • Add local and ComputeSDK-backed sandbox abstractions, harness adapters, tests, assumptions, and validation notes.
  • Add ComputeSDK dependency wiring and dependency overrides needed for a clean audit.
  • Live process streaming: optional streamCommand capability on RunnerSandbox with onStdout / onStderr callbacks, AbortSignal cancellation, and AsyncIterable<string> stdin. Local provider streams natively via child_process.spawn; Daytona is reached through a pluggable NativeStreamAdapter registry that unwraps ComputeSDK's ProviderSandbox.getInstance() to drive @daytonaio/sdk async sessions + getSessionCommandLogs(onStdout, onStderr). RuntimeAgentSession.start() now prefers streaming when available, line-buffers across packet boundaries, and emits TranscriptEvents live. New opt-in interactiveInput flag routes addMessage() chunks into the running process's stdin.
  • Folders and repositories as first-class session config, distinct from volumes. RuntimeFolderConfig uploads a host folder into the sandbox and (with access: "readwrite") syncs sandbox edits and newly-created files back to the host after the harness completes. RuntimeRepositoryConfig runs git clone inside the sandbox at mountPath with optional branch checkout and depth shallow-clone; local-path sources are rewritten to file://... to preserve git semantics, and shallow clones with a branch use --branch on the clone itself.
  • stop() and destroy() decoupled. stop() cancels only the in-flight run and does NOT release the sandbox. destroy() is the sole sandbox-release path, exposed symmetrically on AgentSession and AgentSessionResult (sharing one internal one-shot teardown). AgentSession.destroy() first cancels any in-flight run via stop(). For ComputeSDK-backed providers, destroy() maps to ComputeSDK's ProviderSandbox.destroy(). This lets future workflows reuse a warm sandbox across runs, because a single run's stop() no longer destroys shared compute.

Original Criteria Coverage

Session:

  • Session id

Config:

  • Env vars
  • Env secrets
  • Model
  • Harness selection: Claude, Codex, Cursor, Gemini, PI, OpenCode
  • Network egress
  • System prompt
  • User prompt
  • Packages such as CLI tools and system dependencies
  • Plugins (MCPs, hooks, skills, ...)
  • Allowed tools / permission mode
  • Memory
  • Sandbox providers
  • Repositories (read/write, with an optional branch)
  • Folders (read/write)
  • Volumes, including future FUSE-style volumes

Interface / Runtime abstractions:

  • Permission prompt callbacks
  • Transcript events emitted as arbitrary JSON envelopes
  • Interrupt / stop (cancel-run-only; sandbox stays alive)
  • Add message / queue message
  • Live stdout/stderr streaming with line-buffered transcript emission
  • Live stdin via AsyncIterable<string> (opt-in interactiveInput)
  • destroy() on both AgentSession and AgentSessionResult (ComputeSDK-aligned) — sole sandbox-release path

Purpose:

  • Unified interface over all harnesses and sandbox providers

Infra:

  • Use ComputeSDK for sandbox provider vendor list and abstraction
  • Treat local execution as a provider aligned with the same sandbox interface
  • Pluggable NativeStreamAdapter registry lets each ComputeSDK provider reach its native streaming primitives without coupling agent-runtime to any specific provider SDK

Transcript Event Flow

  • Sandbox providers own execution: runCommand(...) is the buffered one-shot path; streamCommand(...) is the live chunk-delivery path. Adapters check capabilities.streamingProcess to pick.
  • Harness adapters own parsing: stdout/stderr lines are passed through the selected adapter, which maps provider-native JSON/text events into normalized TranscriptEvent envelopes.
  • RuntimeAgentSession owns emission: it emits setup.*, file.write.*, folder.materialize.* / folder.syncback.*, repository.materialize.*, parsed assistant/tool/result events, and the final result event — as they arrive when the sandbox supports streaming.
  • Daytona live streaming reaches the underlying @daytonaio/sdk Sandbox through the ComputeSDK ProviderSandbox.getInstance() escape hatch and uses async sessions + getSessionCommandLogs(onStdout, onStderr) callbacks. Other ComputeSDK providers can ship their own NativeStreamAdapter via ComputeSdkSandboxProviderOptions.nativeStreamAdapters.
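
The line-buffering and parsing steps in this flow can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `LineBuffer`, `parseLine`, and the `{ type, payload }` envelope shape are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: reassembles complete lines from chunks that may
// split a line (or a JSON object) across packet boundaries.
class LineBuffer {
  private pending = "";

  // Accept one raw chunk and return every line it completes.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is an unfinished line ("" if the chunk ended on "\n").
    this.pending = parts.pop() ?? "";
    return parts.map((line) => line.replace(/\r$/, ""));
  }

  // Call once at process exit to emit a trailing line that had no newline.
  flush(): string[] {
    const rest = this.pending;
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Assumed envelope shape; the real TranscriptEvent type may differ.
interface TranscriptEventSketch {
  type: string;
  payload: unknown;
}

// Stand-in for a harness adapter: JSON lines keep their own "type",
// anything else is passed through as plain text.
function parseLine(line: string): TranscriptEventSketch {
  try {
    const parsed = JSON.parse(line) as { type?: unknown } | null;
    const type =
      parsed !== null && typeof parsed === "object" && typeof parsed.type === "string"
        ? parsed.type
        : "json";
    return { type, payload: parsed };
  } catch {
    return { type: "text", payload: line };
  }
}
```

A session would feed every `onStdout` chunk through `push()`, map each returned line through the selected adapter, and call `flush()` once the process exits.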

Filesystem Concepts at the Session Level

Four deliberately distinct concepts, each with its own materializer + transcript lifecycle:

  • Files (RuntimeFileConfig[]) — small one-off inline strings written into the sandbox before setup. Supports sensitive: true for redacted transcript logging.
  • Folders (RuntimeFolderConfig[]) — host filesystem folders bind/copy-synced into the sandbox; host is source of truth. access: "read" copies in only; access: "readwrite" syncs sandbox edits and newly-created files back to the host after the harness completes. Supports exclude globs.
  • Repositories (RuntimeRepositoryConfig[]) — git-driven trees materialized via git clone inside the sandbox at mountPath. Supports optional branch checkout (including SHAs/tags on full clones) and depth shallow-clone (auto-1 for access: "read"). Local-path sources are converted to file://... to preserve git semantics.
  • Volumes (RuntimeVolumeConfig[]) — provider-attached persistent storage with a lifecycle independent of the sandbox (Docker volumes, EBS volumes, FUSE mounts). Distinct from folders because the source of truth is the provider, not the host filesystem.
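
As a rough illustration of the repository rules above, the clone commands might be derived like this. It is a sketch under assumptions: `RepoConfigSketch`, its `source` field name, and `buildGitCommands` are hypothetical, only absolute POSIX paths are treated as local, and an unset `access` is treated as not read-only.

```typescript
// Hypothetical config shape; only mountPath, branch, depth, and access
// are named in the PR description.
interface RepoConfigSketch {
  source: string;
  mountPath: string;
  branch?: string;
  depth?: number;
  access?: "read" | "readwrite";
}

// Returns the argv arrays to run inside the sandbox, in order.
function buildGitCommands(repo: RepoConfigSketch): string[][] {
  // Local paths become file:// URLs so git uses its normal transport
  // (a plain path would make --depth ineffective for local clones).
  const source = repo.source.startsWith("/") ? `file://${repo.source}` : repo.source;

  // Read-only repositories default to a depth-1 shallow clone.
  const depth = repo.depth ?? (repo.access === "read" ? 1 : undefined);

  const clone = ["git", "clone"];
  if (depth !== undefined) {
    clone.push("--depth", String(depth));
    // A shallow clone only fetches one branch, so select it at clone time.
    if (repo.branch) clone.push("--branch", repo.branch);
  }
  clone.push(source, repo.mountPath);

  const commands: string[][] = [clone];
  // Full clones can check out any ref afterwards, including SHAs and tags.
  if (depth === undefined && repo.branch) {
    commands.push(["git", "-C", repo.mountPath, "checkout", repo.branch]);
  }
  return commands;
}
```

Returning argv arrays rather than a shell string avoids quoting problems when a branch name or path contains spaces.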

Lifecycle: stop vs. destroy

Two single-purpose operations:

  • stop(reason?) — cancel the in-flight run. Aborts the running harness process, closes the live event stream, closes the input pipe. Sandbox stays alive. Idempotent. Available on AgentSession only (no meaning post-run).
  • destroy() — release the sandbox (ComputeSDK ProviderSandbox.destroy() for remote, no-op for local). If a run is in flight, calls stop() first so the harness terminates cleanly before teardown. Idempotent. Available on both AgentSession and AgentSessionResult, sharing the same one-shot internal teardown promise — so calling either or both in any order is safe.

This decoupling enables the future EnvironmentFactory model (CYPACK-1209) where multiple runs share one warm sandbox: a single run's stop() no longer destroys shared compute.
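
The stop/destroy split and the shared one-shot teardown could look roughly like the sketch below. `SessionSketch` and `SandboxSketch` are hypothetical stand-ins, not the package's classes; the point is only that `stop()` never touches the sandbox and that both `destroy()` surfaces return the same promise.

```typescript
// Hypothetical minimal sandbox surface for the example.
interface SandboxSketch {
  destroy(): Promise<void>;
}

class SessionSketch {
  private readonly sandbox: SandboxSketch;
  private readonly abort = new AbortController();
  private teardown: Promise<void> | undefined;

  // Stand-in for AgentSessionResult: it shares the session's teardown.
  readonly result: { destroy: () => Promise<void> };

  constructor(sandbox: SandboxSketch) {
    this.sandbox = sandbox;
    this.result = { destroy: () => this.destroy() };
  }

  // True once the in-flight run has been cancelled.
  get cancelled(): boolean {
    return this.abort.signal.aborted;
  }

  // Cancels the run only. Idempotent; the sandbox stays alive.
  stop(): void {
    if (!this.abort.signal.aborted) this.abort.abort();
  }

  // Sole release path. Cancels any run first, then tears down exactly once.
  destroy(): Promise<void> {
    this.stop();
    if (this.teardown === undefined) {
      this.teardown = this.sandbox.destroy();
    }
    return this.teardown;
  }
}
```

Caching the promise instead of a boolean flag means a second caller awaits the same in-progress teardown rather than returning before the sandbox is actually released.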

Validation

  • pnpm --filter cyrus-agent-runtime typecheck (clean)
  • pnpm --filter cyrus-agent-runtime test:run (29 tests passing)
  • pnpm --filter cyrus-agent-runtime build (clean)
  • pnpm audit (clean)
  • pre-commit full pnpm build (clean)
  • pre-commit full pnpm typecheck (clean)
  • Real Daytona Claude smoke (raw streamCommand): installed Claude Code remotely, captured system / assistant / result transcript events, received daytona claude event smoke ok.
  • Real local streaming spike: chunks arrived at the exact 400ms cadence the child process emitted them (first chunk @ 429ms, final exit @ 2032ms).
  • Real local runtime spike via createAgentSession + codex exec: thread.started / turn.started emitted at ~172ms; turn.completed at ~8861ms (8.6s spread).
  • Real Daytona streaming spike: Claude stream-json system event landed 1.7s before result event over a remote sandbox.
  • Real Daytona + Claude full-runtime spike via createAgentSession: 3 local notes uploaded to /home/daytona/notes via folder.materialize.*, Claude Code installed via 3 setup commands, claude invoked with stream-json output, 20 transcript events emitted live over a 7.2s harness window, final extracted result "alpha=Alpha Note, beta=Beta Note, gamma=Gamma Note", result.destroy() invoked twice (first destroyed the Daytona sandbox, second was a clean no-op).
  • Folder materialization tests against a real local sandbox: host tree uploaded, exclude globs honored; read-write sync-back picks up both edits and newly-created files.
  • Repository materialization test against a real local git repo: git clone runs inside the sandbox, branch checkout works on both full and shallow clones.
  • stop() / destroy() decoupling tests: stop() provably does NOT destroy the sandbox; destroy() is the only release path; both surfaces (AgentSession.destroy() and AgentSessionResult.destroy()) share a one-shot; AgentSession.destroy() mid-run cancels the in-flight harness before releasing.

Follow-ups (not in this PR)

  • Plugins replace MCP config — design accepted (Claude-superset shape, graceful degradation, both rootPath and inline content supplied, runtime transfers local paths to remote sandbox, our own cyrus-plugin.json manifest format). RuntimePlugin will bundle MCP servers + skills + hooks + commands + agents + contextFile + permissions. Per-harness PluginMaterializers translate one declaration into Claude / Cursor / Codex / Gemini native filesystem state inside the sandbox. To be implemented on this branch in a follow-up commit.
  • Hashed EnvironmentFactory for run/environment split — captured as a deferred speculation in CYPACK-1209. Splits CreateAgentSessionConfig into a per-call RunConfig and a cached-by-hash EnvironmentConfig, amortizing expensive remote-sandbox spin-up across runs.
  • Should env apply to setup commands or only to the harness? Found while writing the Daytona+Claude runtime spike: overriding PATH in session env breaks setup commands that need to find a binary via the container's default PATH. Today env applies to both; two reasonable alternatives are (a) setup gets only sandbox-default env, (b) split into setupEnv and harnessEnv. To be filed as a Linear ticket.

Tip: I will respond to comments that @ mention @cyrusagent on this PR. You can also submit a review with all your feedback at once, and I will automatically wake up to address each comment.

…dboxes

Add an optional `streamCommand(command, options)` capability to
`RunnerSandbox`, with `onStdout` / `onStderr` chunk callbacks, an
`AbortSignal` for cancellation, and an `AsyncIterable<string> input`
option for live stdin. Local provider implements it via
`child_process.spawn`; Daytona is reached through a pluggable
`NativeStreamAdapter` registry that unwraps ComputeSDK's
`ProviderSandbox.getInstance()` to the native `@daytonaio/sdk` Sandbox
and uses async sessions + `getSessionCommandLogs(onStdout, onStderr)`.
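
For the local provider, a streaming command along these lines can be built on `child_process.spawn`. This is a sketch only: `streamCommandSketch` and `StreamOptionsSketch` are hypothetical names, and the real `streamCommand(command, options)` signature and return type may differ.

```typescript
import { spawn } from "node:child_process";

// Assumed option names, mirroring the ones listed in this commit message.
interface StreamOptionsSketch {
  onStdout?: (chunk: string) => void;
  onStderr?: (chunk: string) => void;
  signal?: AbortSignal;
  input?: AsyncIterable<string>;
}

// Resolves with the exit code once the child has closed all its streams.
function streamCommandSketch(
  command: string,
  args: string[],
  options: StreamOptionsSketch = {},
): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      signal: options.signal, // aborting kills the child and emits "error"
      // Only open stdin when the caller supplies input, so one-shot CLIs
      // do not block on a pipe that never closes.
      stdio: [options.input ? "pipe" : "ignore", "pipe", "pipe"],
    });

    child.stdout?.setEncoding("utf8");
    child.stderr?.setEncoding("utf8");
    child.stdout?.on("data", (chunk: string) => options.onStdout?.(chunk));
    child.stderr?.on("data", (chunk: string) => options.onStderr?.(chunk));
    child.on("error", reject);
    child.on("close", (code) => resolve(code));

    const input = options.input;
    const stdin = child.stdin;
    if (input && stdin) {
      // Forward each piece as it arrives, then close stdin at end of input.
      (async () => {
        for await (const piece of input) stdin.write(piece);
        stdin.end();
      })().catch(reject);
    }
  });
}
```

Chunks arrive at whatever boundaries the OS pipe delivers, so callers still need to line-buffer before parsing.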

`RuntimeAgentSession.start()` now prefers `streamCommand` when
`capabilities.streamingProcess` is true, line-buffers chunks across
packet boundaries, and emits `TranscriptEvent`s as the harness CLI
produces them. New `interactiveInput` opt-in routes `addMessage()`
into the running process's stdin (default off — most one-shot CLIs
block on a piped-but-never-closed stdin).

Verified end-to-end:
- local `spawn`: chunks land at the exact 400ms cadence the child emits
- real `codex exec` via `createAgentSession`: events emitted ~8.6s
  before turn end
- real Daytona Claude `stream-json`: system event landed 1.7s before
  result event over a remote sandbox
…config

Add two materialization concepts to `CreateAgentSessionConfig`, deliberately
distinct from the existing `volumes` (provider-attached persistent storage):

- `RuntimeFolderConfig` — exposes a host filesystem folder inside the
  sandbox. Walks the host tree and uploads each file via
  `SandboxFilesystem.writeFile`. Supports `exclude` globs. With
  `access: "readwrite"` the runtime syncs sandbox edits and any
  newly-created files back to the host folder after the harness
  command completes.
- `RuntimeRepositoryConfig` — runs `git clone` inside the sandbox at
  `mountPath` with optional `branch` checkout and `depth` shallow-clone.
  Local-path sources are rewritten to `file://...` to preserve git
  semantics. Shallow clones with a branch use `--branch` on the clone
  itself, since `git checkout` of a non-default branch fails after a
  shallow clone.

Both emit lifecycle transcript events (`folder.materialize.*`,
`folder.syncback.*`, `repository.materialize.*`) and run after files
but before package setup commands, so setup steps that depend on the
cloned tree or the mounted folder see them ready.
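
The `readwrite` sync-back step can be pictured as a diff between what was uploaded and what the sandbox holds after the harness exits. The sketch below uses in-memory maps in place of real filesystem reads; `FileTree`, `SyncBackEntry`, and `planSyncBack` are hypothetical names, and deletions are ignored because the description above only mentions edits and newly-created files.

```typescript
// Relative path -> file content. Stands in for host/sandbox directory reads.
type FileTree = Map<string, string>;

interface SyncBackEntry {
  path: string;
  content: string;
  kind: "edited" | "created";
}

// Lists the files that must be written back to the host folder.
function planSyncBack(uploaded: FileTree, sandboxAfter: FileTree): SyncBackEntry[] {
  const plan: SyncBackEntry[] = [];
  sandboxAfter.forEach((content, path) => {
    const before = uploaded.get(path);
    if (before === undefined) {
      plan.push({ path, content, kind: "created" }); // new in the sandbox
    } else if (before !== content) {
      plan.push({ path, content, kind: "edited" }); // changed by the harness
    }
    // Unchanged files are skipped so the host is not rewritten needlessly.
  });
  // Sort for a deterministic write order.
  return plan.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}
```

Comparing against the uploaded snapshot rather than the current host state keeps the plan independent of any host-side changes made while the harness was running.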

27 tests pass (5 new): one materializer unit test per concept and
one runtime-level integration test verifying that the session wires
each through to the right sandbox calls and emits the right events.

`AgentSessionResult.destroy()` equates to ComputeSDK's `ProviderSandbox.destroy()` for ComputeSDK-backed
providers (deletes the remote sandbox, releases compute resources) and
is a no-op for the local provider. Lets a caller hold only the result
object, consume events/result, then tear down without keeping a
reference to the session.

Idempotent — backed by a one-shot destroy promise on the session that
both `AgentSession.stop()` and `AgentSessionResult.destroy()` share, so
callers can call either or both in any order without double-destroying
the underlying ComputeSDK / local sandbox.

Verified with a new test that asserts:
  - the returned result exposes destroy()
  - calling result.destroy() invokes sandbox.destroy() exactly once
  - calling result.destroy() twice is a no-op the second time
  - calling session.stop() after result.destroy() does not double-destroy

`stop()` and `destroy()` were doing two unrelated things bundled into
one method. Split them.

`stop()` now cancels the in-flight run only — aborts the harness
process, closes the live event stream, closes the input pipe — and
leaves the sandbox alive. This enables future workflows that reuse a
warm sandbox across runs (per CYPACK-1209): a single run's `stop()`
no longer destroys shared compute.

`destroy()` is the sole sandbox-release path. It exists symmetrically
on both `AgentSession` and `AgentSessionResult` (sharing a one-shot
internal teardown promise). `AgentSession.destroy()` also implicitly
cancels an in-flight run via `stop()` before releasing the sandbox,
so callers don't need a two-step.

Pre-1.0 package, clean break — no consumers to migrate.
@Connoropolous
Contributor Author

Superseded by #1229 — same foundational cyrus-agent-runtime work, plus everything built on top since: Daytona volumes + subpath per-binding isolation, RuntimePlugin (MCP servers / hooks / skills) replacing the standalone mcps field, @cyrus-ai/cursor-runner extracted as a published driver, typed AgentSession<H> events per harness, multi-turn session.run() with per-session state backing, caller-driven harness-session resume, sandbox.persistentState abstraction (with verified per-harness buildStateEnv mappings for claude / cursor / codex / gemini / opencode), destroyWhileInactive, Daytona base-snapshot harness binaries, and edge-worker AgentChatSessionHandler integration.

Closing here, please review at #1229.
