Sandboxed agent runtime, plugins, Daytona volumes, persistent state, and edge-worker chat integration by Connoropolous · Pull Request #1229 · cyrusagents/cyrus

Connoropolous · 2026-05-20T01:48:25Z

Supersedes #1220 — same foundational agent-runtime work, plus everything built on top since.

Summary

New cyrus-agent-runtime package: unified AgentSession across harnesses (Claude / Codex / Cursor / Gemini / OpenCode) and sandbox providers (local + ComputeSDK / Daytona), with live streamCommand, line-buffered transcript emission, opt-in interactiveInput, and a decoupled stop() (cancel-run-only) / destroy() (sole sandbox-release path) lifecycle.
New @cyrus-ai/cursor-runner package (published): thin CLI wrapper around @cursor/sdk that emits SDKMessage JSONL. Lets the cursor harness adapter consume a typed, version-pinned wire format owned by us.
Plugins: RuntimePlugin bundles MCP servers + hooks + skills. Per-harness materializers translate one declaration into Claude / Cursor / Codex native filesystem state inside the sandbox; the bundled MCP-config path replaces the old standalone mcps field.
Multi-turn sessions: session.run(prompt) callable repeatedly with per-session state backing surviving across calls; first turn materializes files/folders/repos and runs setup, later turns resume with the harness's native flag (Claude --continue, etc.). Caller-driven cross-binding resume via resumeHarnessSessionId ↔ harnessSessionId round-trip (--resume <id> for Claude, --agent-id <id> for Cursor).
Daytona volume mounting: RuntimeVolumeConfig with provider-driven kind: "bind" | "fuse" | "provider", plus subpath for per-binding isolation when many sessions share one provider volume.
sandbox.persistentState: { volume, bindingId }: consumer-facing abstraction that hides the per-harness state-env-var math. Runtime mounts the caller's volume at a fixed internal path with bindingId as subpath, then calls each adapter's new buildStateEnv(mountPath) hook to inject the right env vars. Mappings verified upstream — not guessed:
- claude → CLAUDE_CONFIG_DIR = ${m}/.claude
- cursor → CURSOR_DATA_DIR = ${m}/.cursor
- codex → CODEX_HOME = ${m}/.codex (codex-rs/utils/home-dir/src/lib.rs)
- gemini → GEMINI_CLI_HOME = ${m} (CLI appends .gemini itself; @google/gemini-cli-core::homedir())
- opencode → all four XDG_*_HOME dirs under ${m}/.opencode-xdg/{config,data,state,cache} (no app-specific override exists)
Typed events per harness: AgentSession<H> generic threads the right SDK union into event.raw, so harness: "claude" reads SDKMessage with no cast, etc. Added transcript() snapshot accessor for cross-turn replay.
destroyWhileInactive: optional flag that pauses (stops + later resumes) the underlying sandbox between run() calls. For Daytona this maps to sandbox.stop() / sandbox.start() — preserves all on-disk state at a few-second resume cost.
Daytona base snapshot path: pre-installed harness binaries inside the snapshot; harness.command lets the adapter spawn the snapshot-resident binary directly (Cursor uses this for cursor-runner).
Folders and repositories as first-class session config, separate from volumes (RuntimeFolderConfig with readwrite sync-back, RuntimeRepositoryConfig running git clone inside the sandbox with optional branch / depth).
Edge-worker AgentChatSessionHandler: picks provider from EdgeConfig.defaultProvider, wires MCP servers via the new plugin shape, forwards Daytona chat sandbox snapshot + custom layout + bypass permissions, and fixes Claude credential handling (reads ANTHROPIC_API_KEY instead of ANTHROPIC_AUTH_TOKEN, with CLAUDE_CODE_OAUTH_TOKEN taking precedence).
SDK pinning: lockstep-pinned dev-dep SDK versions across all harness adapters so the runtime's typed event unions can't drift from the CLI wire format. Bumped Claude CLI install version pin for Daytona setup.
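The persistent-state mappings listed in the summary can be sketched as a single function. This is only an illustration of the idea, not the runtime's code: the real hook lives on each adapter as buildStateEnv(mountPath), while the standalone function, the `StateHarness` type name, and the `/mnt/state` path used below are assumptions made for the example.

```typescript
// Hypothetical stand-in for the runtime's harness kinds.
type StateHarness = "claude" | "cursor" | "codex" | "gemini" | "opencode";

// Sketch: translate a mount path into the env vars each harness reads,
// following the mappings listed in the PR summary.
function buildStateEnv(harness: StateHarness, mountPath: string): Record<string, string> {
  const m = mountPath.replace(/\/+$/, ""); // drop trailing slashes
  switch (harness) {
    case "claude":
      return { CLAUDE_CONFIG_DIR: `${m}/.claude` };
    case "cursor":
      return { CURSOR_DATA_DIR: `${m}/.cursor` };
    case "codex":
      return { CODEX_HOME: `${m}/.codex` };
    case "gemini":
      // The CLI appends .gemini itself, so the mount path is passed as-is.
      return { GEMINI_CLI_HOME: m };
    case "opencode": {
      // No app-specific override exists, so all four XDG dirs are redirected.
      const xdg = `${m}/.opencode-xdg`;
      return {
        XDG_CONFIG_HOME: `${xdg}/config`,
        XDG_DATA_HOME: `${xdg}/data`,
        XDG_STATE_HOME: `${xdg}/state`,
        XDG_CACHE_HOME: `${xdg}/cache`,
      };
    }
  }
}

// "/mnt/state" is an illustrative path, not the runtime's fixed internal path.
const opencodeEnv = buildStateEnv("opencode", "/mnt/state/");
const geminiEnv = buildStateEnv("gemini", "/mnt/state");
```

Keeping the mapping in one place per harness is what lets a consumer pass only `{ volume, bindingId }` and never touch the env vars directly.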

Validation

pnpm --filter cyrus-agent-runtime typecheck — clean
pnpm --filter cyrus-agent-runtime test:run — 57 tests passing
pnpm --filter cyrus-agent-runtime build — clean
pnpm build + pnpm typecheck across the monorepo — clean
pnpm audit — clean
(Carried forward from #1220, "Add sandboxed agent runtime package") Real Daytona Claude smoke test via createAgentSession; live local streaming spike; folder materialization tests against real local sandbox with exclude + sync-back; repository materialization with full and shallow clones + branch checkout; stop() / destroy() decoupling proofs.

Follow-ups (not in this PR)

Hashed EnvironmentFactory for run/environment split — captured as CYPACK-1209.
env precedence on setup commands vs harness (one env field overriding PATH can break setup) — to be filed.
Codex hooks materializer is deferred — see inline comments citing the upstream blockers.

Test plan

CI green
Reviewer sanity-check on the new cyrus-agent-runtime public surface (AgentSession<H>, createAgentSession, RuntimePlugin, sandbox.persistentState)
Reviewer sanity-check on the edge-worker AgentChatSessionHandler provider wiring + MCP server forwarding

…dboxes

Add an optional `streamCommand(command, options)` capability to `RunnerSandbox`, with `onStdout` / `onStderr` chunk callbacks, an `AbortSignal` for cancellation, and an `AsyncIterable<string> input` option for live stdin. Local provider implements it via `child_process.spawn`; Daytona is reached through a pluggable `NativeStreamAdapter` registry that unwraps ComputeSDK's `ProviderSandbox.getInstance()` to the native `@daytonaio/sdk` Sandbox and uses async sessions + `getSessionCommandLogs(onStdout, onStderr)`.

`RuntimeAgentSession.start()` now prefers `streamCommand` when `capabilities.streamingProcess` is true, line-buffers chunks across packet boundaries, and emits `TranscriptEvent`s as the harness CLI produces them. New `interactiveInput` opt-in routes `addMessage()` into the running process's stdin (default off, because most one-shot CLIs block on a piped-but-never-closed stdin).

Verified end-to-end:
- local `spawn`: chunks land at the exact 400ms cadence the child emits
- real `codex exec` via `createAgentSession`: events emitted ~8.6s before turn end
- real Daytona Claude `stream-json`: system event landed 1.7s before result event over a remote sandbox
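The line-buffering step mentioned above matters because a stdout chunk can end in the middle of a JSONL record. A minimal sketch of that technique follows; the `LineBuffer` class name and its methods are hypothetical and are not taken from the package.

```typescript
// Sketch: reassemble complete lines from arbitrarily split stdout chunks.
class LineBuffer {
  private pending = "";

  // Returns every complete line contained in the data seen so far.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is an unterminated remainder (possibly empty).
    this.pending = parts.pop() ?? "";
    return parts.map((line) => (line.endsWith("\r") ? line.slice(0, -1) : line));
  }

  // Call once the process exits to emit a final line that had no newline.
  flush(): string[] {
    if (this.pending === "") return [];
    const last = this.pending;
    this.pending = "";
    return [last];
  }
}

// Two JSONL records split across three chunks at awkward boundaries.
const buffer = new LineBuffer();
const lines: string[] = [
  ...buffer.push('{"type":"sys'),
  ...buffer.push('tem"}\n{"type":"res'),
  ...buffer.push('ult"}\n'),
  ...buffer.flush(),
];
```

Each element of `lines` is a whole record that can be parsed and emitted as a transcript event as soon as its newline arrives, rather than at the end of the turn.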

…config

Add two materialization concepts to `CreateAgentSessionConfig`, deliberately distinct from the existing `volumes` (provider-attached persistent storage):
- `RuntimeFolderConfig`: exposes a host filesystem folder inside the sandbox. Walks the host tree and uploads each file via `SandboxFilesystem.writeFile`. Supports `exclude` globs. With `access: "readwrite"` the runtime syncs sandbox edits and any newly-created files back to the host folder after the harness command completes.
- `RuntimeRepositoryConfig`: runs `git clone` inside the sandbox at `mountPath` with optional `branch` checkout and `depth` shallow-clone. Local-path sources are rewritten to `file://...` to preserve git semantics. Shallow clones with a branch use `--branch` on the clone itself, since `git checkout` of a non-default branch fails after a shallow clone.

Both emit lifecycle transcript events (`folder.materialize.*`, `folder.syncback.*`, `repository.materialize.*`) and run after files but before package setup commands, so setup steps that depend on the cloned tree or the mounted folder see them ready.

27 tests pass (5 new): one materializer unit test per concept and one runtime-level integration test verifying that the session wires each through to the right sandbox calls and emits the right events.
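The clone rules described above can be sketched as a small argv builder. This is an illustration under assumptions: the `RepoConfig` shape and `buildCloneSteps` name are hypothetical, only absolute paths are treated as local sources, and the separate `git checkout` step for full clones is a guess at how the optional branch checkout is performed.

```typescript
// Hypothetical, simplified stand-in for RuntimeRepositoryConfig.
interface RepoConfig {
  source: string;
  mountPath: string;
  branch?: string;
  depth?: number;
}

// Local paths become file:// URLs so git uses its regular transport.
// Simplification: only absolute paths are recognised as local here.
function toCloneUrl(source: string): string {
  return source.startsWith("/") ? `file://${source}` : source;
}

// Returns argv arrays (no shell quoting needed) to run inside the sandbox.
function buildCloneSteps(repo: RepoConfig): string[][] {
  const url = toCloneUrl(repo.source);
  if (repo.depth !== undefined) {
    // Shallow clone: the branch must be selected on the clone itself.
    const args = ["git", "clone", "--depth", String(repo.depth)];
    if (repo.branch) args.push("--branch", repo.branch);
    args.push(url, repo.mountPath);
    return [args];
  }
  // Full clone: all branches are fetched, so a later checkout works.
  const steps: string[][] = [["git", "clone", url, repo.mountPath]];
  if (repo.branch) steps.push(["git", "-C", repo.mountPath, "checkout", repo.branch]);
  return steps;
}

const shallowSteps = buildCloneSteps({ source: "/srv/repo", mountPath: "/work/repo", branch: "dev", depth: 1 });
const fullSteps = buildCloneSteps({ source: "https://example.com/r.git", mountPath: "/work/r", branch: "dev" });
```

The `file://` rewrite also matters for `depth`: git ignores `--depth` when cloning from a plain local path, so a shallow clone of a local source only behaves as expected through the URL form.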

Equates to ComputeSDK's ProviderSandbox.destroy() for ComputeSDK-backed providers (deletes the remote sandbox, releases compute resources) and is a no-op for the local provider. Lets a caller hold only the result object, consume events/result, then tear down without keeping a reference to the session.

Idempotent: backed by a one-shot destroy promise on the session that both `AgentSession.stop()` and `AgentSessionResult.destroy()` share, so callers can call either or both in any order without double-destroying the underlying ComputeSDK / local sandbox.

Verified with a new test that asserts:
- the returned result exposes destroy()
- calling result.destroy() invokes sandbox.destroy() exactly once
- calling result.destroy() twice is a no-op the second time
- calling session.stop() after result.destroy() does not double-destroy

`stop()` and `destroy()` were doing two unrelated things bundled into one method. Split them. `stop()` now cancels the in-flight run only — aborts the harness process, closes the live event stream, closes the input pipe — and leaves the sandbox alive. This enables future workflows that reuse a warm sandbox across runs (per CYPACK-1209): a single run's `stop()` no longer destroys shared compute. `destroy()` is the sole sandbox-release path. It exists symmetrically on both `AgentSession` and `AgentSessionResult` (sharing a one-shot internal teardown promise). `AgentSession.destroy()` also implicitly cancels an in-flight run via `stop()` before releasing the sandbox, so callers don't need a two-step. Pre-1.0 package, clean break — no consumers to migrate.
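The split can be pictured with a small sketch: a per-run `AbortController` for `stop()` and a memoized teardown promise for `destroy()`. The `SketchSession` class, `beginRun()` method, and `SketchSandbox` interface are hypothetical names for illustration and do not reflect the package's internals.

```typescript
// Hypothetical minimal sandbox surface.
interface SketchSandbox {
  destroy(): Promise<void>;
}

class SketchSession {
  private readonly sandbox: SketchSandbox;
  private runAbort: AbortController | undefined;
  private teardown: Promise<void> | undefined;

  constructor(sandbox: SketchSandbox) {
    this.sandbox = sandbox;
  }

  // Each run gets its own abort controller.
  beginRun(): AbortSignal {
    this.runAbort = new AbortController();
    return this.runAbort.signal;
  }

  // Cancels the in-flight run only; the sandbox stays alive.
  stop(): void {
    this.runAbort?.abort();
    this.runAbort = undefined;
  }

  // Sole sandbox-release path. The promise is created once and shared,
  // so repeated calls never destroy the sandbox twice.
  destroy(): Promise<void> {
    this.stop();
    this.teardown ??= this.sandbox.destroy();
    return this.teardown;
  }
}

let destroyCalls = 0;
const session = new SketchSession({ destroy: async () => { destroyCalls += 1; } });
const runSignal = session.beginRun();
session.stop();
const callsAfterStop = destroyCalls;
const firstTeardown = session.destroy();
const secondTeardown = session.destroy();
```

A result object can expose the same `destroy()` by delegating to the session, which is how both handles end up sharing one teardown.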

…ndler

Brutal spike: wire the Slack chat session lifecycle through cyrus-agent-runtime's createAgentSession instead of the legacy IAgentRunner + AgentSessionManager + RunnerConfigBuilder stack.

Removed:
- packages/edge-worker/src/ChatSessionHandler.ts (515 lines)
- packages/edge-worker/test/chat-sessions.test.ts
- EdgeWorker.getDefaultModelForRunner / getDefaultFallbackModelForRunner (only used by the deleted chat-session createRunner callback)
- EdgeWorker.getChatThreadLastReply stub now returns null (F1 tests that depended on the runner's getMessages() need a new approach)

Added:
- packages/edge-worker/src/AgentChatSessionHandler.ts: ~280-line replacement that drives createAgentSession per Slack mention, posts the harness-extracted result back to Slack, destroys the sandbox in a finally.

Modified:
- SlackChatAdapter.postReply(event, runner: IAgentRunner) → postReply(event, finalText: string). Decouples the adapter from the runner machinery.
- EdgeWorker wiring: drops runnerConfigBuilder/createRunner/onStateChange/onClaudeError from chat-session deps, replaces shutdown's getAllRunners() with chatSessionHandler.shutdown().
- package.json: adds cyrus-agent-runtime workspace dep.

Brutal cuts (documented in the new file's header):
- No multi-turn --continue resume: each Slack mention is a fresh AgentSession. Conversation continuity comes from the adapter's fetchThreadContext() injecting prior thread messages as text.
- No mid-flight stream injection: busy threads get notifyBusy.
- No MCPs: agent-runtime doesn't yet wire mcps through to the harness CLI, and the in-process cyrus-tools server wouldn't translate across the subprocess boundary anyway. Slack chat sessions run with the Claude CLI default toolset only.
- Claude harness only: no runner selection.
- No persisted session state across restarts.

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm test:packages:run (601 edge-worker tests, 114 claude-runner, 198 gemini-runner, 62 slack-event-transport, etc.; all green)

You asked for the Slack chat replacement to use the Daytona+Claude flow we just validated end-to-end; the first cut accidentally used the local sandbox. Switching to Daytona:
- Each Slack mention spawns a fresh Daytona sandbox at /home/daytona (timeout 5min, name `cyrus-slack-<sessionId>`, metadata tagged `purpose: cyrus-slack-chat` for visibility in Daytona's console).
- Setup commands install @anthropic-ai/claude-code with a user-local npm prefix and verify the version, using the same script that worked in the streaming-spike daytona-runtime probe.
- Harness command is the full path `/home/daytona/.npm-global/bin/claude` (no PATH override at the env level, since that broke npm in the earlier spike).
- Secrets carry CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_AUTH_TOKEN from the EdgeWorker process env into the sandbox.
- Compute SDK is configured once per process via a module-level guard.
- Refuses to construct without DAYTONA_API_KEY in env.
- Posts a clear failure message if the Claude token is missing.

Drops the unused createWorkspace helper and cyrusHome/chatRepositoryProvider deps: Daytona owns its own working dir and the adapter holds its own ChatRepositoryProvider. Adds @computesdk/daytona as a direct edge-worker dep (was only transitively available before).

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-edge-worker test:run (601 tests)

The @daytonaio/sdk package compiles with TypeScript's `importHelpers` option but doesn't declare `tslib` as a runtime dependency. Without this, `import "tslib"` from the SDK fails at runtime with ERR_MODULE_NOT_FOUND when the SDK is loaded through @computesdk/daytona inside cyrus-edge-worker / cyrus-agent-runtime. `pnpm.packageExtensions` patches the upstream package.json at install time so pnpm installs tslib alongside @daytonaio/sdk, making the ESM resolver's bare-import lookup succeed. Verified: `import("@computesdk/daytona")` followed by an actual provider instantiation no longer hits ERR_MODULE_NOT_FOUND on tslib. Upstream fix is needed in @daytonaio/sdk itself; this workaround can be removed once they ship a version with tslib in dependencies.

The previous fix (pnpm.packageExtensions) added the tslib symlink under @daytonaio/sdk's isolated node_modules but did NOT rewrite the on-disk package.json. Node.js's standard ESM resolver doesn't care, but import-in-the-middle (used by @opentelemetry instrumentation) hooks the resolve step and validates against the importer's declared deps. Because tslib wasn't in @daytonaio/sdk's package.json, the hook rejected the bare specifier even though the symlink was present.

Use pnpm.patchedDependencies instead: it writes a real patch file under patches/ that adds `"tslib": "^2"` to @daytonaio/sdk's dependencies on disk. The install directory hash now includes a patch_hash suffix (visible in error paths if it ever fails again), so it's easy to tell whether the patch applied at all. Keeps the packageExtensions entry too as belt-and-suspenders.

To pick this up on a running host:

    git pull
    rm -rf node_modules   # force re-link
    pnpm install
    pnpm build

The rm -rf is the key step: pnpm sees the lockfile as up to date if the high-level dep graph hasn't changed and skips re-linking, which left the previous fix's tslib symlink in place but unused.

Upstream fix is needed in @daytonaio/sdk; remove patches/ when they ship a version with tslib in dependencies.

Restructure the agent-runtime public API so an AgentSession is a long-lived handle that can be run multiple times against the same sandbox, with per-session state backing that makes the next run automatically resume the prior conversation.

API changes (breaking, pre-1.0):
- AgentSession.start() → AgentSession.run(userPrompt). Each call is one turn. First call materializes files/folders/repos and runs setup; subsequent calls skip all that and invoke the harness with its continue flag.
- CreateAgentSessionConfig drops `userPrompt` (now passed per-turn to run()) and gains `agentSessionsRoot?: string` (default `~/.cyrus-agent-sessions/`).
- HarnessAdapter gains `stateDirectories: readonly string[]` declaring the relative paths under HOME where the harness keeps its session state (Claude `.claude`, Codex `.codex`, Gemini `.gemini`).
- HarnessAdapter.buildCommand now takes a second `HarnessRunOptions` argument with `userPrompt` and `continueSession: boolean`. Adapters map `continueSession` to their CLI's resume flag (Claude `--continue`) and suppress system-prompt injection on continuation.

Runtime mechanics:
- RuntimeAgentSession provisions `~/.cyrus-agent-sessions/<sessionId>/` per session and sets HOME to that dir on every harness invocation. Per-session HOME means concurrent local sessions don't trample each other's `.claude/projects/...jsonl` state, and resume Just Works because the .claude directory is naturally persistent between turns.
- For Daytona, same mechanism: HOME inside the sandbox is the per-session backing path, and since the sandbox stays warm between run() calls, .claude/ survives there too.
- session.stop() now cancels only the in-flight run (per-run abort controller) and does NOT destroy the sandbox. session.destroy() is the sole sandbox-release path and also runs folder syncback.
- folder syncback moved from end-of-run to session.destroy(), for the same rationale as the stop/destroy split.

Tests: 30 passing (was 29). New test verifies multi-turn run() with the second turn passing --continue and skipping setup.

Chat handler update (Slack): AgentChatSessionHandler rewritten for the warm-thread pattern:
- threadSessions: Map<threadKey, { session, lastActivityAt, inFlight }>
- First mention: createAgentSession (Daytona, Claude, install setup); state kept warm.
- Subsequent mentions on same thread: session.run() reuses the warm sandbox via --continue. Setup commands don't re-run.
- Concurrent mention while a run is in-flight: notifyBusy (no stdin injection yet).
- Idle TTL (default 15min): periodic sweep destroys idle threads.
- Run failure: destroy + free slot so next mention is a clean start.
- Shutdown: clear sweep timer + destroy all warm sessions.

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
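The warm-thread bookkeeping above can be sketched as a map plus an idle sweep. Everything here is illustrative: the `ThreadSessions` class, its method names, and the `WarmSession` interface are hypothetical, and time is passed in explicitly so the logic can be exercised without timers.

```typescript
// Hypothetical minimal session surface.
interface WarmSession {
  destroy(): Promise<void>;
}

interface ThreadState {
  session: WarmSession;
  lastActivityAt: number;
  inFlight: boolean;
}

class ThreadSessions {
  private readonly threads = new Map<string, ThreadState>();
  private readonly idleTtlMs: number;

  constructor(idleTtlMs: number = 15 * 60 * 1000) {
    this.idleTtlMs = idleTtlMs;
  }

  // First mention creates the session; a mention during a run gets "busy".
  begin(threadKey: string, create: () => WarmSession, now: number): ThreadState | "busy" {
    let state = this.threads.get(threadKey);
    if (state !== undefined && state.inFlight) return "busy";
    if (state === undefined) {
      state = { session: create(), lastActivityAt: now, inFlight: false };
      this.threads.set(threadKey, state);
    }
    state.inFlight = true;
    state.lastActivityAt = now;
    return state;
  }

  // Marks the run as finished so the thread becomes eligible for the sweep.
  end(threadKey: string, now: number): void {
    const state = this.threads.get(threadKey);
    if (state === undefined) return;
    state.inFlight = false;
    state.lastActivityAt = now;
  }

  // Destroys threads idle for at least the TTL; returns the evicted keys.
  sweep(now: number): string[] {
    const evicted: string[] = [];
    this.threads.forEach((state, key) => {
      if (state.inFlight || now - state.lastActivityAt < this.idleTtlMs) return;
      this.threads.delete(key);
      state.session.destroy().catch(() => undefined);
      evicted.push(key);
    });
    return evicted;
  }
}

let destroyedCount = 0;
const makeSession = (): WarmSession => ({ destroy: async () => { destroyedCount += 1; } });
const cache = new ThreadSessions();
const firstMention = cache.begin("C1:171.001", makeSession, 0);
const concurrentMention = cache.begin("C1:171.001", makeSession, 10);
cache.end("C1:171.001", 1000);
const earlySweep = cache.sweep(2000);
const lateSweep = cache.sweep(1000 + 15 * 60 * 1000);
```

In a handler the sweep would be driven by a periodic timer that is cleared on shutdown; the sketch leaves that wiring out.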

Adds a sandbox.destroyWhileInactive flag to CreateAgentSessionConfig that pauses (Daytona: sandbox.stop()) the underlying sandbox after every session.run() returns and resumes it (sandbox.start()) before the next run. State on disk inside the sandbox (including ~/.claude/) is preserved by Daytona during stop, so the next turn's `--continue` finds the prior conversation intact at much lower cost than a from-scratch recreate.

For the local sandbox the flag is a no-op (local sessions are always free). For Daytona it surfaces as new transcript events: sandbox.pause.started/completed/failed and sandbox.resume.started/completed/skipped.

AgentChatSessionHandler turns the flag on so Slack chat threads stop billing compute between mentions.

Three real proofs added under test-scripts/, all validated against real Daytona+Claude:
- resume-proof.mjs local: local sandbox, multi-turn
- resume-proof.mjs daytona-warm: Daytona warm, multi-turn
- resume-proof.mjs daytona-efficient: Daytona pause/resume, multi-turn
- slack-handler-proof.mjs: full AgentChatSessionHandler flow with two mentions (pause between)

Each proof gives Claude a code word in turn 1 and verifies the turn-2 reply repeats it back, proving --continue actually preserved the conversation across the lifecycle event being tested.

Recorded results:
- local: turn1 4.1s "noted", turn2 4.4s "BANANA-7"
- daytona-warm: turn1 19.8s "noted", turn2 6.7s "BANANA-7"
- daytona-efficient: turn1 17.9s "noted", turn2 8.4s "BANANA-7"
- slack-handler-proof: m1 19.9s "noted", m2 8.3s "BANANA-7"

Other fixes in this commit:
- session.ts no longer overrides HOME for any provider. The earlier override broke Claude auth locally (empty ~/.claude/) and silently produced no stream-json events on Daytona (HOME pointed at a host path that didn't exist inside the remote sandbox). The Daytona sandbox preserves state across stop/start so its natural /home/daytona HOME is the right answer for both warm and destroyWhileInactive modes.
- session.ts can now resume a paused sandbox at destroy() time so syncFoldersBack still has a live sandbox to read from.

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
- Four real proofs above, all PASSED

… hooks + skills)

Introduces a provider-agnostic RuntimePlugin shape and per-harness materializers that translate ONE declaration into Claude-, Cursor-, or Codex-native filesystem state (or CLI flags). Each materializer was developed against a real CLI smoke test before any code landed.

### Public surface

CreateAgentSessionConfig grows `plugins: PluginInput[]` where PluginInput is either an inline RuntimePlugin or `{ rootPath: string }` (rootPath resolution is stubbed for v1; inline only is fully implemented). The shape:

    interface RuntimePlugin {
      name: string;
      version?: string;
      description?: string;
      mcpServers?: Record<string, McpServerRuntimeConfig>;
      hooks?: PluginHook[];
      skills?: PluginSkill[];
    }

Hook events are a universal subset: PreToolUse, PostToolUse, SessionStart, Stop, UserPromptSubmit. Each materializer maps these to harness-native names and silently drops events that don't translate.

### Per-harness materialization

Claude: materializePluginForClaude writes:

    <workingDirectory>/.cyrus-plugins/<name>/
      .claude-plugin/plugin.json
      .mcp.json (when mcpServers present)
      hooks/hooks.json (when hooks present)
      skills/<skillName>/SKILL.md (+ optional assets)

The Claude harness adapter appends `--plugin-dir <pluginDir>` plus `--mcp-config <path> --strict-mcp-config` to the `claude -p` invocation.

Cursor: materializePluginForCursor writes:

    <workspaceRoot>/.cursor/
      mcp.json (merged across plugins)
      hooks.json (merged across plugins)
      skills/<skillName>/SKILL.md (+ optional assets)

The Cursor adapter appends `--approve-mcps` when any plugin declared MCP servers (otherwise headless cursor-agent silently drops them).

Codex: materializePluginForCodex:
- Writes skills to `$HOME/.agents/skills/<name>/SKILL.md` plus `agents/openai.yaml` for the OpenAI runtime. Codex skill discovery is rooted at $HOME/.agents/skills/ (verified empirically, NOT $CODEX_HOME/skills/ as the docs suggested).
- Returns MCP servers as inline `-c 'mcp_servers.<name>={...}'` TOML overrides on the CLI, with no file write.
- The session env-merges `HOME = <session-state-dir>` for codex runs so the materialized skills are isolated.
- Hooks deferred for v1 (Codex hooks schema is version-pinned).

### Plugin lifecycle

In RuntimeAgentSession.run() first-turn materialization order:

    files → folders → repositories → plugins → setup → harness

Materializer outputs are persisted on the session so subsequent turns re-pass the same CLI flags via HarnessRunOptions.pluginOutputs. New transcript events: plugin.materialize.{started,completed,skipped,failed}.

### Validation

Three CLI smoke tests, each writing a minimal plugin tree and verifying the real CLI loads it:
- Claude: --plugin-dir loads .claude-plugin/plugin.json + SKILL.md; skill triggered, response = HELLO-FROM-PLUGIN.
- Cursor: .cursor/skills/<n>/SKILL.md auto-discovered; skill triggered, response = HELLO-FROM-CURSOR-SKILL.
- Codex: $HOME/.agents/skills/<n>/SKILL.md discovered with HOME override; response = HELLO-FROM-CODEX-HOMEAGENTS. `-c mcp_servers.<name>={...}` confirmed routing through to codex's MCP runtime.

End-to-end plugin-proof.mjs against real Daytona + Claude: cold sandbox + Claude install + plugin materialization + skill-triggered run, response = "HELLO-FROM-PLUGIN" in 13.9s.

Unit test added to runtime.test.ts (31 tests total) asserting the on-disk plugin tree shape AND the harness command-line flags.

Surprises corrected vs. the original matrix:
- Cursor DOES have first-class SKILL.md (not just rules).
- Codex skills are at $HOME/.agents/skills/ not $CODEX_HOME/skills/.
- Codex MCP can be passed entirely via -c flags, no file write.
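To make the Claude layout above concrete, here is a small sketch that computes which files a plugin would produce, relative to the working directory. It covers paths only and deliberately says nothing about file contents. The `SketchPlugin` and `SketchSkill` shapes (in particular a skill having a `name` field) and the `claudePluginPaths` function are assumptions for illustration.

```typescript
// Hypothetical, simplified stand-ins for the plugin types.
interface SketchSkill {
  name: string; // assumed field; the real PluginSkill shape is not shown in the PR
}

interface SketchPlugin {
  name: string;
  mcpServers?: Record<string, unknown>;
  hooks?: unknown[];
  skills?: SketchSkill[];
}

// Lists the files the Claude materializer is described as writing,
// relative to <workingDirectory>. Optional files appear only when declared.
function claudePluginPaths(plugin: SketchPlugin): string[] {
  const root = `.cyrus-plugins/${plugin.name}`;
  const paths = [`${root}/.claude-plugin/plugin.json`];
  if (plugin.mcpServers !== undefined && Object.keys(plugin.mcpServers).length > 0) {
    paths.push(`${root}/.mcp.json`);
  }
  if (plugin.hooks !== undefined && plugin.hooks.length > 0) {
    paths.push(`${root}/hooks/hooks.json`);
  }
  for (const skill of plugin.skills ?? []) {
    paths.push(`${root}/skills/${skill.name}/SKILL.md`);
  }
  return paths;
}

// An "MCP-only" plugin versus one that also bundles a skill.
const mcpOnlyPaths = claudePluginPaths({ name: "chat", mcpServers: { linear: { type: "http" } } });
const withSkillPaths = claudePluginPaths({ name: "demo", skills: [{ name: "hello" }] });
```

A plugin that declares only `mcpServers` yields just the manifest and `.mcp.json`, which matches the "MCP-only carrier" use of the plugin shape.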

After learning tests against codex 0.130.0, the hook engine exists for all documented events but `codex exec` filters every newly-discovered hook through a trust gate (see hooks/src/engine/discovery.rs). Trust comes from a TUI `/hooks` review step that exec mode has no access to, and the `bypass_hook_trust` field is hidden + doesn't fire hooks when set via `-c bypass_hook_trust=true` in our tests. Document the investigation in the codex materializer so future revisits don't repeat the rabbit hole. Skills + MCP servers continue to materialize correctly; only `plugin.hooks` is silently dropped.

The standalone `mcps` field was a back-compat carryover from the pre-plugin design. It hadn't been wired through the runtime since the RuntimePlugin abstraction landed, so callers got silent no-ops if they used it. Remove it from both the TypeScript surface and the zod schema so the API is honest about plugins being the only path to MCP servers. A plugin with `mcpServers` populated and `hooks`/`skills` omitted is the standard "MCP-only" carrier — the materializer fans it out into each harness's native shape (Claude plugin tree, .cursor/mcp.json, codex `-c mcp_servers.*` overrides).

…t in 0.130.0 Earlier comment implied bypass_hook_trust existed but was "not plumbed correctly." Re-checked the installed binary directly: `strings` on codex 0.130.0 has zero occurrences of bypass_hook_trust / bypass-hook-trust / bypassHookTrust. The field exists on the codex `main` branch but was added after 0.130.0. So `--bypass-hook-trust` genuinely doesn't exist as a CLI flag, and `-c bypass_hook_trust=true` is a silent no-op because nothing reads that key in this release. The conclusion (defer codex hooks until trust-bypass exists or codex pre-trusts plugin-bundled hooks) is unchanged.

…alization

Rewrote the codex hooks deferral comment to point at the two open upstream issues that block our use case end-to-end:

1. openai/codex#21639: direct config-layer hooks (hooks.json, [[hooks.X]] in config.toml) stopped firing in 0.129.0+ versus the working 0.128.0-alpha.1 baseline. Independently confirmed by ≥5 users across multiple releases; we reproduced on 0.131.0 with every combination of feature flags and bypass-trust.
2. openai/codex#16430: plugin manifest `hooks` field silently dropped by the manifest parser; discovery walker never scans the installed-plugin tree. Confirmed via `codex plugin list` showing "(installed, enabled)" while plugin-bundled hooks never register, regardless of [features].plugin_hooks state.

The 0.131.0 --dangerously-bypass-hook-trust flag is real but doesn't help: #21639 prevents discovery from finding any hook to bypass-trust, and #16430 prevents the plugin manifest from contributing any hook to discovery in the first place.

Revisit conditions and a fallback materialization plan (write a session-local hooks.json under per-session CODEX_HOME if #21639 closes ahead of #16430) are now documented inline.

Introduces a new optional `defaultProvider` field on EdgeConfig backed by a ProviderTypeSchema enum that currently accepts "local" or "daytona". This lets users configure the default sandbox provider for sessions without needing to pass it explicitly per call. Additional provider backends (other ComputeSDK targets) can be added to the enum as they're wired through the runtime.

- New ProviderTypeSchema / ProviderType exports from core
- Re-exported from config-types.ts so consumers get them via the standard core public surface
- Field placed next to defaultRunner for discoverability, since both are "default" settings for runtime selection
- Regenerated JSON schema artifacts under packages/core/schemas/

…nfig.defaultProvider

The chat handler is no longer hardwired to Daytona. It now accepts a `provider: ProviderType` dependency (defaulting to `"local"`) and branches its session-config build accordingly:

- **local**: harness runs on the host (`claude` from `PATH` or via the optional `claudeCliPath` override); no `DAYTONA_API_KEY` needed; the runtime gives each session its own HOME under `~/.cyrus-agent-sessions/<id>/` so `.claude/` is isolated and resumable across `--continue` turns.
- **daytona**: existing behavior, that is, a fresh sandbox seeded via `npm install -g @anthropic-ai/claude-code`, paused between turns via `destroyWhileInactive`, destroyed on idle TTL eviction.

EdgeWorker passes `this.config.defaultProvider` through when wiring up the Slack handler, so the value flows from `~/.cyrus/config.json` -> `EdgeConfig.defaultProvider` -> `AgentChatSessionHandler`.

Also: re-exported `ProviderType` / `ProviderTypeSchema` from `cyrus-core`'s public index so consumers can reach them without reaching into `config-types.js`.

Test coverage: 4 new tests in `packages/edge-worker/test/AgentChatSessionHandler.provider.test.ts` covering local default, explicit local, daytona without key (throws), daytona with key (constructs cleanly). Full edge-worker suite passes (605/605).

…ever forward both

CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY are distinct auth modes in Claude Code with different billing semantics: OAuth runs against a Claude Code subscription, API key runs against direct Anthropic API access. They are not aliases for the same credential.

The handler had been:
- reading `ANTHROPIC_AUTH_TOKEN` (not a real env var Claude Code looks at)
- forwarding the same value as BOTH `CLAUDE_CODE_OAUTH_TOKEN` and `ANTHROPIC_AUTH_TOKEN`, which conflated the two auth modes

Replaced with a discriminated `ClaudeCredential` union and a `readClaudeCredential()` helper:
- OAuth takes precedence: if `CLAUDE_CODE_OAUTH_TOKEN` is set, the handler uses subscription auth.
- Otherwise falls back to `ANTHROPIC_API_KEY`.
- Forwards exactly the env var that was set; never both.
- Error message and class docstring updated to name both variables and explain the distinction.
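A minimal sketch of the precedence rule described above. The names `ClaudeCredential` and `readClaudeCredential` come from the commit message, but the union's field names, the explicit `env` parameter, and the `credentialEnv` helper are assumptions made for this example.

```typescript
// Discriminated union: exactly one auth mode is ever represented.
type ClaudeCredential =
  | { kind: "oauth"; token: string }
  | { kind: "apiKey"; key: string };

type Env = Record<string, string | undefined>;

// OAuth wins when both are set; returns undefined when neither is set
// so the caller can post a clear failure message.
function readClaudeCredential(env: Env): ClaudeCredential | undefined {
  const oauth = env["CLAUDE_CODE_OAUTH_TOKEN"];
  if (oauth) return { kind: "oauth", token: oauth };
  const apiKey = env["ANTHROPIC_API_KEY"];
  if (apiKey) return { kind: "apiKey", key: apiKey };
  return undefined;
}

// Hypothetical helper: forwards exactly one env var into the sandbox, never both.
function credentialEnv(credential: ClaudeCredential): Record<string, string> {
  return credential.kind === "oauth"
    ? { CLAUDE_CODE_OAUTH_TOKEN: credential.token }
    : { ANTHROPIC_API_KEY: credential.key };
}

const both = readClaudeCredential({ CLAUDE_CODE_OAUTH_TOKEN: "oauth-value", ANTHROPIC_API_KEY: "key-value" });
const keyOnly = readClaudeCredential({ ANTHROPIC_API_KEY: "key-value" });
const none = readClaudeCredential({});
```

Modelling the credential as a union rather than two optional strings makes the "never both" rule a property of the type instead of a convention.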

Chat sessions now run with the same workspace-level MCP servers (Linear, cyrus-tools, cyrus-docs, optional Slack) that repo-bound sessions get. Previously the system prompt told Claude those servers existed while the runner ran with the default toolset only, a docstring-vs-reality mismatch the spike comment acknowledged.

agent-runtime side: widen `McpServerRuntimeConfig` to accept the full SDK schema (`type: "http" | "sse" | "stdio"`, plus a permissive index signature for `tools`, `alwaysLoad`, …) and switch the zod schema to `.passthrough()` so SDK-shaped entries flow through to the materializer verbatim instead of being silently stripped.

edge-worker side:
- `AgentChatSessionHandlerDeps` is now generic over `TEvent` and carries a new optional `buildMcpServers(event)` callback.
- The handler invokes it once per thread on first session creation (warm threads reuse the existing session as before), then wraps the result into a single anonymous `RuntimePlugin` named "chat" via a small `toRuntimeMcpServers` adapter that drops SDK-instance entries (those can't cross the runtime's subprocess boundary).
- `EdgeWorker.buildChatMcpServers(event)` delegates to the existing `McpConfigService.buildMcpConfig` with a synthetic `chat-<teamId>` repoId and the first configured Linear workspace, so cyrus-tools context wiring stays uniform with repo-session paths.

Class docstring updated: "No MCP servers" caveat removed; replaced by a section that describes the new `buildMcpServers` plumbing, supported transports, and the SDK-instance limitation.

Tests: monorepo typecheck clean; all 605 edge-worker tests pass; all 31 agent-runtime tests pass (no schema regressions).
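The adapter step can be sketched as a filter over the server map. This is a guess at the shape rather than the actual implementation: the `McpEntry` type is a simplified stand-in, and identifying in-process servers by `type: "sdk"` is an assumption about how SDK-instance entries are tagged.

```typescript
// Simplified stand-in for an SDK-shaped MCP server entry. The index
// signature mirrors the "permissive" passthrough described above.
type McpEntry = {
  type?: "stdio" | "http" | "sse" | "sdk";
  [key: string]: unknown;
};

// Keeps transport-based entries and drops in-process ones, which hold a
// live object and cannot be serialised across a subprocess boundary.
// Assumption: in-process servers are identified by type === "sdk".
function toRuntimeMcpServers(servers: Record<string, McpEntry>): Record<string, McpEntry> {
  const out: Record<string, McpEntry> = {};
  for (const [name, config] of Object.entries(servers)) {
    if (config.type === "sdk") continue;
    out[name] = config;
  }
  return out;
}

// The server names and URL below are illustrative only.
const runtimeServers = toRuntimeMcpServers({
  linear: { type: "http", url: "https://example.com/mcp" },
  "local-tool": { type: "stdio", command: "node", args: ["server.js"] },
  "in-process": { type: "sdk", instance: {} },
});

// The filtered map then becomes the single MCP-only plugin named "chat".
const chatPlugin = { name: "chat", mcpServers: runtimeServers };
```

Extra keys such as `tools` or `alwaysLoad` pass through untouched, which is the behaviour the passthrough schema change is meant to preserve.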

The pi harness adapter wrapped a binary named `pi` that has no traceable upstream (no npm package, no GitHub repo we could attribute it to, no install command in our docs). It's been in the codebase since the runtime's first commit but was never wired through to any runner test scope and we have no SDK or schema to type its stream-json output against.

Removing rather than carrying it as `raw: unknown` forever:
- delete `harnesses/pi.ts`
- drop `"pi"` from `HarnessKind` union (types.ts)
- drop `"pi"` from `HarnessKindSchema` enum (schemas.ts)
- drop `piHarness` import + re-export + `harnessAdapters` record entry (harnesses/index.ts)
- update the supported-kinds assertion in harnesses.test.ts

If pi comes back as a first-class target (with attribution and a binary we can install), it goes back in cleanly; the adapter shape is simple enough that re-adding takes minutes.

31/31 agent-runtime tests pass; monorepo typecheck clean.

…eric

The runtime now propagates harness-kind through to event typing. Calling `createAgentSession({ harness: "claude", … })` yields an `AgentSession<"claude">` whose `events` stream is `AsyncIterable<TranscriptEvent<SDKMessage>>`; `event.raw` narrows to the upstream SDK's union with no cast required.

New types in `src/types.ts`:

- `HarnessRawByKind`: lookup type from harness kind to its SDK event union. Empirically verified against each CLI's stdout (PR notes).
- `OpenCodeStreamEvent`: local envelope for opencode's JSONL output (the SDK's `Event` union describes a different surface; we type only the inner `part: Part`).
- `cursor` deliberately stays `unknown`: `@cursor/sdk`'s `SDKMessage` describes a different surface than `cursor-agent`'s stream-json. The follow-up plan is to vendor a small driver that wraps `@cursor/sdk` directly, at which point cursor's row becomes `import("@cursor/sdk").SDKMessage`.

Generic propagation:

- `TranscriptEvent<TRaw = unknown>`: `raw` is now `TRaw`, defaults to `unknown` for back-compat.
- `AgentSession<H extends HarnessKind = HarnessKind>`: `events`, `run()`, etc. carry `H` through.
- `AgentSessionResult<H>` and `RuntimeCallbacks<H>` follow.
- `createAgentSession<H>(config: CreateAgentSessionConfigFor<H>)` infers H from `config.harness`. New helper type `CreateAgentSessionConfigFor<H>` narrows the `harness` field.

Internal `RuntimeAgentSession` stays non-generic (operates on the loose union); the public factory casts at the boundary.

Existing consumers reading `event.raw as unknown` continue to compile unchanged: `AgentSession` defaults to `AgentSession<HarnessKind>`, which keeps the current weak typing.

Also fixes a long-standing opencode adapter bug: `--output-format json` → `--format json`, the actual CLI flag per `opencode run --help`. The old flag would have failed at runtime on first invocation.

Adds 4 type-only devDependencies under @anthropic-ai, @openai, @google, @opencode-ai. They are never bundled and never imported at runtime.

Tests: monorepo typecheck clean; 34/34 agent-runtime tests pass (adds 3 compile-time type-narrowing assertions); 605/605 edge-worker tests pass.
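The lookup-type pattern described in this commit can be sketched in isolation. This is a minimal illustration, not the package's source: the event shapes (`ClaudeMessage`, `CodexEvent`) are simplified stand-ins for the real SDK unions, `createSession` is a hypothetical stand-in for `createAgentSession`, and `events` is a plain array here rather than an async iterable.

```typescript
// Stand-in event shapes. The real runtime imports these unions from each
// harness SDK; the fields below are simplified assumptions for illustration.
type ClaudeMessage =
  | { type: "assistant"; text: string }
  | { type: "result"; ok: boolean };
type CodexEvent = { type: "thread.started"; thread_id: string };

// Lookup type from harness kind to its raw event union.
interface HarnessRawByKind {
  claude: ClaudeMessage;
  codex: CodexEvent;
  cursor: unknown; // deliberately untyped at this point in the PR
}
type HarnessKind = keyof HarnessRawByKind;

// `raw` defaults to `unknown`, so untyped consumers keep compiling.
interface TranscriptEvent<TRaw = unknown> {
  kind: string;
  raw: TRaw;
}

interface AgentSession<H extends HarnessKind = HarnessKind> {
  harness: H;
  events: TranscriptEvent<HarnessRawByKind[H]>[];
}

// H is inferred from the literal passed as `harness`.
function createSession<H extends HarnessKind>(config: { harness: H }): AgentSession<H> {
  return { harness: config.harness, events: [] };
}

const session = createSession({ harness: "claude" });
session.events.push({ kind: "message", raw: { type: "assistant", text: "hi" } });
session.events.push({ kind: "result", raw: { type: "result", ok: true } });

// No cast needed: checking the discriminant narrows `raw` to one variant.
const assistantTexts: string[] = [];
for (const event of session.events) {
  if (event.raw.type === "assistant") {
    assistantTexts.push(event.raw.text);
  }
}
```

Keeping the mapping in a single interface means adding or retyping a harness is a one-row change, which is how the cursor row can later flip from `unknown` to a concrete union without touching consumers.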

Two related changes: first a new API addition, then the first consumer that benefits from the per-harness typing we landed last commit.

agent-runtime:

- New AgentSession.transcript(): returns a snapshot of every event observed on the session so far, in insertion order. Useful for cross-turn replay, post-hoc inspection, building a UI timeline without consuming the live `events` async iterable, or resuming consumption from a known index across reconnects. Returns a fresh copy so callers can't mutate the internal buffer.
- Implementation in RuntimeAgentSession is a one-liner over the existing `observedEvents` array; that array was already being populated for the rolling-result use case.
- Two compile-time tests assert the typing: `transcript()` on AgentSession<"claude"> returns readonly TranscriptEvent<SDKMessage>[] and AgentSession<"codex"> returns readonly TranscriptEvent<ThreadEvent>[].

edge-worker:

- AgentChatSessionHandler now narrows `state.session` to AgentSession<"claude">. The handler creates Claude sessions only (per the "Claude harness only" caveat in its docstring), so this narrowing surfaces SDKMessage typing throughout the run/result/transcript chain.
- extractAssistantFallback's manual cast soup is gone. The function now walks `events: readonly TranscriptEvent<SDKMessage>[]`, narrows via `e.raw.type === "assistant"` (TS discriminates on the SDK union), and iterates `e.raw.message.content` with full BetaContentBlock typing. Removes 8 lines of inline type guards.
- buildSessionConfig returns CreateAgentSessionConfigFor<"claude"> (the H-narrowed variant), and the local-provider branch's harness config uses `kind: "claude" as const` so the literal flows through. createAgentSession is called with explicit <"claude"> type arg.

Also: pinned agent-runtime's @anthropic-ai/claude-agent-sdk to 0.2.123 (exact) to match the rest of the workspace. pnpm add @latest had grabbed ^0.3.145, which made two copies of the SDK resolve and broke nominal type unification between SDKMessage references.

Tests: 36/36 agent-runtime (adds 2 transcript-typing assertions); 605/605 edge-worker; monorepo typecheck clean.
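The snapshot semantics of `transcript()` can be illustrated with a small buffer class. This is a sketch under assumptions: `SessionBuffer` and `record` are hypothetical names, and only the `observedEvents` array and the "fresh copy" behavior come from the commit message.

```typescript
interface TranscriptEvent<TRaw = unknown> {
  kind: string;
  raw: TRaw;
}

// Hypothetical stand-in for the part of the session that buffers events.
class SessionBuffer<TRaw> {
  private readonly observedEvents: TranscriptEvent<TRaw>[] = [];

  record(event: TranscriptEvent<TRaw>): void {
    this.observedEvents.push(event);
  }

  // Shallow copy: callers get the events in insertion order but cannot
  // grow or reorder the internal buffer through the returned array.
  transcript(): readonly TranscriptEvent<TRaw>[] {
    return [...this.observedEvents];
  }
}

const buffer = new SessionBuffer<string>();
buffer.record({ kind: "message", raw: "first" });
const firstSnapshot = buffer.transcript();
buffer.record({ kind: "message", raw: "second" });

// Resume consumption from a known index, for example after a reconnect.
const missedEvents = buffer.transcript().slice(firstSnapshot.length);
```

Because each call returns a new array, an earlier snapshot keeps its length even as the session records more events, which is what makes the "resume from a known index" use case safe.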

…typed

Closes the typing pass for the 5th and last harness. cursor's row in HarnessRawByKind flips from `unknown` to `@cursor/sdk`'s `SDKMessage`, matching the other four harnesses.

Why a wrapper. We previously verified that `cursor-agent --output-format stream-json` emits a different schema than what `@cursor/sdk` declares: cursor-agent uses `session_id`, `subtype`, nested `tool_call.shellToolCall.{args,result}`, and a `result` event that the SDK union doesn't include; the SDK uses `agent_id`, `run_id`, `status`, top-level `args`/`result`, and no `result` variant. Typing `raw` as `SDKMessage` while spawning `cursor-agent` would be a lie.

The fix: vendor a tiny driver that uses `@cursor/sdk`'s `Agent.create` + `run.stream()` ourselves. Spawn it as `node <driver>` from the cursor adapter. The bytes on the wire ARE `SDKMessage` by construction, so there's no schema drift to worry about because we own the producer.

What's new:

- src/harnesses/cursor-driver.ts: Node ESM script that parses argv (--prompt, --model, --cwd, --system-prompt, --agent-id, --agent-id-file), creates an Agent via `@cursor/sdk`, streams SDKMessage events to stdout as JSONL, and exits 0/1/2.
- src/harnesses/cursor.ts: rewritten to spawn `node <driver-path>` instead of `cursor-agent`. Driver path resolved via `import.meta.url` against `./cursor-driver.js`, sibling in both src and dist. The adapter's `extractResult` now walks `event.raw as SDKMessage` and narrows via the discriminator, with no manual guards.
- HarnessRawByKind["cursor"] = SDKMessage from @cursor/sdk.
- @cursor/sdk added as a regular dependency (not devDep) since the driver imports it as a value.

Internal cleanup forced by the typing:

- RuntimeAgentSession no longer formally `implements AgentSession`. The public interface is generic over H; the internal class works with the loose `TranscriptEvent<unknown>` form and the factory in runtime.ts casts at the boundary (which it was already doing).
- run()'s `turnEvents` is cast to `AgentSessionResult["events"]` at the return site: single boundary, type-safe.
- emitEvent() casts the callback to the loose `TranscriptEvent` form at the call site (the public boundary is the factory cast in runtime.ts).

End-to-end smoke test against the real Cursor API confirmed the driver's stdout matches the SDK union exactly: `status`/`tool_call`/`assistant` variants with `agent_id`+`run_id`, no schema drift. Exit 0, 22 valid JSON lines, zero stderr noise.

Known limitation. The driver's path is on the host, so this works unmodified for the local provider; Daytona needs the driver materialized into the sandbox + `@cursor/sdk` installed there. Left as a TODO in the cursor adapter. Chat is Claude-only today, so no existing functionality regresses.

Tests: 36/36 agent-runtime (typed-events asserts cursor now resolves to CursorSDKMessage; harnesses test updated for the new command shape); 605/605 edge-worker; monorepo typecheck clean.
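On the consuming side, a JSONL stream like the driver's stdout has to be split on newlines before parsing, since a chunk boundary can fall mid-line. The following is a minimal sketch of that line buffering; `JsonlBuffer` is a hypothetical name, and the event fields in the usage lines are illustrative rather than taken from the adapter.

```typescript
// Accumulates stdout chunks and yields one parsed value per complete line.
class JsonlBuffer {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is either "" (chunk ended on a newline) or a
    // partial line that must wait for the next chunk.
    this.pending = lines.pop() ?? "";

    const parsed: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length > 0) {
        parsed.push(JSON.parse(trimmed));
      }
    }
    return parsed;
  }
}

const jsonl = new JsonlBuffer();
// First chunk ends in the middle of the second event.
const firstBatch = jsonl.push('{"type":"status","agent_id":"a1"}\n{"type":"assis');
// Second chunk completes it.
const secondBatch = jsonl.push('tant","agent_id":"a1"}\n');
```

A production consumer would also decide what to do with a line that fails `JSON.parse`; this sketch lets the exception propagate.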

WorkerService cherry-picked fields from the loaded EdgeConfig into the EdgeWorkerConfig but never forwarded defaultProvider, so the value configured in ~/.cyrus/config.json was silently dropped and chat sessions always defaulted to the local sandbox provider — even when the operator had explicitly selected daytona.

…pass perms

Add support for booting Daytona-backed chat sandboxes from a pre-built snapshot rather than the default base image, configurable via env:

- DAYTONA_SNAPSHOT: pre-built snapshot to seed the sandbox from. When set, the npm-install bootstrap is skipped (the snapshot is expected to ship Claude Code preinstalled) and the CLI defaults to `claude` on PATH.
- DAYTONA_WORKING_DIR: in-sandbox working/home directory (default `/home/daytona`). Set this when the snapshot uses a different user layout, e.g. `/home/cyrus`.
- DAYTONA_CLAUDE_CLI_PATH: absolute path to the `claude` binary inside the sandbox. Defaults to `<workingDir>/.npm-global/bin/claude` when no snapshot is set, or `claude` (PATH-resolved) when a snapshot is.

Plumbing:

- Add an optional `snapshot` field to RuntimeSandboxConfig, forwarded by ComputeSdkSandboxProvider as `snapshotId` (which the ComputeSDK Daytona adapter maps to Daytona's snapshot create param).
- Add the same field to the Zod schema so it survives normalization (the schema was silently stripping unknown sandbox keys, which is why the wired-up snapshot value never reached the SDK call).
- Translate Cyrus's cross-harness PermissionMode to Claude's CLI flag values in the Claude harness adapter (`"bypass"` -> `"bypassPermissions"`, `"ask"` -> `"default"`). Without this, passing `"bypass"` failed at the CLI boundary since Claude does not accept that string.
- Default Daytona chat sessions to `permissions: { mode: "bypass" }` so the agent can run shell commands inside the sandbox. The sandbox itself is the isolation boundary, so per-tool prompts (which no user can answer) are noise. Local sessions are unchanged.
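The env defaults and the permission translation above can be sketched as two small pure functions. This is an illustration under assumptions: the function and interface names are hypothetical, the environment is passed in as a plain record instead of being read from `process.env`, and `PermissionMode` lists only the two modes this commit names.

```typescript
// Only the two modes named in the commit message; the real union may be wider.
type PermissionMode = "bypass" | "ask";

// Cross-harness mode -> value accepted by Claude's permission-mode flag.
function toClaudePermissionMode(mode: PermissionMode): string {
  return mode === "bypass" ? "bypassPermissions" : "default";
}

interface DaytonaClaudeSettings {
  snapshot: string | undefined;
  workingDir: string;
  claudeCliPath: string;
}

// Hypothetical resolver mirroring the documented defaults.
function resolveDaytonaSettings(
  env: Record<string, string | undefined>,
): DaytonaClaudeSettings {
  const snapshot = env.DAYTONA_SNAPSHOT?.trim() || undefined;
  const workingDir = env.DAYTONA_WORKING_DIR?.trim() || "/home/daytona";
  // Explicit path wins; otherwise PATH lookup in snapshot mode, or the
  // npm-global install location when the runtime bootstraps the CLI itself.
  const claudeCliPath =
    env.DAYTONA_CLAUDE_CLI_PATH?.trim() ||
    (snapshot ? "claude" : `${workingDir}/.npm-global/bin/claude`);
  return { snapshot, workingDir, claudeCliPath };
}

const bootstrapMode = resolveDaytonaSettings({});
const snapshotMode = resolveDaytonaSettings({
  DAYTONA_SNAPSHOT: "my-snapshot", // placeholder snapshot name
  DAYTONA_WORKING_DIR: "/home/cyrus",
});
```

Treating whitespace-only values as unset keeps an accidentally blank env var from overriding the defaults.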

…ackage

The driver script that vendors `@cursor/sdk` for the cursor harness adapter moves out of `cyrus-agent-runtime`'s `src/harnesses/` into its own publishable package at `packages/cursor-sdk-runner/` with the npm name `@cyrus/cursor-runner`.

Why a dedicated package:

- Standalone reusable tool. Anyone who wants typed Cursor streaming across a process boundary can `npm install -g @cyrus/cursor-runner` and spawn it from any language/runtime, not just from cyrus.
- Cleaner dependency surface. agent-runtime no longer carries `@cursor/sdk` as a runtime dep; it stays a devDep just for the `SDKMessage` type import that backs `HarnessRawByKind["cursor"]`. The actual @cursor/sdk install moves into the new package.
- Independent versioning. The driver can iterate against new Cursor SDK releases without forcing an agent-runtime release.
- Distinct from the legacy `cyrus-cursor-runner` package (the IAgentRunner-style `cursor-agent`-CLI wrapper that the new agent-runtime is replacing). The two packages have clearly different scopes, so there is no confusion.

Package contents:

- `package.json`: name `@cyrus/cursor-runner`, version synced to workspace (0.2.51), `bin: { "cursor-runner": "dist/index.js" }`, publishConfig.access public, MIT license inheriting from monorepo.
- `src/index.ts`: same driver logic as before, with a `#!/usr/bin/env node` shebang so the bin is executable post-install. Argv contract unchanged (--prompt, --model, --cwd, --system-prompt, --agent-id, --agent-id-file).
- `README.md`: install instructions, options table, exit codes, and the consumer narrowing pattern showing how to import SDKMessage.
- `tsconfig.json` mirrors other workspace runners.

agent-runtime wiring:

- Removed `src/harnesses/cursor-driver.ts` (moved upstream).
- `src/harnesses/cursor.ts` resolves the runner via `createRequire(import.meta.url).resolve("@cyrus/cursor-runner")` instead of a sibling-file URL. Works for both pnpm workspace symlinks (today) and standalone npm installs (when the package is published).
- `@cursor/sdk` moved from dependencies to devDependencies (type-only).
- Added `@cyrus/cursor-runner: workspace:*` as a dependency.

End-to-end smoke test against the real Cursor API confirmed via the new resolved path: the bin spawns, streams 17 SDKMessage JSON lines, exits 0 with zero stderr noise. Wire format still has agent_id / run_id / status fields exactly per the `@cursor/sdk` union.

Tests: 36/36 agent-runtime (cursor command shape test updated for the new path-resolution pattern); 605/605 edge-worker; monorepo typecheck clean.

Known limitations carried forward unchanged from the previous commit:

- Local provider only. The runner's path resolves to a host node_modules location that doesn't exist inside a remote sandbox. Daytona support needs the runner installed into the sandbox via setup commands (`npm install -g @cyrus/cursor-runner`) once the package is published. Slack chat is Claude-only, so no current functionality regresses.
- Multi-turn resume is wired in the runner (--agent-id-file / --agent-id) but not threaded from the cursor adapter yet; same TODO as before.

`git add <files>` skipped the deletion in the previous commit because none of the explicit paths I staged covered it. The driver moved to `packages/cursor-sdk-runner/src/index.ts` and was already referenced through the new `@cyrus/cursor-runner` package; the stale source file just lingered. Removing it now.

…apshot' into claude/agent-runtime-slack-chat-replacement

…ot mode

Mirrors how Claude's adapter handles `DAYTONA_CLAUDE_CLI_PATH`. The cursor adapter now has two invocation shapes:

- **Default (local provider)**: no `harness.command` set, falls back to `createRequire("@cyrus/cursor-runner")` resolution and spawns `node <host-resolved-path>`. Same behavior as before for anyone not setting a custom command.
- **Override (Daytona snapshot mode)**: `harness.command` is the cursor-runner binary inside the sandbox. It is spawned directly; the runner's `#!/usr/bin/env node` shebang makes it executable, so no intermediary `node` is needed. Callers pass `"cursor-runner"` to use the sandbox's PATH (which Daytona snapshots populate with the preinstalled bin) or an absolute path to pin a specific copy.

This composes cleanly with the snapshot work: a Daytona snapshot ships `@cyrus/cursor-runner` preinstalled alongside the harness binaries, callers set `harness: { kind: "cursor", command: "cursor-runner" }`, and the adapter doesn't care whether it's running in a remote sandbox or on the host. This is the same pattern as Claude getting `command: "claude"` for PATH lookup inside a snapshot.

The chat handler doesn't use cursor today (Claude-only), so no existing consumer needs updating. When a cursor-on-Daytona consumer shows up, they wire `harness.command` from the same kind of env (e.g. `DAYTONA_CURSOR_RUNNER_PATH`) as the chat handler already does for Claude.

Tests: added a paired test asserting the override shape; previous test renamed for clarity ("via the host-resolved … when harness.command is unset" vs "uses harness.command directly … Daytona-snapshot mode"). 40/40 agent-runtime tests pass; monorepo typecheck clean.
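The two invocation shapes can be sketched as a single command builder. This is illustrative only: `buildCursorCommand` and `SpawnCommand` are hypothetical names, and the host resolution step is injected as a callback so the sketch stays runnable without the runner package installed (the adapter itself is described as using `createRequire`).

```typescript
interface SpawnCommand {
  command: string;
  args: string[];
}

function buildCursorCommand(
  prompt: string,
  harnessCommand: string | undefined,
  resolveRunnerPath: () => string, // hypothetical injection point
): SpawnCommand {
  const runnerArgs = ["--prompt", prompt];
  if (harnessCommand) {
    // Override shape: the runner's shebang makes the bin directly
    // executable, so no intermediary `node` is needed.
    return { command: harnessCommand, args: runnerArgs };
  }
  // Default shape: run the host-resolved script through node.
  return { command: "node", args: [resolveRunnerPath(), ...runnerArgs] };
}

// Placeholder path standing in for whatever the host resolution returns.
const fakeResolve = () => "/host/node_modules/cursor-runner/dist/index.js";

const localShape = buildCursorCommand("hello", undefined, fakeResolve);
const snapshotShape = buildCursorCommand("hello", "cursor-runner", fakeResolve);
```

Only the default branch calls the resolver, so the override shape never touches host paths that would not exist inside a remote sandbox.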

…rsor-runner

The `@cyrus` npm scope is unclaimed; `@cyrus-ai` is the registered org. Rename across:

- packages/cursor-sdk-runner/package.json (name)
- packages/cursor-sdk-runner/src/index.ts (docstring + usage example)
- packages/cursor-sdk-runner/README.md (title, install command)
- packages/agent-runtime/package.json (dependency entry)
- packages/agent-runtime/src/harnesses/cursor.ts (createRequire target + docs)
- packages/agent-runtime/test/harnesses.test.ts (test description)

No behavioral change. 40/40 agent-runtime tests pass; 613/613 edge-worker; monorepo typecheck clean.

The typed events story (HarnessRawByKind narrowing SDKMessage / ThreadEvent / JsonStreamEvent / etc.) only holds if the SDK version we type against actually describes the bytes the CLI emits. Today the pins were a mix of exact (claude, cursor) and caret (gemini, codex, opencode). Carets let a future minor SDK release introduce shapes the runtime CLI doesn't emit (or vice versa) and quietly break the narrowing.

Pinned everything to exact versions matching the CLI versions we've empirically tested against:

| SDK pin | Was | Now | Matches CLI |
| --- | --- | --- | --- |
| @anthropic-ai/claude-agent | 0.2.123 | 0.2.123 | claude 2.1.145 |
| @cursor/sdk | 1.0.13 | 1.0.13 | @cyrus-ai/cursor-runner |
| @google/gemini-cli-core | ^0.42.0 | 0.17.0 | gemini 0.17.0 (per CLAUDE.md) |
| @openai/codex-sdk | ^0.131.0 | 0.130.0 | codex 0.130.0 |
| @opencode-ai/sdk | ^1.15.5 | 1.15.5 | opencode 1.15.5 |

Also pinned every `@anthropic-ai/claude-code@latest` install command to `2.1.145`: the chat handler's Daytona setup commands, plus three test scripts. `@latest` was a silent drift surface: it would install a CLI whose stream-json shape might not match `@anthropic-ai/claude-agent-sdk@0.2.123` (the SDK we type against), and the breakage would show up as runtime type confusion in production rather than at build. Now a CLI version bump requires a coordinated SDK bump in the same PR, visible in the diff.

Added a `PINNED_CLAUDE_CLI_VERSION` const + explanatory comment in AgentChatSessionHandler so future maintainers see the constraint.

Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean. Downgrading gemini-cli-core (^0.42.0 → 0.17.0) and codex-sdk (^0.131.0 → 0.130.0) didn't break anything; both ship the same JsonStreamEvent / ThreadEvent unions we rely on at the older versions.
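One way to keep ranged pins from creeping back in is a small check over a dependency map. The sketch below is not part of the PR: `findRangedPins` is a hypothetical helper, and treating `workspace:` specifiers as acceptable is an assumption made so workspace links are not flagged.

```typescript
// Returns the names of dependencies whose version is not an exact
// `major.minor.patch` (optionally with a prerelease suffix).
function findRangedPins(deps: Record<string, string>): string[] {
  const exactVersion = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;
  return Object.entries(deps)
    .filter(([, version]) => !version.startsWith("workspace:"))
    .filter(([, version]) => !exactVersion.test(version))
    .map(([depName]) => depName);
}

// The "Was" column from the table above, plus a workspace link.
const beforePinning: Record<string, string> = {
  "@cursor/sdk": "1.0.13",
  "@google/gemini-cli-core": "^0.42.0",
  "@openai/codex-sdk": "^0.131.0",
  "@opencode-ai/sdk": "^1.15.5",
  "@cyrus-ai/cursor-runner": "workspace:*",
};

const ranged = findRangedPins(beforePinning);
```

Run against a package's `devDependencies` in CI, a non-empty result could fail the build, which would surface drift at review time instead of at runtime.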

…the agent-runtime context The previous pin (0.17.0) was inherited from the LEGACY `cyrus-gemini-runner` package, which deliberately holds 0.17.0 for its own reasons (the package documents this in `packages/gemini-runner/CLAUDE.md` and pins identically in its own package.json). cyrus-agent-runtime is a different context — it's the new runtime that consumers will install going forward, so it should track the current gemini-cli-core line. Verified: `@google/gemini-cli-core@0.42.0` ships exactly the same `JsonStreamEvent` union we narrow against — 6 variants (InitEvent, MessageEvent, ToolUseEvent, ToolResultEvent, ErrorEvent, ResultEvent), identical to 0.17.0's union. So no narrowing breakage. `packages/gemini-runner/` is intentionally left at 0.17.0 — that's the legacy stack's own pinning decision and shouldn't move just because we touched the new runtime. Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean.

…ex-sdk to 0.131.0

Following the "agent-runtime tracks current" reasoning we used for the gemini bump, move the remaining SDK pins forward:

- @anthropic-ai/claude-agent-sdk: 0.2.123 -> 0.2.141 (in 5 packages that all need to move together for type identity to unify at the package boundaries: agent-runtime devDep, plus runtime deps in core / claude-runner / edge-worker / simple-agent-runner)
- @openai/codex-sdk: 0.130.0 -> 0.131.0 (latest; agent-runtime only)

Both are additive minor bumps. The SDKMessage / ThreadEvent unions we narrow `HarnessRawByKind[H].raw` against are supersets of the older shapes (claude added `SDKPermissionDeniedMessage` to the union and `oauth_org_not_allowed`/`model_not_found` to the error enum; codex 0.131 ships the same ThreadEvent union as 0.130). Nothing we read from those types was removed or renamed, so existing consumers (notably AgentChatSessionHandler.extractAssistantFallback) typecheck clean.

Also updated the inline comment in AgentChatSessionHandler that referenced `@anthropic-ai/claude-agent-sdk@0.2.123` as the SDK the Daytona CLI install pin is paired against; it now says 0.2.141. Matching test assertion in AgentChatSessionHandler.provider.test.ts updated to keep the version-pair note in sync.

Untouched:

- PINNED_CLAUDE_CLI_VERSION ("2.1.145"): that's the latest CLI and pairs with the 0.2.141 SDK
- packages/codex-runner (^0.125.0) and packages/gemini-runner (0.17.0): legacy packages own their own pinning decisions

Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean.

…drop SlackDaytonaRunner

Pulls the agent-runtime additions for caller-driven harness session resume:

- RuntimeVolumeConfig.subpath: per-binding isolation within a shared provider volume (Daytona Volumes pattern)
- CreateAgentSessionConfig.resumeHarnessSessionId: caller-supplied harness session id to resume; adapter translates to its native CLI flag
- HarnessAdapter.extractSessionId: pulls the harness-native session id out of the observed transcript so callers can persist it
- AgentSessionResult.harnessSessionId: round-trips the id to the caller after each run
- Claude adapter: extractSessionId reads `system.init.session_id`; buildCommand appends `--resume <id>` when resumeHarnessSessionId is set
- resume-smoke.mjs: two-turn Daytona Volume smoke test script

Explicitly DROPPED from the merge:

- packages/edge-worker/src/SlackDaytonaRunner.ts
- packages/edge-worker/test/SlackDaytonaRunner.test.ts
- the EdgeWorker.ts wiring that constructs SlackDaytonaRunner under the legacy ChatSessionHandler; we use the new AgentChatSessionHandler instead (already wired from earlier work in this branch)
- @computesdk/daytona + @daytonaio/sdk deps in edge-worker package.json (only SlackDaytonaRunner needed them; agent-runtime already depends on @computesdk/daytona directly)

Conflict resolutions:

- claude.ts: kept our plugin wiring (--plugin-dir / --mcp-config) AND added the volumes-branch's --resume handling. Both are additive in the args list.
- session.ts: kept turnEvents semantics for AgentSessionResult.events (per-turn slice, what consumers already expect) but added the volumes-branch's harnessSessionId extraction over the FULL observedEvents (since system.init.session_id arrives once on turn 1 and is referenced by every subsequent turn). Adopted destroySandboxOnce() for the destroy callback (cleaner than destroy(), which also tries to cancel an already-completed run).
- harnesses.test.ts: kept all tests from both sides (bypass permission mapping + --resume + extractSessionId).
- runtime.test.ts: fixed the volumes-branch's resume test to use our actual session API (session.run("prompt") instead of session.start(); config.userPrompt was wrong and was just a stray field).

Tests: 44/44 agent-runtime, 613/613 edge-worker, monorepo typecheck clean.
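The Claude side of the resume round-trip can be sketched as two helpers. This is a simplified illustration, not the adapter's code: the helper names are hypothetical, events are typed loosely as `unknown`, and the base arguments in the usage lines are placeholders.

```typescript
interface TranscriptEvent {
  kind: string;
  raw: unknown;
}

// Scans the FULL transcript: the init message arrives once, on turn 1.
function extractClaudeSessionId(events: readonly TranscriptEvent[]): string | undefined {
  for (const event of events) {
    const raw = event.raw;
    if (typeof raw !== "object" || raw === null) continue;
    const message = raw as { type?: unknown; subtype?: unknown; session_id?: unknown };
    if (
      message.type === "system" &&
      message.subtype === "init" &&
      typeof message.session_id === "string"
    ) {
      return message.session_id;
    }
  }
  return undefined;
}

// Appends `--resume <id>` only when the caller supplied an id.
function withResumeArgs(baseArgs: readonly string[], resumeHarnessSessionId?: string): string[] {
  return resumeHarnessSessionId
    ? [...baseArgs, "--resume", resumeHarnessSessionId]
    : [...baseArgs];
}

const turnOne: TranscriptEvent[] = [
  { kind: "lifecycle", raw: "sandbox started" }, // non-object raw is skipped
  { kind: "system", raw: { type: "system", subtype: "init", session_id: "sess-123" } },
];
const persistedId = extractClaudeSessionId(turnOne);
const turnTwoArgs = withResumeArgs(["-p", "What was the punchline?"], persistedId);
```

The caller persists `persistedId` between turns; passing `undefined` leaves the argument list unchanged, so first turns and resumed turns share one code path.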

…-to-end

Wires cursor into the same caller-driven resume contract Claude got from the volumes merge:

- `extractSessionId(events)`: walks the SDKMessage stream for the first `agent_id` (every variant carries it; the value is stable across the whole run). Returned as `AgentSessionResult.harnessSessionId` for the caller to persist.
- `buildCommand`: when `config.resumeHarnessSessionId` is set, appends `--agent-id <id>` to the cursor-runner invocation. The runner reads that flag and calls `Agent.resume(<id>)` instead of `Agent.create()`, picking up the prior conversation.

The runner's `--agent-id-file` flag is unchanged. It is kept for callers that prefer the runner-writes-it-to-disk pattern; this adapter just doesn't use it because the runtime now surfaces the id via extractSessionId.

Includes a guard against non-object `event.raw` in extractSessionId: runtime lifecycle events can emit string raw values, and the `in` operator throws on those. The chat-typed shape says `raw: SDKMessage` but the buffer holds both harness-streamed and runtime-lifecycle events; a structural guard is the right move at the adapter boundary.

Tests: 4 new tests in harnesses.test.ts covering `--agent-id` passed when set, omitted when not, agent_id extracted from a realistic status+assistant transcript, and undefined when nothing in the transcript carries one. 48/48 agent-runtime pass; 613/613 edge-worker; monorepo typecheck clean.

Cursor is now feature-equivalent to Claude on the multi-turn resume contract: caller persists `result.harnessSessionId`, passes it back as `resumeHarnessSessionId` on the next session config, gets a continuation run. Works for local and Daytona-snapshot modes alike since the resume state lives on Cursor's servers (addressable by agentId) rather than in any filesystem the runtime needs to preserve.
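The structural guard mentioned above matters because `"agent_id" in raw` throws a `TypeError` when `raw` is a string. A minimal sketch of the guarded extraction follows; `extractAgentId` is a hypothetical name and the transcript contents are illustrative.

```typescript
interface TranscriptEvent {
  kind: string;
  raw: unknown;
}

function extractAgentId(events: readonly TranscriptEvent[]): string | undefined {
  for (const event of events) {
    const raw = event.raw;
    // Lifecycle events may carry string raw values; the `in` operator
    // throws on primitives, so skip anything that is not an object.
    if (typeof raw !== "object" || raw === null) continue;
    if ("agent_id" in raw) {
      const candidate = (raw as { agent_id: unknown }).agent_id;
      if (typeof candidate === "string" && candidate.length > 0) {
        return candidate;
      }
    }
  }
  return undefined;
}

const mixedTranscript: TranscriptEvent[] = [
  { kind: "lifecycle", raw: "runner spawned" },
  { kind: "status", raw: { type: "status", agent_id: "agent-42", run_id: "run-1" } },
  { kind: "assistant", raw: { type: "assistant", agent_id: "agent-42", run_id: "run-1" } },
];
const agentId = extractAgentId(mixedTranscript);
const noneFound = extractAgentId([{ kind: "lifecycle", raw: "runner exited" }]);
```

Returning the first match is enough here because the commit message states the value is stable across the whole run.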

…ldStateEnv

Adds a consumer-facing `sandbox.persistentState: { volume, bindingId }` config that hides the per-harness env-var math. The runtime mounts the caller's volume at a fixed internal path with bindingId as the subpath, then asks the harness adapter (via the new `buildStateEnv(mountPath)` hook) which env vars to set so the harness writes its state-dir there instead of under `$HOME`.

Each adapter's env mapping is grounded in upstream source, not guessed:

- claude → CLAUDE_CONFIG_DIR = `${m}/.claude`
- cursor → CURSOR_DATA_DIR = `${m}/.cursor`
- codex → CODEX_HOME = `${m}/.codex` (`codex-rs/utils/home-dir/src/lib.rs::find_codex_home`)
- gemini → GEMINI_CLI_HOME = `${m}` (CLI appends `.gemini` itself; `@google/gemini-cli-core::homedir()`)
- opencode → all four XDG dirs under `${m}/.opencode-xdg/{config,data,state,cache}` (no app-specific override; opencode derives via `xdg-basedir`)
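The mappings listed above can be collected into one lookup for illustration. In the PR each adapter implements its own `buildStateEnv(mountPath)` hook; the single switch below is a simplification, the mount path in the usage lines is a placeholder, and the four opencode variable names are assumed to be the standard XDG base-directory variables.

```typescript
type HarnessKind = "claude" | "cursor" | "codex" | "gemini" | "opencode";

function buildStateEnv(kind: HarnessKind, mountPath: string): Record<string, string> {
  const m = mountPath.replace(/\/+$/, ""); // tolerate a trailing slash
  switch (kind) {
    case "claude":
      return { CLAUDE_CONFIG_DIR: `${m}/.claude` };
    case "cursor":
      return { CURSOR_DATA_DIR: `${m}/.cursor` };
    case "codex":
      return { CODEX_HOME: `${m}/.codex` };
    case "gemini":
      // The CLI appends `.gemini` itself, so the mount path is passed as-is.
      return { GEMINI_CLI_HOME: m };
    case "opencode": {
      // No app-specific override: redirect all four XDG base directories.
      const xdg = `${m}/.opencode-xdg`;
      return {
        XDG_CONFIG_HOME: `${xdg}/config`,
        XDG_DATA_HOME: `${xdg}/data`,
        XDG_STATE_HOME: `${xdg}/state`,
        XDG_CACHE_HOME: `${xdg}/cache`,
      };
    }
  }
}

const claudeEnv = buildStateEnv("claude", "/mnt/state/");
const opencodeEnv = buildStateEnv("opencode", "/mnt/state");
```

Because the caller only supplies `{ volume, bindingId }`, swapping harnesses does not require the caller to learn a different state-directory variable.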

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19dce79533

chatgpt-codex-connector · 2026-05-20T01:54:22Z

+	const oauth = process.env.CLAUDE_CODE_OAUTH_TOKEN?.trim();
+	if (oauth) return { kind: "oauth", token: oauth };
+	const apiKey = process.env.ANTHROPIC_API_KEY?.trim();
+	if (apiKey) return { kind: "apiKey", token: apiKey };


Support ANTHROPIC_AUTH_TOKEN in chat credential detection

readClaudeCredential() only accepts CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY, so Slack chat sessions now reject environments that authenticate Claude with ANTHROPIC_AUTH_TOKEN (they hit the "not configured" reply path). This is a regression from the prior runner path, which still treats ANTHROPIC_AUTH_TOKEN as a valid auth env (see packages/claude-runner/src/session-env.ts), so existing deployments using that variable will fail for every chat mention until they reconfigure.

Fixed in 1e1531d — added a third authToken variant to the ClaudeCredential union, scan ANTHROPIC_AUTH_TOKEN last (matching session-env.ts AUTH_ENV_KEYS precedence), and forward exactly that env var to the harness for kind=authToken. Updated the "not configured" error to list all three options. New tests cover detection precedence, whitespace-only handling, and per-kind forwarding.
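The three-way detection and per-kind forwarding described in this reply can be sketched as follows. This is an illustration, not the handler's code: the environment is passed in as a plain record rather than read from `process.env`, and `credentialToSecrets` is a hypothetical helper name.

```typescript
type ClaudeCredential =
  | { kind: "oauth"; token: string }
  | { kind: "apiKey"; token: string }
  | { kind: "authToken"; token: string };

// Precedence: OAuth token, then API key, then auth token (scanned last).
function readClaudeCredential(
  env: Record<string, string | undefined>,
): ClaudeCredential | undefined {
  const oauth = env.CLAUDE_CODE_OAUTH_TOKEN?.trim();
  if (oauth) return { kind: "oauth", token: oauth };
  const apiKey = env.ANTHROPIC_API_KEY?.trim();
  if (apiKey) return { kind: "apiKey", token: apiKey };
  const authToken = env.ANTHROPIC_AUTH_TOKEN?.trim();
  if (authToken) return { kind: "authToken", token: authToken };
  return undefined; // caller takes the "not configured" reply path
}

// Forward exactly one env var so the auth modes are not conflated.
function credentialToSecrets(credential: ClaudeCredential): Record<string, string> {
  switch (credential.kind) {
    case "oauth":
      return { CLAUDE_CODE_OAUTH_TOKEN: credential.token };
    case "apiKey":
      return { ANTHROPIC_API_KEY: credential.token };
    case "authToken":
      return { ANTHROPIC_AUTH_TOKEN: credential.token };
  }
}

const gatewayOnly = readClaudeCredential({
  CLAUDE_CODE_OAUTH_TOKEN: "   ", // whitespace-only counts as unset
  ANTHROPIC_AUTH_TOKEN: " gateway-token ",
});
const gatewaySecrets = gatewayOnly ? credentialToSecrets(gatewayOnly) : {};
```

Trimming before the truthiness check is what makes a whitespace-only variable fall through to the next candidate instead of being accepted as a credential.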

chatgpt-codex-connector · 2026-05-20T01:54:22Z

+					if (out.mcpConfigPath) {
+						this.pluginOutputs.claudeMcpConfigPath = out.mcpConfigPath;


Merge Claude MCP configs across all plugins

In materializePlugins(), each Claude plugin's mcpConfigPath overwrites the previous one, but claudeHarness can pass only the final scalar via --mcp-config. When multiple plugins define MCP servers, only the last plugin's MCP config is actually wired into Claude (especially with --strict-mcp-config), so tools from earlier plugins silently disappear. This should aggregate all plugin MCP servers into one combined config (or equivalent) before command construction.

Fixed in 1e1531d — materializePlugins now accumulates every plugin's mcpServers map and, after the loop, writes one combined <pluginsRoot>/.mcp.combined.json that --mcp-config points at. Per-plugin .mcp.json files are still written (part of the documented Claude plugin layout for --plugin-dir consumers); the combined file is just the handoff target for the single-scalar --mcp-config flag. Caller plugin order determines precedence on duplicate server names — later wins. New tests prove three plugins all reach the harness via one merged config and that the shadow rule holds.
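The accumulate-then-write approach described in this reply can be sketched as a pure merge step. This is illustrative only: `McpServerConfig` is a simplified shape, `mergeMcpServers` is a hypothetical name, and writing the result to `.mcp.combined.json` is left out so the sketch has no filesystem dependency.

```typescript
// Simplified server shape; real configs may carry more fields.
interface McpServerConfig {
  command: string;
  args?: string[];
}

interface RuntimePlugin {
  name: string;
  mcpServers?: Record<string, McpServerConfig>;
}

// Walks plugins in caller order; on duplicate server names, later wins.
function mergeMcpServers(plugins: readonly RuntimePlugin[]): Record<string, McpServerConfig> {
  const combined: Record<string, McpServerConfig> = {};
  for (const plugin of plugins) {
    for (const [serverName, server] of Object.entries(plugin.mcpServers ?? {})) {
      combined[serverName] = server;
    }
  }
  return combined;
}

const mergedServers = mergeMcpServers([
  { name: "chat", mcpServers: { linear: { command: "linear-mcp" } } },
  { name: "tools", mcpServers: { search: { command: "search-mcp", args: ["--fast"] } } },
  { name: "override", mcpServers: { linear: { command: "linear-mcp-v2" } } },
]);

// Body of the single combined file that the one-path flag would point at.
const combinedFileBody = JSON.stringify({ mcpServers: mergedServers }, null, 2);
```

Making the shadow rule depend only on caller order keeps the outcome predictable: to change which duplicate wins, the caller reorders the plugin list.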

Conflict resolutions:

- CHANGELOG.internal.md: keep both our Unreleased agent-runtime entries (refreshed to match what's actually on this branch, dropping the SlackDaytonaRunner entry since that work was reverted) and main's [0.2.52] release marker.
- package.json: keep all new pnpm.overrides from main; collapse the five overrides that appeared in both halves of the conflict block.
- packages/edge-worker/src/EdgeWorker.ts: accept main's new buildSkillSessionContext method (three call sites depend on it).
- pnpm-lock.yaml: regenerated via pnpm install --no-frozen-lockfile.

Post-merge fixes (auto-merge produced bad output that compiled to duplicate object literal keys):

- packages/codex-runner/src/CodexRunner.ts: remove duplicate stop_details: null from two assistant message factories.
- packages/cursor-runner/src/CursorRunner.ts: same.
- packages/gemini-runner/src/adapters.ts: same.

Post-merge cleanup:

- packages/edge-worker/src/EdgeWorker.ts: drop unused getDefaultModelForRunner / getDefaultFallbackModelForRunner private wrappers that main added; all call sites in this branch already use runnerSelectionService.* directly.

Dependency security policy compliance:

- Add root pnpm.overrides for brace-expansion (>=5.0.6), ws (>=8.20.1), protobufjs (>=7.5.8) to keep pnpm audit clean after the merge surfaced advisories under our pinned @google/gemini-cli-core transitive dep.

Verified: pnpm build / pnpm typecheck / pnpm test:packages:run all clean; pnpm audit reports no known vulnerabilities.

Four standalone smoke scripts (created before they got linted via the pre-commit hook) had format / organize-imports diffs that pnpm lint caught on CI. Pure formatting — no behavior change.

P1 (edge-worker, AgentChatSessionHandler): accept ANTHROPIC_AUTH_TOKEN in Slack chat credential detection. The handler previously read only CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY, so deployments that auth Claude via an Anthropic-compatible proxy/gateway (using ANTHROPIC_AUTH_TOKEN) hit the "not configured" reply path even though the legacy claude-runner accepts that env var (see claude-runner/src/session-env.ts AUTH_ENV_KEYS). Extend the ClaudeCredential discriminated union with a third variant, scan that env var third in the documented precedence order, and forward only the matching env var to the harness so the three auth modes don't get conflated. Update the "not configured" error message to mention all three options.

P2 (agent-runtime, RuntimeAgentSession.materializePlugins): merge Claude MCP server configs across plugins. Claude's `--mcp-config` flag takes a single scalar path, so the previous "last writer wins" overwrite of claudeMcpConfigPath silently dropped every plugin's MCP servers except the last (and `--strict-mcp-config` made that fatal for tool calls into the dropped servers). Accumulate every plugin's mcpServers map and, after the loop, write one combined .mcp.combined.json at the plugins root and point `--mcp-config` at that. Per-plugin .mcp.json files are still written by the materializer because they're part of the documented Claude plugin layout that `--plugin-dir` consumers expect; the combined file is purely the handoff target for `--mcp-config`. Caller-supplied plugin order determines precedence on duplicate server names (later wins), which the new test locks in.

Tests:

- edge-worker: 7 new tests covering credential detection precedence, trim-whitespace, and per-kind env-var forwarding (oauth / apiKey / authToken) into the Daytona session config.
- agent-runtime: updated the existing single-plugin assertion to expect `--mcp-config .mcp.combined.json`; added two new tests, one proving three plugins all reach the harness via one merged config, one proving caller plugin order picks the winning shadow on duplicate server names.

Verified: pnpm -F cyrus-agent-runtime test:run (59 tests, was 57); pnpm -F cyrus-edge-worker test:run (631 tests, was 624); pnpm typecheck + pnpm lint clean.

Connoropolous · 2026-05-20T02:24:22Z

Sample usage of createAgentSession — still WIP (no exported examples / no narrative README), but this is the shape today. Three representative cases:

1. Minimal local invocation — packages/agent-runtime/test/runtime.test.ts:54-69

import { createAgentSession } from "cyrus-agent-runtime";

const session = await createAgentSession({
  sessionId: "session-1",
  harness: "codex",                       // shorthand for { kind: "codex" }
  env: { NODE_ENV: "test" },
  secrets: { API_KEY: "secret" },         // string shorthand → { value, redact: true }
});

await session.addMessage("queued");
const result = await session.run("Do it");
// result.events / result.result / result.harnessSessionId / result.destroy()

Default sandbox: { provider: "local", workingDirectory: cwd } is auto-filled by normalizeConfig. No type parameter needed when the harness is unambiguous.

2. Production Daytona Claude invocation — packages/edge-worker/src/AgentChatSessionHandler.ts:485-672

const sessionConfig = {
  sessionId,
  harness: { kind: "claude", command: this.daytonaClaudeCliPath },
  systemPrompt,
  // Discriminated on credential.kind so the three Claude auth modes
  // don't get conflated — only one of these env vars ships through.
  secrets: credential.kind === "oauth"
    ? { CLAUDE_CODE_OAUTH_TOKEN: credential.token }
    : credential.kind === "apiKey"
      ? { ANTHROPIC_API_KEY: credential.token }
      : { ANTHROPIC_AUTH_TOKEN: credential.token },
  permissions: { mode: "bypass" },        // sandbox is the isolation boundary
  packages: { commands: [...this.daytonaSetupCommands] },
  // Chat-session MCP servers wrapped into one anonymous plugin; the
  // materializer fans this out to Claude's native .mcp.json + writes a
  // session-level .mcp.combined.json that --mcp-config points at.
  plugins: [{ name: "chat", mcpServers: toRuntimeMcpServers(mcpServers) }],
  sandbox: {
    provider: "daytona",
    name: `cyrus-slack-${sessionId}`,
    workingDirectory: this.daytonaWorkingDir,
    timeoutMs: 300_000,
    destroyWhileInactive: true,           // stop()/start() between turns
    snapshot: this.daytonaSnapshot,       // pre-installed harness binaries
    metadata: { purpose: "cyrus-slack-chat", threadKey },
  },
};

// Explicit <"claude"> threads SDKMessage typing through to
// session.events / result.events with no cast at consumer sites.
const session = await createAgentSession<"claude">(sessionConfig, {
  callbacks: {
    onTranscriptEvent: (te) => {
      // te.raw is typed `SDKMessage` here.
      logger.debug(`[${sessionId}] transcript event: ${te.kind}`);
    },
  },
});

const result = await session.run(userPrompt);  // first turn
// ...later, same `session` object:
const result2 = await session.run(followUpPrompt);  // resumes via Claude --continue
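
The nested ternary on credential.kind above can also be written as an exhaustive switch, which lets the compiler flag a newly added auth mode that has no env-var mapping. The sketch below is an assumption-laden illustration: the `ClaudeCredential` type and the `claudeSecrets` helper are hypothetical, and `"authToken"` is an invented name for the fallback branch (the snippet above only names `"oauth"` and `"apiKey"`). The three env var names are the ones shown above.

```typescript
// Hypothetical credential union; only "oauth" and "apiKey" appear in the PR snippet.
type ClaudeCredential =
  | { kind: "oauth"; token: string }
  | { kind: "apiKey"; token: string }
  | { kind: "authToken"; token: string }; // assumed name for the fallback branch

// Map each auth mode to exactly one env var so the modes are never conflated.
function claudeSecrets(credential: ClaudeCredential): Record<string, string> {
  switch (credential.kind) {
    case "oauth":
      return { CLAUDE_CODE_OAUTH_TOKEN: credential.token };
    case "apiKey":
      return { ANTHROPIC_API_KEY: credential.token };
    case "authToken":
      return { ANTHROPIC_AUTH_TOKEN: credential.token };
    default: {
      // Compile-time exhaustiveness check: a new kind without a case fails to type-check here.
      const unreachable: never = credential;
      throw new Error(`Unhandled credential: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(Object.keys(claudeSecrets({ kind: "apiKey", token: "sk-example" })));
```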

3. Multi-turn resume across brand-new sandboxes — `sandbox.persistentState` + `resumeHarnessSessionId`

// Turn 1 — create
const s1 = await createAgentSession<"claude">({
  sessionId: "turn-1",
  harness: "claude",
  sandbox: {
    provider: "daytona",
    persistentState: {
      // Caller picks backing volume + a stable binding identifier.
      // No knowledge of CLAUDE_CONFIG_DIR or mount paths — the runtime
      // mounts the volume internally and the adapter contributes the
      // right state-env var (Claude→CLAUDE_CONFIG_DIR, Cursor→CURSOR_DATA_DIR,
      // Codex→CODEX_HOME, Gemini→GEMINI_CLI_HOME, OpenCode→XDG_*_HOME).
      volume: { name: "cyrus-prod-vol", kind: "fuse" },
      bindingId: threadKey,
    },
  },
});
const r1 = await s1.run("Tell me a joke");
const harnessId = r1.harnessSessionId;   // capture for later
await r1.destroy();                       // sandbox torn down

// Days later, brand new sandbox — same on-disk state via volume + bindingId:
const s2 = await createAgentSession<"claude">({
  sessionId: "turn-2",
  harness: "claude",
  resumeHarnessSessionId: harnessId,      // Claude --resume <id>
  sandbox: {
    provider: "daytona",
    persistentState: {
      volume: { name: "cyrus-prod-vol", kind: "fuse" },
      bindingId: threadKey,               // same → same state visible
    },
  },
});
const r2 = await s2.run("What was the punchline?");

The pattern in AgentChatSessionHandler keeps the same session object across turns and relies on destroyWhileInactive: true to pause/resume the sandbox between mentions. resumeHarnessSessionId + persistentState is the cross-process variant for cases where the caller is itself stateless between turns (e.g. serverless).
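
A stateless caller has to persist the harness session id itself between turns. The following is a rough sketch of that bookkeeping under stated assumptions: `handleTurn`, `CreateSession`, the `*Sketch` interfaces, and the in-memory `Map` (standing in for a durable store) are all hypothetical, and the config is reduced to the fields relevant here. Only the `harnessSessionId` / `resumeHarnessSessionId` round-trip and `result.destroy()` come from the examples above.

```typescript
// Hypothetical minimal shapes mirroring only the fields used in the examples above.
interface RunResultSketch {
  harnessSessionId: string;
  destroy(): Promise<void>;
}
interface SessionSketch {
  run(prompt: string): Promise<RunResultSketch>;
}
type CreateSession = (config: {
  sessionId: string;
  resumeHarnessSessionId: string | undefined;
  bindingId: string;
}) => Promise<SessionSketch>;

// Stand-in for a durable key-value store (database, KV namespace, and so on).
const harnessIdsByThread = new Map<string, string>();

// One stateless turn: look up the previous harness session id, resume if
// one exists, persist the id returned by this run, then release the sandbox.
async function handleTurn(
  createSession: CreateSession,
  threadKey: string,
  prompt: string,
): Promise<string> {
  const session = await createSession({
    sessionId: `${threadKey}-${Date.now()}`,
    resumeHarnessSessionId: harnessIdsByThread.get(threadKey),
    bindingId: threadKey, // same binding id on every turn keeps the same on-disk state
  });
  const result = await session.run(prompt);
  try {
    harnessIdsByThread.set(threadKey, result.harnessSessionId);
    return result.harnessSessionId;
  } finally {
    await result.destroy();
  }
}
```

The first turn for a thread passes `undefined` and starts fresh; every later turn passes the stored id. The sketch does not cover cleanup when `run` itself rejects, which a real handler would need to handle.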

Connoropolous and others added 30 commits May 15, 2026 09:30

feat(agent-runtime): add sandboxed harness runtime package

8a9fac9

docs(agent-runtime): record claude daytona validation

f9b6150

Add working implementation of Daytona volume mounting in sandbox runtime

27a4a4c

Connoropolous added 9 commits May 19, 2026 17:09

Merge remote-tracking branch 'cyrusagents/feature/add-daytona-base-snapshot' into claude/agent-runtime-slack-chat-replacement

58ef6f1

Connoropolous mentioned this pull request May 20, 2026

Add sandboxed agent runtime package #1220

Closed

chatgpt-codex-connector Bot reviewed May 20, 2026

Connoropolous added 3 commits May 19, 2026 18:57

style: biome --write on test-scripts (CI lint fix)

3d97239

Four standalone smoke scripts (created before they got linted via the pre-commit hook) had format / organize-imports diffs that pnpm lint caught on CI. Pure formatting — no behavior change.

Connoropolous wants to merge 42 commits into main from claude/agent-runtime-slack-chat-replacement

2 participants

		if (out.mcpConfigPath) {
		this.pluginOutputs.claudeMcpConfigPath = out.mcpConfigPath;
