Sandboxed agent runtime, plugins, Daytona volumes, persistent state, and edge-worker chat integration#1229
Connoropolous wants to merge 42 commits into
…dboxes
Add an optional `streamCommand(command, options)` capability to `RunnerSandbox`, with `onStdout` / `onStderr` chunk callbacks, an `AbortSignal` for cancellation, and an `AsyncIterable<string> input` option for live stdin.
The local provider implements it via `child_process.spawn`; Daytona is reached through a pluggable `NativeStreamAdapter` registry that unwraps ComputeSDK's `ProviderSandbox.getInstance()` to the native `@daytonaio/sdk` Sandbox and uses async sessions + `getSessionCommandLogs(onStdout, onStderr)`.
`RuntimeAgentSession.start()` now prefers `streamCommand` when `capabilities.streamingProcess` is true, line-buffers chunks across packet boundaries, and emits `TranscriptEvent`s as the harness CLI produces them.
New `interactiveInput` opt-in routes `addMessage()` into the running process's stdin (default off, because most one-shot CLIs block on a piped-but-never-closed stdin).
Verified end-to-end:
- local `spawn`: chunks land at the exact 400ms cadence the child emits
- real `codex exec` via `createAgentSession`: events emitted ~8.6s before turn end
- real Daytona Claude `stream-json`: system event landed 1.7s before result event over a remote sandbox
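The line-buffering step described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's code: `LineBuffer` and its methods are hypothetical names, and the sample chunks are made up to show one JSON event split across two packets.

```typescript
// Hypothetical helper; the name and shape are assumptions, not the runtime's API.
class LineBuffer {
  private pending = "";

  // Feed one stdout chunk and get back every line it completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is "" (chunk ended on a newline) or a partial line to keep.
    this.pending = parts.pop() ?? "";
    return parts
      .map((line) => line.replace(/\r$/, ""))
      .filter((line) => line.length > 0);
  }

  // Call once when the process exits to recover a final line with no newline.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

const buffer = new LineBuffer();
const lines: string[] = [];
// One JSON event arrives split across two chunks, as a remote stream might deliver it.
for (const chunk of ['{"type":"sys', 'tem"}\n{"type":"result"}', "\n"]) {
  lines.push(...buffer.push(chunk));
}
lines.push(...buffer.flush());

// Only complete lines are ever parsed, so a split packet never reaches JSON.parse.
const events = lines.map((line) => JSON.parse(line) as { type: string });
console.log(events.map((event) => event.type));
```

In a streaming setup, `push` would be called from the `onStdout` callback and `flush` once the command finishes.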
…config
Add two materialization concepts to `CreateAgentSessionConfig`, deliberately distinct from the existing `volumes` (provider-attached persistent storage):
- `RuntimeFolderConfig`: exposes a host filesystem folder inside the sandbox. Walks the host tree and uploads each file via `SandboxFilesystem.writeFile`. Supports `exclude` globs. With `access: "readwrite"` the runtime syncs sandbox edits and any newly created files back to the host folder after the harness command completes.
- `RuntimeRepositoryConfig`: runs `git clone` inside the sandbox at `mountPath` with optional `branch` checkout and `depth` shallow-clone. Local-path sources are rewritten to `file://...` to preserve git semantics. Shallow clones with a branch use `--branch` on the clone itself, since `git checkout` of a non-default branch fails after a shallow clone.
Both emit lifecycle transcript events (`folder.materialize.*`, `folder.syncback.*`, `repository.materialize.*`) and run after files but before package setup commands, so setup steps that depend on the cloned tree or the mounted folder see them ready.
27 tests pass (5 new): one materializer unit test per concept and one runtime-level integration test verifying that the session wires each through to the right sandbox calls and emits the right events.
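A rough sketch of how the clone arguments for `RuntimeRepositoryConfig` might be assembled. `RepoSpec` and `buildCloneArgs` are hypothetical names, and treating any source that starts with `/` as a local path is an assumption made for the example. Unlike the description above, this sketch passes `--branch` on every clone that names a branch, not only shallow ones.

```typescript
// Hypothetical shape; the field names mirror the description, the type name is illustrative.
interface RepoSpec {
  source: string; // remote URL or absolute local path
  mountPath: string;
  branch?: string;
  depth?: number;
}

function buildCloneArgs(repo: RepoSpec): string[] {
  // Assumption: an absolute path means a local repository, rewritten to file://
  // so git treats it like any other transport (required for --depth to apply).
  const source = repo.source.startsWith("/") ? `file://${repo.source}` : repo.source;
  const args = ["clone"];
  if (repo.depth !== undefined) args.push("--depth", String(repo.depth));
  // Passing --branch to clone avoids a later checkout, which fails on a shallow
  // clone because only the cloned branch's history was fetched.
  if (repo.branch !== undefined) args.push("--branch", repo.branch);
  args.push(source, repo.mountPath);
  return args;
}

console.log(
  ["git", ...buildCloneArgs({ source: "/srv/repo", mountPath: "/work/repo", branch: "dev", depth: 1 })].join(" "),
);
```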
Equates to ComputeSDK's ProviderSandbox.destroy() for ComputeSDK-backed providers (deletes the remote sandbox, releases compute resources) and is a no-op for the local provider. Lets a caller hold only the result object, consume events/result, then tear down without keeping a reference to the session. Idempotent — backed by a one-shot destroy promise on the session that both `AgentSession.stop()` and `AgentSessionResult.destroy()` share, so callers can call either or both in any order without double-destroying the underlying ComputeSDK / local sandbox. Verified with a new test that asserts: - the returned result exposes destroy() - calling result.destroy() invokes sandbox.destroy() exactly once - calling result.destroy() twice is a no-op the second time - calling session.stop() after result.destroy() does not double-destroy
`stop()` and `destroy()` were doing two unrelated things bundled into one method. Split them. `stop()` now cancels the in-flight run only — aborts the harness process, closes the live event stream, closes the input pipe — and leaves the sandbox alive. This enables future workflows that reuse a warm sandbox across runs (per CYPACK-1209): a single run's `stop()` no longer destroys shared compute. `destroy()` is the sole sandbox-release path. It exists symmetrically on both `AgentSession` and `AgentSessionResult` (sharing a one-shot internal teardown promise). `AgentSession.destroy()` also implicitly cancels an in-flight run via `stop()` before releasing the sandbox, so callers don't need a two-step. Pre-1.0 package, clean break — no consumers to migrate.
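The stop/destroy split with a shared one-shot teardown promise might look roughly like this. `SessionTeardown`, `SandboxLike`, and `beginRun` are hypothetical names for illustration; the real session class carries far more state.

```typescript
// Hypothetical stand-in for the sandbox handle.
interface SandboxLike {
  destroy(): Promise<void>;
}

class SessionTeardown {
  private readonly sandbox: SandboxLike;
  private runAbort: AbortController | undefined;
  private destroyPromise: Promise<void> | undefined;

  constructor(sandbox: SandboxLike) {
    this.sandbox = sandbox;
  }

  // Each run gets its own abort controller so stop() only affects that run.
  beginRun(): AbortSignal {
    this.runAbort = new AbortController();
    return this.runAbort.signal;
  }

  // Cancels the in-flight run and leaves the sandbox alive for reuse.
  stop(): void {
    this.runAbort?.abort();
    this.runAbort = undefined;
  }

  // Sole sandbox-release path. The memoized promise makes repeated calls,
  // from the session or from a result object sharing it, a no-op.
  destroy(): Promise<void> {
    this.stop();
    this.destroyPromise ??= this.sandbox.destroy();
    return this.destroyPromise;
  }
}

let released = 0;
const teardown = new SessionTeardown({
  destroy: async () => {
    released += 1;
  },
});
teardown.beginRun();
void teardown.destroy();
void teardown.destroy();
console.log(released); // the underlying sandbox.destroy() ran once
```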
…ndler
Brutal spike: wire the Slack chat session lifecycle through cyrus-agent-runtime's createAgentSession instead of the legacy IAgentRunner + AgentSessionManager + RunnerConfigBuilder stack.
Removed:
- packages/edge-worker/src/ChatSessionHandler.ts (515 lines)
- packages/edge-worker/test/chat-sessions.test.ts
- EdgeWorker.getDefaultModelForRunner / getDefaultFallbackModelForRunner (only used by the deleted chat-session createRunner callback)
- EdgeWorker.getChatThreadLastReply stub now returns null (F1 tests that depended on the runner's getMessages() need a new approach)
Added:
- packages/edge-worker/src/AgentChatSessionHandler.ts: ~280-line replacement that drives createAgentSession per Slack mention, posts the harness-extracted result back to Slack, and destroys the sandbox in a finally.
Modified:
- SlackChatAdapter.postReply(event, runner: IAgentRunner) → postReply(event, finalText: string). Decouples the adapter from the runner machinery.
- EdgeWorker wiring: drops runnerConfigBuilder/createRunner/onStateChange/onClaudeError from chat-session deps, replaces shutdown's getAllRunners() with chatSessionHandler.shutdown().
- package.json: adds cyrus-agent-runtime workspace dep.
Brutal cuts (documented in the new file's header):
- No multi-turn --continue resume: each Slack mention is a fresh AgentSession. Conversation continuity comes from the adapter's fetchThreadContext() injecting prior thread messages as text.
- No mid-flight stream injection: busy threads get notifyBusy.
- No MCPs: agent-runtime doesn't yet wire mcps through to the harness CLI, and the in-process cyrus-tools server wouldn't translate across the subprocess boundary anyway. Slack chat sessions run with the Claude CLI default toolset only.
- Claude harness only: no runner selection.
- No persisted session state across restarts.
Validation:
- pnpm typecheck (clean across monorepo)
- pnpm test:packages:run (601 edge-worker tests, 114 claude-runner, 198 gemini-runner, 62 slack-event-transport, etc.; all green)
You asked for the Slack chat replacement to use the Daytona+Claude flow we just validated end-to-end; the first cut accidentally used the local sandbox. Switching to Daytona:
- Each Slack mention spawns a fresh Daytona sandbox at /home/daytona (timeout 5min, name `cyrus-slack-<sessionId>`, metadata tagged `purpose: cyrus-slack-chat` for visibility in Daytona's console).
- Setup commands install @anthropic-ai/claude-code with a user-local npm prefix and verify the version: the same script that worked in the streaming-spike daytona-runtime probe.
- Harness command is the full path `/home/daytona/.npm-global/bin/claude` (no PATH override at the env level; that broke npm in the earlier spike).
- Secrets carry CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_AUTH_TOKEN from the EdgeWorker process env into the sandbox.
- Compute SDK is configured once per process via a module-level guard.
- Refuses to construct without DAYTONA_API_KEY in env.
- Posts a clear failure message if the Claude token is missing.
Drops the unused createWorkspace helper and cyrusHome/chatRepositoryProvider deps: Daytona owns its own working dir and the adapter holds its own ChatRepositoryProvider.
Adds @computesdk/daytona as a direct edge-worker dep (was only transitively available before).
Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
The @daytonaio/sdk package compiles with TypeScript's `importHelpers`
option but doesn't declare `tslib` as a runtime dependency. Without
this, `import "tslib"` from the SDK fails at runtime with
ERR_MODULE_NOT_FOUND when the SDK is loaded through @computesdk/daytona
inside cyrus-edge-worker / cyrus-agent-runtime.
`pnpm.packageExtensions` patches the upstream package.json at install
time so pnpm installs tslib alongside @daytonaio/sdk, making the ESM
resolver's bare-import lookup succeed.
Verified: `import("@computesdk/daytona")` followed by an actual
provider instantiation no longer hits ERR_MODULE_NOT_FOUND on tslib.
Upstream fix is needed in @daytonaio/sdk itself; this workaround can
be removed once they ship a version with tslib in dependencies.
The previous fix (pnpm.packageExtensions) added the tslib symlink under @daytonaio/sdk's isolated node_modules but did NOT rewrite the on-disk package.json. Node.js's standard ESM resolver doesn't care, but import-in-the-middle (used by @opentelemetry instrumentation) hooks the resolve step and validates against the importer's declared deps. Because tslib wasn't in @daytonaio/sdk's package.json, the hook rejected the bare specifier even though the symlink was present.
Use pnpm.patchedDependencies instead: it writes a real patch file under patches/ that adds `"tslib": "^2"` to @daytonaio/sdk's dependencies on disk. The install directory hash now includes a patch_hash suffix (visible in error paths if it ever fails again), so it's easy to tell whether the patch applied at all.
Keeps the packageExtensions entry too as belt-and-suspenders.
To pick this up on a running host:
  git pull
  rm -rf node_modules   # force re-link
  pnpm install
  pnpm build
The rm -rf is the key step: pnpm sees the lockfile as up to date if the high-level dep graph hasn't changed and skips re-linking, which left the previous fix's tslib symlink in place but unused.
Upstream fix is needed in @daytonaio/sdk; remove patches/ when they ship a version with tslib in dependencies.
Restructure the agent-runtime public API so an AgentSession is a
long-lived handle that can be run multiple times against the same
sandbox, with per-session state backing that makes the next run
automatically resume the prior conversation.
API changes (breaking, pre-1.0):
- AgentSession.start() → AgentSession.run(userPrompt). Each call is one
turn. First call materializes files/folders/repos and runs setup;
subsequent calls skip all that and invoke the harness with its
continue flag.
- CreateAgentSessionConfig drops `userPrompt` (now passed per-turn to
run()) and gains `agentSessionsRoot?: string` (default
`~/.cyrus-agent-sessions/`).
- HarnessAdapter gains `stateDirectories: readonly string[]` declaring
the relative paths under HOME where the harness keeps its session
state — Claude `.claude`, Codex `.codex`, Gemini `.gemini`.
- HarnessAdapter.buildCommand now takes a second `HarnessRunOptions`
argument with `userPrompt` and `continueSession: boolean`. Adapters
map `continueSession` to their CLI's resume flag (Claude `--continue`)
and suppress system-prompt injection on continuation.
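As a rough illustration of the two-argument `buildCommand`, the sketch below maps `continueSession` to Claude's `--continue` and skips system-prompt injection on continuation. `HarnessConfigSketch` and its `systemPrompt` field are assumptions (the first argument's real shape is not shown here), and the exact flag set the real adapter emits may differ.

```typescript
// Per-turn options, following the description above.
interface HarnessRunOptions {
  userPrompt: string;
  continueSession: boolean;
}

// Hypothetical first argument; the real config type is not shown in this PR text.
interface HarnessConfigSketch {
  systemPrompt?: string;
}

interface HarnessAdapterSketch {
  // Relative paths under HOME where the harness keeps session state.
  stateDirectories: readonly string[];
  buildCommand(config: HarnessConfigSketch, run: HarnessRunOptions): string[];
}

const claudeAdapterSketch: HarnessAdapterSketch = {
  stateDirectories: [".claude"],
  buildCommand(config, run) {
    const args = ["claude", "-p", run.userPrompt, "--output-format", "stream-json", "--verbose"];
    if (run.continueSession) {
      // Resume the prior conversation; the system prompt was already applied on turn one.
      args.push("--continue");
    } else if (config.systemPrompt !== undefined) {
      // Assumption: the system prompt is injected via --append-system-prompt.
      args.push("--append-system-prompt", config.systemPrompt);
    }
    return args;
  },
};

const config = { systemPrompt: "Be brief." };
console.log(claudeAdapterSketch.buildCommand(config, { userPrompt: "hello", continueSession: false }));
console.log(claudeAdapterSketch.buildCommand(config, { userPrompt: "and then?", continueSession: true }));
```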
Runtime mechanics:
- RuntimeAgentSession provisions `~/.cyrus-agent-sessions/<sessionId>/`
per session and sets HOME to that dir on every harness invocation.
Per-session HOME means concurrent local sessions don't trample each
other's `.claude/projects/...jsonl` state, and resume Just Works
because the .claude directory is naturally persistent between turns.
- For Daytona, same mechanism — HOME inside the sandbox is the
per-session backing path, and since the sandbox stays warm between
run() calls, .claude/ survives there too.
- session.stop() now cancels only the in-flight run (per-run abort
controller) and does NOT destroy the sandbox. session.destroy() is
the sole sandbox-release path and also runs folder syncback.
- folder syncback moved from end-of-run to session.destroy() — same
rationale as stop/destroy split.
Tests: 30 passing (was 29). New test verifies multi-turn run() with
the second turn passing --continue and skipping setup.
Chat handler update (Slack):
AgentChatSessionHandler rewritten for the warm-thread pattern:
- threadSessions: Map<threadKey, { session, lastActivityAt, inFlight }>
- First mention: createAgentSession (Daytona, Claude, install setup);
state kept warm.
- Subsequent mentions on same thread: session.run() reuses the warm
sandbox via --continue. Setup commands don't re-run.
- Concurrent mention while a run is in-flight: notifyBusy (no stdin
injection yet).
- Idle TTL (default 15min): periodic sweep destroys idle threads.
- Run failure: destroy + free slot so next mention is a clean start.
- Shutdown: clear sweep timer + destroy all warm sessions.
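The warm-thread bookkeeping above could be sketched as follows. `ThreadRegistry`, `classify`, and `sweep` are hypothetical names; the real handler also creates sessions, posts replies, and runs the sweep on a timer, all of which are omitted here.

```typescript
// Hypothetical stand-in for a warm AgentSession.
interface WarmSession {
  destroy(): Promise<void>;
}

interface ThreadState {
  session: WarmSession;
  lastActivityAt: number;
  inFlight: boolean;
}

class ThreadRegistry {
  readonly threads = new Map<string, ThreadState>();
  private readonly idleTtlMs: number;

  constructor(idleTtlMs: number = 15 * 60 * 1000) {
    this.idleTtlMs = idleTtlMs;
  }

  // Decide how a new mention on a thread should be handled.
  classify(threadKey: string): "cold" | "warm" | "busy" {
    const state = this.threads.get(threadKey);
    if (state === undefined) return "cold"; // first mention: create a session
    return state.inFlight ? "busy" : "warm"; // busy: notifyBusy; warm: session.run()
  }

  // Destroy sessions idle for at least the TTL; returns the evicted thread keys.
  sweep(now: number): string[] {
    const evicted: string[] = [];
    for (const [key, state] of Array.from(this.threads)) {
      if (state.inFlight) continue; // never evict a thread mid-run
      if (now - state.lastActivityAt < this.idleTtlMs) continue;
      this.threads.delete(key);
      state.session.destroy().catch(() => {}); // best-effort teardown
      evicted.push(key);
    }
    return evicted;
  }
}

const registry = new ThreadRegistry();
registry.threads.set("C1:171.001", {
  session: { destroy: async () => {} },
  lastActivityAt: 0,
  inFlight: false,
});
console.log(registry.classify("C1:171.001"), registry.classify("C1:999.000"));
```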
Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
Adds a sandbox.destroyWhileInactive flag to CreateAgentSessionConfig
that pauses (Daytona: sandbox.stop()) the underlying sandbox after every
session.run() returns and resumes it (sandbox.start()) before the next
run. State on disk inside the sandbox (including ~/.claude/) is
preserved by Daytona during stop, so the next turn's `--continue` finds
the prior conversation intact at much lower cost than a from-scratch
recreate.
For the local sandbox the flag is a no-op (local sessions are always
free). For Daytona it surfaces as new transcript events:
sandbox.pause.started/completed/failed and
sandbox.resume.started/completed/skipped.
AgentChatSessionHandler turns the flag on so Slack chat threads stop
billing compute between mentions.
Three real proofs added under test-scripts/, all validated against
real Daytona+Claude:
resume-proof.mjs local — local sandbox, multi-turn
resume-proof.mjs daytona-warm — Daytona warm, multi-turn
resume-proof.mjs daytona-efficient — Daytona pause/resume, multi-turn
slack-handler-proof.mjs — full AgentChatSessionHandler
flow with two mentions
(pause between)
Each proof gives Claude a code word in turn 1 and verifies the
turn-2 reply repeats it back, proving --continue actually preserved
the conversation across the lifecycle event being tested.
Recorded results:
- local: turn1 4.1s "noted" turn2 4.4s "BANANA-7"
- daytona-warm: turn1 19.8s "noted" turn2 6.7s "BANANA-7"
- daytona-efficient: turn1 17.9s "noted" turn2 8.4s "BANANA-7"
- slack-handler-proof: m1 19.9s "noted" m2 8.3s "BANANA-7"
Other fixes in this commit:
- session.ts no longer overrides HOME for any provider. The earlier
override broke Claude auth locally (empty ~/.claude/) and silently
produced no stream-json events on Daytona (HOME pointed at a host
path that didn't exist inside the remote sandbox). The Daytona
sandbox preserves state across stop/start so its natural
/home/daytona HOME is the right answer for both warm and
destroyWhileInactive modes.
- session.ts can now resume a paused sandbox at destroy() time so
syncFoldersBack still has a live sandbox to read from.
Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
- Four real proofs above, all PASSED
… hooks + skills)
Introduces a provider-agnostic RuntimePlugin shape and per-harness
materializers that translate ONE declaration into Claude-, Cursor-,
or Codex-native filesystem state (or CLI flags). Each materializer
was developed against a real CLI smoke test before any code landed.
### Public surface
CreateAgentSessionConfig grows `plugins: PluginInput[]` where
PluginInput is either an inline RuntimePlugin or `{ rootPath: string }`
(rootPath resolution is stubbed for v1 — inline only is fully
implemented). The shape:
interface RuntimePlugin {
name: string;
version?: string;
description?: string;
mcpServers?: Record<string, McpServerRuntimeConfig>;
hooks?: PluginHook[];
skills?: PluginSkill[];
}
Hook events are a universal subset: PreToolUse, PostToolUse,
SessionStart, Stop, UserPromptSubmit. Each materializer maps these
to harness-native names and silently drops events that don't
translate.
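The map-and-drop behaviour for hook events can be sketched as below. `PluginHookSketch`, `translateHooks`, and the `exampleTable` names are all hypothetical: the real `PluginHook` shape and each harness's native event names are not shown here, so the table uses made-up placeholders.

```typescript
type UniversalHookEvent =
  | "PreToolUse"
  | "PostToolUse"
  | "SessionStart"
  | "Stop"
  | "UserPromptSubmit";

// Hypothetical hook shape for illustration only.
interface PluginHookSketch {
  event: UniversalHookEvent;
  command: string;
}

// A harness lists only the events it can express; missing keys mean "drop".
type NativeNameTable = Partial<Record<UniversalHookEvent, string>>;

function translateHooks(
  hooks: readonly PluginHookSketch[],
  table: NativeNameTable,
): Array<{ event: string; command: string }> {
  const translated: Array<{ event: string; command: string }> = [];
  for (const hook of hooks) {
    const nativeName = table[hook.event];
    if (nativeName === undefined) continue; // no native equivalent: silently dropped
    translated.push({ event: nativeName, command: hook.command });
  }
  return translated;
}

// Placeholder names for an imaginary harness, not any real CLI's schema.
const exampleTable: NativeNameTable = {
  PreToolUse: "example_before_tool",
  Stop: "example_on_stop",
};

console.log(
  translateHooks(
    [
      { event: "PreToolUse", command: "./audit.sh" },
      { event: "SessionStart", command: "./warmup.sh" },
    ],
    exampleTable,
  ),
);
```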
### Per-harness materialization
Claude — materializePluginForClaude writes:
<workingDirectory>/.cyrus-plugins/<name>/
.claude-plugin/plugin.json
.mcp.json (when mcpServers present)
hooks/hooks.json (when hooks present)
skills/<skillName>/SKILL.md (+ optional assets)
The Claude harness adapter appends `--plugin-dir <pluginDir>` plus
`--mcp-config <path> --strict-mcp-config` to the `claude -p`
invocation.
Cursor — materializePluginForCursor writes:
<workspaceRoot>/.cursor/
mcp.json (merged across plugins)
hooks.json (merged across plugins)
skills/<skillName>/SKILL.md (+ optional assets)
The Cursor adapter appends `--approve-mcps` when any plugin
declared MCP servers (otherwise headless cursor-agent silently
drops them).
Codex — materializePluginForCodex:
- Writes skills to `$HOME/.agents/skills/<name>/SKILL.md` plus
`agents/openai.yaml` for the OpenAI runtime. Codex skill
discovery is rooted at $HOME/.agents/skills/ (verified
empirically — NOT $CODEX_HOME/skills/ as the docs suggested).
- Returns MCP servers as inline `-c 'mcp_servers.<name>={...}'`
TOML overrides on the CLI — no file write.
- The session env-merges `HOME = <session-state-dir>` for codex
runs so the materialized skills are isolated.
- Hooks deferred for v1 (Codex hooks schema is version-pinned).
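A sketch of rendering MCP servers as inline `-c 'mcp_servers.<name>={...}'` overrides. `StdioMcpServerSketch` and both helpers are hypothetical, and the sketch covers only stdio-style entries with `command`, `args`, and `env`. It also assumes `JSON.stringify` output is acceptable as a TOML basic string, which holds for ordinary strings.

```typescript
// Hypothetical minimal stdio server shape for the example.
interface StdioMcpServerSketch {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Assumption: JSON string escaping is close enough to TOML basic-string escaping.
const tomlString = (value: string): string => JSON.stringify(value);

function toInlineTable(server: StdioMcpServerSketch): string {
  const fields = [`command = ${tomlString(server.command)}`];
  if (server.args !== undefined) {
    fields.push(`args = [${server.args.map(tomlString).join(", ")}]`);
  }
  if (server.env !== undefined) {
    // Env var names are valid TOML bare keys (letters, digits, underscore).
    const pairs = Object.entries(server.env).map(([key, value]) => `${key} = ${tomlString(value)}`);
    fields.push(`env = { ${pairs.join(", ")} }`);
  }
  return `{ ${fields.join(", ")} }`;
}

// Produces argv entries; no shell quoting is needed when spawning without a shell.
function codexMcpFlags(servers: Record<string, StdioMcpServerSketch>): string[] {
  const flags: string[] = [];
  for (const [name, server] of Object.entries(servers)) {
    flags.push("-c", `mcp_servers.${name}=${toInlineTable(server)}`);
  }
  return flags;
}

console.log(codexMcpFlags({ docs: { command: "npx", args: ["-y", "example-mcp"] } }));
```

Keeping the overrides on the command line means nothing has to be written to the Codex config file, which matches the "no file write" note above.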
### Plugin lifecycle
In RuntimeAgentSession.run() first-turn materialization order:
files → folders → repositories → plugins → setup → harness
Materializer outputs are persisted on the session so subsequent
turns re-pass the same CLI flags via HarnessRunOptions.pluginOutputs.
New transcript events: plugin.materialize.{started,completed,
skipped,failed}.
### Validation
Three CLI smoke tests, each writing a minimal plugin tree and
verifying the real CLI loads it:
- Claude: --plugin-dir loads .claude-plugin/plugin.json + SKILL.md;
skill triggered, response = HELLO-FROM-PLUGIN.
- Cursor: .cursor/skills/<n>/SKILL.md auto-discovered; skill
triggered, response = HELLO-FROM-CURSOR-SKILL.
- Codex: $HOME/.agents/skills/<n>/SKILL.md discovered with HOME
override; response = HELLO-FROM-CODEX-HOMEAGENTS. `-c
mcp_servers.<name>={...}` confirmed routing through to codex's
MCP runtime.
End-to-end plugin-proof.mjs against real Daytona + Claude: cold
sandbox + Claude install + plugin materialization + skill-triggered
run, response = "HELLO-FROM-PLUGIN" in 13.9s.
Unit test added to runtime.test.ts (31 tests total) asserting the
on-disk plugin tree shape AND the harness command-line flags.
Surprises corrected vs. the original matrix:
- Cursor DOES have first-class SKILL.md (not just rules).
- Codex skills are at $HOME/.agents/skills/ not $CODEX_HOME/skills/.
- Codex MCP can be passed entirely via -c flags, no file write.
After learning tests against codex 0.130.0, the hook engine exists for all documented events but `codex exec` filters every newly-discovered hook through a trust gate (see hooks/src/engine/discovery.rs). Trust comes from a TUI `/hooks` review step that exec mode has no access to, and the `bypass_hook_trust` field is hidden + doesn't fire hooks when set via `-c bypass_hook_trust=true` in our tests. Document the investigation in the codex materializer so future revisits don't repeat the rabbit hole. Skills + MCP servers continue to materialize correctly; only `plugin.hooks` is silently dropped.
The standalone `mcps` field was a back-compat carryover from the pre-plugin design. It hadn't been wired through the runtime since the RuntimePlugin abstraction landed, so callers got silent no-ops if they used it. Remove it from both the TypeScript surface and the zod schema so the API is honest about plugins being the only path to MCP servers. A plugin with `mcpServers` populated and `hooks`/`skills` omitted is the standard "MCP-only" carrier — the materializer fans it out into each harness's native shape (Claude plugin tree, .cursor/mcp.json, codex `-c mcp_servers.*` overrides).
…t in 0.130.0 Earlier comment implied bypass_hook_trust existed but was "not plumbed correctly." Re-checked the installed binary directly: `strings` on codex 0.130.0 has zero occurrences of bypass_hook_trust / bypass-hook-trust / bypassHookTrust. The field exists on the codex `main` branch but was added after 0.130.0. So `--bypass-hook-trust` genuinely doesn't exist as a CLI flag, and `-c bypass_hook_trust=true` is a silent no-op because nothing reads that key in this release. The conclusion (defer codex hooks until trust-bypass exists or codex pre-trusts plugin-bundled hooks) is unchanged.
…alization Rewrote the codex hooks deferral comment to point at the two open upstream issues that block our use case end-to-end: 1. openai/codex#21639 — direct config-layer hooks (hooks.json, [[hooks.X]] in config.toml) stopped firing in 0.129.0+ versus the working 0.128.0-alpha.1 baseline. Independently confirmed by ≥5 users across multiple releases; we reproduced on 0.131.0 with every combination of feature flags and bypass-trust. 2. openai/codex#16430 — plugin manifest `hooks` field silently dropped by the manifest parser; discovery walker never scans the installed-plugin tree. Confirmed via `codex plugin list` showing "(installed, enabled)" while plugin-bundled hooks never register, regardless of [features].plugin_hooks state. The 0.131.0 --dangerously-bypass-hook-trust flag is real but doesn't help: #21639 prevents discovery from finding any hook to bypass- trust, and #16430 prevents the plugin manifest from contributing any hook to discovery in the first place. Revisit conditions and a fallback materialization plan (write a session-local hooks.json under per-session CODEX_HOME if #21639 closes ahead of #16430) are now documented inline.
Introduces a new optional `defaultProvider` field on EdgeConfig backed by a ProviderTypeSchema enum that currently accepts "local" or "daytona". This lets users configure the default sandbox provider for sessions without needing to pass it explicitly per call. Additional provider backends (other ComputeSDK targets) can be added to the enum as they're wired through the runtime. - New ProviderTypeSchema / ProviderType exports from core - Re-exported from config-types.ts so consumers get them via the standard core public surface - Field placed next to defaultRunner for discoverability — both are "default" settings for runtime selection - Regenerated JSON schema artifacts under packages/core/schemas/
…nfig.defaultProvider
The chat handler is no longer hardwired to Daytona. It now accepts a `provider: ProviderType` dependency (defaulting to `"local"`) and branches its session-config build accordingly:
- **local**: harness runs on the host (`claude` from `PATH` or via the optional `claudeCliPath` override); no `DAYTONA_API_KEY` needed; the runtime gives each session its own HOME under `~/.cyrus-agent-sessions/<id>/` so `.claude/` is isolated and resumable across `--continue` turns.
- **daytona**: existing behavior. Fresh sandbox seeded via `npm install -g @anthropic-ai/claude-code`, paused between turns via `destroyWhileInactive`, destroyed on idle TTL eviction.
EdgeWorker passes `this.config.defaultProvider` through when wiring up the Slack handler, so the value flows from `~/.cyrus/config.json` -> `EdgeConfig.defaultProvider` -> `AgentChatSessionHandler`.
Also: re-exported `ProviderType` / `ProviderTypeSchema` from `cyrus-core`'s public index so consumers can reach them without reaching into `config-types.js`.
Test coverage: 4 new tests in `packages/edge-worker/test/AgentChatSessionHandler.provider.test.ts` covering local default, explicit local, daytona without key (throws), daytona with key (constructs cleanly). Full edge-worker suite passes (605/605).
…ever forward both
CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY are distinct auth modes in Claude Code with different billing semantics: OAuth runs against a Claude Code subscription, API key runs against direct Anthropic API access. They are not aliases for the same credential.
The handler had been:
- reading `ANTHROPIC_AUTH_TOKEN` (not a real env var Claude Code looks at)
- forwarding the same value as BOTH `CLAUDE_CODE_OAUTH_TOKEN` and `ANTHROPIC_AUTH_TOKEN`, which conflated the two auth modes
Replaced with a discriminated `ClaudeCredential` union and a `readClaudeCredential()` helper:
- OAuth takes precedence: if `CLAUDE_CODE_OAUTH_TOKEN` is set, the handler uses subscription auth.
- Otherwise falls back to `ANTHROPIC_API_KEY`.
- Forwards exactly the env var that was set; never both.
- Error message and class docstring updated to name both variables and explain the distinction.
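The credential selection described above might be sketched like this. The union's field names (`kind`, `envVar`, `value`) and the `toSandboxSecrets` helper are assumptions; only `ClaudeCredential`, `readClaudeCredential`, and the two environment variable names come from the commit message.

```typescript
// Field names are illustrative; the real union may be shaped differently.
type ClaudeCredential =
  | { kind: "oauth"; envVar: "CLAUDE_CODE_OAUTH_TOKEN"; value: string }
  | { kind: "apiKey"; envVar: "ANTHROPIC_API_KEY"; value: string };

type EnvLike = Record<string, string | undefined>;

function readClaudeCredential(env: EnvLike): ClaudeCredential | undefined {
  // OAuth (subscription auth) takes precedence when both are present.
  const oauth = env["CLAUDE_CODE_OAUTH_TOKEN"];
  if (oauth) return { kind: "oauth", envVar: "CLAUDE_CODE_OAUTH_TOKEN", value: oauth };

  const apiKey = env["ANTHROPIC_API_KEY"];
  if (apiKey) return { kind: "apiKey", envVar: "ANTHROPIC_API_KEY", value: apiKey };

  return undefined; // caller posts the "missing credential" failure message
}

// Hypothetical helper: forwards exactly one variable into the sandbox, never both.
function toSandboxSecrets(credential: ClaudeCredential): Record<string, string> {
  return { [credential.envVar]: credential.value };
}

const credential = readClaudeCredential({
  CLAUDE_CODE_OAUTH_TOKEN: "oauth-example",
  ANTHROPIC_API_KEY: "key-example",
});
if (credential !== undefined) {
  console.log(credential.kind, Object.keys(toSandboxSecrets(credential)));
}
```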
Chat sessions now run with the same workspace-level MCP servers (Linear, cyrus-tools, cyrus-docs, optional Slack) that repo-bound sessions get. Previously the system prompt told Claude those servers existed while the runner ran with the default toolset only — a docstring-vs-reality mismatch the spike comment acknowledged. agent-runtime side: widen `McpServerRuntimeConfig` to accept the full SDK schema (`type: "http" | "sse" | "stdio"`, plus a permissive index signature for `tools`, `alwaysLoad`, …) and switch the zod schema to `.passthrough()` so SDK-shaped entries flow through to the materializer verbatim instead of being silently stripped. edge-worker side: - `AgentChatSessionHandlerDeps` is now generic over `TEvent` and carries a new optional `buildMcpServers(event)` callback. - The handler invokes it once per thread on first session creation (warm threads reuse the existing session as before), then wraps the result into a single anonymous `RuntimePlugin` named "chat" via a small `toRuntimeMcpServers` adapter that drops SDK-instance entries (those can't cross the runtime's subprocess boundary). - `EdgeWorker.buildChatMcpServers(event)` delegates to the existing `McpConfigService.buildMcpConfig` with a synthetic `chat-<teamId>` repoId and the first configured Linear workspace, so cyrus-tools context wiring stays uniform with repo-session paths. Class docstring updated: "No MCP servers" caveat removed; replaced by a section that describes the new `buildMcpServers` plumbing, supported transports, and the SDK-instance limitation. Tests: monorepo typecheck clean; all 605 edge-worker tests pass; all 31 agent-runtime tests pass (no schema regressions).
The pi harness adapter wrapped a binary named `pi` that has no traceable upstream (no npm package, no GitHub repo we could attribute it to, no install command in our docs). It's been in the codebase since the runtime's first commit but was never wired through to any runner test scope, and we have no SDK or schema to type its stream-json output against.
Removing rather than carrying it as `raw: unknown` forever:
- delete `harnesses/pi.ts`
- drop `"pi"` from `HarnessKind` union (types.ts)
- drop `"pi"` from `HarnessKindSchema` enum (schemas.ts)
- drop `piHarness` import + re-export + `harnessAdapters` record entry (harnesses/index.ts)
- update the supported-kinds assertion in harnesses.test.ts
If pi comes back as a first-class target (with attribution and a binary we can install), it goes back in cleanly: the adapter shape is simple enough that re-adding takes minutes.
31/31 agent-runtime tests pass; monorepo typecheck clean.
…eric
The runtime now propagates harness-kind through to event typing. Saying
`createAgentSession({ harness: "claude", … })` yields an
`AgentSession<"claude">` whose `events` stream is
`AsyncIterable<TranscriptEvent<SDKMessage>>` — `event.raw` narrows to
the upstream SDK's union with no cast required.
New types in `src/types.ts`:
- `HarnessRawByKind` — lookup type from harness kind to its SDK event
union. Empirically verified against each CLI's stdout (PR notes).
- `OpenCodeStreamEvent` — local envelope for opencode's JSONL output
(the SDK's `Event` union describes a different surface; we type only
the inner `part: Part`).
- `cursor` deliberately stays `unknown` — `@cursor/sdk`'s `SDKMessage`
describes a different surface than `cursor-agent`'s stream-json. The
follow-up plan is to vendor a small driver that wraps `@cursor/sdk`
directly, at which point cursor's row becomes
`import("@cursor/sdk").SDKMessage`.
Generic propagation:
- `TranscriptEvent<TRaw = unknown>` — `raw` is now `TRaw`, defaults to
`unknown` for back-compat.
- `AgentSession<H extends HarnessKind = HarnessKind>` — `events`,
`run()`, etc. carry `H` through.
- `AgentSessionResult<H>` and `RuntimeCallbacks<H>` follow.
- `createAgentSession<H>(config: CreateAgentSessionConfigFor<H>)`
infers H from `config.harness`. New helper type
`CreateAgentSessionConfigFor<H>` narrows the `harness` field.
Internal `RuntimeAgentSession` stays non-generic (operates on the
loose union); the public factory casts at the boundary.
Existing consumers reading `event.raw as unknown` continue to compile
unchanged — `AgentSession` defaults to `AgentSession<HarnessKind>`
which keeps the current weak typing.
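A reduced sketch of the generic propagation. The type names follow the description, but `ClaudeRawSketch` is a made-up stand-in for the SDK's `SDKMessage` union, `events` is a plain array instead of an `AsyncIterable`, and the factory's second parameter and the `kind: "harness.event"` label exist only so the example is self-contained.

```typescript
// Stand-in for the upstream SDK union; the real type is far larger.
type ClaudeRawSketch =
  | { type: "assistant"; text: string }
  | { type: "result"; ok: boolean };

// Lookup type from harness kind to its raw event type.
interface HarnessRawByKind {
  claude: ClaudeRawSketch;
  cursor: unknown; // stays untyped at this point in the PR
}
type HarnessKind = keyof HarnessRawByKind;

interface TranscriptEvent<TRaw = unknown> {
  kind: string;
  raw: TRaw;
}

interface AgentSession<H extends HarnessKind = HarnessKind> {
  harness: H;
  events: TranscriptEvent<HarnessRawByKind[H]>[];
}

// H is inferred from config.harness, so callers need no explicit type argument.
function createAgentSession<H extends HarnessKind>(
  config: { harness: H },
  rawEvents: HarnessRawByKind[H][],
): AgentSession<H> {
  return {
    harness: config.harness,
    events: rawEvents.map((raw) => ({ kind: "harness.event", raw })),
  };
}

const session = createAgentSession({ harness: "claude" }, [
  { type: "assistant", text: "noted" },
  { type: "result", ok: true },
]);

const assistantTexts: string[] = [];
for (const event of session.events) {
  // event.raw is the Claude union here, so the discriminator narrows without a cast.
  if (event.raw.type === "assistant") assistantTexts.push(event.raw.text);
}
console.log(assistantTexts);
```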
Also fixes a long-standing opencode adapter bug: `--output-format json`
→ `--format json`, the actual CLI flag per `opencode run --help`. The
old flag would have failed at runtime on first invocation.
Adds 4 type-only devDependencies under @anthropic-ai, @openai,
@google, @opencode-ai: never bundled, never imported at runtime.
Tests: monorepo typecheck clean; 34/34 agent-runtime tests pass
(adds 3 compile-time type-narrowing assertions); 605/605 edge-worker
tests pass.
Two related changes: first a new API addition, then the first consumer that benefits from the per-harness typing we landed last commit.
agent-runtime:
- New AgentSession.transcript(): returns a snapshot of every event observed on the session so far, in insertion order. Useful for cross-turn replay, post-hoc inspection, building a UI timeline without consuming the live `events` async iterable, or resuming consumption from a known index across reconnects. Returns a fresh copy so callers can't mutate the internal buffer.
- Implementation in RuntimeAgentSession is a one-liner over the existing `observedEvents` array; that array was already being populated for the rolling-result use case.
- Two compile-time tests assert the typing: `transcript()` on AgentSession<"claude"> returns readonly TranscriptEvent<SDKMessage>[] and AgentSession<"codex"> returns readonly TranscriptEvent<ThreadEvent>[].
edge-worker:
- AgentChatSessionHandler now narrows `state.session` to AgentSession<"claude">. The handler creates Claude sessions only (per the "Claude harness only" caveat in its docstring), so this narrowing surfaces SDKMessage typing throughout the run/result/transcript chain.
- extractAssistantFallback's manual cast soup is gone: the function now walks `events: readonly TranscriptEvent<SDKMessage>[]`, narrows via `e.raw.type === "assistant"` (TS discriminates on the SDK union), and iterates `e.raw.message.content` with full BetaContentBlock typing. Removes 8 lines of inline type guards.
- buildSessionConfig returns CreateAgentSessionConfigFor<"claude"> (the H-narrowed variant), and the local-provider branch's harness config uses `kind: "claude" as const` so the literal flows through. createAgentSession is called with explicit <"claude"> type arg.
Also: pinned agent-runtime's @anthropic-ai/claude-agent-sdk to 0.2.123 (exact) to match the rest of the workspace. pnpm add @latest had grabbed ^0.3.145, which made two copies of the SDK resolve and broke nominal type unification between SDKMessage references.
Tests: 36/36 agent-runtime (adds 2 transcript-typing assertions); 605/605 edge-worker; monorepo typecheck clean.
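The snapshot semantics of `transcript()` can be illustrated with a small stand-in. `EventLog` and `record` are hypothetical names; only `transcript()` and the `observedEvents` buffer come from the description above.

```typescript
// Hypothetical stand-in for the session's event bookkeeping.
class EventLog<TEvent> {
  private readonly observedEvents: TEvent[] = [];

  record(event: TEvent): void {
    this.observedEvents.push(event);
  }

  // Fresh copy in insertion order; mutating it cannot touch the internal buffer.
  transcript(): readonly TEvent[] {
    return [...this.observedEvents];
  }
}

const log = new EventLog<{ kind: string }>();
log.record({ kind: "run.started" });

// A consumer remembers how many events it has processed...
const seenCount = log.transcript().length;

log.record({ kind: "harness.event" });
log.record({ kind: "run.completed" });

// ...and after a reconnect picks up from that index instead of replaying everything.
const missed = log.transcript().slice(seenCount);
console.log(missed.map((event) => event.kind));
```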
…typed
Closes the typing pass for the 5th and last harness. cursor's row in
HarnessRawByKind flips from `unknown` to `@cursor/sdk`'s `SDKMessage`,
matching the other four harnesses.
Why a wrapper. We previously verified that `cursor-agent
--output-format stream-json` emits a different schema than what
`@cursor/sdk` declares: cursor-agent uses `session_id`, `subtype`,
nested `tool_call.shellToolCall.{args,result}`, and a `result` event
that the SDK union doesn't include; the SDK uses `agent_id`, `run_id`,
`status`, top-level `args`/`result`, and no `result` variant. Typing
`raw` as `SDKMessage` while spawning `cursor-agent` would be a lie.
The fix: vendor a tiny driver that uses `@cursor/sdk`'s `Agent.create`
+ `run.stream()` ourselves. Spawn it as `node <driver>` from the
cursor adapter. The bytes on the wire ARE `SDKMessage` by
construction — there's no schema drift to worry about because we own
the producer.
What's new:
- src/harnesses/cursor-driver.ts — Node ESM script that parses argv
(--prompt, --model, --cwd, --system-prompt, --agent-id,
--agent-id-file), creates an Agent via `@cursor/sdk`, streams
SDKMessage events to stdout as JSONL, and exits 0/1/2.
- src/harnesses/cursor.ts — rewritten to spawn `node <driver-path>`
instead of `cursor-agent`. Driver path resolved via
`import.meta.url` against `./cursor-driver.js`, sibling in both src
and dist. The adapter's `extractResult` now walks
`event.raw as SDKMessage` and narrows via the discriminator — no
manual guards.
- HarnessRawByKind["cursor"] = SDKMessage from @cursor/sdk.
- @cursor/sdk added as a regular dependency (not devDep) since the
driver imports it as a value.
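Since the driver writes one JSON value per line, a consumer has to reassemble lines that arrive split across stdout chunks before parsing them. A minimal sketch of that step, assuming a hypothetical `createJsonlDecoder` helper that is not part of this PR and that deliberately knows nothing about the SDK message shape:

```typescript
// Hypothetical helper: buffers partial lines across chunks and parses complete ones.
function createJsonlDecoder(): { push(chunk: string): unknown[] } {
  let pending = "";
  return {
    push(chunk: string): unknown[] {
      pending += chunk;
      const lines = pending.split("\n");
      // The last element is either "" or an incomplete line; keep it for the next chunk.
      pending = lines.pop() ?? "";
      const parsed: unknown[] = [];
      for (const line of lines) {
        const trimmed = line.trim();
        if (trimmed.length > 0) parsed.push(JSON.parse(trimmed));
      }
      return parsed;
    },
  };
}

const decoder = createJsonlDecoder();
// The second JSON object is split across two chunks.
const firstBatch = decoder.push('{"a":1}\n{"b"');
const secondBatch = decoder.push(":2}\n");
```

Narrowing each parsed value to the SDK union would then happen at the adapter, where the type of `raw` is known.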
Internal cleanup forced by the typing:
- RuntimeAgentSession no longer formally `implements AgentSession`.
The public interface is generic over H; the internal class works
with the loose `TranscriptEvent<unknown>` form and the factory in
runtime.ts casts at the boundary (which it was already doing).
- run()'s `turnEvents` is cast to `AgentSessionResult["events"]` at
the return site — single boundary, type-safe.
- emitEvent() casts the callback to the loose `TranscriptEvent` form
at the call site (the public boundary is the factory cast in
runtime.ts).
End-to-end smoke test against the real Cursor API confirmed the
driver's stdout matches the SDK union exactly: `status`/`tool_call`/
`assistant` variants with `agent_id`+`run_id`, no schema drift. Exit
0, 22 valid JSON lines, zero stderr noise.
Known limitation. The driver's path is on the host, so this works
unmodified for the local provider; Daytona needs the driver
materialized into the sandbox + `@cursor/sdk` installed there. Left
as a TODO in the cursor adapter — chat is Claude-only today so no
existing functionality regresses.
Tests: 36/36 agent-runtime (typed-events asserts cursor now resolves
to CursorSDKMessage; harnesses test updated for the new command
shape); 605/605 edge-worker; monorepo typecheck clean.
WorkerService cherry-picked fields from the loaded EdgeConfig into the EdgeWorkerConfig but never forwarded defaultProvider, so the value configured in ~/.cyrus/config.json was silently dropped and chat sessions always defaulted to the local sandbox provider — even when the operator had explicitly selected daytona.
…pass perms
Add support for booting Daytona-backed chat sandboxes from a pre-built
snapshot rather than the default base image, configurable via env:
- DAYTONA_SNAPSHOT: pre-built snapshot to seed the sandbox from. When
set, the npm-install bootstrap is skipped (the snapshot is expected
to ship Claude Code preinstalled) and the CLI defaults to `claude`
on PATH.
- DAYTONA_WORKING_DIR: in-sandbox working/home directory (default
`/home/daytona`). Set this when the snapshot uses a different user
layout, e.g. `/home/cyrus`.
- DAYTONA_CLAUDE_CLI_PATH: absolute path to the `claude` binary inside
the sandbox. Defaults to `<workingDir>/.npm-global/bin/claude` when
no snapshot is set, or `claude` (PATH-resolved) when a snapshot is.
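The defaulting rules for these three variables can be sketched as below. The function name `resolveDaytonaLayout`, the return shape, and the whitespace trimming are assumptions for illustration; only the variable names and default values come from the list above.

```typescript
interface DaytonaChatLayout {
  snapshot: string | undefined;
  workingDir: string;
  claudeCliPath: string;
}

// env is passed in explicitly so the sketch does not depend on Node typings.
function resolveDaytonaLayout(env: Record<string, string | undefined>): DaytonaChatLayout {
  // Treating blank values as unset is an assumption of this sketch.
  const snapshot = env["DAYTONA_SNAPSHOT"]?.trim() || undefined;
  const workingDir = env["DAYTONA_WORKING_DIR"]?.trim() || "/home/daytona";
  const claudeCliPath =
    env["DAYTONA_CLAUDE_CLI_PATH"]?.trim() ||
    // With a snapshot the CLI is expected on PATH; otherwise use the npm-global install.
    (snapshot ? "claude" : `${workingDir}/.npm-global/bin/claude`);
  return { snapshot, workingDir, claudeCliPath };
}

const noSnapshot = resolveDaytonaLayout({});
const withSnapshot = resolveDaytonaLayout({
  DAYTONA_SNAPSHOT: "my-snapshot",
  DAYTONA_WORKING_DIR: "/home/cyrus",
});
```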
Plumbing:
- Add an optional `snapshot` field to RuntimeSandboxConfig, forwarded
by ComputeSdkSandboxProvider as `snapshotId` (which the ComputeSDK
Daytona adapter maps to Daytona's snapshot create param).
- Add the same field to the Zod schema so it survives normalization
(the schema was silently stripping unknown sandbox keys, which is
what made the wired-up snapshot value never reach the SDK call).
- Translate Cyrus's cross-harness PermissionMode to Claude's CLI flag
values in the Claude harness adapter (`"bypass"` -> `"bypassPermissions"`,
`"ask"` -> `"default"`). Without this, passing `"bypass"` failed at
the CLI boundary since Claude does not accept that string.
- Default Daytona chat sessions to `permissions: { mode: "bypass" }`
so the agent can run shell commands inside the sandbox — the
sandbox itself is the isolation boundary, so per-tool prompts (which
no user can answer) are noise. Local sessions are unchanged.
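The permission translation in the plumbing list amounts to a small lookup. A sketch, assuming `PermissionMode` has only the two values named in this commit (the real type may have more) and using a hypothetical function name:

```typescript
// Assumed subset: only the two modes this commit mentions.
type PermissionMode = "bypass" | "ask";

// Maps the cross-harness mode to the string the Claude CLI accepts.
function toClaudePermissionMode(mode: PermissionMode): string {
  switch (mode) {
    case "bypass":
      return "bypassPermissions";
    case "ask":
      return "default";
  }
}

const daytonaDefault = toClaudePermissionMode("bypass");
const askMode = toClaudePermissionMode("ask");
```

Keeping the mapping inside the Claude adapter means callers keep using the cross-harness vocabulary and never see the CLI-specific strings.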
…ackage
The driver script that vendors `@cursor/sdk` for the cursor harness
adapter moves out of `cyrus-agent-runtime`'s `src/harnesses/` into its
own publishable package at `packages/cursor-sdk-runner/` with the npm
name `@cyrus/cursor-runner`.
Why a dedicated package:
- Standalone reusable tool. Anyone who wants typed Cursor streaming
across a process boundary can `npm install -g @cyrus/cursor-runner`
and spawn it from any language/runtime, not just from cyrus.
- Cleaner dependency surface. agent-runtime no longer carries
`@cursor/sdk` as a runtime dep — it stays a devDep just for the
`SDKMessage` type import that backs `HarnessRawByKind["cursor"]`.
The actual @cursor/sdk install moves into the new package.
- Independent versioning. The driver can iterate against new Cursor
SDK releases without forcing an agent-runtime release.
- Distinct from the legacy `cyrus-cursor-runner` package (the
IAgentRunner-style `cursor-agent`-CLI wrapper that the new
agent-runtime is replacing). Two packages with clearly different
scopes — no confusion.
Package contents:
- `package.json` — name `@cyrus/cursor-runner`, version synced to
workspace (0.2.51), `bin: { "cursor-runner": "dist/index.js" }`,
publishConfig.access public, MIT license inheriting from monorepo.
- `src/index.ts` — same driver logic as before, with a `#!/usr/bin/env
node` shebang so the bin is executable post-install. Argv contract
unchanged (--prompt, --model, --cwd, --system-prompt, --agent-id,
--agent-id-file).
- `README.md` — install instructions, options table, exit codes, and
the consumer narrowing pattern showing how to import SDKMessage.
- `tsconfig.json` mirrors other workspace runners.
agent-runtime wiring:
- Removed `src/harnesses/cursor-driver.ts` (moved upstream).
- `src/harnesses/cursor.ts` resolves the runner via
`createRequire(import.meta.url).resolve("@cyrus/cursor-runner")`
instead of a sibling-file URL. Works for both pnpm workspace
symlinks (today) and standalone npm installs (when the package is
published).
- `@cursor/sdk` moved from dependencies to devDependencies (type-only).
- Added `@cyrus/cursor-runner: workspace:*` as a dependency.
End-to-end smoke test against the real Cursor API confirmed via the
new resolved path: the bin spawns, streams 17 SDKMessage JSON lines,
exits 0 with zero stderr noise. Wire format still has agent_id /
run_id / status fields exactly per the `@cursor/sdk` union.
Tests: 36/36 agent-runtime (cursor command shape test updated for the
new path-resolution pattern); 605/605 edge-worker; monorepo
typecheck clean.
Known limitations carried forward unchanged from the previous commit:
- Local provider only — the runner's path resolves to a host node_modules
location that doesn't exist inside a remote sandbox. Daytona support
needs the runner installed into the sandbox via setup commands
(`npm install -g @cyrus/cursor-runner`) once the package is
published. Slack chat is Claude-only so no current functionality
regresses.
- Multi-turn resume is wired in the runner (--agent-id-file /
--agent-id) but not threaded from the cursor adapter yet; same
TODO as before.
`git add <files>` skipped the deletion in the previous commit because none of the explicit paths I staged covered it. The driver moved to `packages/cursor-sdk-runner/src/index.ts` and was already referenced through the new `@cyrus/cursor-runner` package; the stale source file just lingered. Removing it now.
…apshot' into claude/agent-runtime-slack-chat-replacement
…ot mode
Mirrors how Claude's adapter handles `DAYTONA_CLAUDE_CLI_PATH`. The
cursor adapter now has two invocation shapes:
- **Default (local provider)**: no `harness.command` set, falls back to
`createRequire("@cyrus/cursor-runner")` resolution and spawns
`node <host-resolved-path>`. Same behavior as before for anyone not
setting a custom command.
- **Override (Daytona snapshot mode)**: `harness.command` is the
cursor-runner binary inside the sandbox. Spawned directly — the
runner's `#!/usr/bin/env node` shebang makes it executable, no
intermediary `node` needed. Callers pass `"cursor-runner"` to use
the sandbox's PATH (which Daytona snapshots populate with the
preinstalled bin) or an absolute path to pin a specific copy.
This composes cleanly with the snapshot work: a Daytona snapshot
ships `@cyrus/cursor-runner` preinstalled alongside the harness
binaries, callers set `harness: { kind: "cursor", command: "cursor-runner" }`,
and the adapter doesn't care that it's running in a remote sandbox vs
on the host — same pattern as Claude getting `command: "claude"` for
PATH lookup inside a snapshot.
The chat handler doesn't use cursor today (Claude-only), so no
existing consumer needs updating. When a cursor-on-Daytona consumer
shows up, they wire `harness.command` from the same kind of env
(e.g. `DAYTONA_CURSOR_RUNNER_PATH`) as the chat handler already does
for Claude.
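The two invocation shapes can be sketched as a single branch on `harness.command`. The types and the `buildCursorInvocation` name are hypothetical, and `hostRunnerPath` stands in for whatever the `createRequire` resolution returns on the host.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface CursorHarnessConfig {
  kind: "cursor";
  command?: string;
}
interface SpawnSpec {
  command: string;
  args: string[];
}

function buildCursorInvocation(
  harness: CursorHarnessConfig,
  hostRunnerPath: string,
  runnerArgs: readonly string[],
): SpawnSpec {
  if (harness.command) {
    // Override: spawn the sandbox-resident bin directly; its shebang supplies node.
    return { command: harness.command, args: [...runnerArgs] };
  }
  // Default: run the host-resolved script through node.
  return { command: "node", args: [hostRunnerPath, ...runnerArgs] };
}

const localSpec = buildCursorInvocation(
  { kind: "cursor" },
  "/host/node_modules/cursor-runner/dist/index.js",
  ["--prompt", "hi"],
);
const snapshotSpec = buildCursorInvocation(
  { kind: "cursor", command: "cursor-runner" },
  "/host/node_modules/cursor-runner/dist/index.js",
  ["--prompt", "hi"],
);
```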
Tests: added a paired test asserting the override shape; previous test
renamed for clarity ("via the host-resolved … when harness.command is
unset" vs "uses harness.command directly … Daytona-snapshot mode").
40/40 agent-runtime tests pass; monorepo typecheck clean.
…rsor-runner
The `@cyrus` npm scope is unclaimed; @Cyrus-AI is the registered org. Rename across:
- packages/cursor-sdk-runner/package.json (name)
- packages/cursor-sdk-runner/src/index.ts (docstring + usage example)
- packages/cursor-sdk-runner/README.md (title, install command)
- packages/agent-runtime/package.json (dependency entry)
- packages/agent-runtime/src/harnesses/cursor.ts (createRequire target + docs)
- packages/agent-runtime/test/harnesses.test.ts (test description)
No behavioral change. 40/40 agent-runtime tests pass; 613/613 edge-worker; monorepo typecheck clean.
The typed events story (HarnessRawByKind narrowing SDKMessage / ThreadEvent / JsonStreamEvent / etc.) only holds if the SDK version we type against actually describes the bytes the CLI emits. Today the pins were a mix of exact (claude, cursor) and caret (gemini, codex, opencode); carets let a future minor SDK release introduce shapes the runtime CLI doesn't emit (or vice versa) and quietly break the narrowing.
Pinned everything to exact versions matching the CLI versions we've empirically tested against:
| SDK pin | Was | Now | Matches CLI |
| --- | --- | --- | --- |
| @anthropic-ai/claude-agent | 0.2.123 | 0.2.123 | claude 2.1.145 |
| @cursor/sdk | 1.0.13 | 1.0.13 | @cyrus-ai/cursor-runner |
| @google/gemini-cli-core | ^0.42.0 | 0.17.0 | gemini 0.17.0 (per CLAUDE.md) |
| @openai/codex-sdk | ^0.131.0 | 0.130.0 | codex 0.130.0 |
| @opencode-ai/sdk | ^1.15.5 | 1.15.5 | opencode 1.15.5 |
Also pinned every `@anthropic-ai/claude-code@latest` install command to `2.1.145`: the chat handler's Daytona setup commands, plus three test scripts. `@latest` was a silent drift surface: it would install a CLI whose stream-json shape might not match `@anthropic-ai/claude-agent-sdk@0.2.123` (the SDK we type against), and the breakage would show up as runtime type confusion in production rather than at build. Now a CLI version bump requires a coordinated SDK bump in the same PR, visible in the diff.
Added a `PINNED_CLAUDE_CLI_VERSION` const + explanatory comment in AgentChatSessionHandler so future maintainers see the constraint.
Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean. Downgrading gemini-cli-core (^0.42.0 → 0.17.0) and codex-sdk (^0.131.0 → 0.130.0) didn't break anything; both ship the same JsonStreamEvent / ThreadEvent unions we rely on at the older versions.
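The exact-versus-caret distinction this commit relies on could be enforced with a small check over a dependency map. This is only a sketch; `isExactPin` and `findFloatingPins` are hypothetical helpers, not something the PR adds, and specifiers such as `workspace:*` would also be reported as floating.

```typescript
// True only for a bare x.y.z version (optionally with a prerelease tag): no ^, ~, or ranges.
function isExactPin(range: string): boolean {
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(range);
}

// Returns the names of dependencies whose specifier is not an exact version.
function findFloatingPins(deps: Record<string, string>): string[] {
  return Object.keys(deps).filter((name) => !isExactPin(deps[name] ?? ""));
}

// Two rows from the "Was" column of the table in this commit.
const before: Record<string, string> = {
  "@google/gemini-cli-core": "^0.42.0",
  "@cursor/sdk": "1.0.13",
};
const floating = findFloatingPins(before);
```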
…the agent-runtime context
The previous pin (0.17.0) was inherited from the LEGACY `cyrus-gemini-runner` package, which deliberately holds 0.17.0 for its own reasons (the package documents this in `packages/gemini-runner/CLAUDE.md` and pins identically in its own package.json). cyrus-agent-runtime is a different context: it's the new runtime that consumers will install going forward, so it should track the current gemini-cli-core line.
Verified: `@google/gemini-cli-core@0.42.0` ships exactly the same `JsonStreamEvent` union we narrow against: 6 variants (InitEvent, MessageEvent, ToolUseEvent, ToolResultEvent, ErrorEvent, ResultEvent), identical to 0.17.0's union. So no narrowing breakage.
`packages/gemini-runner/` is intentionally left at 0.17.0; that's the legacy stack's own pinning decision and shouldn't move just because we touched the new runtime.
Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean.
…ex-sdk to 0.131.0
Following the "agent-runtime tracks current" reasoning we used for the
gemini bump, move the remaining SDK pins forward:
- @anthropic-ai/claude-agent-sdk: 0.2.123 -> 0.2.141 (in 5 packages
that all need to move together for type identity to unify at the
package boundaries — agent-runtime devDep, plus runtime deps in
core / claude-runner / edge-worker / simple-agent-runner)
- @openai/codex-sdk: 0.130.0 -> 0.131.0 (latest; agent-runtime only)
Both are additive minor bumps — the SDKMessage / ThreadEvent unions
we narrow `HarnessRawByKind[H].raw` against are supersets of the
older shapes (claude added `SDKPermissionDeniedMessage` to the union
and `oauth_org_not_allowed`/`model_not_found` to the error enum;
codex 0.131 ships the same ThreadEvent union as 0.130). Nothing we
read from those types was removed or renamed, so existing consumers
(notably AgentChatSessionHandler.extractAssistantFallback) typecheck
clean.
Also updated the inline comment in AgentChatSessionHandler that
referenced `@anthropic-ai/claude-agent-sdk@0.2.123` as the SDK the
Daytona CLI install pin is paired against — bumped to 0.2.141.
Matching test assertion in AgentChatSessionHandler.provider.test.ts
updated to keep the version-pair note in sync.
Untouched:
- PINNED_CLAUDE_CLI_VERSION ("2.1.145") — that's the latest CLI
and pairs with the 0.2.141 SDK
- packages/codex-runner (^0.125.0) and packages/gemini-runner
(0.17.0) — legacy packages own their own pinning decisions
Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck
clean.
…drop SlackDaytonaRunner
Pulls the agent-runtime additions for caller-driven harness session
resume:
- RuntimeVolumeConfig.subpath — per-binding isolation within a
shared provider volume (Daytona Volumes pattern)
- CreateAgentSessionConfig.resumeHarnessSessionId — caller-supplied
harness session id to resume; adapter translates to its native
CLI flag
- HarnessAdapter.extractSessionId — pulls the harness-native session
id out of the observed transcript so callers can persist it
- AgentSessionResult.harnessSessionId — round-trips the id to the
caller after each run
- Claude adapter: extractSessionId reads `system.init.session_id`;
buildCommand appends `--resume <id>` when resumeHarnessSessionId
is set
- resume-smoke.mjs — two-turn Daytona Volume smoke test script
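The caller-driven round-trip listed above (persist `harnessSessionId` from one run, pass it back as `resumeHarnessSessionId`) can be sketched as follows. The `ResumeConfig` and `TurnResult` shapes and both function names are simplified assumptions; only the `--resume <id>` behaviour comes from this commit.

```typescript
// Hypothetical, trimmed-down shapes; the real config and result types carry more fields.
interface ResumeConfig {
  sessionId: string;
  resumeHarnessSessionId: string | undefined;
}
interface TurnResult {
  harnessSessionId: string | undefined;
}

// Appends `--resume <id>` only when the caller supplied an id.
function buildClaudeArgs(baseArgs: readonly string[], config: ResumeConfig): string[] {
  return config.resumeHarnessSessionId
    ? [...baseArgs, "--resume", config.resumeHarnessSessionId]
    : [...baseArgs];
}

// The caller persists the id from one run and feeds it into the next session config.
function configForNextSession(sessionId: string, previous: TurnResult): ResumeConfig {
  return { sessionId, resumeHarnessSessionId: previous.harnessSessionId };
}

const baseArgs = ["--output-format", "stream-json"];
const firstArgs = buildClaudeArgs(baseArgs, {
  sessionId: "turn-1",
  resumeHarnessSessionId: undefined,
});
const secondConfig = configForNextSession("turn-2", { harnessSessionId: "sess-123" });
const secondArgs = buildClaudeArgs(baseArgs, secondConfig);
```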
Explicitly DROPPED from the merge:
- packages/edge-worker/src/SlackDaytonaRunner.ts
- packages/edge-worker/test/SlackDaytonaRunner.test.ts
- the EdgeWorker.ts wiring that constructs SlackDaytonaRunner under
the legacy ChatSessionHandler — we use the new
AgentChatSessionHandler instead (already wired from earlier work
in this branch)
- @computesdk/daytona + @daytonaio/sdk deps in edge-worker package.json
(only SlackDaytonaRunner needed them; agent-runtime already
depends on @computesdk/daytona directly)
Conflict resolutions:
- claude.ts: kept our plugin wiring (--plugin-dir / --mcp-config)
AND added the volumes-branch's --resume handling. Both are
additive in the args list.
- session.ts: kept turnEvents semantics for AgentSessionResult.events
(per-turn slice, what consumers already expect) but added the
volumes-branch's harnessSessionId extraction over the FULL
observedEvents (since system.init.session_id arrives once on
turn 1 and is referenced by every subsequent turn). Adopted
destroySandboxOnce() for the destroy callback (cleaner than
destroy() which also tries to cancel an already-completed run).
- harnesses.test.ts: kept all tests from both sides — bypass
permission mapping + --resume + extractSessionId.
- runtime.test.ts: fixed the volumes-branch's resume test to use
our actual session API (session.run("prompt") vs session.start(),
config.userPrompt was wrong and was just a stray field).
Tests: 44/44 agent-runtime, 613/613 edge-worker, monorepo typecheck
clean.
…-to-end
Wires cursor into the same caller-driven resume contract Claude got from the volumes merge:
- `extractSessionId(events)`: walks the SDKMessage stream for the first `agent_id` (every variant carries it; the value is stable across the whole run). Returned as `AgentSessionResult.harnessSessionId` for the caller to persist.
- `buildCommand`: when `config.resumeHarnessSessionId` is set, appends `--agent-id <id>` to the cursor-runner invocation. The runner reads that flag and calls `Agent.resume(<id>)` instead of `Agent.create()`, picking up the prior conversation.
The runner's `--agent-id-file` flag is unchanged, kept for callers that prefer the runner-writes-it-to-disk pattern; this adapter just doesn't use it because the runtime now surfaces the id via extractSessionId.
Includes a guard against non-object `event.raw` in extractSessionId: runtime lifecycle events can emit string raw values, and the `in` operator throws on those. The chat-typed shape says `raw: SDKMessage` but the buffer holds both harness-streamed and runtime-lifecycle events; structural guard is the right move at the adapter boundary.
Tests: 4 new tests in harnesses.test.ts covering `--agent-id` passed when set, omitted when not, agent_id extracted from a realistic status+assistant transcript, and undefined when nothing in the transcript carries one. 48/48 agent-runtime pass; 613/613 edge-worker; monorepo typecheck clean.
Cursor is now feature-equivalent to Claude on the multi-turn resume contract: caller persists `result.harnessSessionId`, passes it back as `resumeHarnessSessionId` on the next session config, gets a continuation run. Works for local and Daytona-snapshot modes alike since the resume state lives on Cursor's servers (addressable by agentId) rather than in any filesystem the runtime needs to preserve.
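The structural guard mentioned in this commit matters because `"agent_id" in raw` raises a `TypeError` when `raw` is a string. A minimal sketch of the extraction, assuming a simplified event shape with `raw: unknown` and a hypothetical function name:

```typescript
// Simplified event shape; the buffer can hold harness messages and lifecycle strings.
interface BufferedEvent {
  raw: unknown;
}

function extractAgentId(events: readonly BufferedEvent[]): string | undefined {
  for (const event of events) {
    const raw = event.raw;
    // The `in` operator throws on primitives, so skip anything that is not an object.
    if (typeof raw !== "object" || raw === null) continue;
    if ("agent_id" in raw) {
      const id = (raw as { agent_id: unknown }).agent_id;
      if (typeof id === "string") return id;
    }
  }
  return undefined;
}

const mixedTranscript: BufferedEvent[] = [
  { raw: "sandbox.created" }, // lifecycle event with a string payload
  { raw: { type: "status", agent_id: "agent-42", run_id: "run-1" } },
  { raw: { type: "assistant", agent_id: "agent-42", run_id: "run-1" } },
];
const agentId = extractAgentId(mixedTranscript);
const none = extractAgentId([{ raw: "only-lifecycle" }, { raw: null }]);
```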
…ldStateEnv
Adds a consumer-facing `sandbox.persistentState: { volume, bindingId }`
config that hides the per-harness env-var math. The runtime mounts the
caller's volume at a fixed internal path with bindingId as the subpath,
then asks the harness adapter (via the new `buildStateEnv(mountPath)`
hook) which env vars to set so the harness writes its state-dir there
instead of under `$HOME`.
Each adapter's env mapping is grounded in upstream source, not guessed:
- claude → CLAUDE_CONFIG_DIR = `${m}/.claude`
- cursor → CURSOR_DATA_DIR = `${m}/.cursor`
- codex → CODEX_HOME = `${m}/.codex`
(`codex-rs/utils/home-dir/src/lib.rs::find_codex_home`)
- gemini → GEMINI_CLI_HOME = `${m}` (CLI appends `.gemini` itself;
`@google/gemini-cli-core::homedir()`)
- opencode → all four XDG dirs under `${m}/.opencode-xdg/{config,data,state,cache}`
(no app-specific override; opencode derives via `xdg-basedir`)
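The per-harness mappings above can be written out as one lookup. In the PR each adapter exposes its own `buildStateEnv(mountPath)` hook; collapsing them into a single function keyed by harness kind, and the `HarnessKind` type itself, are simplifications made for this sketch.

```typescript
type HarnessKind = "claude" | "cursor" | "codex" | "gemini" | "opencode";

// Returns the env vars that redirect a harness's state directory under mount path `m`.
function buildStateEnv(kind: HarnessKind, m: string): Record<string, string> {
  switch (kind) {
    case "claude":
      return { CLAUDE_CONFIG_DIR: `${m}/.claude` };
    case "cursor":
      return { CURSOR_DATA_DIR: `${m}/.cursor` };
    case "codex":
      return { CODEX_HOME: `${m}/.codex` };
    case "gemini":
      // The CLI appends `.gemini` itself, so the mount path is passed as-is.
      return { GEMINI_CLI_HOME: m };
    case "opencode": {
      // No app-specific override exists, so all four XDG base dirs are redirected.
      const root = `${m}/.opencode-xdg`;
      return {
        XDG_CONFIG_HOME: `${root}/config`,
        XDG_DATA_HOME: `${root}/data`,
        XDG_STATE_HOME: `${root}/state`,
        XDG_CACHE_HOME: `${root}/cache`,
      };
    }
  }
}

const claudeEnv = buildStateEnv("claude", "/mnt/state");
const opencodeEnv = buildStateEnv("opencode", "/mnt/state");
```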
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19dce79533
const oauth = process.env.CLAUDE_CODE_OAUTH_TOKEN?.trim();
if (oauth) return { kind: "oauth", token: oauth };
const apiKey = process.env.ANTHROPIC_API_KEY?.trim();
if (apiKey) return { kind: "apiKey", token: apiKey };
Support ANTHROPIC_AUTH_TOKEN in chat credential detection
readClaudeCredential() only accepts CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY, so Slack chat sessions now reject environments that authenticate Claude with ANTHROPIC_AUTH_TOKEN (they hit the "not configured" reply path). This is a regression from the prior runner path, which still treats ANTHROPIC_AUTH_TOKEN as a valid auth env (see packages/claude-runner/src/session-env.ts), so existing deployments using that variable will fail for every chat mention until they reconfigure.
Fixed in 1e1531d — added a third authToken variant to the ClaudeCredential union, scan ANTHROPIC_AUTH_TOKEN last (matching session-env.ts AUTH_ENV_KEYS precedence), and forward exactly that env var to the harness for kind=authToken. Updated the "not configured" error to list all three options. New tests cover detection precedence, whitespace-only handling, and per-kind forwarding.
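The detection order and per-kind forwarding described in this fix can be sketched as below. The first two branches mirror the snippet quoted in the review; passing `env` as a parameter and the `credentialSecrets` helper name are assumptions made so the example is self-contained.

```typescript
type ClaudeCredential =
  | { kind: "oauth"; token: string }
  | { kind: "apiKey"; token: string }
  | { kind: "authToken"; token: string };

// Scans the three env vars in precedence order; whitespace-only values count as unset.
function readClaudeCredential(
  env: Record<string, string | undefined>,
): ClaudeCredential | undefined {
  const oauth = env["CLAUDE_CODE_OAUTH_TOKEN"]?.trim();
  if (oauth) return { kind: "oauth", token: oauth };
  const apiKey = env["ANTHROPIC_API_KEY"]?.trim();
  if (apiKey) return { kind: "apiKey", token: apiKey };
  const authToken = env["ANTHROPIC_AUTH_TOKEN"]?.trim();
  if (authToken) return { kind: "authToken", token: authToken };
  return undefined;
}

// Forwards exactly one env var to the harness so the auth modes are not conflated.
function credentialSecrets(credential: ClaudeCredential): Record<string, string> {
  switch (credential.kind) {
    case "oauth":
      return { CLAUDE_CODE_OAUTH_TOKEN: credential.token };
    case "apiKey":
      return { ANTHROPIC_API_KEY: credential.token };
    case "authToken":
      return { ANTHROPIC_AUTH_TOKEN: credential.token };
  }
}

const gatewayOnly = readClaudeCredential({ ANTHROPIC_AUTH_TOKEN: " tok " });
const both = readClaudeCredential({ ANTHROPIC_API_KEY: "key", ANTHROPIC_AUTH_TOKEN: "tok" });
```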
if (out.mcpConfigPath) {
  this.pluginOutputs.claudeMcpConfigPath = out.mcpConfigPath;
Merge Claude MCP configs across all plugins
In materializePlugins(), each Claude plugin's mcpConfigPath overwrites the previous one, but claudeHarness can pass only the final scalar via --mcp-config. When multiple plugins define MCP servers, only the last plugin's MCP config is actually wired into Claude (especially with --strict-mcp-config), so tools from earlier plugins silently disappear. This should aggregate all plugin MCP servers into one combined config (or equivalent) before command construction.
Fixed in 1e1531d — materializePlugins now accumulates every plugin's mcpServers map and, after the loop, writes one combined <pluginsRoot>/.mcp.combined.json that --mcp-config points at. Per-plugin .mcp.json files are still written (part of the documented Claude plugin layout for --plugin-dir consumers); the combined file is just the handoff target for the single-scalar --mcp-config flag. Caller plugin order determines precedence on duplicate server names — later wins. New tests prove three plugins all reach the harness via one merged config and that the shadow rule holds.
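The accumulate-then-write merge described in this fix reduces to folding each plugin's server map into one object, with later plugins shadowing earlier ones on duplicate names. A sketch under stated assumptions: the `McpServerConfig` and `PluginDecl` shapes are simplified stand-ins, and the `{ mcpServers: ... }` wrapper is assumed to be the layout of the combined file.

```typescript
// Hypothetical, simplified shapes for illustration.
interface McpServerConfig {
  command: string;
  args?: string[];
}
interface PluginDecl {
  name: string;
  mcpServers?: Record<string, McpServerConfig>;
}

// Folds every plugin's servers into one map; caller order decides duplicates (later wins).
function mergeMcpServers(
  plugins: readonly PluginDecl[],
): { mcpServers: Record<string, McpServerConfig> } {
  const merged: Record<string, McpServerConfig> = {};
  for (const plugin of plugins) {
    Object.assign(merged, plugin.mcpServers ?? {});
  }
  return { mcpServers: merged };
}

const combined = mergeMcpServers([
  { name: "a", mcpServers: { linear: { command: "linear-v1" } } },
  { name: "b", mcpServers: { slack: { command: "slack-mcp" } } },
  { name: "c", mcpServers: { linear: { command: "linear-v2" } } },
]);
```

Here `combined.mcpServers` holds two entries, and `linear` resolves to the definition from plugin `c` because it comes last.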
Conflict resolutions:
- CHANGELOG.internal.md: keep both our Unreleased agent-runtime entries (refreshed to match what's actually on this branch, drop the SlackDaytonaRunner entry since that work was reverted) and main's [0.2.52] release marker.
- package.json: keep all new pnpm.overrides from main; collapse the five overrides that appeared in both halves of the conflict block.
- packages/edge-worker/src/EdgeWorker.ts: accept main's new buildSkillSessionContext method (three call sites depend on it).
- pnpm-lock.yaml: regenerated via pnpm install --no-frozen-lockfile.
Post-merge fixes (auto-merge produced bad output that compiled to duplicate object literal keys):
- packages/codex-runner/src/CodexRunner.ts: remove duplicate stop_details: null from two assistant message factories.
- packages/cursor-runner/src/CursorRunner.ts: same.
- packages/gemini-runner/src/adapters.ts: same.
Post-merge cleanup:
- packages/edge-worker/src/EdgeWorker.ts: drop unused getDefaultModelForRunner / getDefaultFallbackModelForRunner private wrappers that main added; all call sites in this branch already use runnerSelectionService.* directly.
Dependency security policy compliance:
- Add root pnpm.overrides for brace-expansion (>=5.0.6), ws (>=8.20.1), protobufjs (>=7.5.8) to keep pnpm audit clean after the merge surfaced advisories under our pinned @google/gemini-cli-core transitive dep.
Verified: pnpm build / pnpm typecheck / pnpm test:packages:run all clean; pnpm audit reports no known vulnerabilities.
Four standalone smoke scripts (created before they got linted via the pre-commit hook) had format / organize-imports diffs that pnpm lint caught on CI. Pure formatting — no behavior change.
P1 (edge-worker, AgentChatSessionHandler): accept ANTHROPIC_AUTH_TOKEN in Slack chat credential detection. The handler previously read only CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY, so deployments that auth Claude via an Anthropic-compatible proxy/gateway (using ANTHROPIC_AUTH_TOKEN) hit the "not configured" reply path even though the legacy claude-runner accepts that env var (see claude-runner/src/session-env.ts AUTH_ENV_KEYS). Extend the ClaudeCredential discriminated union with a third variant, scan that env var third in the documented precedence order, and forward only the matching env var to the harness so the three auth modes don't get conflated. Update the "not configured" error message to mention all three options.
P2 (agent-runtime, RuntimeAgentSession.materializePlugins): merge Claude MCP server configs across plugins. Claude's `--mcp-config` flag takes a single scalar path, so the previous "last writer wins" overwrite of claudeMcpConfigPath silently dropped every plugin's MCP servers except the last (and `--strict-mcp-config` made that fatal for tool calls into the dropped servers). Accumulate every plugin's mcpServers map and, after the loop, write one combined .mcp.combined.json at the plugins root and point `--mcp-config` at that. Per-plugin .mcp.json files are still written by the materializer because they're part of the documented Claude plugin layout that `--plugin-dir` consumers expect; the combined file is purely the handoff target for `--mcp-config`. Caller-supplied plugin order determines precedence on duplicate server names (later wins), which the new test locks in.
Tests:
- edge-worker: 7 new tests covering credential detection precedence, trim-whitespace, and per-kind env-var forwarding (oauth / apiKey / authToken) into the Daytona session config.
- agent-runtime: updated the existing single-plugin assertion to expect `--mcp-config .mcp.combined.json`; added two new tests: one proving three plugins all reach the harness via one merged config, one proving caller plugin order picks the winning shadow on duplicate server names.
Verified: pnpm -F cyrus-agent-runtime test:run (59 tests, was 57); pnpm -F cyrus-edge-worker test:run (631 tests, was 624); pnpm typecheck + pnpm lint clean.
Sample usage:
1. Minimal local invocation (packages/agent-runtime/test/runtime.test.ts:54-69)
import { createAgentSession } from "cyrus-agent-runtime";
const session = await createAgentSession({
sessionId: "session-1",
harness: "codex", // shorthand for { kind: "codex" }
env: { NODE_ENV: "test" },
secrets: { API_KEY: "secret" }, // string shorthand → { value, redact: true }
});
await session.addMessage("queued");
const result = await session.run("Do it");
// result.events / result.result / result.harnessSessionId / result.destroy()
Default
2. Production Daytona Claude invocation (packages/edge-worker/src/AgentChatSessionHandler.ts:485-672)
const sessionConfig = {
sessionId,
harness: { kind: "claude", command: this.daytonaClaudeCliPath },
systemPrompt,
// Discriminated on credential.kind so the three Claude auth modes
// don't get conflated — only one of these env vars ships through.
secrets: credential.kind === "oauth"
? { CLAUDE_CODE_OAUTH_TOKEN: credential.token }
: credential.kind === "apiKey"
? { ANTHROPIC_API_KEY: credential.token }
: { ANTHROPIC_AUTH_TOKEN: credential.token },
permissions: { mode: "bypass" }, // sandbox is the isolation boundary
packages: { commands: [...this.daytonaSetupCommands] },
// Chat-session MCP servers wrapped into one anonymous plugin; the
// materializer fans this out to Claude's native .mcp.json + writes a
// session-level .mcp.combined.json that --mcp-config points at.
plugins: [{ name: "chat", mcpServers: toRuntimeMcpServers(mcpServers) }],
sandbox: {
provider: "daytona",
name: `cyrus-slack-${sessionId}`,
workingDirectory: this.daytonaWorkingDir,
timeoutMs: 300_000,
destroyWhileInactive: true, // stop()/start() between turns
snapshot: this.daytonaSnapshot, // pre-installed harness binaries
metadata: { purpose: "cyrus-slack-chat", threadKey },
},
};
// Explicit <"claude"> threads SDKMessage typing through to
// session.events / result.events with no cast at consumer sites.
const session = await createAgentSession<"claude">(sessionConfig, {
callbacks: {
onTranscriptEvent: (te) => {
// te.raw is typed `SDKMessage` here.
logger.debug(`[${sessionId}] transcript event: ${te.kind}`);
},
},
});
const result = await session.run(userPrompt); // first turn
// ...later, same `session` object:
const result2 = await session.run(followUpPrompt); // resumes via Claude --continue
3. Multi-turn resume across brand-new sandboxes
Supersedes #1220 — same foundational agent-runtime work, plus everything built on top since.
Summary
- `cyrus-agent-runtime` package: unified `AgentSession` across harnesses (Claude / Codex / Cursor / Gemini / OpenCode) and sandbox providers (local + ComputeSDK / Daytona), with live `streamCommand`, line-buffered transcript emission, opt-in `interactiveInput`, and a decoupled `stop()` (cancel-run-only) / `destroy()` (sole sandbox-release path) lifecycle.
- `@cyrus-ai/cursor-runner` package (published): thin CLI wrapper around `@cursor/sdk` that emits `SDKMessage` JSONL. Lets the cursor harness adapter consume a typed, version-pinned wire format owned by us.
- `RuntimePlugin` bundles MCP servers + hooks + skills. Per-harness materializers translate one declaration into Claude / Cursor / Codex native filesystem state inside the sandbox; the bundled MCP-config path replaces the old standalone `mcps` field.
- `session.run(prompt)` callable repeatedly with per-session state backing surviving across calls; first turn materializes files/folders/repos and runs setup, later turns resume with the harness's native flag (Claude `--continue`, etc.). Caller-driven cross-binding resume via `resumeHarnessSessionId` ↔ `harnessSessionId` round-trip (`--resume <id>` for Claude, `--agent-id <id>` for Cursor).
- `RuntimeVolumeConfig` with provider-driven `kind: "bind" | "fuse" | "provider"`, plus `subpath` for per-binding isolation when many sessions share one provider volume.
- `sandbox.persistentState: { volume, bindingId }`: consumer-facing abstraction that hides the per-harness state-env-var math. Runtime mounts the caller's volume at a fixed internal path with `bindingId` as subpath, then calls each adapter's new `buildStateEnv(mountPath)` hook to inject the right env vars. Mappings verified upstream, not guessed:
  - `CLAUDE_CONFIG_DIR = ${m}/.claude`
  - `CURSOR_DATA_DIR = ${m}/.cursor`
  - `CODEX_HOME = ${m}/.codex` (`codex-rs/utils/home-dir/src/lib.rs`)
  - `GEMINI_CLI_HOME = ${m}` (CLI appends `.gemini` itself; `@google/gemini-cli-core::homedir()`)
  - `XDG_*_HOME` dirs under `${m}/.opencode-xdg/{config,data,state,cache}` (no app-specific override exists)
- `AgentSession<H>` generic threads the right SDK union into `event.raw`: `harness: "claude"` reads `SDKMessage` with no cast, etc. Added `transcript()` snapshot accessor for cross-turn replay.
- `destroyWhileInactive`: optional flag that pauses (stops + later resumes) the underlying sandbox between `run()` calls. For Daytona this maps to `sandbox.stop()` / `sandbox.start()`, which preserves all on-disk state at a few-second resume cost.
- `harness.command` lets the adapter spawn the snapshot-resident binary directly (Cursor uses this for `cursor-runner`).
- `RuntimeFolderConfig` with `readwrite` sync-back, `RuntimeRepositoryConfig` running `git clone` inside the sandbox with optional branch / depth.
- `AgentChatSessionHandler`: picks provider from `EdgeConfig.defaultProvider`, wires MCP servers via the new plugin shape, forwards Daytona chat sandbox snapshot + custom layout + bypass perms, fixes `ANTHROPIC_API_KEY` vs `ANTHROPIC_AUTH_TOKEN` precedence.
Validation
- `pnpm --filter cyrus-agent-runtime typecheck`: clean
- `pnpm --filter cyrus-agent-runtime test:run`: 57 tests passing
- `pnpm --filter cyrus-agent-runtime build`: clean
- `pnpm build` + `pnpm typecheck` across the monorepo: clean
- `pnpm audit`: clean
- `createAgentSession`; live local streaming spike; folder materialization tests against real local sandbox with `exclude` + sync-back; repository materialization with full and shallow clones + branch checkout; `stop()` / `destroy()` decoupling proofs.
Follow-ups (not in this PR)
- `EnvironmentFactory` for run/environment split, captured as CYPACK-1209.
- `env` precedence on setup commands vs harness (one `env` field overriding `PATH` can break setup), to be filed.
Test plan
- `cyrus-agent-runtime` public surface (`AgentSession<H>`, `createAgentSession`, `RuntimePlugin`, `sandbox.persistentState`)
- `AgentChatSessionHandler` provider wiring + MCP server forwarding