
Sandboxed agent runtime, plugins, Daytona volumes, persistent state, and edge-worker chat integration #1229

Open
Connoropolous wants to merge 42 commits into main from claude/agent-runtime-slack-chat-replacement

@Connoropolous (Contributor)

Supersedes #1220 — same foundational agent-runtime work, plus everything built on top since.

Summary

  • New cyrus-agent-runtime package: unified AgentSession across harnesses (Claude / Codex / Cursor / Gemini / OpenCode) and sandbox providers (local + ComputeSDK / Daytona), with live streamCommand, line-buffered transcript emission, opt-in interactiveInput, and a decoupled stop() (cancel-run-only) / destroy() (sole sandbox-release path) lifecycle.
  • New @cyrus-ai/cursor-runner package (published): thin CLI wrapper around @cursor/sdk that emits SDKMessage JSONL. Lets the cursor harness adapter consume a typed, version-pinned wire format owned by us.
  • Plugins: RuntimePlugin bundles MCP servers + hooks + skills. Per-harness materializers translate one declaration into Claude / Cursor / Codex native filesystem state inside the sandbox; the bundled MCP-config path replaces the old standalone mcps field.
  • Multi-turn sessions: session.run(prompt) is callable repeatedly, with per-session state backing surviving across calls; the first turn materializes files/folders/repos and runs setup, and later turns resume with the harness's native flag (Claude --continue, etc.). Caller-driven cross-binding resume via the harnessSessionId / resumeHarnessSessionId round-trip (--resume <id> for Claude, --agent-id <id> for Cursor).
  • Daytona volume mounting: RuntimeVolumeConfig with provider-driven kind: "bind" | "fuse" | "provider", plus subpath for per-binding isolation when many sessions share one provider volume.
  • sandbox.persistentState: { volume, bindingId }: consumer-facing abstraction that hides the per-harness state-env-var math. Runtime mounts the caller's volume at a fixed internal path with bindingId as subpath, then calls each adapter's new buildStateEnv(mountPath) hook to inject the right env vars. Mappings verified upstream — not guessed:
    • claude → CLAUDE_CONFIG_DIR = ${m}/.claude
    • cursor → CURSOR_DATA_DIR = ${m}/.cursor
    • codex → CODEX_HOME = ${m}/.codex (codex-rs/utils/home-dir/src/lib.rs)
    • gemini → GEMINI_CLI_HOME = ${m} (CLI appends .gemini itself; @google/gemini-cli-core::homedir())
    • opencode → all four XDG_*_HOME dirs under ${m}/.opencode-xdg/{config,data,state,cache} (no app-specific override exists)
  • Typed events per harness: the AgentSession<H> generic threads the right SDK union into event.raw, so a harness: "claude" session reads SDKMessage with no cast, etc. Added a transcript() snapshot accessor for cross-turn replay.
  • destroyWhileInactive: optional flag that pauses (stops + later resumes) the underlying sandbox between run() calls. For Daytona this maps to sandbox.stop() / sandbox.start() — preserves all on-disk state at a few-second resume cost.
  • Daytona base snapshot path: pre-installed harness binaries inside the snapshot; harness.command lets the adapter spawn the snapshot-resident binary directly (Cursor uses this for cursor-runner).
  • Folders and repositories as first-class session config, separate from volumes (RuntimeFolderConfig with readwrite sync-back, RuntimeRepositoryConfig running git clone inside the sandbox with optional branch / depth).
  • Edge-worker AgentChatSessionHandler: picks provider from EdgeConfig.defaultProvider, wires MCP servers via the new plugin shape, forwards Daytona chat sandbox snapshot + custom layout + bypass perms, fixes Claude credential precedence (CLAUDE_CODE_OAUTH_TOKEN over ANTHROPIC_API_KEY, never forwarding both).
  • SDK pinning: lockstep-pinned dev-dep SDK versions across all harness adapters so the runtime's typed event unions can't drift from the CLI wire format. Bumped Claude CLI install version pin for Daytona setup.
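The `persistentState` mappings above can be read as one pure function per harness from the mount path to env vars. A minimal TypeScript sketch follows; `buildStateEnv` is named in the text, but the record layout, `resolvePersistentState`, the returned mount shape, and the `/cyrus-state` mount path are hypothetical stand-ins for runtime internals.

```typescript
// Harness kinds covered by persistentState in this PR.
type HarnessKind = "claude" | "cursor" | "codex" | "gemini" | "opencode";

// Per-harness buildStateEnv(mountPath) hooks, mirroring the mappings listed above.
const buildStateEnv: Record<HarnessKind, (m: string) => Record<string, string>> = {
  claude: (m) => ({ CLAUDE_CONFIG_DIR: `${m}/.claude` }),
  cursor: (m) => ({ CURSOR_DATA_DIR: `${m}/.cursor` }),
  codex: (m) => ({ CODEX_HOME: `${m}/.codex` }),
  // The Gemini CLI appends ".gemini" itself, so the mount path is passed unchanged.
  gemini: (m) => ({ GEMINI_CLI_HOME: m }),
  // OpenCode has no app-specific override, so all four XDG base dirs are redirected.
  opencode: (m) => ({
    XDG_CONFIG_HOME: `${m}/.opencode-xdg/config`,
    XDG_DATA_HOME: `${m}/.opencode-xdg/data`,
    XDG_STATE_HOME: `${m}/.opencode-xdg/state`,
    XDG_CACHE_HOME: `${m}/.opencode-xdg/cache`,
  }),
};

interface PersistentState {
  volume: string; // caller-owned provider volume
  bindingId: string; // isolates this binding's state inside the shared volume
}

// HYPOTHETICAL: the real fixed mount path is internal to the runtime.
const STATE_MOUNT_PATH = "/cyrus-state";

/** Hypothetical resolver: one volume mount (bindingId as subpath) plus harness env vars. */
function resolvePersistentState(harness: HarnessKind, state: PersistentState) {
  return {
    mount: { volume: state.volume, mountPath: STATE_MOUNT_PATH, subpath: state.bindingId },
    env: buildStateEnv[harness](STATE_MOUNT_PATH),
  };
}
```

Because every hook depends only on the mount path, the caller supplies just `{ volume, bindingId }` and never handles per-harness env-var names.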

Validation

  • pnpm --filter cyrus-agent-runtime typecheck — clean
  • pnpm --filter cyrus-agent-runtime test:run — 57 tests passing
  • pnpm --filter cyrus-agent-runtime build — clean
  • pnpm build + pnpm typecheck across the monorepo — clean
  • pnpm audit — clean
  • (Carried forward from Add sandboxed agent runtime package #1220) Real Daytona Claude smoke via createAgentSession; live local streaming spike; folder materialization tests against real local sandbox with exclude + sync-back; repository materialization with full and shallow clones + branch checkout; stop() / destroy() decoupling proofs.

Follow-ups (not in this PR)

  • Hashed EnvironmentFactory for run/environment split — captured as CYPACK-1209.
  • env precedence on setup commands vs harness (one env field overriding PATH can break setup) — to be filed.
  • Codex hooks materializer is deferred — see inline comments citing the upstream blockers.

Test plan

  • CI green
  • Reviewer sanity-check on the new cyrus-agent-runtime public surface (AgentSession<H>, createAgentSession, RuntimePlugin, sandbox.persistentState)
  • Reviewer sanity-check on the edge-worker AgentChatSessionHandler provider wiring + MCP server forwarding

Connoropolous and others added 30 commits May 15, 2026 09:30
…dboxes

Add an optional `streamCommand(command, options)` capability to
`RunnerSandbox`, with `onStdout` / `onStderr` chunk callbacks, an
`AbortSignal` for cancellation, and an `AsyncIterable<string> input`
option for live stdin. Local provider implements it via
`child_process.spawn`; Daytona is reached through a pluggable
`NativeStreamAdapter` registry that unwraps ComputeSDK's
`ProviderSandbox.getInstance()` to the native `@daytonaio/sdk` Sandbox
and uses async sessions + `getSessionCommandLogs(onStdout, onStderr)`.

`RuntimeAgentSession.start()` now prefers `streamCommand` when
`capabilities.streamingProcess` is true, line-buffers chunks across
packet boundaries, and emits `TranscriptEvent`s as the harness CLI
produces them. New `interactiveInput` opt-in routes `addMessage()`
into the running process's stdin (default off — most one-shot CLIs
block on a piped-but-never-closed stdin).
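Line-buffering matters because one stream-json event can be split across two stdout packets. A minimal sketch of the idea, assuming string chunks and newline-delimited JSON; `LineBuffer` is a hypothetical name, not the runtime's class.

```typescript
/** Accepts arbitrary stdout chunks and returns only complete lines. */
class LineBuffer {
  private pending = "";

  /** Feed one chunk; returns the non-empty lines it completed (newline stripped). */
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is "" (chunk ended on a newline) or a partial line to keep.
    this.pending = parts.pop() ?? "";
    return parts.map((line) => line.replace(/\r$/, "")).filter((line) => line.length > 0);
  }

  /** Call on process exit to surface a final line that had no trailing newline. */
  flush(): string[] {
    const rest = this.pending.replace(/\r$/, "");
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

If chunks arrive as `Buffer`s, decoding them with a `TextDecoder` and `{ stream: true }` before `push()` keeps multi-byte UTF-8 characters that straddle packets intact.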

Verified end-to-end:
- local `spawn`: chunks land at the exact 400ms cadence the child emits
- real `codex exec` via `createAgentSession`: events emitted ~8.6s
  before turn end
- real Daytona Claude `stream-json`: system event landed 1.7s before
  result event over a remote sandbox
…config

Add two materialization concepts to `CreateAgentSessionConfig`, deliberately
distinct from the existing `volumes` (provider-attached persistent storage):

- `RuntimeFolderConfig` — exposes a host filesystem folder inside the
  sandbox. Walks the host tree and uploads each file via
  `SandboxFilesystem.writeFile`. Supports `exclude` globs. With
  `access: "readwrite"` the runtime syncs sandbox edits and any
  newly-created files back to the host folder after the harness
  command completes.
- `RuntimeRepositoryConfig` — runs `git clone` inside the sandbox at
  `mountPath` with optional `branch` checkout and `depth` shallow-clone.
  Local-path sources are rewritten to `file://...` to preserve git
  semantics. Shallow clones with a branch pass `--branch` to the clone
  itself: a `--depth` clone implies `--single-branch`, so a later
  `git checkout` of a non-default branch fails.
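The clone rules above can be expressed as a small argv builder. This is a sketch assuming a hypothetical `RepositorySpec` shape (`source`, `mountPath`, `branch`, `depth`) and absolute POSIX local paths; it is not the runtime's materializer. It also shows why the `file://` rewrite matters: git ignores `--depth` for plain local-path clones.

```typescript
// Hypothetical subset of RuntimeRepositoryConfig; field names follow the text above.
interface RepositorySpec {
  source: string; // remote URL or absolute local path
  mountPath: string; // clone destination inside the sandbox
  branch?: string;
  depth?: number;
}

/** Build argv for `git clone` (run without a shell, so no quoting is needed). */
function buildCloneArgs(repo: RepositorySpec): string[] {
  // Absolute POSIX paths become file:// URLs: git ignores --depth for plain
  // local-path clones but honours it over the file:// transport.
  const source = repo.source.startsWith("/") ? `file://${repo.source}` : repo.source;
  const args = ["clone"];
  if (repo.depth !== undefined) args.push("--depth", String(repo.depth));
  // Pass the branch to clone itself: --depth implies --single-branch, so only
  // this branch's history is fetched and no later checkout is needed.
  if (repo.branch !== undefined) args.push("--branch", repo.branch);
  args.push("--", source, repo.mountPath);
  return args;
}
```

Relative local paths would need resolving to absolute ones before this check; the sketch leaves that out.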

Both emit lifecycle transcript events (`folder.materialize.*`,
`folder.syncback.*`, `repository.materialize.*`) and run after files
but before package setup commands, so setup steps that depend on the
cloned tree or the mounted folder see them ready.

27 tests pass (5 new): one materializer unit test per concept and
one runtime-level integration test verifying that the session wires
each through to the right sandbox calls and emits the right events.
`AgentSessionResult.destroy()` maps to ComputeSDK's ProviderSandbox.destroy() for ComputeSDK-backed
providers (deletes the remote sandbox, releases compute resources) and
is a no-op for the local provider. It lets a caller hold only the result
object, consume events/result, then tear down without keeping a
reference to the session.

Idempotent — backed by a one-shot destroy promise on the session that
both `AgentSession.stop()` and `AgentSessionResult.destroy()` share, so
callers can call either or both in any order without double-destroying
the underlying ComputeSDK / local sandbox.
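The one-shot teardown can be sketched as a memoized promise. Assumptions: the `once` helper and the fake `sandbox` are illustrative, and the two object literals only stand in for `AgentSession.stop()` and `AgentSessionResult.destroy()` as described in this commit.

```typescript
/** Wrap an async teardown so every caller shares one promise; fn runs at most once. */
function once<T>(fn: () => Promise<T>): () => Promise<T> {
  let promise: Promise<T> | undefined;
  return () => (promise ??= fn());
}

// Hypothetical fake sandbox that counts destroy() calls.
let destroyCalls = 0;
const sandbox = {
  async destroy(): Promise<void> {
    destroyCalls += 1;
  },
};

// Both entry points delegate to the same one-shot teardown.
const teardown = once(() => sandbox.destroy());
const session = { stop: (): Promise<void> => teardown() };
const result = { destroy: (): Promise<void> => teardown() };
```

One consequence of this design: if the first teardown rejects, every later call receives the same rejection rather than retrying.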

Verified with a new test that asserts:
  - the returned result exposes destroy()
  - calling result.destroy() invokes sandbox.destroy() exactly once
  - calling result.destroy() twice is a no-op the second time
  - calling session.stop() after result.destroy() does not double-destroy
`stop()` and `destroy()` were doing two unrelated things bundled into
one method. Split them.

`stop()` now cancels the in-flight run only — aborts the harness
process, closes the live event stream, closes the input pipe — and
leaves the sandbox alive. This enables future workflows that reuse a
warm sandbox across runs (per CYPACK-1209): a single run's `stop()`
no longer destroys shared compute.

`destroy()` is the sole sandbox-release path. It exists symmetrically
on both `AgentSession` and `AgentSessionResult` (sharing a one-shot
internal teardown promise). `AgentSession.destroy()` also implicitly
cancels an in-flight run via `stop()` before releasing the sandbox,
so callers don't need a two-step.
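The split can be sketched with a per-run `AbortController` plus a cached teardown promise. Only `stop()` and `destroy()` come from the text; the `SessionLifecycle` class, `beginRun()`, and the `Sandbox` interface are hypothetical.

```typescript
// Hypothetical minimal sandbox handle.
interface Sandbox {
  destroy(): Promise<void>;
}

class SessionLifecycle {
  private readonly sandbox: Sandbox;
  private runController: AbortController | undefined;
  private teardown: Promise<void> | undefined;

  constructor(sandbox: Sandbox) {
    this.sandbox = sandbox;
  }

  /** Begin one run; the signal would be wired to the harness process and event stream. */
  beginRun(): AbortSignal {
    this.runController = new AbortController();
    return this.runController.signal;
  }

  /** Cancel the in-flight run only; the sandbox stays alive for the next run. */
  stop(): void {
    this.runController?.abort();
    this.runController = undefined;
  }

  /** Sole sandbox-release path: cancel any run first, then release exactly once. */
  destroy(): Promise<void> {
    this.stop();
    this.teardown ??= this.sandbox.destroy();
    return this.teardown;
  }
}
```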

Pre-1.0 package, clean break — no consumers to migrate.
…ndler

Brutal spike — wire the Slack chat session lifecycle through
cyrus-agent-runtime's createAgentSession instead of the legacy
IAgentRunner + AgentSessionManager + RunnerConfigBuilder stack.

Removed:
- packages/edge-worker/src/ChatSessionHandler.ts (515 lines)
- packages/edge-worker/test/chat-sessions.test.ts
- EdgeWorker.getDefaultModelForRunner / getDefaultFallbackModelForRunner
  (only used by the deleted chat-session createRunner callback)
- EdgeWorker.getChatThreadLastReply stub now returns null (F1 tests
  that depended on the runner's getMessages() need a new approach)

Added:
- packages/edge-worker/src/AgentChatSessionHandler.ts — ~280-line
  replacement that drives createAgentSession per Slack mention,
  posts the harness-extracted result back to Slack, destroys the
  sandbox in a finally.

Modified:
- SlackChatAdapter.postReply(event, runner: IAgentRunner) →
  postReply(event, finalText: string). Decouples the adapter from
  the runner machinery.
- EdgeWorker wiring: drops runnerConfigBuilder/createRunner/
  onStateChange/onClaudeError from chat-session deps, replaces
  shutdown's getAllRunners() with chatSessionHandler.shutdown().
- package.json: adds cyrus-agent-runtime workspace dep.

Brutal cuts (documented in the new file's header):
- No multi-turn --continue resume — each Slack mention is a fresh
  AgentSession. Conversation continuity comes from the adapter's
  fetchThreadContext() injecting prior thread messages as text.
- No mid-flight stream injection — busy threads get notifyBusy.
- No MCPs — agent-runtime doesn't yet wire mcps through to the
  harness CLI, and the in-process cyrus-tools server wouldn't
  translate across the subprocess boundary anyway. Slack chat
  sessions run with the Claude CLI default toolset only.
- Claude harness only — no runner selection.
- No persisted session state across restarts.

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm test:packages:run (601 edge-worker tests, 114 claude-runner,
  198 gemini-runner, 62 slack-event-transport, etc. — all green)
You asked for the Slack chat replacement to use the Daytona+Claude flow
we just validated end-to-end; the first cut accidentally used the local
sandbox. Switching to Daytona:

- Each Slack mention spawns a fresh Daytona sandbox at /home/daytona
  (timeout 5min, name `cyrus-slack-<sessionId>`, metadata tagged
  `purpose: cyrus-slack-chat` for visibility in Daytona's console).
- Setup commands install @anthropic-ai/claude-code with a user-local
  npm prefix and verify the version — same script that worked in the
  streaming-spike daytona-runtime probe.
- Harness command is the full path `/home/daytona/.npm-global/bin/claude`
  (no PATH override at the env level — that broke npm in the earlier
  spike).
- Secrets carry CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_AUTH_TOKEN from
  the EdgeWorker process env into the sandbox.
- Compute SDK is configured once per process via a module-level guard.
- Refuses to construct without DAYTONA_API_KEY in env.
- Posts a clear failure message if the Claude token is missing.

Drops the unused createWorkspace helper and cyrusHome/
chatRepositoryProvider deps — Daytona owns its own working dir and
the adapter holds its own ChatRepositoryProvider.

Adds @computesdk/daytona as a direct edge-worker dep (was only
transitively available before).

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
The @daytonaio/sdk package compiles with TypeScript's `importHelpers`
option but doesn't declare `tslib` as a runtime dependency. Without
this, `import "tslib"` from the SDK fails at runtime with
ERR_MODULE_NOT_FOUND when the SDK is loaded through @computesdk/daytona
inside cyrus-edge-worker / cyrus-agent-runtime.

`pnpm.packageExtensions` patches the upstream package.json at install
time so pnpm installs tslib alongside @daytonaio/sdk, making the ESM
resolver's bare-import lookup succeed.

Verified: `import("@computesdk/daytona")` followed by an actual
provider instantiation no longer hits ERR_MODULE_NOT_FOUND on tslib.

Upstream fix is needed in @daytonaio/sdk itself; this workaround can
be removed once they ship a version with tslib in dependencies.
The previous fix (pnpm.packageExtensions) added the tslib symlink under
@daytonaio/sdk's isolated node_modules but did NOT rewrite the on-disk
package.json. Node.js's standard ESM resolver doesn't care, but
import-in-the-middle (used by @opentelemetry instrumentation) hooks the
resolve step and validates against the importer's declared deps —
because tslib wasn't in @daytonaio/sdk's package.json, the hook
rejected the bare specifier even though the symlink was present.

Use pnpm.patchedDependencies instead: writes a real patch file under
patches/ that adds `"tslib": "^2"` to @daytonaio/sdk's dependencies
on disk. The install directory hash now includes a patch_hash suffix
(visible in error paths if it ever fails again), so it's easy to tell
whether the patch applied at all.

Keeps the packageExtensions entry too as belt-and-suspenders.

To pick this up on a running host:

  git pull
  rm -rf node_modules           # force re-link
  pnpm install
  pnpm build

The rm -rf is the key step — pnpm sees the lockfile as up to date if
the high-level dep graph hasn't changed and skips re-linking, which
left the previous fix's tslib symlink in place but unused.

Upstream fix is needed in @daytonaio/sdk; remove patches/ when they
ship a version with tslib in dependencies.
Restructure the agent-runtime public API so an AgentSession is a
long-lived handle that can be run multiple times against the same
sandbox, with per-session state backing that makes the next run
automatically resume the prior conversation.

API changes (breaking, pre-1.0):

- AgentSession.start() → AgentSession.run(userPrompt). Each call is one
  turn. First call materializes files/folders/repos and runs setup;
  subsequent calls skip all that and invoke the harness with its
  continue flag.
- CreateAgentSessionConfig drops `userPrompt` (now passed per-turn to
  run()) and gains `agentSessionsRoot?: string` (default
  `~/.cyrus-agent-sessions/`).
- HarnessAdapter gains `stateDirectories: readonly string[]` declaring
  the relative paths under HOME where the harness keeps its session
  state — Claude `.claude`, Codex `.codex`, Gemini `.gemini`.
- HarnessAdapter.buildCommand now takes a second `HarnessRunOptions`
  argument with `userPrompt` and `continueSession: boolean`. Adapters
  map `continueSession` to their CLI's resume flag (Claude `--continue`)
  and suppress system-prompt injection on continuation.
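A sketch of how a Claude adapter might honour `continueSession`, assuming hypothetical `ClaudeSessionConfig` and `HarnessCommand` shapes. The flags are real Claude Code CLI options, but whether the adapter uses `--append-system-prompt` (rather than another prompt flag) is an assumption.

```typescript
// Hypothetical shapes; the real HarnessRunOptions may carry more fields.
interface HarnessRunOptions {
  userPrompt: string;
  continueSession: boolean;
}

// Hypothetical first argument: the session config the adapter needs.
interface ClaudeSessionConfig {
  systemPrompt?: string;
}

interface HarnessCommand {
  command: string;
  args: string[];
}

/** Sketch of a Claude adapter's buildCommand(config, options). */
function buildClaudeCommand(config: ClaudeSessionConfig, opts: HarnessRunOptions): HarnessCommand {
  // stream-json output in print mode requires --verbose.
  const args = ["-p", opts.userPrompt, "--output-format", "stream-json", "--verbose"];
  if (opts.continueSession) {
    // Later turns resume the most recent conversation; its system prompt is already in history.
    args.push("--continue");
  } else if (config.systemPrompt) {
    // First turn only: inject the system prompt.
    args.push("--append-system-prompt", config.systemPrompt);
  }
  return { command: "claude", args };
}
```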

Runtime mechanics:

- RuntimeAgentSession provisions `~/.cyrus-agent-sessions/<sessionId>/`
  per session and sets HOME to that dir on every harness invocation.
  Per-session HOME means concurrent local sessions don't trample each
  other's `.claude/projects/...jsonl` state, and resume Just Works
  because the .claude directory is naturally persistent between turns.
- For Daytona, same mechanism — HOME inside the sandbox is the
  per-session backing path, and since the sandbox stays warm between
  run() calls, .claude/ survives there too.
- session.stop() now cancels only the in-flight run (per-run abort
  controller) and does NOT destroy the sandbox. session.destroy() is
  the sole sandbox-release path and also runs folder syncback.
- folder syncback moved from end-of-run to session.destroy() — same
  rationale as stop/destroy split.

Tests: 30 passing (was 29). New test verifies multi-turn run() with
the second turn passing --continue and skipping setup.

Chat handler update (Slack):

AgentChatSessionHandler rewritten for the warm-thread pattern:
- threadSessions: Map<threadKey, { session, lastActivityAt, inFlight }>
- First mention: createAgentSession (Daytona, Claude, install setup);
  state kept warm.
- Subsequent mentions on same thread: session.run() reuses the warm
  sandbox via --continue. Setup commands don't re-run.
- Concurrent mention while a run is in-flight: notifyBusy (no stdin
  injection yet).
- Idle TTL (default 15min): periodic sweep destroys idle threads.
- Run failure: destroy + free slot so next mention is a clean start.
- Shutdown: clear sweep timer + destroy all warm sessions.
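The idle-TTL sweep can be sketched as a pure function over the thread map. `threadSessions`, `lastActivityAt`, and `inFlight` come from the list above; `sweepIdle`, `WarmSession`, and swallowing destroy errors are assumptions.

```typescript
// Hypothetical minimal session handle.
interface WarmSession {
  destroy(): Promise<void>;
}

interface ThreadEntry {
  session: WarmSession;
  lastActivityAt: number; // epoch ms of the last handled mention
  inFlight: boolean; // true while session.run() is executing
}

const IDLE_TTL_MS = 15 * 60 * 1000; // default idle TTL from the text

/** Destroy and evict idle, non-busy threads; returns the evicted thread keys. */
async function sweepIdle(
  threadSessions: Map<string, ThreadEntry>,
  now: number,
  ttlMs: number = IDLE_TTL_MS,
): Promise<string[]> {
  const evicted: string[] = [];
  for (const [key, entry] of threadSessions) {
    if (entry.inFlight || now - entry.lastActivityAt < ttlMs) continue;
    // Remove first so a concurrent mention starts a fresh session instead of
    // reusing one that is being torn down.
    threadSessions.delete(key);
    evicted.push(key);
    await entry.session.destroy().catch(() => undefined);
  }
  return evicted;
}
```

A handler would typically call this via `setInterval(() => void sweepIdle(threadSessions, Date.now()), intervalMs)` and `clearInterval` the timer on shutdown.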

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
Adds a sandbox.destroyWhileInactive flag to CreateAgentSessionConfig
that pauses (Daytona: sandbox.stop()) the underlying sandbox after every
session.run() returns and resumes it (sandbox.start()) before the next
run. State on disk inside the sandbox (including ~/.claude/) is
preserved by Daytona during stop, so the next turn's `--continue` finds
the prior conversation intact at much lower cost than a from-scratch
recreate.

For the local sandbox the flag is a no-op (local sessions are always
free). For Daytona it surfaces as new transcript events:
sandbox.pause.started/completed/failed and
sandbox.resume.started/completed/skipped.
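A sketch of the pause/resume wrapper around one turn, assuming a hypothetical `PausableSandbox` with `start()`/`stop()` and treating `sandbox.resume.skipped` as the first-turn case (the text does not say when `skipped` fires).

```typescript
// Hypothetical interface; not the runtime's real sandbox type.
interface PausableSandbox {
  start(): Promise<void>;
  stop(): Promise<void>;
}

type LifecycleEvent =
  | "sandbox.resume.started" | "sandbox.resume.completed" | "sandbox.resume.skipped"
  | "sandbox.pause.started" | "sandbox.pause.completed" | "sandbox.pause.failed";

/** Resume the sandbox if paused, run one turn, then pause it again. */
async function runWithPause<T>(
  sandbox: PausableSandbox,
  state: { paused: boolean },
  emit: (e: LifecycleEvent) => void,
  turn: () => Promise<T>,
): Promise<T> {
  if (state.paused) {
    emit("sandbox.resume.started");
    await sandbox.start();
    state.paused = false;
    emit("sandbox.resume.completed");
  } else {
    emit("sandbox.resume.skipped"); // assumed: first turn, sandbox already running
  }
  try {
    return await turn();
  } finally {
    emit("sandbox.pause.started");
    try {
      await sandbox.stop();
      state.paused = true;
      emit("sandbox.pause.completed");
    } catch {
      // A failed pause only costs compute time; the turn's result still stands.
      emit("sandbox.pause.failed");
    }
  }
}
```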

AgentChatSessionHandler turns the flag on so Slack chat threads stop
billing compute between mentions.

Four real proofs added under test-scripts/ (two scripts), all validated
against real Claude, and all but the local one against real Daytona:

  resume-proof.mjs local                  — local sandbox, multi-turn
  resume-proof.mjs daytona-warm           — Daytona warm, multi-turn
  resume-proof.mjs daytona-efficient      — Daytona pause/resume, multi-turn
  slack-handler-proof.mjs                 — full AgentChatSessionHandler
                                            flow with two mentions
                                            (pause between)

Each proof gives Claude a code word in turn 1 and verifies the
turn-2 reply repeats it back, proving --continue actually preserved
the conversation across the lifecycle event being tested.

Recorded results:
  - local:                 turn1 4.1s "noted"     turn2 4.4s "BANANA-7"
  - daytona-warm:          turn1 19.8s "noted"    turn2 6.7s "BANANA-7"
  - daytona-efficient:     turn1 17.9s "noted"    turn2 8.4s "BANANA-7"
  - slack-handler-proof:   m1 19.9s "noted"       m2 8.3s "BANANA-7"

Other fixes in this commit:
- session.ts no longer overrides HOME for any provider. The earlier
  override broke Claude auth locally (empty ~/.claude/) and silently
  produced no stream-json events on Daytona (HOME pointed at a host
  path that didn't exist inside the remote sandbox). The Daytona
  sandbox preserves state across stop/start so its natural
  /home/daytona HOME is the right answer for both warm and
  destroyWhileInactive modes.
- session.ts can now resume a paused sandbox at destroy() time so
  syncFoldersBack still has a live sandbox to read from.

Validation:
- pnpm typecheck (clean across monorepo)
- pnpm --filter cyrus-agent-runtime test:run (30 tests)
- pnpm --filter cyrus-edge-worker test:run (601 tests)
- Four real proofs above, all PASSED
… hooks + skills)

Introduces a provider-agnostic RuntimePlugin shape and per-harness
materializers that translate ONE declaration into Claude-, Cursor-,
or Codex-native filesystem state (or CLI flags). Each materializer
was developed against a real CLI smoke test before any code landed.

### Public surface

CreateAgentSessionConfig grows `plugins: PluginInput[]` where
PluginInput is either an inline RuntimePlugin or `{ rootPath: string }`
(rootPath resolution is stubbed for v1 — inline only is fully
implemented). The shape:

  interface RuntimePlugin {
    name: string;
    version?: string;
    description?: string;
    mcpServers?: Record<string, McpServerRuntimeConfig>;
    hooks?: PluginHook[];
    skills?: PluginSkill[];
  }

Hook events are a universal subset: PreToolUse, PostToolUse,
SessionStart, Stop, UserPromptSubmit. Each materializer maps these
to harness-native names and silently drops events that don't
translate.
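The "map or silently drop" rule can be sketched as a lookup table per harness. The `PluginHook` shape here is hypothetical and minimal; the Claude table is the identity because Claude Code uses these event names natively, while a harness without an equivalent simply omits that key.

```typescript
// The universal hook subset from the text.
type UniversalHookEvent = "PreToolUse" | "PostToolUse" | "SessionStart" | "Stop" | "UserPromptSubmit";

// Hypothetical minimal PluginHook; the real type likely carries matchers etc.
interface PluginHook {
  event: UniversalHookEvent;
  command: string;
}

/** Per-harness translation table; a missing entry means "no native equivalent". */
type HookEventMap = Partial<Record<UniversalHookEvent, string>>;

// Claude Code uses the same event names natively.
const claudeHookEvents: HookEventMap = {
  PreToolUse: "PreToolUse",
  PostToolUse: "PostToolUse",
  SessionStart: "SessionStart",
  Stop: "Stop",
  UserPromptSubmit: "UserPromptSubmit",
};

/** Group hook commands by native event name, silently dropping untranslatable ones. */
function translateHooks(hooks: PluginHook[], map: HookEventMap): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const hook of hooks) {
    const native = map[hook.event];
    if (native === undefined) continue; // no equivalent on this harness: drop
    const commands = out.get(native) ?? [];
    commands.push(hook.command);
    out.set(native, commands);
  }
  return out;
}
```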

### Per-harness materialization

Claude — materializePluginForClaude writes:
  <workingDirectory>/.cyrus-plugins/<name>/
    .claude-plugin/plugin.json
    .mcp.json                              (when mcpServers present)
    hooks/hooks.json                       (when hooks present)
    skills/<skillName>/SKILL.md            (+ optional assets)
The Claude harness adapter appends `--plugin-dir <pluginDir>` plus
`--mcp-config <path> --strict-mcp-config` to the `claude -p`
invocation.
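A sketch of the Claude layout as a pure plan plus a separate writer, assuming a hypothetical `MiniPlugin` subset (hooks omitted). The directory names and the `--plugin-dir`, `--mcp-config`, and `--strict-mcp-config` flags follow the text; `planClaudePlugin` and `writePlannedFiles` are illustrative names.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical subset of RuntimePlugin (hooks omitted for brevity).
interface MiniPlugin {
  name: string;
  version?: string;
  description?: string;
  mcpServers?: Record<string, Record<string, unknown>>;
  skills?: { name: string; description: string; body: string }[];
}

interface PlannedFile {
  path: string;
  content: string;
}

/** Plan the on-disk plugin tree and the extra `claude -p` flags (pure; no I/O). */
function planClaudePlugin(workingDirectory: string, plugin: MiniPlugin): { files: PlannedFile[]; args: string[] } {
  const pluginDir = `${workingDirectory}/.cyrus-plugins/${plugin.name}`;
  const files: PlannedFile[] = [
    {
      path: `${pluginDir}/.claude-plugin/plugin.json`,
      // JSON.stringify drops undefined version/description keys.
      content: JSON.stringify({ name: plugin.name, version: plugin.version, description: plugin.description }, null, 2),
    },
  ];
  for (const skill of plugin.skills ?? []) {
    files.push({
      path: `${pluginDir}/skills/${skill.name}/SKILL.md`,
      // YAML frontmatter; a description containing ": " would need quoting.
      content: `---\nname: ${skill.name}\ndescription: ${skill.description}\n---\n\n${skill.body}\n`,
    });
  }
  const args = ["--plugin-dir", pluginDir];
  const servers = plugin.mcpServers ?? {};
  if (Object.keys(servers).length > 0) {
    const mcpPath = `${pluginDir}/.mcp.json`;
    files.push({ path: mcpPath, content: JSON.stringify({ mcpServers: servers }, null, 2) });
    // --strict-mcp-config: use only servers from --mcp-config, ignoring other MCP configuration.
    args.push("--mcp-config", mcpPath, "--strict-mcp-config");
  }
  return { files, args };
}

/** Apply a plan to the local filesystem (a sandbox would use its own writeFile). */
function writePlannedFiles(files: PlannedFile[]): void {
  for (const file of files) {
    mkdirSync(dirname(file.path), { recursive: true });
    writeFileSync(file.path, file.content);
  }
}
```

Keeping planning free of I/O makes the tree shape and flags easy to unit-test, and lets a sandbox's own `writeFile` replace the local writer.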

Cursor — materializePluginForCursor writes:
  <workspaceRoot>/.cursor/
    mcp.json                               (merged across plugins)
    hooks.json                             (merged across plugins)
    skills/<skillName>/SKILL.md            (+ optional assets)
The Cursor adapter appends `--approve-mcps` when any plugin
declared MCP servers (otherwise headless cursor-agent silently
drops them).
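A sketch of the cross-plugin merge for `.cursor/mcp.json` and the `--approve-mcps` decision. The `PluginWithMcp` shape is hypothetical, and treating a duplicate server name as an error is an assumption; the text only says the files are merged.

```typescript
type McpServerConfig = Record<string, unknown>;

// Hypothetical subset of RuntimePlugin.
interface PluginWithMcp {
  name: string;
  mcpServers?: Record<string, McpServerConfig>;
}

/** Merge MCP servers across plugins into one .cursor/mcp.json body plus CLI flags. */
function mergeCursorMcp(plugins: PluginWithMcp[]): {
  mcpJson: { mcpServers: Record<string, McpServerConfig> };
  extraArgs: string[];
} {
  const merged = new Map<string, McpServerConfig>();
  const owner = new Map<string, string>(); // server name -> declaring plugin
  for (const plugin of plugins) {
    for (const [serverName, config] of Object.entries(plugin.mcpServers ?? {})) {
      // Assumption: a name declared by two plugins is an error, not a silent overwrite.
      const previous = owner.get(serverName);
      if (previous !== undefined) {
        throw new Error(`MCP server "${serverName}" declared by both "${previous}" and "${plugin.name}"`);
      }
      owner.set(serverName, plugin.name);
      merged.set(serverName, config);
    }
  }
  return {
    mcpJson: { mcpServers: Object.fromEntries(merged) },
    // Headless cursor-agent silently drops MCP servers unless approved up front.
    extraArgs: merged.size > 0 ? ["--approve-mcps"] : [],
  };
}
```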

Codex — materializePluginForCodex:
  - Writes skills to `$HOME/.agents/skills/<name>/SKILL.md` plus
    `agents/openai.yaml` for the OpenAI runtime. Codex skill
    discovery is rooted at $HOME/.agents/skills/ (verified
    empirically — NOT $CODEX_HOME/skills/ as the docs suggested).
  - Returns MCP servers as inline `-c 'mcp_servers.<name>={...}'`
    TOML overrides on the CLI — no file write.
  - The session env-merges `HOME = <session-state-dir>` for codex
    runs so the materialized skills are isolated.
  - Hooks deferred for v1 (Codex hooks schema is version-pinned).
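Rendering an MCP server as an inline TOML override can be sketched as follows, assuming stdio servers with `command`/`args`/`env` only. `codexMcpOverride` is a hypothetical helper, and rejecting non-bare server names is a conservative assumption rather than documented codex behaviour.

```typescript
// Hypothetical subset of a stdio MCP server config.
interface StdioMcpServer {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

const BARE_KEY = /^[A-Za-z0-9_-]+$/;

/** TOML basic string: JSON escaping is TOML-compatible, but TOML also forbids a raw DEL (U+007F). */
function tomlString(s: string): string {
  return JSON.stringify(s).replace(/\x7f/g, "\\u007F");
}

/** Bare keys pass through; anything else becomes a quoted key. */
function tomlKey(k: string): string {
  return BARE_KEY.test(k) ? k : tomlString(k);
}

/** Build the argv pair for one `-c mcp_servers.<name>={...}` override. */
function codexMcpOverride(name: string, server: StdioMcpServer): string[] {
  // Assumption: restrict server names to bare keys rather than rely on how
  // codex parses quoted segments inside the dotted override path.
  if (!BARE_KEY.test(name)) throw new Error(`unsupported MCP server name: ${name}`);
  const fields = [`command=${tomlString(server.command)}`];
  if (server.args && server.args.length > 0) {
    fields.push(`args=[${server.args.map(tomlString).join(",")}]`);
  }
  if (server.env && Object.keys(server.env).length > 0) {
    const pairs = Object.entries(server.env).map(([k, v]) => `${tomlKey(k)}=${tomlString(v)}`);
    fields.push(`env={${pairs.join(",")}}`);
  }
  // Returned as separate argv entries, so no shell quoting is needed.
  return ["-c", `mcp_servers.${name}={${fields.join(",")}}`];
}
```

The single quotes in `-c 'mcp_servers.<name>={...}'` above are shell quoting; when the runtime spawns codex with an argv array, the override is one argument as-is.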

### Plugin lifecycle

In RuntimeAgentSession.run() first-turn materialization order:
  files → folders → repositories → plugins → setup → harness

Materializer outputs are persisted on the session so subsequent
turns re-pass the same CLI flags via HarnessRunOptions.pluginOutputs.

New transcript events: plugin.materialize.{started,completed,
skipped,failed}.

### Validation

Three CLI smoke tests, each writing a minimal plugin tree and
verifying the real CLI loads it:
  - Claude: --plugin-dir loads .claude-plugin/plugin.json + SKILL.md;
    skill triggered, response = HELLO-FROM-PLUGIN.
  - Cursor: .cursor/skills/<n>/SKILL.md auto-discovered; skill
    triggered, response = HELLO-FROM-CURSOR-SKILL.
  - Codex: $HOME/.agents/skills/<n>/SKILL.md discovered with HOME
    override; response = HELLO-FROM-CODEX-HOMEAGENTS. `-c
    mcp_servers.<name>={...}` confirmed routing through to codex's
    MCP runtime.

End-to-end plugin-proof.mjs against real Daytona + Claude: cold
sandbox + Claude install + plugin materialization + skill-triggered
run, response = "HELLO-FROM-PLUGIN" in 13.9s.

Unit test added to runtime.test.ts (31 tests total) asserting the
on-disk plugin tree shape AND the harness command-line flags.

Surprises corrected vs. the original matrix:
  - Cursor DOES have first-class SKILL.md (not just rules).
  - Codex skills are at $HOME/.agents/skills/ not $CODEX_HOME/skills/.
  - Codex MCP can be passed entirely via -c flags, no file write.
Learning tests against codex 0.130.0 show that the hook engine exists for
all documented events, but `codex exec` filters every newly-discovered
hook through a trust gate (see hooks/src/engine/discovery.rs). Trust
comes from a TUI `/hooks` review step that exec mode has no access
to, and the `bypass_hook_trust` field is hidden and didn't fire hooks
when set via `-c bypass_hook_trust=true` in our tests.

Document the investigation in the codex materializer so future
revisits don't repeat the rabbit hole. Skills + MCP servers continue
to materialize correctly; only `plugin.hooks` is silently dropped.
The standalone `mcps` field was a back-compat carryover from the
pre-plugin design. It hadn't been wired through the runtime since the
RuntimePlugin abstraction landed, so callers got silent no-ops if they
used it. Remove it from both the TypeScript surface and the zod schema
so the API is honest about plugins being the only path to MCP servers.

A plugin with `mcpServers` populated and `hooks`/`skills` omitted is
the standard "MCP-only" carrier — the materializer fans it out into
each harness's native shape (Claude plugin tree, .cursor/mcp.json,
codex `-c mcp_servers.*` overrides).
…t in 0.130.0

Earlier comment implied bypass_hook_trust existed but was "not plumbed
correctly." Re-checked the installed binary directly: `strings` on
codex 0.130.0 has zero occurrences of bypass_hook_trust /
bypass-hook-trust / bypassHookTrust. The field exists on the codex
`main` branch but was added after 0.130.0. So `--bypass-hook-trust`
genuinely doesn't exist as a CLI flag, and `-c bypass_hook_trust=true`
is a silent no-op because nothing reads that key in this release.

The conclusion (defer codex hooks until trust-bypass exists or codex
pre-trusts plugin-bundled hooks) is unchanged.
…alization

Rewrote the codex hooks deferral comment to point at the two open
upstream issues that block our use case end-to-end:

1. openai/codex#21639 — direct config-layer hooks (hooks.json,
   [[hooks.X]] in config.toml) stopped firing in 0.129.0+ versus the
   working 0.128.0-alpha.1 baseline. Independently confirmed by ≥5
   users across multiple releases; we reproduced on 0.131.0 with
   every combination of feature flags and bypass-trust.

2. openai/codex#16430 — plugin manifest `hooks` field silently
   dropped by the manifest parser; discovery walker never scans the
   installed-plugin tree. Confirmed via `codex plugin list` showing
   "(installed, enabled)" while plugin-bundled hooks never register,
   regardless of [features].plugin_hooks state.

The 0.131.0 --dangerously-bypass-hook-trust flag is real but doesn't
help: #21639 prevents discovery from finding any hook to bypass-
trust, and #16430 prevents the plugin manifest from contributing any
hook to discovery in the first place.

Revisit conditions and a fallback materialization plan (write a
session-local hooks.json under per-session CODEX_HOME if #21639
closes ahead of #16430) are now documented inline.
Introduces a new optional `defaultProvider` field on EdgeConfig backed
by a ProviderTypeSchema enum that currently accepts "local" or
"daytona". This lets users configure the default sandbox provider
for sessions without needing to pass it explicitly per call.

Additional provider backends (other ComputeSDK targets) can be added
to the enum as they're wired through the runtime.

- New ProviderTypeSchema / ProviderType exports from core
- Re-exported from config-types.ts so consumers get them via the
  standard core public surface
- Field placed next to defaultRunner for discoverability — both are
  "default" settings for runtime selection
- Regenerated JSON schema artifacts under packages/core/schemas/
…nfig.defaultProvider

The chat handler is no longer hardwired to Daytona. It now accepts a
`provider: ProviderType` dependency (defaulting to `"local"`) and
branches its session-config build accordingly:

- **local**: harness runs on the host (`claude` from `PATH` or via the
  optional `claudeCliPath` override); no `DAYTONA_API_KEY` needed; the
  runtime gives each session its own HOME under
  `~/.cyrus-agent-sessions/<id>/` so `.claude/` is isolated and
  resumable across `--continue` turns.

- **daytona**: existing behavior — fresh sandbox seeded via
  `npm install -g @anthropic-ai/claude-code`, paused between turns via
  `destroyWhileInactive`, destroyed on idle TTL eviction.

EdgeWorker passes `this.config.defaultProvider` through when wiring up
the Slack handler, so the value flows from `~/.cyrus/config.json` ->
`EdgeConfig.defaultProvider` -> `AgentChatSessionHandler`.

Also: re-exported `ProviderType` / `ProviderTypeSchema` from
`cyrus-core`'s public index so consumers can reach them without
reaching into `config-types.js`.

Test coverage: 4 new tests in
`packages/edge-worker/test/AgentChatSessionHandler.provider.test.ts`
covering local default, explicit local, daytona without key (throws),
daytona with key (constructs cleanly). Full edge-worker suite passes
(605/605).
…ever forward both

CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY are distinct auth modes
in Claude Code with different billing semantics — OAuth runs against
a Claude Code subscription, API key runs against direct Anthropic API
access. They are not aliases for the same credential. The handler had
been:

- reading `ANTHROPIC_AUTH_TOKEN`, which Claude Code treats as a custom
  bearer value for the `Authorization` header, not as either of the
  two credentials above
- forwarding the same value as BOTH `CLAUDE_CODE_OAUTH_TOKEN` and
  `ANTHROPIC_AUTH_TOKEN`, which conflated the two auth modes

Replaced with a discriminated `ClaudeCredential` union and a
`readClaudeCredential()` helper:

- OAuth takes precedence: if `CLAUDE_CODE_OAUTH_TOKEN` is set, the
  handler uses subscription auth.
- Otherwise falls back to `ANTHROPIC_API_KEY`.
- Forwards exactly the env var that was set; never both.
- Error message and class docstring updated to name both variables
  and explain the distinction.
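A sketch of the discriminated `ClaudeCredential` union and `readClaudeCredential()` as described. The field names (`kind`, `envVar`, `value`), the whitespace trimming, and `credentialSecrets` are assumptions.

```typescript
// Exactly one auth mode is ever selected.
type ClaudeCredential =
  | { kind: "oauth"; envVar: "CLAUDE_CODE_OAUTH_TOKEN"; value: string }
  | { kind: "apiKey"; envVar: "ANTHROPIC_API_KEY"; value: string };

/** OAuth (subscription) wins; the API key is the fallback; undefined if neither is set. */
function readClaudeCredential(env: Record<string, string | undefined>): ClaudeCredential | undefined {
  const oauth = env.CLAUDE_CODE_OAUTH_TOKEN?.trim();
  if (oauth) return { kind: "oauth", envVar: "CLAUDE_CODE_OAUTH_TOKEN", value: oauth };
  const apiKey = env.ANTHROPIC_API_KEY?.trim();
  if (apiKey) return { kind: "apiKey", envVar: "ANTHROPIC_API_KEY", value: apiKey };
  return undefined;
}

/** Forward only the variable that was chosen, never both. */
function credentialSecrets(cred: ClaudeCredential): Record<string, string> {
  return { [cred.envVar]: cred.value };
}
```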
Chat sessions now run with the same workspace-level MCP servers
(Linear, cyrus-tools, cyrus-docs, optional Slack) that repo-bound
sessions get. Previously the system prompt told Claude those servers
existed while the runner ran with the default toolset only — a
docstring-vs-reality mismatch the spike comment acknowledged.

agent-runtime side: widen `McpServerRuntimeConfig` to accept the full
SDK schema (`type: "http" | "sse" | "stdio"`, plus a permissive index
signature for `tools`, `alwaysLoad`, …) and switch the zod schema to
`.passthrough()` so SDK-shaped entries flow through to the materializer
verbatim instead of being silently stripped.

edge-worker side:
- `AgentChatSessionHandlerDeps` is now generic over `TEvent` and
  carries a new optional `buildMcpServers(event)` callback.
- The handler invokes it once per thread on first session creation
  (warm threads reuse the existing session as before), then wraps the
  result into a single anonymous `RuntimePlugin` named "chat" via a
  small `toRuntimeMcpServers` adapter that drops SDK-instance entries
  (those can't cross the runtime's subprocess boundary).
- `EdgeWorker.buildChatMcpServers(event)` delegates to the existing
  `McpConfigService.buildMcpConfig` with a synthetic `chat-<teamId>`
  repoId and the first configured Linear workspace, so cyrus-tools
  context wiring stays uniform with repo-session paths.
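A sketch of the `toRuntimeMcpServers` filter, using hypothetical stand-in types rather than the real SDK exports. It assumes in-process servers are tagged `type: "sdk"`, as in the Claude Agent SDK's MCP config union.

```typescript
// Hypothetical stand-ins for the SDK's MCP server config union.
type SdkMcpServerConfig =
  | { type?: "stdio"; command: string; args?: string[]; env?: Record<string, string> }
  | { type: "http" | "sse"; url: string; headers?: Record<string, string> }
  | { type: "sdk"; name: string; instance: unknown };

// Everything except in-process SDK servers can cross the subprocess boundary.
type RuntimeMcpServer = Exclude<SdkMcpServerConfig, { type: "sdk" }>;

function toRuntimeMcpServers(servers: Record<string, SdkMcpServerConfig>): Record<string, RuntimeMcpServer> {
  return Object.fromEntries(
    Object.entries(servers).filter(
      // SDK-instance servers live inside this Node process, so a harness CLI
      // running as a subprocess (or in a remote sandbox) cannot reach them.
      (entry): entry is [string, RuntimeMcpServer] => entry[1].type !== "sdk",
    ),
  );
}
```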

Class docstring updated: "No MCP servers" caveat removed; replaced by
a section that describes the new `buildMcpServers` plumbing,
supported transports, and the SDK-instance limitation.

Tests: monorepo typecheck clean; all 605 edge-worker tests pass; all
31 agent-runtime tests pass (no schema regressions).
The pi harness adapter wrapped a binary named `pi` that has no
traceable upstream (no npm package, no GitHub repo we could attribute
it to, no install command in our docs). It's been in the codebase
since the runtime's first commit but was never wired through to any
runner test scope and we have no SDK or schema to type its
stream-json output against.

Removing rather than carrying it as `raw: unknown` forever:
- delete `harnesses/pi.ts`
- drop `"pi"` from `HarnessKind` union (types.ts)
- drop `"pi"` from `HarnessKindSchema` enum (schemas.ts)
- drop `piHarness` import + re-export + `harnessAdapters` record
  entry (harnesses/index.ts)
- update the supported-kinds assertion in harnesses.test.ts

If pi comes back as a first-class target (with attribution and a
binary we can install), it goes back in cleanly — the adapter shape
is simple enough that re-adding takes minutes.

31/31 agent-runtime tests pass; monorepo typecheck clean.
…eric

The runtime now propagates harness-kind through to event typing. Saying
`createAgentSession({ harness: "claude", … })` yields an
`AgentSession<"claude">` whose `events` stream is
`AsyncIterable<TranscriptEvent<SDKMessage>>` — `event.raw` narrows to
the upstream SDK's union with no cast required.

New types in `src/types.ts`:
- `HarnessRawByKind` — lookup type from harness kind to its SDK event
  union. Empirically verified against each CLI's stdout (PR notes).
- `OpenCodeStreamEvent` — local envelope for opencode's JSONL output
  (the SDK's `Event` union describes a different surface; we type only
  the inner `part: Part`).
- `cursor` deliberately stays `unknown` — `@cursor/sdk`'s `SDKMessage`
  describes a different surface than `cursor-agent`'s stream-json. The
  follow-up plan is to vendor a small driver that wraps `@cursor/sdk`
  directly, at which point cursor's row becomes
  `import("@cursor/sdk").SDKMessage`.

Generic propagation:
- `TranscriptEvent<TRaw = unknown>` — `raw` is now `TRaw`, defaults to
  `unknown` for back-compat.
- `AgentSession<H extends HarnessKind = HarnessKind>` — `events`,
  `run()`, etc. carry `H` through.
- `AgentSessionResult<H>` and `RuntimeCallbacks<H>` follow.
- `createAgentSession<H>(config: CreateAgentSessionConfigFor<H>)`
  infers H from `config.harness`. New helper type
  `CreateAgentSessionConfigFor<H>` narrows the `harness` field.
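The generic propagation can be illustrated with placeholder event types. `ClaudeEvent`, `CodexEvent`, and `createReplaySession` are hypothetical stand-ins. The real rows point at the SDK unions, and the real factory spawns a harness. The sketch only shows how `H` flows from `config.harness` into `event.raw`.

```typescript
// Placeholder event unions. In the branch these rows point at the real SDK
// types (SDKMessage, ThreadEvent, ...); the shapes here are illustrative.
type ClaudeEvent =
  | { type: "assistant"; text: string }
  | { type: "result"; ok: boolean };
type CodexEvent = { type: "thread.started"; thread_id: string };

interface HarnessRawByKind {
  claude: ClaudeEvent;
  codex: CodexEvent;
}
type HarnessKind = keyof HarnessRawByKind;

interface TranscriptEvent<TRaw = unknown> {
  raw: TRaw;
}

interface AgentSession<H extends HarnessKind = HarnessKind> {
  readonly harness: H;
  readonly events: AsyncIterable<TranscriptEvent<HarnessRawByKind[H]>>;
}

interface CreateAgentSessionConfigFor<H extends HarnessKind> {
  sessionId: string;
  harness: H;
}

// Hypothetical replay factory: yields canned events instead of spawning a
// harness, but infers H from `config.harness` the same way.
function createReplaySession<H extends HarnessKind>(
  config: CreateAgentSessionConfigFor<H>,
  raws: readonly HarnessRawByKind[H][],
): AgentSession<H> {
  async function* replay() {
    for (const raw of raws) yield { raw };
  }
  return { harness: config.harness, events: replay() };
}

// Accepts only Claude sessions, so `event.raw` is ClaudeEvent here and the
// discriminant check narrows it without a cast.
async function collectAssistantText(session: AgentSession<"claude">): Promise<string[]> {
  const texts: string[] = [];
  for await (const event of session.events) {
    if (event.raw.type === "assistant") texts.push(event.raw.text);
  }
  return texts;
}
```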

Internal `RuntimeAgentSession` stays non-generic (operates on the
loose union); the public factory casts at the boundary.

Existing consumers reading `event.raw as unknown` continue to compile
unchanged — `AgentSession` defaults to `AgentSession<HarnessKind>`
which keeps the current weak typing.

Also fixes a long-standing opencode adapter bug: `--output-format json`
→ `--format json`, the actual CLI flag per `opencode run --help`. The
old flag would have failed at runtime on first invocation.

Adds 4 type-only devDependencies under @anthropic-ai, @openai,
@google, @opencode-ai; never bundled, never imported at runtime.

Tests: monorepo typecheck clean; 34/34 agent-runtime tests pass
(adds 3 compile-time type-narrowing assertions); 605/605 edge-worker
tests pass.
Two related changes — first a new API addition, then the first
consumer that benefits from the per-harness typing we landed last
commit.

agent-runtime:
- New AgentSession.transcript() — returns a snapshot of every event
  observed on the session so far, in insertion order. Useful for
  cross-turn replay, post-hoc inspection, building a UI timeline
  without consuming the live `events` async iterable, or resuming
  consumption from a known index across reconnects. Returns a fresh
  copy so callers can't mutate the internal buffer.
- Implementation in RuntimeAgentSession is a one-liner over the
  existing `observedEvents` array; that array was already being
  populated for the rolling-result use case.
- Two compile-time tests assert the typing: `transcript()` on
  AgentSession<"claude"> returns readonly TranscriptEvent<SDKMessage>[]
  and AgentSession<"codex"> returns readonly TranscriptEvent<ThreadEvent>[].
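A minimal sketch of the snapshot semantics follows, assuming a buffer that records each observed event. `TranscriptBuffer`, `record`, and the `sequence` field are hypothetical names. Only `observedEvents` and `transcript()` come from the commit. The copy is shallow: callers cannot push to or splice the internal array, but event objects are shared.

```typescript
interface TranscriptEvent<TRaw = unknown> {
  sequence: number; // hypothetical: position in the session's transcript
  raw: TRaw;
}

// Hypothetical stand-in for the session's internal buffer: every observed
// event is appended to `observedEvents`, and transcript() hands out a copy.
class TranscriptBuffer<TRaw> {
  private readonly observedEvents: TranscriptEvent<TRaw>[] = [];

  record(raw: TRaw): void {
    this.observedEvents.push({ sequence: this.observedEvents.length, raw });
  }

  // Shallow copy: the returned array is independent of the internal one,
  // though the event objects themselves are shared.
  transcript(): readonly TranscriptEvent<TRaw>[] {
    return [...this.observedEvents];
  }
}

const buffer = new TranscriptBuffer<string>();
buffer.record("init");
buffer.record("assistant");

// Resuming from a known index after a reconnect is a slice of the snapshot.
const lastSeen = 1;
const missed = buffer.transcript().slice(lastSeen);
```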

edge-worker:
- AgentChatSessionHandler now narrows `state.session` to
  AgentSession<"claude">. The handler creates Claude sessions only
  (per the "Claude harness only" caveat in its docstring), so this
  narrowing surfaces SDKMessage typing throughout the run/result/
  transcript chain.
- extractAssistantFallback's manual cast soup is gone — the function
  now walks `events: readonly TranscriptEvent<SDKMessage>[]`, narrows
  via `e.raw.type === "assistant"` (TS discriminates on the SDK union),
  and iterates `e.raw.message.content` with full BetaContentBlock
  typing. Removes 8 lines of inline type guards.
- buildSessionConfig returns CreateAgentSessionConfigFor<"claude">
  (the H-narrowed variant), and the local-provider branch's
  harness config uses `kind: "claude" as const` so the literal
  flows through. createAgentSession is called with explicit
  <"claude"> type arg.

Also: pinned agent-runtime's @anthropic-ai/claude-agent-sdk to
0.2.123 (exact) to match the rest of the workspace — pnpm add @latest
had grabbed ^0.3.145, which made two copies of the SDK resolve and
broke nominal type unification between SDKMessage references.

Tests: 36/36 agent-runtime (adds 2 transcript-typing assertions);
605/605 edge-worker; monorepo typecheck clean.
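The narrowing that replaces the cast soup in `extractAssistantFallback` can be sketched with a simplified local mirror of the SDK shapes. `SDKMessageLike` and `ContentBlock` are assumptions, not the SDK's exports. The "most recent assistant message with text" rule is also an assumption about what the fallback returns.

```typescript
// Simplified local mirror of the SDK shapes the fallback reads. The real
// SDKMessage union has many more variants; these are assumptions.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type SDKMessageLike =
  | { type: "assistant"; message: { content: ContentBlock[] } }
  | { type: "result"; subtype: "success"; result: string };

interface TranscriptEvent<TRaw = unknown> {
  raw: TRaw;
}

type TextBlock = Extract<ContentBlock, { type: "text" }>;

// Returns the text of the most recent assistant message that has any text.
// This "latest non-empty message" rule is an assumption for illustration.
function extractAssistantFallback(
  events: readonly TranscriptEvent<SDKMessageLike>[],
): string | undefined {
  for (const { raw } of [...events].reverse()) {
    // Discriminating on `raw.type` narrows to the assistant variant, so
    // `raw.message.content` is typed without manual guards.
    if (raw.type !== "assistant") continue;
    const text = raw.message.content
      .filter((block): block is TextBlock => block.type === "text")
      .map((block) => block.text)
      .join("");
    if (text) return text;
  }
  return undefined;
}
```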
…typed

Closes the typing pass for the 5th and last harness. cursor's row in
HarnessRawByKind flips from `unknown` to `@cursor/sdk`'s `SDKMessage`,
matching the other four harnesses.

Why a wrapper. We previously verified that `cursor-agent
--output-format stream-json` emits a different schema than what
`@cursor/sdk` declares: cursor-agent uses `session_id`, `subtype`,
nested `tool_call.shellToolCall.{args,result}`, and a `result` event
that the SDK union doesn't include; the SDK uses `agent_id`, `run_id`,
`status`, top-level `args`/`result`, and no `result` variant. Typing
`raw` as `SDKMessage` while spawning `cursor-agent` would be a lie.

The fix: vendor a tiny driver that uses `@cursor/sdk`'s `Agent.create`
+ `run.stream()` ourselves. Spawn it as `node <driver>` from the
cursor adapter. The bytes on the wire ARE `SDKMessage` by
construction — there's no schema drift to worry about because we own
the producer.

What's new:
- src/harnesses/cursor-driver.ts — Node ESM script that parses argv
  (--prompt, --model, --cwd, --system-prompt, --agent-id,
  --agent-id-file), creates an Agent via `@cursor/sdk`, streams
  SDKMessage events to stdout as JSONL, and exits 0/1/2.
- src/harnesses/cursor.ts — rewritten to spawn `node <driver-path>`
  instead of `cursor-agent`. Driver path resolved via
  `import.meta.url` against `./cursor-driver.js`, sibling in both src
  and dist. The adapter's `extractResult` now walks
  `event.raw as SDKMessage` and narrows via the discriminator — no
  manual guards.
- HarnessRawByKind["cursor"] = SDKMessage from @cursor/sdk.
- @cursor/sdk added as a regular dependency (not devDep) since the
  driver imports it as a value.
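The driver's argv contract can be sketched with Node's built-in `util.parseArgs`. The flag names come from the commit. Using `parseArgs`, the `parseRunnerArgs` name, and treating `--prompt` as required are assumptions. The `@cursor/sdk` calls and exit-code handling are deliberately left out.

```typescript
import { parseArgs } from "node:util";

interface RunnerArgs {
  prompt: string;
  model?: string;
  cwd?: string;
  systemPrompt?: string;
  agentId?: string;
  agentIdFile?: string;
}

// Hypothetical parser for the driver's flags. With `strict: true`, unknown
// flags throw instead of being silently ignored.
function parseRunnerArgs(argv: string[]): RunnerArgs {
  const { values } = parseArgs({
    args: argv,
    options: {
      prompt: { type: "string" },
      model: { type: "string" },
      cwd: { type: "string" },
      "system-prompt": { type: "string" },
      "agent-id": { type: "string" },
      "agent-id-file": { type: "string" },
    },
    strict: true,
  });
  // Assumption: a run without a prompt is a usage error.
  if (!values.prompt) throw new Error("--prompt is required");
  return {
    prompt: values.prompt,
    model: values.model,
    cwd: values.cwd,
    systemPrompt: values["system-prompt"],
    agentId: values["agent-id"],
    agentIdFile: values["agent-id-file"],
  };
}
```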

Internal cleanup forced by the typing:
- RuntimeAgentSession no longer formally `implements AgentSession`.
  The public interface is generic over H; the internal class works
  with the loose `TranscriptEvent<unknown>` form and the factory in
  runtime.ts casts at the boundary (which it was already doing).
- run()'s `turnEvents` is cast to `AgentSessionResult["events"]` at
  the return site — single boundary, type-safe.
- emitEvent() casts the callback to the loose `TranscriptEvent` form
  at the call site (the public boundary is the factory cast in
  runtime.ts).

End-to-end smoke test against the real Cursor API confirmed the
driver's stdout matches the SDK union exactly: `status`/`tool_call`/
`assistant` variants with `agent_id`+`run_id`, no schema drift. Exit
0, 22 valid JSON lines, zero stderr noise.

Known limitation. The driver's path is on the host, so this works
unmodified for the local provider; Daytona needs the driver
materialized into the sandbox + `@cursor/sdk` installed there. Left
as a TODO in the cursor adapter — chat is Claude-only today so no
existing functionality regresses.

Tests: 36/36 agent-runtime (typed-events asserts cursor now resolves
to CursorSDKMessage; harnesses test updated for the new command
shape); 605/605 edge-worker; monorepo typecheck clean.
WorkerService cherry-picked fields from the loaded EdgeConfig into the
EdgeWorkerConfig but never forwarded defaultProvider, so the value
configured in ~/.cyrus/config.json was silently dropped and chat sessions
always defaulted to the local sandbox provider — even when the operator
had explicitly selected daytona.
…pass perms

Add support for booting Daytona-backed chat sandboxes from a pre-built
snapshot rather than the default base image, configurable via env:

- DAYTONA_SNAPSHOT: pre-built snapshot to seed the sandbox from. When
  set, the npm-install bootstrap is skipped (the snapshot is expected
  to ship Claude Code preinstalled) and the CLI defaults to `claude`
  on PATH.
- DAYTONA_WORKING_DIR: in-sandbox working/home directory (default
  `/home/daytona`). Set this when the snapshot uses a different user
  layout, e.g. `/home/cyrus`.
- DAYTONA_CLAUDE_CLI_PATH: absolute path to the `claude` binary inside
  the sandbox. Defaults to `<workingDir>/.npm-global/bin/claude` when
  no snapshot is set, or `claude` (PATH-resolved) when a snapshot is.
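The defaulting rules above can be sketched as a small pure function over the environment. The env var names and defaults come from the list. `resolveDaytonaSettings` and its return shape are hypothetical. Treating an empty value as unset is an assumption.

```typescript
interface DaytonaSandboxSettings {
  snapshot?: string;
  workingDir: string;
  claudeCliPath: string;
  runNpmBootstrap: boolean; // skipped when booting from a snapshot
}

// Hypothetical helper; empty strings are treated as unset (assumption).
function resolveDaytonaSettings(
  env: Record<string, string | undefined>,
): DaytonaSandboxSettings {
  const snapshot = env.DAYTONA_SNAPSHOT || undefined;
  const workingDir = env.DAYTONA_WORKING_DIR || "/home/daytona";
  // An explicit path wins; otherwise use a PATH lookup for snapshots, or the
  // npm-global bin under the working dir for the base image.
  const claudeCliPath =
    env.DAYTONA_CLAUDE_CLI_PATH ||
    (snapshot ? "claude" : `${workingDir}/.npm-global/bin/claude`);
  return { snapshot, workingDir, claudeCliPath, runNpmBootstrap: !snapshot };
}
```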

Plumbing:

- Add an optional `snapshot` field to RuntimeSandboxConfig, forwarded
  by ComputeSdkSandboxProvider as `snapshotId` (which the ComputeSDK
  Daytona adapter maps to Daytona's snapshot create param).
- Add the same field to the Zod schema so it survives normalization
  (the schema was silently stripping unknown sandbox keys, which is
  what made the wired-up snapshot value never reach the SDK call).
- Translate Cyrus's cross-harness PermissionMode to Claude's CLI flag
  values in the Claude harness adapter (`"bypass"` -> `"bypassPermissions"`,
  `"ask"` -> `"default"`). Without this, passing `"bypass"` failed at
  the CLI boundary since Claude does not accept that string.
- Default Daytona chat sessions to `permissions: { mode: "bypass" }`
  so the agent can run shell commands inside the sandbox — the
  sandbox itself is the isolation boundary, so per-tool prompts (which
  no user can answer) are noise. Local sessions are unchanged.
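The permission-mode translation amounts to a lookup table. Only the two values named in the commit are shown, and the real `PermissionMode` union may have more members. Passing the result through Claude's `--permission-mode` flag is an assumption about how the adapter wires it. `claudePermissionArgs` is a hypothetical name.

```typescript
// Cross-harness modes named in the commit (the real union may be larger).
type PermissionMode = "bypass" | "ask";

// Claude CLI values the cross-harness modes translate to.
type ClaudePermissionMode = "bypassPermissions" | "default";

const CLAUDE_PERMISSION_MODE: Record<PermissionMode, ClaudePermissionMode> = {
  bypass: "bypassPermissions",
  ask: "default",
};

function claudePermissionArgs(mode: PermissionMode | undefined): string[] {
  // No mode configured: leave Claude's own default in place.
  if (mode === undefined) return [];
  return ["--permission-mode", CLAUDE_PERMISSION_MODE[mode]];
}
```

A `Record` keyed by the union makes the compiler flag any new `PermissionMode` member that has no Claude mapping yet.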
…ackage

The driver script that vendors `@cursor/sdk` for the cursor harness
adapter moves out of `cyrus-agent-runtime`'s `src/harnesses/` into its
own publishable package at `packages/cursor-sdk-runner/` with the npm
name `@cyrus/cursor-runner`.

Why a dedicated package:
- Standalone reusable tool. Anyone who wants typed Cursor streaming
  across a process boundary can `npm install -g @cyrus/cursor-runner`
  and spawn it from any language/runtime, not just from cyrus.
- Cleaner dependency surface. agent-runtime no longer carries
  `@cursor/sdk` as a runtime dep — it stays a devDep just for the
  `SDKMessage` type import that backs `HarnessRawByKind["cursor"]`.
  The actual @cursor/sdk install moves into the new package.
- Independent versioning. The driver can iterate against new Cursor
  SDK releases without forcing an agent-runtime release.
- Distinct from the legacy `cyrus-cursor-runner` package (the
  IAgentRunner-style `cursor-agent`-CLI wrapper that the new
  agent-runtime is replacing). Two packages with clearly different
  scopes — no confusion.

Package contents:
- `package.json` — name `@cyrus/cursor-runner`, version synced to
  workspace (0.2.51), `bin: { "cursor-runner": "dist/index.js" }`,
  publishConfig.access public, MIT license inheriting from monorepo.
- `src/index.ts` — same driver logic as before, with a `#!/usr/bin/env
  node` shebang so the bin is executable post-install. Argv contract
  unchanged (--prompt, --model, --cwd, --system-prompt, --agent-id,
  --agent-id-file).
- `README.md` — install instructions, options table, exit codes, and
  the consumer narrowing pattern showing how to import SDKMessage.
- `tsconfig.json` mirrors other workspace runners.

agent-runtime wiring:
- Removed `src/harnesses/cursor-driver.ts` (moved upstream).
- `src/harnesses/cursor.ts` resolves the runner via
  `createRequire(import.meta.url).resolve("@cyrus/cursor-runner")`
  instead of a sibling-file URL. Works for both pnpm workspace
  symlinks (today) and standalone npm installs (when the package is
  published).
- `@cursor/sdk` moved from dependencies to devDependencies (type-only).
- Added `@cyrus/cursor-runner: workspace:*` as a dependency.

End-to-end smoke test against the real Cursor API confirmed via the
new resolved path: the bin spawns, streams 17 SDKMessage JSON lines,
exits 0 with zero stderr noise. Wire format still has agent_id /
run_id / status fields exactly per the `@cursor/sdk` union.

Tests: 36/36 agent-runtime (cursor command shape test updated for the
new path-resolution pattern); 605/605 edge-worker; monorepo
typecheck clean.

Known limitations carried forward unchanged from the previous commit:
- Local provider only — the runner's path resolves to a host node_modules
  location that doesn't exist inside a remote sandbox. Daytona support
  needs the runner installed into the sandbox via setup commands
  (`npm install -g @cyrus/cursor-runner`) once the package is
  published. Slack chat is Claude-only so no current functionality
  regresses.
- Multi-turn resume is wired in the runner (--agent-id-file /
  --agent-id) but not threaded from the cursor adapter yet; same
  TODO as before.
`git add <files>` skipped the deletion in the previous commit because
none of the explicit paths I staged covered it. The driver moved to
`packages/cursor-sdk-runner/src/index.ts` and was already referenced
through the new `@cyrus/cursor-runner` package; the stale source file
just lingered. Removing it now.
…apshot' into claude/agent-runtime-slack-chat-replacement
…ot mode

Mirrors how Claude's adapter handles `DAYTONA_CLAUDE_CLI_PATH`. The
cursor adapter now has two invocation shapes:

- **Default (local provider)**: no `harness.command` set, falls back to
  `createRequire(import.meta.url).resolve("@cyrus/cursor-runner")` resolution and spawns
  `node <host-resolved-path>`. Same behavior as before for anyone not
  setting a custom command.

- **Override (Daytona snapshot mode)**: `harness.command` is the
  cursor-runner binary inside the sandbox. Spawned directly — the
  runner's `#!/usr/bin/env node` shebang makes it executable, no
  intermediary `node` needed. Callers pass `"cursor-runner"` to use
  the sandbox's PATH (which Daytona snapshots populate with the
  preinstalled bin) or an absolute path to pin a specific copy.
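The two invocation shapes can be sketched as one branch on `harness.command`. `cursorInvocation` and the injected `resolveRunner` are hypothetical. The resolver stands in for the `createRequire` lookup so the sketch runs without the package installed. The package was later renamed `@cyrus-ai/cursor-runner`.

```typescript
interface CursorHarnessConfig {
  kind: "cursor";
  command?: string;
}

interface Invocation {
  command: string;
  args: string[];
}

// `resolveRunner` stands in for the host-side lookup, e.g.
//   createRequire(import.meta.url).resolve("@cyrus/cursor-runner")
// It is only called in default mode, so override mode never needs a host install.
function cursorInvocation(
  harness: CursorHarnessConfig,
  runnerArgs: string[],
  resolveRunner: () => string,
): Invocation {
  if (harness.command) {
    // Override mode: the runner's shebang makes the bin directly executable,
    // whether it is a PATH name like "cursor-runner" or an absolute path.
    return { command: harness.command, args: runnerArgs };
  }
  // Default mode: run the host-resolved entry point through node.
  return { command: "node", args: [resolveRunner(), ...runnerArgs] };
}
```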

This composes cleanly with the snapshot work: a Daytona snapshot
ships `@cyrus/cursor-runner` preinstalled alongside the harness
binaries, callers set `harness: { kind: "cursor", command: "cursor-runner" }`,
and the adapter doesn't care that it's running in a remote sandbox vs
on the host — same pattern as Claude getting `command: "claude"` for
PATH lookup inside a snapshot.

The chat handler doesn't use cursor today (Claude-only), so no
existing consumer needs updating. When a cursor-on-Daytona consumer
shows up, they wire `harness.command` from the same kind of env
(e.g. `DAYTONA_CURSOR_RUNNER_PATH`) as the chat handler already does
for Claude.

Tests: added a paired test asserting the override shape; previous test
renamed for clarity ("via the host-resolved … when harness.command is
unset" vs "uses harness.command directly … Daytona-snapshot mode").
40/40 agent-runtime tests pass; monorepo typecheck clean.
…rsor-runner

The `@cyrus` npm scope is unclaimed; `@cyrus-ai` is the registered org.
Rename across:
- packages/cursor-sdk-runner/package.json (name)
- packages/cursor-sdk-runner/src/index.ts (docstring + usage example)
- packages/cursor-sdk-runner/README.md (title, install command)
- packages/agent-runtime/package.json (dependency entry)
- packages/agent-runtime/src/harnesses/cursor.ts (createRequire target + docs)
- packages/agent-runtime/test/harnesses.test.ts (test description)

No behavioral change. 40/40 agent-runtime tests pass; 613/613
edge-worker; monorepo typecheck clean.
The typed events story (HarnessRawByKind narrowing SDKMessage / ThreadEvent /
JsonStreamEvent / etc.) only holds if the SDK version we type against
actually describes the bytes the CLI emits. Today the pins were a mix
of exact (claude, cursor) and caret (gemini, codex, opencode) — carets
let a future minor SDK release introduce shapes the runtime CLI
doesn't emit (or vice versa) and quietly break the narrowing.

Pinned everything to exact versions matching the CLI versions we've
empirically tested against:

  | SDK pin                        | Was      | Now     | Matches CLI                   |
  | ------------------------------ | -------- | ------- | ----------------------------- |
  | @anthropic-ai/claude-agent-sdk | 0.2.123  | 0.2.123 | claude 2.1.145                |
  | @cursor/sdk                    | 1.0.13   | 1.0.13  | @cyrus-ai/cursor-runner       |
  | @google/gemini-cli-core        | ^0.42.0  | 0.17.0  | gemini 0.17.0 (per CLAUDE.md) |
  | @openai/codex-sdk              | ^0.131.0 | 0.130.0 | codex 0.130.0                 |
  | @opencode-ai/sdk               | ^1.15.5  | 1.15.5  | opencode 1.15.5               |

Also pinned every `@anthropic-ai/claude-code@latest` install command
to `2.1.145` — the chat handler's Daytona setup commands, plus three
test scripts. `@latest` was a silent drift surface: it would install
a CLI whose stream-json shape might not match
`@anthropic-ai/claude-agent-sdk@0.2.123` (the SDK we type against), and the breakage would
show up as runtime type confusion in production rather than at build.
Now a CLI version bump requires a coordinated SDK bump in the same
PR — visible in the diff.

Added a `PINNED_CLAUDE_CLI_VERSION` const + explanatory comment in
AgentChatSessionHandler so future maintainers see the constraint.
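A sketch of keeping the pin in one place follows. `PINNED_CLAUDE_CLI_VERSION` and its value come from the commit. `claudeInstallCommand` and its `--prefix` layout are hypothetical. The prefix is chosen so the binary lands at `<workingDir>/.npm-global/bin/claude`. `isExactPin` is a hypothetical check for spotting carets, tildes, or tags such as `latest`.

```typescript
// Pairing constraint: this CLI version's stream-json output must match the
// @anthropic-ai/claude-agent-sdk version the runtime types against.
const PINNED_CLAUDE_CLI_VERSION = "2.1.145";

// Hypothetical helper rendering a Daytona setup command; npm places global
// bins under `<prefix>/bin` on Linux.
function claudeInstallCommand(prefix: string): string {
  return `npm install -g --prefix ${prefix} @anthropic-ai/claude-code@${PINNED_CLAUDE_CLI_VERSION}`;
}

// Hypothetical check: accepts exact versions like "0.2.141" (optionally with
// a prerelease suffix) and rejects ranges like "^0.42.0" or tags like "latest".
function isExactPin(spec: string): boolean {
  return /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/.test(spec);
}
```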

Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck
clean. Downgrading gemini-cli-core (^0.42.0 → 0.17.0) and codex-sdk
(^0.131.0 → 0.130.0) didn't break anything — both ship the same
JsonStreamEvent / ThreadEvent unions we rely on at the older versions.
…the agent-runtime context

The previous pin (0.17.0) was inherited from the LEGACY
`cyrus-gemini-runner` package, which deliberately holds 0.17.0 for
its own reasons (the package documents this in
`packages/gemini-runner/CLAUDE.md` and pins identically in its own
package.json). cyrus-agent-runtime is a different context — it's the
new runtime that consumers will install going forward, so it should
track the current gemini-cli-core line.

Verified: `@google/gemini-cli-core@0.42.0` ships exactly the same
`JsonStreamEvent` union we narrow against — 6 variants (InitEvent,
MessageEvent, ToolUseEvent, ToolResultEvent, ErrorEvent, ResultEvent),
identical to 0.17.0's union. So no narrowing breakage.

`packages/gemini-runner/` is intentionally left at 0.17.0 — that's
the legacy stack's own pinning decision and shouldn't move just
because we touched the new runtime.

Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck clean.
…ex-sdk to 0.131.0

Following the "agent-runtime tracks current" reasoning we used for the
gemini bump, move the remaining SDK pins forward:

  - @anthropic-ai/claude-agent-sdk: 0.2.123 -> 0.2.141 (in 5 packages
    that all need to move together for type identity to unify at the
    package boundaries — agent-runtime devDep, plus runtime deps in
    core / claude-runner / edge-worker / simple-agent-runner)
  - @openai/codex-sdk: 0.130.0 -> 0.131.0 (latest; agent-runtime only)

Both are additive minor bumps — the SDKMessage / ThreadEvent unions
we narrow `HarnessRawByKind[H].raw` against are supersets of the
older shapes (claude added `SDKPermissionDeniedMessage` to the union
and `oauth_org_not_allowed`/`model_not_found` to the error enum;
codex 0.131 ships the same ThreadEvent union as 0.130). Nothing we
read from those types was removed or renamed, so existing consumers
(notably AgentChatSessionHandler.extractAssistantFallback) typecheck
clean.

Also updated the inline comment in AgentChatSessionHandler that
referenced `@anthropic-ai/claude-agent-sdk@0.2.123` as the SDK the
Daytona CLI install pin is paired against — bumped to 0.2.141.
Matching test assertion in AgentChatSessionHandler.provider.test.ts
updated to keep the version-pair note in sync.

Untouched:
  - PINNED_CLAUDE_CLI_VERSION ("2.1.145") — that's the latest CLI
    and pairs with the 0.2.141 SDK
  - packages/codex-runner (^0.125.0) and packages/gemini-runner
    (0.17.0) — legacy packages own their own pinning decisions

Tests: 40/40 agent-runtime, 613/613 edge-worker, monorepo typecheck
clean.
…drop SlackDaytonaRunner

Pulls the agent-runtime additions for caller-driven harness session
resume:

  - RuntimeVolumeConfig.subpath — per-binding isolation within a
    shared provider volume (Daytona Volumes pattern)
  - CreateAgentSessionConfig.resumeHarnessSessionId — caller-supplied
    harness session id to resume; adapter translates to its native
    CLI flag
  - HarnessAdapter.extractSessionId — pulls the harness-native session
    id out of the observed transcript so callers can persist it
  - AgentSessionResult.harnessSessionId — round-trips the id to the
    caller after each run
  - Claude adapter: extractSessionId reads `system.init.session_id`;
    buildCommand appends `--resume <id>` when resumeHarnessSessionId
    is set
  - resume-smoke.mjs — two-turn Daytona Volume smoke test script
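The Claude side of the resume contract can be sketched as two small functions. `extractClaudeSessionId` and `claudeResumeArgs` are hypothetical names for the adapter hooks described above. Raw events are typed `unknown` here because the buffer can also hold non-Claude lifecycle events.

```typescript
interface TranscriptEvent {
  raw: unknown;
}

// Claude emits `{ type: "system", subtype: "init", session_id }` at the
// start of a run, so the first matching event wins.
function extractClaudeSessionId(events: readonly TranscriptEvent[]): string | undefined {
  for (const { raw } of events) {
    if (typeof raw !== "object" || raw === null) continue;
    const msg = raw as { type?: unknown; subtype?: unknown; session_id?: unknown };
    if (msg.type === "system" && msg.subtype === "init" && typeof msg.session_id === "string") {
      return msg.session_id;
    }
  }
  return undefined;
}

// Extra CLI args appended when the caller supplies resumeHarnessSessionId.
function claudeResumeArgs(resumeHarnessSessionId?: string): string[] {
  return resumeHarnessSessionId ? ["--resume", resumeHarnessSessionId] : [];
}
```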

Explicitly DROPPED from the merge:
  - packages/edge-worker/src/SlackDaytonaRunner.ts
  - packages/edge-worker/test/SlackDaytonaRunner.test.ts
  - the EdgeWorker.ts wiring that constructs SlackDaytonaRunner under
    the legacy ChatSessionHandler — we use the new
    AgentChatSessionHandler instead (already wired from earlier work
    in this branch)
  - @computesdk/daytona + @daytonaio/sdk deps in edge-worker package.json
    (only SlackDaytonaRunner needed them; agent-runtime already
    depends on @computesdk/daytona directly)

Conflict resolutions:
  - claude.ts: kept our plugin wiring (--plugin-dir / --mcp-config)
    AND added the volumes-branch's --resume handling. Both are
    additive in the args list.
  - session.ts: kept turnEvents semantics for AgentSessionResult.events
    (per-turn slice, what consumers already expect) but added the
    volumes-branch's harnessSessionId extraction over the FULL
    observedEvents (since system.init.session_id arrives once on
    turn 1 and is referenced by every subsequent turn). Adopted
    destroySandboxOnce() for the destroy callback (cleaner than
    destroy() which also tries to cancel an already-completed run).
  - harnesses.test.ts: kept all tests from both sides — bypass
    permission mapping + --resume + extractSessionId.
  - runtime.test.ts: fixed the volumes-branch's resume test to use
    our actual session API (`session.run("prompt")` rather than
    `session.start()`; `config.userPrompt` was a stray field).

Tests: 44/44 agent-runtime, 613/613 edge-worker, monorepo typecheck
clean.
…-to-end

Wires cursor into the same caller-driven resume contract Claude got
from the volumes merge:

- `extractSessionId(events)` — walks the SDKMessage stream for the
  first `agent_id` (every variant carries it; the value is stable
  across the whole run). Returned as
  `AgentSessionResult.harnessSessionId` for the caller to persist.

- `buildCommand` — when `config.resumeHarnessSessionId` is set,
  appends `--agent-id <id>` to the cursor-runner invocation. The
  runner reads that flag and calls `Agent.resume(<id>)` instead of
  `Agent.create()`, picking up the prior conversation. The runner's
  `--agent-id-file` flag is unchanged — kept for callers that prefer
  the runner-writes-it-to-disk pattern; this adapter just doesn't
  use it because the runtime now surfaces the id via
  extractSessionId.

Includes a guard against non-object `event.raw` in extractSessionId
— runtime lifecycle events can emit string raw values, and the `in`
operator throws on those. The chat-typed shape says `raw: SDKMessage`
but the buffer holds both harness-streamed and runtime-lifecycle
events; structural guard is the right move at the adapter boundary.
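The structural guard matters because `"agent_id" in raw` throws a `TypeError` when `raw` is a string. A minimal sketch follows. `extractCursorAgentId` and `cursorResumeArgs` are hypothetical names, and the event shapes in the usage are illustrative.

```typescript
interface TranscriptEvent {
  raw: unknown;
}

function extractCursorAgentId(events: readonly TranscriptEvent[]): string | undefined {
  for (const { raw } of events) {
    // Lifecycle events may carry a primitive `raw`; check the shape before
    // using `in`, which would throw on a string.
    if (typeof raw !== "object" || raw === null) continue;
    if (!("agent_id" in raw)) continue;
    const agentId = (raw as { agent_id: unknown }).agent_id;
    // Every SDKMessage variant carries the same agent_id, so the first wins.
    if (typeof agentId === "string") return agentId;
  }
  return undefined;
}

// Appended to the cursor-runner invocation when resuming.
function cursorResumeArgs(resumeHarnessSessionId?: string): string[] {
  return resumeHarnessSessionId ? ["--agent-id", resumeHarnessSessionId] : [];
}
```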

Tests: 4 new tests in harnesses.test.ts covering `--agent-id` passed
when set, omitted when not, agent_id extracted from a realistic
status+assistant transcript, and undefined when nothing in the
transcript carries one. 48/48 agent-runtime pass; 613/613 edge-worker;
monorepo typecheck clean.

Cursor is now feature-equivalent to Claude on the multi-turn resume
contract: caller persists `result.harnessSessionId`, passes it back
as `resumeHarnessSessionId` on the next session config, gets a
continuation run. Works for local and Daytona-snapshot modes alike
since the resume state lives on Cursor's servers (addressable by
agentId) rather than in any filesystem the runtime needs to
preserve.
…ldStateEnv

Adds a consumer-facing `sandbox.persistentState: { volume, bindingId }`
config that hides the per-harness env-var math. The runtime mounts the
caller's volume at a fixed internal path with bindingId as the subpath,
then asks the harness adapter (via the new `buildStateEnv(mountPath)`
hook) which env vars to set so the harness writes its state-dir there
instead of under `$HOME`.

Each adapter's env mapping is grounded in upstream source, not guessed:
- claude  → CLAUDE_CONFIG_DIR = `${m}/.claude`
- cursor  → CURSOR_DATA_DIR  = `${m}/.cursor`
- codex   → CODEX_HOME       = `${m}/.codex`
  (`codex-rs/utils/home-dir/src/lib.rs::find_codex_home`)
- gemini  → GEMINI_CLI_HOME  = `${m}`  (CLI appends `.gemini` itself;
  `@google/gemini-cli-core::homedir()`)
- opencode → all four XDG dirs under `${m}/.opencode-xdg/{config,data,state,cache}`
  (no app-specific override; opencode derives via `xdg-basedir`)
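The per-harness mapping above can be sketched as an exhaustive switch. The env var names and directory suffixes come from the list. `STATE_MOUNT_PATH` is hypothetical because the commit does not name the fixed internal path. `persistentStateMount` and `VolumeMount` are also hypothetical and simpler than the real `RuntimeVolumeConfig`.

```typescript
type HarnessKind = "claude" | "cursor" | "codex" | "gemini" | "opencode";

// Hypothetical fixed mount point; the real path is not named in the commit.
const STATE_MOUNT_PATH = "/cyrus-state";

interface PersistentStateConfig {
  volume: string;
  bindingId: string;
}

interface VolumeMount {
  volume: string;
  mountPath: string;
  subpath: string; // isolates this binding within the shared volume
}

function persistentStateMount(state: PersistentStateConfig): VolumeMount {
  return { volume: state.volume, mountPath: STATE_MOUNT_PATH, subpath: state.bindingId };
}

// Env vars that redirect each harness's state dir under the mount path `m`.
function buildStateEnv(kind: HarnessKind, m: string): Record<string, string> {
  switch (kind) {
    case "claude":
      return { CLAUDE_CONFIG_DIR: `${m}/.claude` };
    case "cursor":
      return { CURSOR_DATA_DIR: `${m}/.cursor` };
    case "codex":
      return { CODEX_HOME: `${m}/.codex` };
    case "gemini":
      return { GEMINI_CLI_HOME: m }; // the CLI appends `.gemini` itself
    case "opencode": {
      const base = `${m}/.opencode-xdg`;
      return {
        XDG_CONFIG_HOME: `${base}/config`,
        XDG_DATA_HOME: `${base}/data`,
        XDG_STATE_HOME: `${base}/state`,
        XDG_CACHE_HOME: `${base}/cache`,
      };
    }
  }
}
```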

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19dce79533

Comment on lines +158 to +161
```ts
const oauth = process.env.CLAUDE_CODE_OAUTH_TOKEN?.trim();
if (oauth) return { kind: "oauth", token: oauth };
const apiKey = process.env.ANTHROPIC_API_KEY?.trim();
if (apiKey) return { kind: "apiKey", token: apiKey };
```

P1: Support ANTHROPIC_AUTH_TOKEN in chat credential detection

readClaudeCredential() only accepts CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY, so Slack chat sessions now reject environments that authenticate Claude with ANTHROPIC_AUTH_TOKEN (they hit the "not configured" reply path). This is a regression from the prior runner path, which still treats ANTHROPIC_AUTH_TOKEN as a valid auth env (see packages/claude-runner/src/session-env.ts), so existing deployments using that variable will fail for every chat mention until they reconfigure.

Contributor Author

Fixed in 1e1531d — added a third authToken variant to the ClaudeCredential union, scan ANTHROPIC_AUTH_TOKEN last (matching session-env.ts AUTH_ENV_KEYS precedence), and forward exactly that env var to the harness for kind=authToken. Updated the "not configured" error to list all three options. New tests cover detection precedence, whitespace-only handling, and per-kind forwarding.

Comment thread: packages/agent-runtime/src/session.ts (outdated)
Comment on lines +730 to +731
```ts
if (out.mcpConfigPath) {
  this.pluginOutputs.claudeMcpConfigPath = out.mcpConfigPath;
```

P2: Merge Claude MCP configs across all plugins

In materializePlugins(), each Claude plugin's mcpConfigPath overwrites the previous one, but claudeHarness can pass only the final scalar via --mcp-config. When multiple plugins define MCP servers, only the last plugin's MCP config is actually wired into Claude (especially with --strict-mcp-config), so tools from earlier plugins silently disappear. This should aggregate all plugin MCP servers into one combined config (or equivalent) before command construction.

Contributor Author

Fixed in 1e1531d: materializePlugins now accumulates every plugin's mcpServers map and, after the loop, writes one combined <pluginsRoot>/.mcp.combined.json that --mcp-config points at. Per-plugin .mcp.json files are still written (part of the documented Claude plugin layout for --plugin-dir consumers); the combined file is just the handoff target for the single-scalar --mcp-config flag. Caller plugin order determines precedence on duplicate server names: later wins. New tests prove three plugins all reach the harness via one merged config and that the shadow rule holds.

Conflict resolutions:
- CHANGELOG.internal.md: keep both — our Unreleased agent-runtime entries
  (refreshed to match what's actually on this branch, drop the
  SlackDaytonaRunner entry since that work was reverted) and main's
  [0.2.52] release marker.
- package.json: keep all new pnpm.overrides from main; collapse the
  five overrides that appeared in both halves of the conflict block.
- packages/edge-worker/src/EdgeWorker.ts: accept main's new
  buildSkillSessionContext method (three call sites depend on it).
- pnpm-lock.yaml: regenerated via pnpm install --no-frozen-lockfile.

Post-merge fixes (auto-merge produced bad output that compiled to
duplicate object literal keys):
- packages/codex-runner/src/CodexRunner.ts: remove duplicate
  stop_details: null from two assistant message factories.
- packages/cursor-runner/src/CursorRunner.ts: same.
- packages/gemini-runner/src/adapters.ts: same.

Post-merge cleanup:
- packages/edge-worker/src/EdgeWorker.ts: drop unused
  getDefaultModelForRunner / getDefaultFallbackModelForRunner private
  wrappers that main added — all call sites in this branch already use
  runnerSelectionService.* directly.

Dependency security policy compliance:
- Add root pnpm.overrides for brace-expansion (>=5.0.6),
  ws (>=8.20.1), protobufjs (>=7.5.8) to keep pnpm audit clean after
  the merge surfaced advisories under our pinned @google/gemini-cli-core
  transitive dep.

Verified: pnpm build / pnpm typecheck / pnpm test:packages:run all clean;
pnpm audit reports no known vulnerabilities.
Four standalone smoke scripts (created before they got linted via the
pre-commit hook) had format / organize-imports diffs that pnpm lint
caught on CI. Pure formatting — no behavior change.
P1 (edge-worker, AgentChatSessionHandler): accept ANTHROPIC_AUTH_TOKEN
in Slack chat credential detection. The handler previously read only
CLAUDE_CODE_OAUTH_TOKEN and ANTHROPIC_API_KEY, so deployments that
auth Claude via an Anthropic-compatible proxy/gateway (using
ANTHROPIC_AUTH_TOKEN) hit the "not configured" reply path even though
the legacy claude-runner accepts that env var (see
claude-runner/src/session-env.ts AUTH_ENV_KEYS). Extend the
ClaudeCredential discriminated union with a third variant, scan that
env var third in the documented precedence order, and forward only the
matching env var to the harness so the three auth modes don't get
conflated. Update the "not configured" error message to mention all
three options.
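The three-variant detection and single-variable forwarding can be sketched as follows. The variant kinds, env var names, and precedence order come from the text. This version takes an env record instead of reading `process.env` directly so it can be tested. `credentialEnv` and `ENV_BY_KIND` are hypothetical names.

```typescript
type ClaudeCredential =
  | { kind: "oauth"; token: string }
  | { kind: "apiKey"; token: string }
  | { kind: "authToken"; token: string };

const ENV_BY_KIND: Record<ClaudeCredential["kind"], string> = {
  oauth: "CLAUDE_CODE_OAUTH_TOKEN",
  apiKey: "ANTHROPIC_API_KEY",
  authToken: "ANTHROPIC_AUTH_TOKEN",
};

// Precedence: OAuth token, then API key, then auth token. Whitespace-only
// values count as unset.
function readClaudeCredential(
  env: Record<string, string | undefined>,
): ClaudeCredential | undefined {
  for (const kind of ["oauth", "apiKey", "authToken"] as const) {
    const token = env[ENV_BY_KIND[kind]]?.trim();
    if (token) return { kind, token };
  }
  return undefined;
}

// Forwards exactly the env var matching the detected kind, never more than one.
function credentialEnv(credential: ClaudeCredential): Record<string, string> {
  return { [ENV_BY_KIND[credential.kind]]: credential.token };
}
```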

P2 (agent-runtime, RuntimeAgentSession.materializePlugins): merge
Claude MCP server configs across plugins. Claude's `--mcp-config` flag
takes a single scalar path, so the previous "last writer wins"
overwrite of claudeMcpConfigPath silently dropped every plugin's MCP
servers except the last (and `--strict-mcp-config` made that fatal for
tool calls into the dropped servers). Accumulate every plugin's
mcpServers map and, after the loop, write one combined
.mcp.combined.json at the plugins root and point `--mcp-config` at
that. Per-plugin .mcp.json files are still written by the materializer
because they're part of the documented Claude plugin layout that
`--plugin-dir` consumers expect; the combined file is purely the
handoff target for `--mcp-config`. Caller-supplied plugin order
determines precedence on duplicate server names (later wins), which
the new test locks in.
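A sketch of the P2 merge step under stated assumptions: `PluginSketch`, `McpServerConfig`, `mergeMcpServers` and `claudeMcpArgs` are hypothetical names, not the runtime's API. It shows the two properties the commit message describes: every plugin's servers end up in one `{ mcpServers }` document at `.mcp.combined.json`, and on a duplicate server name the later plugin in caller order wins. Passing `--strict-mcp-config` alongside `--mcp-config` is assumed from the commit text.

```typescript
import * as path from "node:path";

// Hypothetical shapes; the real RuntimePlugin type is richer.
interface McpServerConfig {
  command?: string;
  args?: string[];
  url?: string;
  env?: Record<string, string>;
}
interface PluginSketch {
  name: string;
  mcpServers?: Record<string, McpServerConfig>;
}

// Accumulate all plugins' servers; later plugins overwrite earlier ones by name.
function mergeMcpServers(plugins: PluginSketch[]): Record<string, McpServerConfig> {
  const merged: Record<string, McpServerConfig> = {};
  for (const plugin of plugins) {
    for (const [serverName, config] of Object.entries(plugin.mcpServers ?? {})) {
      merged[serverName] = config;
    }
  }
  return merged;
}

// Build the single file Claude's scalar --mcp-config flag points at.
function claudeMcpArgs(pluginsRoot: string, plugins: PluginSketch[]) {
  const combinedPath = path.posix.join(pluginsRoot, ".mcp.combined.json");
  const contents = JSON.stringify({ mcpServers: mergeMcpServers(plugins) }, null, 2);
  return {
    combinedPath,
    contents, // caller writes this to combinedPath inside the sandbox
    args: ["--mcp-config", combinedPath, "--strict-mcp-config"],
  };
}
```

Writing one merged file sidesteps the "last writer wins" bug entirely, because the flag is only ever set once, after the loop.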

Tests:
- edge-worker: 7 new tests covering credential detection precedence,
  trim-whitespace, and per-kind env-var forwarding (oauth / apiKey /
  authToken) into the Daytona session config.
- agent-runtime: updated the existing single-plugin assertion to
  expect `--mcp-config .mcp.combined.json`; added two new tests — one
  proving three plugins all reach the harness via one merged config,
  one proving caller plugin order picks the winning shadow on
  duplicate server names.

Verified: pnpm -F cyrus-agent-runtime test:run (59 tests, was 57);
pnpm -F cyrus-edge-worker test:run (631 tests, was 624);
pnpm typecheck + pnpm lint clean.
@Connoropolous
Copy link
Copy Markdown
Contributor Author

Sample usage of createAgentSession — still WIP (no exported examples / no narrative README), but this is the shape today. Three representative cases:

1. Minimal local invocation — packages/agent-runtime/test/runtime.test.ts:54-69

```ts
import { createAgentSession } from "cyrus-agent-runtime";

const session = await createAgentSession({
  sessionId: "session-1",
  harness: "codex",                       // shorthand for { kind: "codex" }
  env: { NODE_ENV: "test" },
  secrets: { API_KEY: "secret" },         // string shorthand → { value, redact: true }
});

await session.addMessage("queued");
const result = await session.run("Do it");
// result.events / result.result / result.harnessSessionId / result.destroy()
```

The default sandbox, { provider: "local", workingDirectory: cwd }, is auto-filled by normalizeConfig. No type parameter is needed when the harness is unambiguous.
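To make the three shorthands in case 1 concrete, here is a hypothetical `normalizeConfigSketch`; it is not the runtime's `normalizeConfig`, and the type names and the lowercase harness literals beyond `"claude"` and `"codex"` are assumptions. It expands `harness: "codex"` to `{ kind: "codex" }`, a string secret to `{ value, redact: true }`, and a missing sandbox to the local provider rooted at the current working directory.

```typescript
// Hypothetical types; harness literals other than "claude"/"codex" are assumed.
type HarnessKind = "claude" | "codex" | "cursor" | "gemini" | "opencode";
interface SecretSpec { value: string; redact: boolean }
interface SandboxSpec { provider: string; workingDirectory?: string }

interface InputConfig {
  sessionId: string;
  harness: HarnessKind | { kind: HarnessKind; command?: string };
  secrets?: Record<string, string | SecretSpec>;
  sandbox?: SandboxSpec;
}
interface NormalizedConfig {
  sessionId: string;
  harness: { kind: HarnessKind; command?: string };
  secrets: Record<string, SecretSpec>;
  sandbox: SandboxSpec;
}

function normalizeConfigSketch(input: InputConfig, cwd: string = process.cwd()): NormalizedConfig {
  // "codex" → { kind: "codex" }
  const harness = typeof input.harness === "string" ? { kind: input.harness } : input.harness;

  // "secret" → { value: "secret", redact: true }; full specs pass through untouched.
  const secrets: Record<string, SecretSpec> = {};
  for (const [key, spec] of Object.entries(input.secrets ?? {})) {
    secrets[key] = typeof spec === "string" ? { value: spec, redact: true } : spec;
  }

  // No sandbox given → run locally in the caller's working directory.
  const sandbox = input.sandbox ?? { provider: "local", workingDirectory: cwd };

  return { sessionId: input.sessionId, harness, secrets, sandbox };
}
```

Defaulting secrets to `redact: true` means the short form is also the safe form; callers have to opt out of redaction explicitly.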

2. Production Daytona Claude invocation — packages/edge-worker/src/AgentChatSessionHandler.ts:485-672

```ts
const sessionConfig = {
  sessionId,
  harness: { kind: "claude", command: this.daytonaClaudeCliPath },
  systemPrompt,
  // Discriminated on credential.kind so the three Claude auth modes
  // don't get conflated — only one of these env vars ships through.
  secrets: credential.kind === "oauth"
    ? { CLAUDE_CODE_OAUTH_TOKEN: credential.token }
    : credential.kind === "apiKey"
      ? { ANTHROPIC_API_KEY: credential.token }
      : { ANTHROPIC_AUTH_TOKEN: credential.token },
  permissions: { mode: "bypass" },        // sandbox is the isolation boundary
  packages: { commands: [...this.daytonaSetupCommands] },
  // Chat-session MCP servers wrapped into one anonymous plugin; the
  // materializer fans this out to Claude's native .mcp.json + writes a
  // session-level .mcp.combined.json that --mcp-config points at.
  plugins: [{ name: "chat", mcpServers: toRuntimeMcpServers(mcpServers) }],
  sandbox: {
    provider: "daytona",
    name: `cyrus-slack-${sessionId}`,
    workingDirectory: this.daytonaWorkingDir,
    timeoutMs: 300_000,
    destroyWhileInactive: true,           // stop()/start() between turns
    snapshot: this.daytonaSnapshot,       // pre-installed harness binaries
    metadata: { purpose: "cyrus-slack-chat", threadKey },
  },
};

// Explicit <"claude"> threads SDKMessage typing through to
// session.events / result.events with no cast at consumer sites.
const session = await createAgentSession<"claude">(sessionConfig, {
  callbacks: {
    onTranscriptEvent: (te) => {
      // te.raw is typed `SDKMessage` here.
      logger.debug(`[${sessionId}] transcript event: ${te.kind}`);
    },
  },
});

const result = await session.run(userPrompt);  // first turn
// ...later, same `session` object:
const result2 = await session.run(followUpPrompt);  // resumes via Claude --continue
```
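The explicit `<"claude">` type argument works by indexing a harness-to-event type map. A self-contained sketch of that pattern follows; `RawEventByHarness`, `createSessionSketch` and the stub event types are hypothetical stand-ins (the real `SDKMessage` is far richer), and only the typing is modeled: the function simply delivers one synthetic event to the callback.

```typescript
// Hypothetical stand-ins for each harness's raw transcript event type.
interface ClaudeSdkMessageStub { type: "assistant" | "user" | "result"; session_id: string }
interface CodexEventStub { msg: { type: string } }

// Harness literal → raw event type. Indexing this map is what "threads" typing through.
interface RawEventByHarness {
  claude: ClaudeSdkMessageStub;
  codex: CodexEventStub;
}
type HarnessName = keyof RawEventByHarness;

interface TranscriptEvent<H extends HarnessName> { kind: string; raw: RawEventByHarness[H] }
interface SessionConfig<H extends HarnessName> { sessionId: string; harness: H | { kind: H } }
interface SessionOptions<H extends HarnessName> {
  callbacks?: { onTranscriptEvent?: (te: TranscriptEvent<H>) => void };
}

// Only the types matter here: _config exists so H can be checked against it.
function createSessionSketch<H extends HarnessName>(
  _config: SessionConfig<H>,
  options: SessionOptions<H>,
  sampleRaw: RawEventByHarness[H],
): TranscriptEvent<H> {
  const te: TranscriptEvent<H> = { kind: "message", raw: sampleRaw };
  options.callbacks?.onTranscriptEvent?.(te);
  return te;
}

// With H pinned to "claude", e.raw.session_id type-checks without a cast.
const seenSessionIds: string[] = [];
const te = createSessionSketch<"claude">(
  { sessionId: "s", harness: { kind: "claude" } },
  { callbacks: { onTranscriptEvent: (e) => seenSessionIds.push(e.raw.session_id) } },
  { type: "assistant", session_id: "abc" },
);
```

Pinning `H` once at the call site means every callback and result field downstream gets the harness-specific event shape, which is why consumer sites need no casts.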

3. Multi-turn resume across brand-new sandboxes — sandbox.persistentState + resumeHarnessSessionId

```ts
// Turn 1 — create
const s1 = await createAgentSession<"claude">({
  sessionId: "turn-1",
  harness: "claude",
  sandbox: {
    provider: "daytona",
    persistentState: {
      // Caller picks backing volume + a stable binding identifier.
      // No knowledge of CLAUDE_CONFIG_DIR or mount paths — the runtime
      // mounts the volume internally and the adapter contributes the
      // right state-env var (Claude→CLAUDE_CONFIG_DIR, Cursor→CURSOR_DATA_DIR,
      // Codex→CODEX_HOME, Gemini→GEMINI_CLI_HOME, OpenCode→XDG_*_HOME).
      volume: { name: "cyrus-prod-vol", kind: "fuse" },
      bindingId: threadKey,
    },
  },
});
const r1 = await s1.run("Tell me a joke");
const harnessId = r1.harnessSessionId;   // capture for later
await r1.destroy();                       // sandbox torn down

// Days later, brand new sandbox — same on-disk state via volume + bindingId:
const s2 = await createAgentSession<"claude">({
  sessionId: "turn-2",
  harness: "claude",
  resumeHarnessSessionId: harnessId,      // Claude --resume <id>
  sandbox: {
    provider: "daytona",
    persistentState: {
      volume: { name: "cyrus-prod-vol", kind: "fuse" },
      bindingId: threadKey,               // same → same state visible
    },
  },
});
const r2 = await s2.run("What was the punchline?");
```
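A sketch of how an adapter might turn `persistentState` into an env var, assuming one subdirectory per `bindingId` on the shared volume. The per-harness variable names are the ones listed in the comment above; `STATE_MOUNT`, `persistentStateEnv` and the `encodeURIComponent` escaping are hypothetical, and OpenCode is left out because the PR only says `XDG_*_HOME`.

```typescript
import * as path from "node:path";

type StatefulHarness = "claude" | "cursor" | "codex" | "gemini";

// State-dir env var per harness, as listed in the PR comment (OpenCode omitted).
const STATE_ENV_VAR: Record<StatefulHarness, string> = {
  claude: "CLAUDE_CONFIG_DIR",
  cursor: "CURSOR_DATA_DIR",
  codex: "CODEX_HOME",
  gemini: "GEMINI_CLI_HOME",
};

// HYPOTHETICAL: where the runtime mounts the volume inside the sandbox.
const STATE_MOUNT = "/mnt/cyrus-state";

function persistentStateEnv(harness: StatefulHarness, bindingId: string): Record<string, string> {
  // One subpath per binding, so sessions sharing a volume stay isolated.
  // ASSUMPTION: escape the id so keys like "C123:1700000000.1" stay one path segment.
  const stateDir = path.posix.join(STATE_MOUNT, encodeURIComponent(bindingId));
  return { [STATE_ENV_VAR[harness]]: stateDir };
}
```

Because the directory is a pure function of `bindingId`, a brand-new sandbox given the same volume and binding sees exactly the state the previous one left behind.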

The pattern in AgentChatSessionHandler keeps the same session object across turns and relies on destroyWhileInactive: true to pause/resume the sandbox between mentions. resumeHarnessSessionId + persistentState is the cross-process variant for cases where the caller is itself stateless between turns (e.g. serverless).
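The two resume paths map onto different harness flags. A hypothetical `resumeArgs` helper sketches the choice using only the flags this PR names (Claude `--continue` and `--resume <id>`, cursor-runner `--agent-id <id>`); it returns `null` where the PR names no flag. How a caller-supplied resume id interacts with later turns in the same session object is an assumption here: the id is applied on the first turn, and later turns continue natively.

```typescript
type ResumableHarness = "claude" | "cursor";

interface TurnContext {
  harness: ResumableHarness;
  turn: number; // 1-based turn index within one session object
  resumeHarnessSessionId?: string; // from an earlier result.harnessSessionId
}

// Returns extra CLI args for this turn, or null where the PR names no flag.
function resumeArgs(ctx: TurnContext): string[] | null {
  if (ctx.turn > 1) {
    // Same session object, later turn: native "continue" (only Claude's is named).
    return ctx.harness === "claude" ? ["--continue"] : null;
  }
  if (ctx.resumeHarnessSessionId) {
    // Stateless caller, fresh sandbox: resume by the captured harness session id.
    return ctx.harness === "claude"
      ? ["--resume", ctx.resumeHarnessSessionId]
      : ["--agent-id", ctx.resumeHarnessSessionId];
  }
  return []; // first turn of a brand-new conversation
}
```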

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants