42 commits
8a9fac9
feat(agent-runtime): add sandboxed harness runtime package
Connoropolous May 15, 2026
f9b6150
docs(agent-runtime): record claude daytona validation
Connoropolous May 15, 2026
cabbde4
feat(agent-runtime): live process streaming for local and Daytona san…
Connoropolous May 15, 2026
c91a3cc
feat(agent-runtime): folders and repositories as first-class session …
Connoropolous May 15, 2026
a54d9d8
feat(agent-runtime): add destroy() to AgentSessionResult
Connoropolous May 15, 2026
5e085ae
refactor(agent-runtime): decouple stop() from sandbox destruction
Connoropolous May 15, 2026
30b6a46
feat(edge-worker): replace ChatSessionHandler with AgentChatSessionHa…
Connoropolous May 15, 2026
8262503
fix(edge-worker): hardwire Slack chat sessions to Daytona, not local
Connoropolous May 16, 2026
ef7cda5
fix: add tslib to @daytonaio/sdk via pnpm packageExtensions
Connoropolous May 16, 2026
799019f
fix: harden tslib fix for @daytonaio/sdk via pnpm patch
Connoropolous May 16, 2026
59fef80
feat(agent-runtime): multi-turn run() + per-session state backing
Connoropolous May 17, 2026
839a674
feat(agent-runtime): destroyWhileInactive — pause sandbox between runs
Connoropolous May 17, 2026
082a032
feat(agent-runtime): RuntimePlugin for Claude / Cursor / Codex (MCP +…
Connoropolous May 19, 2026
e423774
docs(agent-runtime): document why codex hooks materializer defers
Connoropolous May 19, 2026
5613c5e
refactor(agent-runtime): drop CreateAgentSessionConfig.mcps
Connoropolous May 19, 2026
bd45903
docs(agent-runtime): correct codex hooks comment — bypass field absen…
Connoropolous May 19, 2026
27a4a4c
Add working implementation of Daytona volume mounting in sandbox runtime
PaytonWebber May 19, 2026
05433e7
docs(agent-runtime): cite upstream codex issues blocking hooks materi…
Connoropolous May 19, 2026
935185d
feat(core): add defaultProvider to EdgeConfig
Connoropolous May 19, 2026
926cfa8
feat(edge-worker): AgentChatSessionHandler picks provider from EdgeCo…
Connoropolous May 19, 2026
e38f9bb
fix(edge-worker): use ANTHROPIC_API_KEY (not ANTHROPIC_AUTH_TOKEN); n…
Connoropolous May 19, 2026
0e7779a
feat(edge-worker): wire MCP servers into AgentChatSessionHandler
Connoropolous May 19, 2026
de77070
refactor(agent-runtime): remove pi harness
Connoropolous May 19, 2026
196c187
feat(agent-runtime): typed events per harness via AgentSession<H> gen…
Connoropolous May 19, 2026
9fd940f
feat(agent-runtime): AgentSession.transcript() + typed chat consumer
Connoropolous May 19, 2026
769103d
feat(agent-runtime): vendor a Cursor SDK driver so cursor events are …
Connoropolous May 19, 2026
da016b5
fix(cli): forward EdgeConfig.defaultProvider to EdgeWorkerConfig
PaytonWebber May 19, 2026
7158c99
feat(edge-worker): Daytona chat sandbox snapshot + custom layout + by…
PaytonWebber May 19, 2026
1a810aa
feat: extract Cursor SDK driver as publishable @cyrus/cursor-runner p…
Connoropolous May 20, 2026
7a6a811
chore(agent-runtime): finish removing src/harnesses/cursor-driver.ts
Connoropolous May 20, 2026
58ef6f1
Merge remote-tracking branch 'cyrusagents/feature/add-daytona-base-sn…
Connoropolous May 20, 2026
1604576
feat(agent-runtime): cursor adapter honors harness.command for snapsh…
Connoropolous May 20, 2026
f806e0e
chore(cursor-sdk-runner): rename @cyrus/cursor-runner -> @cyrus-ai/cu…
Connoropolous May 20, 2026
87a20fb
chore: pin all harness SDK type-deps and Claude CLI install version
Connoropolous May 20, 2026
3df1a7d
chore(agent-runtime): bump @google/gemini-cli-core pin to 0.42.0 for …
Connoropolous May 20, 2026
ac63b38
chore: bump @anthropic-ai/claude-agent-sdk to 0.2.141 and @openai/cod…
Connoropolous May 20, 2026
ec82a9c
Merge feature/agent-runtime-daytona-volumes — keep volumes + resume, …
Connoropolous May 20, 2026
35f2a61
feat(agent-runtime): cursor harness honors resumeHarnessSessionId end…
Connoropolous May 20, 2026
19dce79
feat(agent-runtime): persistentState abstraction with per-harness bui…
Connoropolous May 20, 2026
f26bca1
Merge main into claude/agent-runtime-slack-chat-replacement
Connoropolous May 20, 2026
3d97239
style: biome --write on test-scripts (CI lint fix)
Connoropolous May 20, 2026
1e1531d
fix(edge-worker,agent-runtime): codex review followups
Connoropolous May 20, 2026
13 changes: 13 additions & 0 deletions CHANGELOG.internal.md
@@ -4,6 +4,19 @@ This changelog documents internal development changes, refactors, tooling update

## [Unreleased]

### Added
- Added `cyrus-agent-runtime`, a standalone experimental TypeScript package for unified agent session orchestration across harnesses and sandbox providers. It includes normalized session config, transcript envelopes, local and ComputeSDK-backed sandbox abstractions, harness adapters for Claude/Codex/Cursor/Gemini/OpenCode, and focused tests for config, runtime lifecycle, sandbox execution, and transcript parsing.
- Added live process streaming to `cyrus-agent-runtime`. New optional `RunnerSandbox.streamCommand(command, options)` capability surfaces stdout/stderr chunks to callbacks as they arrive, with `signal: AbortSignal` for cancellation and `input: AsyncIterable<string>` for live stdin. Implemented natively in `LocalSandboxProvider` (via `child_process.spawn`) and for Daytona inside `ComputeSdkSandboxProvider` via a pluggable `NativeStreamAdapter` registry that reaches the underlying `@daytonaio/sdk` Sandbox through ComputeSDK's `ProviderSandbox.getInstance()` escape hatch, using async sessions + `getSessionCommandLogs(onStdout, onStderr)`. User-supplied adapters can be registered via `ComputeSdkSandboxProviderOptions.nativeStreamAdapters` for ComputeSDK providers we don't bundle (E2B, Vercel, Blaxel, Modal, Railway, Runloop, Cloudflare, Codesandbox). `RuntimeAgentSession.start()` now prefers `streamCommand` when `capabilities.streamingProcess` is true, line-buffers chunks across packet boundaries, and emits `TranscriptEvent`s live as the harness CLI produces them. New `CreateAgentSessionConfig.interactiveInput` opt-in flag routes `addMessage()` chunks into the running process's stdin (most one-shot CLIs hang on piped-but-never-closed stdin, so this defaults off). Verified end-to-end against real `codex exec` (events emitted ~8.6s before turn end), the local `child_process.spawn` path (chunks landed at the exact 400ms cadence the child produced them), and real Daytona Claude `stream-json` (system event landed 1.7s before result event over a remote sandbox).
- Added `folders` and `repositories` to the `cyrus-agent-runtime` session config — two new materialization concepts that are deliberately distinct from existing `volumes`. `RuntimeFolderConfig` exposes a host filesystem folder inside the sandbox (walks the host tree, uploads each file via `SandboxFilesystem.writeFile`, supports an `exclude` glob list) and with `access: "readwrite"` syncs sandbox edits and any newly-created files back to the host folder after the harness command completes. `RuntimeRepositoryConfig` runs `git clone` inside the sandbox at `mountPath` with optional `branch` checkout and `depth` shallow-clone; local-path sources are converted to `file://...` to preserve git semantics, and shallow clones with a branch use `--branch` on the clone itself (since `git checkout` of a non-default branch fails on a shallow clone). Both emit lifecycle transcript events (`folder.materialize.started/completed/failed`, `folder.syncback.started/completed/failed`, `repository.materialize.started/completed/failed`) and run before the package setup commands so any setup that depends on the cloned tree or the mounted folder sees them ready.
- Added `destroy()` to `AgentSessionResult` in `cyrus-agent-runtime`. For ComputeSDK-backed providers it maps to ComputeSDK's `ProviderSandbox.destroy()` (deletes the remote sandbox and releases compute resources); for the local provider it is a no-op. Idempotent. Lets consumers hold only the result, consume the events/result, and tear down without keeping a session reference.
- Decoupled `AgentSession.stop()` from sandbox destruction. `stop()` now cancels the in-flight harness only — aborts the running process, closes the live event stream, closes the input pipe — and leaves the sandbox alive. Sandbox teardown is the sole responsibility of the new `destroy()` method, which exists symmetrically on both `AgentSession` and `AgentSessionResult` (sharing a one-shot internal teardown promise). `AgentSession.destroy()` also implicitly cancels an in-flight run via `stop()` before releasing the sandbox, so callers don't need a two-step. Decoupling enables future workflows that reuse a warm sandbox across runs (per CYPACK-1209) — a single run's `stop()` no longer destroys shared compute.
- Added session resume primitives to `cyrus-agent-runtime`. `CreateAgentSessionConfig.resumeHarnessSessionId` is caller-supplied — Claude adapter translates it into `--resume <id>`; Cursor adapter translates it into `--agent-id <id>` for `@cyrus-ai/cursor-runner`. `AgentSessionResult.harnessSessionId` is the new harness-native id observed in this run, captured by `HarnessAdapter.extractSessionId(events)` (implemented for Claude against `system.init.session_id`, for Cursor against `SDKMessage.agent_id`) and surfaced for callers to persist. The caller owns the mapping between its session records and harness-native ids; the runtime does not persist transcripts itself.
- Added `sandbox.persistentState: { volume, bindingId }` to `cyrus-agent-runtime` — caller-facing abstraction that hides the per-harness state-env-var math. The runtime mounts the caller's volume at a fixed internal path with `bindingId` as the subpath, then calls each adapter's new `buildStateEnv(mountPath)` hook to inject the right env vars so the harness writes its state-dir there. Verified upstream per harness: Claude → `CLAUDE_CONFIG_DIR=${m}/.claude`; Cursor → `CURSOR_DATA_DIR=${m}/.cursor`; Codex → `CODEX_HOME=${m}/.codex`; Gemini → `GEMINI_CLI_HOME=${m}` (CLI appends `.gemini` itself); OpenCode → all four `XDG_*_HOME` dirs under `${m}/.opencode-xdg/{config,data,state,cache}` (no app-specific override exists).
- Extracted `@cyrus-ai/cursor-runner` as a publishable package — a thin CLI wrapper around `@cursor/sdk` that emits `SDKMessage` JSONL. Lets the agent-runtime cursor adapter consume a typed, version-pinned wire format that we own (no schema drift vs `cursor-agent`).
- Added `RuntimePlugin` to `cyrus-agent-runtime` — bundles MCP servers + hooks + skills with per-harness materializers translating one declaration into Claude / Cursor / Codex native filesystem state inside the sandbox. The bundled MCP-config path replaces the old standalone `mcps` field on session config.
- Added Daytona volume mounting (`RuntimeVolumeConfig` with provider-driven `kind: "bind" | "fuse" | "provider"` and `subpath` for per-binding isolation), `destroyWhileInactive` (pauses the underlying sandbox between `run()` calls — Daytona stop/start preserves on-disk state at a few-second resume cost), and Daytona base-snapshot harness binaries (`harness.command` lets adapters spawn snapshot-resident binaries directly; Cursor uses this for `cursor-runner`).
- Wired `cyrus-edge-worker` `AgentChatSessionHandler` to the agent-runtime: provider chosen from `EdgeConfig.defaultProvider`, MCP servers forwarded via the new plugin shape, Daytona chat sandbox snapshot + custom layout + bypass perms threaded through, `ANTHROPIC_API_KEY` vs `ANTHROPIC_AUTH_TOKEN` precedence fixed.

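The `streamCommand` entry above says `RuntimeAgentSession.start()` line-buffers chunks across packet boundaries before emitting events. A minimal sketch of that buffering step follows. It assumes chunks arrive as already-decoded strings and that the harness writes one JSON object per line; `createLineBuffer` and `ParsedLine` are hypothetical names, not the package's API.

```typescript
// Hypothetical helper (not the package's API): reassembles newline-delimited
// output from stdout chunks that may split a line at any point.
type ParsedLine = { raw: string; json: unknown };

function createLineBuffer(onLine: (line: ParsedLine) => void) {
  let pending = "";

  // Emit one complete line; non-JSON output is kept as raw text, not dropped.
  const emit = (line: string): void => {
    const raw = line.trim();
    if (raw === "") return;
    let json: unknown = undefined;
    try {
      json = JSON.parse(raw);
    } catch {
      // leave json undefined for non-JSON lines
    }
    onLine({ raw, json });
  };

  return {
    // Feed each chunk as it arrives; only newline-terminated lines are emitted.
    push(chunk: string): void {
      pending += chunk;
      const lines = pending.split("\n");
      // The last element is the unterminated remainder ("" if the chunk ended in "\n").
      pending = lines.pop() ?? "";
      for (const line of lines) emit(line);
    },
    // Call once after the process exits to emit a final unterminated line.
    flush(): void {
      if (pending !== "") emit(pending);
      pending = "";
    },
  };
}
```

Feeding `'{"type":"sys'`, then `'tem"}\n{"type":"res'`, then `'ult"}'` and calling `flush()` yields two parsed lines, `system` then `result`, with the first emitted as soon as its newline arrives rather than at process exit.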
## [0.2.52] - 2026-05-13

_No internal-only changes._
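
The repository materialization rules in the Unreleased `### Added` list (local paths become `file://` URLs, shallow clones pass `--branch` to `git clone`, full clones check out afterwards) can be sketched as a command builder. This is an illustration under assumptions, not the package's code: `RepoSpec` and `buildCloneCommands` are hypothetical, and local sources are assumed to be absolute paths.

```typescript
// Hypothetical input shape using the RuntimeRepositoryConfig fields named in
// the changelog entry; the real type may differ.
interface RepoSpec {
  source: string; // remote URL or absolute local path
  mountPath: string; // clone destination inside the sandbox
  branch?: string;
  depth?: number;
}

// Plain local paths make git use its local-clone shortcut, which ignores
// --depth; a file:// URL forces the regular transport.
function toCloneUrl(source: string): string {
  return source.startsWith("/") ? `file://${source}` : source;
}

// Returns argv arrays to run in order inside the sandbox.
function buildCloneCommands(repo: RepoSpec): string[][] {
  const shallow = repo.depth !== undefined;
  const clone = ["git", "clone"];
  if (shallow) clone.push("--depth", String(repo.depth));
  // --depth implies --single-branch, so pick the branch at clone time;
  // a later `git checkout <branch>` would fail on the shallow clone.
  if (shallow && repo.branch) clone.push("--branch", repo.branch);
  clone.push(toCloneUrl(repo.source), repo.mountPath);

  const commands: string[][] = [clone];
  // A full clone has every remote branch, so a separate checkout works.
  if (!shallow && repo.branch) {
    commands.push(["git", "-C", repo.mountPath, "checkout", repo.branch]);
  }
  return commands;
}
```

Returning argv arrays rather than a shell string keeps branch names and paths from being reinterpreted by a shell inside the sandbox.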
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,13 +12,15 @@ All notable changes to this project will be documented in this file.
### Added
- **User skills can now be scoped to specific repositories, Linear teams, or Linear labels** — Skills synced from cyrus-hosted with `repositoryIds`, `linearTeamIds`, or `linearLabelIds` are only loaded into sessions whose context matches every populated dimension (AND across dimensions, OR within each list). Unscoped skills continue to load for every session, and old payloads without scope fields keep working as global. Scope is persisted as a `scope.json` sidecar alongside `SKILL.md` and enforced at runtime via the Claude Agent SDK's `skills` option so the model can't see or invoke out-of-scope skills. ([CYPACK-1156](https://linear.app/ceedar/issue/CYPACK-1156), [#1205](https://github.com/cyrusagents/cyrus/pull/1205))
- **Shared auto-memory across Slack chat sessions** — Slack-triggered chat sessions now share a persistent Claude auto-memory directory at `<cyrusHome>/slack-memory/`, so memory built up in one Slack thread carries over to every other Slack thread. ([CYPACK-1190](https://linear.app/ceedar/issue/CYPACK-1190), [#1199](https://github.com/cyrusagents/cyrus/pull/1199))
- **Base snapshot for Daytona chat sandboxes via `DAYTONA_SNAPSHOT`** — When `DAYTONA_SNAPSHOT` is set in the environment, Daytona-backed chat sessions seed their sandboxes from that pre-built Daytona snapshot instead of the default base image. When a snapshot is in use the npm-install bootstrap is skipped (the snapshot is expected to ship Claude Code preinstalled) and the CLI defaults to `claude` on `PATH`. Two companion overrides let snapshots use any home layout: `DAYTONA_WORKING_DIR` (default `/home/daytona`) sets the in-sandbox working directory, and `DAYTONA_CLAUDE_CLI_PATH` overrides the `claude` binary path.

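The `DAYTONA_SNAPSHOT` entry combines three environment variables with defaults. One way to resolve them is sketched below. Only the snapshot-mode defaults (`claude` on `PATH`, `/home/daytona`) come from the entry; `resolveDaytonaChatSettings` is hypothetical, `bootstrapCliPath` is a caller-supplied placeholder for wherever the npm-install bootstrap puts the CLI, and it is an assumption that `DAYTONA_CLAUDE_CLI_PATH` wins in both modes.

```typescript
interface DaytonaChatSandboxSettings {
  snapshot?: string;
  workingDir: string;
  claudeCliPath: string;
  runNpmBootstrap: boolean;
}

function resolveDaytonaChatSettings(
  env: Record<string, string | undefined>,
  bootstrapCliPath: string, // hypothetical: path produced by the npm-install bootstrap
): DaytonaChatSandboxSettings {
  // Assumption: an empty string counts as unset.
  const snapshot = env["DAYTONA_SNAPSHOT"] || undefined;
  const usingSnapshot = snapshot !== undefined;
  return {
    snapshot,
    workingDir: env["DAYTONA_WORKING_DIR"] || "/home/daytona",
    // Explicit override first; otherwise a snapshot is expected to ship
    // Claude Code on PATH, and a base image uses the bootstrap install.
    claudeCliPath: env["DAYTONA_CLAUDE_CLI_PATH"] || (usingSnapshot ? "claude" : bootstrapCliPath),
    // The snapshot is expected to have Claude Code preinstalled.
    runNpmBootstrap: !usingSnapshot,
  };
}
```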
### Fixed
- **Session Stop hook now actually reminds the agent to ship before stopping** — Replaced the broken Stop-hook return shape (`additionalContext` + `continue: true`, which the Claude Agent SDK silently drops) with the SDK's documented `decision: "block"` + `reason` form. The first stop attempt now blocks and feeds the commit/push/PR reminder back into the next turn; a second stop (with `stop_hook_active === true`) proceeds, preventing infinite loops. ([CYPACK-1204](https://linear.app/ceedar/issue/CYPACK-1204), [#1210](https://github.com/cyrusagents/cyrus/pull/1210))
- **Slack chat sessions can now read and edit their shared auto-memory** — The shared auto-memory directory (`<cyrusHome>/slack-memory/`) is now included in `allowedDirectories` for chat sessions. Previously, sessions could create new memory files via shell redirects, but `Read`/`Edit`/`Glob` against existing memory files (including `MEMORY.md`) were denied by the home-directory restriction rules, leaving the auto-memory feature half-working. ([CYPACK-1197](https://linear.app/ceedar/issue/CYPACK-1197), [#1206](https://github.com/cyrusagents/cyrus/pull/1206))

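The Stop-hook fix relies on a two-step protocol: block the first stop with a `reason`, then let the second stop through once `stop_hook_active` is `true`. The decision logic is sketched below as a pure function. The local types only mirror the fields named in the entry; a real integration would use the Claude Agent SDK's own hook types and async callback wrapper, which this sketch omits, and `SHIP_REMINDER` is placeholder text, not the wording Cyrus ships.

```typescript
// Local stand-ins for the documented Stop-hook input and output fields.
interface StopHookInput {
  hook_event_name: "Stop";
  stop_hook_active: boolean;
}
type StopHookOutput = { decision?: "block"; reason?: string };

// Hypothetical reminder text.
const SHIP_REMINDER =
  "Before stopping: commit your changes, push the branch, and open or update the PR.";

function onStop(input: StopHookInput): StopHookOutput {
  // Second stop attempt: this hook already blocked once, so allow the stop.
  // Without this check the agent could never finish.
  if (input.stop_hook_active) return {};
  // First stop attempt: block and feed the reminder into the next turn.
  return { decision: "block", reason: SHIP_REMINDER };
}
```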
### Changed
- **Slack mention prompt nudges agents toward `linear_agent_give_feedback` for live child sessions** — When responding in Slack, Cyrus is now told to send mid-flight corrections to a running child agent session via `mcp__cyrus-tools__linear_agent_give_feedback` instead of falling back to `mcp__linear__save_comment`. Produces a stronger signal when correcting work that is already in progress. ([CYPACK-1189](https://linear.app/ceedar/issue/CYPACK-1189), [#1198](https://github.com/cyrusagents/cyrus/pull/1198))
- **Daytona chat sessions now bypass Claude's permission prompts** — Since the Daytona sandbox is itself the isolation boundary, blocking on per-tool prompts (which no user can answer) was preventing the agent from running shell commands inside the sandbox. Local sessions still prompt as before.

### Packages

1 change: 1 addition & 0 deletions apps/cli/src/services/WorkerService.ts
@@ -222,6 +222,7 @@ export class WorkerService {
| "codex"
| "cursor"
| undefined) || edgeConfig.defaultRunner,
defaultProvider: edgeConfig.defaultProvider,
issueUpdateTrigger: edgeConfig.issueUpdateTrigger,
promptDefaults: edgeConfig.promptDefaults,
linearWorkspaces: edgeConfig.linearWorkspaces,
27 changes: 21 additions & 6 deletions package.json
@@ -46,6 +46,13 @@
"onlyBuiltDependencies": [
"sqlite3"
],
+"packageExtensions": {
+"@daytonaio/sdk": {
+"dependencies": {
+"tslib": "^2"
+}
+}
+},
"overrides": {
"jws": ">=4.0.1",
"@modelcontextprotocol/sdk": ">=1.26.0",
@@ -54,11 +61,6 @@
"vite": ">=7.1.11",
"zod": "4.3.6",
"hono": ">=4.12.18",
-"fast-uri": ">=3.1.2",
-"ip-address": ">=10.1.1",
-"@anthropic-ai/sdk": ">=0.91.1",
-"@opentelemetry/sdk-node": ">=0.217.0",
-"@opentelemetry/exporter-prometheus": ">=0.217.0",
"@hono/node-server": ">=1.19.10",
"rollup": ">=4.59.0",
"flatted": ">=3.4.0",
@@ -72,7 +74,20 @@
"diff": ">=8.0.3",
"@tootallnate/once": ">=3.0.1",
"@isaacs/brace-expansion": ">=5.0.1",
-"tar": ">=7.5.11"
+"tar": ">=7.5.11",
+"fast-uri": ">=3.1.2",
+"ip-address": ">=10.1.1",
+"@opentelemetry/sdk-node": ">=0.217.0",
+"@opentelemetry/exporter-prometheus": ">=0.217.0",
+"@opentelemetry/otlp-transformer>protobufjs": ">=8.0.2",
+"protobufjs": ">=7.5.8",
+"ws": ">=8.20.1",
+"brace-expansion": ">=5.0.6",
+"@anthropic-ai/sdk": ">=0.91.1",
+"@daytonaio/sdk": ">=0.175.0"
+},
+"patchedDependencies": {
+"@daytonaio/sdk@0.175.0": "patches/@daytonaio__sdk@0.175.0.patch"
}
},
"lint-staged": {
53 changes: 53 additions & 0 deletions packages/agent-runtime/ASSUMPTIONS.md
@@ -0,0 +1,53 @@
# Agent Runtime Assumptions

This package is intentionally built as a new standalone runtime layer with minimal dependency on the existing Cyrus runner packages.

## Product Contract

- The package exposes a TypeScript library API first. It does not ship a daemon or CLI in this iteration.
- A session has one Cyrus-owned `sessionId`. Harness-native session identifiers are represented as transcript metadata when a harness emits them.
- Transcript events preserve raw harness JSON whenever possible and wrap it in a stable runtime envelope.
- `addMessage()` queues messages for harnesses that do not support interactive stdin yet. The queue is visible and testable, but delivery is capability-gated.
- `interrupt()` is a soft user-message interruption when supported. `stop()` is lifecycle cancellation and attempts to terminate the running process.

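The capability-gated `addMessage()` behavior in the Product Contract can be modeled with a small queue: when the running process accepts live stdin the message is written through, otherwise it waits in an inspectable queue. This is a sketch, not the runtime's class; `MessageQueue`, `InputCapabilities`, and `drain()` are hypothetical, and what the runtime does with queued messages between runs is not specified here.

```typescript
interface InputCapabilities {
  interactiveStdin: boolean;
}

class MessageQueue {
  private readonly pending: string[] = [];
  private readonly caps: InputCapabilities;
  private readonly writeStdin: (text: string) => void;

  constructor(caps: InputCapabilities, writeStdin: (text: string) => void) {
    this.caps = caps;
    this.writeStdin = writeStdin;
  }

  // Deliver immediately only when the capability is present; otherwise queue.
  addMessage(text: string): "delivered" | "queued" {
    if (this.caps.interactiveStdin) {
      this.writeStdin(text);
      return "delivered";
    }
    this.pending.push(text);
    return "queued";
  }

  // Copy of the queue so callers and tests can inspect it without mutating it.
  queued(): readonly string[] {
    return [...this.pending];
  }

  // Hypothetical: hand queued messages to whatever runs next, then clear them.
  drain(): string[] {
    return this.pending.splice(0, this.pending.length);
  }
}
```

Returning `"delivered" | "queued"` makes the gating visible to the caller, which matches the contract's requirement that the queue be testable.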
## Harness Contract

- Claude, Codex, Cursor, Gemini, PI, and OpenCode are represented as harness adapters.
- Claude, Codex, Cursor, and Gemini command-line conventions are modeled from locally available CLIs and existing public behavior.
- PI and OpenCode are provisional adapters. Their commands and JSON formats are assumptions until real CLI transcripts are supplied.
- Harness adapters own command construction and transcript parsing. They do not own sandbox provisioning.

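Since adapters own transcript parsing and the Product Contract says raw harness JSON is preserved inside a stable envelope, one adapter-side parse step might look like the sketch below. The envelope fields are illustrative and are not the package's actual `TranscriptEvent` type; `wrapHarnessLine` is hypothetical. It relies only on the fact that Claude `stream-json` events carry a string `type` field.

```typescript
// Illustrative envelope; field names are assumptions.
interface TranscriptEnvelope {
  sessionId: string; // Cyrus-owned session id
  harness: string;
  seq: number;
  type: string; // event type, "raw" for non-JSON, "unknown" if untyped
  raw: unknown; // untouched harness payload
}

function wrapHarnessLine(sessionId: string, harness: string, seq: number, line: string): TranscriptEnvelope {
  let raw: unknown = line;
  let type = "raw";
  try {
    raw = JSON.parse(line);
    const tag = raw !== null && typeof raw === "object" ? (raw as { type?: unknown }).type : undefined;
    type = typeof tag === "string" ? tag : "unknown";
  } catch {
    // Not JSON: keep the text as-is rather than dropping it.
  }
  return { sessionId, harness, seq, type, raw };
}
```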
## Sandbox Contract

- Local execution is modeled as a sandbox provider. This keeps local and remote execution behind the same conceptual interface.
- ComputeSDK is the vendor abstraction for remote sandbox providers.
- The common ComputeSDK `runCommand()` API is treated as sufficient for one-shot harness runs.
- Streaming process execution is modeled as a capability, but is not assumed for every ComputeSDK provider. Full interactive harness support requires a provider-specific streaming process implementation.
- Volumes, FUSE mounts, snapshots, ports, and network egress are represented in config types even when a provider cannot enforce them yet.
- `RuntimeVolumeConfig.subpath` carries the provider-defined prefix used to scope a shared volume. The Daytona Volumes pattern is the reference use case; other providers map `subpath` as appropriate.

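Volume `subpath` scoping pairs with per-harness state directories: once a volume is mounted, each harness needs different env vars to write its state there. The mapping below copies the per-harness values recorded in the internal changelog's `persistentState` entry. `buildStateEnv` here is a standalone sketch, not the adapters' real hook, and the mount path used by the runtime is whatever fixed internal path it chooses.

```typescript
type HarnessName = "claude" | "cursor" | "codex" | "gemini" | "opencode";

function buildStateEnv(harness: HarnessName, mountPath: string): Record<string, string> {
  const m = mountPath.replace(/\/+$/, ""); // drop trailing slashes to avoid "//"
  switch (harness) {
    case "claude":
      return { CLAUDE_CONFIG_DIR: `${m}/.claude` };
    case "cursor":
      return { CURSOR_DATA_DIR: `${m}/.cursor` };
    case "codex":
      return { CODEX_HOME: `${m}/.codex` };
    case "gemini":
      // The Gemini CLI appends `.gemini` itself.
      return { GEMINI_CLI_HOME: m };
    case "opencode": {
      // No app-specific override exists, so relocate all four XDG base dirs.
      const base = `${m}/.opencode-xdg`;
      return {
        XDG_CONFIG_HOME: `${base}/config`,
        XDG_DATA_HOME: `${base}/data`,
        XDG_STATE_HOME: `${base}/state`,
        XDG_CACHE_HOME: `${base}/cache`,
      };
    }
    default: {
      const unsupported: never = harness;
      throw new Error(`unsupported harness: ${String(unsupported)}`);
    }
  }
}
```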
## Session Resume Contract

- The runtime exposes two resume primitives. The caller (Cyrus's `AgentSessionManager`) owns the mapping between its session records and harness-native session ids.
- `CreateAgentSessionConfig.resumeHarnessSessionId`: caller-supplied prior id. Harness adapters translate it into the right CLI flag (e.g. `--resume <id>` for Claude).
- `AgentSessionResult.harnessSessionId`: the new harness-native id observed in this run's transcript, surfaced for the caller to persist for next time.
- Harness adapters extract the harness-native session id from transcript events via `extractSessionId(events)`. Claude's `system.init.session_id` is the canonical example.
- The runtime does not persist transcripts itself. For the harness to actually see prior conversation on resume, the caller must arrange durable storage for the harness's config dir — for example by attaching a `RuntimeVolumeConfig` (Daytona Volumes are the reference) mounted at the harness's config path and setting the matching env var (`CLAUDE_CONFIG_DIR` for Claude).
- Daytona's ComputeSDK provider was smoke-tested with a remote working directory of `/home/daytona`; `/workspace` should not be assumed portable across providers.
- Cursor Agent was smoke-tested inside Daytona by installing the CLI with `curl https://cursor.com/install -fsS | bash` and running `/home/daytona/.local/bin/cursor-agent` with `CURSOR_API_KEY` provided as a secret environment variable.
- Codex Agent was smoke-tested inside Daytona as far as authenticating and starting a turn; authentication worked by materializing `~/.codex/auth.json` as a sensitive runtime file. Passing only `OPENAI_API_KEY` from the local Codex auth file produced a remote 401. The authenticated Codex turn later hit the account usage limit.
- Claude Code was smoke-tested inside Daytona by installing the CLI with a user-local npm prefix and running `/home/daytona/.npm-global/bin/claude` with `CLAUDE_CODE_OAUTH_TOKEN` provided as a secret environment variable. The remote session emitted `system`/`assistant`/`result` events and completed successfully.

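The two resume primitives for Claude can be sketched as a pair of functions: one reads the harness-native id from `system.init.session_id`, the other turns a caller-supplied id into `--resume <id>`. The function names are hypothetical, and the base flags in `buildClaudeArgs` are illustrative rather than what the adapter necessarily emits.

```typescript
// Harness JSON as parsed from the transcript.
type HarnessEvent = Record<string, unknown>;

// Return the session id from the first Claude init event, if present.
// Assumes the init shape { type: "system", subtype: "init", session_id }.
function extractClaudeSessionId(events: readonly HarnessEvent[]): string | undefined {
  for (const event of events) {
    const id = event["session_id"];
    if (event["type"] === "system" && event["subtype"] === "init" && typeof id === "string") {
      return id;
    }
  }
  return undefined;
}

// Append `--resume <id>` only when the caller supplies a prior harness session id.
function buildClaudeArgs(prompt: string, resumeHarnessSessionId?: string): string[] {
  const args = ["-p", prompt, "--output-format", "stream-json", "--verbose"];
  if (resumeHarnessSessionId) args.push("--resume", resumeHarnessSessionId);
  return args;
}
```

The caller persists the id returned by `extractClaudeSessionId` and passes it back on the next run. As noted above, the prior conversation is only visible if the harness config dir is on durable storage.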
## Security Contract

- `env` is safe-to-log configuration. `secrets` must be redacted from transcript and error metadata.
- Secrets are passed into process environments only at execution time.
- Tool permissions are represented as declarative runtime config and translated into harness-native flags where currently known.
- Network egress policy is a declarative provider option in this iteration. Enforcement depends on the selected sandbox provider.

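The rule that `secrets` must be redacted from transcript and error metadata can be sketched as a recursive scrub over plain values. `redactSecrets` and the `[REDACTED]` placeholder are hypothetical. The sketch walks strings, arrays, and plain objects only, so `Error` instances should be serialized first; their non-enumerable `message` and `stack` would otherwise be dropped.

```typescript
function redactSecrets<T>(value: T, secrets: Record<string, string>): T {
  // Skip empty values; replace longest first so a secret containing another is masked whole.
  const values = Object.values(secrets)
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);

  const scrub = (v: unknown): unknown => {
    if (typeof v === "string") {
      let out = v;
      for (const s of values) out = out.split(s).join("[REDACTED]");
      return out;
    }
    if (Array.isArray(v)) return v.map(scrub);
    if (v !== null && typeof v === "object") {
      // Copy enumerable own properties into a fresh object.
      const copy: Record<string, unknown> = {};
      for (const [k, inner] of Object.entries(v)) copy[k] = scrub(inner);
      return copy;
    }
    return v; // numbers, booleans, null, undefined pass through
  };

  return scrub(value) as T;
}
```

Matching on secret values rather than env var names catches secrets echoed back by a harness inside unrelated fields, such as an error message.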
## Feedback Loops

- Config schema tests prove the public contract accepts and rejects expected shapes.
- Local sandbox tests prove the local provider can write files and execute commands.
- Harness adapter tests prove command construction and transcript parsing.
- Session runtime tests prove event emission, queueing, stop behavior, and result propagation.