7 changes: 7 additions & 0 deletions CHANGELOG.internal.md
@@ -4,6 +4,13 @@ This changelog documents internal development changes, refactors, tooling update

## [Unreleased]

### Added
- Added `cyrus-agent-runtime`, a standalone experimental TypeScript package for unified agent session orchestration across harnesses and sandbox providers. It includes normalized session config, transcript envelopes, local and ComputeSDK-backed sandbox abstractions, harness adapters for Claude/Codex/Cursor/Gemini plus provisional PI/OpenCode adapters, and focused tests for config, runtime lifecycle, sandbox execution, and transcript parsing.
- Added live process streaming to `cyrus-agent-runtime`. New optional `RunnerSandbox.streamCommand(command, options)` capability surfaces stdout/stderr chunks to callbacks as they arrive, with `signal: AbortSignal` for cancellation and `input: AsyncIterable<string>` for live stdin. Implemented natively in `LocalSandboxProvider` (via `child_process.spawn`) and for Daytona inside `ComputeSdkSandboxProvider` via a pluggable `NativeStreamAdapter` registry that reaches the underlying `@daytonaio/sdk` Sandbox through ComputeSDK's `ProviderSandbox.getInstance()` escape hatch, using async sessions + `getSessionCommandLogs(onStdout, onStderr)`. User-supplied adapters can be registered via `ComputeSdkSandboxProviderOptions.nativeStreamAdapters` for ComputeSDK providers we don't bundle (E2B, Vercel, Blaxel, Modal, Railway, Runloop, Cloudflare, Codesandbox). `RuntimeAgentSession.start()` now prefers `streamCommand` when `capabilities.streamingProcess` is true, line-buffers chunks across packet boundaries, and emits `TranscriptEvent`s live as the harness CLI produces them. New `CreateAgentSessionConfig.interactiveInput` opt-in flag routes `addMessage()` chunks into the running process's stdin (most one-shot CLIs hang on piped-but-never-closed stdin, so this defaults off). Verified end-to-end against real `codex exec` (events emitted ~8.6s before turn end), the local `child_process.spawn` path (chunks landed at the exact 400ms cadence the child produced them), and real Daytona Claude `stream-json` (system event landed 1.7s before result event over a remote sandbox).
- Added `folders` and `repositories` to the `cyrus-agent-runtime` session config — two new materialization concepts that are deliberately distinct from existing `volumes`. `RuntimeFolderConfig` exposes a host filesystem folder inside the sandbox (walks the host tree, uploads each file via `SandboxFilesystem.writeFile`, supports an `exclude` glob list) and with `access: "readwrite"` syncs sandbox edits and any newly-created files back to the host folder after the harness command completes. `RuntimeRepositoryConfig` runs `git clone` inside the sandbox at `mountPath` with optional `branch` checkout and `depth` shallow-clone; local-path sources are converted to `file://...` to preserve git semantics, and shallow clones with a branch use `--branch` on the clone itself (since `git checkout` of a non-default branch fails on a shallow clone). Both emit lifecycle transcript events (`folder.materialize.started/completed/failed`, `folder.syncback.started/completed/failed`, `repository.materialize.started/completed/failed`) and run before the package setup commands so any setup that depends on the cloned tree or the mounted folder sees them ready.
- Added `destroy()` to `AgentSessionResult` in `cyrus-agent-runtime`. It is equivalent to ComputeSDK's `ProviderSandbox.destroy()` for ComputeSDK-backed providers (deletes the remote sandbox, releases compute resources) and is a no-op for the local provider. It is idempotent, and lets consumers hold only the result, consume the events/result, and tear down without keeping a session reference.
- Decoupled `AgentSession.stop()` from sandbox destruction. `stop()` now cancels the in-flight harness only — aborts the running process, closes the live event stream, closes the input pipe — and leaves the sandbox alive. Sandbox teardown is the sole responsibility of the new `destroy()` method, which exists symmetrically on both `AgentSession` and `AgentSessionResult` (sharing a one-shot internal teardown promise). `AgentSession.destroy()` also implicitly cancels an in-flight run via `stop()` before releasing the sandbox, so callers don't need a two-step. Decoupling enables future workflows that reuse a warm sandbox across runs (per CYPACK-1209) — a single run's `stop()` no longer destroys shared compute.
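
A minimal sketch of the `stop()`/`destroy()` split described in the last two entries. The `SketchSession` class and the injected `cancelRun`/`destroySandbox` hooks are hypothetical names chosen for illustration, not the package's implementation. It shows one way a shared one-shot teardown promise could make `destroy()` idempotent while `stop()` leaves the sandbox alive:

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not the cyrus-agent-runtime implementation.
type Hook = () => Promise<void>;

class SketchSession {
  private teardown: Promise<void> | undefined;
  private readonly cancelRun: Hook;
  private readonly destroySandbox: Hook;

  constructor(cancelRun: Hook, destroySandbox: Hook) {
    this.cancelRun = cancelRun; // aborts the in-flight harness only
    this.destroySandbox = destroySandbox; // releases the sandbox
  }

  // Cancels the run but leaves the sandbox alive for possible reuse.
  async stop(): Promise<void> {
    await this.cancelRun();
  }

  // Idempotent: every call returns the same one-shot teardown promise.
  destroy(): Promise<void> {
    this.teardown ??= (async () => {
      await this.stop();
      await this.destroySandbox();
    })();
    return this.teardown;
  }
}
```

Because `destroy()` stores the promise on first use, concurrent callers share a single teardown instead of racing to destroy the sandbox twice.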

## [0.2.50] - 2026-04-30

### Added
11 changes: 9 additions & 2 deletions package.json
@@ -53,7 +53,7 @@
 "qs": ">=6.14.2",
 "vite": ">=7.1.11",
 "zod": "4.3.6",
-"hono": ">=4.12.7",
+"hono": ">=4.12.18",
 "@hono/node-server": ">=1.19.10",
 "rollup": ">=4.59.0",
 "flatted": ">=3.4.0",
@@ -67,7 +67,14 @@
 "diff": ">=8.0.3",
 "@tootallnate/once": ">=3.0.1",
 "@isaacs/brace-expansion": ">=5.0.1",
-"tar": ">=7.5.11"
+"tar": ">=7.5.11",
+"fast-uri": ">=3.1.2",
+"ip-address": ">=10.1.1",
+"@opentelemetry/sdk-node": ">=0.217.0",
+"@opentelemetry/exporter-prometheus": ">=0.217.0",
+"@opentelemetry/otlp-transformer>protobufjs": ">=8.0.2",
+"@anthropic-ai/sdk": ">=0.91.1",
+"@daytonaio/sdk": ">=0.175.0"
 }
 },
 "lint-staged": {
44 changes: 44 additions & 0 deletions packages/agent-runtime/ASSUMPTIONS.md
@@ -0,0 +1,44 @@
# Agent Runtime Assumptions

This package is intentionally built as a new standalone runtime layer with minimal dependency on the existing Cyrus runner packages.

## Product Contract

- The package exposes a TypeScript library API first. It does not ship a daemon or CLI in this iteration.
- A session has one Cyrus-owned `sessionId`. Harness-native session identifiers are represented as transcript metadata when a harness emits them.
- Transcript events preserve raw harness JSON whenever possible and wrap it in a stable runtime envelope.
- `addMessage()` queues messages for harnesses that do not support interactive stdin yet. The queue is visible and testable, but delivery is capability-gated.
- `interrupt()` is a soft user-message interruption when supported. `stop()` is lifecycle cancellation and attempts to terminate the running process.
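
The capability-gated queueing described for `addMessage()` could look roughly like the sketch below. `SketchMessageQueue`, `supportsInteractiveInput`, and the `deliver` callback are hypothetical names used only for illustration; the package's real types may differ.

```typescript
// Hypothetical sketch of capability-gated message delivery.
class SketchMessageQueue {
  private readonly pending: string[] = [];
  private readonly supportsInteractiveInput: boolean;
  private readonly deliver: (message: string) => void;

  constructor(
    supportsInteractiveInput: boolean,
    deliver: (message: string) => void,
  ) {
    this.supportsInteractiveInput = supportsInteractiveInput;
    this.deliver = deliver;
  }

  // Delivers immediately when the harness accepts live stdin,
  // otherwise keeps the message in a visible, testable queue.
  addMessage(message: string): void {
    if (this.supportsInteractiveInput) {
      this.deliver(message);
      return;
    }
    this.pending.push(message);
  }

  // Read-only snapshot of messages that have not been delivered.
  get queued(): readonly string[] {
    return [...this.pending];
  }
}
```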

## Harness Contract

- Claude, Codex, Cursor, Gemini, PI, and OpenCode are represented as harness adapters.
- Claude, Codex, Cursor, and Gemini command-line conventions are modeled from locally available CLIs and existing public behavior.
- PI and OpenCode are provisional adapters. Their commands and JSON formats are assumptions until real CLI transcripts are supplied.
- Harness adapters own command construction and transcript parsing. They do not own sandbox provisioning.
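
As a rough illustration of the transcript-parsing half of this contract, the sketch below wraps one line of harness output in an envelope that keeps the raw JSON intact. The `kind` and `raw` fields mirror the event fields used in `VALIDATION.md`; the `SketchEnvelope` type, the `wrapHarnessLine` function, and the `"unknown"`/`"unparsed"` fallback kinds are assumptions made for this example.

```typescript
// Hypothetical envelope shape; only `kind` and `raw` appear in this package's docs.
interface SketchEnvelope {
  sessionId: string;
  kind: string;
  raw: unknown;
}

// Wraps a single line of harness output, preserving the raw payload.
function wrapHarnessLine(sessionId: string, line: string): SketchEnvelope {
  try {
    const raw: unknown = JSON.parse(line);
    const type =
      typeof raw === "object" && raw !== null && "type" in raw
        ? (raw as { type: unknown }).type
        : undefined;
    // Fall back to a placeholder kind when the harness omits `type`.
    return { sessionId, kind: typeof type === "string" ? type : "unknown", raw };
  } catch {
    // Non-JSON output is kept verbatim rather than dropped.
    return { sessionId, kind: "unparsed", raw: line };
  }
}
```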

## Sandbox Contract

- Local execution is modeled as a sandbox provider. This keeps local and remote execution behind the same conceptual interface.
- ComputeSDK is the vendor abstraction for remote sandbox providers.
- The common ComputeSDK `runCommand()` API is treated as sufficient for one-shot harness runs.
- Streaming process execution is modeled as a capability, but is not assumed for every ComputeSDK provider. Full interactive harness support requires a provider-specific streaming process implementation.
- Volumes, FUSE mounts, snapshots, ports, and network egress are represented in config types even when a provider cannot enforce them yet.
- Daytona's ComputeSDK provider was smoke-tested with a remote working directory of `/home/daytona`; `/workspace` should not be assumed portable across providers.
- Cursor Agent was smoke-tested inside Daytona by installing the CLI with `curl https://cursor.com/install -fsS | bash` and running `/home/daytona/.local/bin/cursor-agent` with `CURSOR_API_KEY` provided as a secret environment variable.
- Codex was smoke-tested inside Daytona far enough to authenticate and start a turn. Authentication worked by materializing `~/.codex/auth.json` as a sensitive runtime file; passing only the `OPENAI_API_KEY` taken from the local Codex auth file produced a remote 401. The authenticated Codex turn later hit the account usage limit.
- Claude Code was smoke-tested inside Daytona by installing the CLI with a user-local npm prefix and running `/home/daytona/.npm-global/bin/claude` with `CLAUDE_CODE_OAUTH_TOKEN` provided as a secret environment variable. The remote session emitted `system`/`assistant`/`result` events and completed successfully.
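
Where a provider does offer streaming process execution, output chunks can split a JSON line at any byte, so the consumer has to reassemble lines before parsing. The sketch below shows one way to do that; `LineBuffer` is a hypothetical helper written for this example, not a class exported by the package.

```typescript
// Hypothetical helper: reassembles complete lines from arbitrarily split chunks.
class LineBuffer {
  private partial = "";

  // Returns every complete line contained in the data seen so far.
  push(chunk: string): string[] {
    const parts = (this.partial + chunk).split("\n");
    // The last element is either "" (chunk ended on a newline) or an
    // incomplete line that must wait for the next chunk.
    this.partial = parts.pop() ?? "";
    return parts.map((line) => line.replace(/\r$/, ""));
  }

  // Returns any trailing text that never received a newline.
  flush(): string[] {
    const rest = this.partial;
    this.partial = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Calling `flush()` once the process exits keeps a final unterminated line from being lost.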

## Security Contract

- `env` is safe-to-log configuration. `secrets` must be redacted from transcript and error metadata.
- Secrets are passed into process environments only at execution time.
- Tool permissions are represented as declarative runtime config and translated into harness-native flags where currently known.
- Network egress policy is a declarative provider option in this iteration. Enforcement depends on the selected sandbox provider.
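
A minimal sketch of the redaction rule in the first bullet, assuming secrets are available as a name-to-value map. The `redactSecrets` function and the `[REDACTED]` placeholder are illustrative choices, not the package's actual behavior.

```typescript
// Hypothetical sketch: replaces every occurrence of a secret value in text.
function redactSecrets(
  text: string,
  secrets: Record<string, string | undefined>,
): string {
  // Longest values first, so a secret that contains another secret
  // is replaced whole instead of leaving a partial remainder.
  const values = Object.values(secrets)
    .filter((value): value is string => typeof value === "string" && value.length > 0)
    .sort((a, b) => b.length - a.length);

  let redacted = text;
  for (const value of values) {
    redacted = redacted.split(value).join("[REDACTED]");
  }
  return redacted;
}
```

Matching on values rather than key names means a secret is still caught when it is echoed by a command or embedded in an error message.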

## Feedback Loops

- Config schema tests prove the public contract accepts and rejects expected shapes.
- Local sandbox tests prove the local provider can write files and execute commands.
- Harness adapter tests prove command construction and transcript parsing.
- Session runtime tests prove event emission, queueing, stop behavior, and result propagation.
223 changes: 223 additions & 0 deletions packages/agent-runtime/VALIDATION.md
@@ -0,0 +1,223 @@
# Agent Runtime Validation

## Automated Checks

Run from the repository root:

```bash
pnpm --filter cyrus-agent-runtime typecheck
pnpm --filter cyrus-agent-runtime test:run
pnpm --filter cyrus-agent-runtime build
```

Current coverage:

- Harness command construction and transcript parsing for Claude, Codex, Cursor, Gemini, PI, and OpenCode.
- Local sandbox filesystem and command execution.
- ComputeSDK sandbox wrapper with a fake provider.
- Session lifecycle, queued messages, setup commands, transcript events, and result extraction.

## Real Local Harness Smoke

This validates `AgentRuntime`, the local sandbox provider, real `codex exec --json`, transcript event parsing, and result extraction.

```bash
node --input-type=module -e "
import { createAgentSession } from './packages/agent-runtime/dist/index.js';
const session = await createAgentSession({
  sessionId: 'smoke-codex',
  harness: { kind: 'codex', model: 'gpt-5.2' },
  userPrompt: 'Reply exactly: runtime smoke ok',
  sandbox: { provider: 'local', workingDirectory: process.cwd() }
});
const result = await session.start();
console.log(JSON.stringify({
  success: result.success,
  result: result.result,
  eventCount: result.events.length
}));
"
```

Observed result:

```json
{"success":true,"result":"runtime smoke ok","eventCount":4}
```

## Real Daytona Harness Smoke

This validates the full remote path: `AgentRuntime`, real ComputeSDK Daytona provider, remote sandbox create/destroy, declarative setup commands inside the sandbox, remote Cursor Agent install, real `cursor-agent --print --output-format stream-json`, transcript events emitted by the agent session running inside Daytona, and result extraction.

Prerequisites:

- `DAYTONA_API_KEY` in the environment.
- `CURSOR_API_KEY` in the environment.
- The package has been built with `pnpm --filter cyrus-agent-runtime build`.

Run from `packages/agent-runtime`:

```bash
node --input-type=module - <<'JS'
import { daytona } from '@computesdk/daytona';
import { createAgentSession } from './dist/index.js';
import { createComputeSdkSandboxProvider } from './dist/sandbox/compute-sdk.js';

const provider = createComputeSdkSandboxProvider({
  compute: daytona({ apiKey: process.env.DAYTONA_API_KEY, timeout: 300000 }),
});
const transcriptKinds = [];
const transcriptRawTypes = [];
let sandboxToDestroy;
const trackingProvider = {
  provider: 'daytona',
  async create(config) {
    const sandbox = await provider.create(config);
    sandboxToDestroy = sandbox;
    return sandbox;
  },
};

try {
  const session = await createAgentSession(
    {
      sessionId: 'daytona-cursor-smoke',
      harness: {
        kind: 'cursor',
        command: '/home/daytona/.local/bin/cursor-agent',
      },
      userPrompt: 'Reply exactly: daytona cursor event smoke ok',
      env: {
        PATH: '/home/daytona/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
      },
      secrets: {
        CURSOR_API_KEY: process.env.CURSOR_API_KEY,
      },
      packages: {
        commands: [
          'curl https://cursor.com/install -fsS | bash',
          '/home/daytona/.local/bin/cursor-agent --version',
        ],
      },
      sandbox: {
        provider: 'daytona',
        name: `agent-runtime-cursor-${Date.now()}`,
        workingDirectory: '/home/daytona',
        timeoutMs: 300000,
        metadata: { purpose: 'agent-runtime-cursor-event-smoke' },
      },
    },
    {
      sandboxProviders: { daytona: trackingProvider },
      callbacks: {
        onTranscriptEvent(event) {
          transcriptKinds.push(event.kind);
          if (event.raw && typeof event.raw === 'object' && 'type' in event.raw) {
            transcriptRawTypes.push(event.raw.type);
          }
        },
      },
    },
  );

  const result = await session.start();
  console.log(JSON.stringify({
    success: result.success,
    result: result.result,
    eventCount: result.events.length,
    transcriptKinds,
    transcriptRawTypes,
    sandboxId: sandboxToDestroy?.sandboxId,
  }));
} finally {
  if (sandboxToDestroy) {
    await sandboxToDestroy.destroy();
  }
}
JS
```

Observed result:

```json
{
  "success": true,
  "result": "daytona cursor event smoke ok",
  "eventCount": 8,
  "transcriptKinds": [
    "setup.started",
    "setup.completed",
    "setup.started",
    "setup.completed",
    "system",
    "user",
    "assistant",
    "result"
  ],
  "transcriptRawTypes": ["system", "user", "assistant", "result"]
}
```

## Real Daytona Codex Auth Probe

Codex was validated inside Daytona through runtime-managed sensitive file materialization:

- `~/.codex/auth.json` was written with `sensitive: true`, and transcript events redacted the content.
- `@openai/codex` installed successfully inside Daytona.
- `codex exec --json --skip-git-repo-check` emitted `thread.started` and `turn.started`.
- Passing only `OPENAI_API_KEY` from local Codex auth produced a remote 401.
- Using `~/.codex/auth.json` authenticated, but the turn hit the account usage limit before completion.

Observed authenticated-but-limited result:

```json
{
  "success": false,
  "exitCode": 1,
  "events": [
    {
      "kind": "error",
      "raw": {
        "type": "error",
        "message": "You've hit your usage limit..."
      }
    },
    {
      "kind": "turn.failed"
    }
  ]
}
```

## Real Daytona Claude Smoke

Claude Code was validated inside Daytona with an explicit portable Claude Code OAuth token provided as a secret environment variable:

- `@anthropic-ai/claude-code` installed successfully with a user-local npm prefix.
- `claude --version` returned `2.1.142 (Claude Code)`.
- `claude -p ... --output-format stream-json --verbose` emitted `system`, `assistant`, and `result` events inside Daytona.
- The remote Claude session completed successfully with the exact requested result.

Observed runtime result:

```json
{
  "success": true,
  "exitCode": 0,
  "result": "daytona claude event smoke ok",
  "eventCount": 9,
  "eventKinds": [
    "setup.started",
    "setup.completed",
    "setup.started",
    "setup.completed",
    "setup.started",
    "setup.completed",
    "system",
    "assistant",
    "result",
    "stop.requested"
  ],
  "transcriptKinds": ["system", "assistant", "result"]
}
```
33 changes: 33 additions & 0 deletions packages/agent-runtime/package.json
@@ -0,0 +1,33 @@
{
  "name": "cyrus-agent-runtime",
  "version": "0.2.51",
  "description": "Unified agent harness runtime with pluggable sandbox providers",
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "files": [
    "dist",
    "ASSUMPTIONS.md",
    "VALIDATION.md"
  ],
  "scripts": {
    "build": "tsc",
    "dev": "tsc --watch",
    "test": "vitest",
    "test:run": "vitest run --passWithNoTests",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
    "@computesdk/daytona": "^1.7.26",
    "computesdk": "^4.0.0",
    "zod": "^4.3.6"
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "typescript": "^5.3.3",
    "vitest": "^3.1.4"
  },
  "publishConfig": {
    "access": "public"
  }
}