secure-agent-runner is a standalone execution plane for coding agents, CI
wrappers, MCP tools, eval harnesses, and service gateways. It accepts a
structured RunJobRequest, validates request shape and runner policy,
snapshots a workspace, executes the command through a selected backend,
captures bounded stdout, stderr, artifacts, hashes, and timing metadata, and
returns a deterministic RunJobResult that callers can store, replay, score, or
attach to their own trace systems.
The crate name is `secure-agent-runner`; the binary is `agent-runner`.
This version ships:
- Rust library and CLI.
- HTTP API for async job submission, polling, and artifact metadata.
- Bounded HTTP queue with backpressure: full queues return 503.
- Local process backend.
- Persistent request, result, stdout, stderr, artifact, snapshot, and status
  records under `runs/<job_id>/`.
- Workspace snapshot copying with protected default exclusions for VCS data, dependency caches, and common secret paths.
- Command allowlists, shell denial by default, environment allowlists, timeouts, capped stdout/stderr, artifact limits, and stable error objects.
- Deny-by-default child environment; env keys must be explicitly allowed by request policy or runner configuration.
- Firecracker backend design plus a feature-gated skeleton that reports a stable
  `backend_unavailable` error on unsupported or unconfigured hosts.
The runner gives one place to govern and record agent command execution. It protects against ambiguous command strings, accidental shell execution, unbounded output, unchecked artifact capture, missing replay metadata, and policy drift between callers.
The local process backend is not a strong sandbox for malicious code. It runs as the host user and cannot prevent a hostile process from using that user's file or network permissions outside the copied workspace. Treat local execution as a policy, reproducibility, and traceability boundary. Use a container backend when it exists, or the planned Firecracker backend on Linux/KVM hosts, for hostile or untrusted workloads.
See docs/threat-model.md for the detailed boundary.
| Backend | Status | Boundary | Notes |
|---|---|---|---|
| `local_process` | Current | Policy and reproducibility only | Runs argv directly without a shell, snapshots the workspace, scrubs env, caps output, enforces the timeout, and records artifacts. It does not enforce network, CPU, memory, or host filesystem isolation. |
| container | Future | Stronger OS boundary when configured correctly | Intended for Docker/Podman-style execution with no host socket, no privileged mounts, default no-network, and image allowlists. Not implemented in this crate yet. |
| `firecracker` | Planned/skeleton | Preferred production isolation boundary | Stays behind the same `RunJobRequest`/`RunJobResult` contract. Current code rejects unsafe request-supplied assets and returns `backend_unavailable` unless real microVM execution is implemented and configured. |
Validate an example request:
```sh
cargo run --bin agent-runner -- validate \
  --request examples/pytest-request.json
```

Run a single request directly:

```sh
cargo run --bin agent-runner -- run \
  --request examples/pytest-request.json \
  --store-dir ./agent-runner-runs \
  --backend local-process \
  --out result.json
```

Inspect a captured denial or timeout:

```sh
cargo run --bin agent-runner -- run \
  --request examples/disallowed-shell-request.json
cargo run --bin agent-runner -- run \
  --request examples/timeout-request.json
```

Replay from stored request metadata:

```sh
cargo run --bin agent-runner -- replay \
  --run agent-runner-runs/runs/<job_id>
```

`run` and `replay` print a `RunJobResult` JSON document unless `--out` is
provided. They exit 0 when the runner successfully produces a result,
including captured non-zero command exits, policy denials, and timeouts.
Non-zero CLI exits are reserved for operational failures such as invalid JSON,
an unreadable request file, validation failure in validate, or an unwritable
--out path.
Requests use explicit argv arrays, not shell strings by default:
```json
{
  "trace": {
    "trace_id": "tr_example",
    "agent_id": "example-agent",
    "workspace_id": "repo_current"
  },
  "workspace": {
    "source": "local_path",
    "path": ".",
    "snapshot_strategy": "copy",
    "include": ["src/**", "tests/**", "pyproject.toml", "pytest.ini"],
    "exclude": [".git/**", ".venv/**", "target/**", "node_modules/**"]
  },
  "command": {
    "argv": ["python", "-m", "pytest", "-q"],
    "cwd": ".",
    "env": {
      "PYTHONPATH": "."
    }
  },
  "policy": {
    "allowed_commands": ["python", "pytest"],
    "allow_shell": false,
    "network": "disabled",
    "write_policy": "workspace_only",
    "allowed_env": ["PYTHONPATH"],
    "artifact_globs": ["coverage.xml", "test-results/**"]
  },
  "limits": {
    "timeout_ms": 120000,
    "max_stdout_bytes": 1048576,
    "max_stderr_bytes": 1048576,
    "max_artifact_bytes": 10485760,
    "max_artifact_file_bytes": 5242880,
    "max_artifact_count": 128,
    "max_run_record_bytes": 2097152,
    "memory_mb": 1024,
    "cpu_count": 2
  },
  "backend": {
    "kind": "local_process"
  }
}
```

The result shape is the same for the library, CLI, and HTTP API:
```json
{
  "job_id": "job_7f4f0b7e0b014a9da52e93e2eb44f16a",
  "status": "completed",
  "backend": "local_process",
  "started_at": "2026-05-10T20:00:00Z",
  "ended_at": "2026-05-10T20:00:01Z",
  "duration_ms": 1000,
  "command": {
    "argv": ["python", "-m", "pytest", "-q"],
    "cwd": "."
  },
  "exit_code": 0,
  "stdout": {
    "text": "3 passed\n",
    "truncated": false,
    "sha256": "..."
  },
  "stderr": {
    "text": "",
    "truncated": false,
    "sha256": "..."
  },
  "artifacts": [
    {
      "path": "coverage.xml",
      "kind": "file",
      "size_bytes": 1234,
      "sha256": "...",
      "storage_uri": "file://agent-runner-runs/runs/job_.../artifacts/coverage.xml"
    }
  ],
  "replay": {
    "request_sha256": "...",
    "workspace_snapshot_sha256": "...",
    "result_sha256": "..."
  },
  "policy_decision": {
    "action": "allow",
    "policy_version": "sha256:...",
    "messages": []
  }
}
```

Policy denials, setup failures, backend unavailability, and timeouts use the
same result envelope with a stable error object:
```json
{
  "status": "policy_denied",
  "error": {
    "code": "policy.shell_denied",
    "message": "shell execution is denied by policy",
    "details": {
      "argv0": "sh"
    }
  }
}
```

See docs/api.md for the full contract.
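The `truncated` flags in the result follow a simple rule: bytes beyond the configured cap are dropped and the flag is set. A minimal sketch of that bounding, using a hypothetical `cap_output` helper; the crate's real capture path is streaming and concurrent, but the retained shape is the same:

```rust
/// Cap captured bytes at `max`, recording whether truncation occurred.
/// Hypothetical helper illustrating the stdout/stderr bounding rule.
fn cap_output(raw: &[u8], max: usize) -> (Vec<u8>, bool) {
    if raw.len() <= max {
        (raw.to_vec(), false)
    } else {
        // Keep the first `max` bytes and mark the capture as truncated.
        (raw[..max].to_vec(), true)
    }
}
```

Callers can therefore trust `stdout.text` only in combination with `stdout.truncated`, and fall back to the `sha256` when the full stream matters.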
Start the service:
```sh
cargo run --bin agent-runner -- serve \
  --addr 127.0.0.1:3000 \
  --store-dir ./agent-runner-runs \
  --queue-capacity 1024
```

Submit and poll a job:
```sh
curl -sS -X POST http://127.0.0.1:3000/v1/jobs \
  -H 'content-type: application/json' \
  -d @examples/pytest-request.json
curl -sS http://127.0.0.1:3000/v1/jobs/<job_id>
curl -sS http://127.0.0.1:3000/v1/jobs/<job_id>/artifacts
```

Routes:

- `GET /health` returns `{ "status": "ok" }`.
- `POST /v1/jobs` returns `202` with `{ "job_id": "...", "status": "queued" }`.
- `GET /v1/jobs/{job_id}` returns queued, running, or terminal state.
- `GET /v1/jobs/{job_id}/artifacts` returns artifact metadata for terminal jobs.
Request shape errors return 422. Queue backpressure returns 503. Policy
denials are terminal RunJobResult objects, not HTTP 500s.
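The 503 behavior can be modeled as a bounded channel that rejects rather than blocks. A std-Rust sketch with hypothetical `bounded_queue` and `enqueue` helpers; the service's real queue is async, but the backpressure rule is the same:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Build a bounded job queue of the given capacity.
fn bounded_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Try to enqueue a job id without blocking. `Err` carries the rejected id
/// and is what the HTTP layer maps to a 503 response.
fn enqueue(tx: &SyncSender<String>, job_id: &str) -> Result<(), String> {
    match tx.try_send(job_id.to_string()) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(id)) | Err(TrySendError::Disconnected(id)) => Err(id),
    }
}
```

Rejecting at admission keeps submission latency flat under load instead of letting requests pile up behind a blocked queue.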
By default, agent-runner writes under `./agent-runner-runs`:

```
agent-runner-runs/
+-- runs/
    +-- <job_id>/
        +-- request.json
        +-- result.json
        +-- status.json
        +-- stdout.txt
        +-- stderr.txt
        +-- snapshot/
        +-- artifacts/
        +-- ...
```
`request.json` is the normalized request with the assigned job id. `result.json` is
the terminal `RunJobResult`. `stdout.txt` and `stderr.txt` mirror the retained
capped output. `snapshot/` is the copied workspace used for execution unless a
separate configured snapshot root is used. `artifacts/` contains only accepted
artifact files copied out after execution.
Runner policy is intentionally narrower than caller or gateway policy. The runner enforces:
- command allowlists with optional detailed executable policies;
- shell denial by default;
- relative `command.cwd` inside the workspace snapshot;
- deny-by-default child environment with explicit `allowed_env`;
- timeout and output byte caps;
- artifact include, deny, count, per-file byte, and total byte limits;
- backend selection and backend-specific request constraints.
Gateway, orchestrator, or product policy should own identity, authentication,
user approval, budgets, and broader business rules. If a gateway fronts the
runner, record the gateway decision in trace metadata and let the runner still
enforce its own command and artifact policy.
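The deny-by-default child environment amounts to filtering the parent environment through an allowlist. A sketch with a hypothetical `scrub_env` helper; the crate's real policy code also merges request-level and runner-level allowlists:

```rust
use std::collections::HashMap;

/// Keep only explicitly allowed keys when building the child environment.
/// Hypothetical sketch of the deny-by-default rule: anything not named in
/// `allowed` never reaches the spawned process.
fn scrub_env(parent: &HashMap<String, String>, allowed: &[&str]) -> HashMap<String, String> {
    parent
        .iter()
        .filter(|(k, _)| allowed.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Allowlisting by key is deliberately conservative: ambient credentials such as cloud access keys are dropped unless a policy names them.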
Artifacts are opt-in through policy.artifact_globs. Collection is defensive:
- paths are normalized relative to the snapshot root;
- absolute paths, `..`, and symlink escapes are rejected;
- default protected paths cover secrets and dependency caches;
- caller deny globs are applied in addition to defaults;
- count, per-file, and total-size limits are enforced;
- every accepted file is copied to `runs/<job_id>/artifacts/` and hashed;
- artifact metadata is returned through `RunJobResult.artifacts`.
Artifact collection errors become stable runner errors such as
`policy.artifact_denied` or `run.artifact_collection_failed`.
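The path rejection rule above can be sketched with std path components, using a hypothetical `is_safe_artifact_path` check; the crate additionally resolves symlinks before copying, which this sketch omits:

```rust
use std::path::{Component, Path};

/// Accept only plain relative path segments: no root, no prefix, no `.`
/// or `..` components. Hypothetical sketch of the artifact path check
/// applied before a path is resolved under the snapshot root.
fn is_safe_artifact_path(rel: &str) -> bool {
    let p = Path::new(rel);
    !p.as_os_str().is_empty()
        && p.components().all(|c| matches!(c, Component::Normal(_)))
}
```

Checking components rather than string prefixes is the important design choice: it catches `a/../../b`-style escapes that a simple `starts_with("..")` test misses.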
Firecracker is the preferred production backend target for hostile or untrusted
jobs, but real microVM execution is not implemented in this slice. The current
backend is a design plus skeleton that keeps Firecracker behind the same public
contract, rejects request-supplied kernel/rootfs/image paths, validates cheap
host prerequisites where possible, and returns `backend_unavailable` with a
stable error on unsupported or unconfigured hosts.
A production Firecracker backend will require:
- Linux on a supported architecture;
- readable and writable `/dev/kvm`;
- matched `firecracker` and `jailer` binaries;
- root-owned, non-world-writable kernel, rootfs, image, and jail directories;
- cgroup v1 or v2 support;
- vsock support;
- an immutable rootfs containing a guest runner service;
- server-side profile allowlists for kernel, rootfs, boot args, uid/gid, cgroups, and default no-network behavior.
See docs/firecracker-backend.md for the design, jailer strategy, workspace-drive model, vsock protocol, and cleanup path.
Run the normal local quality bar:
```sh
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets
cargo test --doc
```

Validate direct RunJobRequest examples:

```sh
cargo run --bin agent-runner -- validate --request examples/pytest-request.json
cargo run --bin agent-runner -- validate --request examples/timeout-request.json
```

`examples/disallowed-shell-request.json` is a runtime policy-denial example, so
`run` should return `status = "policy_denied"` with
`error.code = "policy.shell_denied"`. `examples/mcp-sandbox-run.json` is a
JSON-RPC wrapper; the embedded `params.arguments.request` object is validated
by the acceptance tests.

Real Firecracker tests are ignored and require a configured Linux/KVM host:

```sh
cargo test --features firecracker -- --ignored
```

Use the runner directly from Rust:
```rust
let result = secure_agent_runner::run_request(request, "./agent-runner-runs").await;
```

Or use the CLI/HTTP API from any agent, CI wrapper, eval harness, or MCP server:

```sh
agent-runner run --request request.json --store-dir ./agent-runner-runs
```

Only the `RunJobRequest` and `RunJobResult` contract is required. Callers can
put their own ids and audit metadata into `trace`, then store returned results
wherever their system expects.
cl-agent, gateway, MCP, and other product integrations are optional adapter concerns. They are not required for library, CLI, or HTTP use.
- MCP `sandbox/run`: expose a tool that translates tool arguments into a `RunJobRequest`, calls `POST /v1/jobs`, and returns the queued id or terminal `RunJobResult`.
- Generic gateway policy handoff: let the gateway decide identity, auth, approval, budget, protected path, and route policy before forwarding an already-shaped runner request.
- cl-agent verifier consumption: implement cl-agent's injectable `CommandRunner` by calling the runner and mapping `RunJobResult.exit_code`, `stdout.text`, and `stderr.text` back into verifier command results.
See docs/integrations.md and the optional examples in
examples/ for sketches.
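The cl-agent mapping is a plain field carry-over. A sketch assuming a hypothetical verifier-side `CommandOutput` struct; cl-agent's actual trait shape may differ:

```rust
/// Hypothetical verifier-style command result, for illustration only.
#[derive(Debug, PartialEq)]
struct CommandOutput {
    exit_code: i32,
    stdout: String,
    stderr: String,
}

/// Carry the relevant RunJobResult fields into that shape. Callers should
/// also consult the result's status and truncated flags before trusting
/// the text fields.
fn map_run_result(exit_code: i32, stdout_text: &str, stderr_text: &str) -> CommandOutput {
    CommandOutput {
        exit_code,
        stdout: stdout_text.to_string(),
        stderr: stderr_text.to_string(),
    }
}
```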
- `src/model.rs` - public JSON request/result contract.
- `src/validation.rs` - syntactic validation and path rules.
- `src/policy.rs` - command, shell, and env policy enforcement.
- `src/snapshot.rs` - workspace snapshot creation and hashing.
- `src/output.rs` - bounded concurrent output capture.
- `src/artifacts.rs` - defensive artifact collection.
- `src/store.rs` - persistent run records.
- `src/api.rs` - axum HTTP routes and error mapping.
- `src/queue.rs` - bounded async queue and HTTP status lifecycle.
- `src/backend/local_process.rs` - first working backend.
- `src/backend/firecracker/` - production backend skeleton.
- `docs/` - API, threat model, Firecracker design, and optional integrations.