secure-agent-runner

secure-agent-runner is a standalone execution plane for coding agents, CI wrappers, MCP tools, eval harnesses, and service gateways. It accepts a structured RunJobRequest, validates the request shape and checks it against runner policy, snapshots the workspace, and executes the command through the selected backend. It then captures bounded stdout and stderr, artifacts, hashes, and timing metadata, and returns a deterministic RunJobResult that callers can store, replay, score, or attach to their own trace systems.

The crate name is secure-agent-runner; the binary is agent-runner.

Status

This version ships:

Rust library and CLI.
HTTP API for async job submission, polling, and artifact metadata.
Bounded HTTP queue with backpressure: full queues return 503.
Local process backend.
Persistent request, result, stdout, stderr, artifact, snapshot, and status records under runs/<job_id>/.
Workspace snapshot copying with protected default exclusions for VCS data, dependency caches, and common secret paths.
Command allowlists, shell denial by default, environment allowlists, timeouts, capped stdout/stderr, artifact limits, and stable error objects.
Deny-by-default child environment; env keys must be explicitly allowed by request policy or runner configuration.
Firecracker backend design plus feature-gated skeleton that reports stable backend_unavailable on unsupported or unconfigured hosts.

Threat Model

The runner gives one place to govern and record agent command execution. It protects against ambiguous command strings, accidental shell execution, unbounded output, unchecked artifact capture, missing replay metadata, and policy drift between callers.

The local process backend is not a strong sandbox for malicious code. It runs as the host user and cannot prevent a hostile process from using that user's file or network permissions outside the copied workspace. Treat local execution as a policy, reproducibility, and traceability boundary. For hostile or untrusted workloads, use a container backend once one exists, or the planned Firecracker backend on Linux/KVM hosts once it is implemented.

See docs/threat-model.md for the detailed boundary.

Backend Matrix

| Backend | Status | Boundary | Notes |
| --- | --- | --- | --- |
| `local_process` | Current | Policy and reproducibility only | Runs argv directly without a shell, snapshots the workspace, scrubs env, caps output, enforces timeout, and records artifacts. It does not enforce network, CPU, memory, or host filesystem isolation. |
| `container` | Future | Stronger OS boundary when configured correctly | Intended for Docker/Podman-style execution with no host socket, no privileged mounts, default no-network, and image allowlists. Not implemented in this crate yet. |
| `firecracker` | Planned/skeleton | Preferred production isolation boundary | Stays behind the same `RunJobRequest`/`RunJobResult` contract. Current code rejects unsafe request-supplied assets and returns `backend_unavailable` until real microVM execution is implemented and configured. |

CLI Quickstart

Validate an example request:

cargo run --bin agent-runner -- validate \
  --request examples/pytest-request.json

Run a single request directly:

cargo run --bin agent-runner -- run \
  --request examples/pytest-request.json \
  --store-dir ./agent-runner-runs \
  --backend local-process \
  --out result.json

Inspect a captured denial or timeout:

cargo run --bin agent-runner -- run \
  --request examples/disallowed-shell-request.json

cargo run --bin agent-runner -- run \
  --request examples/timeout-request.json

Replay from stored request metadata:

cargo run --bin agent-runner -- replay \
  --run agent-runner-runs/runs/<job_id>

run and replay print a RunJobResult JSON document unless --out is provided. They exit 0 when the runner successfully produces a result, including captured non-zero command exits, policy denials, and timeouts. Non-zero CLI exits are reserved for operational failures such as invalid JSON, an unreadable request file, validation failure in validate, or an unwritable --out path.
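The exit-code rule above can be sketched as a small mapping. This is an illustration only: `CliOutcome` and `exit_code` are hypothetical names, not the crate's types, and the specific non-zero value `1` is an assumption.

```rust
/// Hypothetical outcome of one CLI invocation; not the crate's real types.
#[derive(Debug)]
pub enum CliOutcome {
    /// The runner produced a RunJobResult, whatever its `status` field says.
    ResultProduced { status: String },
    /// The CLI itself failed, e.g. invalid JSON or an unwritable --out path.
    OperationalFailure { reason: String },
}

/// Exit 0 whenever a result document exists; non-zero only for operational failures.
pub fn exit_code(outcome: &CliOutcome) -> i32 {
    match outcome {
        CliOutcome::ResultProduced { .. } => 0,
        // Assumption: the source only says "non-zero"; 1 is a placeholder.
        CliOutcome::OperationalFailure { .. } => 1,
    }
}
```

The practical consequence is that a wrapper script cannot rely on the process exit code to detect a denied or failed command; it has to read `status` and `exit_code` from the result JSON.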

Example Request

By default, requests use explicit argv arrays rather than shell strings:

{
  "trace": {
    "trace_id": "tr_example",
    "agent_id": "example-agent",
    "workspace_id": "repo_current"
  },
  "workspace": {
    "source": "local_path",
    "path": ".",
    "snapshot_strategy": "copy",
    "include": ["src/**", "tests/**", "pyproject.toml", "pytest.ini"],
    "exclude": [".git/**", ".venv/**", "target/**", "node_modules/**"]
  },
  "command": {
    "argv": ["python", "-m", "pytest", "-q"],
    "cwd": ".",
    "env": {
      "PYTHONPATH": "."
    }
  },
  "policy": {
    "allowed_commands": ["python", "pytest"],
    "allow_shell": false,
    "network": "disabled",
    "write_policy": "workspace_only",
    "allowed_env": ["PYTHONPATH"],
    "artifact_globs": ["coverage.xml", "test-results/**"]
  },
  "limits": {
    "timeout_ms": 120000,
    "max_stdout_bytes": 1048576,
    "max_stderr_bytes": 1048576,
    "max_artifact_bytes": 10485760,
    "max_artifact_file_bytes": 5242880,
    "max_artifact_count": 128,
    "max_run_record_bytes": 2097152,
    "memory_mb": 1024,
    "cpu_count": 2
  },
  "backend": {
    "kind": "local_process"
  }
}

Example Result

The result shape is the same for the library, CLI, and HTTP API:

{
  "job_id": "job_7f4f0b7e0b014a9da52e93e2eb44f16a",
  "status": "completed",
  "backend": "local_process",
  "started_at": "2026-05-10T20:00:00Z",
  "ended_at": "2026-05-10T20:00:01Z",
  "duration_ms": 1000,
  "command": {
    "argv": ["python", "-m", "pytest", "-q"],
    "cwd": "."
  },
  "exit_code": 0,
  "stdout": {
    "text": "3 passed\n",
    "truncated": false,
    "sha256": "..."
  },
  "stderr": {
    "text": "",
    "truncated": false,
    "sha256": "..."
  },
  "artifacts": [
    {
      "path": "coverage.xml",
      "kind": "file",
      "size_bytes": 1234,
      "sha256": "...",
      "storage_uri": "file://agent-runner-runs/runs/job_.../artifacts/coverage.xml"
    }
  ],
  "replay": {
    "request_sha256": "...",
    "workspace_snapshot_sha256": "...",
    "result_sha256": "..."
  },
  "policy_decision": {
    "action": "allow",
    "policy_version": "sha256:...",
    "messages": []
  }
}
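The `truncated` flags in the result above follow from the `max_stdout_bytes` and `max_stderr_bytes` limits. A minimal std-only sketch of capped capture is shown here; `CapturedOutput` and `capture_bounded` are hypothetical names, hashing is omitted, and the crate's own capture in src/output.rs may work differently.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the `stdout`/`stderr` objects in a result.
#[derive(Debug, PartialEq)]
pub struct CapturedOutput {
    pub text: String,
    pub truncated: bool,
}

/// Retain at most `max_bytes` from `reader`, but keep draining to end of
/// stream so a child process never blocks on a full pipe.
pub fn capture_bounded<R: Read>(mut reader: R, max_bytes: usize) -> io::Result<CapturedOutput> {
    let mut kept: Vec<u8> = Vec::new();
    let mut truncated = false;
    let mut buf = [0u8; 8192];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // `kept.len()` never exceeds `max_bytes`, so this cannot underflow.
        let room = max_bytes - kept.len();
        let take = n.min(room);
        kept.extend_from_slice(&buf[..take]);
        if take < n {
            truncated = true; // bytes beyond the cap are read and discarded
        }
    }
    // Lossy decoding keeps the sketch simple: a cut inside a multi-byte
    // character is replaced with U+FFFD rather than rejected.
    let text = String::from_utf8_lossy(&kept).into_owned();
    Ok(CapturedOutput { text, truncated })
}
```

In a real runner this would typically run once per stream, concurrently, so that a chatty stderr cannot stall stdout.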

Policy denials, setup failures, backend unavailability, and timeouts use the same result envelope with a stable error object:

{
  "status": "policy_denied",
  "error": {
    "code": "policy.shell_denied",
    "message": "shell execution is denied by policy",
    "details": {
      "argv0": "sh"
    }
  }
}
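A denial like the one above could come from a check along these lines. This is a sketch under stated assumptions: `PolicyError`, `SHELLS`, and `check_command` are hypothetical, the shell list is illustrative, and matching on the file name of argv[0] is an assumption about how the allowlist compares commands.

```rust
use std::path::Path;

/// Hypothetical errors modeled on the `policy.shell_denied` example.
#[derive(Debug, PartialEq)]
pub enum PolicyError {
    EmptyArgv,
    ShellDenied { argv0: String },
    CommandNotAllowed { argv0: String },
}

/// Illustrative shell list; the runner's real list is not documented here.
const SHELLS: &[&str] = &["sh", "bash", "zsh", "dash", "ksh", "fish"];

/// Check argv[0] against shell denial first, then against the allowlist.
pub fn check_command(
    argv: &[&str],
    allowed_commands: &[&str],
    allow_shell: bool,
) -> Result<(), PolicyError> {
    let argv0: &str = argv.first().copied().ok_or(PolicyError::EmptyArgv)?;
    // Assumption: compare on the file name so "/bin/sh" is treated like "sh".
    let name = Path::new(argv0)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or(argv0);
    if !allow_shell && SHELLS.contains(&name) {
        return Err(PolicyError::ShellDenied { argv0: argv0.to_string() });
    }
    if !allowed_commands.contains(&name) {
        return Err(PolicyError::CommandNotAllowed { argv0: argv0.to_string() });
    }
    Ok(())
}
```

Checking the shell rule before the allowlist means a shell invocation gets the more specific error code even when the shell is also missing from `allowed_commands`.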

See docs/api.md for the full contract.

HTTP API Quickstart

Start the service:

cargo run --bin agent-runner -- serve \
  --addr 127.0.0.1:3000 \
  --store-dir ./agent-runner-runs \
  --queue-capacity 1024

Submit and poll a job:

curl -sS -X POST http://127.0.0.1:3000/v1/jobs \
  -H 'content-type: application/json' \
  -d @examples/pytest-request.json

curl -sS http://127.0.0.1:3000/v1/jobs/<job_id>
curl -sS http://127.0.0.1:3000/v1/jobs/<job_id>/artifacts

Routes:

GET /health returns { "status": "ok" }.
POST /v1/jobs returns 202 with { "job_id": "...", "status": "queued" }.
GET /v1/jobs/{job_id} returns queued, running, or terminal state.
GET /v1/jobs/{job_id}/artifacts returns artifact metadata for terminal jobs.

Request shape errors return 422. Queue backpressure returns 503. Policy denials are terminal RunJobResult objects, not HTTP 500s.
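The backpressure rule can be illustrated with a bounded channel. The sketch below uses `std::sync::mpsc::sync_channel` as a stand-in for the crate's async queue; `new_queue` and `submit` are hypothetical names, and treating a disconnected worker as 503 is an assumption.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Create a bounded queue of job ids; `capacity` plays the role of --queue-capacity.
pub fn new_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Try to enqueue without blocking and return the HTTP status to send back.
pub fn submit(queue: &SyncSender<String>, job_id: String) -> u16 {
    match queue.try_send(job_id) {
        Ok(()) => 202,                              // accepted and queued
        Err(TrySendError::Full(_)) => 503,          // backpressure: queue is full
        Err(TrySendError::Disconnected(_)) => 503,  // assumption: no worker is also "unavailable"
    }
}
```

The key design choice is `try_send` rather than a blocking send: a full queue is reported to the caller immediately instead of holding the HTTP request open.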

Run Storage Layout

By default, agent-runner writes under ./agent-runner-runs:

agent-runner-runs/
+-- runs/
    +-- <job_id>/
        +-- request.json
        +-- result.json
        +-- status.json
        +-- stdout.txt
        +-- stderr.txt
        +-- snapshot/
        +-- artifacts/
            +-- ...

request.json is the normalized request with the assigned job id. result.json is the terminal RunJobResult. stdout.txt and stderr.txt mirror the retained, capped output. snapshot/ is the copied workspace used for execution, unless a separate snapshot root is configured. artifacts/ contains only the accepted artifact files copied out after execution.
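For callers that read run records directly from disk, the layout above can be captured in a small path helper. This is a sketch, not the crate's API: `RunPaths` and `run_paths` are hypothetical, and the helper assumes `job_id` is a runner-generated id with no path separators.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical view of one run directory under `<store_dir>/runs/<job_id>/`.
#[derive(Debug)]
pub struct RunPaths {
    pub root: PathBuf,
    pub request: PathBuf,
    pub result: PathBuf,
    pub status: PathBuf,
    pub stdout: PathBuf,
    pub stderr: PathBuf,
    pub snapshot: PathBuf,
    pub artifacts: PathBuf,
}

/// Build the paths for a run; nothing is created or checked on disk.
pub fn run_paths(store_dir: &Path, job_id: &str) -> RunPaths {
    let root = store_dir.join("runs").join(job_id);
    RunPaths {
        request: root.join("request.json"),
        result: root.join("result.json"),
        status: root.join("status.json"),
        stdout: root.join("stdout.txt"),
        stderr: root.join("stderr.txt"),
        snapshot: root.join("snapshot"),
        artifacts: root.join("artifacts"),
        root,
    }
}
```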

Policy Model

Runner policy is intentionally narrower than caller or gateway policy. The runner enforces:

command allowlists with optional detailed executable policies;
shell denial by default;
relative command.cwd inside the workspace snapshot;
deny-by-default child environment with explicit allowed_env;
timeout and output byte caps;
artifact include, deny, count, per-file byte, and total byte limits;
backend selection and backend-specific request constraints.

Gateway, orchestrator, or product policy should own identity, authentication, user approval, budgets, and broader business rules. If a gateway fronts the runner, record the gateway decision in trace metadata and let the runner still enforce its own command and artifact policy.
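The deny-by-default child environment from the list above can be sketched as a filter over the requested variables. `filter_env` is a hypothetical helper; whether the real runner drops disallowed keys or denies the whole request is not specified here, so this sketch drops them and reports which keys were removed.

```rust
use std::collections::BTreeMap;

/// Keep only the requested variables whose keys appear in `allowed_env`.
/// Returns the child environment and the list of dropped keys.
pub fn filter_env(
    requested: &BTreeMap<String, String>,
    allowed_env: &[&str],
) -> (BTreeMap<String, String>, Vec<String>) {
    let mut child_env = BTreeMap::new();
    let mut dropped = Vec::new();
    for (key, value) in requested {
        if allowed_env.contains(&key.as_str()) {
            child_env.insert(key.clone(), value.clone());
        } else {
            // Assumption: disallowed keys are dropped and reported, not fatal.
            dropped.push(key.clone());
        }
    }
    (child_env, dropped)
}
```

With `std::process::Command`, the filtered map would be applied after `env_clear()` via `envs(&child_env)`, so nothing from the runner's own environment leaks into the child.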

Artifact Handling

Artifacts are opt-in through policy.artifact_globs. Collection is defensive:

paths are normalized relative to the snapshot root;
absolute paths, .., and symlink escapes are rejected;
default protected paths cover secrets and dependency caches;
caller deny globs are applied in addition to defaults;
count, per-file, and total-size limits are enforced;
every accepted file is copied to runs/<job_id>/artifacts/ and hashed;
artifact metadata is returned through RunJobResult.artifacts.

Artifact collection errors become stable runner errors such as policy.artifact_denied or run.artifact_collection_failed.
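The lexical part of the path rules above (relative paths only, no `..`) can be sketched with `std::path::Component`. `ArtifactPathError` and `normalize_artifact_path` are hypothetical names. The sketch does not cover symlink escapes, which need a filesystem check such as canonicalizing the candidate and confirming it still sits under the snapshot root.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical reasons for rejecting a candidate artifact path.
#[derive(Debug, PartialEq)]
pub enum ArtifactPathError {
    Absolute,
    ParentTraversal,
    Empty,
}

/// Normalize `raw` to a path relative to the snapshot root, rejecting
/// absolute paths and any `..` component. Purely lexical: no disk access.
pub fn normalize_artifact_path(raw: &str) -> Result<PathBuf, ArtifactPathError> {
    let mut clean = PathBuf::new();
    for component in Path::new(raw).components() {
        match component {
            Component::Normal(part) => clean.push(part),
            Component::CurDir => {} // "./x" is the same as "x"
            Component::ParentDir => return Err(ArtifactPathError::ParentTraversal),
            Component::RootDir | Component::Prefix(_) => {
                return Err(ArtifactPathError::Absolute)
            }
        }
    }
    if clean.as_os_str().is_empty() {
        return Err(ArtifactPathError::Empty);
    }
    Ok(clean)
}
```

Rejecting every `..` outright, rather than resolving it, is the conservative choice: `a/../b` is harmless lexically but can still escape through a symlinked `a`.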

Firecracker Support

Firecracker is the preferred production backend target for hostile or untrusted jobs, but real microVM execution is not implemented in this version. The current backend is a design plus a skeleton that keeps Firecracker behind the same public contract, rejects request-supplied kernel/rootfs/image paths, validates cheap host prerequisites where possible, and returns backend_unavailable with a stable error on unsupported or unconfigured hosts.

A production Firecracker backend will require:

Linux on a supported architecture;
readable and writable /dev/kvm;
matched firecracker and jailer binaries;
root-owned, non-world-writable kernel, rootfs, image, and jail directories;
cgroup v1 or v2 support;
vsock support;
an immutable rootfs containing a guest runner service;
server-side profile allowlists for kernel, rootfs, boot args, uid/gid, cgroups, and default no-network behavior.
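One of the cheaper prerequisites in the list above, a readable and writable /dev/kvm, can be probed by simply opening the device. The sketch below is an assumption about how such a check might look; `BackendUnavailable` and `check_kvm_device` are hypothetical names, and the device path is a parameter so the check can be exercised on hosts without KVM.

```rust
use std::fs::OpenOptions;
use std::path::Path;

/// Hypothetical error mirroring the stable `backend_unavailable` code.
#[derive(Debug, PartialEq)]
pub struct BackendUnavailable {
    pub code: &'static str,
    pub message: String,
}

/// Succeeds only if `device` can be opened for both reading and writing.
pub fn check_kvm_device(device: &Path) -> Result<(), BackendUnavailable> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .open(device)
        .map(|_| ()) // the handle is dropped immediately; this is only a probe
        .map_err(|err| BackendUnavailable {
            code: "backend_unavailable",
            message: format!(
                "{} is not readable and writable: {}",
                device.display(),
                err
            ),
        })
}
```

A caller would pass `Path::new("/dev/kvm")` on a Linux host. Passing this probe says nothing about the remaining prerequisites, such as matched binaries or directory ownership.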

See docs/firecracker-backend.md for the design, jailer strategy, workspace-drive model, vsock protocol, and cleanup path.

Testing

Run the normal local quality bar:

cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets
cargo test --doc

Validate direct RunJobRequest examples:

cargo run --bin agent-runner -- validate --request examples/pytest-request.json
cargo run --bin agent-runner -- validate --request examples/timeout-request.json

examples/disallowed-shell-request.json is a runtime policy-denial example, so run should return status = "policy_denied" with error.code = "policy.shell_denied". examples/mcp-sandbox-run.json is a JSON-RPC wrapper; the embedded params.arguments.request object is validated by the acceptance tests.

Real Firecracker tests are ignored and require a configured Linux/KVM host:

cargo test --features firecracker -- --ignored

Standalone Use

Use the runner directly from Rust:

let result = secure_agent_runner::run_request(request, "./agent-runner-runs").await;

Or use the CLI/HTTP API from any agent, CI wrapper, eval harness, or MCP server:

agent-runner run --request request.json --store-dir ./agent-runner-runs

Only the RunJobRequest and RunJobResult contract is required. Callers can put their own ids and audit metadata into trace, then store returned results wherever their system expects.

Optional Integrations

cl-agent, gateway, MCP, and other product integrations are optional adapter concerns. They are not required for library, CLI, or HTTP use.

MCP sandbox/run: expose a tool that translates tool arguments into RunJobRequest, calls POST /v1/jobs, and returns the queued id or terminal RunJobResult.
Generic gateway policy handoff: let the gateway decide identity, auth, approval, budget, protected path, and route policy before forwarding an already-shaped runner request.
cl-agent verifier consumption: implement cl-agent's injectable CommandRunner by calling the runner and mapping RunJobResult.exit_code, stdout.text, and stderr.text back into verifier command results.

See docs/integrations.md and the optional examples in examples/ for sketches.

Repository Layout

src/model.rs - public JSON request/result contract.
src/validation.rs - syntactic validation and path rules.
src/policy.rs - command, shell, and env policy enforcement.
src/snapshot.rs - workspace snapshot creation and hashing.
src/output.rs - bounded concurrent output capture.
src/artifacts.rs - defensive artifact collection.
src/store.rs - persistent run records.
src/api.rs - axum HTTP routes and error mapping.
src/queue.rs - bounded async queue and HTTP status lifecycle.
src/backend/local_process.rs - first working backend.
src/backend/firecracker/ - production backend skeleton.
docs/ - API, threat model, Firecracker design, and optional integrations.
