[Feature] Isolated containers for each conversation #1269

The legacy "Local GUI" by default ran each conversation in its own container. This was slow and resource-intensive, and not where the industry went. But it remains a helpful option for the security-conscious!

The bulk of this document is AI-generated but is the fruit of many hours of kicking ideas around with an agent that could access the relevant codebases.

---

# Per-conversation container isolation — options

Goal: give each conversation its own container so tool calls in one can't affect
another, while keeping one backend and one conversation list in the frontend.

## Where we already are

- **Local backend**: one long-lived agent-server. All conversations share its
  process, filesystem, and `/workspace/project`. No isolation.
- **Cloud backend**: a control plane (`app.all-hands.dev`) owns the conversation
  index; each conversation gets a sandbox pod via a Runtime API. Frontend talks
  through `/api/cloud-proxy` and routes per-conversation traffic to the sandbox
  via `conversation_url` + `X-Session-API-Key`.

Local and cloud already sit at opposite corners of two orthogonal axes:
(a) where the conversation index lives, and (b) whether the agent-server is
long-lived or ephemeral per conversation. Real isolation puts you in cloud's
corner regardless of how you get there.

## Option 1 — Use OpenHands OSS as the backend

OpenHands OSS (`openhands/app_server/` in `OpenHands/OpenHands`) is,
structurally, the open-source cloud control plane. Its `/api/v1` surface —
`app-conversations`, `sandboxes` (with `SandboxStatus`, `ExposedUrl`,
`session_api_key`, `conversation_url`), `settings`, `secrets`, `git`, `events`
— uses the same model names, field names, and status enum values that
agent-canvas's "cloud" branches already speak. Its `DockerSandboxService`
already provisions a per-conversation agent-server container, allocates host
ports, sends event webhooks back to the control plane, and tracks each sandbox's
status (`STARTING`, `RUNNING`, `PAUSED`, `ERROR`, `MISSING`).
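
To make the shared vocabulary concrete, here is a minimal sketch of a sandbox record built from the status values and field names mentioned above. `SandboxInfo` and `is_routable` are hypothetical names for illustration, not the OSS definitions, and the "only `RUNNING` sandboxes receive traffic" rule is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SandboxStatus(str, Enum):
    # Status values as listed in the text; the real enum lives in OpenHands OSS.
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    ERROR = "ERROR"
    MISSING = "MISSING"


@dataclass
class SandboxInfo:
    # Hypothetical record; field names follow the ones mentioned in the text.
    status: SandboxStatus
    conversation_url: Optional[str] = None
    session_api_key: Optional[str] = None


def is_routable(sandbox: SandboxInfo) -> bool:
    """Assumed rule: only a RUNNING sandbox with a URL can receive traffic."""
    return sandbox.status is SandboxStatus.RUNNING and bool(sandbox.conversation_url)
```

A frontend-side check like `is_routable` is one place where matching status values between OSS and the existing cloud branches would pay off.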

- **Frontend:** registers as `kind: "cloud"` in agent-canvas's "Manage backends"
  UI. One small fix is likely needed: OSS authenticates with
  `X-Session-API-Key` (set via the `SESSION_API_KEY` env var), while
  agent-canvas's `callCloudProxy` defaults to `Authorization: Bearer`. A
  per-backend auth-mode setting is a tiny change.
- **Reuses:** nearly everything. Most of the work was already done by the
  OpenHands team.
- **Adds:** nothing on the backend side.
- **Remote-VM story:** [OpenHands/OpenHands#13516](https://github.com/OpenHands/OpenHands/pull/13516)
  adds a WebSocket gateway that routes all sandbox traffic through the single
  app-server port. With it enabled (`ENABLE_WEBSOCKET_GATEWAY=true`), the
  browser only ever talks to the OpenHands app-server's URL — no per-sandbox
  ports to expose, no wildcard DNS, no ingress work. That solves the
  hardest piece of "run this on a VM behind one URL." The sibling
  [#13551](https://github.com/OpenHands/OpenHands/pull/13551) adds the same
  treatment for VS Code, and [#12591](https://github.com/OpenHands/OpenHands/pull/12591)
  is an earlier take on the K8s side.
- **Trade-off:** the WS-gateway PR is open with requested changes and not yet
  merged. Until it lands (or we ship a fork), VM deployments either expose
  random sandbox ports (OSS default, `OH_SANDBOX_CONTAINER_URL_PATTERN`) or
  add their own ingress.
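
The per-backend auth toggle mentioned in the Frontend bullet could look roughly like the sketch below. It is written in Python for illustration only; the real change would live in agent-canvas's `callCloudProxy`. The `AuthMode` type and `auth_headers` function are hypothetical names, while the two header names come from the text above.

```python
from typing import Dict, Literal

# Hypothetical per-backend setting; not an existing agent-canvas option.
AuthMode = Literal["bearer", "session-api-key"]


def auth_headers(mode: AuthMode, credential: str) -> Dict[str, str]:
    """Return the auth header for one backend, chosen by its auth mode."""
    if mode == "bearer":
        # Current default described in the text.
        return {"Authorization": f"Bearer {credential}"}
    if mode == "session-api-key":
        # What an OSS deployment expects.
        return {"X-Session-API-Key": credential}
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Keeping the mode on the backend record, rather than guessing from the URL, means an existing cloud backend keeps sending bearer tokens unchanged.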

## Option 2 — Standalone control plane (k8s "cloud-shaped" backend)

Ship a small FastAPI service that mimics the cloud control plane: owns the
conversation index in `meta.json`/`sandbox.json` on a PVC, provisions a pod per
conversation via `kube-apiserver`, exposes `sandbox_status` + `conversation_url`
to the frontend. Pods stream events back via the agent-server's existing
`webhooks` config.

- **Frontend:** registers as `kind: "cloud"`. No widening of `BackendKind`.
- **Reuses:** the frontend's existing cloud branches (~88 sites).
- **Adds:** a new component (the control plane) + helm chart.
- **Persistence:** mirror of local agent-server's JSON-on-disk layout
  (`FileSettingsStore`, `EventLog`, `meta.json` per conv) on the control-plane
  PVC. No SQLite. Events flow control-plane-ward via webhooks so they survive
  pod death.
- **Trade-off:** structurally clean and matches cloud, but it's a separate
  service to build, deploy, and maintain — and it's mostly re-implementing
  what Option 1 (OpenHands OSS) already provides.
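
As a rough sketch of the JSON-on-disk persistence described above, the following keeps one directory per conversation and rebuilds the index by scanning it. The `meta.json` and `sandbox.json` file names come from the text; the `events.jsonl` file, the field names, and all function names are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_conversation(root: Path, conv_id: str, meta: Dict, sandbox: Dict) -> None:
    """Persist one conversation as <root>/<conv_id>/{meta,sandbox}.json."""
    conv_dir = root / conv_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    (conv_dir / "meta.json").write_text(json.dumps(meta))
    (conv_dir / "sandbox.json").write_text(json.dumps(sandbox))


def append_event(root: Path, conv_id: str, event: Dict) -> None:
    """Append a webhook-delivered event as one JSON line (assumed format)."""
    conv_dir = root / conv_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    with (conv_dir / "events.jsonl").open("a") as fh:
        fh.write(json.dumps(event) + "\n")


def list_conversations(root: Path) -> List[Dict]:
    """Rebuild the conversation index by scanning the directory tree."""
    index = []
    for meta_path in sorted(root.glob("*/meta.json")):
        entry = dict(json.loads(meta_path.read_text()))
        entry["id"] = meta_path.parent.name
        entry["sandbox"] = json.loads((meta_path.parent / "sandbox.json").read_text())
        index.append(entry)
    return index


# Usage: a throwaway directory stands in for the control-plane PVC mount.
root = Path(tempfile.mkdtemp())
write_conversation(root, "conv-a", {"title": "demo"},
                   {"sandbox_status": "RUNNING", "conversation_url": "http://sandbox-a"})
append_event(root, "conv-a", {"kind": "message"})
index = list_conversations(root)
```

Because events are written on the control-plane side, the log in this sketch outlives whichever pod produced them.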

## Option 3 — Thin "tool runner" image

Build a stripped agent-server image that exposes only the tool surface
(`/api/bash/*`, `/api/files/*`, `/api/git/*`) — no `/api/conversations/*`, no
`/sockets`, no agent loop. Outer agent-server keeps owning the conversation
list and event store, but its tools dispatch over HTTP into per-conversation
runner containers.

- **Frontend:** zero changes. Stays in local mode.
- **Reuses:** outer agent-server's existing persistence verbatim.
- **Adds:** a new server image + a new `RemoteWorkspace`-shaped client in the
  SDK that doesn't depend on `/api/conversations/*`.
- **Trade-off:** smallest per-conversation footprint, but real SDK work and a
  new image to maintain. Doesn't exist today.
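
One way to picture the split is an allowlist on the runner side plus a per-conversation lookup on the outer server. The sketch below assumes a hypothetical in-memory `runners` registry and made-up function names; only the three path prefixes come from the text, and the paths under them are left to the caller.

```python
from typing import Dict

# Path prefixes the stripped runner would serve, per the list above.
TOOL_PREFIXES = ("/api/bash/", "/api/files/", "/api/git/")


def is_tool_path(path: str) -> bool:
    """True if the path is part of the tool surface; everything else is refused."""
    return path.startswith(TOOL_PREFIXES)


def runner_url(runners: Dict[str, str], conversation_id: str, path: str) -> str:
    """Map a tool call to its conversation's runner (registry is hypothetical)."""
    if not is_tool_path(path):
        raise PermissionError(f"not part of the tool surface: {path}")
    base = runners[conversation_id].rstrip("/")
    return base + path
```

A real implementation would also need to normalise paths before the prefix check; this sketch skips that.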

## Option 4 — Nest agent-servers

Make the agent-server pluggable enough to host conversations that live on
*other* agent-servers. One aggregator process speaks the existing agent-server
HTTP API to the frontend; internally it spawns a downstream agent-server per
conversation (Docker, k8s, pre-warmed pool, whatever) and forwards
per-conversation HTTP + WebSocket traffic to it. Settings/secrets live on the
aggregator; downstreams pull them via the existing `LookupSecret` URL pattern.

- **Frontend:** zero changes. One backend, one conversation list.
- **Reuses:** all existing agent-server persistence code, the `cloud_proxy`
  forwarding skeleton, the per-conversation `PubSub` as a free event-mirror
  tap, `RemoteConversation`'s attach-by-ID resume path.
- **Adds (inside the agent-server):**
  1. A pluggable `ConversationService` interface with `Local` + `Remote` +
     `Composite` implementations.
  2. A conversation-aware HTTP forwarder for `/api/conversations/{id}/*`.
  3. A WebSocket proxy for `/sockets/conversation/{id}/events` (the only
     genuinely new code — frame-pumping with reconnect + handshake replay).
  4. A `ServerProvisioner` abstraction (`Docker`, `K8s`, `Pool`, `Existing`).
- **Trade-off:** the cloud/local divide collapses in the frontend, but the
  real work is inside the agent-server itself, including one piece (WS proxy)
  with non-trivial bug surface. Note: OpenHands OSS's WS gateway PR
  ([#13516](https://github.com/OpenHands/OpenHands/pull/13516)) is roughly
  the same primitive at the app-server layer.
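
The pluggable `ConversationService` idea from item 1 can be sketched as a small protocol with a composite that routes by conversation ID. The method names and the in-memory stand-in below are assumptions for illustration; a real `Remote` implementation would forward over HTTP and WebSocket to a downstream agent-server.

```python
from typing import List, Protocol


class ConversationService(Protocol):
    # Hypothetical minimal interface; the real one would be much wider.
    def list_ids(self) -> List[str]: ...
    def send(self, conversation_id: str, message: str) -> str: ...


class InMemoryService:
    """Stand-in for a `Local` or `Remote` implementation."""

    def __init__(self, name: str, ids: List[str]) -> None:
        self.name = name
        self._ids = list(ids)

    def list_ids(self) -> List[str]:
        return list(self._ids)

    def send(self, conversation_id: str, message: str) -> str:
        if conversation_id not in self._ids:
            raise KeyError(conversation_id)
        return f"{self.name}:{conversation_id}:{message}"


class CompositeService:
    """Presents several services as one conversation list and routes by ID."""

    def __init__(self, children: List[ConversationService]) -> None:
        self._children = children

    def list_ids(self) -> List[str]:
        return [cid for child in self._children for cid in child.list_ids()]

    def send(self, conversation_id: str, message: str) -> str:
        for child in self._children:
            if conversation_id in child.list_ids():
                return child.send(conversation_id, message)
        raise KeyError(conversation_id)
```

The frontend would only ever see the composite, which is what keeps it at one backend and one conversation list.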

## Option 5 — Per-tool-call ephemeral containers

Instead of a long-lived container per conversation, spin up a fresh container
per *tool invocation* (one bash command, one file edit, one git op). Container
lives milliseconds-to-seconds, dies, next call gets a fresh one. Workspace dir
is a host-mounted volume per conversation so writes persist across tool calls.

- **Frontend:** zero changes.
- **Isolation property:** tools can't even see each other's side effects in
  memory; every call is a hermetic environment. Strongest per-call isolation
  of any option here.
- **Trade-off:** cold-start latency on *every* tool call, even with image
  caching and warm pools. Stateful bash sessions (`source venv/bin/activate`
  then `python …` in the next call) get awkward — you either lose state or
  reimplement session continuity over container boundaries. Mostly interesting
  for security-critical deployments where re-execution cost is acceptable.
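
For a sense of what one call would involve, the sketch below builds (but does not run) a `docker run` command for a single tool invocation. The function name, the image name in the test, and the choice of `/workspace/project` as the in-container mount point are assumptions; `--rm`, `-v`, and `-w` are standard `docker run` flags.

```python
from typing import List


def tool_call_argv(image: str, workspace_dir: str, command: str) -> List[str]:
    """Build a `docker run` invocation for one tool call (not executed here)."""
    return [
        "docker", "run",
        "--rm",                                      # remove the container when the call exits
        "-v", f"{workspace_dir}:/workspace/project", # host dir persists writes across calls
        "-w", "/workspace/project",                  # start in the mounted workspace
        image,
        "bash", "-c", command,
    ]
```

Each call starts a fresh shell, so state such as an activated virtualenv or exported variables does not carry over, which is the session-continuity problem described in the trade-off.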

## Option 6 — VM-level isolation (Firecracker / Kata / gVisor)

Same architecture as Option 1, 2, or 4, but the per-conversation sandbox is a
microVM (Firecracker, Kata) or a gVisor-wrapped container instead of a plain
container. Provisioner picks the runtime; nothing above it cares.

- **Frontend:** zero changes.
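- **Isolation property:** the workload no longer talks to the host kernel
  directly. A microVM brings its own guest kernel, and gVisor interposes a
  user-space kernel on syscalls. Both defend against container-escape CVEs
  and kernel-syscall bugs in a way namespace-only containers can't.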
- **Trade-off:** seconds of cold start, hundreds of MB resident per
  conversation, more infrastructure (image build pipeline, snapshot tooling).
  Mostly "Option 1 / 2 / 4 with a different provisioner" rather than its own
  thing.

## Option 7 — OS-level sandboxing in one process (seccomp / landlock / AppArmor)

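Don't spawn any containers. Keep the single-process local agent-server, but run
each conversation's tool calls under a per-conversation seccomp filter + landlock
ruleset that restricts syscalls and filesystem access to that conversation's
working directory. Neither restriction can be lifted once installed, so the
policy has to be applied in the tool subprocess, not in the agent-server itself.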

- **Frontend:** zero changes.
- **Isolation property:** stops accidental and most casually malicious
  cross-contamination at the syscall layer. Cheap: no per-conversation
  container or long-lived process.
- **Trade-off:** Linux-only; seccomp/landlock policies are tricky to write
  correctly and easy to under-restrict; doesn't isolate memory or network
  effects between conversations; non-trivial to compose with browser/MCP tools
  that themselves spawn subprocesses. Realistic as a *complement* to the
  status quo (raise the bar without a structural redesign), not a replacement
  for container-level isolation.

## Option 8 — Copy-on-write workspace per conversation, shared process

Use a COW filesystem (overlayfs, btrfs subvolumes, ZFS clones) to give each
conversation its own snapshot of `/workspace/project` while the agent-server
itself stays single-process. Deleting a conversation discards its snapshot.

- **Frontend:** zero changes.
- **Isolation property:** persistent filesystem effects can't bleed between
  conversations, and you get free "reset workspace" semantics. But this
  isolates *files*, not *processes* — a runaway bash command can still affect
  global system state, kill sibling processes, exhaust memory, etc.
- **Trade-off:** mostly addresses "I committed to the wrong conversation's
  workspace" rather than "this conversation could attack the host." It is a
  low-cost ergonomic improvement, not a security boundary.
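
With overlayfs, the per-conversation snapshot amounts to one mount whose lower layer is the shared project and whose upper layer is private to the conversation. The sketch below only builds the `mount` command; the directory layout (`upper`, `work`, `merged` under a per-conversation root) and the function name are assumptions.

```python
from pathlib import PurePosixPath
from typing import List


def overlay_mount_argv(base: str, conv_root: str) -> List[str]:
    """Build the `mount` command for one conversation's overlay (not executed)."""
    root = PurePosixPath(conv_root)
    upper = root / "upper"    # receives this conversation's writes
    work = root / "work"      # scratch dir overlayfs requires, same filesystem as upper
    merged = root / "merged"  # what the conversation sees as its workspace
    options = f"lowerdir={base},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, str(merged)]
```

Resetting a workspace in this layout would mean unmounting and clearing the `upper` directory, leaving the shared lower layer untouched.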

## Network topology (remote / VM deployment)

When the isolated-container backend runs on a remote VM (not the user's
laptop), the *browser* needs to reach two things:

1. The control-plane / aggregator API (HTTP).
2. The per-conversation sandbox, for at least the WebSocket
   `/sockets/conversation/<id>/events` and the sandbox-direct REST calls
   (`hostOverride` calls in `callCloudProxy`).

Item 1 is easy — agent-canvas already proxies cloud-mode HTTP through the
local bundled agent-server's `/api/cloud-proxy`. The hard one is item 2:
sandbox URLs must be browser-reachable somehow.

| Approach | Infra on the VM | Browser-facing URLs | Cost |
|---|---|---|---|
| **Expose random ports** (OpenHands OSS default) | Open port range in firewall | `http://vm:30001`, `http://vm:30002`, … | No HTTPS without per-port termination; CORS per port; mixed-content blocks if frontend is HTTPS |
| **Subdomain per sandbox** | Wildcard DNS + reverse proxy + wildcard TLS | `https://abc.sandbox.vm.example.com` | Cleanest browser story; needs DNS infra and a wildcard cert |
| **Path-prefix ingress** | Reverse proxy (nginx/caddy/traefik) with dynamic routing | `https://vm.example.com/sandbox/abc/…` | One cert, one origin; routing config has to update as sandboxes come/go |
| **Proxy through the app-server / aggregator** | None beyond the app-server | `https://vm.example.com/…` only | Needs an HTTP+WS reverse proxy inside the app-server |

The last row is exactly what [OpenHands/OpenHands#13516](https://github.com/OpenHands/OpenHands/pull/13516)
ships for the WebSocket side, and what [#13551](https://github.com/OpenHands/OpenHands/pull/13551)
ships for VS Code. Once those land (or are picked up via fork/cherry-pick),
Option 1 — Use OpenHands OSS — gets the clean one-port one-origin story
out of the box. Until then, OSS-on-a-VM still needs one of the ingress
patterns above.

The same proxy primitive is also Option 4's load-bearing piece of new code.
For VM deployments, that work is no longer "extra cost" — it's the answer.
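
Whether it lives in an external ingress or inside the app-server, the single-origin approach comes down to mapping a request path to a sandbox's internal address. Below is a minimal sketch of that lookup, assuming the `/sandbox/<id>/…` prefix from the table and a hypothetical in-memory `routes` dict; it deliberately leaves out the HTTP and WebSocket forwarding itself.

```python
from typing import Dict, Optional, Tuple


def split_sandbox_path(path: str, prefix: str = "/sandbox/") -> Optional[Tuple[str, str]]:
    """Split '/sandbox/<id>/<rest>' into (id, '/<rest>'); None if it doesn't match."""
    if not path.startswith(prefix):
        return None
    sandbox_id, _, rest = path[len(prefix):].partition("/")
    if not sandbox_id:
        return None
    return sandbox_id, "/" + rest


def upstream_url(routes: Dict[str, str], path: str) -> Optional[str]:
    """Resolve a browser-facing path to the sandbox's internal URL, if known."""
    parsed = split_sandbox_path(path)
    if parsed is None:
        return None
    sandbox_id, rest = parsed
    base = routes.get(sandbox_id)
    if base is None:
        return None
    return base.rstrip("/") + rest
```

The routing table has to track sandboxes as they come and go, which is the maintenance cost the table attributes to path-prefix ingress and the reason an in-process proxy that already knows the sandbox list is attractive.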

## Non-options we ruled out

- **`DockerWorkspace` + `Conversation` per conversation in-process.** Looked
  like a one-line config flip but isn't: wrapping a conversation in a
  `DockerWorkspace` produces a `RemoteConversation`, which means the entire
  conversation (agent loop, LLM calls, events) lives in the container. The
  outer process becomes a thin RPC client — structurally identical to Option 1
  but assembled ad-hoc.
- **A new `BackendKind` in agent-canvas.** Widening `"local" | "cloud"` would
  fan out to ~88 branch sites for no gain. Whatever ships should reuse one of
  the two existing kinds.
- **Bind-mounting host directories into pods.** Only works in a single-node
  kind cluster with `extraMounts`, doesn't generalise, and runs into the
  agent-server's own "flock does not work reliably on NFS" warning if you push it.

## Comparison

Options 1–4 are architectural — they answer "where does conversation state
live and what runs where." Options 5–8 are mostly isolation-primitive choices
that compose with the architecture above.

| Option                              | Isolation strength | Per-conv overhead | Frontend changes | New components | VM-deploy story |
|-------------------------------------|--------------------|-------------------|------------------|----------------|-----------------|
| 1 — Use OpenHands OSS               | Container          | Container         | Small (auth)     | 0 (use OSS)    | Native once #13516 lands; ingress otherwise |
| 2 — Standalone control plane        | Container          | Pod               | None (reuses cloud branches) | 1 service | Needs ingress or own WS proxy |
| 3 — Thin tool-runner image          | Container (tools)  | Tools container   | None             | 1 image        | Native (outer owns origin) |
| 4 — Nested agent-servers            | Container          | Container         | None             | 0              | Native (aggregator owns origin) |
| 5 — Per-tool-call containers        | Container per call | Per call          | None             | 0–1            | Inherits from 1/2/4 |
| 6 — VM / Firecracker / gVisor       | Kernel boundary    | MicroVM           | None             | 0 (provisioner)| Inherits from 1/2/4 |
| 7 — seccomp / landlock in-process   | Syscall filter     | None              | None             | 0              | n/a (no extra origin) |
| 8 — COW workspace, shared process   | Filesystem only    | None              | None             | 0              | n/a (no extra origin) |

Options 5–8 largely slot into Options 1, 2, or 4: Option 6 is just "pick the
provisioner," Option 5 is a different granularity choice for the same
provisioner, and Options 7–8 are complementary hardening that can layer on top
of any architecture (including the status quo).

## Recommendation

**Option 1 — use OpenHands OSS as the backend.** It's the open-source version
of the cloud control plane, it speaks the API surface agent-canvas already
talks to, and the per-conversation Docker isolation is already implemented
and shipped. The only known gap is the WebSocket gateway for VM deployment,
and there's an active community PR
([OpenHands/OpenHands#13516](https://github.com/OpenHands/OpenHands/pull/13516))
that solves exactly that. Concretely:

1. Stand up OpenHands OSS on the VM with `ENABLE_WEBSOCKET_GATEWAY=true`
   (after #13516 lands, or by cherry-picking the branch).
2. Add a session-API-key auth toggle to agent-canvas's `callCloudProxy` so
   cloud-mode backends can authenticate with `X-Session-API-Key` instead of
   bearer.
3. Register the OSS instance as a `kind: "cloud"` backend in agent-canvas.

That's the path with the least new code. Option 4 (nest agent-servers) is
the cleanest long-term architectural answer if we ever outgrow OSS, but
nothing about Option 1 forecloses going there later — both surface the same
API to the frontend. Options 7 and 8 are worth keeping in mind as cheap
hardening layers, but neither is a substitute for the structural change.
