[Feature] Isolated containers for each conversation #1269

@rbren

The legacy "Local GUI" by default ran each conversation in its own container. This was slow and resource-intensive, and not where the industry went. But it remains a helpful option for the security-conscious!

The bulk of this document is AI-generated but is the fruit of many hours of kicking ideas around with an agent that could access the relevant codebases.


Per-conversation container isolation — options

Goal: give each conversation its own container so tool calls in one can't affect
another, while keeping one backend and one conversation list in the frontend.

Where we already are

  • Local backend: one long-lived agent-server. All conversations share its
    process, filesystem, and /workspace/project. No isolation.
  • Cloud backend: a control plane (app.all-hands.dev) owns the conversation
    index; each conversation gets a sandbox pod via a Runtime API. Frontend talks
    through /api/cloud-proxy and routes per-conversation traffic to the sandbox
    via conversation_url + X-Session-API-Key.

Local and cloud already sit at opposite corners of two orthogonal axes:
(a) where the conversation index lives, and (b) whether the agent-server is
long-lived or ephemeral per conversation. Real isolation puts you in cloud's
corner regardless of how you get there.

Option 1 — Use OpenHands OSS as the backend

OpenHands OSS (openhands/app_server/ in OpenHands/OpenHands) is,
structurally, the open-source cloud control plane. Its /api/v1 surface —
app-conversations, sandboxes (with SandboxStatus, ExposedUrl,
session_api_key, conversation_url), settings, secrets, git, events
— uses the same model names, field names, and status enum values that
agent-canvas's "cloud" branches already speak. Its DockerSandboxService
already provisions a per-conversation agent-server container, allocates host
ports, sends event webhooks back to the control plane, and tracks sandbox
lifecycle (STARTING, RUNNING, PAUSED, ERROR, MISSING).

  • Frontend: registers as kind: "cloud". One small fix likely needed:
    cloud-proxy auth toggle. OSS uses X-Session-API-Key (set via
    SESSION_API_KEY env var); agent-canvas's callCloudProxy defaults to
    Authorization: Bearer. Per-backend auth-mode is a tiny change.
  • Reuses: ~everything. Most of the work was already done by the OpenHands
    team. An OSS deployment registers as a kind: "cloud" backend in
    agent-canvas's "Manage backends" UI.
  • Adds: nothing on the backend side that doesn't already exist.
  • Remote-VM story: OpenHands/OpenHands#13516 adds a WebSocket gateway that
    routes all sandbox traffic through the single app-server port. With it
    enabled (ENABLE_WEBSOCKET_GATEWAY=true), the browser only ever talks to the
    OpenHands app-server's URL: no per-sandbox ports to expose, no wildcard
    DNS, no ingress work. That solves the hardest piece of "run this on a VM
    behind one URL." The sibling #13551 adds the same treatment for VS Code,
    and #12591 is an earlier take on the K8s side.
  • Trade-off: the WS-gateway PR is open with requested changes and not yet
    merged. Until it lands (or we ship a fork), VM deployments either expose
    random sandbox ports (OSS default, OH_SANDBOX_CONTAINER_URL_PATTERN) or
    add their own ingress.
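
The per-backend auth-mode toggle from the Frontend bullet could look roughly like the sketch below. It is written in Python only to show the logic; the real callCloudProxy lives in agent-canvas, and the names AuthMode and build_auth_headers are hypothetical, not taken from either codebase.

```python
from typing import Dict, Literal

# Hypothetical per-backend setting; agent-canvas has no such field today.
AuthMode = Literal["bearer", "session-api-key"]


def build_auth_headers(mode: AuthMode, credential: str) -> Dict[str, str]:
    """Return the single auth header to attach to one proxied request."""
    if mode == "bearer":
        # Current callCloudProxy default, per the issue text.
        return {"Authorization": f"Bearer {credential}"}
    if mode == "session-api-key":
        # What OpenHands OSS expects when SESSION_API_KEY is set.
        return {"X-Session-API-Key": credential}
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Keeping the mode on the backend record, rather than guessing from the URL, means a hosted cloud backend and an OSS backend can sit side by side in "Manage backends" without special cases.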

Option 2 — Standalone control plane (k8s "cloud-shaped" backend)

Ship a small FastAPI service that mimics the cloud control plane: owns the
conversation index in meta.json/sandbox.json on a PVC, provisions a pod per
conversation via kube-apiserver, exposes sandbox_status + conversation_url
to the frontend. Pods stream events back via the agent-server's existing
webhooks config.

  • Frontend: registers as kind: "cloud". No widening of BackendKind.
  • Reuses: the frontend's existing cloud branches (~88 sites).
  • Adds: a new component (the control plane) + helm chart.
  • Persistence: mirror of local agent-server's JSON-on-disk layout
    (FileSettingsStore, EventLog, meta.json per conv) on the control-plane
    PVC. No SQLite. Events flow control-plane-ward via webhooks so they survive
    pod death.
  • Trade-off: structurally clean and matches cloud, but it's a separate
    service to build, deploy, and maintain — and it's mostly re-implementing
    what Option 1 (OpenHands OSS) already provides.
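
A minimal sketch of the JSON-on-disk index described under Persistence, assuming one directory per conversation holding a meta.json. The directory layout and the function names write_meta and list_conversations are assumptions made for illustration; the agent-server's real store may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List


def write_meta(root: Path, conversation_id: str, meta: dict) -> Path:
    """Atomically write <root>/<conversation_id>/meta.json."""
    conv_dir = root / conversation_id
    conv_dir.mkdir(parents=True, exist_ok=True)
    target = conv_dir / "meta.json"
    # Write to a temp file in the same directory, then rename over the
    # target, so a crash never leaves a half-written meta.json behind.
    fd, tmp_name = tempfile.mkstemp(dir=conv_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(meta, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_name, target)
    return target


def list_conversations(root: Path) -> List[dict]:
    """Rebuild the conversation index by scanning the volume."""
    paths = sorted(root.glob("*/meta.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]
```

Because the index is rebuilt from the files, the control plane can restart without a database, which is the point of the "No SQLite" choice above.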

Option 3 — Thin "tool runner" image

Build a stripped agent-server image that exposes only the tool surface
(/api/bash/*, /api/files/*, /api/git/*) — no /api/conversations/*, no
/sockets, no agent loop. Outer agent-server keeps owning the conversation
list and event store, but its tools dispatch over HTTP into per-conversation
runner containers.

  • Frontend: zero changes. Stays in local mode.
  • Reuses: outer agent-server's existing persistence verbatim.
  • Adds: a new server image + a new RemoteWorkspace-shaped client in the
    SDK that doesn't depend on /api/conversations/*.
  • Trade-off: smallest per-conversation footprint, but real SDK work and a
    new image to maintain. Doesn't exist today.
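
To make "tools dispatch over HTTP" concrete, here is a rough sketch of how the outer agent-server might build a request to a runner. The route /api/bash/execute and the {"command": ...} payload are placeholders: the issue only names the /api/bash/* prefix, and no such client exists in the SDK today.

```python
import json
import urllib.request


def build_bash_request(runner_base_url: str, command: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a per-conversation runner."""
    # Hypothetical payload shape; the real tool surface may differ.
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        # Hypothetical route under the /api/bash/* prefix.
        url=runner_base_url.rstrip("/") + "/api/bash/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The outer process would keep a map from conversation ID to runner base URL and pass the result to urllib.request.urlopen (or an async client); nothing here touches /api/conversations/*.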

Option 4 — Nest agent-servers

Make the agent-server pluggable enough to host conversations that live on
other agent-servers. One aggregator process speaks the existing agent-server
HTTP API to the frontend; internally it spawns a downstream agent-server per
conversation (Docker, k8s, pre-warmed pool, whatever) and forwards
per-conversation HTTP + WebSocket traffic to it. Settings/secrets live on the
aggregator; downstreams pull them via the existing LookupSecret URL pattern.

  • Frontend: zero changes. One backend, one conversation list.
  • Reuses: all existing agent-server persistence code, the cloud_proxy
    forwarding skeleton, the per-conversation PubSub as a free event-mirror
    tap, RemoteConversation's attach-by-ID resume path.
  • Adds (inside the agent-server):
    1. A pluggable ConversationService interface with Local + Remote +
      Composite implementations.
    2. A conversation-aware HTTP forwarder for /api/conversations/{id}/*.
    3. A WebSocket proxy for /sockets/conversation/{id}/events (the only
      genuinely new code — frame-pumping with reconnect + handshake replay).
    4. A ServerProvisioner abstraction (Docker, K8s, Pool, Existing).
  • Trade-off: the cloud/local divide collapses in the frontend, but the
    real work is inside the agent-server itself, including one piece (WS proxy)
    with non-trivial bug surface. Note: OpenHands OSS's WS gateway PR (#13516)
    is roughly the same primitive at the app-server layer.
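
The routing decision at the heart of item 2 (the conversation-aware HTTP forwarder) can be sketched as a pure function. The downstreams map and the name resolve_upstream are hypothetical; the actual forwarding, streaming, and WebSocket proxying are deliberately left out.

```python
import re
from typing import Dict, Optional

# Matches /api/conversations/<id> and anything below it.
_CONVERSATION_PATH = re.compile(r"^/api/conversations/(?P<cid>[^/]+)(?:/.*)?$")


def resolve_upstream(path: str, downstreams: Dict[str, str]) -> Optional[str]:
    """Return the downstream URL for a request path, or None.

    None means the aggregator should handle the request itself: either
    the path is not per-conversation, or the conversation ID is unknown.
    """
    match = _CONVERSATION_PATH.match(path)
    if match is None:
        return None
    base = downstreams.get(match.group("cid"))
    if base is None:
        return None
    # The downstream speaks the same API, so the path is forwarded as-is.
    return base.rstrip("/") + path
```

For example, with downstreams = {"abc": "http://10.0.0.5:8000"}, the path /api/conversations/abc/events resolves to http://10.0.0.5:8000/api/conversations/abc/events, while /api/settings stays on the aggregator.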

Option 5 — Per-tool-call ephemeral containers

Instead of a long-lived container per conversation, spin up a fresh container
per tool invocation (one bash command, one file edit, one git op). Container
lives milliseconds-to-seconds, dies, next call gets a fresh one. Workspace dir
is a host-mounted volume per conversation so writes persist across tool calls.

  • Frontend: zero changes.
  • Isolation property: tools can't even see each other's side effects in
    memory; every call is a hermetic environment. Strongest per-call isolation
    of any option here.
  • Trade-off: cold-start latency on every tool call, even with image
    caching and warm pools. Stateful bash sessions (source venv/bin/activate
    then python … in the next call) get awkward — you either lose state or
    reimplement session continuity over container boundaries. Mostly interesting
    for security-critical deployments where re-execution cost is acceptable.
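
As a sketch of what one tool call would turn into, the function below builds a docker run command line with the conversation's workspace mounted from the host. The image name, host path, and function name are illustrative assumptions; only the /workspace/project mount point comes from the text above.

```python
from typing import List


def tool_call_argv(image: str, workspace_host_dir: str, command: str) -> List[str]:
    """Build the argv for running one bash command in a throwaway container."""
    return [
        "docker", "run",
        "--rm",  # remove the container as soon as the command exits
        "-v", f"{workspace_host_dir}:/workspace/project",  # files persist on the host
        "-w", "/workspace/project",
        image,
        "bash", "-c", command,
    ]
```

Two consecutive calls share files but nothing else: running `source venv/bin/activate` in one container has no effect on the next, which is the session-continuity problem noted in the trade-off.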

Option 6 — VM-level isolation (Firecracker / Kata / gVisor)

Same architecture as Option 1, 2, or 4, but the per-conversation sandbox is a
microVM (Firecracker, Kata) or a gVisor-wrapped container instead of a plain
container. Provisioner picks the runtime; nothing above it cares.

  • Frontend: zero changes.
  • Isolation property: real kernel boundary, not just namespaces. Defends
    against container-escape CVEs and kernel-syscall bugs in a way containers
    can't.
  • Trade-off: seconds of cold start, hundreds of MB resident per
    conversation, more infrastructure (image build pipeline, snapshot tooling).
    Mostly "Option 1 / 2 / 4 with a different provisioner" rather than its own
    thing.
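
"Provisioner picks the runtime" can be as small as one extra flag when Docker is the provisioner. The sketch below assumes the alternative runtime is already registered with the Docker daemon; "runsc" is the name gVisor's runtime is conventionally registered under, but the name on a given host is a deployment detail. The function name is hypothetical.

```python
from typing import List, Optional


def sandbox_run_argv(image: str, runtime: Optional[str] = None) -> List[str]:
    """Build the argv that starts one per-conversation sandbox."""
    argv = ["docker", "run", "--detach", "--rm"]
    if runtime is not None:
        # e.g. "runsc" for gVisor, if registered with the daemon.
        argv += ["--runtime", runtime]
    argv.append(image)
    return argv
```

Everything above the provisioner sees the same container ID and URL either way, which is why this option composes with 1, 2, or 4 rather than replacing them.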

Option 7 — OS-level sandboxing in one process (seccomp / landlock / AppArmor)

Don't spawn anything. Keep the single-process local agent-server, but run each
conversation's tool calls under a per-conversation seccomp filter + landlock
ruleset that restricts syscalls and filesystem access to that conversation's
working directory.

  • Frontend: zero changes.
  • Isolation property: stops accidental and most casual-malicious
    cross-contamination at the syscall layer. Cheap — no per-conversation
    process or container.
  • Trade-off: Linux-only; seccomp/landlock policies are tricky to write
    correctly and easy to under-restrict; doesn't isolate memory or network
    effects between conversations; non-trivial to compose with browser/MCP tools
    that themselves spawn subprocesses. Realistic as a complement to the
    status quo (raise the bar without a structural redesign), not a replacement
    for container-level isolation.

Option 8 — Copy-on-write workspace per conversation, shared process

Use a COW filesystem (overlayfs, btrfs subvolumes, ZFS clones) to give each
conversation its own snapshot of /workspace/project while the agent-server
itself stays single-process. Deleting a conversation rolls back its snapshot.

  • Frontend: zero changes.
  • Isolation property: persistent filesystem effects can't bleed between
    conversations, and you get free "reset workspace" semantics. But this
    isolates files, not processes — a runaway bash command can still affect
    global system state, kill sibling processes, exhaust memory, etc.
  • Trade-off: mostly addresses "I committed to the wrong conversation's
    workspace" rather than "this conversation could attack the host." Honest as
    a low-cost ergonomic improvement, not as a security boundary.
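
For the overlayfs variant, a per-conversation view is a single mount with the shared project as the read-only lower layer. The directory names under conv_root and the function name are assumptions for illustration; the mount itself needs root (or an equivalent capability), and upperdir and workdir must be on the same filesystem.

```python
from typing import List


def overlay_mount_argv(base: str, conv_root: str) -> List[str]:
    """Build the mount command for one conversation's COW workspace."""
    upper = f"{conv_root}/upper"    # receives this conversation's writes
    work = f"{conv_root}/work"      # scratch directory required by overlayfs
    merged = f"{conv_root}/merged"  # what the conversation sees as its workspace
    options = f"lowerdir={base},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]
```

"Reset workspace" then amounts to unmounting and clearing the upper directory, leaving the shared base untouched.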

Network topology (remote / VM deployment)

When the isolated-container backend runs on a remote VM (not the user's
laptop), the browser needs to reach two things:

  1. The control-plane / aggregator API (HTTP).
  2. The per-conversation sandbox, for at least the WebSocket
    /sockets/conversation/<id>/events and the sandbox-direct REST calls
    (hostOverride calls in callCloudProxy).

Item 1 is easy — agent-canvas already proxies cloud-mode HTTP through the
local bundled agent-server's /api/cloud-proxy. The hard one is item 2:
sandbox URLs must be browser-reachable somehow.

| Approach | Infra on the VM | Browser-facing URLs | Cost |
|---|---|---|---|
| Expose random ports (OSS OpenHands' OOTB) | Open port range in firewall | http://vm:30001, http://vm:30002, … | No HTTPS without per-port termination; CORS per port; mixed-content blocks if frontend is HTTPS |
| Subdomain per sandbox | Wildcard DNS + reverse proxy + wildcard TLS | https://abc.sandbox.vm.example.com | Cleanest browser story; needs DNS infra and a wildcard cert |
| Path-prefix ingress | Reverse proxy (nginx/caddy/traefik) with dynamic routing | https://vm.example.com/sandbox/abc/… | One cert, one origin; routing config has to update as sandboxes come/go |
| Proxy through the app-server / aggregator | None beyond the app-server | https://vm.example.com/… only | Needs an HTTP+WS reverse proxy inside the app-server |

The last row is exactly what OpenHands/OpenHands#13516 ships for the
WebSocket side, and what #13551 ships for VS Code. Once those land (or are
picked up via fork/cherry-pick), Option 1 (Use OpenHands OSS) gets the clean
one-port, one-origin story out of the box. Until then, OSS-on-a-VM still needs
one of the ingress patterns above.

The same proxy primitive is also Option 4's load-bearing piece of new code.
For VM deployments, that work is no longer "extra cost" — it's the answer.
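
The four approaches differ mainly in what URL the browser ends up holding for a given sandbox. A small sketch, using the example hosts from the table in this section; the approach labels and the function name are made up for illustration.

```python
from typing import Optional


def browser_url(approach: str, host: str, sandbox_id: str,
                port: Optional[int] = None) -> str:
    """Return the browser-facing base URL for one sandbox."""
    if approach == "random-port":
        if port is None:
            raise ValueError("random-port needs the allocated host port")
        return f"http://{host}:{port}"
    if approach == "subdomain":
        return f"https://{sandbox_id}.sandbox.{host}"
    if approach == "path-prefix":
        return f"https://{host}/sandbox/{sandbox_id}/"
    if approach == "app-server-proxy":
        # Single origin: the sandbox is never addressed directly.
        return f"https://{host}/"
    raise ValueError(f"unknown approach: {approach!r}")
```

Only the last two keep the frontend on a single origin, and only the last one needs no routing state outside the app-server.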

Non-options we ruled out

  • DockerWorkspace + Conversation per conversation in-process. Looked
    like a one-line config flip but isn't: wrapping a conversation in a
    DockerWorkspace produces a RemoteConversation, which means the entire
    conversation (agent loop, LLM calls, events) lives in the container. The
    outer process becomes a thin RPC client — structurally identical to Option 1
    but assembled ad-hoc.
  • A new BackendKind in agent-canvas. Widening "local" | "cloud" would
    fan out to ~88 branch sites for no gain. Whatever ships should reuse one of
    the two existing kinds.
  • Bind-mounting host directories into pods. Only works in single-node
    KIND with extraMounts, doesn't generalise, and runs into the agent-server's
    own "flock does not work reliably on NFS" warning if you push it.

Comparison

Options 1–4 are architectural — they answer "where does conversation state
live and what runs where." Options 5–8 are mostly isolation-primitive choices
that compose with the architecture above.

| Option | Isolation strength | Per-conv overhead | Frontend changes | New components | VM-deploy story |
|---|---|---|---|---|---|
| 1. Use OpenHands OSS | Container | Container | Small (auth) | 0 (use OSS) | Native once #13516 lands; ingress otherwise |
| 2. Standalone control plane | Container | Pod | Cloud branches | 1 service | Needs ingress or own WS proxy |
| 3. Thin tool-runner image | Container (tools) | Tools container | None | 1 image | Native (outer owns origin) |
| 4. Nested agent-servers | Container | Container | None | 0 | Native (aggregator owns origin) |
| 5. Per-tool-call containers | Container per call | Per call | None | 0–1 | Inherits from 1/2/4 |
| 6. VM / Firecracker / gVisor | Kernel boundary | MicroVM | None | 0 (provisioner) | Inherits from 1/2/4 |
| 7. seccomp / landlock in-process | Syscall filter | None | None | 0 | n/a (no extra origin) |
| 8. COW workspace, shared process | Filesystem only | None | None | 0 | n/a (no extra origin) |

Options 5–8 largely slot into 1, 2, or 4: Option 6 is just "pick the
provisioner," Option 5 is a different granularity choice for the same
provisioner, and Options 7–8 are complementary hardening that can layer on top
of any architecture (including the status quo).

Recommendation

Option 1 — use OpenHands OSS as the backend. It's the open-source version
of the cloud control plane, it speaks the API surface agent-canvas already
talks to, and the per-conversation Docker isolation is already implemented
and shipped. The only known gap is the WebSocket gateway for VM deployment,
and there's an active community PR (OpenHands/OpenHands#13516) that solves
exactly that. Concretely:

  1. Stand up OpenHands OSS on the VM with ENABLE_WEBSOCKET_GATEWAY=true
    (after #13516 lands, or by cherry-picking the branch).
  2. Add a session-API-key auth toggle to agent-canvas's callCloudProxy so
    cloud-mode backends can authenticate with X-Session-API-Key instead of
    bearer.
  3. Register the OSS instance as a kind: "cloud" backend in agent-canvas.

That's the path with the least new code. Option 4 (nest agent-servers) is
the cleanest long-term architectural answer if we ever outgrow OSS, but
nothing about Option 1 forecloses going there later — both surface the same
API to the frontend. Options 7 and 8 are worth keeping in mind as cheap
hardening layers, but neither is a substitute for the structural change.
