The legacy "Local GUI" by default ran each conversation in its own container. This was slow and resource-intensive, and not where the industry went. But it remains a helpful option for the security-conscious!
The bulk of this document is AI-generated but is the fruit of many hours of kicking ideas around with an agent that could access the relevant codebases.
Per-conversation container isolation — options
Goal: give each conversation its own container so tool calls in one can't affect
another, while keeping one backend and one conversation list in the frontend.
Where we already are
- Local backend: one long-lived agent-server. All conversations share its
process, filesystem, and /workspace/project. No isolation.
- Cloud backend: a control plane (app.all-hands.dev) owns the conversation
index; each conversation gets a sandbox pod via a Runtime API. Frontend talks
through /api/cloud-proxy and routes per-conversation traffic to the sandbox
via conversation_url + X-Session-API-Key.
Local and cloud already sit at opposite corners of two orthogonal axes:
(a) where the conversation index lives, and (b) whether the agent-server is
long-lived or ephemeral per conversation. Real isolation puts you in cloud's
corner regardless of how you get there.
Option 1 — Use OpenHands OSS as the backend
OpenHands OSS (openhands/app_server/ in OpenHands/OpenHands) is,
structurally, the open-source cloud control plane. Its /api/v1 surface —
app-conversations, sandboxes (with SandboxStatus, ExposedUrl,
session_api_key, conversation_url), settings, secrets, git, events
— uses the same model names, field names, and status enum values that
agent-canvas's "cloud" branches already speak. Its DockerSandboxService
already provisions a per-conversation agent-server container, allocates host
ports, sends event webhooks back to the control plane, and tracks sandbox
lifecycle (STARTING → RUNNING → PAUSED → ERROR → MISSING).
- Frontend: registers as kind: "cloud". One small fix is likely needed: a
cloud-proxy auth toggle. OSS uses X-Session-API-Key (set via the
SESSION_API_KEY env var); agent-canvas's callCloudProxy defaults to
Authorization: Bearer. A per-backend auth mode is a tiny change.
- Reuses: ~everything. Most of the work was already done by the OpenHands
team. An OSS deployment registers as a kind: "cloud" backend in
agent-canvas's "Manage backends" UI.
- Adds: nothing on the backend side that doesn't already exist.
- Remote-VM story: OpenHands/OpenHands#13516 adds a WebSocket gateway that
routes all sandbox traffic through the single app-server port. With it enabled
(ENABLE_WEBSOCKET_GATEWAY=true), the browser only ever talks to the OpenHands
app-server's URL: no per-sandbox ports to expose, no wildcard DNS, no ingress
work. That solves the hardest piece of "run this on a VM behind one URL." The
sibling #13551 adds the same treatment for VS Code, and #12591 is an earlier
take on the K8s side.
- Trade-off: the WS-gateway PR is open with requested changes and not yet
merged. Until it lands (or we ship a fork), VM deployments either expose
random sandbox ports (OSS default, OH_SANDBOX_CONTAINER_URL_PATTERN) or
add their own ingress.
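The per-backend auth toggle mentioned in the Frontend bullet can be sketched as a small header builder. This is a minimal sketch, written in Python purely for illustration; `AuthMode` and `build_auth_headers` are hypothetical names, not existing agent-canvas or OpenHands APIs. Only the two header names come from the text above.

```python
from enum import Enum


class AuthMode(Enum):
    """Hypothetical per-backend auth mode (not an existing agent-canvas type)."""

    BEARER = "bearer"
    SESSION_API_KEY = "session_api_key"


def build_auth_headers(mode: AuthMode, credential: str) -> dict[str, str]:
    """Return the single auth header a cloud-proxy call would attach."""
    if not credential:
        raise ValueError("credential must be non-empty")
    if mode is AuthMode.BEARER:
        # Current default described above: Authorization: Bearer <token>.
        return {"Authorization": f"Bearer {credential}"}
    if mode is AuthMode.SESSION_API_KEY:
        # What an OSS backend expects: X-Session-API-Key: <key>.
        return {"X-Session-API-Key": credential}
    raise ValueError(f"unsupported auth mode: {mode!r}")
```

Storing the mode on the backend record, rather than guessing from the URL, keeps the choice explicit per registered backend.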
Option 2 — Standalone control plane (k8s "cloud-shaped" backend)
Ship a small FastAPI service that mimics the cloud control plane: owns the
conversation index in meta.json/sandbox.json on a PVC, provisions a pod per
conversation via kube-apiserver, exposes sandbox_status + conversation_url
to the frontend. Pods stream events back via the agent-server's existing
webhooks config.
- Frontend: registers as kind: "cloud". No widening of BackendKind.
- Reuses: the frontend's existing cloud branches (~88 sites).
- Adds: a new component (the control plane) + helm chart.
- Persistence: mirror of local agent-server's JSON-on-disk layout
(FileSettingsStore, EventLog, meta.json per conv) on the control-plane
PVC. No SQLite. Events flow control-plane-ward via webhooks so they survive
pod death.
- Trade-off: structurally clean and matches cloud, but it's a separate
service to build, deploy, and maintain — and it's mostly re-implementing
what Option 1 (OpenHands OSS) already provides.
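The JSON-on-disk persistence described above could look roughly like the sketch below. It assumes a layout of one directory per conversation with a meta.json inside; the `ConversationIndex` class, its method names, and the exact directory layout are hypothetical and are not taken from the agent-server code.

```python
import json
import os
import tempfile
from pathlib import Path


class ConversationIndex:
    """Hypothetical index stored as <root>/<conversation_id>/meta.json."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def save(self, conversation_id: str, meta: dict) -> None:
        conv_dir = self.root / conversation_id
        conv_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename, so a
        # reader never observes a half-written meta.json.
        fd, tmp_path = tempfile.mkstemp(dir=conv_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(meta, fh)
        os.replace(tmp_path, conv_dir / "meta.json")

    def list_all(self) -> dict[str, dict]:
        """Rebuild the conversation list by scanning the directory tree."""
        result: dict[str, dict] = {}
        for meta_path in sorted(self.root.glob("*/meta.json")):
            result[meta_path.parent.name] = json.loads(
                meta_path.read_text(encoding="utf-8")
            )
        return result
```

Because the index is rebuilt from disk, the control plane can restart without a database, which is the point of the "No SQLite" choice.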
Option 3 — Thin "tool runner" image
Build a stripped agent-server image that exposes only the tool surface
(/api/bash/*, /api/files/*, /api/git/*) — no /api/conversations/*, no
/sockets, no agent loop. Outer agent-server keeps owning the conversation
list and event store, but its tools dispatch over HTTP into per-conversation
runner containers.
- Frontend: zero changes. Stays in local mode.
- Reuses: outer agent-server's existing persistence verbatim.
- Adds: a new server image + a new RemoteWorkspace-shaped client in the
SDK that doesn't depend on /api/conversations/*.
- Trade-off: smallest per-conversation footprint, but real SDK work and a
new image to maintain. Doesn't exist today.
Option 4 — Nest agent-servers
Make the agent-server pluggable enough to host conversations that live on
other agent-servers. One aggregator process speaks the existing agent-server
HTTP API to the frontend; internally it spawns a downstream agent-server per
conversation (Docker, k8s, pre-warmed pool, whatever) and forwards
per-conversation HTTP + WebSocket traffic to it. Settings/secrets live on the
aggregator; downstreams pull them via the existing LookupSecret URL pattern.
- Frontend: zero changes. One backend, one conversation list.
- Reuses: all existing agent-server persistence code, the cloud_proxy
forwarding skeleton, the per-conversation PubSub as a free event-mirror
tap, RemoteConversation's attach-by-ID resume path.
- Adds (inside the agent-server):
  - A pluggable ConversationService interface with Local + Remote + Composite implementations.
  - A conversation-aware HTTP forwarder for /api/conversations/{id}/*.
  - A WebSocket proxy for /sockets/conversation/{id}/events (the only genuinely new code: frame-pumping with reconnect + handshake replay).
  - A ServerProvisioner abstraction (Docker, K8s, Pool, Existing).
- Trade-off: the cloud/local divide collapses in the frontend, but the
real work is inside the agent-server itself, including one piece (WS proxy)
with non-trivial bug surface. Note: OpenHands OSS's WS gateway PR (#13516)
is roughly the same primitive at the app-server layer.
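The pluggable ConversationService with Local, Remote, and Composite implementations might be shaped like the sketch below. The method names (`owns`, `route`) are hypothetical, and `route` returns a target string as a stand-in for actually serving or forwarding the request; the real interface in the agent-server would look different.

```python
from typing import Protocol


class ConversationService(Protocol):
    """Hypothetical interface; method names are illustrative only."""

    def owns(self, conversation_id: str) -> bool: ...

    def route(self, conversation_id: str, path: str) -> str: ...


class LocalConversationService:
    """Conversations hosted inside the aggregator process itself."""

    def __init__(self, conversation_ids: set[str]) -> None:
        self._ids = set(conversation_ids)

    def owns(self, conversation_id: str) -> bool:
        return conversation_id in self._ids

    def route(self, conversation_id: str, path: str) -> str:
        return f"local:{conversation_id}{path}"


class RemoteConversationService:
    """Conversations hosted on downstream agent-servers, keyed by base URL."""

    def __init__(self, upstreams: dict[str, str]) -> None:
        self._upstreams = dict(upstreams)

    def owns(self, conversation_id: str) -> bool:
        return conversation_id in self._upstreams

    def route(self, conversation_id: str, path: str) -> str:
        base = self._upstreams[conversation_id].rstrip("/")
        return f"{base}/api/conversations/{conversation_id}{path}"


class CompositeConversationService:
    """Delegates to the first child service that owns the conversation."""

    def __init__(self, services: list[ConversationService]) -> None:
        self._services = list(services)

    def owns(self, conversation_id: str) -> bool:
        return any(s.owns(conversation_id) for s in self._services)

    def route(self, conversation_id: str, path: str) -> str:
        for service in self._services:
            if service.owns(conversation_id):
                return service.route(conversation_id, path)
        raise KeyError(conversation_id)
```

The frontend would only ever see the composite, which is what lets it keep one backend and one conversation list.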
Option 5 — Per-tool-call ephemeral containers
Instead of a long-lived container per conversation, spin up a fresh container
per tool invocation (one bash command, one file edit, one git op). Container
lives milliseconds-to-seconds, dies, next call gets a fresh one. Workspace dir
is a host-mounted volume per conversation so writes persist across tool calls.
- Frontend: zero changes.
- Isolation property: tools can't even see each other's side effects in
memory; every call is a hermetic environment. Strongest per-call isolation
of any option here.
- Trade-off: cold-start latency on every tool call, even with image
caching and warm pools. Stateful bash sessions (source venv/bin/activate
then python … in the next call) get awkward — you either lose state or
reimplement session continuity over container boundaries. Mostly interesting
for security-critical deployments where re-execution cost is acceptable.
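As a rough illustration of the per-call model, the sketch below builds a `docker run` command line for a single tool invocation. The function name, the image name in the usage example, and the /workspace mount point inside the container are assumptions for illustration; only the standard `docker run` flags `--rm`, `-v`, and `-w` are relied on.

```python
import os


def tool_call_argv(image: str, workspace_dir: str, command: str) -> list[str]:
    """Build the argv for one throwaway container running one bash command."""
    if not os.path.isabs(workspace_dir):
        # Docker treats a non-absolute -v source as a named volume, not a path.
        raise ValueError("workspace_dir must be an absolute host path")
    return [
        "docker", "run",
        "--rm",                                # delete the container on exit
        "-v", f"{workspace_dir}:/workspace",   # per-conversation host directory
        "-w", "/workspace",                    # start in the workspace
        image,
        "bash", "-c", command,
    ]
```

Each call starts a new shell, so anything set by a previous command (an activated venv, exported variables, the current directory) is gone; only files written under the mounted workspace persist. That is the session-continuity cost named in the trade-off.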
Option 6 — VM-level isolation (Firecracker / Kata / gVisor)
Same architecture as Option 1, 2, or 4, but the per-conversation sandbox is a
microVM (Firecracker, Kata) or a gVisor-wrapped container instead of a plain
container. Provisioner picks the runtime; nothing above it cares.
- Frontend: zero changes.
- Isolation property: real kernel boundary, not just namespaces. Defends
against container-escape CVEs and kernel-syscall bugs in a way containers
can't.
- Trade-off: seconds of cold start, hundreds of MB resident per
conversation, more infrastructure (image build pipeline, snapshot tooling).
Mostly "Option 1 / 2 / 4 with a different provisioner" rather than its own
thing.
Option 7 — OS-level sandboxing in one process (seccomp / landlock / AppArmor)
Don't spawn anything. Keep the single-process local agent-server, but run each
conversation's tool calls under a per-conversation seccomp filter + landlock
ruleset that restricts syscalls and filesystem access to that conversation's
working directory.
- Frontend: zero changes.
- Isolation property: stops accidental and most casual-malicious
cross-contamination at the syscall layer. Cheap — no per-conversation
process or container.
- Trade-off: Linux-only; seccomp/landlock policies are tricky to write
correctly and easy to under-restrict; doesn't isolate memory or network
effects between conversations; non-trivial to compose with browser/MCP tools
that themselves spawn subprocesses. Realistic as a complement to the
status quo (raise the bar without a structural redesign), not a replacement
for container-level isolation.
Option 8 — Copy-on-write workspace per conversation, shared process
Use a COW filesystem (overlayfs, btrfs subvolumes, ZFS clones) to give each
conversation its own snapshot of /workspace/project while the agent-server
itself stays single-process. Deleting a conversation rolls back its snapshot.
- Frontend: zero changes.
- Isolation property: persistent filesystem effects can't bleed between
conversations, and you get free "reset workspace" semantics. But this
isolates files, not processes — a runaway bash command can still affect
global system state, kill sibling processes, exhaust memory, etc.
- Trade-off: mostly addresses "I committed to the wrong conversation's
workspace" rather than "this conversation could attack the host." Honest as
a low-cost ergonomic improvement, not as a security boundary.
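For the overlayfs variant, a per-conversation mount could be assembled as in the sketch below. The directory layout under `state_root` and the function name are hypothetical; the `lowerdir`, `upperdir`, and `workdir` options are the standard overlayfs mount options.

```python
from pathlib import PurePosixPath


def overlay_mount_argv(base: str, state_root: str, conversation_id: str) -> list[str]:
    """Build a mount command giving one conversation a COW view of `base`."""
    if not conversation_id or "/" in conversation_id:
        raise ValueError("conversation_id must be a single path component")
    conv = PurePosixPath(state_root) / conversation_id
    for path in (base, str(conv)):
        # Commas and colons are separators in the overlayfs option string.
        if "," in path or ":" in path:
            raise ValueError(f"unsupported character in path: {path}")
    upper = conv / "upper"    # receives this conversation's writes
    work = conv / "work"      # scratch dir; same filesystem as upper
    merged = conv / "merged"  # what the conversation sees as its workspace
    options = f"lowerdir={base},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, str(merged)]
```

Resetting a workspace then amounts to unmounting and deleting the conversation's upper and work directories, while the shared base stays untouched. The command needs mount privileges, and it does nothing about processes, memory, or network.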
Network topology (remote / VM deployment)
When the isolated-container backend runs on a remote VM (not the user's
laptop), the browser needs to reach two things:
1. The control-plane / aggregator API (HTTP).
2. The per-conversation sandbox, for at least the WebSocket
/sockets/conversation/<id>/events and the sandbox-direct REST calls
(hostOverride calls in callCloudProxy).
Item 1 is easy — agent-canvas already proxies cloud-mode HTTP through the
local bundled agent-server's /api/cloud-proxy. The hard one is item 2:
sandbox URLs must be browser-reachable somehow.
| Approach | Infra on the VM | Browser-facing URLs | Cost |
| --- | --- | --- | --- |
| Expose random ports (OSS OpenHands' OOTB) | Open port range in firewall | http://vm:30001, http://vm:30002, … | No HTTPS without per-port termination; CORS per port; mixed-content blocks if frontend is HTTPS |
| Subdomain per sandbox | Wildcard DNS + reverse proxy + wildcard TLS | https://abc.sandbox.vm.example.com | Cleanest browser story; needs DNS infra and a wildcard cert |
| Path-prefix ingress | Reverse proxy (nginx/caddy/traefik) with dynamic routing | https://vm.example.com/sandbox/abc/… | One cert, one origin; routing config has to update as sandboxes come/go |
| Proxy through the app-server / aggregator | None beyond the app-server | https://vm.example.com/… only | Needs an HTTP+WS reverse proxy inside the app-server |
The last row is exactly what OpenHands/OpenHands#13516 ships for the WebSocket
side, and what #13551 ships for VS Code. Once those land (or are picked up via
fork/cherry-pick), Option 1 (use OpenHands OSS) gets the clean one-port,
one-origin story out of the box. Until then, OSS-on-a-VM still needs one of the
ingress patterns above.
The same proxy primitive is also Option 4's load-bearing piece of new code.
For VM deployments, that work is no longer "extra cost" — it's the answer.
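The routing decision behind both the path-prefix ingress and the in-app proxy reduces to mapping a browser-facing path onto a sandbox's internal address. The sketch below assumes the /sandbox/<id>/ prefix from the table and a hypothetical in-memory table of sandbox base URLs; it does not reflect how #13516 implements its gateway.

```python
from typing import Optional


def resolve_upstream(path: str, sandboxes: dict[str, str]) -> Optional[str]:
    """Map /sandbox/<id>/<rest> to <sandbox base URL>/<rest>, or None."""
    prefix = "/sandbox/"
    if not path.startswith(prefix):
        return None  # not sandbox traffic; the control-plane API handles it
    sandbox_id, _, rest = path[len(prefix):].partition("/")
    base = sandboxes.get(sandbox_id)
    if base is None:
        return None  # unknown or already-removed sandbox
    return f"{base.rstrip('/')}/{rest}"
```

The table has to change as sandboxes come and go, which is the routing-config cost noted for the path-prefix row. Keeping the lookup inside the app-server makes that update an in-process dictionary write instead of a proxy reload.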
Non-options we ruled out
- DockerWorkspace + Conversation per conversation in-process. Looked
like a one-line config flip but isn't: wrapping a conversation in a
DockerWorkspace produces a RemoteConversation, which means the entire
conversation (agent loop, LLM calls, events) lives in the container. The
outer process becomes a thin RPC client — structurally identical to Option 1
but assembled ad-hoc.
- A new BackendKind in agent-canvas. Widening "local" | "cloud" would
fan out to ~88 branch sites for no gain. Whatever ships should reuse one of
the two existing kinds.
- Bind-mounting host directories into pods. Only works in single-node
KIND with extraMounts, doesn't generalise, and runs into the agent-server's
own "flock does not work reliably on NFS" warning if you push it.
Comparison
Options 1–4 are architectural — they answer "where does conversation state
live and what runs where." Options 5–8 are mostly isolation-primitive choices
that compose with the architecture above.
| Option | Isolation strength | Per-conv overhead | Frontend changes | New components | VM-deploy story |
| --- | --- | --- | --- | --- | --- |
| 1: Use OpenHands OSS | Container | Container | Small (auth) | 0 (use OSS) | Native once #13516 lands; ingress otherwise |
| 2: Standalone control plane | Container | Pod | Cloud branches | 1 service | Needs ingress or own WS proxy |
| 3: Thin tool-runner image | Container (tools) | Tools container | None | 1 image | Native (outer owns origin) |
| 4: Nested agent-servers | Container | Container | None | 0 | Native (aggregator owns origin) |
| 5: Per-tool-call containers | Container per call | Per call | None | 0–1 | Inherits from 1/2/4 |
| 6: VM / Firecracker / gVisor | Kernel boundary | MicroVM | None | 0 (provisioner) | Inherits from 1/2/4 |
| 7: seccomp / landlock in-process | Syscall filter | None | None | 0 | n/a (no extra origin) |
| 8: COW workspace, shared process | Filesystem only | None | None | 0 | n/a (no extra origin) |
5–8 are largely substitutable inside 1, 2, or 4: Option 6 is just "pick the
provisioner," Option 5 is a different granularity choice for the same
provisioner, Options 7–8 are complementary hardening that can layer on top of
any architecture (including the status quo).
Recommendation
Option 1 — use OpenHands OSS as the backend. It's the open-source version
of the cloud control plane, it speaks the API surface agent-canvas already
talks to, and the per-conversation Docker isolation is already implemented
and shipped. The only known gap is the WebSocket gateway for VM deployment,
and there's an active community PR (OpenHands/OpenHands#13516) that solves
exactly that. Concretely:
- Stand up OpenHands OSS on the VM with ENABLE_WEBSOCKET_GATEWAY=true
(after #13516 lands, or by cherry-picking the branch).
- Add a session-API-key auth toggle to agent-canvas's callCloudProxy so
cloud-mode backends can authenticate with X-Session-API-Key instead of
bearer.
- Register the OSS instance as a kind: "cloud" backend in agent-canvas.
That's the path with the least new code. Option 4 (nest agent-servers) is
the cleanest long-term architectural answer if we ever outgrow OSS, but
nothing about Option 1 forecloses going there later — both surface the same
API to the frontend. Options 7 and 8 are worth keeping in mind as cheap
hardening layers, but neither is a substitute for the structural change.