diff --git a/docs/docker-isolation.md b/docs/docker-isolation.md index 75253f355..65f057a4d 100644 --- a/docs/docker-isolation.md +++ b/docs/docker-isolation.md @@ -1,6 +1,8 @@ -# Docker Security Isolation +# Security Isolation (Docker · Sandbox · None) -MCPProxy provides Docker isolation for stdio MCP servers to enhance security by running each server in its own isolated container. +MCPProxy can confine stdio MCP servers so a malicious or buggy server cannot freely touch the host. There are **three isolation modes** — `docker`, `sandbox`, and `none` — selected by `docker_isolation.mode` (global) or `isolation.mode` (per-server). This document covers all three; most of it describes the **Docker** mode (the default and most capable), with the **Sandbox** mode and the **scanner behaviour under each mode** in [Isolation Modes](#isolation-modes) below. + +> **Naming note:** the global config key is still `docker_isolation` for backward compatibility, but its `mode` field selects any of the three modes — it is not Docker-only. > **New installs:** Docker isolation is turned on automatically when mcpproxy creates its initial `mcp_config.json` and a Docker daemon is reachable (`docker info` responds within 2 seconds). If Docker isn't available at first run, isolation stays off so stdio servers still work — you can enable it later from the **Security** page in the Web UI or by editing the config below. > @@ -26,6 +28,58 @@ Docker isolation automatically wraps stdio-based MCP servers in Docker container - **Resource Limits**: Memory and CPU limits prevent resource exhaustion - **Automatic Runtime Detection**: Maps commands to appropriate Docker images +## Isolation Modes + +MCPProxy resolves an **isolation mode** for every stdio server. Set it globally with `docker_isolation.mode` and override per-server with `isolation.mode`: + +| Mode | What it does | Where it works | uid/gid drop | +|------|--------------|----------------|--------------| +| `docker` | Wraps the server in a Docker container (process/FS/network isolation, resource limits). The default and most capable mode. | Any host with a working Docker daemon. | Yes (container user) | +| `sandbox` | Runs the server **natively** under a Linux [Landlock](https://docs.kernel.org/userspace-api/landlock.html) filesystem allowlist + `setrlimit` resource caps — **no Docker required**. For hosts where Docker isolation is unavailable or broken (e.g. snap-docker + AppArmor). | Linux 5.13+ only (Landlock). Best-effort downgrade across ABI 1–5. macOS/Windows: documented no-op ⇒ behaves like `none`. | **No** — see [Honest limitations](#honest-limitations) | +| `none` | No confinement; the server runs directly on the host. | Everywhere. | n/a | + +```json +{ + "docker_isolation": { + "mode": "sandbox" + }, + "mcpServers": [ + { "name": "trusted-local", "command": "uvx", "args": ["x"], "isolation": { "mode": "none" } } + ] +} +``` + +### Back-compat with the legacy `enabled` flag + +The older boolean `docker_isolation.enabled` (and per-server `isolation.enabled`) still works and is mapped to a mode: + +- an explicit `mode` always wins; +- otherwise `enabled: true` ⇒ `docker`, `enabled: false` ⇒ `none`; +- a missing/`nil` isolation config ⇒ `none`. + +Per-server precedence: explicit per-server `mode` → per-server legacy `enabled` → global `mode` → global legacy `enabled`. A per-server `mode` (e.g. `none` for a trusted server) overrides the global gate. + +### Sandbox mode (Landlock) + +`sandbox` mode confines a stdio server **without Docker** by applying a Linux Landlock LSM ruleset (a writable-path allowlist) plus `setrlimit` resource caps to the process before it `exec`s, then preserving the raw stdin/stdout JSON-RPC pipes. It is unaffected by `kernel.apparmor_restrict_unprivileged_userns=1` (it needs no user namespaces), which is exactly why it works where bubblewrap/userns-based sandboxes are blocked. See the spike write-up in [docs/development/sandbox-spike-mcp-34.md](development/sandbox-spike-mcp-34.md) for the mechanism comparison and PoC. + +### Scanner behaviour under each mode (MCP-34.4) + +The security **scanner plugins** (Spec 039) are Docker-based. Under a non-Docker isolation mode they cannot run, so MCPProxy **degrades cleanly and surfaces it** rather than failing silently: + +| Mode | Docker scanner plugins | In-process scanner (`tpa-descriptions`) | Scan result for a server with only Docker scanners | +|------|------------------------|------------------------------------------|----------------------------------------------------| +| `docker` | Run normally | Runs | As scanned | +| `sandbox` / `none` | **Skipped** with an honest, mode-specific reason pointing at [`MCPX_DOCKER_SNAP_APPARMOR`](errors/MCPX_DOCKER_SNAP_APPARMOR.md) | **Still runs** | `security_scan.status: "degraded"` (a low/zero risk score from incomplete coverage is not reported as a trustworthy all-clear) | + +This is **decision D3 option (b)** from the [MCP-34 spike](development/sandbox-spike-mcp-34.md#recommendation-for-the-d3-scanner-question): clean, surfaced degradation. A native (non-Docker) scanner runtime — option (a) — is a larger follow-up and is not yet implemented. To run the full Docker-based scanner fleet, use `mode: docker` on a host with a working Docker daemon, or replace snap-docker with a distro Docker package (see the error doc). + +The skip is also logged at startup: + +``` +WARN Isolation mode runs no Docker for scanner plugins; Docker-based scanners will be skipped … {"isolation_mode": "sandbox"} +``` + ## Configuration ### Global Docker Isolation @@ -297,6 +351,34 @@ docker stats exists at the bundle path above, or pre-pull the image with `docker pull `. +## Snap-docker (AppArmor) failure mode + +On Ubuntu hosts where Docker is installed via **snap**, AppArmor's profile transition fights the security flags the scanner sandbox requires (`--security-opt no-new-privileges` + a pinned AppArmor profile), so in-container commands fail with *operation not permitted*. This is the original driver for non-Docker `sandbox` mode. Symptoms, root cause, and fixes are documented in [`docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md`](errors/MCPX_DOCKER_SNAP_APPARMOR.md). The related systemd/snap-confine variant for *upstream* docker servers is detected by `mcpproxy doctor` (issue #457). + +Your options on such a host: + +1. Replace snap Docker with a distro/upstream Docker package (full Docker mode works). +2. Set `docker_isolation.mode: "sandbox"` — stdio servers are confined natively with Landlock; Docker-based scanners degrade cleanly (see [Scanner behaviour](#scanner-behaviour-under-each-mode-mcp-344)). +3. Set `security.scanner_disable_no_new_privileges: true` to drop the `no-new-privileges` flag from scanner containers (weakens scanner hardening; prefer 1 or 2). + +## Honest limitations + +`sandbox` mode is deliberately scoped. Known limitations: + +- **No uid/gid drop.** Dropping to an unprivileged uid/gid requires `CAP_SETUID`/`CAP_SETGID` (i.e. running as root). When mcpproxy runs unprivileged, the uid/gid drop is **best-effort and typically a no-op** — the sandboxed process keeps the launching user's identity. Landlock (filesystem) and `setrlimit` (resource caps) still apply. Docker mode does drop to a container user. This is an honest trade-off, not a bug. +- **Linux-only.** Landlock is a Linux 5.13+ feature. On older kernels the launcher degrades best-effort (fewer access-right bits enforced on ABI 1). On macOS/Windows `sandbox` is a documented **no-op** and behaves like `none`. +- **Filesystem + resources only.** Landlock confines the filesystem write-allowlist; it does not provide network namespacing. Pair with care for network-sensitive servers, or use `docker` mode with `network_mode: none`. +- **Docker-based scanners do not run under `sandbox`/`none`.** They are skipped (the scan reports `degraded`). A native scanner runtime is a future enhancement (D3 option a). + +## Platform support matrix + +| Platform | `docker` | `sandbox` | `none` | Docker scanner plugins | +|----------|----------|-----------|--------|------------------------| +| Linux (kernel ≥ 5.13) | ✅ (needs Docker daemon) | ✅ Landlock + rlimits (no uid/gid drop) | ✅ | ✅ under `docker`; skipped+degraded under `sandbox`/`none` | +| Linux (kernel < 5.13) | ✅ (needs Docker daemon) | ⚠️ best-effort: rlimits apply, Landlock partial/unavailable | ✅ | same as above | +| macOS | ✅ (Docker Desktop) | ⚠️ no-op ⇒ effectively `none` | ✅ | ✅ under `docker`; n/a otherwise | +| Windows | ✅ (Docker Desktop) | ⚠️ no-op ⇒ effectively `none` | ✅ | ✅ under `docker`; n/a otherwise | + ## Security Considerations Docker isolation provides strong security boundaries but consider: diff --git a/docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md b/docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md index 4b3c20cb6..1252e8c92 100644 --- a/docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md +++ b/docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md @@ -24,7 +24,7 @@ Docker isolation works fine. ## How to fix -You have three options: +You have four options: ### 1. Switch to non-snap Docker (recommended) @@ -34,7 +34,25 @@ sudo snap remove docker # https://docs.docker.com/engine/install/ubuntu/ ``` -### 2. Disable the scanner for this server (dry-run shown by default) +### 2. Use native `sandbox` isolation instead of Docker (no Docker daemon needed) + +If the real goal is to confine stdio servers on a snap-docker host, switch the +isolation **mode** to `sandbox`. Servers are confined natively with a Linux +Landlock filesystem allowlist + `setrlimit` (kernel 5.13+), which is unaffected +by the snap-docker/AppArmor conflict because it needs no Docker and no user +namespaces: + +```json +{ "docker_isolation": { "mode": "sandbox" } } +``` + +Trade-off: the Docker-based scanner plugins cannot run under `sandbox`, so they +are **skipped** and the affected server's `security_scan.status` becomes +`degraded` (the always-on in-process `tpa-descriptions` scanner still runs). +This is MCP-34.4 / D3 option (b) — clean, surfaced degradation. See +[Security Isolation → Scanner behaviour](../features/docker-isolation.md#scanner-behaviour-under-each-mode-mcp-344). + +### 3. Disable the scanner for this server (dry-run shown by default) The error panel includes a **Disable scanner for this server** fix-step. The CLI equivalent: @@ -46,7 +64,7 @@ mcpproxy upstream patch --no-scanner --dry-run Drop `--dry-run` to apply. The server will still run with isolation, but without TPA pre-flight scanning. -### 3. Run mcpproxy without isolation for that server +### 4. Run mcpproxy without isolation for that server If you trust the upstream and don't need isolation: diff --git a/docs/features/docker-isolation.md b/docs/features/docker-isolation.md index 9a5b820ab..5f0ffa284 100644 --- a/docs/features/docker-isolation.md +++ b/docs/features/docker-isolation.md @@ -1,15 +1,19 @@ --- id: docker-isolation -title: Docker Security Isolation -sidebar_label: Docker Isolation +title: Security Isolation (Docker · Sandbox · None) +sidebar_label: Security Isolation sidebar_position: 1 -description: Run MCP servers in isolated Docker containers for enhanced security -keywords: [docker, isolation, security, containers] +description: Confine MCP servers with Docker containers, native Landlock sandboxing, or no isolation +keywords: [docker, isolation, sandbox, landlock, security, containers] --- -# Docker Security Isolation +# Security Isolation (Docker · Sandbox · None) -MCPProxy provides Docker isolation for stdio MCP servers to enhance security by running each server in its own isolated container. +MCPProxy can confine stdio MCP servers so a malicious or buggy server cannot freely touch the host. There are **three isolation modes** — `docker`, `sandbox`, and `none` — selected by `docker_isolation.mode` (global) or `isolation.mode` (per-server). Most of this page describes **Docker** mode (the default and most capable); see [Isolation Modes](#isolation-modes) for the native **Sandbox** mode and the scanner behaviour under each mode. + +:::note +The global config key is still `docker_isolation` for backward compatibility, but its `mode` field selects any of the three modes — it is not Docker-only. +::: ## Overview @@ -21,6 +25,54 @@ Docker isolation automatically wraps stdio-based MCP servers in Docker container - **Resource Limits**: Memory and CPU limits prevent resource exhaustion - **Automatic Runtime Detection**: Maps commands to appropriate Docker images +## Isolation Modes + +MCPProxy resolves an **isolation mode** for every stdio server. Set it globally with `docker_isolation.mode` and override per-server with `isolation.mode`: + +| Mode | What it does | Where it works | uid/gid drop | +|------|--------------|----------------|--------------| +| `docker` | Wraps the server in a Docker container (process/FS/network isolation, resource limits). The default and most capable mode. | Any host with a working Docker daemon. | Yes (container user) | +| `sandbox` | Runs the server **natively** under a Linux [Landlock](https://docs.kernel.org/userspace-api/landlock.html) filesystem allowlist + `setrlimit` resource caps — **no Docker required**. For hosts where Docker isolation is unavailable or broken (e.g. snap-docker + AppArmor). | Linux 5.13+ only. macOS/Windows: documented no-op ⇒ behaves like `none`. | **No** — see [Honest limitations](#honest-limitations) | +| `none` | No confinement; the server runs directly on the host. | Everywhere. | n/a | + +```json +{ + "docker_isolation": { "mode": "sandbox" }, + "mcpServers": [ + { "name": "trusted-local", "command": "uvx", "args": ["x"], "isolation": { "mode": "none" } } + ] +} +``` + +### Back-compat with the legacy `enabled` flag + +The older boolean `docker_isolation.enabled` (and per-server `isolation.enabled`) still works and is mapped to a mode: + +- an explicit `mode` always wins; +- otherwise `enabled: true` ⇒ `docker`, `enabled: false` ⇒ `none`; +- a missing isolation config ⇒ `none`. + +Per-server precedence: explicit per-server `mode` → per-server legacy `enabled` → global `mode` → global legacy `enabled`. + +### Sandbox mode (Landlock) + +`sandbox` mode confines a stdio server **without Docker** by applying a Linux Landlock LSM ruleset (a writable-path allowlist) plus `setrlimit` resource caps to the process before it `exec`s, then preserving the raw stdin/stdout JSON-RPC pipes. It needs no user namespaces, so it is unaffected by `kernel.apparmor_restrict_unprivileged_userns=1` — which is exactly why it works where bubblewrap/userns-based sandboxes are blocked (e.g. Ubuntu 24.04 with snap-docker). + +### Scanner behaviour under each mode (MCP-34.4) + +The security **scanner plugins** are Docker-based. Under a non-Docker isolation mode they cannot run, so MCPProxy **degrades cleanly and surfaces it** rather than failing silently: + +| Mode | Docker scanner plugins | In-process scanner (`tpa-descriptions`) | Scan result for a server with only Docker scanners | +|------|------------------------|------------------------------------------|----------------------------------------------------| +| `docker` | Run normally | Runs | As scanned | +| `sandbox` / `none` | **Skipped** with an honest, mode-specific reason pointing at [`MCPX_DOCKER_SNAP_APPARMOR`](/errors/MCPX_DOCKER_SNAP_APPARMOR) | **Still runs** | `security_scan.status: "degraded"` (a low/zero risk score from incomplete coverage is not reported as a trustworthy all-clear) | + +This is **decision D3 option (b)**: clean, surfaced degradation. A native (non-Docker) scanner runtime — option (a) — is a larger follow-up and is not yet implemented. To run the full Docker-based scanner fleet, use `mode: docker` on a host with a working Docker daemon, or replace snap-docker with a distro Docker package (see the error doc). The skip is also logged at startup: + +``` +WARN Isolation mode runs no Docker for scanner plugins; Docker-based scanners will be skipped … {"isolation_mode": "sandbox"} +``` + ## Configuration ### Global Docker Isolation @@ -259,6 +311,24 @@ docker rm -f $(docker ps -q --filter "label=mcpproxy.managed=true") See [Shutdown Behavior](/operations/shutdown-behavior) for detailed subprocess lifecycle documentation. +## Honest limitations + +`sandbox` mode is deliberately scoped. Known limitations: + +- **No uid/gid drop.** Dropping to an unprivileged uid/gid requires `CAP_SETUID`/`CAP_SETGID` (i.e. running as root). When mcpproxy runs unprivileged, the uid/gid drop is **best-effort and typically a no-op** — the sandboxed process keeps the launching user's identity. Landlock (filesystem) and `setrlimit` (resource caps) still apply. Docker mode does drop to a container user. This is an honest trade-off, not a bug. +- **Linux-only.** Landlock is a Linux 5.13+ feature. On older kernels the launcher degrades best-effort (fewer access-right bits enforced). On macOS/Windows `sandbox` is a documented **no-op** and behaves like `none`. +- **Filesystem + resources only.** Landlock confines the filesystem write-allowlist; it does not provide network namespacing. For network-sensitive servers, use `docker` mode with `network_mode: none`. +- **Docker-based scanners do not run under `sandbox`/`none`.** They are skipped (the scan reports `degraded`). A native scanner runtime is a future enhancement. + +## Platform support matrix + +| Platform | `docker` | `sandbox` | `none` | Docker scanner plugins | +|----------|----------|-----------|--------|------------------------| +| Linux (kernel ≥ 5.13) | ✅ (needs Docker daemon) | ✅ Landlock + rlimits (no uid/gid drop) | ✅ | ✅ under `docker`; skipped+degraded under `sandbox`/`none` | +| Linux (kernel < 5.13) | ✅ (needs Docker daemon) | ⚠️ best-effort: rlimits apply, Landlock partial/unavailable | ✅ | same as above | +| macOS | ✅ (Docker Desktop) | ⚠️ no-op ⇒ effectively `none` | ✅ | ✅ under `docker`; n/a otherwise | +| Windows | ✅ (Docker Desktop) | ⚠️ no-op ⇒ effectively `none` | ✅ | ✅ under `docker`; n/a otherwise | + ## Security Considerations Docker isolation provides strong security boundaries but consider: diff --git a/internal/security/scanner/engine.go b/internal/security/scanner/engine.go index 174e9c4d2..ee28bf5fd 100644 --- a/internal/security/scanner/engine.go +++ b/internal/security/scanner/engine.go @@ -32,6 +32,17 @@ type Engine struct { // docker × AppArmor transition would otherwise deny entrypoint exec. disableNoNewPrivileges bool + // isolationMode is the engine-wide DEFAULT isolation mode ("docker", + // "sandbox", "none", or "" == docker for back-compat), used only when a + // ScanRequest does not carry a per-server IsolationMode. The per-server + // resolved mode on the request takes precedence (MCP-34.4) so a server with + // an explicit isolation.mode override gets the right scanner behaviour. + // Scanner *plugins* are Docker-based (Spec 039); under "sandbox"/"none" the + // host runs no Docker for scanners, so Docker-based scanners are cleanly + // skipped (prefailed with an honest reason) while in-process scanners still + // run — D3 option (b). Set via Service.SetIsolationMode. + isolationMode string + // Track active scans (one per server) mu sync.Mutex activeScans map[string]*ScanJob // keyed by server name @@ -58,6 +69,14 @@ type ScanRequest struct { Env map[string]string // Additional environment variables ScanContext *ScanContext // Context metadata (set by service) ScanPass int // 1 = security scan (fast), 2 = supply chain audit (background) + // IsolationMode is THIS server's resolved isolation mode ("docker", + // "sandbox", "none", or "" to fall back to the engine's service-wide + // default). Set per-server by the Service so the Docker-scanner skip + // (MCP-34.4) follows the scanned server's effective mode — a per-server + // isolation.mode override beats the global mode (internal/upstream/core + // IsolationManager.ResolveMode). Under "sandbox"/"none" Docker scanner + // plugins are skipped; under "docker" they run. + IsolationMode string // PeerTools is a cross-server snapshot (other servers' current tool // definitions, keyed by server name) used by the in-process tpa-descriptions // scanner to build a multi-server RegistryView so the deterministic @@ -100,8 +119,11 @@ func (e *Engine) StartScan(ctx context.Context, req ScanRequest, callback ScanCa return existing, fmt.Errorf("scan already in progress for server %s (job %s)", req.ServerName, existing.ID) } - // Determine which scanners to use - resolved, err := e.resolveScanners(req.ScannerIDs) + // Determine which scanners to use. The Docker-scanner skip (MCP-34.4) + // follows THIS server's resolved isolation mode (per-server override beats + // the global default); fall back to the engine-wide default when the + // request didn't carry one. + resolved, err := e.resolveScanners(req.ScannerIDs, e.effectiveIsolationMode(req.IsolationMode)) if err != nil { e.mu.Unlock() return nil, err @@ -186,7 +208,30 @@ type resolvedScanner struct { // executeScan loop records it as a failed scanner in the job. That way the // aggregated scan report shows "X of Y scanners failed" instead of pretending // nothing is wrong. -func (e *Engine) resolveScanners(requestedIDs []string) ([]resolvedScanner, error) { +// effectiveIsolationMode resolves the isolation mode to use for a scan: the +// per-server mode carried on the ScanRequest when present, otherwise the +// engine-wide default set via Service.SetIsolationMode. This is what lets the +// Docker-scanner skip (MCP-34.4) honour a per-server isolation.mode override. +func (e *Engine) effectiveIsolationMode(requestMode string) string { + if requestMode != "" { + return requestMode + } + return e.isolationMode +} + +// resolveScanners determines which scanners to use. +// +// Unlike the old behaviour (silently dropping scanners with missing Docker +// images), this now includes every enabled scanner in the result. When a +// scanner's image is absent locally it is flagged with prefail so the +// executeScan loop records it as a failed scanner in the job. That way the +// aggregated scan report shows "X of Y scanners failed" instead of pretending +// nothing is wrong. +// +// isolationMode is THIS server's resolved isolation mode (per-server override +// already applied by the caller); under "sandbox"/"none" Docker scanner plugins +// are skipped (MCP-34.4). +func (e *Engine) resolveScanners(requestedIDs []string, isolationMode string) ([]resolvedScanner, error) { all := e.registry.List() // Helper: check whether a scanner's image is present locally. Returns a @@ -198,6 +243,21 @@ func (e *Engine) resolveScanners(requestedIDs []string) ([]resolvedScanner, erro if s.InProcess { return "" } + // MCP-34.4 / D3 option (b): under a non-Docker isolation mode the host + // runs no Docker for scanner plugins, so Docker-based scanners cannot + // run at all. Skip them with an honest, mode-specific reason (pointing + // at the snap-docker/AppArmor failure doc) instead of the misleading + // "pull the image locally" guidance below — there is nothing to pull. + // They stay in the resolved set (recorded as failed), which downgrades + // the scan summary to "degraded" rather than a silent all-clear. + // `isolationMode` is per-server resolved, so a server pinned to + // isolation.mode:docker still runs Docker scanners under a global + // sandbox default, and vice versa. + if mode := isolationMode; mode == "sandbox" || mode == "none" { + return fmt.Sprintf("Docker-based scanner %s skipped: isolation mode %q runs no Docker containers, so Docker scanner plugins cannot run for this server. "+ + "In-process scanners still ran. To run Docker scanners, set isolation.mode to \"docker\" for this server on a host with a working Docker daemon. "+ + "See docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md.", s.ID, mode) + } if e.docker == nil { return "" } diff --git a/internal/security/scanner/engine_test.go b/internal/security/scanner/engine_test.go index 2812c695d..64f4e6bd5 100644 --- a/internal/security/scanner/engine_test.go +++ b/internal/security/scanner/engine_test.go @@ -433,7 +433,7 @@ func TestEngineResolveScanners(t *testing.T) { // Resolve all installed. The Docker scanner we just enabled plus the // always-installed in-process scanner (tpa-descriptions) should both // resolve (MCP-2082). - scanners, err := engine.resolveScanners(nil) + scanners, err := engine.resolveScanners(nil, "") if err != nil { t.Fatalf("resolveScanners: %v", err) } @@ -452,7 +452,7 @@ func TestEngineResolveScanners(t *testing.T) { } // Resolve specific - scanners, err = engine.resolveScanners([]string{"mcp-scan"}) + scanners, err = engine.resolveScanners([]string{"mcp-scan"}, "") if err != nil { t.Fatalf("resolveScanners specific: %v", err) } @@ -461,13 +461,13 @@ func TestEngineResolveScanners(t *testing.T) { } // Resolve non-installed - _, err = engine.resolveScanners([]string{"cisco-mcp-scanner"}) + _, err = engine.resolveScanners([]string{"cisco-mcp-scanner"}, "") if err == nil { t.Error("expected error for non-installed scanner") } // Resolve nonexistent - _, err = engine.resolveScanners([]string{"nonexistent"}) + _, err = engine.resolveScanners([]string{"nonexistent"}, "") if err == nil { t.Error("expected error for nonexistent scanner") } @@ -586,7 +586,7 @@ func TestEngineInProcessScannerAlwaysAvailable(t *testing.T) { docker := NewDockerRunner(logger) engine := NewEngine(docker, registry, dir, logger) - resolved, err := engine.resolveScanners(nil) + resolved, err := engine.resolveScanners(nil, "") if err != nil { t.Fatalf("resolveScanners: %v", err) } @@ -600,6 +600,117 @@ func TestEngineInProcessScannerAlwaysAvailable(t *testing.T) { } } +// TestEngineResolveScannersSkipsDockerUnderSandbox verifies D3 option (b) +// (MCP-34.4): when the resolved isolation mode is "sandbox" (or "none") — i.e. +// no Docker is available to run scanner containers — Docker-based scanner +// plugins are cleanly skipped with an honest, mode-specific prefail message +// (referencing MCPX_DOCKER_SNAP_APPARMOR) instead of the misleading +// "pull the image" message, while the always-on in-process scanner still runs. +// The skipped Docker scanners are still surfaced (recorded as failed) so the +// scan summary degrades rather than silently pretending all-clear. +func TestEngineResolveScannersSkipsDockerUnderSandbox(t *testing.T) { + for _, mode := range []string{"sandbox", "none"} { + t.Run(mode, func(t *testing.T) { + dir := t.TempDir() + logger := zap.NewNop() + registry := NewRegistry(dir, logger) + registry.scanners["mcp-scan"].Status = ScannerStatusInstalled + + // A non-nil docker runner with a real image present would normally + // let the Docker scanner pass checkImage; the sandbox/none gate must + // short-circuit before any Docker probe. The per-server resolved mode + // is passed directly into resolveScanners (MCP-34.4). + engine := NewEngine(nil, registry, dir, logger) + + resolved, err := engine.resolveScanners(nil, mode) + if err != nil { + t.Fatalf("resolveScanners: %v", err) + } + + byID := make(map[string]resolvedScanner) + for _, rs := range resolved { + byID[rs.plugin.ID] = rs + } + + // Docker scanner is still surfaced (visible in report) but prefailed. + dockerScanner, ok := byID["mcp-scan"] + if !ok { + t.Fatalf("expected mcp-scan to remain in resolved set under %s mode, got %v", mode, byID) + } + if dockerScanner.prefail == "" { + t.Fatalf("expected Docker scanner mcp-scan to be prefailed (skipped) under %s mode", mode) + } + if !strings.Contains(dockerScanner.prefail, mode) { + t.Errorf("prefail should name the isolation mode %q; got %q", mode, dockerScanner.prefail) + } + if !strings.Contains(dockerScanner.prefail, "MCPX_DOCKER_SNAP_APPARMOR") { + t.Errorf("prefail should reference the MCPX_DOCKER_SNAP_APPARMOR error doc; got %q", dockerScanner.prefail) + } + // The misleading "pull the image" guidance must NOT appear — you + // cannot pull/run Docker scanners at all under sandbox/none mode. + if strings.Contains(dockerScanner.prefail, "docker pull") { + t.Errorf("prefail must not tell the user to docker pull under %s mode; got %q", mode, dockerScanner.prefail) + } + + // In-process scanner still runs (never prefailed). + inProc, ok := byID[inProcessTPAScannerID] + if !ok { + t.Fatalf("expected in-process scanner to remain available under %s mode", mode) + } + if inProc.prefail != "" { + t.Errorf("in-process scanner must not be prefailed under %s mode; got %q", mode, inProc.prefail) + } + }) + } +} + +// TestEngineResolveScannersDockerModeUnaffected is the back-compat / per-server +// guard: with mode "docker" (or "" — fall back to engine default) the sandbox +// skip path is inert, so a Docker scanner with a nil docker runner resolves +// without a prefail. The "docker" case is exactly the per-server override that +// must keep running Docker scanners even under a global sandbox default +// (MCP-34.4). +func TestEngineResolveScannersDockerModeUnaffected(t *testing.T) { + for _, mode := range []string{"", "docker"} { + t.Run("mode="+mode, func(t *testing.T) { + dir := t.TempDir() + logger := zap.NewNop() + registry := NewRegistry(dir, logger) + registry.scanners["mcp-scan"].Status = ScannerStatusInstalled + + engine := NewEngine(nil, registry, dir, logger) + + resolved, err := engine.resolveScanners([]string{"mcp-scan"}, mode) + if err != nil { + t.Fatalf("resolveScanners: %v", err) + } + if len(resolved) != 1 { + t.Fatalf("expected 1 scanner, got %d", len(resolved)) + } + if resolved[0].prefail != "" { + t.Errorf("Docker scanner unexpectedly prefailed under mode %q (nil docker runner skips image check): %q", mode, resolved[0].prefail) + } + }) + } +} + +// TestEngineEffectiveIsolationMode locks the precedence used by StartScan: the +// per-request (per-server) mode wins; an empty request falls back to the +// engine-wide default (MCP-34.4). +func TestEngineEffectiveIsolationMode(t *testing.T) { + e := &Engine{isolationMode: "sandbox"} + if got := e.effectiveIsolationMode("docker"); got != "docker" { + t.Errorf("per-server 'docker' must win over engine default 'sandbox'; got %q", got) + } + if got := e.effectiveIsolationMode(""); got != "sandbox" { + t.Errorf("empty request must fall back to engine default 'sandbox'; got %q", got) + } + e2 := &Engine{isolationMode: ""} + if got := e2.effectiveIsolationMode("none"); got != "none" { + t.Errorf("per-server 'none' must win; got %q", got) + } +} + func TestEngineParseResultsSARIF(t *testing.T) { dir := t.TempDir() logger := zap.NewNop() diff --git a/internal/security/scanner/service.go b/internal/security/scanner/service.go index 93d860df5..656cfc31c 100644 --- a/internal/security/scanner/service.go +++ b/internal/security/scanner/service.go @@ -123,6 +123,13 @@ type Service struct { pulls *pullManager logger *zap.Logger + // isolationModeResolver returns a server's resolved isolation mode + // ("docker"/"sandbox"/"none", or "" if unknown). Injected by the wiring + // layer so the scanner honours per-server isolation.mode overrides + // (MCP-34.4) without the scanner package depending on the isolation + // resolver. Nil ⇒ fall back to the engine-wide default. + isolationModeResolver func(serverName string) string + // In-memory scan summary cache — avoids expensive BBolt reads per server summaryCache map[string]*ScanSummary summaryCacheMu sync.RWMutex @@ -172,6 +179,53 @@ func (s *Service) SetScannerDisableNoNewPrivileges(disable bool) { } } +// SetIsolationMode records the engine-wide DEFAULT isolation mode ("docker", +// "sandbox", "none", or "" == docker), used when a scan has no per-server mode +// (no resolver wired, or the resolver returns ""). Per-server resolution via +// SetIsolationModeResolver takes precedence. Under "sandbox"/"none" the host +// runs no Docker for scanners, so Docker scanner plugins are cleanly skipped +// (the scan summary degrades) while in-process scanners still run — MCP-34.4 / +// D3 option (b): clean, surfaced degradation on snap-docker / non-Docker hosts. +func (s *Service) SetIsolationMode(mode string) { + if s.engine == nil { + return + } + s.engine.isolationMode = mode + if mode == "sandbox" || mode == "none" { + s.logger.Warn("Default isolation mode runs no Docker for scanner plugins; Docker-based scanners "+ + "will be skipped and the security scan will report 'degraded' for affected servers "+ + "(per-server isolation.mode:docker overrides). In-process scanners (e.g. tpa-descriptions) "+ + "still run. See docs/errors/MCPX_DOCKER_SNAP_APPARMOR.md.", + zap.String("isolation_mode", mode)) + } +} + +// SetIsolationModeResolver injects a per-server isolation-mode resolver so the +// Docker-scanner skip (MCP-34.4) follows each scanned server's RESOLVED mode — +// a per-server isolation.mode override beats the global default. The resolver +// returns "docker"/"sandbox"/"none", or "" to fall back to the engine-wide +// default. Wired in the server layer from the upstream IsolationManager so the +// scanner package stays decoupled from the resolver. +func (s *Service) SetIsolationModeResolver(resolver func(serverName string) string) { + s.isolationModeResolver = resolver +} + +// resolveIsolationMode returns the isolation mode to apply to a scan of +// serverName: the per-server resolver result when it yields a concrete mode, +// otherwise the engine-wide default. Falls back gracefully when no resolver is +// wired (e.g. unit tests) so behaviour matches SetIsolationMode alone. +func (s *Service) resolveIsolationMode(serverName string) string { + if s.isolationModeResolver != nil { + if mode := s.isolationModeResolver(serverName); mode != "" { + return mode + } + } + if s.engine != nil { + return s.engine.isolationMode + } + return "" +} + // SetFetchPackageSource toggles whether the source resolver may fetch the // published source of package-runner servers (npx/uvx) for scanning. See // SecurityConfig.ScannerFetchPackageSource (MCP-2206). Default is enabled. @@ -662,11 +716,12 @@ func (a *scanCallbackAdapter) OnScanFailed(job *ScanJob, err error) { // After Pass 1 completes, Pass 2 (supply chain audit) is auto-started in the background. func (s *Service) StartScan(ctx context.Context, serverName string, dryRun bool, scannerIDs []string, sourceDir string) (*ScanJob, error) { req := ScanRequest{ - ServerName: serverName, - DryRun: dryRun, - ScannerIDs: scannerIDs, - SourceDir: sourceDir, - ScanPass: ScanPassSecurityScan, + ServerName: serverName, + DryRun: dryRun, + ScannerIDs: scannerIDs, + SourceDir: sourceDir, + ScanPass: ScanPassSecurityScan, + IsolationMode: s.resolveIsolationMode(serverName), } // Build scan context for transparency @@ -873,9 +928,10 @@ func (s *Service) startPass2(serverName string, serverInfo *ServerInfo) { ctx := context.Background() req := ScanRequest{ - ServerName: serverName, - DryRun: false, - ScanPass: ScanPassSupplyChainAudit, + ServerName: serverName, + DryRun: false, + ScanPass: ScanPassSupplyChainAudit, + IsolationMode: s.resolveIsolationMode(serverName), } // Build scan context diff --git a/internal/security/scanner/service_test.go b/internal/security/scanner/service_test.go index 93d00c492..bc91de2fd 100644 --- a/internal/security/scanner/service_test.go +++ b/internal/security/scanner/service_test.go @@ -369,6 +369,65 @@ func newTestService(t *testing.T) (*Service, *mockStorage, *mockEmitter) { return svc, store, emitter } +// TestServiceSetIsolationMode verifies the setter propagates the resolved +// isolation mode to the scan engine (MCP-34.4 / D3 option b), which is what +// gates the Docker-scanner skip path. Default ("") leaves Docker behaviour +// intact. +func TestServiceSetIsolationMode(t *testing.T) { + svc, _, _ := newTestService(t) + + if svc.engine.isolationMode != "" { + t.Fatalf("expected default isolation mode to be empty, got %q", svc.engine.isolationMode) + } + + svc.SetIsolationMode("sandbox") + if svc.engine.isolationMode != "sandbox" { + t.Errorf("expected engine isolation mode 'sandbox', got %q", svc.engine.isolationMode) + } + + svc.SetIsolationMode("docker") + if svc.engine.isolationMode != "docker" { + t.Errorf("expected engine isolation mode 'docker', got %q", svc.engine.isolationMode) + } +} + +// TestServiceResolveIsolationModePerServer verifies the per-server resolver +// (MCP-34.4 review fix): a per-server resolved mode takes precedence over the +// engine-wide default, so a server pinned to isolation.mode:docker keeps +// running Docker scanners under a global sandbox default, and a server resolved +// to sandbox/none skips them under a global docker default. A "" resolver +// result (or no resolver) falls back to the engine-wide default. +func TestServiceResolveIsolationModePerServer(t *testing.T) { + svc, _, _ := newTestService(t) + svc.SetIsolationMode("sandbox") // engine-wide default + + // No resolver wired yet → fall back to the engine default. + if got := svc.resolveIsolationMode("any"); got != "sandbox" { + t.Errorf("with no resolver, expected fallback to engine default 'sandbox', got %q", got) + } + + perServer := map[string]string{ + "pinned-docker": "docker", // overrides the global sandbox default + "pinned-none": "none", + "pinned-sandbox": "sandbox", + "inherits": "", // resolver yields "" → fall back to default + } + svc.SetIsolationModeResolver(func(serverName string) string { return perServer[serverName] }) + + cases := map[string]string{ + "pinned-docker": "docker", + "pinned-none": "none", + "pinned-sandbox": "sandbox", + "inherits": "sandbox", // "" → engine default + "unknown-server": "sandbox", // not in map → "" → engine default + } + for server, want := range cases { + if got := svc.resolveIsolationMode(server); got != want { + t.Errorf("resolveIsolationMode(%q) = %q, want %q", server, got, want) + } + } +} + func TestServiceListScannersEmpty(t *testing.T) { svc, _, _ := newTestService(t) diff --git a/internal/server/server.go b/internal/server/server.go index c2844f6e1..bad4b8573 100644 --- a/internal/server/server.go +++ b/internal/server/server.go @@ -39,6 +39,7 @@ import ( "github.com/smart-mcp-proxy/mcpproxy-go/internal/tlslocal" "github.com/smart-mcp-proxy/mcpproxy-go/internal/transport" "github.com/smart-mcp-proxy/mcpproxy-go/internal/updatecheck" + "github.com/smart-mcp-proxy/mcpproxy-go/internal/upstream/core" "github.com/smart-mcp-proxy/mcpproxy-go/internal/upstream/types" "github.com/smart-mcp-proxy/mcpproxy-go/web" ) @@ -1961,6 +1962,36 @@ func (s *Server) startCustomHTTPServer(ctx context.Context, streamableServer *se if cfg != nil && cfg.Security != nil && cfg.Security.ScannerDisableNoNewPrivileges { secService.SetScannerDisableNoNewPrivileges(true) } + // MCP-34.4 / D3 option (b): tell the scanner which isolation mode is + // active. Under "sandbox"/"none" the host runs no Docker for scanner + // plugins, so they degrade cleanly (skip + "degraded" scan summary) + // instead of failing with a misleading "pull the image" message. + // + // Two layers: SetIsolationMode is the engine-wide default; the resolver + // is per-server so a server pinned to isolation.mode:docker still runs + // Docker scanners under a global sandbox default (and vice versa), + // matching the per-server isolation resolver and the docs. + if cfg != nil && cfg.DockerIsolation != nil { + secService.SetIsolationMode(string(cfg.DockerIsolation.ResolvedMode())) + } + secService.SetIsolationModeResolver(func(serverName string) string { + liveCfg := s.runtime.Config() + if liveCfg == nil || liveCfg.DockerIsolation == nil { + return "" + } + var sc *config.ServerConfig + for _, candidate := range liveCfg.Servers { + if candidate != nil && candidate.Name == serverName { + sc = candidate + break + } + } + if sc == nil { + return "" // unknown server → fall back to the engine-wide default + } + im := core.NewIsolationManager(liveCfg.DockerIsolation) + return string(im.ResolveMode(sc)) + }) // Published-package-source fetch is enabled by default; only an explicit // false in config disables it (MCP-2206). if cfg != nil && cfg.Security != nil && cfg.Security.ScannerFetchPackageSource != nil && !*cfg.Security.ScannerFetchPackageSource {