Skip to content

harden docker agent serve api: warn on non-loopback, fix runtime race, block SSRF#2604

Open
dgageot wants to merge 6 commits intodocker:mainfrom
dgageot:board/security-review-of-docker-agent-serve-ap-c9b90a8c
Open

harden docker agent serve api: warn on non-loopback, fix runtime race, block SSRF#2604
dgageot wants to merge 6 commits intodocker:mainfrom
dgageot:board/security-review-of-docker-agent-serve-ap-c9b90a8c

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 30, 2026

Hardens the docker agent serve api command and the URL-based agent loading
path. This addresses three of the issues surfaced by a security review of
serve api (warn-on-public-bind, runtime race, SSRF/HTTPS); the rest of the
review (auth/authz, CSRF, rate limiting, etc.) remains open.

Commits

Commit Summary
1 49e91c8fd warn when serve api binds to a non-loopback address The default --listen is already 127.0.0.1:8080. When the operator is explicit about exposing the API (e.g. --listen 0.0.0.0:...), print a WARNING: block and slog.Warn so the lack of authentication is not silent.
2 de2e8671e fix race in SessionManager.runtimeForSession runtimeForSession used to do its own Load+Store on sm.runtimeSessions, then RunSession did a second Store with a cancel func. Between the two stores, the map briefly held a half-initialised activeRuntimes (no cancel) observable by Steer/FollowUp (which do not hold sm.mux). runtimeForSession is now a pure constructor; the caller is the single source of truth for the map mutation.
3 69412c042 reject non-public addresses and require https for url agent sources Two layered defences in urlSource.Read: pre-flight validateAgentURL rejects non-https schemes, and an ssrfDialControl hook on net.Dialer rejects any dial whose resolved IP is loopback / RFC1918 / link-local / multicast / unspecified. Running at dial time also defeats DNS rebinding. Closes the SSRF vector against AWS/GCP/Azure metadata (169.254.169.254) and internal services.
4 53aff9a9f block https-to-http redirects when fetching url agent sources ssrfSafeHTTPClient was missing CheckRedirect. An https origin under attacker control could 302 to http://... and silently downgrade. Adds a named ssrfCheckRedirect (testable) that rejects non-https redirect targets and bounds the chain at 10 hops.
5 9ceae882e move newURLSourceForTest into the _test.go file The test-only constructor that disables the SSRF/HTTPS checks no longer ships in release binaries.
6 e36e54ba1 fix lint issues in url agent source gocritic/paramTypeCombine, gocritic/importShadow (local url shadowed net/url), and gci/gofmt comment alignment.

Tests

New tests added:

  • TestURLSource_Read_RejectsHTTPhttp:// URLs rejected.
  • TestURLSource_Read_RejectsLocalAddresses — loopback (v4 + v6), RFC1918, 169.254.169.254, 0.0.0.0 all rejected at dial time.
  • TestIsPublicIP — table-driven classifier coverage.
  • TestSSRFCheckRedirect — http/file/javascript/ftp redirect targets rejected; 10-hop limit enforced.
  • TestURLSource_Read_RejectsHTTPRedirect — end-to-end: httptest.NewTLSServer 302→http:// is refused.

Existing httptest-based tests migrated to the test-only newURLSourceForTest constructor.

Validation

  • mise lint — clean (golangci-lint, custom ./lint, go mod tidy).
  • mise test — green (go test ./...).
  • go test -race ./pkg/config/... ./pkg/server/... ./cmd/root/... — green.
  • Smoke-tested serve api manually:
    • --listen 127.0.0.1:8080 (default) — no warning.
    • --listen 0.0.0.0:18080 — explicit, prints the WARNING block and serves.

Out of scope

The original review identified ~15 issues. This PR addresses 3 of them (#11 non-loopback binding, #13 race, #3 SSRF). Still open and worth follow-up PRs:

  • No authentication / authorization (critical)
  • Arbitrary working directory (critical)
  • Arbitrary agent code execution / sandboxing (critical)
  • No CSRF / CORS protection
  • No rate limiting / body-size limits
  • No TLS support
  • Sensitive data in logs (header propagation)
  • Missing security headers
  • Audit logging

dgageot added 6 commits April 30, 2026 10:01
The API server has no authentication, so binding it to a routable interface is a remote-code-execution risk. The default --listen value is already 127.0.0.1, so reaching the warning means the operator was explicit; we just remind them and log at WARN.

Assisted-By: docker-agent
runtimeForSession used to do its own Load+Store on sm.runtimeSessions and then RunSession did a second Store with the same key plus a cancel func. Between the two stores, the map briefly held a half-initialised activeRuntimes (no cancel) that other map readers (Steer/FollowUp, which do not hold sm.mux) could observe.

Make runtimeForSession a pure constructor: the caller, which already holds sm.mux and has already checked existence, is the single source of truth for the map mutation.

Assisted-By: docker-agent
Loading an agent from an arbitrary URL was an SSRF vector: any host running the API could be coerced into hitting RFC1918, loopback or 169.254.169.254 (cloud metadata). Apply two layered defences:

- pre-flight: reject non-https schemes in validateAgentURL.

- dial-time: ssrfDialControl rejects any dial whose resolved IP is loopback / private / link-local / multicast / unspecified. Running after DNS resolution but before the TCP handshake also defeats DNS rebinding.

Tests that fetch from httptest.NewServer (plain http on 127.0.0.1) are migrated to a new test-only constructor newURLSourceForTest that bypasses the checks. New tests cover http rejection, loopback / RFC1918 / metadata IP rejection, and the public-IP classifier.

Assisted-By: docker-agent
ssrfSafeHTTPClient was missing a CheckRedirect hook, so an https origin under attacker control could 302 to http://... and the request would be silently downgraded to plain text. SSRF on the redirect target was already covered by the dialer's Control hook, but the scheme bypass let an attacker observe and tamper with the response body and headers.

Add ssrfCheckRedirect (extracted as a named function so it is unit-testable) which rejects any redirect whose scheme is not https and bounds the chain at 10 hops, matching the convention used by pkg/tools/builtin/fetch.go.

Assisted-By: docker-agent
newURLSourceForTest constructs a urlSource with the SSRF and HTTPS checks disabled. Defining it next to NewURLSource shipped that bypass into release binaries (dead code, but a sharp edge for anyone reading the package). Move it to sources_test.go where it is only compiled under "go test".

Assisted-By: docker-agent
- combine consecutive string params in ssrfDialControl (gocritic paramTypeCombine)

- rename local 'url' that shadowed the imported net/url package (gocritic importShadow)

- align table-driven test comments per gofmt/gci

Assisted-By: docker-agent
@dgageot dgageot requested a review from a team as a code owner April 30, 2026 10:29
Copy link
Copy Markdown

@docker-agent docker-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🔴 CRITICAL

This PR makes meaningful progress on the SSRF hardening work. However, two gaps in the new security code were found:

**1. [HIGH] Proxy bypass in ** — routes requests through any configured proxy, which bypasses the hook (dial control only fires for this process's direct TCP connections, not for connections the proxy makes on behalf of this process). If is set to a legitimate corp proxy, an attacker-controlled URL can reach through that proxy.

**2. [MEDIUM] CGNAT range not blocked in ** — Go's covers RFC 1918 + RFC 4193 but not RFC 6598 (100.64.0.0/10 Shared Address Space). In cloud/ISP environments, CGNAT addresses can reach internal-only services.

Everything else looks well-implemented: the DNS-rebinding protection is correct, properly blocks https→http downgrade, the session manager race fix is sound, and the non-loopback warning is straightforward.

Comment thread pkg/config/sources.go
return &http.Client{
Timeout: 60 * time.Second,
Transport: &http.Transport{
Proxy: http.ProxyFromEnvironment,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] SSRF bypass: http.ProxyFromEnvironment defeats dial-level protection

ssrfSafeHTTPClient sets Proxy: http.ProxyFromEnvironment, which means if HTTP_PROXY or HTTPS_PROXY is configured in the environment, all requests are routed through that proxy. The ssrfDialControl hook only fires for TCP dials made by this process — once a legitimate external proxy is connected, the proxy makes its own outbound connection to the target, entirely bypassing the dial-control SSRF protection.

Attack scenario: an operator's environment has HTTPS_PROXY=https://corp-proxy.example.com; an attacker-controlled agent YAML specifies url: https://169.254.169.254/latest/meta-data/.... The dial to corp-proxy.example.com passes (public IP), and the proxy faithfully fetches the metadata endpoint.

Fix: disable proxy support for this SSRF-sensitive client by setting Proxy: nil (or simply omitting Proxy — the zero value for the field disables proxying):

Transport: &http.Transport{
    // Proxy: nil  ← no proxy; ssrfDialControl only works for direct dials
    DialContext: dialer.DialContext,
    ...
}

Comment thread pkg/config/sources.go
// loopback (127/8, ::1), RFC1918 private ranges, link-local (incl. the
// 169.254.169.254 cloud metadata endpoint), multicast and the unspecified
// address (0.0.0.0, ::).
func isPublicIP(ip net.IP) bool {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] isPublicIP does not block 100.64.0.0/10 (RFC 6598 Shared Address Space / CGNAT)

Go's net.IP.IsPrivate() (Go 1.17+) covers only RFC 1918 (10/8, 172.16/12, 192.168/16) and RFC 4193 (fc00::/7). It does not include the RFC 6598 Carrier-Grade NAT range 100.64.0.0/10. In cloud and ISP environments, addresses in this range are often used for internal-only services (e.g., hypervisor APIs, internal load-balancer health endpoints) that are unreachable from the public internet but reachable from within the hosting environment.

An agent URL pointing to 100.64.x.x would pass isPublicIP, allowing the SSRF protection to be bypassed in those environments.

Fix: add an explicit check for the CGNAT range:

var cgnatRange = func() *net.IPNet {
    _, n, _ := net.ParseCIDR("100.64.0.0/10")
    return n
}()

func isPublicIP(ip net.IP) bool {
    return !ip.IsLoopback() &&
        !ip.IsPrivate() &&
        !cgnatRange.Contains(ip) &&  // RFC 6598
        !ip.IsLinkLocalUnicast() &&
        !ip.IsLinkLocalMulticast() &&
        !ip.IsMulticast() &&
        !ip.IsUnspecified()
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant