harden docker agent serve api: warn on non-loopback, fix runtime race, block SSRF #2604
The API server has no authentication, so binding it to a routable interface is a remote-code-execution risk. The default --listen value is already 127.0.0.1, so reaching the warning means the operator was explicit; we just remind them and log at WARN. Assisted-By: docker-agent
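A minimal sketch of the kind of check this commit describes, assuming the listen address arrives as a host:port string. The helper name isLoopbackListenAddr is hypothetical and not taken from the PR; the real command also logs through slog.Warn, which is replaced here by a plain print.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackListenAddr (hypothetical helper) reports whether a host:port
// listen address is restricted to the loopback interface.
func isLoopbackListenAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false // unparsable: treat as exposed so the warning fires
	}
	if host == "localhost" {
		return true
	}
	// An empty host (":8080") binds every interface, so ParseIP returns nil
	// and the address is treated as exposed.
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	addrs := []string{"127.0.0.1:8080", "[::1]:8080", "0.0.0.0:18080", ":8080"}
	for _, addr := range addrs {
		if !isLoopbackListenAddr(addr) {
			fmt.Printf("WARNING: %s is not loopback and the API has no authentication\n", addr)
		}
	}
}
```

Treating unparsable and empty hosts as exposed errs on the side of printing the warning too often rather than too rarely.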
runtimeForSession used to do its own Load+Store on sm.runtimeSessions and then RunSession did a second Store with the same key plus a cancel func. Between the two stores, the map briefly held a half-initialised activeRuntimes (no cancel) that other map readers (Steer/FollowUp, which do not hold sm.mux) could observe. Make runtimeForSession a pure constructor: the caller, which already holds sm.mux and has already checked existence, is the single source of truth for the map mutation. Assisted-By: docker-agent
Loading an agent from an arbitrary URL was an SSRF vector: any host running the API could be coerced into hitting RFC1918, loopback or 169.254.169.254 (cloud metadata). Apply two layered defences:
- pre-flight: reject non-https schemes in validateAgentURL.
- dial-time: ssrfDialControl rejects any dial whose resolved IP is loopback / private / link-local / multicast / unspecified. Running after DNS resolution but before the TCP handshake also defeats DNS rebinding.

Tests that fetch from httptest.NewServer (plain http on 127.0.0.1) are migrated to a new test-only constructor newURLSourceForTest that bypasses the checks. New tests cover http rejection, loopback / RFC1918 / metadata IP rejection, and the public-IP classifier.

Assisted-By: docker-agent
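The two layers can be sketched as below. The function names follow the commit message, but the bodies are an illustration written from that description, not the PR's implementation; the error messages and the handling of unparsable addresses are assumptions.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"syscall"
)

// validateAgentURL is the pre-flight layer: only https is accepted.
func validateAgentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q: only https is allowed", u.Scheme)
	}
	return nil
}

// isPublicIP mirrors the categories listed in the commit message.
func isPublicIP(ip net.IP) bool {
	return !ip.IsLoopback() && !ip.IsPrivate() &&
		!ip.IsLinkLocalUnicast() && !ip.IsLinkLocalMulticast() &&
		!ip.IsMulticast() && !ip.IsUnspecified()
}

// ssrfDialControl is the dial-time layer. net.Dialer calls Control after
// DNS resolution, so address is always a literal IP and port.
func ssrfDialControl(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Assumption: anything that does not parse as a plain IP is refused.
		return fmt.Errorf("refusing to dial unparsable address %q", host)
	}
	if !isPublicIP(ip) {
		return fmt.Errorf("refusing to dial non-public address %s", ip)
	}
	return nil
}

func main() {
	fmt.Println(validateAgentURL("http://example.com/agent.yaml"))

	// The hook rejects the connection before any TCP handshake happens.
	dialer := &net.Dialer{Control: ssrfDialControl}
	_, err := dialer.Dial("tcp", "127.0.0.1:443")
	fmt.Println(err)
}
```

Because the check runs on the address that is actually dialed, a hostname that resolves to a public IP during validation and to a private IP at connect time is still refused.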
ssrfSafeHTTPClient was missing a CheckRedirect hook, so an https origin under attacker control could 302 to http://... and the request would be silently downgraded to plain text. SSRF on the redirect target was already covered by the dialer's Control hook, but the scheme bypass let an attacker observe and tamper with the response body and headers. Add ssrfCheckRedirect (extracted as a named function so it is unit-testable) which rejects any redirect whose scheme is not https and bounds the chain at 10 hops, matching the convention used by pkg/tools/builtin/fetch.go. Assisted-By: docker-agent
newURLSourceForTest constructs a urlSource with the SSRF and HTTPS checks disabled. Defining it next to NewURLSource shipped that bypass into release binaries (dead code, but a sharp edge for anyone reading the package). Move it to sources_test.go where it is only compiled under "go test". Assisted-By: docker-agent
- combine consecutive string params in ssrfDialControl (gocritic paramTypeCombine)
- rename local 'url' that shadowed the imported net/url package (gocritic importShadow)
- align table-driven test comments per gofmt/gci

Assisted-By: docker-agent
Assessment: 🔴 CRITICAL
This PR makes meaningful progress on the SSRF hardening work. However, two gaps in the new security code were found:
**1. [HIGH] Proxy bypass in ssrfSafeHTTPClient**: http.ProxyFromEnvironment routes requests through any configured proxy, which bypasses the ssrfDialControl hook (dial control only fires for this process's direct TCP connections, not for connections the proxy makes on behalf of this process). If HTTPS_PROXY is set to a legitimate corp proxy, an attacker-controlled URL can reach internal addresses such as 169.254.169.254 through that proxy.
**2. [MEDIUM] CGNAT range not blocked in isPublicIP**: Go's net.IP.IsPrivate() covers RFC 1918 + RFC 4193 but not RFC 6598 (100.64.0.0/10 Shared Address Space). In cloud/ISP environments, CGNAT addresses can reach internal-only services.
Everything else looks well-implemented: the DNS-rebinding protection is correct, ssrfCheckRedirect properly blocks https→http downgrade, the session manager race fix is sound, and the non-loopback warning is straightforward.
    return &http.Client{
        Timeout: 60 * time.Second,
        Transport: &http.Transport{
            Proxy: http.ProxyFromEnvironment,
[HIGH] SSRF bypass: http.ProxyFromEnvironment defeats dial-level protection
ssrfSafeHTTPClient sets Proxy: http.ProxyFromEnvironment, which means if HTTP_PROXY or HTTPS_PROXY is configured in the environment, all requests are routed through that proxy. The ssrfDialControl hook only fires for TCP dials made by this process — once a legitimate external proxy is connected, the proxy makes its own outbound connection to the target, entirely bypassing the dial-control SSRF protection.
Attack scenario: an operator's environment has HTTPS_PROXY=https://corp-proxy.example.com; an attacker-controlled agent YAML specifies url: https://169.254.169.254/latest/meta-data/.... The dial to corp-proxy.example.com passes (public IP), and the proxy faithfully fetches the metadata endpoint.
Fix: disable proxy support for this SSRF-sensitive client by setting Proxy: nil (or simply omitting Proxy — the zero value for the field disables proxying):
    Transport: &http.Transport{
        // Proxy: nil ← no proxy; ssrfDialControl only works for direct dials
        DialContext: dialer.DialContext,
        ...
    }

    // loopback (127/8, ::1), RFC1918 private ranges, link-local (incl. the
    // 169.254.169.254 cloud metadata endpoint), multicast and the unspecified
    // address (0.0.0.0, ::).
    func isPublicIP(ip net.IP) bool {
[MEDIUM] isPublicIP does not block 100.64.0.0/10 (RFC 6598 Shared Address Space / CGNAT)
Go's net.IP.IsPrivate() (Go 1.17+) covers only RFC 1918 (10/8, 172.16/12, 192.168/16) and RFC 4193 (fc00::/7). It does not include the RFC 6598 Carrier-Grade NAT range 100.64.0.0/10. In cloud and ISP environments, addresses in this range are often used for internal-only services (e.g., hypervisor APIs, internal load-balancer health endpoints) that are unreachable from the public internet but reachable from within the hosting environment.
An agent URL pointing to 100.64.x.x would pass isPublicIP, allowing the SSRF protection to be bypassed in those environments.
Fix: add an explicit check for the CGNAT range:
    var cgnatRange = func() *net.IPNet {
        _, n, _ := net.ParseCIDR("100.64.0.0/10")
        return n
    }()

    func isPublicIP(ip net.IP) bool {
        return !ip.IsLoopback() &&
            !ip.IsPrivate() &&
            !cgnatRange.Contains(ip) && // RFC 6598
            !ip.IsLinkLocalUnicast() &&
            !ip.IsLinkLocalMulticast() &&
            !ip.IsMulticast() &&
            !ip.IsUnspecified()
    }
Hardens the docker agent serve api command and the URL-based agent loading path. This addresses three of the issues surfaced by a security review of serve api (warn-on-public-bind, runtime race, SSRF/HTTPS); the rest of the review (auth/authz, CSRF, rate limiting, etc.) remains open.
Commits
- 49e91c8fd warn when serve api binds to a non-loopback address
  --listen is already 127.0.0.1:8080 by default. When the operator is explicit about exposing the API (e.g. --listen 0.0.0.0:...), print a WARNING: block and slog.Warn so the lack of authentication is not silent.
- de2e8671e fix race in SessionManager.runtimeForSession
  runtimeForSession used to do its own Load+Store on sm.runtimeSessions, then RunSession did a second Store with a cancel func. Between the two stores, the map briefly held a half-initialised activeRuntimes (no cancel) observable by Steer/FollowUp (which do not hold sm.mux). runtimeForSession is now a pure constructor; the caller is the single source of truth for the map mutation.
- 69412c042 reject non-public addresses and require https for url agent sources
  Two layered defences in urlSource.Read: pre-flight validateAgentURL rejects non-https schemes, and an ssrfDialControl hook on net.Dialer rejects any dial whose resolved IP is loopback / RFC1918 / link-local / multicast / unspecified. Running at dial time also defeats DNS rebinding. Closes the SSRF vector against AWS/GCP/Azure metadata (169.254.169.254) and internal services.
- 53aff9a9f block https-to-http redirects when fetching url agent sources
  ssrfSafeHTTPClient was missing CheckRedirect. An https origin under attacker control could 302 to http://... and silently downgrade. Adds a named ssrfCheckRedirect (testable) that rejects non-https redirect targets and bounds the chain at 10 hops.
- 9ceae882e move newURLSourceForTest into the _test.go file
- e36e54ba1 fix lint issues in url agent source
  gocritic/paramTypeCombine, gocritic/importShadow (local url shadowed net/url), and gci/gofmt comment alignment.

Tests
New tests added:
- TestURLSource_Read_RejectsHTTP: http:// URLs rejected.
- TestURLSource_Read_RejectsLocalAddresses: loopback (v4 + v6), RFC1918, 169.254.169.254, 0.0.0.0 all rejected at dial time.
- TestIsPublicIP: table-driven classifier coverage.
- TestSSRFCheckRedirect: http/file/javascript/ftp redirect targets rejected; 10-hop limit enforced.
- TestURLSource_Read_RejectsHTTPRedirect: end-to-end, an httptest.NewTLSServer 302 → http:// is refused.

Existing httptest-based tests migrated to the test-only newURLSourceForTest constructor.

Validation
- mise lint: clean (golangci-lint, custom ./lint, go mod tidy).
- mise test: green (go test ./...).
- go test -race ./pkg/config/... ./pkg/server/... ./cmd/root/...: green.
- serve api manually:
  - --listen 127.0.0.1:8080 (default): no warning.
  - --listen 0.0.0.0:18080: explicit, prints the WARNING block and serves.

Out of scope
The original review identified ~15 issues. This PR addresses 3 of them (#11 non-loopback binding, #13 race, #3 SSRF). Still open and worth follow-up PRs: