10 changes: 6 additions & 4 deletions agent-schema.json
@@ -1402,24 +1402,26 @@
},
"allowed_domains": {
"type": "array",
"description": "Allow-list of domains the fetch tool is permitted to fetch (only valid for type 'fetch'). A pattern matches the host exactly (case-insensitive) and any of its subdomains; e.g. 'example.com' matches 'example.com' and 'docs.example.com' but not 'badexample.com'. A leading dot ('.example.com') restricts the match to strict subdomains. Mutually exclusive with 'blocked_domains'.",
"description": "Allow-list of domains the fetch tool is permitted to fetch (only valid for type 'fetch'). Patterns are case-insensitive. A bare host ('example.com') matches the apex AND any subdomain. A leading dot ('.example.com') or wildcard ('*.example.com') matches strict subdomains only. CIDR ranges ('10.0.0.0/8', '::1/128') match when the URL host parses as an IP inside the network. Mutually exclusive with 'blocked_domains'.",
"items": {
"type": "string"
},
"examples": [
["docker.com", "docs.docker.com"],
["github.com", "raw.githubusercontent.com"]
["github.com", "raw.githubusercontent.com"],
["*.example.com"]
]
},
"blocked_domains": {
"type": "array",
"description": "Deny-list of domains the fetch tool is forbidden to fetch (only valid for type 'fetch'). Uses the same matching rules as 'allowed_domains'. Mutually exclusive with 'allowed_domains'.",
"description": "Deny-list of domains the fetch tool is forbidden to fetch (only valid for type 'fetch'). Uses the same matching rules as 'allowed_domains' (bare host, leading-dot or '*.' subdomain wildcard, and CIDR ranges for IP hosts). Mutually exclusive with 'allowed_domains'.",
"items": {
"type": "string"
},
"examples": [
["internal.example.com"],
["169.254.169.254"]
["169.254.169.254"],
["169.254.0.0/16", "10.0.0.0/8"]
]
},
"url": {
11 changes: 8 additions & 3 deletions docs/tools/fetch/index.md
@@ -40,7 +40,9 @@ Domain patterns in `allowed_domains` and `blocked_domains` use the following rul

- **Bare domain** — `example.com` matches the host `example.com` _and_ any subdomain such as `docs.example.com`. It does **not** match unrelated hosts that share a suffix (e.g. `badexample.com`).
- **Leading dot** — `.example.com` matches **only** strict subdomains (`docs.example.com`, `a.b.example.com`), not the apex `example.com`.
- **Wildcard glob** — `*.example.com` is an alias for the leading-dot form; the apex is excluded. The `*` is only valid as a leading `*.` token (entries like `foo.*`, `*.*.example.com`, or a bare `*` are rejected at config-load time).
- **IP literal** — IP addresses are matched exactly (`169.254.169.254`).
- **CIDR range** — `169.254.0.0/16`, `10.0.0.0/8`, `::1/128`, `fc00::/7`. Matches when the URL's host parses as an IP inside the network. Hostname hosts never match a CIDR pattern. Malformed CIDRs are rejected at config-load time.
- **Trailing dots** in FQDN-form URLs (`http://example.com./`) are stripped before matching, so they cannot bypass a deny-list entry.

The lists are mutually exclusive: a single fetch toolset may set either `allowed_domains` or `blocked_domains`, but not both.
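
The hostname rules listed above can be sketched in a few lines of Go. This is a minimal illustration, not the project's `matchesDomain`: the helper name `matchHostname` is hypothetical, it covers only the bare-domain, leading-dot and `*.` wildcard rules, and it leaves out IP literals and CIDR ranges.

```go
package main

import (
	"fmt"
	"strings"
)

// matchHostname is a hypothetical helper that applies the documented
// hostname rules only. IP-literal and CIDR handling is intentionally omitted.
func matchHostname(host, pattern string) bool {
	// Lower-case both sides and drop an FQDN trailing dot.
	host = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(host)), ".")
	pattern = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(pattern)), ".")

	// "*.example.com" is treated as an alias for ".example.com".
	if rest, ok := strings.CutPrefix(pattern, "*."); ok {
		pattern = "." + rest
	}
	if host == "" || pattern == "" || pattern == "." {
		return false
	}
	// Leading dot: strict subdomains only, the apex is excluded.
	if strings.HasPrefix(pattern, ".") {
		return strings.HasSuffix(host, pattern)
	}
	// Bare domain: the apex itself or any subdomain.
	return host == pattern || strings.HasSuffix(host, "."+pattern)
}

func main() {
	cases := []struct{ host, pattern string }{
		{"example.com", "example.com"},
		{"docs.example.com", "example.com"},
		{"badexample.com", "example.com"},
		{"example.com", ".example.com"},
		{"a.b.example.com", "*.example.com"},
		{"example.com.", "example.com"},
	}
	for _, c := range cases {
		fmt.Printf("host=%q pattern=%q match=%v\n", c.host, c.pattern, matchHostname(c.host, c.pattern))
	}
}
```

Comparing against `"." + pattern` rather than the raw pattern is what keeps `badexample.com` from matching `example.com`.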
@@ -50,7 +52,7 @@ When a list is configured, every redirect target is re-checked against the same
<div class="callout callout-warning" markdown="1">
<div class="callout-title">⚠️ Limitations
</div>
<p>Matching is purely string-based on the URL host. It does <strong>not</strong> perform DNS resolution and does <strong>not</strong> normalise alternative IP encodings (decimal <code>2852039166</code>, hex <code>0xa9.0xfe.0xa9.0xfe</code>, octal, IPv4-mapped IPv6, etc.). If you need to deny access to a specific IP, also list its alternative encodings, or block at the network layer.</p>
<p>Matching is purely string-based on the URL host. It does <strong>not</strong> perform DNS resolution and does <strong>not</strong> normalise alternative IP encodings (decimal <code>2852039166</code>, hex <code>0xa9.0xfe.0xa9.0xfe</code>, octal, etc.). The one exception is IPv4-mapped IPv6 addresses, which are normalised to their IPv4 form. If you need to deny access to a specific IP, also list its alternative encodings, or block at the network layer.</p>
</div>
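
To make the limitation concrete, the sketch below checks several spellings of the same metadata address against a link-local CIDR using only Go's standard `net` package. The helper `hostInCIDR` is hypothetical and written for this illustration; it is not the fetch tool's matcher. It relies on `net.ParseIP` accepting dotted-decimal IPv4 and IPv6 text but not the decimal-integer or hex forms.

```go
package main

import (
	"fmt"
	"net"
)

// hostInCIDR is a hypothetical helper: it reports whether host parses as an
// IP address inside cidr. Hostnames and encodings that net.ParseIP does not
// understand return false.
func hostInCIDR(host, cidr string) bool {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	// Collapse IPv4-mapped IPv6 (::ffff:a.b.c.d) to its 4-byte form.
	if v4 := ip.To4(); v4 != nil {
		ip = v4
	}
	return ipNet.Contains(ip)
}

func main() {
	const linkLocal = "169.254.0.0/16"
	hosts := []string{
		"169.254.169.254",        // dotted-decimal literal
		"::ffff:169.254.169.254", // IPv4-mapped IPv6
		"2852039166",             // decimal encoding of the same address
		"0xa9.0xfe.0xa9.0xfe",    // hex encoding of the same address
		"metadata.example",       // hostname (illustrative)
	}
	for _, h := range hosts {
		fmt.Printf("%-24s in %s: %v\n", h, linkLocal, hostInCIDR(h, linkLocal))
	}
}
```

Only the first two spellings parse as IP addresses here, which is why the decimal and hex forms need their own deny-list entries or a network-layer block.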

### Custom Timeout
@@ -78,8 +80,11 @@ toolsets:
toolsets:
- type: fetch
blocked_domains:
- 169.254.169.254 # cloud metadata endpoint
- internal.example.com # internal corporate hostnames
- 169.254.169.254 # cloud metadata endpoint (literal IP)
- 169.254.0.0/16 # entire link-local range (CIDR)
- 10.0.0.0/8 # RFC1918 private range
- "*.internal.example.com" # any subdomain (wildcard)
- internal.example.com # internal corporate hostname
```

## Tool Interface
7 changes: 5 additions & 2 deletions examples/fetch_domain_filtering.yaml
@@ -36,6 +36,9 @@ agents:
toolsets:
- type: fetch
blocked_domains:
- 169.254.169.254 # cloud instance metadata
- 100.100.100.200 # alibaba/oracle metadata
- 169.254.169.254 # cloud instance metadata (literal IP)
- 169.254.0.0/16 # whole link-local range (CIDR)
- 10.0.0.0/8 # RFC1918 private network
- 100.100.100.200 # alibaba/oracle metadata
- metadata.google.internal
- "*.internal.example.com" # any subdomain (wildcard glob)
40 changes: 36 additions & 4 deletions pkg/config/latest/validate.go
@@ -215,14 +215,46 @@ func (t *Toolset) validate() error {
return nil
}

// validateDomainPatterns rejects empty / whitespace-only entries in a fetch
// allow- or block-list, since they silently match nothing and turn the list
// into a foot-gun (e.g. allowed_domains: [""] would reject every URL).
// validateDomainPatterns rejects empty / whitespace-only entries and
// malformed wildcard or CIDR patterns in a fetch allow- or block-list.
//
// Catching these at config-load time turns silent foot-guns (e.g.
// `allowed_domains: [""]` rejecting every URL, `*.foo.*` matching nothing)
// into actionable errors. Plain hostnames and the leading-dot subdomain form
// are intentionally not validated for syntax — the matcher is purely
// string-based and any non-conforming entry simply never matches.
func validateDomainPatterns(field string, patterns []string) error {
for i, p := range patterns {
if strings.TrimSpace(p) == "" {
trimmed := strings.TrimSpace(p)
if trimmed == "" {
return fmt.Errorf("%s[%d] must not be empty", field, i)
}
if err := validateDomainPattern(trimmed); err != nil {
return fmt.Errorf("%s[%d] %q is invalid: %w", field, i, p, err)
}
}
return nil
}

// validateDomainPattern checks a single (already trimmed, non-empty) entry.
func validateDomainPattern(p string) error {
// CIDR notation: must parse cleanly. We deliberately accept any /-bearing
// string as "intended to be a CIDR" so a typo like "10.0.0.0/33" is
// reported instead of being silently treated as a hostname.
if strings.Contains(p, "/") {
if _, _, err := net.ParseCIDR(p); err != nil {
return fmt.Errorf("not a valid CIDR: %w", err)
}
return nil
}
// Wildcards: only the leading "*." form is supported. Anything else
// ("foo.*", "*foo*", "**.example.com") would silently match nothing
// under the current matcher, which is almost never what the user wants.
if strings.Contains(p, "*") {
rest, ok := strings.CutPrefix(p, "*.")
if !ok || rest == "" || strings.Contains(rest, "*") {
return errors.New("'*' is only allowed as a leading '*.' wildcard, e.g. '*.example.com'")
}
}
return nil
}
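
As a usage sketch for the validator above, the program below reproduces `validateDomainPattern` from this hunk (so that it compiles on its own) and runs it over a handful of entries. The sample patterns in `main` are illustrative and not taken from the test suite. One consequence worth noticing is that any entry containing `/`, including a pasted URL, is treated as an intended CIDR and rejected if it does not parse.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// validateDomainPattern is reproduced from the hunk above so that the
// example is self-contained.
func validateDomainPattern(p string) error {
	// Anything containing '/' is treated as an intended CIDR.
	if strings.Contains(p, "/") {
		if _, _, err := net.ParseCIDR(p); err != nil {
			return fmt.Errorf("not a valid CIDR: %w", err)
		}
		return nil
	}
	// Only the leading "*." wildcard form is supported.
	if strings.Contains(p, "*") {
		rest, ok := strings.CutPrefix(p, "*.")
		if !ok || rest == "" || strings.Contains(rest, "*") {
			return errors.New("'*' is only allowed as a leading '*.' wildcard, e.g. '*.example.com'")
		}
	}
	return nil
}

func main() {
	// Illustrative inputs, chosen for this sketch.
	samples := []string{
		"example.com",
		".example.com",
		"*.example.com",
		"fc00::/7",
		"10.0.0.0/33",
		"foo.*",
		"*",
		"https://example.com",
	}
	for _, p := range samples {
		if err := validateDomainPattern(p); err != nil {
			fmt.Printf("%-22q rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%-22q accepted\n", p)
	}
}
```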
100 changes: 100 additions & 0 deletions pkg/config/latest/validate_test.go
@@ -327,6 +327,106 @@ agents:
`,
wantErr: "blocked_domains[0] must not be empty",
},
{
name: "fetch with wildcard subdomain pattern",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
allowed_domains:
- "*.example.com"
`,
wantErr: "",
},
{
name: "fetch with ipv4 CIDR pattern",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
blocked_domains:
- 169.254.0.0/16
- 10.0.0.0/8
`,
wantErr: "",
},
{
name: "fetch with ipv6 CIDR pattern",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
blocked_domains:
- "fc00::/7"
- "::1/128"
`,
wantErr: "",
},
{
name: "malformed CIDR is rejected",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
blocked_domains:
- 10.0.0.0/33
`,
wantErr: "not a valid CIDR",
},
{
name: "interior wildcard is rejected",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
allowed_domains:
- "foo.*"
`,
wantErr: "'*' is only allowed as a leading '*.' wildcard",
},
{
name: "double wildcard is rejected",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
allowed_domains:
- "*.*.example.com"
`,
wantErr: "'*' is only allowed as a leading '*.' wildcard",
},
{
name: "bare star is rejected",
config: `
version: "8"
agents:
root:
model: "openai/gpt-4"
toolsets:
- type: fetch
allowed_domains:
- "*"
`,
wantErr: "'*' is only allowed as a leading '*.' wildcard",
},
}

for _, tt := range tests {
82 changes: 75 additions & 7 deletions pkg/tools/builtin/fetch.go
@@ -5,6 +5,7 @@ import (
"errors"
"fmt"
"io"
"net"
"net/http"
"net/url"
"slices"
@@ -296,18 +297,85 @@ func (h *fetchHandler) checkDomainAllowed(u *url.URL) error {

// matchesDomain reports whether host matches pattern (case-insensitive).
//
// A bare pattern ("example.com") matches the host exactly or any subdomain
// ("docs.example.com"); it does NOT match unrelated hosts that share a suffix
// ("badexample.com"). A pattern with a leading dot (".example.com") matches
// strict subdomains only — the apex "example.com" is excluded.
// Supported pattern shapes:
//
// - **Bare domain** ("example.com") matches the host exactly _or_ any
// subdomain ("docs.example.com"); it does NOT match unrelated hosts that
// share a suffix ("badexample.com").
// - **Leading dot** (".example.com") matches strict subdomains only — the
// apex "example.com" is excluded.
// - **Wildcard glob** ("*.example.com") is an alias for the leading-dot
// form: it matches strict subdomains only. No other use of "*" is
// supported (e.g. "foo.*", "*foo*" are rejected by validation and would
// never match here).
// - **CIDR** ("10.0.0.0/8", "169.254.0.0/16", "::1/128", "fc00::/7")
// matches when the host parses as an IP address inside the network.
// Hostname hosts never match a CIDR pattern.
//
// Trailing dots used in FQDN form ("example.com.") are stripped from both
// host and pattern before matching, so a URL like http://example.com./ cannot
// be used to bypass a deny-list entry for example.com.
func matchesDomain(host, pattern string) bool {
host = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(host)), ".")
pattern = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(pattern)), ".")
if host == "" || pattern == "" || pattern == "." {
host = strings.TrimSpace(host)
pattern = strings.TrimSpace(pattern)
if host == "" || pattern == "" {
return false
}

// CIDR pattern: the host must parse as an IP address inside the network.
// CIDRs always contain '/', so we can detect them cheaply before any other
// normalisation. Hostname-style hosts never match a CIDR pattern.
if strings.Contains(pattern, "/") {
if _, ipNet, err := net.ParseCIDR(pattern); err == nil {
// url.Hostname() already strips IPv6 brackets, but be defensive.
ipStr := strings.TrimSuffix(strings.Trim(host, "[]"), ".")
if ip := net.ParseIP(ipStr); ip != nil {
// Normalize IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) to their
// IPv4 form before checking CIDR membership. Without this, an
// attacker can bypass an IPv4 deny-list like "169.254.0.0/16" by
// using the IPv6-mapped form "::ffff:169.254.169.254".
//
// net.IP.To4() returns nil for "true" IPv6 addresses and the
// 4-byte IPv4 form for IPv4 or IPv4-mapped-IPv6.
if ipv4 := ip.To4(); ipv4 != nil {
return ipNet.Contains(ipv4)
}
return ipNet.Contains(ip)
}
return false
}
// Malformed CIDRs are rejected at config-load time; if one slips
// through (e.g. via the programmatic API), fall through to the
// string matcher below, which will never match a host.
}

// Normalize IPv4-mapped IPv6 addresses to their IPv4 form for string
// comparison. This ensures that "::ffff:169.254.169.254" matches a
// literal pattern "169.254.169.254" (and vice versa).
if ip := net.ParseIP(strings.Trim(host, "[]")); ip != nil {
[LOW] Inconsistency: non-CIDR IP normalisation path does not strip trailing dot before net.ParseIP

The CIDR path (line 331) correctly chains both bracket-stripping and trailing-dot stripping before calling net.ParseIP:

// CIDR path — correct
ipStr := strings.TrimSuffix(strings.Trim(host, "[]"), ".")
if ip := net.ParseIP(ipStr); ip != nil {

The non-CIDR normalisation path (line 355) only strips brackets, not the trailing dot:

// Non-CIDR path — inconsistent
if ip := net.ParseIP(strings.Trim(host, "[]")); ip != nil {

For a host like [::ffff:169.254.169.254]. (bracketed IPv6 with an FQDN trailing dot), strings.Trim(host, "[]") yields ::ffff:169.254.169.254]. because the trailing . is not in the cutset "[]" and stops the trim before the closing bracket. net.ParseIP then returns nil and the host falls through un-normalised to the string comparison stage. The bypass is theoretical, since url.Hostname() strips brackets and would not return a host in this form, but the inconsistency between the two paths in the changed code is real.

Suggested fix: Apply the same pattern as the CIDR path:

if ip := net.ParseIP(strings.TrimSuffix(strings.Trim(host, "[]"), ".")); ip != nil {

if ipv4 := ip.To4(); ipv4 != nil {
host = ipv4.String()
} else {
host = ip.String()
}
}
if ip := net.ParseIP(strings.Trim(pattern, "[]")); ip != nil {
if ipv4 := ip.To4(); ipv4 != nil {
pattern = ipv4.String()
} else {
pattern = ip.String()
}
}

host = strings.TrimSuffix(strings.ToLower(host), ".")
pattern = strings.TrimSuffix(strings.ToLower(pattern), ".")

// Wildcard glob "*.example.com" is an alias for ".example.com".
if rest, ok := strings.CutPrefix(pattern, "*."); ok {
pattern = "." + rest
}

if pattern == "" || pattern == "." {
return false
}
if strings.HasPrefix(pattern, ".") {