diff --git a/README.md b/README.md
index d6186e9..0f9240d 100644
--- a/README.md
+++ b/README.md
@@ -60,6 +60,7 @@ Skills are contextual and auto-loaded based on your conversation. When a request
 | sandbox-sdk | Secure code execution for AI code execution, code interpreters, CI/CD systems, and interactive dev environments |
 | wrangler | Deploying and managing Workers, KV, R2, D1, Vectorize, Queues, Workflows |
 | web-perf | Auditing Core Web Vitals (FCP, LCP, TBT, CLS), render-blocking resources, network chains |
+| agent-ready | Making a Cloudflare-hosted site discoverable by AI agents — Link headers, RFC 9727 api-catalog, MCP/A2A cards, agent-skills index, llms.txt, Content-Signal, Markdown for Agents, OAuth discovery, DNS-AID |
 | building-mcp-server-on-cloudflare | Building remote MCP servers with tools, OAuth, and deployment |
 | building-ai-agent-on-cloudflare | Building AI agents with state, WebSockets, and tool integration |
diff --git a/skills/agent-ready/SKILL.md b/skills/agent-ready/SKILL.md
new file mode 100644
index 0000000..c0704a8
--- /dev/null
+++ b/skills/agent-ready/SKILL.md
@@ -0,0 +1,78 @@
+---
+name: agent-ready
+description: Make a Cloudflare-hosted site discoverable and usable by AI agents — publish the agent-discovery signals (RFC 8288 Link headers, RFC 9727 api-catalog, MCP Server Card, A2A Agent Card, agent-skills index, llms.txt, security.txt, AIPREF Content-Signal, Markdown for Agents, OAuth/OIDC discovery, and DNS-AID SVCB records + DNSSEC). Load when a user asks to "make my site agent-ready", "pass isitagentready", "add agent discovery", "publish an api-catalog / MCP server card / A2A agent card / llms.txt", "expose tools to agents (WebMCP)", or fix any of those signals on a site fronted by Cloudflare.
+references:
+  - dedicated-discovery-worker
+  - dns-aid
+  - troubleshooting
+---
+
+# Agent-ready skill
+
+Turns "make my site discoverable to AI agents" into the concrete set of HTTP, DNS, and well-known signals that agent crawlers (e.g. isitagentready.com) and autonomous agents look for — implemented the Cloudflare-native way so you never have to redeploy or risk the main application.
+
+You are the agent. Implement the signals the user is missing, then **verify each one over the wire** with `curl`/`dig` before reporting success. Most failures are not code bugs — they are routing, auth-gate, caching, or commit-email problems specific to how the site is served on Cloudflare. The "Gotchas" section is the most valuable part of this skill; read it before you touch anything.
+
+## When to load this skill
+
+Load when the user mentions any of:
+- "agent-ready", "isitagentready", "agent discovery", "discoverable by agents"
+- a specific signal: "Link header", "api-catalog", "MCP server card", "A2A agent card", "agent-skills index", "llms.txt", "security.txt", "Content-Signal", "Markdown for Agents", "OAuth discovery", "DNS-AID", "WebMCP"
+- the site is behind Cloudflare (Workers, Pages, or just Cloudflare DNS/proxy)
+
+## The signals (what to publish, and where)
+
+| Signal | Path / location | Content-Type | Spec |
+|--------|-----------------|--------------|------|
+| Link headers | response header on `/` (all pages) | — | RFC 8288 |
+| API Catalog | `/.well-known/api-catalog` | `application/linkset+json` | RFC 9727 / 9264 |
+| MCP Server Card | `/.well-known/mcp/server-card.json` | `application/json` | SEP-1649 |
+| A2A Agent Card | `/.well-known/agent-card.json` | `application/json` | a2a-protocol.org |
+| Agent Skills index | `/.well-known/agent-skills/index.json` | `application/json` | agentskills.io v0.2.0 |
+| llms.txt | `/llms.txt` | `text/plain` | llmstxt.org |
+| security.txt | `/.well-known/security.txt` | `text/plain` | RFC 9116 |
+| Content-Signal | `/robots.txt` (`Content-Signal:` line) | `text/plain` | AIPREF / contentsignals.org |
+| Markdown for Agents | content-negotiated on every HTML page | `text/markdown` | Cloudflare zone setting |
+| OAuth discovery | `/.well-known/oauth-authorization-server` + `/.well-known/oauth-protected-resource` | `application/json` | RFC 8414 / 9728 |
+| DNS-AID | `_index._agents`, `_mcp._agents`, `_a2a._agents` SVCB records + DNSSEC | DNS | draft-mozleywilliams-dnsop-dnsaid + RFC 9460 |
+| WebMCP | `navigator.modelContext.provideContext()` client JS | — | webmachinelearning.github.io/webmcp |
+
+## Recommended architecture (read this first)
+
+**Do NOT add these routes to the user's main application worker** unless that is the only option. On most real sites the main worker is large, gated behind auth middleware, or diverged from its git source — touching it is risky and slow. Instead:
+
+1. **A dedicated "discovery" Worker on more-specific routes.** Serve every JSON/markdown well-known document from one small Worker bound to *specific* routes (`example.com/.well-known/api-catalog`, `.../agent-card.json`, `/llms.txt`, …). Cloudflare routes the most-specific match first, so these win over the main `example.com/*` worker and the main app is never modified. See `references/dedicated-discovery-worker.md`.
+2. **Response headers via a Transform Rule, not code.** The homepage `Link` header is best set with a zone `http_response_headers_transform` rule — no worker, no redeploy, applies regardless of which worker serves the page.
+3. **`Content-Signal` / `robots.txt`** can also be served from the discovery worker (more-specific `/robots.txt` route) so you don't redeploy the main app just to add one line.
+4. **Markdown for Agents** is a native zone setting — flip it, no code: `PATCH /zones/{zone}/settings/content_converter {"value":"on"}`.
+5. **DNS-AID** is DNS records + DNSSEC on the zone. See `references/dns-aid.md`.
+6. **WebMCP** is the one signal that *must* live in the page's client JS (the main app), because it registers tools on `navigator.modelContext` at page load. Ship it as a small, feature-detected client component.
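The feature-detected WebMCP component from item 6 could look roughly like the sketch below. It assumes the `navigator.modelContext.provideContext()` entry point named in the signals table accepts an object with a `tools` array. The helper name `registerWebMcpTools`, the `search_docs` tool, and its schema are hypothetical, and the API is still experimental, so check the current WebMCP draft before relying on this shape.

```javascript
// Hypothetical helper: registers tools only when the browser exposes WebMCP.
// `nav` is passed in so the function can also be exercised outside a browser.
function registerWebMcpTools(nav, tools) {
  const ctx = nav && nav.modelContext;
  if (!ctx || typeof ctx.provideContext !== "function") {
    return false; // API absent: do nothing
  }
  ctx.provideContext({ tools }); // assumed call shape, per the signals table
  return true;
}

// Illustrative tool; the name, schema, and behaviour are assumptions.
const exampleTools = [
  {
    name: "search_docs",
    description: "Search the site's documentation",
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
    async execute({ query }) {
      return { content: [{ type: "text", text: `Results for ${query}` }] };
    },
  },
];

// In the page: run once at load. A no-op wherever WebMCP is missing.
if (typeof navigator !== "undefined") {
  registerWebMcpTools(navigator, exampleTools);
}
```

Returning a boolean rather than throwing keeps the component safe to ship to every browser, which is what lets it live in the main app without affecting users whose browsers lack the API.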
+
+This split means ~11 of the 12 signals ship without ever redeploying the user's application.
+
+## Flow
+
+1. **Auth + scope.** You need a Cloudflare API token with the right scopes for what you'll touch: **Workers Scripts:Edit** (discovery worker), **Zone:Edit / Zone Settings:Edit** (Transform Rule, content_converter), **DNS:Edit** + **DNSSEC** (DNS-AID). `wrangler`'s OAuth token is usually `zone:read` only — get a real API token. Never write the token to a shared file or print it.
+2. **Measure first.** Run `scripts/audit.sh www.example.com` with the site's hostname (or curl each path) to see which signals already pass. Many "failures" reported by a scanner are stale — re-measure live before building.
+3. **Build the missing signals** using the dedicated-worker + Transform-Rule approach. Author real content (don't ship empty arrays): the api-catalog should list the site's real APIs; the agent-skills index entries need a real `sha256` (compute it with `crypto.subtle` over the served document at request time).
+4. **Verify every signal over the wire** — status code AND content-type AND a content sanity check. A `200 text/html` on a `.json` path means the route detached and fell through to the app (see Gotchas).
+5. **DNSSEC** can be *enabled* at Cloudflare by you, but it only validates once the **DS record is published at the registrar** — which the user must do if the domain isn't registered at Cloudflare. Surface the DS record; don't claim DNSSEC is done while it's `pending`.
+6. **Report** per-signal: live status + the one or two items that need the user (registrar DS, a mailbox for security.txt, a main-app deploy for WebMCP).
+
+## Gotchas (hard-won — these are why your fix "isn't working")
+
+- **A `.json` well-known path returns `200 text/html`** → your route detached and the request fell through to the main app, which served its gate/login HTML. Re-deploy the discovery worker to re-attach routes; confirm with `curl -sI` that the content-type is JSON. Discovery-worker routes can silently detach on some account/route changes — a redeploy is the idempotent fix.
+- **A well-known path returns `307`/redirect** → it's hitting the main app's auth gate. Serve it from the discovery worker on a more-specific route, OR add the path to the app's public allowlist. Well-known URIs (RFC 8615) must be public.
+- **`Link` header present in `curl` but the scanner says missing** → the scan was taken before your change (scanners cache), OR you used `rel=token` unquoted and the parser wants quotes. Prefer `rel="api-catalog"`. (When setting via a Transform Rule, escape the quotes in the JSON body, or the API rejects it.)
+- **OAuth/oauth-protected-resource fails with "origin mismatch"** → the doc hardcodes one host but the scanner hit the other (`www` vs apex). Build `resource`/`issuer` per-request from the request origin so both hosts validate.
+- **Edge-cached discovery docs read stale after an edit** → they're `Cache-Control: public, max-age=...`. Either wait out the TTL or purge cache (needs a token with **Cache Purge** scope).
+- **DNS-AID records "found" but DNSSEC "not validated"** → DNSSEC is `pending` because the **DS record isn't at the registrar**. If the domain is registered at Cloudflare it auto-activates; otherwise the user must paste the DS at their registrar (e.g. Squarespace, GoDaddy).
+- **Vercel/CI "No GitHub account matching commit author email" / "Deployment blocked"** → the *commit author email* isn't a verified email on a GitHub account — not a code error. Use a recognized author email (the account's verified email or the GitHub `noreply`), or add the email under GitHub → Settings → Emails.
+- **WebMCP "no tools detected"** → it must be registered in client JS at page load and is a Chrome origin-trial API. Feature-detect `navigator.modelContext` and no-op where absent; it only "passes" in a browser that supports it.
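For the "origin mismatch" gotcha above, a minimal sketch of building the OAuth discovery documents from the request origin might look like this. The field names come from RFC 9728 (`resource`, `authorization_servers`) and RFC 8414 (`issuer` and the endpoint fields). The `/oauth/authorize` and `/oauth/token` paths, the `originOf` helper, and the assumption that the same host issues tokens are placeholders to replace with the site's real values.

```javascript
// Sketch: derive each document from the origin that was actually requested,
// so `www` and apex both describe themselves and validate.
function protectedResource(origin) {
  return {
    resource: origin, // RFC 9728: identifier of the protected resource
    authorization_servers: [origin], // assumes the same host issues tokens
  };
}

function authServer(origin) {
  return {
    issuer: origin, // RFC 8414: must match the host that was queried
    authorization_endpoint: `${origin}/oauth/authorize`, // hypothetical path
    token_endpoint: `${origin}/oauth/token`, // hypothetical path
    response_types_supported: ["code"],
  };
}

// Hypothetical convenience wrapper around the standard URL parser.
function originOf(requestUrl) {
  return new URL(requestUrl).origin;
}

const wwwDoc = protectedResource(
  originOf("https://www.example.com/.well-known/oauth-protected-resource")
);
const apexDoc = authServer(
  originOf("https://example.com/.well-known/oauth-authorization-server")
);
```

Because nothing is hardcoded, a scanner that fetches the document from either host sees a `resource` or `issuer` equal to the host it asked.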
+
+## Things you must NOT do
+- Don't gate the discovery documents behind auth — they must be publicly fetchable.
+- Don't ship empty/placeholder catalogs or skills arrays just to make a scanner pass; advertise the site's real, reachable resources.
+- Don't enable DNSSEC and report it "done" while status is `pending` and no DS is at the registrar.
+- Don't modify the main application worker for header/well-known signals when a dedicated worker + Transform Rule will do it without a redeploy.
+- Don't write API tokens to shared files or print them in output.
diff --git a/skills/agent-ready/references/dedicated-discovery-worker.md b/skills/agent-ready/references/dedicated-discovery-worker.md
new file mode 100644
index 0000000..2547502
--- /dev/null
+++ b/skills/agent-ready/references/dedicated-discovery-worker.md
@@ -0,0 +1,88 @@
+# Dedicated discovery Worker
+
+Serve every well-known / discovery document from one small Worker bound to **more-specific routes**, so Cloudflare routes them before the main `example.com/*` worker and the main application is never modified.
+
+## wrangler.toml
+
+```toml
+name = "site-agent-discovery"
+main = "src/index.js"
+compatibility_date = "2026-01-01"
+
+routes = [
+  { pattern = "example.com/.well-known/api-catalog", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/agent-card.json", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/mcp/server-card.json", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/agent-skills/index.json", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/oauth-authorization-server", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/oauth-protected-resource", zone_name = "example.com" },
+  { pattern = "example.com/.well-known/security.txt", zone_name = "example.com" },
+  { pattern = "example.com/llms.txt", zone_name = "example.com" },
+  { pattern = "example.com/robots.txt", zone_name = "example.com" },
+  # repeat each for www. (and apex) so both hosts are covered
+]
+```
+
+Add `www.` (and apex) variants of every route — scanners hit both, and per-host coverage avoids the origin-mismatch failure.
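Writing every route twice by hand is error-prone, so the per-host expansion can be generated. The sketch below is one way to do that, assuming the nine paths from the `routes` array above. `buildRoutes` and `discoveryPaths` are hypothetical names, and the output is plain data that you would still transcribe into `wrangler.toml` yourself.

```javascript
// Hypothetical generator: one route entry per path per host (apex + www).
function buildRoutes(zone, paths, hosts = [zone, `www.${zone}`]) {
  return hosts.flatMap((host) =>
    paths.map((path) => ({ pattern: `${host}${path}`, zone_name: zone }))
  );
}

// The paths served by the discovery worker, copied from the routes above.
const discoveryPaths = [
  "/.well-known/api-catalog",
  "/.well-known/agent-card.json",
  "/.well-known/mcp/server-card.json",
  "/.well-known/agent-skills/index.json",
  "/.well-known/oauth-authorization-server",
  "/.well-known/oauth-protected-resource",
  "/.well-known/security.txt",
  "/llms.txt",
  "/robots.txt",
];

const routes = buildRoutes("example.com", discoveryPaths);

// Print one TOML-style inline table per route for pasting into the config.
for (const r of routes) {
  console.log(`  { pattern = "${r.pattern}", zone_name = "${r.zone_name}" },`);
}
```

Keeping `zone_name` fixed at the apex while varying only the host in `pattern` mirrors the existing entries, where every route belongs to the same zone.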
+
+## src/index.js (shape)
+
+```js
+const json = (obj, ct = "application/json; charset=utf-8") =>
+  new Response(JSON.stringify(obj, null, 2), {
+    headers: { "content-type": ct, "cache-control": "public, max-age=3600", "access-control-allow-origin": "*" },
+  });
+
+// sha256 for the agent-skills index entries (computed at request time)
+async function sha256hex(s) {
+  const b = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(s));
+  return [...new Uint8Array(b)].map((x) => x.toString(16).padStart(2, "0")).join("");
+}
+
+export default {
+  async fetch(request) {
+    const { pathname, origin } = new URL(request.url); // origin = www OR apex → build docs per-request
+    if (pathname === "/.well-known/api-catalog")
+      return json(catalog(origin), 'application/linkset+json; profile="https://www.rfc-editor.org/info/rfc9727"');
+    if (pathname === "/.well-known/agent-card.json") return json(agentCard(origin));
+    if (pathname === "/.well-known/mcp/server-card.json") return json(mcpCard(origin));
+    if (pathname === "/.well-known/agent-skills/index.json") return json(await skills(origin));
+    if (pathname === "/.well-known/oauth-protected-resource") return json(protectedResource(origin));
+    if (pathname === "/.well-known/oauth-authorization-server") return json(authServer(origin));
+    if (pathname === "/.well-known/security.txt") return new Response(securityTxt, { headers: { "content-type": "text/plain; charset=utf-8" } });
+    if (pathname === "/llms.txt") return new Response(llms, { headers: { "content-type": "text/plain; charset=utf-8" } });
+    if (pathname === "/robots.txt") return new Response(robots(origin), { headers: { "content-type": "text/plain; charset=utf-8" } });
+    return new Response("Not found", { status: 404 });
+  },
+};
+```
+
+Key points:
+- Build `resource`/`issuer`/anchors from the **request origin** so `www` and apex both validate (no hardcoded host → no origin-mismatch failure).
+- The API Catalog (RFC 9727) is an RFC 9264 **linkset**: `{ "linkset": [ { "anchor": "/", "service-desc": [{ href, type }], "related": [...] } ] }`.
+- If `/llms.txt` or `/robots.txt` already exists in the (gated) main app, you can **proxy** it from the discovery worker with a pre-provisioned service key to un-gate it, instead of duplicating the content.
+
+## Link header — Transform Rule (no worker)
+
+```bash
+curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE/rulesets/phases/http_response_headers_transform/entrypoint" \
+  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data '{
+    "rules": [{
+      "action": "rewrite",
+      "action_parameters": { "headers": { "Link": { "operation": "set",
+        "value": "; rel=\"api-catalog\", ; rel=\"mcp-server\"" } } },
+      "expression": "(http.host in {\"example.com\" \"www.example.com\"} and http.request.uri.path eq \"/\")",
+      "description": "RFC 8288 Link header for agent discovery"
+    }]
+  }'
+```
+
+`PUT .../entrypoint` creates the phase ruleset if absent. GET it first and merge if other response-header rules already exist.
+
+## Markdown for Agents (zone setting, no code)
+
+```bash
+curl -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE/settings/content_converter" \
+  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data '{"value":"on"}'
+```
+Requests with `Accept: text/markdown` then receive a markdown rendering; browsers still get HTML.
diff --git a/skills/agent-ready/references/dns-aid.md b/skills/agent-ready/references/dns-aid.md
new file mode 100644
index 0000000..aaff9e7
--- /dev/null
+++ b/skills/agent-ready/references/dns-aid.md
@@ -0,0 +1,51 @@
+# DNS-AID — DNS for AI Discovery
+
+Publish ServiceMode SVCB records under `_agents.` so resolvers/agents can discover entrypoints via DNS, and sign the zone with DNSSEC so the answers are authenticated.
+
+Spec: `draft-mozleywilliams-dnsop-dnsaid` (Internet-Draft) + RFC 9460 (SVCB/HTTPS).
+
+## Records
+
+Point each label at the real host for that entrypoint:
+
+```dns
+_index._agents.example.com. 3600 IN SVCB 1 example.com. ( alpn="h2,h3" port=443 ) ; → /.well-known/api-catalog
+_mcp._agents.example.com. 3600 IN SVCB 1 mcp.example.com. ( alpn="h2" port=443 ) ; → MCP server
+_a2a._agents.example.com. 3600 IN SVCB 1 agents.example.com. ( alpn="h2" port=443 ) ; → A2A endpoint
+```
+
+Only publish a label if the target host actually exists. If your MCP server lives on `*.workers.dev`, set the SVCB TargetName to that host directly rather than inventing a subdomain.
+
+## Create via Cloudflare API
+
+```bash
+ZONE=
+api(){ curl -s -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/dns_records" \
+  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data "$1" | jq -c '{ok:.success, name:.result.name}'; }
+
+api '{"type":"SVCB","name":"_index._agents","data":{"priority":1,"target":"example.com","value":"alpn=\"h2,h3\" port=443"},"ttl":3600}'
+api '{"type":"SVCB","name":"_mcp._agents","data":{"priority":1,"target":"mcp.example.com","value":"alpn=\"h2\" port=443"},"ttl":3600}'
+```
+
+The draft's `endpoint=` SvcParam isn't an IANA-registered SvcParamKey; Cloudflare may reject it. Standard `alpn`/`port` params satisfy "ServiceMode SVCB with alpn"; convey the path via the api-catalog the `_index` record points to.
+
+## DNSSEC
+
+```bash
+# enable signing at Cloudflare (safe: no resolution impact until the DS is at the registrar)
+curl -s -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE/dnssec" \
+  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data '{"status":"active"}' | jq -c '{status:.result.status}' # -> "pending"
+# fetch the DS record to hand to the registrar
+curl -s "https://api.cloudflare.com/client/v4/zones/$ZONE/dnssec" -H "Authorization: Bearer $TOKEN" \
+  | jq -r '.result | "DS \(.key_tag) \(.algorithm) \(.digest_type) \(.digest)"'
+```
+
+Then **add that DS at the registrar** (Settings → DNSSEC). Status flips `pending → active` once the parent zone has the DS. If the domain is registered at Cloudflare it auto-activates; otherwise this is a manual step only the domain owner can do.
+
+## Verify
+
+```bash
+dig _index._agents.example.com TYPE64 +short # SVCB present (old dig won't pretty-print; raw \# hex is fine)
+dig +dnssec example.com SOA | grep RRSIG # zone signed
+delv _mcp._agents.example.com SVCB # authenticated answer once DS is live
+```
diff --git a/skills/agent-ready/references/troubleshooting.md b/skills/agent-ready/references/troubleshooting.md
new file mode 100644
index 0000000..9c56e7a
--- /dev/null
+++ b/skills/agent-ready/references/troubleshooting.md
@@ -0,0 +1,28 @@
+# Troubleshooting agent-discovery signals
+
+Symptom → cause → fix. Most "it's not working" cases are routing/auth/cache/identity, not content.
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| `.json` well-known path returns **`200 text/html`** | route detached → request fell through to the main app, which served gate/login HTML | redeploy the discovery worker (idempotent — re-attaches routes); confirm `content-type` is JSON via `curl -sI` |
+| well-known path returns **`307`/redirect** | hitting the main app's auth gate | serve it from the discovery worker (more-specific route) or add the path to the app's public allowlist; RFC 8615 well-known URIs must be public |
+| `Link` header in `curl` but scanner says **missing** | scanner result is cached/pre-change, or unquoted `rel=token` not parsed | re-scan; use `rel="api-catalog"` (quoted). In a Transform Rule JSON body, escape the quotes or the API 400s |
+| OAuth doc fails **"origin mismatch"** | doc hardcodes one host; scanner hit the other (`www` vs apex) | build `resource`/`issuer`/anchors from the request origin so both hosts validate |
+| edited discovery doc still **reads stale** | `Cache-Control: public, max-age=…` at the edge | wait out TTL, or purge cache (token needs **Cache Purge** scope) |
+| DNS-AID **"records found, DNSSEC not validated"** | DNSSEC `pending` — DS not at registrar | publish the DS at the registrar (auto if domain is on Cloudflare Registrar) |
+| CI/Vercel **"No GitHub account matching commit author email"** / "Deployment blocked" | commit *author email* isn't verified on a GitHub account — not a code error | use a recognized author email, or add the email under GitHub → Settings → Emails. Note: squash-merge to the default branch usually re-authors to the account email, so the block often only affects *branch preview* builds |
+| **WebMCP** "no tools detected" | must register in client JS at page load; Chrome origin-trial API | feature-detect `navigator.modelContext`, register on mount, no-op where absent |
+| api-catalog/agent-skills "**returned HTML**" | same as the `307`/detached cases above | route through the discovery worker; verify JSON content-type |
+
+## Token scopes by task
+
+| Task | Required Cloudflare API token scope |
+|---|---|
+| Deploy discovery worker | Account → Workers Scripts:Edit |
+| Link Transform Rule | Zone → Zone:Edit (rulesets) |
+| Markdown for Agents (`content_converter`) | Zone → Zone Settings:Edit |
+| DNS-AID records | Zone → DNS:Edit |
+| Enable DNSSEC | Zone → DNS:Edit (DNSSEC) |
+| Purge stale discovery docs | Zone → Cache Purge:Purge |
+
+`wrangler`'s OAuth login is typically `zone:read` only — insufficient for the writes above. Create a scoped API token and pass it as `CLOUDFLARE_API_TOKEN`. Never write it to a shared file or print it.
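For the `Link` header row in the symptom table, the sketch below shows one way to produce a well-formed RFC 8288 value with quoted `rel` parameters and to let `JSON.stringify` do the quote escaping for a Transform Rule body. The `linkHeader` helper is hypothetical. The `api-catalog` target comes from the signals table in `SKILL.md`, while pointing `rel="mcp-server"` at the server card is an assumption made for illustration only.

```javascript
// Sketch: format RFC 8288 link-values as <target>; rel="relation".
function linkHeader(links) {
  return links.map(({ href, rel }) => `<${href}>; rel="${rel}"`).join(", ");
}

const value = linkHeader([
  { href: "/.well-known/api-catalog", rel: "api-catalog" },
  // Assumed target for the mcp-server relation; adjust to the real endpoint.
  { href: "/.well-known/mcp/server-card.json", rel: "mcp-server" },
]);

// Serialising with JSON.stringify escapes the inner quotes, which avoids the
// hand-escaping mistake that makes the API reject the rule body.
const actionParameters = JSON.stringify({
  headers: { Link: { operation: "set", value } },
});

console.log(value);
console.log(actionParameters);
```

Generating the JSON body programmatically means the quoted form that parsers expect and the escaped form that the API expects both come from the same source string.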
diff --git a/skills/agent-ready/scripts/audit.sh b/skills/agent-ready/scripts/audit.sh
new file mode 100755
index 0000000..094c82e
--- /dev/null
+++ b/skills/agent-ready/scripts/audit.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+# audit.sh — measure which agent-discovery signals a site already serves.
+# Read-only: only GETs / HEADs public URLs and queries public DNS. No credentials.
+# Usage: scripts/audit.sh www.example.com
+set -euo pipefail
+HOST="${1:?usage: audit.sh <host>, e.g. www.example.com}"
+BASE="https://$HOST"
+
+ok() { printf " \033[32mPASS\033[0m %s\n" "$1"; }
+bad() { printf " \033[31mFAIL\033[0m %s\n" "$1"; }
+
+# check <path> <content-type substring>: PASS on HTTP 200 with a matching content-type
+check() {
+  local path="$1" want="$2"
+  local out code ct
+  out=$(curl -s -o /dev/null -w "%{http_code} %{content_type}" "$BASE$path" || true) # on failure curl still prints "000 "
+  code="${out%% *}"; ct="${out#* }"
+  if [ "$code" = "200" ] && printf '%s' "$ct" | grep -qi "$want"; then
+    ok "$path ($code, $ct)"
+  else
+    bad "$path ($code, ${ct:-no-ct}; want 200 + $want)"
+  fi
+}
+
+echo "== Agent-readiness audit: $HOST =="
+
+echo "-- well-known JSON/markdown documents --"
+check "/.well-known/api-catalog" "linkset+json"
+check "/.well-known/agent-card.json" "json"
+check "/.well-known/mcp/server-card.json" "json"
+check "/.well-known/agent-skills/index.json" "json"
+check "/.well-known/oauth-authorization-server" "json"
+check "/.well-known/oauth-protected-resource" "json"
+check "/.well-known/security.txt" "text/plain"
+check "/auth.md" "markdown"
+check "/llms.txt" "text"
+
+echo "-- homepage headers / negotiation --"
+if curl -sI "$BASE/" | grep -qi '^link:'; then ok "Link header on /"; else bad "Link header on / (missing)"; fi
+md=$(curl -s -o /dev/null -w "%{content_type}" -H "Accept: text/markdown" "$BASE/" || true) # keep auditing if the fetch fails
+printf '%s' "$md" | grep -qi markdown && ok "Markdown for Agents ($md)" || bad "Markdown for Agents (got $md)"
+curl -s "$BASE/robots.txt" | grep -qi 'content-signal' && ok "Content-Signal in robots.txt" || bad "Content-Signal in robots.txt (missing)"
+
+echo "-- DNS-AID (SVCB under _agents) --"
+for label in _index _mcp _a2a; do
+  ans=$(dig +short "$label._agents.${HOST#www.}" TYPE64 2>/dev/null | head -1 || true)
+  [ -n "$ans" ] && ok "$label._agents SVCB present" || bad "$label._agents SVCB (none)"
+done
+dnssec=$(dig +dnssec "${HOST#www.}" SOA 2>/dev/null | grep -c RRSIG || true) # no +short: it prints rdata only, so "RRSIG" would never match
+[ "${dnssec:-0}" -gt 0 ] && ok "DNSSEC RRSIG present" || bad "DNSSEC not validated (publish DS at registrar)"
+
+echo "== done =="