Enable Workers Logs (incident J-002 follow-up) #17

Merged: klappy merged 1 commit into main from enable-workers-logs-j002 on Apr 23, 2026

klappy (Owner) commented Apr 23, 2026

Why

Direct response to incident J-002 (transient 503s on aquifer.klappy.dev/mcp at 2026-04-22T00:17:02Z, single trace 3daca3e0-1024-4733-93af-510d65a8b7ae). Investigation 28 hours later established that:

  • The deployed Worker bundle returns status: 500 from its only two error paths and never returns status: 503. Every 503 the orchestrator sees from this service is therefore, by mechanism, a response injected at the Cloudflare edge (Error 1101/1102, an R2 transient, an isolate cold-start collision, or an upstream Cloudflare service blip), not application logic.
  • wrangler.toml had no [observability] block, so Workers Logs never ingested per-invocation logs, exceptions, CPU and wall time, or subrequest counts. The data that would have distinguished among those four hypotheses for J-002 was never recorded and cannot be reconstructed.

This PR closes that visibility gap so the next 503 is a one-query investigation instead of a four-hypothesis exploration.

What

wrangler.toml:

  • Top-level [observability] block with enabled = true, plus an [observability.logs] table with invocation_logs = true and head_sampling_rate = 1.
  • Same block mirrored under [env.staging] for parity with wrangler dev --env staging and the staging branch preview at staging-aquifer-mcp.klappy.workers.dev.
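The wrangler.toml change described above would look roughly like the fragment below. This is a sketch reconstructed from the bullet points, not the committed file: key placement follows the description literally, and the rest of the real file (name, main, bindings, compatibility settings) is omitted.

```toml
# Sketch only: reconstructed from the PR description, not copied from the repo.
# Other keys in the real wrangler.toml (name, main, bindings, ...) are omitted.

# Top-level (production) Worker
[observability]
enabled = true

[observability.logs]
invocation_logs = true    # record one invocation log per request
head_sampling_rate = 1    # 1 = sample 100% of requests

# Staging mirror, used by `wrangler dev --env staging` and the staging preview
[env.staging.observability]
enabled = true

[env.staging.observability.logs]
invocation_logs = true
head_sampling_rate = 1
```

head_sampling_rate accepts values between 0 and 1; 1 keeps every request, while a value such as 0.1 would keep roughly one in ten, trading completeness for ingest volume. Full sampling fits the goal here, since a single transient 503 is exactly the kind of event a lower rate could miss.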

odd/ledger/journal.md:

  • Appends J-002 entry covering the incident: Observation, Learning, Decision (this PR), Constraint (logs are not retroactive), and four Handoffs.
    • H7: oddkit_encode did not honor DOLCHE splitting during the planning session that produced this entry and collapsed the five-section payload into a single Constraint stub. Recorded for separate investigation against the oddkit worker.
    • H8: this PR.
    • H9: the bt-servant orchestrator constructed get arguments as {compound_key: "..."} instead of three separate fields; investigate the tool-description presentation layer.
    • H10: open question whether "always enable Workers Logs on klappy.dev MCP servers" deserves its own canon constraint.

Validation after merge

Workers Builds will redeploy on push to main. Within ~1 hour of the deploy, the Workers Logs view should populate:

Quick sanity check: wrangler tail --name aquifer-mcp --format pretty should stream live invocation events; previously this returned no events because no telemetry was being captured.

Risk

Low. The change is purely additive: no code, bindings, or compatibility flags changed. Cloudflare Workers Builds will redeploy the Worker on merge; the deployed code is byte-identical to the current 2026-04-13T21:35:14Z deploy, and only the observability configuration differs. At head_sampling_rate = 1 the maximum log ingest is one invocation log row per request, which stays within the Workers Paid included allotment at this service's traffic level.

Mode trail

The investigation was conducted under exploration → execution, per the project model operating contract. The J-002 entry was hand-written rather than copied verbatim from oddkit_encode output because the encoder collapsed the OLDC+H payload into a single Constraint stub (see H7); the hand-written entry follows the J-001 narrative precedent already at the bottom of the journal.


Note

Low risk: configuration-only change enabling Workers Logs for production and staging; no runtime code paths, bindings, or caching behavior are modified.

Overview
Enables Cloudflare Workers invocation logs for aquifer-mcp by adding [observability] (and mirrored [env.staging.observability]) to wrangler.toml, so transient edge-injected 5xxs can be investigated with per-invocation telemetry.

Appends a new journal entry J-002 to odd/ledger/journal.md documenting the 2026-04-22 transient 503 incident and the decision to enable Workers Logs as the follow-up action.

Reviewed by Cursor Bugbot for commit 2059ac0. Bugbot is set up for automated code reviews on this repo.

Adds [observability] block to top-level wrangler.toml and to [env.staging],
turning on Workers Logs (per-invocation logs, exceptions, CPU/wall time,
subrequest counts) with full-rate head sampling and invocation_logs.

Direct response to incident J-002 (transient 503s on 2026-04-22T00:17:02Z),
which was un-investigable because Workers Logs was not enabled. Without this
block, edge-injected 5xx responses cannot be correlated to a runtime cause —
the application code never returns 503 itself, so every 503 the orchestrator
sees is by definition a Cloudflare-injected response with no application
signal.

Also appends J-002 to odd/ledger/journal.md per the project convention that
every PR carries an OLDC journal entry. The new entry covers Observations
from the incident replay, Learnings about the application 503 ceiling,
the Decision to enable logging now (this PR), Constraints (logs cannot be
enabled retroactively), and four Handoffs including this PR (H8) and an
encoder-defect note (H7) for separate follow-up.

cloudflare-workers-and-pages commented:

Deploying with Cloudflare Workers

The latest updates on your project.

Status: ✅ Deployment successful!
Name: aquifer-mcp
Latest commit: 2059ac0
Updated (UTC): Apr 23 2026, 04:36 AM

klappy merged commit 9c17720 into main on Apr 23, 2026 (3 checks passed).
klappy deleted the enable-workers-logs-j002 branch on April 23, 2026 at 04:39.
klappy added a commit to klappy/klappy.dev that referenced this pull request on Apr 24, 2026:
Graduates the paired pattern observed across aquifer-mcp J-002 -> H11b
(PRs klappy/aquifer-mcp#17 through #20) as tier-2 canon.

Three deciding-argument recurrences under the manual-enforcement reading:
1. PR #18 BootstrapEntityResult + 9 Bugbot findings (3 silent-truth-loss bugs).
2. PR #20 FanOutEntityResult + High-severity ctx ReferenceError.
3. esbuild-transpile-only boundary condition — orchestrator decision to
   treat adversarial review as sole defense when pipeline bypasses types.

Release-validation-gate: Bugbot clean (both HEADs); independent Sonnet 4.6
validator dispatched; all findings dispositioned in closeout comment
on #136.