
Final tally for PR #20: **5 Bugbot findings across 4 review cycles**. **4 closed by Cursor Agent autofix** (1 High, 2 Medium, 1 Low). **1 closed by manual fix** (Low). The High finding (undeclared `ctx` in `searchByEntity`) is the class of bug that TypeScript would normally catch — but esbuild's transpile-only pipeline silently compiles free-variable references, so the type system was not the guard here; adversarial review was. The J-003→J-004 meta-lesson repeats: type contract + adversarial review is the durable bar; either one alone is insufficient. When the type system itself is disabled in the build pipeline (as with esbuild transpile-only), adversarial review becomes the only line of defense between compile and runtime, which is precisely the condition that made #1 a High severity.
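
A minimal sketch of the bug class named above (illustrative code, not the real `searchByEntity`): a free-variable reference that esbuild's transpile-only pipeline happily emits, surfacing only as a runtime `ReferenceError`, whereas `tsc --noEmit` would reject it at build time.

```typescript
// Hypothetical reduction of the undeclared-`ctx` class of bug.
// esbuild in transpile-only mode strips types without checking them,
// so this compiles; the failure moves from build time to first call.
function searchByEntitySketch(entity: string): string {
  // @ts-expect-error -- `ctx` is never declared; a type check would flag it
  ctx.waitUntil(Promise.resolve(entity));
  return entity;
}

let failure: unknown = null;
try {
  searchByEntitySketch("person:Obadiah");
} catch (e) {
  failure = e; // ReferenceError: ctx is not defined -- at runtime, not build
}
console.log(failure instanceof ReferenceError); // → true
```

Running this through plain `tsc` fails the build; running the transpile-only output throws only when the code path executes, which is exactly why adversarial review was the guard here.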

---

### J-006 — H11b validated in production; the principle holds

**Observation:** PR #20 (H11b) merged at 2026-04-24T02:42:39Z. Workers Builds completed the production redeploy at 02:43Z. Live validation probe at 02:43–02:44Z against `aquifer.klappy.dev/mcp` exercised the cold-first-visit → background-warm → warm-second-visit flow for two known-cold entities (`person:Obadiah`, `person:Onesimus`) plus a spot-check of the previously-warmed `person:Paul`.

First visit to `person:Obadiah` returned in **985 ms** with `partial=True` and the disclosure note `"0/33 indexed, 33 warming"`. The trace header showed every per-resource `entities.json` read returning `miss` — fan-out confirmed no entity indexes existed for the current composite SHA. `ctx.waitUntil(warmEntityIndexesForResources(33 missing resources, ...))` was scheduled. Pre-H11b baseline for the same call (pre-deploy probe at 01:11Z) had returned **HTTP 502 at 11.2 s** — Cloudflare's frontend cut the response stream while the Worker was still running the inline bootstrap scan.
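
The serve-partial-then-warm shape described above can be sketched in a few lines (names and types are illustrative, not the actual Worker code): return whatever the fan-out found, disclose the gap, and hand the warm to `ctx.waitUntil` so it runs after the response is sent.

```typescript
// Hedged sketch, assuming a fan-out that reports per-resource index state.
type FanOutResult = { resource: string; indexed: boolean };

interface WarmCtx {
  // Shape of Cloudflare Workers' ExecutionContext.waitUntil: extends the
  // invocation past the response so background work can finish.
  waitUntil(promise: Promise<unknown>): void;
}

function servePartial(
  ctx: WarmCtx,
  results: FanOutResult[],
  warm: (missing: string[]) => Promise<void>
): { partial: boolean; note: string } {
  const missing = results.filter(r => !r.indexed).map(r => r.resource);
  if (missing.length > 0) {
    ctx.waitUntil(warm(missing)); // background warm; the user never waits on it
  }
  const indexed = results.length - missing.length;
  return {
    partial: missing.length > 0,
    note: `${indexed}/${results.length} indexed, ${missing.length} warming`,
  };
}
```

On a fully cold visit this produces `partial: true` with a note like `"0/33 indexed, 33 warming"`, matching the disclosure text observed in the probe.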

Second visit 20 seconds later returned in **1094 ms** with 78 matches and `partial=True, "28/33 indexed, 5 warming"`. Trace header showed per-resource entity-index reads now returning `cache` hits — the background warm from the first visit had populated 28 of 33 per-resource entity indexes during the 20-second interval. The remaining 5 either take longer to scan (larger content-file counts) or the `ctx.waitUntil` budget on the first visit's invocation expired before those finished. The self-healing property held: next query will re-trigger warms for the remaining 5.

Third visit (`person:Onesimus`, a distinct entity) returned `partial=False` with 73 complete matches in 1347 ms — because the prior warm for Obadiah wrote full per-resource entity maps (every entity association in every article), not Obadiah-only data. Any entity present in the articles of the 28 warmed resources is now servable complete, including ones never queried. This is the intended amortization behavior: the cost of warming one resource benefits every future entity lookup touching that resource.
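
The amortization property can be shown with a toy index (hypothetical shapes; the real per-resource entity maps live in R2): warming a resource writes its complete entity-to-articles map, so a later lookup for a never-queried entity in that resource is already complete.

```typescript
// Illustrative in-memory stand-in for the warmed per-resource entity indexes.
type EntityMap = Map<string, string[]>; // entity id -> article ids

const warmedIndexes = new Map<string, EntityMap>(); // resource -> entity map

// Warming scans every article once and records *every* entity association,
// not just the entity that triggered the warm.
function warmResource(resource: string, articles: Record<string, string[]>): void {
  const map: EntityMap = new Map();
  for (const [articleId, entities] of Object.entries(articles)) {
    for (const entity of entities) {
      const hits = map.get(entity) ?? [];
      hits.push(articleId);
      map.set(entity, hits);
    }
  }
  warmedIndexes.set(resource, map);
}

// Complete result if the resource is warm, null if still cold.
function lookupEntity(resource: string, entity: string): string[] | null {
  return warmedIndexes.get(resource)?.get(entity) ?? null;
}
```

A warm triggered by an Obadiah query also makes `person:Onesimus` servable complete from the same resource, which is the behavior the third probe validated.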

Workers Logs aggregate query over 02:42Z–02:48Z: 6 invocations, 100% outcome=success, 0 errors, 0 `exceededMemory`, 0 `exceededCpu`, 0 exceptions. Pre-H11b invocations of the same entities produced the 255-subrequest OOM signature (J-002) and/or Cloudflare 502 (pre-deploy probe). Both failure modes absent from post-merge logs.

**Learning:** The operator-stated principle — *"partial data with transparency to come back for more; background fetch warms the cache before you come back"* — works precisely as named. The first visit's 985 ms wall clock is the fan-out latency (33 parallel R2 reads of small blobs, most returning null); that's the floor of what cold-cache entity lookup can cost when no per-resource index exists. Subsequent visits share the warming cost across all entities in touched resources — the amortization is substantially better than per-entity bootstrap caching would have produced, because one warm benefits N entities and not just the queried one. The partial-note disclosure text (`"N/M indexed, K warming"`) is sufficiently concrete that the user has a bounded expectation for when to retry; the second visit's note (`"28/33 indexed, 5 warming"`) proves the mechanism visibly, not just in theory.

The H11→H11b arc is itself a lesson in distinguishing "the corpus scan is the problem" from "the corpus scan being on the user-blocking path is the problem." H11 moved the scan from the user path to the index-build path and called it solved; the scan moved, but the user-blocking property did not. It was merely hidden by the composite-SHA-staleness cache keeping the H11-era build from ever re-running. H11b removes the user-blocking property from both the build path and the query path by never running the scan in either: it runs only in the background, after a partial query has already been served. The principle is not about where expensive work runs. It is about what the user waits for.

**Decision:** Close H11, H11b, J-005 as resolved. The entity lookup path has been proven in production to serve partial-with-disclosure responses immediately and self-heal via background warm. Promote H14 (paired pattern: type contract + adversarial review) to canon — the H11b PR produced five Bugbot findings, including one High-severity crash that the type system would have caught if esbuild transpile-only weren't bypassing type checking in the build pipeline; this is the third documented recurrence of the pattern being the deciding reason for a decision, and it satisfies the canon-graduation test per the `klappy://canon/principles/cache-fetches-and-parses` method. Do not delete `bootstrapEntityMatches` / `BootstrapEntityResult` / `formatPartialBootstrapNote` yet — they are now provably unused from user paths, but the 24-hour observation window for production stability hasn't elapsed; H16 remains open for a follow-up PR after stability is confirmed.

**Constraint:** The observed 985 ms first-visit wall clock is the floor for un-indexed entity lookups. If future corpus growth pushes fan-out latency past ~3 s (i.e., if R2 read parallelism degrades at 62+ resources relative to the current 33), the partial-note path will start producing client-side timeouts even though the Worker completes cleanly. Monitor fan-out latency as the 23 unserved repos come online per the multi-language metadata probe work. The Workers Logs subrequest counts for the validation window (63 total across 6 invocations) are lower than expected if every `ctx.waitUntil` warm fired a full 33-resource scan; either the warms completed outside the 5-minute window I queried, or most of the metadata and content fetches hit R2 storage cache from prior sessions, or Cloudflare rolls up subrequest counts in a way that excludes `ctx.waitUntil` children of completed invocations. The distinction would matter only if production fan-out slowness pointed back at the warm not actually running; the second visit's 28/33 cache hits prove the warm did run, so this is a measurement-convention question, not a correctness question.

**Handoff:**
- **H15 closed** — 24-hour observation is the last safety check. Workers Logs outcome distribution remains 100% success post-H11b merge across the 6 invocations in the first ~5 minutes. Continue monitoring; re-query at 24h and 48h for confirmation of durability under organic load.
- **H16 unblocked** — delete `bootstrapEntityMatches`, `BootstrapEntityResult`, `formatPartialBootstrapNote` (and the tests that call `bootstrapEntityMatches` directly) in a standalone PR after 24 hours of clean production observation. They are confirmed dead code on the user path; the direct tests should be deleted alongside the function, since they exercise an internal that is about to be removed.
- **H19 promoted from J-005's H17** — the disclosure note text is static ("retry in a few seconds"). Observed behavior: 20-second wait produced 28/33 warmed, so "a few seconds" understates the realistic warm time for full corpus coverage. Consider dynamic wording based on `missing_resources.length` and an estimated seconds-per-resource, or a simpler "retry in N seconds" where N = missing_resources.length. Low-priority UX tuning; not a correctness issue.
- **H14 promoted** — encode "type contract + adversarial review" as a paired pattern in canon. Three recurrences documented: J-003 (BootstrapEntityResult contract + Bugbot catching implementation drift); J-005 (H11b FanOutEntityResult contract + Bugbot catching the undeclared `ctx` crash); and the generalized form observed across both — when the type system is present at author time but bypassed in the build pipeline (esbuild transpile-only), adversarial review is the sole defense between compile and runtime. Candidate canon path: `klappy://canon/principles/type-contract-plus-adversarial-review`.
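
H19's dynamic-wording idea from the list above can be sketched directly (hypothetical helper; the per-resource rate is an assumed constant, not a measured one):

```typescript
// Hypothetical sketch of a dynamic retry note for H19. Assumes roughly one
// second of warm time per missing resource -- a guess to be replaced by a
// measured seconds-per-resource rate once one exists.
const ASSUMED_SECONDS_PER_RESOURCE = 1;

function partialNote(indexed: number, total: number): string {
  const missing = total - indexed;
  if (missing === 0) return `${indexed}/${total} indexed`;
  const etaSeconds = missing * ASSUMED_SECONDS_PER_RESOURCE;
  return `${indexed}/${total} indexed, ${missing} warming; retry in ~${etaSeconds} s`;
}
```

With the observed second-visit state this yields `"28/33 indexed, 5 warming; retry in ~5 s"` instead of a static "a few seconds".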