feat(entity): H11b — partial data with transparency + background warm#20
Merged
Applies the operator's stated principle uniformly to the entity lookup
path: "partial data with transparency to come back for more; background
fetch will warm the cache before you come back." The user never blocks
on a corpus scan — not at index-build time, not at query time.
Problem H11b solves:
H11 moved the corpus scan from per-query to per-index-build. Post-merge
production probe showed two issues:
1. The scan moved, didn't disappear. A true cold index build still
pays ~22s on populateEntityIndexes running inline in buildIndex.
2. Indexes built before the H11 deploy never got entity indexes
populated (no composite-SHA change → no buildIndex rerun → no
populateEntityIndexes trigger). Every entity lookup today still
routes through the tier-3 bootstrap fallback. Observed live:
entity:Obadiah first-visit produces HTTP 502 at 11.2s (Cloudflare
frontend cut). Second-visit returns in 588ms from the R2 bootstrap
cache warmed accidentally by the first attempt's completing
background scan.
What changes:
1. src/registry.ts:
- Remove `await populateEntityIndexes(...)` from buildIndex. Index
builds no longer scan content files. Cold-start index build
returns to pre-H11 fast path.
- fanOutEntitySearch return type changes from Promise<ArticleRef[]>
to Promise<FanOutEntityResult> (new exported interface). The
result carries `matches`, `complete`, `scanned_resources`,
`total_resources`, and `missing_resources: ResourceEntry[]`.
- New exported warmEntityIndexesForResources(resources, repoShas,
env, storage) scans the specified resources' content files with
bounded concurrency (same 4×8 caps as J-003's bootstrap) and
writes per-resource entityIndexKey blobs to R2. Idempotency check
(getJSON-before-write) keeps concurrent warms from duplicating work.
- populateEntityIndexes kept as internal helper for potential
diagnostic use; unused in the new path.
2. src/tools.ts:
- handleEntity and searchByEntity wire to the new FanOutEntityResult
shape. When missing_resources is non-empty: emit
formatPartialFanOutNote, kick off ctx?.waitUntil?.(
warmEntityIndexesForResources(missing, repoShas, env, storage)),
return the partial matches immediately.
- bootstrapEntityMatches REMOVED from both user-facing query paths.
Still exported from tools.ts for manual diagnostic use — deletion
tracked as H16 post-stabilization.
- New formatPartialFanOutNote helper: "⚠ Partial result: N/M
resources indexed, K still warming in the background. Retry in a
few seconds for the complete result."
3. odd/ledger/journal.md:
- Appends J-005 covering the observation, the H11-wrong-axis
learning, the H11b decision, the constraint (cache-fetches-and-
parses principle governs — per-resource entity indexes are parse
products and stay cached; only the trigger moves), and 4 handoffs
(H15 post-deploy monitoring; H16 bootstrap deletion; H17
disclosure-note tuning; H18 explicit-retry tool consideration).
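A minimal sketch of the result shape and the disclosure note described in the list above. The field names and the note wording come from this description; the field types, the `ArticleRef` and `ResourceEntry` stand-ins, and the mapping of N/M/K onto fields are assumptions, not the code in src/registry.ts or src/tools.ts.

```typescript
// Hypothetical stand-ins: the real ArticleRef and ResourceEntry live in
// src/registry.ts and are not shown in this PR description.
interface ArticleRef {
  resource: string;
  title: string;
}
interface ResourceEntry {
  id: string;
}

// Field names come from the PR description; the field types are assumed.
interface FanOutEntityResult {
  matches: ArticleRef[];
  complete: boolean;
  scanned_resources: number;
  total_resources: number;
  missing_resources: ResourceEntry[];
}

// Assumed mapping: N = scanned_resources, M = total_resources,
// K = missing_resources.length.
function formatPartialFanOutNote(r: FanOutEntityResult): string {
  return (
    `⚠ Partial result: ${r.scanned_resources}/${r.total_resources} ` +
    `resources indexed, ${r.missing_resources.length} still warming in the ` +
    `background. Retry in a few seconds for the complete result.`
  );
}

// Example: 3 of 5 resources had an entity index, 2 are still missing.
const partial: FanOutEntityResult = {
  matches: [{ resource: "res-a", title: "Obadiah" }],
  complete: false,
  scanned_resources: 3,
  total_resources: 5,
  missing_resources: [{ id: "res-d" }, { id: "res-e" }],
};

console.log(formatPartialFanOutNote(partial));
```

Keeping the completeness counters on the result rather than in the note string lets a caller decide separately whether to show the note and whether to schedule a warm.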
Verification:
- npm ci && npm run build && npm run test (with GITHUB_TOKEN, mirroring
CI build-test)
- 168/168 tests pass (was 165 on main: +3 H11b-specific tests, 5
pre-existing tests updated to match the new behavior because they
tested the old bootstrap-fallback path that H11b retires)
- 3 new tests explicitly verify the H11b contract:
* ctx.waitUntil called exactly once with a Promise when
missing_resources is non-empty
* ctx absence doesn't throw — warm just skips, self-healing via
next query
* complete fan-out produces no partial note AND no waitUntil call
- Regression coverage: empty fan-out does NOT call mockFetchJson
(confirms the bootstrap-free user path)
- wrangler deploy --dry-run: clean, no binding changes
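The three contract checks listed above can be illustrated with a hand-rolled spy. This is a sketch under assumptions: `respond`, `FanOut`, and `makeSpyCtx` are hypothetical names, and the repo's real tests (which use `mockFetchJson`) are not shown in this PR.

```typescript
// Minimal shape of the execution context used here.
interface Ctx {
  waitUntil(p: Promise<unknown>): void;
}

// Hypothetical, simplified fan-out result.
interface FanOut {
  matches: string[];
  missing_resources: string[];
}

// Hypothetical handler core: returns immediately and schedules the warm
// only when something is missing. `warm` stands in for
// warmEntityIndexesForResources with its other arguments already bound.
function respond(
  fan: FanOut,
  warm: (missing: string[]) => Promise<void>,
  ctx?: Ctx,
): { matches: string[]; partial: boolean } {
  const partial = fan.missing_resources.length > 0;
  if (partial) {
    // Optional chaining short-circuits when ctx is absent, so warm() is
    // never invoked and nothing throws; the next query gets another chance.
    ctx?.waitUntil?.(warm(fan.missing_resources));
  }
  return { matches: fan.matches, partial };
}

// Spy that records every promise handed to waitUntil.
function makeSpyCtx(): Ctx & { calls: Promise<unknown>[] } {
  const calls: Promise<unknown>[] = [];
  return {
    calls,
    waitUntil: (p: Promise<unknown>): void => {
      calls.push(p);
    },
  };
}

const noopWarm = async (_missing: string[]): Promise<void> => {};

// Partial fan-out with a ctx: one waitUntil call expected.
const partialSpy = makeSpyCtx();
const partialResult = respond(
  { matches: ["a"], missing_resources: ["r1"] },
  noopWarm,
  partialSpy,
);

// Complete fan-out: no waitUntil call expected.
const completeSpy = makeSpyCtx();
const completeResult = respond(
  { matches: ["a"], missing_resources: [] },
  noopWarm,
  completeSpy,
);

// Partial fan-out without a ctx: must not throw.
const noCtxResult = respond(
  { matches: ["a"], missing_resources: ["r1"] },
  noopWarm,
);
```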
Vodka check:
- warmEntityIndexesForResources and fanOutEntitySearch are generic
corpus-scanning / parallel-R2-loading functions. No domain branches.
- FanOutEntityResult fields describe scan completeness, not content
structure.
- Zero new `if (resource_type === ...)` anywhere.
- Cache-fetches-and-parses canon upheld: the parse products (per-
resource entity indexes) stay cached; only the cache-population
trigger changes. SHA-keyed lifecycle unchanged.
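The bounded concurrency that the warm path is described as using (the 4×8 caps) needs nothing domain-specific, which a small generic helper can illustrate. This is a sketch, not the repo's implementation: `mapWithLimit`, `scanResources`, and the resource shape are hypothetical, and reading 4×8 as "up to 4 resources in flight, each scanning up to 8 files" is an assumption.

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
// Results keep the order of `items`.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      // Claiming an index is safe without locks: nothing else runs
      // between the check above and this increment.
      const i = next++;
      results[i] = await fn(items[i] as T);
    }
  }
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Assumed reading of "4×8": 4 resources at a time, 8 files per resource,
// so at most 32 file fetches in flight.
const RESOURCE_LIMIT = 4;
const FILE_LIMIT = 8;

// Hypothetical scan: collects whatever `scanFile` returns, per resource.
async function scanResources(
  resources: { id: string; files: string[] }[],
  scanFile: (file: string) => Promise<string[]>,
): Promise<Map<string, string[]>> {
  const out = new Map<string, string[]>();
  await mapWithLimit(resources, RESOURCE_LIMIT, async (res) => {
    const perFile = await mapWithLimit(res.files, FILE_LIMIT, scanFile);
    out.set(res.id, ([] as string[]).concat(...perFile));
  });
  return out;
}
```

Neither function inspects what a resource contains; the caller supplies `scanFile`, so the helper stays free of resource-type branches.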
Risk:
Low. Reversibility: git revert restores H11 exactly. The bootstrap
function is kept as a safety net during H11b observation; once H15
confirms the new path is healthy, H16 deletes it in a separate PR.
Unknown: the concurrent-warm window for a single resource. If two
requests hit the same missing resource at the same time, both schedule
warms; the second warm's getJSON idempotency check catches it and
aborts, so the wasted cost is one metadata fetch. Acceptable.
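The getJSON-before-write guard can be sketched as follows, assuming an in-memory store. `JsonStore`, `MemoryStore`, `putJSON`, `warmOne`, and `EntityIndex` are hypothetical names for illustration; only `getJSON` is named in this PR.

```typescript
// Hypothetical storage surface. getJSON is named in the PR; putJSON is an
// assumed counterpart.
interface JsonStore {
  getJSON<T>(key: string): Promise<T | null>;
  putJSON<T>(key: string, value: T): Promise<void>;
}

// In-memory stand-in for R2, for illustration only.
class MemoryStore implements JsonStore {
  private blobs = new Map<string, unknown>();

  async getJSON<T>(key: string): Promise<T | null> {
    return this.blobs.has(key) ? (this.blobs.get(key) as T) : null;
  }

  async putJSON<T>(key: string, value: T): Promise<void> {
    this.blobs.set(key, value);
  }
}

// Assumed index shape: entity id → article ids.
type EntityIndex = Record<string, string[]>;

// Returns true if this call ran the scan and wrote the blob, false if the
// blob was already present and the scan was skipped.
async function warmOne(
  key: string,
  store: JsonStore,
  scan: () => Promise<EntityIndex>,
): Promise<boolean> {
  const existing = await store.getJSON<EntityIndex>(key);
  if (existing !== null) {
    return false; // an earlier warm already wrote this blob
  }
  await store.putJSON(key, await scan());
  return true;
}
```

A warm that starts after another has finished writing skips the scan. The check and the write are not atomic, so two warms that begin in the same instant can both miss and both scan; in this sketch that only repeats the work and overwrites the blob with equivalent content.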
Mode trail:
Pre-write: oddkit_search for the governing canon principle
(cache-fetches-and-parses), oddkit_get to retrieve it, oddkit_challenge
against the design to surface the disconfirmer, reversibility, scope,
and alternatives. Challenge returned block-until-addressed; addressed
inline before writing code. Tests added before push. oddkit_validate
returned NEEDS_ARTIFACTS for session capture and change summary;
this commit message + the J-005 journal entry provide both.
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | aquifer-mcp | e94e0de | Commit Preview URL, Branch Preview URL | Apr 24 2026, 02:05 AM |
- Fetch metadata from eng path in warmEntityIndexesForResources to match
  buildIndex, so R2 cache hits and warms succeed for non-eng resources
- Remove stale JSDoc that became attached to formatPartialFanOutNote;
  restore doc block above formatPartialBootstrapNote
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 739c5c3.
Cursor Bugbot review of the first H11b commit surfaced 5 findings across
4 review cycles. Cursor Agent autofixed 4 of them in commits:
- bd35ae8 (#1 High: searchByEntity missing ctx → runtime crash)
- 94e2648 (#2 Medium: warm path language mismatch + JSDoc placement)
- 739c5c3 (#4 Medium: rejected storage reads falsely reporting complete)
This commit closes the remaining #5 Low.
#5 Low:
populateEntityIndexes was dead code after H11b removed its only call
site from buildIndex. The PR description claimed it was "preserved for
potential diagnostic use" but the function had no export, making that
justification false: no external caller could reach it. All of its
scanning logic is now represented by warmEntityIndexesForResources.
Changes:
- src/registry.ts: delete populateEntityIndexes (60-line function +
  JSDoc). Fix two now-stale comments in the file that referenced the
  deleted function by name (the buildIndex-removal note and the
  warmEntityIndexesForResources docstring).
- src/tools.test.ts: fix one stale comment referencing the deleted
  function as the writer of the per-resource entity index format.
- odd/ledger/journal.md: appends a final Update sub-section to J-005
  covering the full 5-finding Bugbot arc on PR #20.
Verification:
- npm ci && npm run build && npm run test (with GITHUB_TOKEN set)
- 168/168 tests pass (unchanged from prior commit; the deletion removed
  only dead code with no test coverage)
- wrangler deploy --dry-run: clean
- grep -rn populateEntityIndexes src/ returns nothing after this commit
Meta-lesson captured in J-005 update:
The High-severity #1 (undeclared ctx in searchByEntity) is the class of
bug TypeScript normally catches, but esbuild's transpile-only build
pipeline silently compiled the free-variable reference. Adversarial
review (Bugbot) was the only line of defense between compile and
runtime. When the type system is disabled in the build pipeline, type
contract + adversarial review collapses to adversarial review alone,
which is precisely the condition that made #1 High rather than something
the compiler would have stopped.
klappy added a commit to klappy/klappy.dev that referenced this pull request on Apr 24, 2026:
…warm + graduation ledger (#137)
Graduates the operator-stated architectural principle from the aquifer-mcp J-002 -> H11b session as tier-2 canon, with the session graduation ledger.
Principle: the user-blocking path never pays the full cost of an expensive corpus scan. Return what is already observed, disclose what is missing, warm in the background via ctx.waitUntil.
Three deciding-argument recurrences:
1. refreshAndUpdateCurrentIndex in aquifer-mcp src/registry.ts (implicit).
2. H11b architecture (PR klappy/aquifer-mcp#20): FanOutEntityResult, formatPartialFanOutNote, warmEntityIndexesForResources.
3. Operator reframe producing canon promotion (meta-level).
Companion: odd/ledger/2026-04-24-aquifer-session-principles-graduated.md records the graduation arc.
Release-validation-gate: Bugbot clean; Sonnet 4.6 validator dispatched; findings dispositioned in closeout comment on #137.
Why
Applies the operator's stated principle uniformly: "It's a principle of partial data with transparency to come back for more. Background fetch will warm the cache before you come back."
PR #19 (H11) moved the entity-corpus scan off the query path onto the index-build path. Post-merge production probe showed this didn't actually solve the user-blocking problem:
- `entity:Obadiah` first-visit → HTTP 502 at 11.2s: `9f1136be2ab319b0-ORD`, no Worker outcome (Cloudflare frontend cut the stream while Worker was still running bootstrap)
- `entity:Obadiah` second-visit → 78 matches in 588ms
- `entities.json` blobs → miss: `storage:index/AquiferO…/6ebe2335…/entities.j…=104ms(miss)` across every resource
The deployed H11 code is present, but `populateEntityIndexes` never ran, because `buildIndex` only executes on a composite-SHA change, and the deploy didn't rotate any repo SHAs. Every entity query today still routes through the tier-3 bootstrap fallback, masked by the per-session cache warming my probes accidentally did.
H11 fixed the wrong axis. The durable fix isn't "build the index eagerly instead of lazily." It's: the user path must never pay for a corpus scan regardless of when the scan runs.
What this PR does
src/registry.ts
- Removes `await populateEntityIndexes(...)` from `buildIndex`. Index builds return to pre-H11 fast path.
- Changes `fanOutEntitySearch` return type from `Promise<ArticleRef[]>` to `Promise<FanOutEntityResult>`. New exported interface carries `matches`, `complete`, `scanned_resources`, `total_resources`, and `missing_resources: ResourceEntry[]`.
- Adds `warmEntityIndexesForResources(resources, repoShas, env, storage)`: scans the specified resources' content files with the same `4×8` bounded concurrency caps as J-003's bootstrap, writes per-resource `entityIndexKey` blobs to R2. Idempotency guarded by a pre-write `getJSON` check so concurrent warms don't duplicate work.
- Keeps `populateEntityIndexes` as an internal helper (unused in hot path, preserved for potential diagnostic use; deletion tracked as H16).
src/tools.ts
- `handleEntity` and `searchByEntity` are rewired to:
  1. Try `index.entity.get(normalized)` (tier 1, unchanged).
  2. Call `fanOutEntitySearch` → unpack `FanOutEntityResult`.
  3. If `missing_resources.length > 0`: emit `formatPartialFanOutNote(fan)` and kick off `ctx?.waitUntil?.(warmEntityIndexesForResources(...))`. The user's response returns immediately with whatever fan-out found; no corpus scan on the request path.
- Removes `bootstrapEntityMatches` from both user-facing paths. Function still exported for direct diagnostic use; full deletion is H16 once production observation confirms the new path is healthy.
- New `formatPartialFanOutNote(r)` produces the user-visible disclosure: "⚠ Partial result: N/M resources indexed, K still warming in the background. Retry in a few seconds for the complete result."
odd/ledger/journal.md
- Appends J-005 covering the observation, the H11-wrong-axis learning, the H11b decision, the `cache-fetches-and-parses` canon constraint (per-resource entity indexes ARE parse products and stay cached; only the trigger moves), and 4 handoffs (H15 post-deploy monitoring, H16 bootstrap deletion, H17 disclosure-note tuning, H18 explicit-retry tool consideration).
Verification
`npm ci && npm run build && npm run test` with `GITHUB_TOKEN` set, mirroring CI `build-test`: 168/168 tests pass (was 165 on main; +3 new H11b-specific tests, 5 pre-existing tests updated because they tested the old bootstrap-fallback path H11b retires).
`wrangler deploy --dry-run`: clean, no binding or compatibility-flag changes.
H11b-specific test coverage:
- `ctx.waitUntil` is invoked exactly once with a Promise when `missing_resources` is non-empty
- Absent `ctx` doesn't throw; warm just skips, self-healing via next query
- Complete fan-out produces no partial note and no `waitUntil` call
- Empty fan-out does not call `mockFetchJson` (confirms the bootstrap-free user path)
Vodka check
- `warmEntityIndexesForResources` and `fanOutEntitySearch` are generic corpus-scanning / parallel-R2-loading functions. No domain branches.
- `FanOutEntityResult` fields describe scan completeness, not content structure.
- Zero new `if (resource_type === ...)` anywhere.
- `cache-fetches-and-parses` canon upheld: the parse products (per-resource entity indexes) stay cached; only the cache-population trigger changes. SHA-keyed lifecycle unchanged; no staleness risk per anti-cache-lying.
Pressure-tested against canon before writing code
Ran `oddkit_challenge` on the design before touching any file. Challenge returned `block-until-addressed` flagging missing disconfirmer + reversibility. Addressed inline:
- Disconfirmer: if first-visit entity calls show `clientDisconnected` or non-success outcomes, the fan-out itself is too slow; retract.
- Reversibility: reverting restores H11 exactly (`git revert`). Bootstrap function preserved during migration.
- `refreshAndUpdateCurrentIndex` in `registry.ts` already uses this pattern for SHA-staleness refresh. H11b is cloning an existing pattern, not inventing one.
`oddkit_validate` returned `NEEDS_ARTIFACTS` for session capture + change summary. This PR body + the J-005 journal entry provide both; the commit message provides DoD §5 self-audit detail.
Post-deploy validation plan (H15)
- Call `entity entity_id=person:Obadiah` (known-cold per pre-deploy probe). First call should return immediately with partial note; background warm kicks off in `ctx.waitUntil`.
- Expect zero `exceededMemory`, zero `exceededCpu`, zero edge 502s, no `clientDisconnected` from user-side timeouts on entity calls.
Mode trail
Investigation → principle extraction → planning (canon search + retrieve + challenge) → execution (code + tests) → validate → artifact (journal + PR + commit message). Claim-as-debt preserved: no post-deploy latency prediction is stated as fact in the journal; J-005 will get an update once H15 observation confirms or refutes the hypothesis.
Note
Medium Risk
Changes the entity lookup flow to remove the synchronous corpus-scan fallback and instead schedule background index warms via `ctx.waitUntil`, which affects user-visible completeness semantics and relies on correct cache-population behavior under load/failures.
Overview
Entity lookups now never run a corpus scan inline. `buildIndex` stops populating entity indexes, and `fanOutEntitySearch` switches from returning `ArticleRef[]` to a structured `FanOutEntityResult` that includes `missing_resources`/completeness metadata.
When per-resource entity indexes are missing, `handleEntity` and `searchByEntity` return whatever matches are available immediately, append a new disclosure note (`formatPartialFanOutNote`), and kick off `warmEntityIndexesForResources(...)` via `ctx.waitUntil` to populate the missing per-resource `entities.json` blobs in R2 for the next request.
Tests are updated to reflect the bootstrap-free request path and add coverage for the `waitUntil` scheduling contract (called once with a Promise when partial; skipped when `ctx` absent or fan-out complete). Documentation journal entry `J-005` is added describing the H11b rationale and rollout/monitoring handoffs.
Reviewed by Cursor Bugbot for commit e94e0de. Bugbot is set up for automated code reviews on this repo.