
refactor(telemetry): retire indexSource interpreter — fetchers report, telemetry tallies#141

Merged
klappy merged 3 commits into main from refactor/retire-indexsource-interpreter
Apr 26, 2026

Conversation

Owner

@klappy klappy commented Apr 26, 2026

What

Retire the tracer's _indexSource interpreter that picked one tier as the "winner" of a multi-fetch request and wrote it to telemetry blob9. Replace with per-fetch records (recordFetch) and derived arithmetic (cacheStats { hits, misses, total }). Mirrors the TH-mcp pattern that has avoided this whole class of bug.

Why

Four bugs in four PRs from the interpreter: #137 missing field, #138 streaming race, #139 file:* recognition, and the ongoing first-vs-slowest debate.

TH-mcp doesn't have these because it never collapses. It records each fetch with cached: true|false and lets queries do aggregation. Transparency was the goal; the aggregator was the obscuring layer.

Locked design (per handoff)

  • tracing.ts: recordFetch({ url, duration_ms, cached, size?, status? }) accumulates onto a private fetches[]. cacheStats getter does the arithmetic. _indexSource, indexSource, and the addSpan setter logic are deleted. toJSON drops index_source, adds fetches + cacheStats. addSpan retained for non-fetcher events (action spans, sha:* spans).
  • zip-baseline-fetcher.ts: All 15 cache-layer call sites converted. URL-prefix carries the tier — memory:// cf-cache:// r2:// build:// or real GitHub URL. cached boolean is the primary fact, URL is the breadcrumb. Duplicate index "cache" span dropped (cacheGet now records the cf-cache:// fetch directly).
  • index.ts: tracer.cacheStats replaces tracer.indexSource. PR #138's streaming-race fix preserved — the read still happens inside ctx.waitUntil after await responseClone.text() resolves.
  • telemetry.ts: recordTelemetry signature: cacheStats?: {hits, total} replaces cacheTier?: string. blob9 retired (slot stays free per "no deprecation, nobody uses them yet"). double7: cache_hits and double8: cache_lookups added. BASELINE_BLOB_SEMANTIC_NAMES is 8; BASELINE_DOUBLE_SEMANTIC_NAMES is 8.

Tests

  • 4 PR #139 file:*-recognition tests + index-wins-over-file test deleted (interpreter gone)
  • Streaming-race regression test reworked to use cacheStats (semantics unchanged, only the accessor)
  • New: recordFetch arithmetic test — cacheStats mirrors fetches[]
  • All recordTelemetry call sites updated to pass {hits, total} instead of string
  • 19 telemetry tests pass, 0 fail
  • governance-parser (105 pass), tokenize (7 pass), tsc --noEmit clean

Companion PR

Canon update lands as a separate PR on klappy/klappy.dev: canon/telemetry-governance-cache-stats.

Verification (post-merge, after Workers Builds)

-- Hit rate over last 30 min
SELECT SUM(double7) AS hits, SUM(double8) AS lookups,
       SUM(double7) * 1.0 / NULLIF(SUM(double8), 0) AS hit_rate
FROM oddkit_telemetry
WHERE timestamp > NOW() - INTERVAL '30' MINUTE;

-- Cold-start frequency
SELECT blob3 AS tool, SUM(_sample_interval) AS cold_starts
FROM oddkit_telemetry
WHERE timestamp > NOW() - INTERVAL '1' HOUR
  AND double7 = 0 AND double8 > 0
GROUP BY blob3;

Per-tier detail visible in any response's debug.trace.fetches[].

Release validation gate

Will wait for Cursor Bugbot to reach completed before squash-merging, per klappy://canon/constraints/release-validation-gate. PR touches tracing.ts, index.ts, telemetry.ts (envelope behavior) and may warrant Sonnet 4.6 read-only validator dispatch via Managed Agents before promotion.


Note

Medium Risk — this PR changes the Analytics Engine telemetry schema (retires blob9 and adds new doubles) and refactors tracing semantics, which could break dashboards/queries if not updated.

Overview
Replaces the tracer’s single “winning cache tier” (indexSource/cache_tier) with per-fetch recording via RequestTracer.recordFetch() and derived cacheStats (hits/total), and exposes these fetch records in debug.trace output.

Telemetry writing is updated to retire blob9 and instead emit double7: cache_hits and double8: cache_lookups, with index.ts still reading tracer data only after streaming responses complete to avoid the prior race.

Knowledge-base fetch paths are updated to emit memory://, cf-cache://, r2://, build://, and network fetch records, and integration tests are adjusted to the new 8-blob/8-double schema and new cache arithmetic expectations.

Reviewed by Cursor Bugbot for commit b703860. Bugbot is set up for automated code reviews on this repo.

refactor(telemetry): retire indexSource interpreter — fetchers report, telemetry tallies

The tracer's _indexSource interpreter picked one tier as the 'winner' of a
multi-fetch request and wrote it to telemetry blob9 (cache_tier). It produced
four bugs in four PRs (#137 missing field, #138 streaming race, #139 file:*
recognition, ongoing first-vs-slowest debate). TH-mcp doesn't have these bugs
because it never collapses — it records each fetch with cached: true|false and
reports cacheStats {hits, total} via arithmetic.

This refactor does what TH-mcp does: deletes the interpreter, records per-fetch
facts, lets dashboard queries do any aggregation. Transparency was the goal;
the aggregator was the obscuring layer.

workers/src/tracing.ts:
- Add recordFetch({ url, duration_ms, cached, size?, status? }) — per-fetch records
- Add cacheStats getter returning { hits, misses, total } — derived arithmetic
- Delete _indexSource field, indexSource getter, addSpan setter logic
- toJSON returns { spans, fetches, cacheStats, total_ms } — drops index_source
- Keep addSpan for non-fetcher use (action spans, sha:* spans)

workers/src/zip-baseline-fetcher.ts:
- Convert all 15 cache-layer addSpan call sites to recordFetch
- URL prefix carries the tier: memory:// cf-cache:// r2:// build:// or real https://
- Drop duplicate 'index' cache-tier span (cacheGet now records the cf-cache:// fetch)
- sha:* spans kept as addSpan per spec

workers/src/index.ts:
- Replace cacheTier = tracer.indexSource with stats = tracer.cacheStats
- Pass cacheStats {hits, total} to recordTelemetry
- Preserve PR #138 streaming-race fix (read inside ctx.waitUntil after body consumed)

workers/src/telemetry.ts:
- blob9 retired (slot stays free per 'no deprecation, nobody uses them yet' rule)
- Add double7 cache_hits, double8 cache_lookups
- recordTelemetry signature: cacheStats?: {hits, total} replaces cacheTier?: string
- BASELINE_BLOB_SEMANTIC_NAMES shrunk to 8; BASELINE_DOUBLE_SEMANTIC_NAMES grown to 8

workers/test/telemetry-integration.test.mjs:
- Delete 4 PR #139 file:*-recognition tests + index-wins-over-file test (interpreter gone)
- Rework streaming-race regression test to use cacheStats (semantics unchanged)
- Add tracer.recordFetch arithmetic test (cacheStats mirrors fetches[])
- All recordTelemetry call sites updated to pass {hits, total} instead of string
- 19 tests pass, 0 fail

Canon update for telemetry-governance.md ships in klappy/klappy.dev separately.

cloudflare-workers-and-pages Bot commented Apr 26, 2026

Deploying with Cloudflare Workers

The latest updates on your project.

Status:        ✅ Deployment successful! (View logs)
Name:          oddkit
Latest Commit: b703860
Updated (UTC): Apr 26 2026, 04:05 PM


@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Exported shortKey function is now unused dead code
    • Removed the unused shortKey export from workers/src/tracing.ts since it had no remaining consumers after the refactor.
Preview (b703860666)
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -444,13 +444,14 @@
   knowledge_base_url — which knowledge base is being served
   document_uri      — for get calls, the klappy:// URI requested
   worker_version    — oddkit version string
-  cache_tier        — which storage tier served the index
   count             — always 1 (use SUM for aggregation)
   duration_ms       — request processing time (full wall-clock at worker edge)
   bytes_in          — UTF-8 byte length of the request body
   bytes_out         — UTF-8 byte length of the response body (0 for SSE streams)
   tokens_in         — cl100k_base token count of the request body
   tokens_out        — cl100k_base token count of the response body
+  cache_hits        — count of fetches in the request that hit a cache tier
+  cache_lookups     — total fetches in the request (denominator for hit rate)
   index1            — sampling key (consumer label)
 
 Use SUM(_sample_interval) instead of COUNT(*) to account for Analytics Engine sampling.
@@ -979,7 +980,9 @@
       const response = await handler(request, env, ctx);
 
       // Phase 1 telemetry — non-blocking, fire-and-forget (E0008)
-      // Phase 1.5: cache_tier from tracer feeds blob9 (E0008.1)
+      // Phase 1.5: cache_tier interpreter retired (PR refactor/retire-indexsource-interpreter).
+      // The tracer now records per-fetch facts; recordTelemetry receives the
+      // arithmetic (cache_hits, cache_lookups) instead of a single "winner" tier.
       // Phase 2: payload shape (bytes_in/out, tokens_in/out) feeds doubles
       // 3-6. tokenize_ms was tried and dropped — Workers freezes both
       // performance.now() and Date.now() during synchronous CPU work, making
@@ -989,15 +992,15 @@
       // response. The helper handles clone failures safely.
       if (telemetryClone) {
         const durationMs = Date.now() - startTime;
-        // NOTE: Do NOT read tracer.indexSource here. The MCP handler returns
+        // NOTE: Do NOT read tracer.cacheStats here. The MCP handler returns
         // a streaming Response — `await handler(...)` resolves with the
         // Response object before the tool handler closure has finished
-        // running, so the tracer has not yet recorded the `index` span at
-        // this point. Reading here yields "none" for every tool. The tracer
-        // is only fully populated once the response body has been consumed
-        // (which forces the streaming tool handler to complete). The read
-        // therefore happens inside the waitUntil callback below, after
-        // `await responseClone.text()` resolves.
+        // running, so the tracer has not yet recorded any fetch records at
+        // this point. Reading here yields {hits:0,total:0} for every tool.
+        // The tracer is only fully populated once the response body has been
+        // consumed (which forces the streaming tool handler to complete).
+        // The read therefore happens inside the waitUntil callback below,
+        // after `await responseClone.text()` resolves. (PR #138 fix retained.)
         // Clone the response synchronously before returning so the body is
         // still available to read inside the deferred waitUntil callback.
         const responseClone = response.clone();
@@ -1016,15 +1019,15 @@
               } catch {
                 // Fall through with empty string; bytes_out / tokens_out will be 0.
               }
-              // Read tracer.indexSource AFTER the response body has been
+              // Read tracer.cacheStats AFTER the response body has been
               // consumed. By this point the streaming tool handler has
-              // completed and any "index" / "index-build" spans have been
-              // recorded. Reading earlier (e.g. immediately after `await
-              // handler()` returned) was the streaming-race bug that caused
-              // every tool call to record cache_tier="none" in production.
-              const cacheTier = tracer.indexSource;
+              // completed and any storage-tier fetches have been recorded.
+              // Reading earlier (e.g. immediately after `await handler()`
+              // returned) was the streaming-race bug that caused every tool
+              // call to record cache_tier="none" in production (PR #138).
+              const stats = tracer.cacheStats;
               const shape = await measurePayloadShape(requestText, responseText);
-              recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
+              recordTelemetry(request, requestText, env, durationMs, stats, shape);
             } catch {
               // Telemetry must never break MCP requests
             }

diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -17,7 +17,10 @@
  *                            (deploy-time injection) with a build-time fallback
  *                            to workers/package.json::version. Never "unknown"
  *                            on a normal deploy.
- *   blob9: cache_tier    — which storage tier served the index (E0008.1)
+ *   blob9: RETIRED         — was cache_tier (E0008.1, removed in the
+ *                            retire-indexsource-interpreter refactor). Slot
+ *                            stays free; cache effectiveness is now derived
+ *                            from double7/double8.
  *   double1: count        — always 1 (for SUM aggregation)
  *   double2: duration_ms  — Full MCP request wall-clock, measured at the worker
  *                            edge from request entry through handler return.
@@ -39,6 +42,12 @@
  *                            0 when tokenization was skipped or failed.
  *   double6: tokens_out   — cl100k_base token count of the response body. 0 for
  *                            streamed responses or tokenizer failure.
+ *   double7: cache_hits   — count of per-fetch records in the request whose
+ *                            `cached` flag was true. Sourced from
+ *                            tracer.cacheStats.hits.
+ *   double8: cache_lookups — total per-fetch records in the request (the
+ *                            denominator for hit rate). Sourced from
+ *                            tracer.cacheStats.total.
  *
  *   NOTE: a previous iteration shipped a `double7: tokenize_ms` field intended
  *   to capture the wall-clock cost of tokenization for bench-vs-prod
@@ -236,7 +245,7 @@
   requestBody: string,
   env: Env,
   durationMs: number,
-  cacheTier?: string,
+  cacheStats?: { hits: number; total: number },
   shape?: PayloadShape | null,
 ): void {
   if (!env.ODDKIT_TELEMETRY) return;
@@ -258,6 +267,8 @@
   const bytesOut = shape?.bytes_out ?? 0;
   const tokensIn = shape?.tokens_in ?? 0;
   const tokensOut = shape?.tokens_out ?? 0;
+  const cacheHits = cacheStats?.hits ?? 0;
+  const cacheLookups = cacheStats?.total ?? 0;
 
   for (const payload of messages) {
     const { label: consumerLabel, source: consumerSource } = parseConsumerLabel(
@@ -286,7 +297,9 @@
         toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
         documentUri,
         env.ODDKIT_VERSION || BUILD_VERSION,
-        cacheTier || "none", // blob9: E0008.1 x-ray cache tier
+        // blob9 retired (was cache_tier). Slot stays free per the
+        // "no deprecation, nobody uses them yet" rule. Cache effectiveness
+        // moved to double7/double8.
       ],
       doubles: [
         1,                // double1: count
@@ -295,6 +308,8 @@
         bytesOut,         // double4: bytes_out
         tokensIn,         // double5: tokens_in
         tokensOut,        // double6: tokens_out
+        cacheHits,        // double7: cache_hits
+        cacheLookups,     // double8: cache_lookups
       ],
       indexes: [consumerLabel],
     });
@@ -339,7 +354,9 @@
   "knowledge_base_url", // blob6
   "document_uri",     // blob7
   "worker_version",   // blob8
-  "cache_tier",       // blob9
+  // blob9 retired (was cache_tier). Slot stays free per the
+  // "no deprecation, nobody uses them yet" rule. Hit-rate moved to
+  // double7/double8.
 ] as const;
 
 /**
@@ -348,12 +365,14 @@
  * which are parseable at runtime. Baseline is the safety net.
  */
 const BASELINE_DOUBLE_SEMANTIC_NAMES = [
-  "count",       // double1
-  "duration_ms", // double2
-  "bytes_in",    // double3
-  "bytes_out",   // double4
-  "tokens_in",   // double5
-  "tokens_out",  // double6
+  "count",         // double1
+  "duration_ms",   // double2
+  "bytes_in",      // double3
+  "bytes_out",     // double4
+  "tokens_in",     // double5
+  "tokens_out",    // double6
+  "cache_hits",    // double7
+  "cache_lookups", // double8
 ] as const;
 
 /** Build a SchemaMap from ordered blob/double name arrays. Exported for unit testing. */

diff --git a/workers/src/tracing.ts b/workers/src/tracing.ts
--- a/workers/src/tracing.ts
+++ b/workers/src/tracing.ts
@@ -1,13 +1,20 @@
 /**
  * Lightweight edge-compatible request tracer for X-Ray performance diagnostics.
  *
- * Adapted from aquifer-mcp's RequestTracer (which was modeled on
- * translation-helps-mcp's EdgeXRayTracer). Records every I/O operation
- * with timing, source tier, and optional detail.
+ * Adapted from translation-helps-mcp's EdgeXRayTracer. Records every fetch
+ * (storage tier access or network call) as a per-fetch fact with timing,
+ * cached boolean, optional status and size. The URL prefix carries the tier
+ * (memory:// cf-cache:// r2:// build:// or a real https:// URL); there is no
+ * separate `source` field on FetchRecord — `cached` is the primary fact, the
+ * URL is the breadcrumb.
  *
- * Usage: create one RequestTracer per inbound request, thread it through
- * storage reads and tool handlers, then serialize via toHeader() or toJSON().
+ * Telemetry derives cache_hits and cache_lookups from the per-fetch records
+ * via the `cacheStats` getter — the dashboard does the aggregation, not the
+ * tracer. There is no interpretation layer that picks a "winning" tier.
  *
+ * `addSpan` is retained for non-fetcher events (action timing, sha:* SHA
+ * resolution, anything that is not a storage tier read).
+ *
  * Part of E0008: Observability epoch.
  */
 
@@ -18,16 +25,34 @@
   detail?: string;
 }
 
+export interface FetchRecord {
+  url: string;
+  duration_ms: number;
+  cached: boolean;
+  status?: number;
+  size?: number;
+}
+
+export interface CacheStats {
+  hits: number;
+  misses: number;
+  total: number;
+}
+
 export class RequestTracer {
   private spans: TraceSpan[] = [];
+  private fetches: FetchRecord[] = [];
   private startTime: number;
-  private _indexSource: TraceSpan["source"] | null = null;
 
   constructor() {
     this.startTime = performance.now();
   }
 
-  /** Record a span with explicit timing, source, and detail. */
+  /**
+   * Record a non-fetcher span (action timing, SHA resolution, etc.).
+   * Storage tier reads should use `recordFetch` instead so they roll into
+   * the `cacheStats` arithmetic.
+   */
   addSpan(label: string, duration_ms: number, source?: TraceSpan["source"], detail?: string): void {
     this.spans.push({
       label,
@@ -35,29 +60,30 @@
       ...(source ? { source } : {}),
       ...(detail ? { detail } : {}),
     });
+  }
 
-    // Track the primary cache tier for telemetry (first span matching a data
-    // fetch). Three label families count:
-    //   - "index" / "index-build"  → navigability index fetch (search/orient/etc.)
-    //   - "file:*"                 → individual file fetch (oddkit_get fast path)
-    // First-wins: actions like runSearch call getIndex *before* getFile, so
-    // the index tier wins for those — file:* spans that fire later are
-    // ignored. Actions like runGet for klappy:// URIs call getFile only,
-    // so the file tier wins. file-r2:* (r2 miss with source="miss") is
-    // excluded because "miss" is not a tier.
-    if (!this._indexSource && source && source !== "miss") {
-      if (
-        label === "index" ||
-        label === "index-build" ||
-        label.startsWith("file:")
-      ) {
-        this._indexSource = source;
-      }
-    }
+  /**
+   * Record one storage-tier fetch as a per-fetch fact. URL prefix carries
+   * the tier:
+   *   - memory://path     → module-level cache hit (always cached: true)
+   *   - cf-cache://key    → Cloudflare Cache API
+   *   - r2://path         → R2 durable storage
+   *   - build://path      → cold rebuild from ZIP
+   *   - https://...       → real network fetch (status, size populated)
+   *
+   * `cached: true` for hits, `cached: false` for misses and live fetches.
+   * `cacheStats` aggregates these into hit/miss/total counts; the dashboard
+   * does any further per-tier breakdown via debug.trace.fetches[].
+   */
+  recordFetch(record: FetchRecord): void {
+    this.fetches.push({
+      ...record,
+      duration_ms: Math.round(record.duration_ms),
+    });
   }
 
   /**
-   * Time an async operation and record it as a span.
+   * Time an async operation and record it as a (non-fetch) span.
    * Returns the operation's result.
    */
   async trace<T>(
@@ -78,22 +104,16 @@
   }
 
   /**
-   * Which storage tier served the primary data fetch for this request.
-   * This is the single summary value that feeds telemetry blob9 (cache_tier).
-   * "memory" = module-level cache hit (0ms, best case)
-   * "cache"  = Cache API edge hit (~1ms)
-   * "r2"     = R2 durable storage read (~40ms)
-   * "build"  = cold build from ZIP (seconds, worst case)
-   * "github" = GitHub network fetch (when no R2/cache layers exist)
-   * "none"   = no data fetch happened (e.g. version, time actions)
-   *
-   * The value reflects the primary fetch — for actions like search/orient
-   * that load the navigability index first, this is the index tier. For
-   * oddkit_get with a klappy:// URI (the fast path, no index needed), this
-   * is the file fetch tier. Either way: where did the work come from?
+   * Cache-hit arithmetic over the recorded fetches. `total` is the number of
+   * fetches; `hits` is the count where `cached === true`. This is what
+   * telemetry uses to populate `cache_hits` and `cache_lookups` doubles.
+   * Replaces the retired `indexSource` interpretation: no winner is chosen,
+   * the dashboard computes hit-rate as `SUM(cache_hits) / SUM(cache_lookups)`.
    */
-  get indexSource(): string {
-    return this._indexSource ?? "none";
+  get cacheStats(): CacheStats {
+    let hits = 0;
+    for (const f of this.fetches) if (f.cached) hits++;
+    return { hits, misses: this.fetches.length - hits, total: this.fetches.length };
   }
 
   /** Total elapsed time since tracer creation. */
@@ -109,27 +129,36 @@
       if (s.detail) val += `[${s.detail}]`;
       return val;
     });
+    for (const f of this.fetches) {
+      const tag = f.cached ? "hit" : "miss";
+      parts.push(`fetch=${f.duration_ms}ms(${tag})[${f.url}]`);
+    }
     parts.push(`total=${this.elapsed_ms}ms`);
     return parts.join(", ");
   }
 
   /** Structured JSON for debug envelope inclusion. */
-  toJSON(): { spans: TraceSpan[]; total_ms: number; index_source: string } {
-    return { spans: [...this.spans], total_ms: this.elapsed_ms, index_source: this.indexSource };
+  toJSON(): {
+    spans: TraceSpan[];
+    fetches: FetchRecord[];
+    cacheStats: CacheStats;
+    total_ms: number;
+  } {
+    return {
+      spans: [...this.spans],
+      fetches: [...this.fetches],
+      cacheStats: this.cacheStats,
+      total_ms: this.elapsed_ms,
+    };
   }
 
-  /** Number of recorded spans. */
+  /** Number of recorded spans (non-fetch). */
   get spanCount(): number {
     return this.spans.length;
   }
-}
 
-/** Shorten a cache/storage key for readable trace output. */
-export function shortKey(key: string): string {
-  const parts = key.split("/");
-  if (parts.length <= 2) return key;
-  const mid = parts.slice(1, -1).map((p) => (p.length > 10 ? p.slice(0, 10) + "…" : p)).join("/");
-  const last = parts[parts.length - 1]!;
-  const shortLast = last.length > 12 ? last.slice(0, 12) + "…" : last;
-  return `${parts[0]}/${mid}/${shortLast}`;
+  /** Number of recorded fetches. */
+  get fetchCount(): number {
+    return this.fetches.length;
+  }
 }

diff --git a/workers/src/zip-baseline-fetcher.ts b/workers/src/zip-baseline-fetcher.ts
--- a/workers/src/zip-baseline-fetcher.ts
+++ b/workers/src/zip-baseline-fetcher.ts
@@ -21,7 +21,7 @@
  */
 
 import { unzipSync } from "fflate";
-import { RequestTracer, shortKey } from "./tracing";
+import { RequestTracer } from "./tracing";
 
 // Index schema version — included in cache key so that code changes
 // to the indexing pipeline (filters, fields, scoring) invalidate stale indexes.
@@ -513,12 +513,24 @@
       const hit = await c.match(this.cacheRequest(key));
       if (hit) {
         const text = await hit.text();
-        this.tracer?.addSpan(`cache:${shortKey(key)}`, performance.now() - start, "cache");
+        this.tracer?.recordFetch({
+          url: `cf-cache://${key}`,
+          duration_ms: performance.now() - start,
+          cached: true,
+        });
         return JSON.parse(text) as T;
       }
-      this.tracer?.addSpan(`cache:${shortKey(key)}`, performance.now() - start, "miss");
+      this.tracer?.recordFetch({
+        url: `cf-cache://${key}`,
+        duration_ms: performance.now() - start,
+        cached: false,
+      });
     } catch {
-      this.tracer?.addSpan(`cache:${shortKey(key)}`, performance.now() - start, "miss", "error");
+      this.tracer?.recordFetch({
+        url: `cf-cache://${key}`,
+        duration_ms: performance.now() - start,
+        cached: false,
+      });
     }
     return null;
   }
@@ -670,7 +682,7 @@
 
     // Check memory cache first (unless skipping cache)
     if (!skipCache && this.zipCache.has(cacheKey)) {
-      this.tracer?.addSpan("zip", 0, "memory");
+      this.tracer?.recordFetch({ url: `memory://${cacheKey}`, duration_ms: 0, cached: true });
       return this.zipCache.get(cacheKey)!;
     }
 
@@ -680,11 +692,20 @@
       const r2Object = await this.env.BASELINE.get(cacheKey);
       if (r2Object) {
         const data = new Uint8Array(await r2Object.arrayBuffer());
-        this.tracer?.addSpan("zip", performance.now() - r2Start, "r2");
+        this.tracer?.recordFetch({
+          url: `r2://${cacheKey}`,
+          duration_ms: performance.now() - r2Start,
+          cached: true,
+          size: data.length,
+        });
         this.zipCache.set(cacheKey, data);
         return data;
       }
-      this.tracer?.addSpan("zip-r2", performance.now() - r2Start, "miss");
+      this.tracer?.recordFetch({
+        url: `r2://${cacheKey}`,
+        duration_ms: performance.now() - r2Start,
+        cached: false,
+      });
     }
 
     // Fetch from GitHub
@@ -696,13 +717,23 @@
 
       if (!response.ok) {
         console.error(`Failed to fetch ZIP: ${response.status} ${url}`);
-        this.tracer?.addSpan("zip", performance.now() - ghStart, "github", `${response.status}`);
+        this.tracer?.recordFetch({
+          url,
+          duration_ms: performance.now() - ghStart,
+          cached: false,
+          status: response.status,
+        });
         return null;
       }
 
       const data = new Uint8Array(await response.arrayBuffer());
-      this.tracer?.addSpan("zip", performance.now() - ghStart, "github",
-        `${(data.length / 1024).toFixed(0)}KB`);
+      this.tracer?.recordFetch({
+        url,
+        duration_ms: performance.now() - ghStart,
+        cached: false,
+        status: response.status,
+        size: data.length,
+      });
 
       // Cache in R2 (no TTL — content-addressed by SHA at the index/file layer)
       if (this.env.BASELINE) {
@@ -856,7 +887,7 @@
     // Step 0: Module-level memory cache (0ms, 5-min TTL)
     const expectedKey = `v${INDEX_VERSION}/${getCacheKey(knowledgeBaseUrl || "default")}`;
     if (cachedIndex && cachedIndexKey === expectedKey && Date.now() - indexCachedAt < MODULE_CACHE_TTL_MS) {
-      this.tracer?.addSpan("index", 0, "memory");
+      this.tracer?.recordFetch({ url: `memory://index/${expectedKey}`, duration_ms: 0, cached: true });
       return cachedIndex;
     }
 
@@ -869,8 +900,8 @@
     const shaKey = `${baselineSha || "unknown"}_${canonSha || "none"}`;
     const cacheKey = `index/v${INDEX_VERSION}/${getCacheKey(knowledgeBaseUrl || "default")}_${shaKey}`;
 
-    // Step 2: Cache API (~1ms edge read)
-    const cacheStart = performance.now();
+    // Step 2: Cache API (~1ms edge read) — cacheGet records the cf-cache:// fetch.
+    // We do NOT add a duplicate "index" span here; one I/O = one fetch record.
     const cacheHit = await this.cacheGet<BaselineIndex>(cacheKey);
     if (cacheHit) {
       // Verify SHAs match (Cache API doesn't have KV's eventual consistency issue,
@@ -878,7 +909,6 @@
       const baselineShaMatch = !baselineSha || cacheHit.commit_sha === baselineSha;
       const canonShaMatch = !canonSha || cacheHit.canon_commit_sha === canonSha;
       if (baselineShaMatch && canonShaMatch) {
-        this.tracer?.addSpan("index", performance.now() - cacheStart, "cache");
         // Populate module cache
         cachedIndex = cacheHit;
         cachedIndexKey = expectedKey;
@@ -895,7 +925,12 @@
         try {
           const text = await r2Object.text();
           const r2Index = JSON.parse(text) as BaselineIndex;
-          this.tracer?.addSpan("index", performance.now() - r2Start, "r2");
+          this.tracer?.recordFetch({
+            url: `r2://${cacheKey}`,
+            duration_ms: performance.now() - r2Start,
+            cached: true,
+            size: text.length,
+          });
 
           // Populate upstream caches
           cachedIndex = r2Index;
@@ -905,10 +940,18 @@
 
           return r2Index;
         } catch {
-          this.tracer?.addSpan("index", performance.now() - r2Start, "r2", "parse-error");
+          this.tracer?.recordFetch({
+            url: `r2://${cacheKey}`,
+            duration_ms: performance.now() - r2Start,
+            cached: false,
+          });
         }
       } else {
-        this.tracer?.addSpan("index-r2", performance.now() - r2Start, "miss");
+        this.tracer?.recordFetch({
+          url: `r2://${cacheKey}`,
+          duration_ms: performance.now() - r2Start,
+          cached: false,
+        });
       }
     }
 
@@ -948,8 +991,11 @@
       canon_commit_sha: canonSha || undefined,
     };
 
-    this.tracer?.addSpan("index-build", performance.now() - buildStart, "build",
-      `${allEntries.length} entries`);
+    this.tracer?.recordFetch({
+      url: `build://index/${expectedKey}`,
+      duration_ms: performance.now() - buildStart,
+      cached: false,
+    });
 
     // Persist to R2 + Cache API (no KV)
     if (this.env.BASELINE) {
@@ -1028,7 +1074,7 @@
       // Tier 0: Module-level file cache (0ms)
       const moduleCached = fileCache.get(cacheKey);
       if (moduleCached && Date.now() - moduleCached.cachedAt < MODULE_CACHE_TTL_MS) {
-        this.tracer?.addSpan(`file:${path}`, 0, "memory");
+        this.tracer?.recordFetch({ url: `memory://${path}`, duration_ms: 0, cached: true });
         return moduleCached.content;
       }
 
@@ -1038,7 +1084,12 @@
         const r2Object = await this.env.BASELINE.get(cacheKey);
         if (r2Object) {
           const content = await r2Object.text();
-          this.tracer?.addSpan(`file:${path}`, performance.now() - r2Start, "r2");
+          this.tracer?.recordFetch({
+            url: `r2://${path}`,
+            duration_ms: performance.now() - r2Start,
+            cached: true,
+            size: content.length,
+          });
 
           // Populate module cache
           if (fileCache.size >= MAX_FILE_CACHE_ENTRIES) evictExpiredFileCache();
@@ -1047,7 +1098,11 @@
           }
           return content;
         }
-        this.tracer?.addSpan(`file-r2:${path}`, performance.now() - r2Start, "miss");
+        this.tracer?.recordFetch({
+          url: `r2://${path}`,
+          duration_ms: performance.now() - r2Start,
+          cached: false,
+        });
       }
 
       // Tier 2: Extract from ZIP
@@ -1067,7 +1122,12 @@
 
           if (repoPath === path) {
             const content = new TextDecoder().decode(fileData);
-            this.tracer?.addSpan(`file:${path}`, performance.now() - zipStart, "build", "zip-extract");
+            this.tracer?.recordFetch({
+              url: `build://${path}`,
+              duration_ms: performance.now() - zipStart,
+              cached: false,
+              size: content.length,
+            });
 
             // Persist to R2 (no TTL — content-addressed)
             if (this.env.BASELINE) {

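Before the test-file diff: a minimal sketch of the `recordFetch` / `cacheStats` pattern the hunks above converge on. The record fields match the locked design (`url`, `duration_ms`, `cached`, `size?`, `status?`) and the tier-prefix URL convention; the class internals are illustrative, not the shipped `tracing.ts`.

```typescript
// Sketch only — mirrors the PR's design, not the real implementation.
interface FetchRecord {
  url: string;        // tier carried as URL prefix: memory:// cf-cache:// r2:// build:// or a real GitHub URL
  duration_ms: number;
  cached: boolean;    // the primary fact; the URL is the breadcrumb
  size?: number;
  status?: number;
}

class RequestTracerSketch {
  private fetches: FetchRecord[] = [];

  recordFetch(record: FetchRecord): void {
    this.fetches.push(record);
  }

  // Derived arithmetic only — no interpreter, no single "winner" tier.
  get cacheStats(): { hits: number; misses: number; total: number } {
    const hits = this.fetches.filter((f) => f.cached).length;
    return { hits, misses: this.fetches.length - hits, total: this.fetches.length };
  }
}

const tracer = new RequestTracerSketch();
tracer.recordFetch({ url: "memory://en/index.md", duration_ms: 0, cached: true });
tracer.recordFetch({ url: "r2://en/index.md", duration_ms: 14, cached: false });
console.log(tracer.cacheStats); // { hits: 1, misses: 1, total: 2 }
```

Every cache layer reports what it did; aggregation happens at read time, which is why a multi-fetch request can never be mislabeled with one tier.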
diff --git a/workers/test/telemetry-integration.test.mjs b/workers/test/telemetry-integration.test.mjs
--- a/workers/test/telemetry-integration.test.mjs
+++ b/workers/test/telemetry-integration.test.mjs
@@ -179,14 +179,14 @@
   });
 
   const shape = await measurePayloadShape(requestBody, responseBody);
-  recordTelemetry(mockRequest(), requestBody, env, 42, "memory", shape);
+  recordTelemetry(mockRequest(), requestBody, env, 42, { hits: 1, total: 1 }, shape);
 
   assert.equal(env.ODDKIT_TELEMETRY.writes.length, 1, "should write 1 data point");
   const point = env.ODDKIT_TELEMETRY.writes[0];
 
-  // Schema shape
-  assert.equal(point.blobs.length, 9, `blobs should be 9, got ${point.blobs.length}`);
-  assert.equal(point.doubles.length, 6, `doubles should be 6, got ${point.doubles.length}`);
+  // Schema shape — blob9 retired, doubles 7 and 8 added
+  assert.equal(point.blobs.length, 8, `blobs should be 8 (blob9 retired), got ${point.blobs.length}`);
+  assert.equal(point.doubles.length, 8, `doubles should be 8, got ${point.doubles.length}`);
   assert.equal(point.indexes.length, 1, "indexes should be 1");
 
   // Blobs
@@ -196,7 +196,6 @@
   assert.equal(point.blobs[3], "integration-test", "blob4 = consumer_label");
   assert.equal(point.blobs[4], "query-param", "blob5 = consumer_source");
   assert.equal(point.blobs[7], "0.23.1-test", "blob8 = worker_version");
-  assert.equal(point.blobs[8], "memory", "blob9 = cache_tier");
 
   // Doubles
   assert.equal(point.doubles[0], 1, "double1 = count");
@@ -205,6 +204,8 @@
   assert.equal(point.doubles[3], shape.bytes_out, "double4 = bytes_out");
   assert.equal(point.doubles[4], shape.tokens_in, "double5 = tokens_in");
   assert.equal(point.doubles[5], shape.tokens_out, "double6 = tokens_out");
+  assert.equal(point.doubles[6], 1, "double7 = cache_hits");
+  assert.equal(point.doubles[7], 1, "double8 = cache_lookups");
 
   console.log(`     bytes_in=${shape.bytes_in} bytes_out=${shape.bytes_out} ` +
               `tokens_in=${shape.tokens_in} tokens_out=${shape.tokens_out}`);
@@ -231,7 +232,7 @@
   });
 
   const shape = await measurePayloadShape(requestBody, responseBody);
-  recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, "r2", shape);
+  recordTelemetry(mockRequest("realistic-test"), requestBody, env, 215, { hits: 0, total: 1 }, shape);
 
   const point = env.ODDKIT_TELEMETRY.writes[0];
   assert.equal(point.blobs[2], "oddkit", "tool_name = oddkit (router)");
@@ -256,7 +257,7 @@
   });
   // Simulating the call site path where Content-Type was not application/json
   const shape = await measurePayloadShape(requestBody, "");
-  recordTelemetry(mockRequest(), requestBody, env, 50, "memory", shape);
+  recordTelemetry(mockRequest(), requestBody, env, 50, { hits: 1, total: 1 }, shape);
 
   const point = env.ODDKIT_TELEMETRY.writes[0];
   assert.equal(point.doubles[3], 0, "bytes_out should be 0 for empty response");
@@ -277,7 +278,7 @@
   const responseBody = JSON.stringify(batch.map(m => ({ jsonrpc: "2.0", id: m.id, result: { ok: true } })));
 
   const shape = await measurePayloadShape(requestBody, responseBody);
-  recordTelemetry(mockRequest(), requestBody, env, 30, "cache", shape);
+  recordTelemetry(mockRequest(), requestBody, env, 30, { hits: 1, total: 1 }, shape);
 
   assert.equal(env.ODDKIT_TELEMETRY.writes.length, 3, `should write 3 data points, got ${env.ODDKIT_TELEMETRY.writes.length}`);
   assert.equal(env.ODDKIT_TELEMETRY.writes[0].blobs[2], "oddkit_time");
@@ -304,10 +305,12 @@
 // Build a test schema map (mirrors the production baseline)
 const TEST_BLOB_NAMES = [
   "event_type", "method", "tool_name", "consumer_label", "consumer_source",
-  "knowledge_base_url", "document_uri", "worker_version", "cache_tier",
+  "knowledge_base_url", "document_uri", "worker_version",
+  // blob9 (cache_tier) retired in retire-indexsource-interpreter
 ];
 const TEST_DOUBLE_NAMES = [
   "count", "duration_ms", "bytes_in", "bytes_out", "tokens_in", "tokens_out",
+  "cache_hits", "cache_lookups",
 ];
 const testMap = buildSchemaMapFromArrays(TEST_BLOB_NAMES, TEST_DOUBLE_NAMES);
 
@@ -341,18 +344,18 @@
 });
 
 await test("rewriteSqlToRaw: translates all blob semantic names", async () => {
-  const sql = "SELECT event_type, method, tool_name, consumer_label, consumer_source, knowledge_base_url, document_uri, worker_version, cache_tier FROM oddkit_telemetry";
+  const sql = "SELECT event_type, method, tool_name, consumer_label, consumer_source, knowledge_base_url, document_uri, worker_version FROM oddkit_telemetry";
   const rewritten = rewriteSqlToRaw(sql, testMap);
   assert.ok(rewritten.includes("blob1"), "event_type → blob1");
   assert.ok(rewritten.includes("blob2"), "method → blob2");
   assert.ok(rewritten.includes("blob3"), "tool_name → blob3");
   assert.ok(rewritten.includes("blob6"), "knowledge_base_url → blob6");
-  assert.ok(rewritten.includes("blob9"), "cache_tier → blob9");
+  assert.ok(rewritten.includes("blob8"), "worker_version → blob8");
   assert.ok(!rewritten.includes("event_type"), "event_type should be gone");
 });
 
 await test("rewriteSqlToRaw: translates all double semantic names", async () => {
-  const sql = "SELECT SUM(count) AS n, AVG(duration_ms), SUM(bytes_in), SUM(bytes_out), AVG(tokens_in), AVG(tokens_out) FROM oddkit_telemetry";
+  const sql = "SELECT SUM(count) AS n, AVG(duration_ms), SUM(bytes_in), SUM(bytes_out), AVG(tokens_in), AVG(tokens_out), SUM(cache_hits), SUM(cache_lookups) FROM oddkit_telemetry";
   const rewritten = rewriteSqlToRaw(sql, testMap);
   assert.ok(rewritten.includes("double1"), "count → double1");
   assert.ok(rewritten.includes("double2"), "duration_ms → double2");
@@ -360,8 +363,12 @@
   assert.ok(rewritten.includes("double4"), "bytes_out → double4");
   assert.ok(rewritten.includes("double5"), "tokens_in → double5");
   assert.ok(rewritten.includes("double6"), "tokens_out → double6");
+  assert.ok(rewritten.includes("double7"), "cache_hits → double7");
+  assert.ok(rewritten.includes("double8"), "cache_lookups → double8");
   assert.ok(!rewritten.includes("duration_ms"), "duration_ms should be gone");
   assert.ok(!rewritten.includes("tokens_out"), "tokens_out should be gone");
+  assert.ok(!rewritten.includes("cache_hits"), "cache_hits should be gone");
+  assert.ok(!rewritten.includes("cache_lookups"), "cache_lookups should be gone");
 });
 
 await test("rewriteSqlToRaw: knowledge_base_url doesn't clobber shorter substrings", async () => {
@@ -525,7 +532,7 @@
   const shape = await measurePayloadShape(requestBody, "ok");
 
   // Should not throw
-  recordTelemetry(mockRequest(), requestBody, env, 10, "none", shape);
+  recordTelemetry(mockRequest(), requestBody, env, 10, { hits: 0, total: 0 }, shape);
   assert.equal(env.ODDKIT_TELEMETRY.writes.length, 0, "should not write anything for malformed input");
 });
 
@@ -536,65 +543,63 @@
   const requestBody = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" });
   const shape = await measurePayloadShape(requestBody, "{}");
   // Should not throw
-  recordTelemetry(mockRequest(), requestBody, env, 5, "memory", shape);
+  recordTelemetry(mockRequest(), requestBody, env, 5, { hits: 1, total: 1 }, shape);
 });
 
-// ─── Test 7: Streaming-race regression — cacheTier must be read AFTER body ──
+// ─── Test 7: Streaming-race regression — cacheStats must be read AFTER body ──
 
-await test("cache_tier reads must happen after the streaming response body completes", async () => {
+await test("cacheStats reads must happen after the streaming response body completes", async () => {
   // The MCP handler from agents/mcp returns a streaming Response. `await
   // handler(...)` resolves with the Response object before the tool handler
-  // closure has finished populating the tracer. Reading `tracer.indexSource`
-  // immediately after the await yields "none" for every tool because the
-  // "index" / "index-build" span has not been recorded yet. The fix in
+  // closure has finished populating the tracer. Reading `tracer.cacheStats`
+  // immediately after the await yields {hits:0,total:0} for every tool
+  // because no fetch records have been written yet. The fix in
   // workers/src/index.ts moves the read inside the waitUntil callback,
   // after the response body has been consumed (which forces the streaming
   // tool handler to complete).
   //
-  // This test simulates the timing pattern. A tracer is created, then a
-  // deferred task adds the "index" span asynchronously (mimicking the tool
-  // handler running while the response body streams). We assert:
-  //   (a) the OLD pattern (read immediately) returns "none" — this is the
-  //       observable bug the fix exists to prevent
-  //   (b) the FIXED pattern (read after a microtask flush) returns the
-  //       actual span source — this proves the fix recovers the value
+  // The interpretation layer (`indexSource`) was retired in
+  // refactor/retire-indexsource-interpreter, but the streaming-race
+  // regression survives unchanged — semantics are identical, only the
+  // accessor changed (indexSource → cacheStats).
 
   const tracer = new RequestTracer();
 
-  // Schedule the "index" span for the next tick — this models a streaming
-  // tool handler that has not yet recorded its index access at the moment
-  // the outer handler's `await` resolves.
+  // Schedule a fetch record for the next tick — this models a streaming
+  // tool handler that has not yet recorded its storage access at the
+  // moment the outer handler's `await` resolves.
   const handlerDone = new Promise((resolve) => {
     setImmediate(() => {
-      tracer.addSpan("index", 12, "cache");
+      tracer.recordFetch({ url: "cf-cache://index/v2.4/baseline_abc", duration_ms: 12, cached: true });
       resolve();
     });
   });
 
-  // (a) OLD pattern: read tracer.indexSource synchronously, before the
-  // deferred span has been added. This reproduces the production bug.
-  const oldPatternRead = tracer.indexSource;
+  // (a) OLD pattern: read tracer.cacheStats synchronously, before the
+  // deferred fetch has been added. This reproduces the production bug.
+  const oldPatternRead = tracer.cacheStats;
   assert.equal(
-    oldPatternRead,
-    "none",
-    "OLD pattern (read immediately after await) returns 'none' — this is the streaming-race bug",
+    oldPatternRead.total,
+    0,
+    "OLD pattern (read immediately after await) sees zero fetches — the streaming-race bug",
   );
 
-  // Wait for the deferred span to land (modeling `await responseClone.text()`
+  // Wait for the deferred fetch to land (modeling `await responseClone.text()`
   // forcing the streaming tool handler to finish).
   await handlerDone;
 
-  // (b) FIXED pattern: read tracer.indexSource AFTER the deferred work has
-  // completed. The tracer now reflects the actual cache tier.
... diff truncated: showing 800 of 976 lines
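The truncated Test 7 above exercises the streaming race. A self-contained simulation of both read orderings, using a stand-in tracer and `setTimeout` in place of the real `RequestTracer` and streaming response body (names here are illustrative):

```typescript
// Stand-in tracer with the same cacheStats shape as the PR's design.
class TracerSketch {
  private fetches: { cached: boolean }[] = [];
  recordFetch(r: { cached: boolean }): void { this.fetches.push(r); }
  get cacheStats(): { hits: number; total: number } {
    const hits = this.fetches.filter((f) => f.cached).length;
    return { hits, total: this.fetches.length };
  }
}

async function demo(): Promise<[number, number]> {
  const tracer = new TracerSketch();
  // The streaming tool handler records its fetch on a later tick —
  // after the outer handler's `await` has already resolved.
  const bodyDone = new Promise<void>((resolve) =>
    setTimeout(() => { tracer.recordFetch({ cached: true }); resolve(); }, 0));

  const early = tracer.cacheStats.total; // OLD pattern: read immediately — sees 0
  await bodyDone;                        // models `await responseClone.text()`
  const late = tracer.cacheStats.total;  // FIXED pattern: read after body — sees 1
  return [early, late];
}

demo().then(([early, late]) => console.log(early, late)); // 0 1
```

Moving the read inside `ctx.waitUntil`, after the clone's body is consumed, is what preserves the #138 fix across this refactor: the semantics are identical, only the accessor changed.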


Reviewed by Cursor Bugbot for commit 31e386d.

@klappy klappy merged commit 1a1fd4e into main Apr 26, 2026
5 checks passed
@klappy klappy deleted the refactor/retire-indexsource-interpreter branch April 26, 2026 19:18
klappy added a commit that referenced this pull request Apr 26, 2026
Workers Builds reported success on 1a1fd4e (PR #141 merge) at 2026-04-26T19:18:49Z, but the deployed bundle is still serving pre-#141 tracing.ts (envelope still has `index_source` field that was deleted in the refactor; no `fetches[]` array). Suspect stale build cache or ref-tracking issue on the CF integration. This empty commit gives the build a fresh SHA to react to.

If this doesn't resolve within ~5 min, the next step is dashboard-side investigation of Workers Builds settings — not a code fix.

Co-authored-by: Klappy (via Claude) <claude@klappy.dev>
