
Merge pull request #134 from klappy/feat/telemetry-tokenization#136

Merged
klappy merged 16 commits into prod from main on Apr 26, 2026
Conversation

klappy (Owner) commented Apr 24, 2026

Note

Medium Risk
Adds new telemetry fields and rewrites the telemetry query path (semantic-to-raw SQL translation plus result rewriting), which could affect observability accuracy and query behavior if parsing/rewrite edge cases slip through.

Overview
Adds payload-shape telemetry to the Worker by measuring request/response bytes_* and tokens_* (via lazy-loaded gpt-tokenizer cl100k) and writing them as doubles 3-6 in oddkit_telemetry, with response cloning + measurement deferred in ctx.waitUntil (and no Content-Type filtering).

Refactors telemetry recording to accept the already-read request body and payload shape, fixes a streaming race by reading tracer.indexSource only after response body consumption, and expands cache-tier detection to treat file:* spans as primary fetches.

Upgrades telemetry_public querying to a semantic schema contract: loads a canon-derived mapping (baseline fallback), rejects raw blobN/doubleN references, rewrites consumer SQL to raw slots for Cloudflare, and rewrites result column names back to semantic. Adds new unit/integration tests for tokenization, telemetry write path, and SQL rewriting, and bumps root/worker versions to 0.24.0 (plus adds gpt-tokenizer dependency).

Reviewed by Cursor Bugbot for commit e2912f3.

Claude (drafting for klappy) and others added 13 commits April 23, 2026 19:01
…-tokenizer

Adds payload-shape instrumentation to MCP telemetry. New doubles 3-7
capture wire size and cl100k_base token counts for every request and
response, plus the wall-clock cost of tokenization itself.

Implementation:

- New module workers/src/tokenize.ts wraps gpt-tokenizer/encoding/cl100k_base
  with a lazy-loaded singleton encoder and a safe-failure surface
  (countTokensSafe, measurePayloadShape). Module-level promise caches the
  encoder across requests within a worker isolate; cold path pays parse
  once, all subsequent calls are warm.

- Refactors workers/src/telemetry.ts recordTelemetry signature to accept
  a pre-read body string + optional PayloadShape rather than reading the
  request body itself. Schema doc comment expanded to describe doubles
  3-7. Now synchronous (no longer returns a Promise) since the caller's
  measurement work happens in waitUntil.

- Updates workers/src/index.ts call site: clones the response (when
  Content-Type is application/json), reads request and response bodies in
  the waitUntil background task, calls measurePayloadShape, then
  recordTelemetry. Zero user-facing latency added — measurement happens
  after the response is sent. SSE responses skip body measurement.
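The lazy-singleton and safe-failure pattern described above can be sketched as follows. This is a simplified stand-in, not the shipped module: the real code imports gpt-tokenizer/encoding/cl100k_base, while here a stubbed whitespace-splitting encoder keeps the sketch self-contained. The function names (countTokensSafe, measurePayloadShape) come from the commit; their internals are assumptions.

```typescript
// Stub encoder type; the real module wraps gpt-tokenizer's cl100k_base.
type Encoder = { encode: (text: string) => number[] };

let encoderPromise: Promise<Encoder> | null = null;

// Module-level promise caches the encoder across requests within an
// isolate: the first caller pays the (stubbed) load cost, later calls
// reuse the resolved promise.
function loadEncoder(): Promise<Encoder> {
  if (!encoderPromise) {
    encoderPromise = Promise.resolve({
      // Hypothetical stand-in: one "token" per whitespace-separated word.
      encode: (text) => (text.match(/\S+/g) ?? []).map((_, i) => i),
    });
  }
  return encoderPromise;
}

// Safe-failure surface: any load or encode failure yields null, never a throw.
async function countTokensSafe(text: string): Promise<number | null> {
  try {
    const enc = await loadEncoder();
    return enc.encode(text).length;
  } catch {
    return null;
  }
}

interface PayloadShape {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

// null token counts are treated as 0 so telemetry never breaks a request.
async function measurePayloadShape(
  requestText: string,
  responseText: string,
): Promise<PayloadShape> {
  const utf8 = (s: string) => new TextEncoder().encode(s).length;
  return {
    bytes_in: utf8(requestText),
    bytes_out: utf8(responseText),
    tokens_in: (await countTokensSafe(requestText)) ?? 0,
    tokens_out: (await countTokensSafe(responseText)) ?? 0,
  };
}
```

Note the bytes fields use UTF-8 encoded length, not string length, which is what the UTF-8 correctness test below verifies.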

Tokenizer choice:

- gpt-tokenizer/encoding/cl100k_base over @anthropic-ai/tokenizer.
  Empirical bench (Node v22, same V8 as Workers): cl100k median 0.05-1.3ms
  across 200B-50KB payloads vs 0.30-7.4ms for Anthropic WASM. p95 dramatically
  better (no WASM memory-grow spikes).
- Token count diverges ~3-4% from Claude tokenizer on English prose;
  acceptable noise floor for shape analysis (we are not billing).
- Bundle delta measured empirically via esbuild: 432KB gzipped
  (993KB minified). Comfortably within paid-tier Workers limits.

Failure handling:

- Any tokenizer load or encode failure → countTokensSafe returns null,
  treated as 0 in telemetry. tokenize_ms = 0 alongside non-zero bytes
  signals a measurement skip in the data.
- Telemetry must never break MCP requests — all measurement code wrapped
  in try/catch within the waitUntil block.

Tests:

- New workers/test/tokenize.test.mjs (8 cases, all pass): empty input,
  positive integer output, scaling with length, full PayloadShape contract,
  UTF-8 byte length correctness, JSON-RPC payload tokenization, tokenize_ms
  finiteness, empty-response (SSE) skip path.
- Compiles tokenize.ts via tsc into a temp dir, then dynamic-imports;
  exercises the same TypeScript surface that ships in the worker bundle.
- npm run typecheck clean.

Methodology note:

- This change exists because three theoretical objections (bundle bloat,
  vodka violation, tokenizer-choice domain opinion) were falsified by a
  five-minute bench. See klappy://canon/constraints/measure-before-you-object
  and klappy://canon/observations/performed-prudence-anti-pattern (drafts
  pending merge into klappy.dev).
Mocks env.ODDKIT_TELEMETRY with a writeDataPoint capture, then exercises
recordTelemetry + measurePayloadShape with realistic JSON-RPC payloads.

Verifies end-to-end that the full PayloadShape lands in doubles 3-7,
that bytes match TextEncoder UTF-8 length, that batch JSON-RPC produces
one point per message, and that malformed input is silently dropped.
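The mock described above can be sketched like this. The DataPoint shape mirrors Analytics Engine's writeDataPoint input; recordShape is a hypothetical stand-in for recordTelemetry (whose exact signature this sketch does not reproduce), writing the payload shape into doubles 3-6.

```typescript
// Shape of an Analytics Engine data point (blobs/doubles/indexes).
interface DataPoint {
  blobs?: string[];
  doubles?: number[];
  indexes?: string[];
}

// Fake binding that captures every writeDataPoint call for assertions.
function makeTelemetryMock() {
  const points: DataPoint[] = [];
  return {
    binding: { writeDataPoint: (p: DataPoint) => points.push(p) },
    points,
  };
}

// Hypothetical recorder: one point per call, doubles laid out as
// [count, duration_ms, bytes_in, bytes_out, tokens_in, tokens_out].
function recordShape(
  env: { ODDKIT_TELEMETRY: { writeDataPoint: (p: DataPoint) => void } },
  durationMs: number,
  shape: { bytes_in: number; bytes_out: number; tokens_in: number; tokens_out: number },
): void {
  env.ODDKIT_TELEMETRY.writeDataPoint({
    doubles: [
      1,
      durationMs,
      shape.bytes_in,
      shape.bytes_out,
      shape.tokens_in,
      shape.tokens_out,
    ],
  });
}
```

A test then drives recordShape with a realistic payload shape and asserts on the captured points array directly.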

7/7 cases pass. Notable: the realistic ~8KB response measured
tokenize_ms=0.948ms — within 14% of the bench prediction (~1.1ms median
for 8KB on Node). The dream-home walkthrough was accurate; real prod
will differ but the order of magnitude is locked.

Compiles tokenize.ts + telemetry.ts via tsc into a temp dir, post-patches
the JSON import to add Node 22's required attribute syntax, then
dynamic-imports. Same code path that ships in the worker bundle.

This is the verification that wrangler dev would have done if workerd
ran in this nested sandbox (it doesn't — workerd dies after declaring
ready, likely a Linux capability issue with the container).
Two assertions that would have failed against the pre-fix code:

1. SSE response now asserts tokenize_ms=0 (was: only checked
   bytes_out/tokens_out, missed the spurious non-zero tokenize_ms that
   the original logic would record on every SSE response).

2. New test 'Bugbot invariant: tokenize_ms is 0 only when encoder did
   not actually run' explicitly covers the both-empty case (must be 0)
   and the request-only case (must be valid finite number).

Both new assertions verify Bugbot's distinction: a 0 from countTokensSafe
on empty input is a trivial short-circuit, not a real tokenization. Only
non-null results on non-empty input prove the encoder ran. The pre-fix
code conflated these and would have polluted the bench-vs-prod A/B
comparison with spurious tokenize_ms readings on SSE traffic.

Real-world tokenize_ms on the realistic 8KB integration test:
1.016ms (bench predicted 1.1ms — within 8%).

8/8 cases passing.
… JSON

CRITICAL FIX. A managed-agent smoke test against the preview deployment
caught that doubles 4 (bytes_out), 6 (tokens_out), and 7 (tokenize_ms)
were all zero across every recorded data point. Six telemetry rows
queried, six rows with bytes_out=0.

Root cause: the call site in workers/src/index.ts filtered the response
clone by Content-Type, only cloning when the type included
'application/json'. MCP's Streamable HTTP transport returns
'text/event-stream' (SSE) for tool calls, not JSON. The filter was
silently dropping almost every response, leaving responseClone null and
recording zeros for the entire response side.

This was the same performed-prudence pattern the new canon docs warn
about, applied in micro: I assumed MCP responses would be JSON without
measuring what the SDK actually returns. The smoke test caught it
because canon also prescribes verification before declaring done.

Fix:

1. New helper measureResponseShape(requestText, response) in tokenize.ts.
   Clones the response, reads the body, runs measurePayloadShape. No
   Content-Type filter — read everything. SSE protocol overhead (~10
   bytes per event) is negligible against the actual payload size, and
   oddkit's responses are bounded (no long-lived streams).

2. Call site in index.ts simplified to use the helper. Drops the
   filter, drops the separate clone, drops the responseClone variable.
   Cleaner code AND correct behavior.

3. Four new unit tests for measureResponseShape:
   - measures application/json responses
   - measures text/event-stream responses (this would have caught the
     bug pre-merge)
   - leaves the original response body intact (clone correctness)
   - handles already-consumed body without throwing
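A minimal sketch of the helper's contract, assuming the name from the commit: clone first, read the clone regardless of Content-Type, and never throw. The byte-only measurement here stands in for the full measurePayloadShape call.

```typescript
interface Shape {
  bytes_in: number;
  bytes_out: number;
}

async function measureResponseShape(
  requestText: string,
  response: Response,
): Promise<Shape | null> {
  try {
    // Clone before the body is consumed; reading the clone leaves the
    // original response body intact for the client. No Content-Type
    // filter — SSE and JSON are measured alike.
    const clone = response.clone();
    const responseText = await clone.text();
    const utf8 = (s: string) => new TextEncoder().encode(s).length;
    return { bytes_in: utf8(requestText), bytes_out: utf8(responseText) };
  } catch {
    // Already-consumed or unclonable body: skip measurement, never throw.
    return null;
  }
}
```

Reading the clone rather than the original is what makes "leaves the original response body intact" hold: clone() tees the body stream.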

12/12 unit tests pass, typecheck clean.

Methodology note: this fix exists because the smoke test (live MCP
calls + telemetry_public SQL) caught what unit tests missed. The
canon-prescribed verification gate worked exactly as designed —
release-validation-gate (E0008.3) at klappy://canon/constraints/release-validation-gate
mandates independent live smoke for load-bearing surface changes
before merge. The agent dispatch is that smoke.
…Workers

Third smoke confirmed bytes_in/out and tokens_in/out now populate
correctly (357-21319 bytes_out, 142-5398 tokens_out across varied
payloads). But double7 (tokenize_ms) is still 0 across every row.

Root cause: Cloudflare Workers' performance.now() is a deterministic
timer — it does NOT advance during synchronous CPU work. The mitigation
prevents timing-side-channel attacks. The timer only ticks on I/O.

Tokenization (countTokensSafe) is pure CPU work. The encoder runs
between two reads of performance.now() with no I/O in between, so both
reads return the same value and tokenize_ms is always 0. Tests passed
in Node because Node's performance.now() is a real high-resolution
timer.

Fix: switch to Date.now(). Always advances, at 1ms resolution. The
bench-vs-prod comparison loses sub-millisecond precision (sub-ms
tokenizations round to 0) but gains a working signal for any payload
above ~5KB where bench timing exceeded 1ms.

Updated the telemetry.ts schema doc comment to document the 1ms
resolution and the Workers-specific reason.

Methodology: this is the third Cloudflare Workers gotcha caught in
prod that unit tests can't catch — Workers Runtime != Node:

  1. b94aaa6 (mine): assumed MCP responses are application/json (they're SSE)
  2. 1a555df (mine): assumed clone() inside waitUntil works (body already drained)
  3. THIS: assumed performance.now() advances in synchronous code (it doesn't)

Each was caught by the live Managed Agent smoke + telemetry_public SQL,
not by typecheck or unit tests. The release-validation-gate is the
only thing standing between this branch and a quietly broken prod
telemetry pipeline.

8 unit tests still pass. Typecheck clean.
Fourth smoke confirmed bytes_in/out and tokens_in/out work in production
(357-21319 bytes_out, 142-5398 tokens_out across varied payload sizes).
But tokenize_ms remained 0 across every row even with the Date.now()
fix from 279f761.

Root cause discovered by the agent: Cloudflare Workers freezes BOTH
performance.now() AND Date.now() during synchronous CPU work. Both
timers only advance on network I/O events as a side-channel mitigation
(documented at developers.cloudflare.com/workers/runtime-apis/web-standards/).
Tokenization is pure CPU work, so any sub-request timing of it always
reads 0 in production. This is a structural runtime constraint, not a
bug we can patch.

Workarounds considered and rejected:
- Force artificial I/O between reads (KV.list, fetch) — adds real
  latency to telemetry-only paths, grotesque
- Two writeDataPoint calls with start/end timestamps — over-engineered,
  doubles write count, complicates queries
- Keep the column as always-0 — actively misleading

Decision: drop tokenize_ms entirely from PayloadShape, the doubles
array, schema doc, and tests. The bench at workers/test/tokenize.test.mjs
already characterized the cost curve (cl100k handles 50 KB in ~1.3 ms
on Node v22). Bytes_out + tokens_out are sufficient signal — a future
maintainer can predict tokenize_ms from the bench curve given the
observed payload sizes.

Schema before:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
            tokens_out, tokenize_ms]  // 7 fields

Schema after:
  doubles: [count, duration_ms, bytes_in, bytes_out, tokens_in,
            tokens_out]  // 6 fields

Companion canon update at klappy/klappy.dev coming in next commit on
that branch — drops tokenize_ms row from the doubles table and removes
the tokenize_ms mention in 'What This Enables'.

Methodology: this is the fourth Workers Runtime != Node behavioral diff
caught by live smoke on this branch. Each was unmeasurable from unit
tests because Node behaves differently:
  1. b94aaa6 (mine, broken): Content-Type filter (MCP returns SSE)
  2. 1a555df (mine, broken): clone in waitUntil (body already drained)
  3. 279f761 (mine, broken): Date.now() in Workers (frozen too)
  4. THIS: drop the unmeasurable column entirely

The release-validation-gate canon doc is the only thing that surfaced
each of these — the live preview smoke + telemetry_public SQL caught
what no test setup I could ship would have caught. The Workers-runtime
gap was real and the gate worked.

Tests:
- 7/7 unit tests pass (workers/test/tokenize.test.mjs)
- 6/6 integration tests pass (workers/test/telemetry-integration.test.mjs)
- typecheck clean
Minor bump for payload-shape telemetry (PR #134).

Bumps:
  package.json              0.23.1 -> 0.24.0
  workers/package.json      0.23.1 -> 0.24.0
  package-lock.json         0.23.0 -> 0.24.0  (root drifted one release behind)
  workers/package-lock.json 0.23.1 -> 0.24.0

CHANGELOG.md gains the [0.24.0] entry above [0.23.1] documenting:
  - Added: bytes_in/out, tokens_in/out telemetry doubles + helpers
  - Changed: drop the Content-Type filter (MCP responses are SSE)
  - Removed: tokenize_ms — Workers freezes both perf.now and Date.now
  - Fixed: root package-lock.json version drift back-fill

The four Workers Runtime != Node behavioral diffs caught by the five
Managed Agent smoke sessions on this branch are listed in the Refs
trailer for forensic record.

Tests: 7/7 unit + 6/6 integration pass on bumped state. Typecheck clean
(reports as oddkit-mcp-worker@0.24.0).

Per workflow: dedicated chore/release-x.y.z PR. Branch is off
feat/telemetry-tokenization HEAD, so it carries the feature commits +
the bump together. After merge, feat/telemetry-tokenization can be
closed (its commits are already in main via this release branch).

cloudflare-workers-and-pages Bot commented Apr 24, 2026

Deploying with Cloudflare Workers

✅ Deployment successful!
Name: oddkit
Latest commit: e2912f3
Updated (UTC): Apr 26 2026, 03:04 AM

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unprotected response.clone() can crash MCP handler
    • Wrapped the synchronous response.clone() call in a try/catch so a clone failure falls back to skipping telemetry and lets return response execute, upholding the invariant that telemetry must never break MCP requests.
Preview (be41e0c37f)
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -970,29 +970,37 @@
         const cacheTier = tracer.indexSource;
         // Clone the response synchronously before returning so the body is
         // still available to read inside the deferred waitUntil callback.
-        const responseClone = response.clone();
+        // Guard the clone itself — telemetry must never break MCP requests.
+        let responseClone: Response | null = null;
+        try {
+          responseClone = response.clone();
+        } catch {
+          // If cloning fails, skip telemetry for this request.
+        }
 
-        ctx.waitUntil(
-          (async () => {
-            try {
-              const requestText = await telemetryClone.text();
+        if (responseClone) {
+          ctx.waitUntil(
+            (async () => {
+              try {
+                const requestText = await telemetryClone.text();
 
-              const { measurePayloadShape } = await import("./tokenize");
-              const { recordTelemetry } = await import("./telemetry");
+                const { measurePayloadShape } = await import("./tokenize");
+                const { recordTelemetry } = await import("./telemetry");
 
-              let responseText = "";
-              try {
-                responseText = await responseClone.text();
+                let responseText = "";
+                try {
+                  responseText = await responseClone!.text();
+                } catch {
+                  // Fall through with empty string; bytes_out / tokens_out will be 0.
+                }
+                const shape = await measurePayloadShape(requestText, responseText);
+                recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
               } catch {
-                // Fall through with empty string; bytes_out / tokens_out will be 0.
+                // Telemetry must never break MCP requests
               }
-              const shape = await measurePayloadShape(requestText, responseText);
-              recordTelemetry(request, requestText, env, durationMs, cacheTier, shape);
-            } catch {
-              // Telemetry must never break MCP requests
-            }
-          })(),
-        );
+            })(),
+          );
+        }
       }
 
       return response;


Comment thread workers/src/index.ts
klappy added 2 commits April 25, 2026 22:25
…ble* slots (#137)

Consumers of telemetry_public now write SQL using semantic names only.
Raw slot names (blob1-blob9, double1-double6) are hidden from the
consumer interface entirely.

- Raw slot names in queries are rejected with a helpful error naming
  the semantic equivalent
- Semantic names are transparently rewritten to raw slots before
  forwarding to Cloudflare Analytics Engine
- Result columns are rewritten back from raw slots to semantic names
  before returning
- Tool docstring shows semantic names only
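The reject/rewrite pipeline above can be sketched as follows, using an illustrative subset of the mapping (the real mapping is canon-derived with a hardcoded baseline fallback). The escapeRegExp helper and the negative lookahead mirror the Bugbot fixes described below; the string-literal-skip logic is omitted here for brevity.

```typescript
// Illustrative subset of the semantic -> raw slot mapping.
const MAPPING: Record<string, string> = {
  bytes_in: "double3",
  bytes_out: "double4",
  tokens_in: "double5",
  tokens_out: "double6",
  tool_name: "blob1",
};

const RAW_SLOT_PATTERN = /\b(blob[1-9]|double[1-9])\b/i;

// Reject raw slot names with an error naming the semantic equivalent.
function rejectRawSlots(sql: string): string | null {
  const m = sql.match(RAW_SLOT_PATTERN);
  if (!m) return null;
  const semantic = Object.keys(MAPPING).find(
    (k) => MAPPING[k] === m[1].toLowerCase(),
  );
  return semantic
    ? `raw slot '${m[1]}' is not queryable; use '${semantic}'`
    : `raw slot '${m[1]}' is not queryable`;
}

// Escape regex metacharacters so canon-derived names can't corrupt the pattern.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rewrite semantic names to raw slots before forwarding to Cloudflare.
// The negative lookahead skips function-call positions such as count(*).
function rewriteSqlToRaw(sql: string): string {
  let out = sql;
  for (const [semantic, raw] of Object.entries(MAPPING)) {
    const pattern = new RegExp(`\\b${escapeRegExp(semantic)}\\b(?!\\s*\\()`, "g");
    out = out.replace(pattern, raw);
  }
  return out;
}
```

Result columns flow the other way: the same mapping, inverted, renames raw slot columns in Analytics Engine results back to semantic names before they reach the consumer.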

The schema mapping is sourced from canon at runtime via
KnowledgeBaseFetcher.getFile(canon/constraints/telemetry-governance.md)
with a hardcoded baseline as safety net per vodka architecture.

Per maintainer directive: no deprecation, nobody uses the raw names yet.

Phase 1 audit findings (telemetry-governance compliance):
  bytes_in:    populated (sum=41,545 over 7 days)
  bytes_out:   populated (sum=1,316,480)
  tokens_in:   populated (sum=12,602)
  tokens_out:  populated (sum=325,627)
  cache_tier:  always "none" — surfaced as separate open issue

Bug fixes from Cursor Bugbot review:
  - count() SQL aggregate collision (negative lookahead guard)
  - String literal corruption in rewrite (literal-skip splitting)
  - Detection inconsistency vs rewrite (literal strip in detection)

Test coverage: 17/17 passing including 3 dedicated regression tests
for each Bugbot fix family.

Open issue surfaced for follow-up: cache_tier always reads "none"
  in production — recordTelemetry receives the parameter correctly;
  something upstream is always passing "none".
…nsumption (#138)

The MCP handler from agents/mcp returns a streaming Response.
`await handler(request, env, ctx)` resolves with the Response object
before the tool handler closure has finished populating the tracer.
Reading `tracer.indexSource` immediately after that await yields
"none" because the "index" / "index-build" span has not been recorded
yet. This caused EVERY tool call in production telemetry to record
cache_tier="none" across the entire E0008.1 lifetime of the field —
including oddkit_search and oddkit_orient which definitely call
getIndex().

Fix: move the cacheTier read INSIDE the ctx.waitUntil callback,
AFTER `await responseClone.text()` resolves. Reading the response
body to completion forces the streaming tool handler to have
finished, which means the tracer is fully populated.
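The race can be modeled in miniature, as the regression test described below does with setImmediate: a streaming handler that records its index access only after the outer await has already resolved with the Response object. Names here are illustrative, not the worker's actual code.

```typescript
// Tracer whose indexSource is populated by the streaming tool handler.
class Tracer {
  indexSource = "none";
}

// Models a streaming MCP tool call: the Response object exists (and the
// outer await resolves) before the handler closure sets the tracer.
function makeStreamingCall(tracer: Tracer): Response {
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      setImmediate(() => {
        tracer.indexSource = "cache"; // handler records its index access late
        controller.enqueue(new TextEncoder().encode("{}"));
        controller.close();
      });
    },
  });
  return new Response(stream);
}
```

Reading tracer.indexSource immediately after constructing the call reproduces the bug ("none"); reading it after draining the body to completion recovers the value, which is exactly the fix: move the read inside waitUntil, after the body text resolves.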

Diagnostic evidence (same session that found the bug):
  oddkit_search → response.debug.trace.index_source = "cache"
                  (correct — read inside handleUnifiedAction at
                   orchestrate.ts:2806, AFTER action ran)
                  telemetry blob9 = "none"
                  (incorrect — read at index.ts:981 too early)

Same tracer instance, two read sites, two different values.
Proves the race.

Regression test added: uses setImmediate to model a streaming tool
handler that hasn't yet recorded its index access at the moment the
outer handler's await resolves. Asserts both the OLD-pattern bug
reproduces and the FIXED-pattern recovers the value.

Test count: 17 → 18 passing.

Open issue (separate PR): getFile emits `file:${path}` spans, not
`index` / `index-build`. The tracer's _indexSource setter only
recognizes the latter two labels. Whether oddkit_get's blob9 should
track the index tier (current behavior post-fix), the file tier
(new span label needed), or both is a follow-up scope decision.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Canon-derived names used in regex without escaping
    • Added an escapeRegExp helper and used it on the semantic name when building the rewrite regex in rewriteSqlToRaw, so canon-derived names containing regex metacharacters can no longer corrupt the pattern.
Preview (838d97cd27)
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -456,6 +456,16 @@
 const RAW_SLOT_PATTERN = /\b(blob[1-9]|double[1-9])\b/gi;
 
 /**
+ * Escape regex metacharacters in a string so it can be safely interpolated
+ * into a `new RegExp(...)` pattern. Used for canon-derived semantic names,
+ * which are sourced from a maintainer-controlled markdown table but could
+ * in principle contain characters like `.` that have special regex meaning.
+ */
+function escapeRegExp(s: string): string {
+  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+}
+
+/**
  * Reject queries that contain raw slot names (blob1..9 / double1..6).
  * Returns an error message string, or null if the query is clean.
  *
@@ -522,7 +532,10 @@
     for (const [semantic, raw] of entries) {
       // \b word-boundary anchors prevent partial matches inside longer identifiers.
       // Negative lookahead (?!\s*\() skips function-call positions (e.g. count(*)).
-      const pattern = new RegExp(`\\b${semantic}\\b(?!\\s*\\()`, "g");
+      const pattern = new RegExp(
+        `\\b${escapeRegExp(semantic)}\\b(?!\\s*\\()`,
+        "g",
+      );
       out = out.replace(pattern, raw);
     }
     return out;


Reviewed by Cursor Bugbot for commit 838d97c.

Comment thread workers/src/telemetry.ts
// \b word-boundary anchors prevent partial matches inside longer identifiers.
// Negative lookahead (?!\s*\() skips function-call positions (e.g. count(*)).
const pattern = new RegExp(`\\b${semantic}\\b(?!\\s*\\()`, "g");
out = out.replace(pattern, raw);

Canon-derived names used in regex without escaping

Low Severity

In rewriteSqlToRaw, semantic names are interpolated directly into new RegExp() without escaping regex metacharacters. The baseline names are safe (only [a-z_]), but parseDoublesFromCanon accepts backtick-quoted identifiers containing arbitrary characters (e.g., ., (, +). If canon ever provides such a name, it would corrupt the regex and produce incorrect or broken SQL rewriting.


…th (#139)

Follow-up to #138. The streaming-race fix corrected the timing bug,
but oddkit_get for klappy:// URIs still recorded cache_tier="none"
because of a separate defect: the tracer's _indexSource setter only
recognized 'index' and 'index-build' labels.

runGet for klappy:// URIs takes a fast path that skips getIndex
entirely and calls getFile directly. The fetcher emits 'file:${path}'
spans which the setter ignored. ~95% of oddkit_get calls hit this
path, so even after #138, ~95% of get calls had cache_tier='none'.

Fix: extend the setter to also recognize 'file:*' labels alongside
'index' / 'index-build'. First-wins guard preserved:

  - runSearch: 'index' fires first → index tier wins
  - runGet (klappy://): only 'file:*' fires → file tier wins
  - runGet (kb://, odd://): 'index' first → index tier wins (~5%)
  - 'file-r2:*' (R2 miss with source='miss') excluded by guard

Internal field/getter names stay unchanged. Public envelope key
'index_source' in tracer.toJSON() is part of the response contract.
Doc-comment updated to reflect broader semantic.
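A sketch of the extended setter under the semantics listed above: the first qualifying span wins, 'file:*' labels now count alongside 'index'/'index-build', and 'file-r2:*' miss spans never match the 'file:' prefix. Internal names and the tier strings are illustrative.

```typescript
class Tracer {
  private _indexSource: string | null = null;

  // Called once per span; only the first qualifying label sets the tier.
  recordSpan(label: string): void {
    if (this._indexSource !== null) return; // first-wins guard
    if (label === "index" || label === "index-build") {
      this._indexSource = "index";
    } else if (label.startsWith("file:")) {
      // 'file-r2:*' R2-miss spans do not match the 'file:' prefix,
      // so they stay excluded without an explicit check.
      this._indexSource = "file";
    }
  }

  get indexSource(): string {
    return this._indexSource ?? "none";
  }
}
```

This reproduces the table above: runSearch emits 'index' first so the index tier wins even if file spans follow, while the klappy:// fast path emits only 'file:*' spans and now records the file tier instead of "none".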

4 regression tests added covering: file:* recognition, index-wins
preserved, miss-spans excluded, original behavior unchanged.

Test count: 18 → 22 passing.

This completes the cache_tier dashboard: every tool with a real data
fetch now records its actual tier.
@klappy klappy merged commit 96d0157 into prod Apr 26, 2026
5 checks passed