Feat/eng 2964#725
Conversation
…sk types Adds curate-tool-mode, query-tool-mode, dream-scan, dream-finalize to TaskTypes + TASK_TYPE_VALUES. After ENG-2925's rename the daemon dispatches these four types but the analytics enum still predates v4.0 — task_created / task_completed / task_failed silently rejected every tool-mode emit at the Zod boundary. - TaskTypes keeps the legacy 'curate' / 'query' / 'dream' values for back-compat with any constructor still building those payloads - per-event schemas (task_created / task_completed / task_failed) pick up the new types automatically via z.enum(TASK_TYPE_VALUES) - M12 per-flavor schemas (curate_run_completed / query_completed) still hardcode their own literals and continue to reject tool-mode types here — M14.2 migrates them to the canonical enum as a follow-up TDD: - new task-types tests assert TaskCreated/Completed/Failed accept all four new types - regression tests pin the M12 schemas' continued rejection so M14.2 has a clear flip point
…sk_type to TASK_TYPE_VALUES
Migrates the two M12 per-flavor schemas from hardcoded literal task_type
values to the canonical TASK_TYPE_VALUES enum so v4.0 tool-mode tasks
round-trip the wire boundary instead of being silently Zod-rejected
inside AnalyticsHook.
- curate_run_completed: z.enum(['curate', 'curate-folder']) →
z.enum(TASK_TYPE_VALUES). curate-tool-mode payloads now validate.
- query_completed: z.literal('query') → z.enum(TASK_TYPE_VALUES).
query-tool-mode payloads now validate.
The schemas no longer structurally constrain task_type to the curate
or query family; the hook is trusted to only emit each event for the
right flavor. Docblocks call out the widening for the next maintainer.
TDD:
- curate-run-completed.test asserts curate-tool-mode + legacy values
both succeed; an unknown task_type still rejects
- query-completed.test mirrors the same coverage for query-tool-mode
- task-types.test M14.1 regression assertions flipped from rejection
to acceptance for the M12 schemas; M14.1 docblock comments updated
Without this, M14.3's hook code would land but the M12 emits that fire
alongside the new generic task_* emits would silently disappear on
every tool-mode task — that's the bug operators noticed in Mixpanel.
…ed in AnalyticsHook Adds the three generic funnel-event emits described in M14's customer ask. Every daemon task (curate / curate-folder / curate-tool-mode (aliased from curate-html-direct) / query / query-tool-mode / dream-scan / dream-finalize / search) now produces, in order: task_created onTaskCreate (funnel entry — unconditional) ...optional per-op + M12 per-flavor terminal emit task_completed onTaskCompleted (terminal-event-last on success) task_failed onTaskError / onTaskCancelled (terminal-event-last) Three coupled changes shipped together so the wire stays consistent: 1. analytics-hook.ts grows three emits and a `toAnalyticsTaskType` alias-translator. The daemon still dispatches the pre-ENG-2925 name `curate-html-direct`; analytics canonicalises it to the post-rename `curate-tool-mode` so the wire enum matches TASK_TYPE_VALUES. Once ENG-2925 lands, the alias becomes a no-op identity. 2. CURATE_TASK_TYPES + QUERY_TASK_TYPES gain the tool-mode names so M12 per-flavor state init kicks in for tool-mode curates / queries. M12 counters stay all-zero on tool-mode today (no LLM tool calls fire) — that's a separate follow-up (FU-1 in plans/analytics-m14/follow-ups). 3. Both M12 payload builders route `task_type` through the same alias so curate-tool-mode tasks emit `task_type='curate-tool-mode'` on the curate_run_completed / query_completed events too. TDD: - new analytics-hook-m14.test.ts: 15 simulation tests covering every task type, the curate-html-direct → curate-tool-mode alias, has_files / has_folder semantics, both terminal paths, and the ordering invariant - existing analytics-hook.test.ts updated to filter out the new generic emits via `filterM12()` so M12-focused assertions keep their intent - integration stress tests updated for the new 13-event sequence (task_created → ops → curate_run_completed → task_completed) - repo full suite: 9121 passing, 0 failing
…ords/tags, structured related_paths
Tightens the curate / query M12 payloads per code review:
curate_operation_applied:
- rename absolute_path → relative_path (project-relative via path.relative
against the task's projectPath; falls back to identity when projectPath
is unset so search tasks at the daemon root still emit a usable string)
- keywords / tags promoted from optional to required arrays (default [])
so the wire shape stays uniform regardless of frontmatter read success
query_completed.read_paths_with_metadata:
- same absolute_path → relative_path rename
- same keywords / tags promotion (required arrays, default [])
- flat related: string[] → structured related_paths: [{relative_path,
keywords, tags}] so each linked topic carries its own metadata slot.
Keywords/tags default to [] until a future FU cascade-reads each
linked file's frontmatter — the wire shape doesn't need to change
when that lands.
Hook implementation:
- new `toRelativePath(filePath, projectPath)` helper using node:path.relative
- CurateTaskAnalyticsState stores projectPath captured at onTaskCreate so
per-op emits can relativize without threading task through processToolResult
- all four payload sites (curate-op, curate-run-completed, query-completed
read_paths_with_metadata) route through the helper
Inspection test added at test/unit/.../analytics-hook-toolmode-inspection
that pretty-prints every event + payload for curate-tool-mode / query
/ query-tool-mode flows — gives PMs a single place to verify the wire
shape end-to-end.
Privacy win: relative paths drop the /Users/{name} prefix from every
file-touched event, keeping host-identifiable PII off the analytics
wire while still letting PMs reason about which file inside a project
an operation touched.
Tests: full repo 9132 passing.
…re_kind on task_failed Delivers M15.6's customer-facing piece: the actual wiring of AnalyticsHook into the daemon's lifecycleHooks[] (which M14.3 built emit logic for but never plumbed) plus the coarse failure_kind classifier on task_failed. Per the M14 / M15.6 alignment (Option A): M14.1-M14.3 ship as the foundation. M15.6 adds the wiring + failure_kind. M15.6's stated "scaffolded only" stance on task_created + per-flavor schemas is intentionally relaxed — M14 already produces those events and the review-tightening on curate_operation_applied / read_paths_with_metadata delivers PM-visible value that the M15.6 description deferred. Changes: src/server/infra/daemon/brv-server.ts: - import AnalyticsHook - construct it BEFORE TransportHandlers so it can land in lifecycleHooks[] alongside curate-log / query-log / task-history (now 4 peers, not 3) - the isAnalyticsEnabled gate from setupFeatureHandlers binds in via a closure ref that defers lookup until emit-time - capture setupFeatureHandlers's return value (was discarded), then setAnalyticsClient(analyticsClient) on the pre-registered hook src/shared/analytics/events/task-failed.ts: - new FailureKindValues + FailureKind type: 'cancelled' | 'timeout' | 'agent_error' | 'unknown' (coarse vocab, ≤64 chars, never raw error.message) - failure_kind added to TaskFailedSchema as a required field - docblock notes the M15.6 privacy contract src/server/infra/process/analytics-hook.ts: - new classifyFailureKind(errorMessage) helper. Substring sentinels for 'timeout' / 'timed out' / 'deadline exceeded' → 'timeout'; 'agent' / 'llm' / 'provider' / 'tool' → 'agent_error'; default 'unknown'. Raw error string NEVER ends up on the wire. - emitTaskFailed now takes a FailureKind argument - onTaskCancelled → emitTaskFailed(..., 'cancelled') - onTaskError → emitTaskFailed(..., classifyFailureKind(errorMessage)) TDD: - task-failed.test: 3 new tests pin the failure_kind shape (accepts the 4 canonical values, rejects out-of-vocab, rejects missing) - analytics-hook-m14.test: 3 new tests verify the classifier paths land on the wire — 'kaboom' → 'unknown', timeout strings → 'timeout', agent strings → 'agent_error' - new analytics-hook-lifecycle-wiring.test integration covers the end-to-end task:create → analyticsHook.onTaskCreate path through a real TaskRouter, plus all four task-type variants going through task_created → task_completed in sequence Full suite: 9145 passing, 0 failing. Lint clean.
#1 classifyFailureKind — word-boundary regex + pinned precedence: Substring matching let 'tooltip' / 'engagement' / 'urgent' bucket into agent_error. Tightened to /\b(agent|llm|provider|tool)\b/ and /\b(timeout|timed out|deadline exceeded)\b/. Docblock pins precedence (timeout > agent_error > unknown) so future if-order shuffles can't silently rebucket the funnel. #2 toRelativePath outside-project guard (privacy bug): path.relative('/proj','/Users/dev/other/x.md') returned '../../Users/dev/other/x.md' — still encoded the host layout. Same hole when projectPath was undefined (fell back to raw absolute path). Now returns '<outside-project>/<basename>' in both cases; preserves enough signal for backend grouping without becoming PII. Same guard also runs against absolute-path tails (Windows drive-letter switches). #3 toAnalyticsTaskType drift guard: Replaced `daemonType as TaskType` (per CLAUDE.md anti-cast rule) with a TASK_TYPE_SET membership check + processLog warning + 'unknown' sentinel fallback. A future un-enumerated dispatch now lands a debuggable warning at the daemon instead of disappearing at the backend Zod check. #4 dumpEvents test pollution: Gated behind DUMP_ANALYTICS=1. `npm test` runs the shape assertions silently; opt-in dump still works via `DUMP_ANALYTICS=1 npx mocha test/unit/.../analytics-hook-toolmode-inspection.test.ts`. #5 FU-1 forward-compat comment: Added a block-comment note above the tool-mode "operations_*: 0" assertions so when FU-1 lands and the counters flip non-zero, the failure reads as "feature, update the test" rather than a regression. #6 curate_operation_applied.related asymmetry: Added a TODO(M15.x) marker on the `related` field flagging the wire-shape asymmetry with `query_completed.read_paths_with_metadata .related_paths` (structured). Restructure is consumer-migration territory — own ticket. #7 brv-server.ts loud-fail assertion: Added an explicit throw if setupFeatureHandlers returns without analyticsClient. Future refactors that drop the field will explode here instead of silently no-op'ing every emit forever. Tests added to lock in #1, #2, #3 behavior so the review intent doesn't drift. Existing stress-test fixtures moved from `/A/op-N.md` under-the-root paths to `/proj/A/op-N.md` so they exercise the in-project case (out-of-project is now its own focused test). Full suite: 9227 passing, 0 failing. Lint clean.
#722 re-review) Re-review caught a real bug in the previous fixup: `toAnalyticsTaskType` returned `'unknown' as TaskType` for un-enumerated daemon types, but `'unknown'` was NOT in `TASK_TYPE_VALUES`. The docblock promised "keeps the event on the wire instead of silently failing the Zod check at the backend" — but the wire-side `z.enum(TASK_TYPE_VALUES)` actually still rejected the row. The local Sinon trackStub doesn't run Zod, which is why the m14 test passed even though the runtime path was broken. Also the `as TaskType` cast moved sites but didn't disappear — the exact anti-pattern the original review flagged. Fix: - Add `TaskTypes.UNKNOWN: 'unknown'` to the canonical vocabulary. - Append `TaskTypes.UNKNOWN` to `TASK_TYPE_VALUES` so every task_* schema accepts the sentinel via `z.enum(TASK_TYPE_VALUES)`. - Introduce `isCanonicalTaskType()` predicate so `toAnalyticsTaskType` narrows without the `as TaskType` cast. Sentinel returned as `TaskTypes.UNKNOWN` directly. - Update `task-types.test` to assert the new `UNKNOWN` key/value pair. Existing negative test for an unknown task_type already uses `'not-a-real-type'` (not `'unknown'`), so it stays green. Cosmetic: dropped the extra-space drift in the OUTSIDE_PROJECT_PATH docblock noted in the same re-review. Tests: 9227 passing, 0 failing.
… A e2e E2E shake-down caught a regression my own M15.6 commit (6983e88) introduced: when I refactored `setupFeatureHandlers`'s return capture from destructured `{analyticsClient, analyticsFlushScheduler, isAnalyticsEnabled}` to a bundled `featureHandlers` object, I dropped the call to `analyticsFlushScheduler.start()` that lived right after the destructure (originally landed in M4.3 / ENG-2645) AND lost the `analyticsFinalFlush` shutdown hook bound to the ShutdownHandler. Symptom: every `track()` call still landed in JSONL with status='pending', but the periodic 30s flush tick never armed (start() un-called) and shutdown never force-flushed (analyticsFinalFlush undefined). Events sat on disk forever. Production-affecting; the existing dev-beta.e2e.ts kept passing only because its CI run uses a real backend that the periodic flush never reached. Fix in brv-server.ts: - Re-add `let analyticsFinalFlush: (() => Promise<void>) | undefined` before ShutdownHandler construction so the shutdown closure can reference it via late-bind (same mutable-holder pattern M4.3 used). - Pass `analyticsFinalFlush: () => analyticsFinalFlush?.() ?? Promise.resolve()` to ShutdownHandler. - After the M15.6 late-bind of setAnalyticsClient, call `featureHandlers.analyticsFlushScheduler.start()` and assign the final-flush closure (`stop() + flushFinal({timeoutMs: 3000})`). E2E coverage (Phase A — 5 cancel scenarios): - New test/e2e/analytics/lifecycle-wire.e2e.ts drives real daemon + transport client + in-process HTTP capture stub. - Each scenario: task:create → 50ms wait → task:cancel → wait for the natural 30s flush tick → assert wire payload. - Coverage: curate-html-direct (alias → curate-tool-mode on wire), curate-folder (via the same flow), query-tool-mode, dream-scan, dream-finalize, search. - Assertions: task_created + task_failed{failure_kind=cancelled} reach the wire with the right task_type. dream / search assert no per-flavor M12 row (only the generic task_* pair). - Wire-shape sanity per request: device-id header, brv-cli/* UA, super-properties (cli_version, os, node_version, environment, device_id) on every event payload. Gated by `npm run test:e2e:lifecycle` (180s timeout) — not part of default `npm test` glob. ~2m end-to-end. Matches dev-beta.e2e.ts precedent. Phase B (super-props deep checks, JSONL/HTTP parity) deferred to a follow-up commit.
… into feat/ENG-2964 # Conflicts: # src/server/infra/daemon/brv-server.ts # test/integration/infra/process/analytics-hook-lifecycle-wiring.test.ts
Adds `npm run test:e2e:db` covering the FULL chain from `brv` CLI through real HTTP to the local telemetry container into postgres `raw_events`. Complements the Phase A `test:e2e:lifecycle` in-process capture suite — that one stops at the daemon's HTTP request body; this one proves the records actually land in the partitioned `raw_events` table with the documented event names, task_type alias translation, failure_kind, and shared device_id. Three scenarios: - curate-tool-mode (alias-translated from curate-html-direct) - query-tool-mode - search (no per-flavor M12 event, only task_created + task_failed) Test reaches postgres via `docker exec ... psql -t -A` against the existing byterover-telemetry compose stack (no host psql dependency). A `before()` precondition skips the suite with a clear migration hint when raw_events is missing, and `afterEach` preserves the scenario temp dirs on failure so daemon logs + JSONL can be inspected post-hoc. Not picked up by `npm test` (default glob excludes test/e2e/).
…erage
Grows `npm run test:e2e:db` from 3 → 7 scenarios. Adds dream-scan +
dream-finalize roundtrips (previously absent from DB coverage), a
deep wire-shape sanity check on a curate scenario, and a two-task
isolation guard that confirms task_id selectors don't bleed events
across concurrent runs.
New per-row sanity helper (`assertRowShape`) is invoked from every
scenario and pins:
- schema_version = 1
- cli_version matches semver
- os ∈ {darwin, linux, win32}
- node_version starts with `v<digit>`
- environment ∈ {development, production}
- device_id non-empty + identical across all rows of a task
- user_id null (anonymous emits — test never authenticates)
Per-flavor payload assertions added:
- curate_run_completed: outcome='cancelled', duration_ms ≥ 0, all
operations_* counters and pending_review_count = 0 (we cancel
before any tool calls land)
- query_completed: outcome='cancelled', duration_ms ≥ 0,
read_tool_call_count = 0, search_call_count = 0
- task_created: has_files=false, has_folder=false
- task_failed: failure_kind='cancelled'
Wire-shape sanity scenario adds:
- client_timestamp ≤ received_at + 5s budget
- both timestamps fall in [test_start - 10s, test_end + 10s]
- properties.device_id === identity_device_id column (M4.1 stamp
matches the request header)
Isolation scenario:
- drives a curate-tool-mode task and a query-tool-mode task in
sequence against the same daemon
- asserts no task_id bleed, task_type partitions cleanly along
task_id, both share device_id (single daemon → single config)
Cutover note: switched all e2e dispatches from 'curate-html-direct'
to 'curate-tool-mode' after the proj branch merge — TaskTypeSchema
only accepts the canonical name on the wire now. The legacy alias
remains in AnalyticsHook's `toAnalyticsTaskType` and is exercised
by the integration test directly, not via the transport.
Suite runtime: ~5 min (7 scenarios × ~30s each), all green.
|
Claude finished @cuongdo-byterover's task in 5m 44s —— View job Claude PR Review — feat/ENG-2964 (M15.6 e2e + final-flush wiring)Overall score: 2 / 5 — The shutdown-flush restoration in FindingsBlockers (must fix before merge):
Substantive:
Minor / nitpicks:
What looks good
Verification suggestionBefore merging, please run locally with the two task-type edits applied: npm run build && npx mocha --forbid-only "test/integration/infra/process/analytics-hook-lifecycle-wiring.test.ts"
# expected: 7/7 passing
npm run test:e2e:lifecycle -- --grep "P0 — curate"
# expected: P0 curate-tool-mode case asserts task_created + task_failed· branch |
… tests After the proj/analytics-system-tool-mode merge, TaskTypeSchema only accepts the canonical 'curate-tool-mode' over the wire — ENG-2925's rename fully landed and 'curate-html-direct' is no longer enumerated on `TaskCreateRequest` / `TaskExecute`. Tests that dispatch through TaskRouter (the integration lifecycle-wiring suite + the e2e wire capture suite) were rejected at the Zod boundary, never reached AnalyticsHook, and asserted on undefined emits. The merge resolution I picked for the integration test was wrong — I kept the alias in HEAD on the assumption it exercised the alias translation, but the transport boundary now strips it long before AnalyticsHook gets a chance to translate. The AnalyticsHook unit tests (analytics-hook-m14.test.ts, et al) still exercise the alias path by constructing TaskInfo directly and invoking `hook.onTaskCreate(task)` — those bypass TaskRouter, so 'curate-html-direct' remains a valid test input there and the `toAnalyticsTaskType` aliasing is still covered.
|
Claude finished @cuongdo-byterover's task in 2m 59s —— View job Claude PR Review — feat/ENG-2964 (re-review after
|
| # | Status | Notes |
|---|---|---|
| 1 | ✅ Fixed | analytics-hook-lifecycle-wiring.test.ts:225 now uses type: 'curate-tool-mode' (commit 3f42369). |
| 2 | ✅ Fixed | lifecycle-wire.e2e.ts:296 now uses 'curate-tool-mode'; the describe label and inline comment were also updated to drop the misleading "alias-translated" framing. |
Substantive (still open):
src/server/infra/daemon/brv-server.ts:704-713— repeat from previous round, not addressed in3f42369. The comment still claimsstart()is called "AFTER the first track lands so the initial 30s window aligns with real traffic", butAnalyticsFlushScheduler.start()armssetTimeout(...nextIntervalMs())immediately (analytics-flush-scheduler.ts:205-209). There is nonotifyPushed-gated deferral. The empty-pending guard (tryFlushat:292) means an empty first tick is a cheap no-op — so this isn't a correctness bug — but the comment is stale and misleading. Suggested rewording posted as inline comment. (inline)
Minor (unchanged, low priority — not blocking merge):
test/e2e/analytics/lifecycle-db.e2e.ts:164—taskIdLikestill interpolated into raw SQL. Safe today (programmatic ids), but the prior review'spsql -v :'task_id'suggestion still applies. Worth a follow-up, not a blocker.test/e2e/analytics/lifecycle-wire.e2e.ts:240-241,292,326,...andlifecycle-db.e2e.ts:365,534,580-581,...— task ids are stille2e-...-${Date.now()}.TaskCreateRequestSchema.taskIdisz.string().uuid()(schemas.ts:788). Not validated at runtime today, but a latent break if anyone wires that validator at the transport boundary.crypto.randomUUID()is a one-liner fix.test/e2e/analytics/lifecycle-wire.e2e.ts:240-241,335-336—runBrvpopulatesreasonon failure, but.to.deep.include({ok: true})drops it. Switch toexpect(r.ok, r.reason).to.equal(true)for usable diagnostics.
What's good
- The merge-resolution regression is correctly identified and fixed; commit message explains exactly what happened and why alias coverage still lives at the unit layer (
analytics-hook-m14.test.ts:72,88). - DB e2e (
lifecycle-db.e2e.ts) is well-architected: three skip gates with actionable repair instructions,assertRowShapefactored out, three deep-check scenarios (curate roundtrip, wire-shape, two-task isolation) that go meaningfully beyonddev-beta.e2e.ts. - Wire e2e (
lifecycle-wire.e2e.ts) uses an in-process capture stub at a random localhost port — clean, hermetic, no flake-prone network dependencies. package.jsonaddstest:e2e:lifecycle(180s) +test:e2e:db(300s) with sensible timeouts and the samenpm run build && mocha …chain as the existing analytics e2e.brv-server.tsfinal-flush wiring is sound —stop()first, thenflushFinal({timeoutMs: 3000}). Hook gate at:698-700preserves PR Feat/eng 2964 #722's defensive throw.
Verification suggestion before merge
npm run build && npx mocha --forbid-only "test/integration/infra/process/analytics-hook-lifecycle-wiring.test.ts"
# expected: all cases passing now that the transport accepts the type
npm run test:e2e:lifecycle -- --grep "P0 — curate"
# expected: task_created + task_failed{failure_kind=cancelled,task_type=curate-tool-mode} land via the capture stub· branch feat/ENG-2964 ·
2f418cc
into
proj/analytics-system-tool-mode
Summary
Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write
N/A)Test plan
User-visible changes
List user-visible changes (including defaults, config, or CLI output).
If none, write
None.Evidence
Attach at least one:
Checklist
npm test)npm run lint)npm run typecheck)npm run build)mainRisks and mitigations
List real risks for this PR. If none, write
None.