Skip to content

Feat/eng 2964#722

Merged
bao-byterover merged 8 commits into
proj/analytics-system-tool-modefrom
feat/ENG-2964
May 27, 2026
Merged

Feat/eng 2964#722
bao-byterover merged 8 commits into
proj/analytics-system-tool-modefrom
feat/ENG-2964

Conversation

@cuongdo-byterover
Copy link
Copy Markdown
Collaborator

Summary

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Type of change

  • Bug fix
  • New feature
  • Refactor (no behavior change)
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

  • Closes #
  • Related #

Root cause (bug fixes only, otherwise write N/A)

  • Root cause:
  • Why this was not caught earlier:

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
  • Key scenario(s) covered:

User-visible changes

List user-visible changes (including defaults, config, or CLI output).
If none, write None.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable)
  • No breaking changes (or clearly documented above)
  • Branch is up to date with main

Risks and mitigations

List real risks for this PR. If none, write None.

  • Risk:
    • Mitigation:

…sk types

Adds curate-tool-mode, query-tool-mode, dream-scan, dream-finalize to
TaskTypes + TASK_TYPE_VALUES. After ENG-2925's rename the daemon
dispatches these four types but the analytics enum still predates v4.0
— task_created / task_completed / task_failed silently rejected every
tool-mode emit at the Zod boundary.

- TaskTypes keeps the legacy 'curate' / 'query' / 'dream' values for
  back-compat with any constructor still building those payloads
- per-event schemas (task_created / task_completed / task_failed) pick
  up the new types automatically via z.enum(TASK_TYPE_VALUES)
- M12 per-flavor schemas (curate_run_completed / query_completed) still
  hardcode their own literals and continue to reject tool-mode types
  here — M14.2 migrates them to the canonical enum as a follow-up

TDD:
- new task-types tests assert TaskCreated/Completed/Failed accept all
  four new types
- regression tests pin the M12 schemas' continued rejection so M14.2
  has a clear flip point
…sk_type to TASK_TYPE_VALUES

Migrates the two M12 per-flavor schemas from hardcoded literal task_type
values to the canonical TASK_TYPE_VALUES enum so v4.0 tool-mode tasks
round-trip the wire boundary instead of being silently Zod-rejected
inside AnalyticsHook.

- curate_run_completed: z.enum(['curate', 'curate-folder']) →
  z.enum(TASK_TYPE_VALUES). curate-tool-mode payloads now validate.
- query_completed: z.literal('query') → z.enum(TASK_TYPE_VALUES).
  query-tool-mode payloads now validate.

The schemas no longer structurally constrain task_type to the curate
or query family; the hook is trusted to only emit each event for the
right flavor. Docblocks call out the widening for the next maintainer.

TDD:
- curate-run-completed.test asserts curate-tool-mode + legacy values
  both succeed; an unknown task_type still rejects
- query-completed.test mirrors the same coverage for query-tool-mode
- task-types.test M14.1 regression assertions flipped from rejection
  to acceptance for the M12 schemas; M14.1 docblock comments updated

Without this, M14.3's hook code would land but the M12 emits that fire
alongside the new generic task_* emits would silently disappear on
every tool-mode task — that's the bug operators noticed in Mixpanel.
…ed in AnalyticsHook

Adds the three generic funnel-event emits described in M14's customer
ask. Every daemon task (curate / curate-folder / curate-tool-mode
(aliased from curate-html-direct) / query / query-tool-mode /
dream-scan / dream-finalize / search) now produces, in order:

  task_created      onTaskCreate (funnel entry — unconditional)
  ...optional per-op + M12 per-flavor terminal emit
  task_completed    onTaskCompleted (terminal-event-last on success)
  task_failed       onTaskError / onTaskCancelled (terminal-event-last)

Three coupled changes shipped together so the wire stays consistent:

1. analytics-hook.ts grows three emits and a `toAnalyticsTaskType`
   alias-translator. The daemon still dispatches the pre-ENG-2925 name
   `curate-html-direct`; analytics canonicalises it to the post-rename
   `curate-tool-mode` so the wire enum matches TASK_TYPE_VALUES. Once
   ENG-2925 lands, the alias becomes a no-op identity.
2. CURATE_TASK_TYPES + QUERY_TASK_TYPES gain the tool-mode names so M12
   per-flavor state init kicks in for tool-mode curates / queries. M12
   counters stay all-zero on tool-mode today (no LLM tool calls fire) —
   that's a separate follow-up (FU-1 in plans/analytics-m14/follow-ups).
3. Both M12 payload builders route `task_type` through the same alias
   so curate-tool-mode tasks emit `task_type='curate-tool-mode'` on the
   curate_run_completed / query_completed events too.

TDD:
- new analytics-hook-m14.test.ts: 15 simulation tests covering every
  task type, the curate-html-direct → curate-tool-mode alias, has_files
  / has_folder semantics, both terminal paths, and the ordering invariant
- existing analytics-hook.test.ts updated to filter out the new generic
  emits via `filterM12()` so M12-focused assertions keep their intent
- integration stress tests updated for the new 13-event sequence
  (task_created → ops → curate_run_completed → task_completed)
- repo full suite: 9121 passing, 0 failing
…ords/tags, structured related_paths

Tightens the curate / query M12 payloads per code review:

curate_operation_applied:
- rename absolute_path → relative_path (project-relative via path.relative
  against the task's projectPath; falls back to identity when projectPath
  is unset so search tasks at the daemon root still emit a usable string)
- keywords / tags promoted from optional to required arrays (default [])
  so the wire shape stays uniform regardless of frontmatter read success

query_completed.read_paths_with_metadata:
- same absolute_path → relative_path rename
- same keywords / tags promotion (required arrays, default [])
- flat related: string[] → structured related_paths: [{relative_path,
  keywords, tags}] so each linked topic carries its own metadata slot.
  Keywords/tags default to [] until a future FU cascade-reads each
  linked file's frontmatter — the wire shape doesn't need to change
  when that lands.

Hook implementation:
- new `toRelativePath(filePath, projectPath)` helper using node:path.relative
- CurateTaskAnalyticsState stores projectPath captured at onTaskCreate so
  per-op emits can relativize without threading task through processToolResult
- all four payload sites (curate-op, curate-run-completed, query-completed
  read_paths_with_metadata) route through the helper

Inspection test added at test/unit/.../analytics-hook-toolmode-inspection
that pretty-prints every event + payload for curate-tool-mode / query
/ query-tool-mode flows — gives PMs a single place to verify the wire
shape end-to-end.

Privacy win: relative paths drop the /Users/{name} prefix from every
file-touched event, keeping host-identifiable PII off the analytics
wire while still letting PMs reason about which file inside a project
an operation touched.

Tests: full repo 9132 passing.
…re_kind on task_failed

Delivers M15.6's customer-facing piece: the actual wiring of AnalyticsHook
into the daemon's lifecycleHooks[] (which M14.3 built emit logic for but
never plumbed) plus the coarse failure_kind classifier on task_failed.

Per the M14 / M15.6 alignment (Option A): M14.1-M14.3 ship as the
foundation. M15.6 adds the wiring + failure_kind. M15.6's stated
"scaffolded only" stance on task_created + per-flavor schemas is
intentionally relaxed — M14 already produces those events and the
review-tightening on curate_operation_applied / read_paths_with_metadata
delivers PM-visible value that the M15.6 description deferred.

Changes:

src/server/infra/daemon/brv-server.ts:
- import AnalyticsHook
- construct it BEFORE TransportHandlers so it can land in lifecycleHooks[]
  alongside curate-log / query-log / task-history (now 4 peers, not 3)
- the isAnalyticsEnabled gate from setupFeatureHandlers binds in via a
  closure ref that defers lookup until emit-time
- capture setupFeatureHandlers's return value (was discarded), then
  setAnalyticsClient(analyticsClient) on the pre-registered hook

src/shared/analytics/events/task-failed.ts:
- new FailureKindValues + FailureKind type: 'cancelled' | 'timeout' |
  'agent_error' | 'unknown' (coarse vocab, ≤64 chars, never raw
  error.message)
- failure_kind added to TaskFailedSchema as a required field
- docblock notes the M15.6 privacy contract

src/server/infra/process/analytics-hook.ts:
- new classifyFailureKind(errorMessage) helper. Substring sentinels
  for 'timeout' / 'timed out' / 'deadline exceeded' → 'timeout';
  'agent' / 'llm' / 'provider' / 'tool' → 'agent_error'; default
  'unknown'. Raw error string NEVER ends up on the wire.
- emitTaskFailed now takes a FailureKind argument
- onTaskCancelled → emitTaskFailed(..., 'cancelled')
- onTaskError → emitTaskFailed(..., classifyFailureKind(errorMessage))

TDD:
- task-failed.test: 3 new tests pin the failure_kind shape (accepts the
  4 canonical values, rejects out-of-vocab, rejects missing)
- analytics-hook-m14.test: 3 new tests verify the classifier paths land
  on the wire — 'kaboom' → 'unknown', timeout strings → 'timeout',
  agent strings → 'agent_error'
- new analytics-hook-lifecycle-wiring.test integration covers the
  end-to-end task:create → analyticsHook.onTaskCreate path through a
  real TaskRouter, plus all four task-type variants going through
  task_created → task_completed in sequence

Full suite: 9145 passing, 0 failing. Lint clean.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Claude finished @cuongdo-byterover's task in 5m 26s —— View job


Claude Code PR Review

  • Load review tools
  • Examine PR diff against origin/proj/analytics-system-tool-mode
  • Review analytics-hook.ts changes (largest delta)
  • Review event schemas (task-failed, query-completed, curate-*)
  • Review test coverage (new and modified tests)
  • Post inline comments for specific issues
  • Post final summary with score

Summary

This PR wires AnalyticsHook as the 4th lifecycle peer in TaskRouter, adds failure_kind to task_failed, extends TASK_TYPE_VALUES to cover the v4.0 tool-mode types (curate-tool-mode, query-tool-mode, dream-scan, dream-finalize), and tightens the M14-review wire shape (absolute_pathrelative_path, required keywords / tags arrays, structured related_paths).

The implementation is solid: the per-task FIFO queue logic in onToolResult is well-reasoned, the late-bind pattern in brv-server.ts for setAnalyticsClient / isAnalyticsEnabled is clean given the boot ordering, and the test coverage is genuinely good — analytics-hook-m14.test.ts and analytics-hook-lifecycle-wiring.test.ts cover the cross-product of (task type × outcome) thoroughly, including the daemon→wire alias for curate-html-directcurate-tool-mode.

Findings worth addressing before merge:

# Severity Finding
1 issue (privacy) toRelativePath doesn't guard against .. parent traversal when the read file is outside projectPath. Symlinks, linked sources (brv source add), and unset projectPath all produce wire strings that still leak the absolute filesystem layout.
2 suggestion classifyFailureKind uses raw .includes() for 'tool' / 'agent' / 'llm' / 'provider'. Matches 'tooltip', 'engagement', etc. → false agent_error classifications. Recommend word-boundary regex + a comment documenting the timeout > agent > unknown precedence.
3 issue (test hygiene) analytics-hook-toolmode-inspection.test.ts writes every emitted event to console.log via dumpEvents. Matches test/**/*.test.ts so it spams every npm test. CLAUDE.md explicitly calls this out.
4 nitpick (type safety) toAnalyticsTaskType returns daemonType as TaskType unchecked. A future daemon type added without updating TASK_TYPE_VALUES silently emits events the backend zod-rejects, with no daemon-side signal.
5 nitpick (forward-compat) The "counters all-zero today" assertions in analytics-hook-toolmode-inspection.test.ts pin a known limitation as a positive expectation — these will read like a regression when FU-1 lands.
6 question (consistency) curate_operation_applied.related stayed as flat string[] while query_completed.read_paths_with_metadata[].related_paths was tightened to structured objects. Asymmetry is fine if deliberate, but a TODO / tracking link would prevent the shape ossifying in two idioms.
7 praise The lifecycleHooks[] late-bind pattern in brv-server.ts (register hook early, inject client after setupFeatureHandlers resolves) is well-shaped. One tiny defensive suggestion left inline about isEnabled defaulting to false.

None of these are merge-blockers individually — #1 is the one I'd most want to fix on this PR (it's a privacy regression masquerading as a no-op), and #3 will quietly degrade test ergonomics for everyone if it isn't caught now.

The schema additions, test coverage, async ordering invariants, and the boot-time wire-up all look correct. The PII surface area is the main thing to harden.

Overall Score: 4 / 5

Strong, well-tested implementation; the privacy guard on toRelativePath and the test-output cleanup hold it back from a 5.
· Branch feat/ENG-2964

Comment thread src/server/infra/process/analytics-hook.ts
Comment thread src/server/infra/process/analytics-hook.ts
Comment thread src/server/infra/process/analytics-hook.ts Outdated
Comment thread src/shared/analytics/events/curate-operation-applied.ts
Comment thread src/server/infra/daemon/brv-server.ts
#1 classifyFailureKind — word-boundary regex + pinned precedence:
  Substring matching let 'tooltip' / 'engagement' / 'urgent' bucket into
  agent_error. Tightened to /\b(agent|llm|provider|tool)\b/ and
  /\b(timeout|timed out|deadline exceeded)\b/. Docblock pins precedence
  (timeout > agent_error > unknown) so future if-order shuffles can't
  silently rebucket the funnel.

#2 toRelativePath outside-project guard (privacy bug):
  path.relative('/proj','/Users/dev/other/x.md') returned
  '../../Users/dev/other/x.md' — still encoded the host layout. Same
  hole when projectPath was undefined (fell back to raw absolute path).
  Now returns '<outside-project>/<basename>' in both cases; preserves
  enough signal for backend grouping without becoming PII. Same guard
  also runs against absolute-path tails (Windows drive-letter switches).

#3 toAnalyticsTaskType drift guard:
  Replaced `daemonType as TaskType` (per CLAUDE.md anti-cast rule) with
  a TASK_TYPE_SET membership check + processLog warning + 'unknown'
  sentinel fallback. A future un-enumerated dispatch now lands a
  debuggable warning at the daemon instead of disappearing at the
  backend Zod check.

#4 dumpEvents test pollution:
  Gated behind DUMP_ANALYTICS=1. `npm test` runs the shape assertions
  silently; opt-in dump still works via
  `DUMP_ANALYTICS=1 npx mocha test/unit/.../analytics-hook-toolmode-inspection.test.ts`.

#5 FU-1 forward-compat comment:
  Added a block-comment note above the tool-mode "operations_*: 0"
  assertions so when FU-1 lands and the counters flip non-zero, the
  failure reads as "feature, update the test" rather than a regression.

#6 curate_operation_applied.related asymmetry:
  Added a TODO(M15.x) marker on the `related` field flagging the
  wire-shape asymmetry with `query_completed.read_paths_with_metadata
  .related_paths` (structured). Restructure is consumer-migration
  territory — own ticket.

#7 brv-server.ts loud-fail assertion:
  Added an explicit throw if setupFeatureHandlers returns without
  analyticsClient. Future refactors that drop the field will explode
  here instead of silently no-op'ing every emit forever.

Tests added to lock in #1, #2, #3 behavior so the review intent
doesn't drift. Existing stress-test fixtures moved from `/A/op-N.md`
under-the-root paths to `/proj/A/op-N.md` so they exercise the
in-project case (out-of-project is now its own focused test).

Full suite: 9227 passing, 0 failing. Lint clean.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Claude finished @cuongdo-byterover's task in 3m 55s —— View job


Claude Code PR Review (re-review after fixups)

  • Load review tools
  • Diff feat/ENG-2964 against origin/proj/analytics-system-tool-mode
  • Verify the 7 previous review findings were addressed in commit 190f4b4
  • Spot-check analytics-hook.ts (toRelativePath, classifyFailureKind, toAnalyticsTaskType) and inspection test
  • Look for any new issues introduced by the fixup
  • Post final summary with updated score

What got addressed

# Original finding Status Note
1 classifyFailureKind substring matching ✅ Resolved Word-boundary regexes (/\b(timeout|timed out|deadline exceeded)\b/, /\b(agent|llm|provider|tool)\b/). Precedence pinned in the docblock. Two new tests in analytics-hook-m14.test.ts (tooltip/engagement → unknown; llm provider timeout → timeout).
2 toRelativePath PII leak on .. / unset projectPath ✅ Resolved Returns <outside-project>/<basename> for parent-traversal, absolute (Windows drive switch), and projectPath===undefined. Two new tests cover both branches. Stress-test fixtures rebased onto /proj/A/op-N.md so the in-project case is also exercised.
3 toAnalyticsTaskType unchecked widening ⚠️ Partially resolved — new issue, see below Drift logged + sentinel returned, but the sentinel 'unknown' isn't on the wire vocabulary.
4 Inspection test console.log noise ✅ Resolved Gated behind process.env.DUMP_ANALYTICS === '1'. npm test is quiet; the diagnostic is opt-in.
5 FU-1 forward-compat comment ✅ Resolved Block comment in analytics-hook-toolmode-inspection.test.ts:106-109 explicitly tells future readers "this flips to non-zero in FU-1 — feature, not regression."
6 curate_operation_applied.related shape asymmetry ✅ Resolved (TODO marker) TODO(M15.x) added at curate-operation-applied.ts:27-30 pointing at the harmonization ticket.
7 Loud-fail on missing analyticsClient ✅ Resolved brv-server.ts:686-688 throws explicitly. The first defensive note (default isAnalyticsEnabledRef to () => false) was not taken, but it's strictly defensive — the boot-time race is already documented.

One real new issue from the fixup ⚠️

toAnalyticsTaskType drift sentinel doesn't actually keep the event on the wiresrc/server/infra/process/analytics-hook.ts:31 + 47-52

const UNKNOWN_TASK_TYPE: TaskType = 'unknown' as TaskType

function toAnalyticsTaskType(daemonType: string): TaskType {
  if (daemonType === 'curate-html-direct') return TaskTypes.CURATE_TOOL_MODE
  if (ANALYTICS_TASK_TYPE_SET.has(daemonType)) return daemonType as TaskType
  processLog(`AnalyticsHook: unknown task type '${daemonType}' — falling back to '${UNKNOWN_TASK_TYPE}'`)
  return UNKNOWN_TASK_TYPE
}

The docblock promises this "keeps the event on the wire instead of silently failing the Zod check at the backend." It doesn't — 'unknown' is not in TASK_TYPE_VALUES:

  • src/shared/analytics/task-types.ts:31-41TASK_TYPE_VALUES lists 9 types, none of them 'unknown'.
  • Every task_* schema validates via z.enum(TASK_TYPE_VALUES) — see task-failed.ts:38, task-created.ts:21, task-completed.ts:17, query-completed.ts:69, curate-run-completed.ts:30.
  • test/unit/shared/analytics/task-types.test.ts:96-107 is the smoking gun: the suite already asserts the schemas reject unknown task_types.

What actually happens at runtime: AnalyticsClient.track() (analytics-client.ts:230-249) and JsonlAnalyticsStore.append() don't run the per-event schema, so the event persists locally and the daemon logs nothing wrong. But the moment the batch ships, the backend Zod check fails the row — the same failure mode the guard claims to prevent, just with a noisier wire payload.

Secondary concern: 'unknown' as TaskType is the exact as Type widening CLAUDE.md forbids and that the original review #4 called out. The cast at the unsafe site moved to a cast on the constant.

Two ways to actually close the gap, pick one:

  1. Add 'unknown' to the canonical vocabularyTaskTypes.UNKNOWN: 'unknown' in TaskTypes, appended to TASK_TYPE_VALUES. The Zod schema then accepts the sentinel, the cast disappears, and the m14 test at line 269-275 holds for real.
  2. Don't emit the funnel event at all for unknown types — return undefined from toAnalyticsTaskType, have callers skip the emit. The log line still gives a daemon-side breadcrumb. Trade-off: an un-enumerated type vanishes from analytics entirely (no funnel row) instead of bucketing as unknown.

Option 1 matches the docblock's stated intent better and is the smaller change. Either way, this is the right PR to close the gap because the new m14 drift-guard test (analytics-hook-m14.test.ts:269-275) currently uses a Sinon stub that never runs Zod — a real integration would catch this.

Fix this →


Smaller observations (non-blocking)

  • src/server/infra/process/analytics-hook.ts:55-59 — the docblock for OUTSIDE_PROJECT_PATH has a leading-space indentation drift (lines 56-57 carry an extra space). Cosmetic.
  • src/server/infra/process/analytics-hook.ts:74basename is taken from the absolute path. On Windows running against a POSIX-style absolute string (or vice versa), the basename split won't honor the foreign separator. Realistically the daemon and the agent are on the same host so this never bites, but worth a thought if cross-platform mocking ever enters scope.
  • analytics-hook.test.ts was substantially reworked (+121/−81). The analytics-hook-async-stress.test.ts fixture rebase to /proj/... is the right move — it isolates the outside-project case to its own focused test instead of conflating it with serialization invariants.

Praise (post-fix)

The m14.test.ts additions are the right shape: every new branch (word-boundary classifier, outside-project guard, drift-guard sentinel, classifier precedence) gets a tight named test with a (PR #722) tag in the description. That makes review-comment provenance auditable from the test suite years from now — quietly excellent practice.


Overall Score: 4 / 5

The privacy fix, classifier hardening, test hygiene gating, and loud-fail assertion all landed clean and the new test coverage is thorough. Holding at 4 because the drift-guard sentinel (#3 in the previous review) didn't actually close the gap it claims to — 'unknown' isn't a valid wire value, so the backend still drops the row. Adding TaskTypes.UNKNOWN to the canonical vocabulary closes the loop and gets this to 5.
· Branch feat/ENG-2964

#722 re-review)

Re-review caught a real bug in the previous fixup: `toAnalyticsTaskType`
returned `'unknown' as TaskType` for un-enumerated daemon types, but
`'unknown'` was NOT in `TASK_TYPE_VALUES`. The docblock promised "keeps
the event on the wire instead of silently failing the Zod check at the
backend" — but the wire-side `z.enum(TASK_TYPE_VALUES)` actually still
rejected the row. The local Sinon trackStub doesn't run Zod, which is
why the m14 test passed even though the runtime path was broken.

Also the `as TaskType` cast moved sites but didn't disappear — the
exact anti-pattern the original review flagged.

Fix:
- Add `TaskTypes.UNKNOWN: 'unknown'` to the canonical vocabulary.
- Append `TaskTypes.UNKNOWN` to `TASK_TYPE_VALUES` so every task_*
  schema accepts the sentinel via `z.enum(TASK_TYPE_VALUES)`.
- Introduce `isCanonicalTaskType()` predicate so `toAnalyticsTaskType`
  narrows without the `as TaskType` cast. Sentinel returned as
  `TaskTypes.UNKNOWN` directly.
- Update `task-types.test` to assert the new `UNKNOWN` key/value pair.
  Existing negative test for an unknown task_type already uses
  `'not-a-real-type'` (not `'unknown'`), so it stays green.

Cosmetic: dropped the extra-space drift in the OUTSIDE_PROJECT_PATH
docblock noted in the same re-review.

Tests: 9227 passing, 0 failing.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Claude finished @cuongdo-byterover's task in 2m 19s —— View job


Claude Code PR Review (3rd pass — drift-sentinel fix)

  • Load review tools
  • Diff current HEAD against origin/proj/analytics-system-tool-mode (focus on commit 8d28f4b)
  • Verify drift sentinel (TaskTypes.UNKNOWN) is now on the wire vocabulary
  • Re-check analytics-hook.ts, task-types.ts, and the m14 drift-guard test
  • Look for any new issues introduced by the fixup
  • Post final summary with updated score

What landed in 8d28f4b

# Previous-review issue Status Evidence
3 (re-review) Drift sentinel 'unknown' returned but not in TASK_TYPE_VALUES → backend Zod still drops the row Resolved TaskTypes.UNKNOWN: 'unknown' added at src/shared/analytics/task-types.ts:30; appended to TASK_TYPE_VALUES at :50. Every task_* schema validates via z.enum(TASK_TYPE_VALUES) (task-created.ts:21, task-completed.ts:17, task-failed.ts:38, query-completed.ts:69, curate-run-completed.ts:30) — the sentinel now actually rides the wire.
3 (orig) daemonType as TaskType cast Resolved Replaced with a isCanonicalTaskType(value): value is TaskType user-defined type guard at analytics-hook.ts:32. The unsafe cast at the call site is gone. The cast moved to one tightly scoped (ANALYTICS_TASK_TYPE_SET as Set<string>).has(value) inside the predicate, which is the standard pattern for widening a Set<T>.has argument.
OUTSIDE_PROJECT_PATH docblock cosmetic drift ✅ Resolved Leading-space drift on lines 56–57 cleaned up; cosmetic only.
Default isAnalyticsEnabledRef to () => false (optional defensive nit) ⚪ Not taken brv-server.ts:378 still defaults to () => true. Acknowledged in the previous review as "strictly defensive — boot-time race is already documented." Fine to leave.

Verification chain (why this is genuinely closed, not just moved)

  1. TaskTypes.UNKNOWN is now part of Object.values(TaskTypes).
  2. TASK_TYPE_VALUES is asserted-equal to Object.values(TaskTypes) by the invariant test at test/unit/shared/analytics/task-types.test.ts:51-53 — so any future drift between the const-object and the tuple fails CI.
  3. All five task_* schemas validate task_type against z.enum(TASK_TYPE_VALUES) (grep confirmed). The wire-side check now accepts 'unknown' as a legitimate row value rather than dropping it.
  4. The negative test "rejects an unknown task_type" (task-types.test.ts:102-113) uses 'not-a-real-type', not 'unknown', so it's untouched by adding the sentinel.
  5. The m14 drift-guard test (analytics-hook-m14.test.ts:268-275) still uses a Sinon stub (no live Zod), but the invariant in step 2 covers the gap the previous review flagged — a regression where TaskTypes.UNKNOWN gets added without TASK_TYPE_VALUES being updated would fail at task-types.test.ts:52, before the analytics-hook layer ever runs.

One small forward-looking observation (non-blocking)

isCanonicalTaskType performs membership against Object.values(TaskTypes). That's a strict superset of TASK_TYPE_VALUES only if those two drift — but the invariant test from step 2 above pins them equal, so this is currently safe. If a future contributor finds a reason to maintain those independently (e.g. internal-only TaskTypes keys not on the wire), the predicate would lie. The TASK_TYPE_VALUES-equality test is the load-bearing invariant; worth keeping in mind for any future refactor.

Praise

The fix is exactly the right shape: smallest change that closes the real gap (TaskTypes.UNKNOWN lives in the canonical vocabulary, predicate eliminates the cast). The commit message accurately diagnoses why the previous fixup didn't work (Sinon stub doesn't run Zod) and explains what changed. That diagnosis is more valuable than the diff itself — it's the kind of failure analysis that prevents the same class of bug in a year.


Overall Score: 5 / 5

The drift-sentinel gap is genuinely closed — 'unknown' is on the wire vocabulary, every task_* schema accepts it via z.enum(TASK_TYPE_VALUES), and the invariant test at task-types.test.ts:51-53 is the load-bearing safety net for future schema/vocabulary drift. The unchecked as TaskType widening that CLAUDE.md forbids is gone. Privacy guard (toRelativePath), classifier hardening, test hygiene, and the loud-fail assertion from the prior pass all remain intact. Ready to merge.
· Branch feat/ENG-2964

@bao-byterover bao-byterover merged commit f7c691b into proj/analytics-system-tool-mode May 27, 2026
7 checks passed
@bao-byterover bao-byterover deleted the feat/ENG-2964 branch May 27, 2026 14:34
@github-actions github-actions Bot mentioned this pull request May 28, 2026
29 tasks
bao-byterover pushed a commit that referenced this pull request May 28, 2026
* feat: [ENG-2969] M14.1 extend TASK_TYPE_VALUES with v4.0 tool-mode task types

Adds curate-tool-mode, query-tool-mode, dream-scan, dream-finalize to
TaskTypes + TASK_TYPE_VALUES. After ENG-2925's rename the daemon
dispatches these four types but the analytics enum still predates v4.0
— task_created / task_completed / task_failed silently rejected every
tool-mode emit at the Zod boundary.

- TaskTypes keeps the legacy 'curate' / 'query' / 'dream' values for
  back-compat with any constructor still building those payloads
- per-event schemas (task_created / task_completed / task_failed) pick
  up the new types automatically via z.enum(TASK_TYPE_VALUES)
- M12 per-flavor schemas (curate_run_completed / query_completed) still
  hardcode their own literals and continue to reject tool-mode types
  here — M14.2 migrates them to the canonical enum as a follow-up

TDD:
- new task-types tests assert TaskCreated/Completed/Failed accept all
  four new types
- regression tests pin the M12 schemas' continued rejection so M14.2
  has a clear flip point

* fix: [ENG-2970] M14.2 relax curate_run_completed + query_completed task_type to TASK_TYPE_VALUES

Migrates the two M12 per-flavor schemas from hardcoded literal task_type
values to the canonical TASK_TYPE_VALUES enum so v4.0 tool-mode tasks
round-trip the wire boundary instead of being silently Zod-rejected
inside AnalyticsHook.

- curate_run_completed: z.enum(['curate', 'curate-folder']) →
  z.enum(TASK_TYPE_VALUES). curate-tool-mode payloads now validate.
- query_completed: z.literal('query') → z.enum(TASK_TYPE_VALUES).
  query-tool-mode payloads now validate.

The schemas no longer structurally constrain task_type to the curate
or query family; the hook is trusted to only emit each event for the
right flavor. Docblocks call out the widening for the next maintainer.

TDD:
- curate-run-completed.test asserts curate-tool-mode + legacy values
  both succeed; an unknown task_type still rejects
- query-completed.test mirrors the same coverage for query-tool-mode
- task-types.test M14.1 regression assertions flipped from rejection
  to acceptance for the M12 schemas; M14.1 docblock comments updated

Without this, M14.3's hook code would land but the M12 emits that fire
alongside the new generic task_* emits would silently disappear on
every tool-mode task — that's the bug operators noticed in Mixpanel.

* feat: [ENG-2971] M14.3 wire task_created + task_completed + task_failed in AnalyticsHook

Adds the three generic funnel-event emits described in M14's customer
ask. Every daemon task (curate / curate-folder / curate-tool-mode
(aliased from curate-html-direct) / query / query-tool-mode /
dream-scan / dream-finalize / search) now produces, in order:

  task_created      onTaskCreate (funnel entry — unconditional)
  ...optional per-op + M12 per-flavor terminal emit
  task_completed    onTaskCompleted (terminal-event-last on success)
  task_failed       onTaskError / onTaskCancelled (terminal-event-last)

Three coupled changes shipped together so the wire stays consistent:

1. analytics-hook.ts grows three emits and a `toAnalyticsTaskType`
   alias-translator. The daemon still dispatches the pre-ENG-2925 name
   `curate-html-direct`; analytics canonicalises it to the post-rename
   `curate-tool-mode` so the wire enum matches TASK_TYPE_VALUES. Once
   ENG-2925 lands, the alias becomes a no-op identity.
2. CURATE_TASK_TYPES + QUERY_TASK_TYPES gain the tool-mode names so M12
   per-flavor state init kicks in for tool-mode curates / queries. M12
   counters stay all-zero on tool-mode today (no LLM tool calls fire) —
   that's a separate follow-up (FU-1 in plans/analytics-m14/follow-ups).
3. Both M12 payload builders route `task_type` through the same alias
   so curate-tool-mode tasks emit `task_type='curate-tool-mode'` on the
   curate_run_completed / query_completed events too.

TDD:
- new analytics-hook-m14.test.ts: 15 simulation tests covering every
  task type, the curate-html-direct → curate-tool-mode alias, has_files
  / has_folder semantics, both terminal paths, and the ordering invariant
- existing analytics-hook.test.ts updated to filter out the new generic
  emits via `filterM12()` so M12-focused assertions keep their intent
- integration stress tests updated for the new 13-event sequence
  (task_created → ops → curate_run_completed → task_completed)
- repo full suite: 9121 passing, 0 failing

* refactor: [ENG-2971] M14.3 review fix — relative paths, required keywords/tags, structured related_paths

Tightens the curate / query M12 payloads per code review:

curate_operation_applied:
- rename absolute_path → relative_path (project-relative via path.relative
  against the task's projectPath; falls back to identity when projectPath
  is unset so search tasks at the daemon root still emit a usable string)
- keywords / tags promoted from optional to required arrays (default [])
  so the wire shape stays uniform regardless of frontmatter read success

query_completed.read_paths_with_metadata:
- same absolute_path → relative_path rename
- same keywords / tags promotion (required arrays, default [])
- flat related: string[] → structured related_paths: [{relative_path,
  keywords, tags}] so each linked topic carries its own metadata slot.
  Keywords/tags default to [] until a future FU cascade-reads each
  linked file's frontmatter — the wire shape doesn't need to change
  when that lands.

Hook implementation:
- new `toRelativePath(filePath, projectPath)` helper using node:path.relative
- CurateTaskAnalyticsState stores projectPath captured at onTaskCreate so
  per-op emits can relativize without threading task through processToolResult
- all four payload sites (curate-op, curate-run-completed, query-completed
  read_paths_with_metadata) route through the helper

Inspection test added at test/unit/.../analytics-hook-toolmode-inspection
that pretty-prints every event + payload for curate-tool-mode / query
/ query-tool-mode flows — gives PMs a single place to verify the wire
shape end-to-end.

Privacy win: relative paths drop the /Users/{name} prefix from every
file-touched event, keeping host-identifiable PII off the analytics
wire while still letting PMs reason about which file inside a project
an operation touched.

Tests: full repo 9132 passing.

* feat: [ENG-2964] M15.6 wire AnalyticsHook into lifecycleHooks + failure_kind on task_failed

Delivers M15.6's customer-facing piece: the actual wiring of AnalyticsHook
into the daemon's lifecycleHooks[] (which M14.3 built emit logic for but
never plumbed) plus the coarse failure_kind classifier on task_failed.

Per the M14 / M15.6 alignment (Option A): M14.1-M14.3 ship as the
foundation. M15.6 adds the wiring + failure_kind. M15.6's stated
"scaffolded only" stance on task_created + per-flavor schemas is
intentionally relaxed — M14 already produces those events and the
review-tightening on curate_operation_applied / read_paths_with_metadata
delivers PM-visible value that the M15.6 description deferred.

Changes:

src/server/infra/daemon/brv-server.ts:
- import AnalyticsHook
- construct it BEFORE TransportHandlers so it can land in lifecycleHooks[]
  alongside curate-log / query-log / task-history (now 4 peers, not 3)
- the isAnalyticsEnabled gate from setupFeatureHandlers binds in via a
  closure ref that defers lookup until emit-time
- capture setupFeatureHandlers's return value (was discarded), then
  setAnalyticsClient(analyticsClient) on the pre-registered hook

src/shared/analytics/events/task-failed.ts:
- new FailureKindValues + FailureKind type: 'cancelled' | 'timeout' |
  'agent_error' | 'unknown' (coarse vocab, ≤64 chars, never raw
  error.message)
- failure_kind added to TaskFailedSchema as a required field
- docblock notes the M15.6 privacy contract

src/server/infra/process/analytics-hook.ts:
- new classifyFailureKind(errorMessage) helper. Substring sentinels
  for 'timeout' / 'timed out' / 'deadline exceeded' → 'timeout';
  'agent' / 'llm' / 'provider' / 'tool' → 'agent_error'; default
  'unknown'. Raw error string NEVER ends up on the wire.
- emitTaskFailed now takes a FailureKind argument
- onTaskCancelled → emitTaskFailed(..., 'cancelled')
- onTaskError → emitTaskFailed(..., classifyFailureKind(errorMessage))

TDD:
- task-failed.test: 3 new tests pin the failure_kind shape (accepts the
  4 canonical values, rejects out-of-vocab, rejects missing)
- analytics-hook-m14.test: 3 new tests verify the classifier paths land
  on the wire — 'kaboom' → 'unknown', timeout strings → 'timeout',
  agent strings → 'agent_error'
- new analytics-hook-lifecycle-wiring.test integration covers the
  end-to-end task:create → analyticsHook.onTaskCreate path through a
  real TaskRouter, plus all four task-type variants going through
  task_created → task_completed in sequence

Full suite: 9145 passing, 0 failing. Lint clean.

* refactor: [ENG-2964] M15.6 address PR #722 review comments (7 of 7)

#1 classifyFailureKind — word-boundary regex + pinned precedence:
  Substring matching let 'tooltip' / 'engagement' / 'urgent' bucket into
  agent_error. Tightened to /\b(agent|llm|provider|tool)\b/ and
  /\b(timeout|timed out|deadline exceeded)\b/. Docblock pins precedence
  (timeout > agent_error > unknown) so future if-order shuffles can't
  silently rebucket the funnel.

#2 toRelativePath outside-project guard (privacy bug):
  path.relative('/proj','/Users/dev/other/x.md') returned
  '../../Users/dev/other/x.md' — still encoded the host layout. Same
  hole when projectPath was undefined (fell back to raw absolute path).
  Now returns '<outside-project>/<basename>' in both cases; preserves
  enough signal for backend grouping without becoming PII. Same guard
  also runs against absolute-path tails (Windows drive-letter switches).

#3 toAnalyticsTaskType drift guard:
  Replaced `daemonType as TaskType` (per CLAUDE.md anti-cast rule) with
  a TASK_TYPE_SET membership check + processLog warning + 'unknown'
  sentinel fallback. A future un-enumerated dispatch now lands a
  debuggable warning at the daemon instead of disappearing at the
  backend Zod check.

#4 dumpEvents test pollution:
  Gated behind DUMP_ANALYTICS=1. `npm test` runs the shape assertions
  silently; opt-in dump still works via
  `DUMP_ANALYTICS=1 npx mocha test/unit/.../analytics-hook-toolmode-inspection.test.ts`.

#5 FU-1 forward-compat comment:
  Added a block-comment note above the tool-mode "operations_*: 0"
  assertions so when FU-1 lands and the counters flip non-zero, the
  failure reads as "feature, update the test" rather than a regression.

#6 curate_operation_applied.related asymmetry:
  Added a TODO(M15.x) marker on the `related` field flagging the
  wire-shape asymmetry with `query_completed.read_paths_with_metadata
  .related_paths` (structured). Restructure is consumer-migration
  territory — own ticket.

#7 brv-server.ts loud-fail assertion:
  Added an explicit throw if setupFeatureHandlers returns without
  analyticsClient. Future refactors that drop the field will explode
  here instead of silently no-op'ing every emit forever.

Tests added to lock in #1, #2, #3 behavior so the review intent
doesn't drift. Existing stress-test fixtures moved from `/A/op-N.md`
under-the-root paths to `/proj/A/op-N.md` so they exercise the
in-project case (out-of-project is now its own focused test).

Full suite: 9227 passing, 0 failing. Lint clean.

* fix: [ENG-2964] M15.6 drift sentinel must be on the wire vocabulary (PR #722 re-review)

Re-review caught a real bug in the previous fixup: `toAnalyticsTaskType`
returned `'unknown' as TaskType` for un-enumerated daemon types, but
`'unknown'` was NOT in `TASK_TYPE_VALUES`. The docblock promised "keeps
the event on the wire instead of silently failing the Zod check at the
backend" — but the wire-side `z.enum(TASK_TYPE_VALUES)` actually still
rejected the row. The local Sinon trackStub doesn't run Zod, which is
why the m14 test passed even though the runtime path was broken.

Also the `as TaskType` cast moved sites but didn't disappear — the
exact anti-pattern the original review flagged.

Fix:
- Add `TaskTypes.UNKNOWN: 'unknown'` to the canonical vocabulary.
- Append `TaskTypes.UNKNOWN` to `TASK_TYPE_VALUES` so every task_*
  schema accepts the sentinel via `z.enum(TASK_TYPE_VALUES)`.
- Introduce `isCanonicalTaskType()` predicate so `toAnalyticsTaskType`
  narrows without the `as TaskType` cast. Sentinel returned as
  `TaskTypes.UNKNOWN` directly.
- Update `task-types.test` to assert the new `UNKNOWN` key/value pair.
  Existing negative test for an unknown task_type already uses
  `'not-a-real-type'` (not `'unknown'`), so it stays green.

Cosmetic: dropped the extra-space drift in the OUTSIDE_PROJECT_PATH
docblock noted in the same re-review.

Tests: 9227 passing, 0 failing.

* fix: [ENG-2964] M15.6 restore analyticsFlushScheduler.start() + Phase A e2e

E2E shake-down caught a regression my own M15.6 commit (6983e88)
introduced: when I refactored `setupFeatureHandlers`'s return capture
from destructured `{analyticsClient, analyticsFlushScheduler,
isAnalyticsEnabled}` to a bundled `featureHandlers` object, I dropped
the call to `analyticsFlushScheduler.start()` that lived right after
the destructure (originally landed in M4.3 / ENG-2645) AND lost the
`analyticsFinalFlush` shutdown hook bound to the ShutdownHandler.

Symptom: every `track()` call still landed in JSONL with status='pending',
but the periodic 30s flush tick never armed (start() un-called) and
shutdown never force-flushed (analyticsFinalFlush undefined). Events
sat on disk forever. Production-affecting; the existing
dev-beta.e2e.ts kept passing only because its CI run uses a real
backend that the periodic flush never reached.

Fix in brv-server.ts:
- Re-add `let analyticsFinalFlush: (() => Promise<void>) | undefined`
  before ShutdownHandler construction so the shutdown closure can
  reference it via late-bind (same mutable-holder pattern M4.3 used).
- Pass `analyticsFinalFlush: () => analyticsFinalFlush?.() ?? Promise.resolve()`
  to ShutdownHandler.
- After the M15.6 late-bind of setAnalyticsClient, call
  `featureHandlers.analyticsFlushScheduler.start()` and assign the
  final-flush closure (`stop() + flushFinal({timeoutMs: 3000})`).

E2E coverage (Phase A — 5 cancel scenarios):
- New test/e2e/analytics/lifecycle-wire.e2e.ts drives real daemon +
  transport client + in-process HTTP capture stub.
- Each scenario: task:create → 50ms wait → task:cancel → wait for the
  natural 30s flush tick → assert wire payload.
- Coverage: curate-html-direct (alias → curate-tool-mode on wire),
  curate-folder (via the same flow), query-tool-mode, dream-scan,
  dream-finalize, search.
- Assertions: task_created + task_failed{failure_kind=cancelled}
  reach the wire with the right task_type. dream / search assert no
  per-flavor M12 row (only the generic task_* pair).
- Wire-shape sanity per request: device-id header, brv-cli/* UA,
  super-properties (cli_version, os, node_version, environment,
  device_id) on every event payload.

Gated by `npm run test:e2e:lifecycle` (180s timeout) — not part of
default `npm test` glob. ~2m end-to-end. Matches dev-beta.e2e.ts
precedent.

Phase B (super-props deep checks, JSONL/HTTP parity) deferred to a
follow-up commit.

* test: [ENG-2964] M15.6 add DB-roundtrip e2e for analytics lifecycle

Adds `npm run test:e2e:db` covering the FULL chain from `brv` CLI through
real HTTP to the local telemetry container into postgres `raw_events`.
Complements the Phase A `test:e2e:lifecycle` in-process capture suite —
that one stops at the daemon's HTTP request body; this one proves the
records actually land in the partitioned `raw_events` table with the
documented event names, task_type alias translation, failure_kind, and
shared device_id.

Three scenarios:
- curate-tool-mode (alias-translated from curate-html-direct)
- query-tool-mode
- search (no per-flavor M12 event, only task_created + task_failed)

Test reaches postgres via `docker exec ... psql -t -A` against the
existing byterover-telemetry compose stack (no host psql dependency).
A `before()` precondition skips the suite with a clear migration hint
when raw_events is missing, and `afterEach` preserves the scenario
temp dirs on failure so daemon logs + JSONL can be inspected post-hoc.

Not picked up by `npm test` (default glob excludes test/e2e/).

* test: [ENG-2964] M15.6 expand DB e2e — dream/isolation/wire-shape coverage

Grows `npm run test:e2e:db` from 3 → 7 scenarios. Adds dream-scan +
dream-finalize roundtrips (previously absent from DB coverage), a
deep wire-shape sanity check on a curate scenario, and a two-task
isolation guard that confirms task_id selectors don't bleed events
across concurrent runs.

New per-row sanity helper (`assertRowShape`) is invoked from every
scenario and pins:
  - schema_version = 1
  - cli_version matches semver
  - os ∈ {darwin, linux, win32}
  - node_version starts with `v<digit>`
  - environment ∈ {development, production}
  - device_id non-empty + identical across all rows of a task
  - user_id null (anonymous emits — test never authenticates)

Per-flavor payload assertions added:
  - curate_run_completed: outcome='cancelled', duration_ms ≥ 0, all
    operations_* counters and pending_review_count = 0 (we cancel
    before any tool calls land)
  - query_completed: outcome='cancelled', duration_ms ≥ 0,
    read_tool_call_count = 0, search_call_count = 0
  - task_created: has_files=false, has_folder=false
  - task_failed: failure_kind='cancelled'

Wire-shape sanity scenario adds:
  - client_timestamp ≤ received_at + 5s budget
  - both timestamps fall in [test_start - 10s, test_end + 10s]
  - properties.device_id === identity_device_id column (M4.1 stamp
    matches the request header)

Isolation scenario:
  - drives a curate-tool-mode task and a query-tool-mode task in
    sequence against the same daemon
  - asserts no task_id bleed, task_type partitions cleanly along
    task_id, both share device_id (single daemon → single config)

Cutover note: switched all e2e dispatches from 'curate-html-direct'
to 'curate-tool-mode' after the proj branch merge — TaskTypeSchema
only accepts the canonical name on the wire now. The legacy alias
remains in AnalyticsHook's `toAnalyticsTaskType` and is exercised
by the integration test directly, not via the transport.

Suite runtime: ~5 min (7 scenarios × ~30s each), all green.

* fix: [ENG-2964] use 'curate-tool-mode' through transport in lifecycle tests

After the proj/analytics-system-tool-mode merge, TaskTypeSchema only
accepts the canonical 'curate-tool-mode' over the wire — ENG-2925's
rename fully landed and 'curate-html-direct' is no longer enumerated
on `TaskCreateRequest` / `TaskExecute`. Tests that dispatch through
TaskRouter (the integration lifecycle-wiring suite + the e2e wire
capture suite) were rejected at the Zod boundary, never reached
AnalyticsHook, and asserted on undefined emits.

The merge resolution I picked for the integration test was wrong — I
kept the alias in HEAD on the assumption it exercised the alias
translation, but the transport boundary now strips it long before
AnalyticsHook gets a chance to translate.

The AnalyticsHook unit tests (analytics-hook-m14.test.ts, et al)
still exercise the alias path by constructing TaskInfo directly and
invoking `hook.onTaskCreate(task)` — those bypass TaskRouter, so
'curate-html-direct' remains a valid test input there and the
`toAnalyticsTaskType` aliasing is still covered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants