Feat/eng 2964 by cuongdo-byterover · Pull Request #722 · campfirein/byterover-cli

cuongdo-byterover · 2026-05-27T13:43:01Z

Summary

Problem:
Why it matters:
What changed:
What did NOT change (scope boundary):

Type of change

Scope (select all touched areas)

Linked issues

Closes #
Related #

Root cause (bug fixes only, otherwise write `N/A`)

Root cause:
Why this was not caught earlier:

Test plan

Coverage added:
- Unit test
- Integration test
- Manual verification only
Test file(s):
Key scenario(s) covered:

User-visible changes

List user-visible changes (including defaults, config, or CLI output).
If none, write None.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording

Checklist

Tests added or updated and passing (npm test)
Lint passes (npm run lint)
Type check passes (npm run typecheck)
Build succeeds (npm run build)
Commits follow Conventional Commits format
Documentation updated (if applicable)
No breaking changes (or clearly documented above)
Branch is up to date with main

Risks and mitigations

List real risks for this PR. If none, write None.

Risk:
- Mitigation:

…sk types Adds curate-tool-mode, query-tool-mode, dream-scan, dream-finalize to TaskTypes + TASK_TYPE_VALUES. After ENG-2925's rename the daemon dispatches these four types but the analytics enum still predates v4.0 — task_created / task_completed / task_failed silently rejected every tool-mode emit at the Zod boundary. - TaskTypes keeps the legacy 'curate' / 'query' / 'dream' values for back-compat with any constructor still building those payloads - per-event schemas (task_created / task_completed / task_failed) pick up the new types automatically via z.enum(TASK_TYPE_VALUES) - M12 per-flavor schemas (curate_run_completed / query_completed) still hardcode their own literals and continue to reject tool-mode types here — M14.2 migrates them to the canonical enum as a follow-up TDD: - new task-types tests assert TaskCreated/Completed/Failed accept all four new types - regression tests pin the M12 schemas' continued rejection so M14.2 has a clear flip point

…sk_type to TASK_TYPE_VALUES Migrates the two M12 per-flavor schemas from hardcoded literal task_type values to the canonical TASK_TYPE_VALUES enum so v4.0 tool-mode tasks round-trip the wire boundary instead of being silently Zod-rejected inside AnalyticsHook. - curate_run_completed: z.enum(['curate', 'curate-folder']) → z.enum(TASK_TYPE_VALUES). curate-tool-mode payloads now validate. - query_completed: z.literal('query') → z.enum(TASK_TYPE_VALUES). query-tool-mode payloads now validate. The schemas no longer structurally constrain task_type to the curate or query family; the hook is trusted to only emit each event for the right flavor. Docblocks call out the widening for the next maintainer. TDD: - curate-run-completed.test asserts curate-tool-mode + legacy values both succeed; an unknown task_type still rejects - query-completed.test mirrors the same coverage for query-tool-mode - task-types.test M14.1 regression assertions flipped from rejection to acceptance for the M12 schemas; M14.1 docblock comments updated Without this, M14.3's hook code would land but the M12 emits that fire alongside the new generic task_* emits would silently disappear on every tool-mode task — that's the bug operators noticed in Mixpanel.

…ed in AnalyticsHook Adds the three generic funnel-event emits described in M14's customer ask. Every daemon task (curate / curate-folder / curate-tool-mode (aliased from curate-html-direct) / query / query-tool-mode / dream-scan / dream-finalize / search) now produces, in order: task_created onTaskCreate (funnel entry — unconditional) ...optional per-op + M12 per-flavor terminal emit task_completed onTaskCompleted (terminal-event-last on success) task_failed onTaskError / onTaskCancelled (terminal-event-last) Three coupled changes shipped together so the wire stays consistent: 1. analytics-hook.ts grows three emits and a `toAnalyticsTaskType` alias-translator. The daemon still dispatches the pre-ENG-2925 name `curate-html-direct`; analytics canonicalises it to the post-rename `curate-tool-mode` so the wire enum matches TASK_TYPE_VALUES. Once ENG-2925 lands, the alias becomes a no-op identity. 2. CURATE_TASK_TYPES + QUERY_TASK_TYPES gain the tool-mode names so M12 per-flavor state init kicks in for tool-mode curates / queries. M12 counters stay all-zero on tool-mode today (no LLM tool calls fire) — that's a separate follow-up (FU-1 in plans/analytics-m14/follow-ups). 3. Both M12 payload builders route `task_type` through the same alias so curate-tool-mode tasks emit `task_type='curate-tool-mode'` on the curate_run_completed / query_completed events too. TDD: - new analytics-hook-m14.test.ts: 15 simulation tests covering every task type, the curate-html-direct → curate-tool-mode alias, has_files / has_folder semantics, both terminal paths, and the ordering invariant - existing analytics-hook.test.ts updated to filter out the new generic emits via `filterM12()` so M12-focused assertions keep their intent - integration stress tests updated for the new 13-event sequence (task_created → ops → curate_run_completed → task_completed) - repo full suite: 9121 passing, 0 failing

…ords/tags, structured related_paths Tightens the curate / query M12 payloads per code review: curate_operation_applied: - rename absolute_path → relative_path (project-relative via path.relative against the task's projectPath; falls back to identity when projectPath is unset so search tasks at the daemon root still emit a usable string) - keywords / tags promoted from optional to required arrays (default []) so the wire shape stays uniform regardless of frontmatter read success query_completed.read_paths_with_metadata: - same absolute_path → relative_path rename - same keywords / tags promotion (required arrays, default []) - flat related: string[] → structured related_paths: [{relative_path, keywords, tags}] so each linked topic carries its own metadata slot. Keywords/tags default to [] until a future FU cascade-reads each linked file's frontmatter — the wire shape doesn't need to change when that lands. Hook implementation: - new `toRelativePath(filePath, projectPath)` helper using node:path.relative - CurateTaskAnalyticsState stores projectPath captured at onTaskCreate so per-op emits can relativize without threading task through processToolResult - all four payload sites (curate-op, curate-run-completed, query-completed read_paths_with_metadata) route through the helper Inspection test added at test/unit/.../analytics-hook-toolmode-inspection that pretty-prints every event + payload for curate-tool-mode / query / query-tool-mode flows — gives PMs a single place to verify the wire shape end-to-end. Privacy win: relative paths drop the /Users/{name} prefix from every file-touched event, keeping host-identifiable PII off the analytics wire while still letting PMs reason about which file inside a project an operation touched. Tests: full repo 9132 passing.

…re_kind on task_failed Delivers M15.6's customer-facing piece: the actual wiring of AnalyticsHook into the daemon's lifecycleHooks[] (which M14.3 built emit logic for but never plumbed) plus the coarse failure_kind classifier on task_failed. Per the M14 / M15.6 alignment (Option A): M14.1-M14.3 ship as the foundation. M15.6 adds the wiring + failure_kind. M15.6's stated "scaffolded only" stance on task_created + per-flavor schemas is intentionally relaxed — M14 already produces those events and the review-tightening on curate_operation_applied / read_paths_with_metadata delivers PM-visible value that the M15.6 description deferred. Changes: src/server/infra/daemon/brv-server.ts: - import AnalyticsHook - construct it BEFORE TransportHandlers so it can land in lifecycleHooks[] alongside curate-log / query-log / task-history (now 4 peers, not 3) - the isAnalyticsEnabled gate from setupFeatureHandlers binds in via a closure ref that defers lookup until emit-time - capture setupFeatureHandlers's return value (was discarded), then setAnalyticsClient(analyticsClient) on the pre-registered hook src/shared/analytics/events/task-failed.ts: - new FailureKindValues + FailureKind type: 'cancelled' | 'timeout' | 'agent_error' | 'unknown' (coarse vocab, ≤64 chars, never raw error.message) - failure_kind added to TaskFailedSchema as a required field - docblock notes the M15.6 privacy contract src/server/infra/process/analytics-hook.ts: - new classifyFailureKind(errorMessage) helper. Substring sentinels for 'timeout' / 'timed out' / 'deadline exceeded' → 'timeout'; 'agent' / 'llm' / 'provider' / 'tool' → 'agent_error'; default 'unknown'. Raw error string NEVER ends up on the wire. - emitTaskFailed now takes a FailureKind argument - onTaskCancelled → emitTaskFailed(..., 'cancelled') - onTaskError → emitTaskFailed(..., classifyFailureKind(errorMessage)) TDD: - task-failed.test: 3 new tests pin the failure_kind shape (accepts the 4 canonical values, rejects out-of-vocab, rejects missing) - analytics-hook-m14.test: 3 new tests verify the classifier paths land on the wire — 'kaboom' → 'unknown', timeout strings → 'timeout', agent strings → 'agent_error' - new analytics-hook-lifecycle-wiring.test integration covers the end-to-end task:create → analyticsHook.onTaskCreate path through a real TaskRouter, plus all four task-type variants going through task_created → task_completed in sequence Full suite: 9145 passing, 0 failing. Lint clean.

github-actions · 2026-05-27T13:43:31Z

Claude finished @cuongdo-byterover's task in 5m 26s —— View job

Claude Code PR Review

Load review tools
Examine PR diff against origin/proj/analytics-system-tool-mode
Review analytics-hook.ts changes (largest delta)
Review event schemas (task-failed, query-completed, curate-*)
Review test coverage (new and modified tests)
Post inline comments for specific issues
Post final summary with score

Summary

This PR wires AnalyticsHook as the 4th lifecycle peer in TaskRouter, adds failure_kind to task_failed, extends TASK_TYPE_VALUES to cover the v4.0 tool-mode types (curate-tool-mode, query-tool-mode, dream-scan, dream-finalize), and tightens the M14-review wire shape (absolute_path → relative_path, required keywords / tags arrays, structured related_paths).

The implementation is solid: the per-task FIFO queue logic in onToolResult is well-reasoned, the late-bind pattern in brv-server.ts for setAnalyticsClient / isAnalyticsEnabled is clean given the boot ordering, and the test coverage is genuinely good — analytics-hook-m14.test.ts and analytics-hook-lifecycle-wiring.test.ts cover the cross-product of (task type × outcome) thoroughly, including the daemon→wire alias for curate-html-direct → curate-tool-mode.

Findings worth addressing before merge:

#	Severity	Finding
1	`issue (privacy)`	`toRelativePath` doesn't guard against `..` parent traversal when the read file is outside `projectPath`. Symlinks, linked sources (`brv source add`), and unset `projectPath` all produce wire strings that still leak the absolute filesystem layout.
2	`suggestion`	`classifyFailureKind` uses raw `.includes()` for `'tool'` / `'agent'` / `'llm'` / `'provider'`. Matches `'tooltip'`, `'engagement'`, etc. → false `agent_error` classifications. Recommend word-boundary regex + a comment documenting the timeout > agent > unknown precedence.
3	`issue (test hygiene)`	`analytics-hook-toolmode-inspection.test.ts` writes every emitted event to `console.log` via `dumpEvents`. Matches `test/*/.test.ts` so it spams every `npm test`. CLAUDE.md explicitly calls this out.
4	`nitpick (type safety)`	`toAnalyticsTaskType` returns `daemonType as TaskType` unchecked. A future daemon type added without updating `TASK_TYPE_VALUES` silently emits events the backend zod-rejects, with no daemon-side signal.
5	`nitpick (forward-compat)`	The "counters all-zero today" assertions in `analytics-hook-toolmode-inspection.test.ts` pin a known limitation as a positive expectation — these will read like a regression when FU-1 lands.
6	`question (consistency)`	`curate_operation_applied.related` stayed as flat `string[]` while `query_completed.read_paths_with_metadata[].related_paths` was tightened to structured objects. Asymmetry is fine if deliberate, but a `TODO` / tracking link would prevent the shape ossifying in two idioms.
7	`praise`	The `lifecycleHooks[]` late-bind pattern in `brv-server.ts` (register hook early, inject client after `setupFeatureHandlers` resolves) is well-shaped. One tiny defensive suggestion left inline about `isEnabled` defaulting to `false`.

None of these are merge-blockers individually — #1 is the one I'd most want to fix on this PR (it's a privacy regression masquerading as a no-op), and #3 will quietly degrade test ergonomics for everyone if it isn't caught now.

The schema additions, test coverage, async ordering invariants, and the boot-time wire-up all look correct. The PII surface area is the main thing to harden.

Overall Score: 4 / 5

Strong, well-tested implementation; the privacy guard on toRelativePath and the test-output cleanup hold it back from a 5.
· Branch feat/ENG-2964

#1 classifyFailureKind — word-boundary regex + pinned precedence: Substring matching let 'tooltip' / 'engagement' / 'urgent' bucket into agent_error. Tightened to /\b(agent|llm|provider|tool)\b/ and /\b(timeout|timed out|deadline exceeded)\b/. Docblock pins precedence (timeout > agent_error > unknown) so future if-order shuffles can't silently rebucket the funnel. #2 toRelativePath outside-project guard (privacy bug): path.relative('/proj','/Users/dev/other/x.md') returned '../../Users/dev/other/x.md' — still encoded the host layout. Same hole when projectPath was undefined (fell back to raw absolute path). Now returns '<outside-project>/<basename>' in both cases; preserves enough signal for backend grouping without becoming PII. Same guard also runs against absolute-path tails (Windows drive-letter switches). #3 toAnalyticsTaskType drift guard: Replaced `daemonType as TaskType` (per CLAUDE.md anti-cast rule) with a TASK_TYPE_SET membership check + processLog warning + 'unknown' sentinel fallback. A future un-enumerated dispatch now lands a debuggable warning at the daemon instead of disappearing at the backend Zod check. #4 dumpEvents test pollution: Gated behind DUMP_ANALYTICS=1. `npm test` runs the shape assertions silently; opt-in dump still works via `DUMP_ANALYTICS=1 npx mocha test/unit/.../analytics-hook-toolmode-inspection.test.ts`. #5 FU-1 forward-compat comment: Added a block-comment note above the tool-mode "operations_*: 0" assertions so when FU-1 lands and the counters flip non-zero, the failure reads as "feature, update the test" rather than a regression. #6 curate_operation_applied.related asymmetry: Added a TODO(M15.x) marker on the `related` field flagging the wire-shape asymmetry with `query_completed.read_paths_with_metadata .related_paths` (structured). Restructure is consumer-migration territory — own ticket. #7 brv-server.ts loud-fail assertion: Added an explicit throw if setupFeatureHandlers returns without analyticsClient. Future refactors that drop the field will explode here instead of silently no-op'ing every emit forever. Tests added to lock in #1, #2, #3 behavior so the review intent doesn't drift. Existing stress-test fixtures moved from `/A/op-N.md` under-the-root paths to `/proj/A/op-N.md` so they exercise the in-project case (out-of-project is now its own focused test). Full suite: 9227 passing, 0 failing. Lint clean.

github-actions · 2026-05-27T14:10:14Z

Claude finished @cuongdo-byterover's task in 3m 55s —— View job

Claude Code PR Review (re-review after fixups)

Load review tools
Diff feat/ENG-2964 against origin/proj/analytics-system-tool-mode
Verify the 7 previous review findings were addressed in commit 190f4b4
Spot-check analytics-hook.ts (toRelativePath, classifyFailureKind, toAnalyticsTaskType) and inspection test
Look for any new issues introduced by the fixup
Post final summary with updated score

What got addressed

#	Original finding	Status	Note
1	`classifyFailureKind` substring matching	✅ Resolved	Word-boundary regexes (`/\b(timeout\|timed out\|deadline exceeded)\b/`, `/\b(agent\|llm\|provider\|tool)\b/`). Precedence pinned in the docblock. Two new tests in `analytics-hook-m14.test.ts` (`tooltip/engagement` → unknown; `llm provider timeout` → timeout).
2	`toRelativePath` PII leak on `..` / unset projectPath	✅ Resolved	Returns `<outside-project>/<basename>` for parent-traversal, absolute (Windows drive switch), and `projectPath===undefined`. Two new tests cover both branches. Stress-test fixtures rebased onto `/proj/A/op-N.md` so the in-project case is also exercised.
3	`toAnalyticsTaskType` unchecked widening	⚠️ Partially resolved — new issue, see below	Drift logged + sentinel returned, but the sentinel `'unknown'` isn't on the wire vocabulary.
4	Inspection test `console.log` noise	✅ Resolved	Gated behind `process.env.DUMP_ANALYTICS === '1'`. `npm test` is quiet; the diagnostic is opt-in.
5	FU-1 forward-compat comment	✅ Resolved	Block comment in `analytics-hook-toolmode-inspection.test.ts:106-109` explicitly tells future readers "this flips to non-zero in FU-1 — feature, not regression."
6	`curate_operation_applied.related` shape asymmetry	✅ Resolved (TODO marker)	`TODO(M15.x)` added at `curate-operation-applied.ts:27-30` pointing at the harmonization ticket.
7	Loud-fail on missing `analyticsClient`	✅ Resolved	`brv-server.ts:686-688` throws explicitly. The first defensive note (default `isAnalyticsEnabledRef` to `() => false`) was not taken, but it's strictly defensive — the boot-time race is already documented.

One real new issue from the fixup ⚠️

toAnalyticsTaskType drift sentinel doesn't actually keep the event on the wire — src/server/infra/process/analytics-hook.ts:31 + 47-52

const UNKNOWN_TASK_TYPE: TaskType = 'unknown' as TaskType

function toAnalyticsTaskType(daemonType: string): TaskType {
  if (daemonType === 'curate-html-direct') return TaskTypes.CURATE_TOOL_MODE
  if (ANALYTICS_TASK_TYPE_SET.has(daemonType)) return daemonType as TaskType
  processLog(`AnalyticsHook: unknown task type '${daemonType}' — falling back to '${UNKNOWN_TASK_TYPE}'`)
  return UNKNOWN_TASK_TYPE
}

The docblock promises this "keeps the event on the wire instead of silently failing the Zod check at the backend." It doesn't — 'unknown' is not in TASK_TYPE_VALUES:

src/shared/analytics/task-types.ts:31-41 — TASK_TYPE_VALUES lists 9 types, none of them 'unknown'.
Every task_* schema validates via z.enum(TASK_TYPE_VALUES) — see task-failed.ts:38, task-created.ts:21, task-completed.ts:17, query-completed.ts:69, curate-run-completed.ts:30.
test/unit/shared/analytics/task-types.test.ts:96-107 is the smoking gun: the suite already asserts the schemas reject unknown task_types.

What actually happens at runtime: AnalyticsClient.track() (analytics-client.ts:230-249) and JsonlAnalyticsStore.append() don't run the per-event schema, so the event persists locally and the daemon logs nothing wrong. But the moment the batch ships, the backend Zod check fails the row — the same failure mode the guard claims to prevent, just with a noisier wire payload.

Secondary concern: 'unknown' as TaskType is the exact as Type widening CLAUDE.md forbids and that the original review #4 called out. The cast at the unsafe site moved to a cast on the constant.

Two ways to actually close the gap, pick one:

Add 'unknown' to the canonical vocabulary — TaskTypes.UNKNOWN: 'unknown' in TaskTypes, appended to TASK_TYPE_VALUES. The Zod schema then accepts the sentinel, the cast disappears, and the m14 test at line 269-275 holds for real.
Don't emit the funnel event at all for unknown types — return undefined from toAnalyticsTaskType, have callers skip the emit. The log line still gives a daemon-side breadcrumb. Trade-off: an un-enumerated type vanishes from analytics entirely (no funnel row) instead of bucketing as unknown.

Option 1 matches the docblock's stated intent better and is the smaller change. Either way, this is the right PR to close the gap because the new m14 drift-guard test (analytics-hook-m14.test.ts:269-275) currently uses a Sinon stub that never runs Zod — a real integration would catch this.

Fix this →

Smaller observations (non-blocking)

src/server/infra/process/analytics-hook.ts:55-59 — the docblock for OUTSIDE_PROJECT_PATH has a leading-space indentation drift (lines 56-57 carry an extra space). Cosmetic.
src/server/infra/process/analytics-hook.ts:74 — basename is taken from the absolute path. On Windows running against a POSIX-style absolute string (or vice versa), the basename split won't honor the foreign separator. Realistically the daemon and the agent are on the same host so this never bites, but worth a thought if cross-platform mocking ever enters scope.
analytics-hook.test.ts was substantially reworked (+121/−81). The analytics-hook-async-stress.test.ts fixture rebase to /proj/... is the right move — it isolates the outside-project case to its own focused test instead of conflating it with serialization invariants.

Praise (post-fix)

The m14.test.ts additions are the right shape: every new branch (word-boundary classifier, outside-project guard, drift-guard sentinel, classifier precedence) gets a tight named test with a (PR #722) tag in the description. That makes review-comment provenance auditable from the test suite years from now — quietly excellent practice.

Overall Score: 4 / 5

The privacy fix, classifier hardening, test hygiene gating, and loud-fail assertion all landed clean and the new test coverage is thorough. Holding at 4 because the drift-guard sentinel (#3 in the previous review) didn't actually close the gap it claims to — 'unknown' isn't a valid wire value, so the backend still drops the row. Adding TaskTypes.UNKNOWN to the canonical vocabulary closes the loop and gets this to 5.
· Branch feat/ENG-2964

#722 re-review) Re-review caught a real bug in the previous fixup: `toAnalyticsTaskType` returned `'unknown' as TaskType` for un-enumerated daemon types, but `'unknown'` was NOT in `TASK_TYPE_VALUES`. The docblock promised "keeps the event on the wire instead of silently failing the Zod check at the backend" — but the wire-side `z.enum(TASK_TYPE_VALUES)` actually still rejected the row. The local Sinon trackStub doesn't run Zod, which is why the m14 test passed even though the runtime path was broken. Also the `as TaskType` cast moved sites but didn't disappear — the exact anti-pattern the original review flagged. Fix: - Add `TaskTypes.UNKNOWN: 'unknown'` to the canonical vocabulary. - Append `TaskTypes.UNKNOWN` to `TASK_TYPE_VALUES` so every task_* schema accepts the sentinel via `z.enum(TASK_TYPE_VALUES)`. - Introduce `isCanonicalTaskType()` predicate so `toAnalyticsTaskType` narrows without the `as TaskType` cast. Sentinel returned as `TaskTypes.UNKNOWN` directly. - Update `task-types.test` to assert the new `UNKNOWN` key/value pair. Existing negative test for an unknown task_type already uses `'not-a-real-type'` (not `'unknown'`), so it stays green. Cosmetic: dropped the extra-space drift in the OUTSIDE_PROJECT_PATH docblock noted in the same re-review. Tests: 9227 passing, 0 failing.

github-actions · 2026-05-27T14:29:01Z

Claude finished @cuongdo-byterover's task in 2m 19s —— View job

Claude Code PR Review (3rd pass — drift-sentinel fix)

Load review tools
Diff current HEAD against origin/proj/analytics-system-tool-mode (focus on commit 8d28f4b)
Verify drift sentinel (TaskTypes.UNKNOWN) is now on the wire vocabulary
Re-check analytics-hook.ts, task-types.ts, and the m14 drift-guard test
Look for any new issues introduced by the fixup
Post final summary with updated score

What landed in `8d28f4b`

#	Previous-review issue	Status	Evidence
3 (re-review)	Drift sentinel `'unknown'` returned but not in `TASK_TYPE_VALUES` → backend Zod still drops the row	✅ Resolved	`TaskTypes.UNKNOWN: 'unknown'` added at `src/shared/analytics/task-types.ts:30`; appended to `TASK_TYPE_VALUES` at `:50`. Every task_* schema validates via `z.enum(TASK_TYPE_VALUES)` (`task-created.ts:21`, `task-completed.ts:17`, `task-failed.ts:38`, `query-completed.ts:69`, `curate-run-completed.ts:30`) — the sentinel now actually rides the wire.
3 (orig)	`daemonType as TaskType` cast	✅ Resolved	Replaced with a `isCanonicalTaskType(value): value is TaskType` user-defined type guard at `analytics-hook.ts:32`. The unsafe cast at the call site is gone. The cast moved to one tightly scoped `(ANALYTICS_TASK_TYPE_SET as Set<string>).has(value)` inside the predicate, which is the standard pattern for widening a `Set<T>.has` argument.
—	OUTSIDE_PROJECT_PATH docblock cosmetic drift	✅ Resolved	Leading-space drift on lines 56–57 cleaned up; cosmetic only.
—	Default `isAnalyticsEnabledRef` to `() => false` (optional defensive nit)	⚪ Not taken	`brv-server.ts:378` still defaults to `() => true`. Acknowledged in the previous review as "strictly defensive — boot-time race is already documented." Fine to leave.

Verification chain (why this is genuinely closed, not just moved)

TaskTypes.UNKNOWN is now part of Object.values(TaskTypes).
TASK_TYPE_VALUES is asserted-equal to Object.values(TaskTypes) by the invariant test at test/unit/shared/analytics/task-types.test.ts:51-53 — so any future drift between the const-object and the tuple fails CI.
All five task_* schemas validate task_type against z.enum(TASK_TYPE_VALUES) (grep confirmed). The wire-side check now accepts 'unknown' as a legitimate row value rather than dropping it.
The negative test "rejects an unknown task_type" (task-types.test.ts:102-113) uses 'not-a-real-type', not 'unknown', so it's untouched by adding the sentinel.
The m14 drift-guard test (analytics-hook-m14.test.ts:268-275) still uses a Sinon stub (no live Zod), but the invariant in step 2 covers the gap the previous review flagged — a regression where TaskTypes.UNKNOWN gets added without TASK_TYPE_VALUES being updated would fail at task-types.test.ts:52, before the analytics-hook layer ever runs.

One small forward-looking observation (non-blocking)

isCanonicalTaskType performs membership against Object.values(TaskTypes). That's a strict superset of TASK_TYPE_VALUES only if those two drift — but the invariant test from step 2 above pins them equal, so this is currently safe. If a future contributor finds a reason to maintain those independently (e.g. internal-only TaskTypes keys not on the wire), the predicate would lie. The TASK_TYPE_VALUES-equality test is the load-bearing invariant; worth keeping in mind for any future refactor.

Praise

The fix is exactly the right shape: smallest change that closes the real gap (TaskTypes.UNKNOWN lives in the canonical vocabulary, predicate eliminates the cast). The commit message accurately diagnoses why the previous fixup didn't work (Sinon stub doesn't run Zod) and explains what changed. That diagnosis is more valuable than the diff itself — it's the kind of failure analysis that prevents the same class of bug in a year.

Overall Score: 5 / 5

The drift-sentinel gap is genuinely closed — 'unknown' is on the wire vocabulary, every task_* schema accepts it via z.enum(TASK_TYPE_VALUES), and the invariant test at task-types.test.ts:51-53 is the load-bearing safety net for future schema/vocabulary drift. The unchecked as TaskType widening that CLAUDE.md forbids is gone. Privacy guard (toRelativePath), classifier hardening, test hygiene, and the loud-fail assertion from the prior pass all remain intact. Ready to merge.
· Branch feat/ENG-2964

* feat: [ENG-2969] M14.1 extend TASK_TYPE_VALUES with v4.0 tool-mode task types Adds curate-tool-mode, query-tool-mode, dream-scan, dream-finalize to TaskTypes + TASK_TYPE_VALUES. After ENG-2925's rename the daemon dispatches these four types but the analytics enum still predates v4.0 — task_created / task_completed / task_failed silently rejected every tool-mode emit at the Zod boundary. - TaskTypes keeps the legacy 'curate' / 'query' / 'dream' values for back-compat with any constructor still building those payloads - per-event schemas (task_created / task_completed / task_failed) pick up the new types automatically via z.enum(TASK_TYPE_VALUES) - M12 per-flavor schemas (curate_run_completed / query_completed) still hardcode their own literals and continue to reject tool-mode types here — M14.2 migrates them to the canonical enum as a follow-up TDD: - new task-types tests assert TaskCreated/Completed/Failed accept all four new types - regression tests pin the M12 schemas' continued rejection so M14.2 has a clear flip point * fix: [ENG-2970] M14.2 relax curate_run_completed + query_completed task_type to TASK_TYPE_VALUES Migrates the two M12 per-flavor schemas from hardcoded literal task_type values to the canonical TASK_TYPE_VALUES enum so v4.0 tool-mode tasks round-trip the wire boundary instead of being silently Zod-rejected inside AnalyticsHook. - curate_run_completed: z.enum(['curate', 'curate-folder']) → z.enum(TASK_TYPE_VALUES). curate-tool-mode payloads now validate. - query_completed: z.literal('query') → z.enum(TASK_TYPE_VALUES). query-tool-mode payloads now validate. The schemas no longer structurally constrain task_type to the curate or query family; the hook is trusted to only emit each event for the right flavor. Docblocks call out the widening for the next maintainer. TDD: - curate-run-completed.test asserts curate-tool-mode + legacy values both succeed; an unknown task_type still rejects - query-completed.test mirrors the same coverage for query-tool-mode - task-types.test M14.1 regression assertions flipped from rejection to acceptance for the M12 schemas; M14.1 docblock comments updated Without this, M14.3's hook code would land but the M12 emits that fire alongside the new generic task_* emits would silently disappear on every tool-mode task — that's the bug operators noticed in Mixpanel. * feat: [ENG-2971] M14.3 wire task_created + task_completed + task_failed in AnalyticsHook Adds the three generic funnel-event emits described in M14's customer ask. Every daemon task (curate / curate-folder / curate-tool-mode (aliased from curate-html-direct) / query / query-tool-mode / dream-scan / dream-finalize / search) now produces, in order: task_created onTaskCreate (funnel entry — unconditional) ...optional per-op + M12 per-flavor terminal emit task_completed onTaskCompleted (terminal-event-last on success) task_failed onTaskError / onTaskCancelled (terminal-event-last) Three coupled changes shipped together so the wire stays consistent: 1. analytics-hook.ts grows three emits and a `toAnalyticsTaskType` alias-translator. The daemon still dispatches the pre-ENG-2925 name `curate-html-direct`; analytics canonicalises it to the post-rename `curate-tool-mode` so the wire enum matches TASK_TYPE_VALUES. Once ENG-2925 lands, the alias becomes a no-op identity. 2. CURATE_TASK_TYPES + QUERY_TASK_TYPES gain the tool-mode names so M12 per-flavor state init kicks in for tool-mode curates / queries. M12 counters stay all-zero on tool-mode today (no LLM tool calls fire) — that's a separate follow-up (FU-1 in plans/analytics-m14/follow-ups). 3. Both M12 payload builders route `task_type` through the same alias so curate-tool-mode tasks emit `task_type='curate-tool-mode'` on the curate_run_completed / query_completed events too. TDD: - new analytics-hook-m14.test.ts: 15 simulation tests covering every task type, the curate-html-direct → curate-tool-mode alias, has_files / has_folder semantics, both terminal paths, and the ordering invariant - existing analytics-hook.test.ts updated to filter out the new generic emits via `filterM12()` so M12-focused assertions keep their intent - integration stress tests updated for the new 13-event sequence (task_created → ops → curate_run_completed → task_completed) - repo full suite: 9121 passing, 0 failing * refactor: [ENG-2971] M14.3 review fix — relative paths, required keywords/tags, structured related_paths Tightens the curate / query M12 payloads per code review: curate_operation_applied: - rename absolute_path → relative_path (project-relative via path.relative against the task's projectPath; falls back to identity when projectPath is unset so search tasks at the daemon root still emit a usable string) - keywords / tags promoted from optional to required arrays (default []) so the wire shape stays uniform regardless of frontmatter read success query_completed.read_paths_with_metadata: - same absolute_path → relative_path rename - same keywords / tags promotion (required arrays, default []) - flat related: string[] → structured related_paths: [{relative_path, keywords, tags}] so each linked topic carries its own metadata slot. Keywords/tags default to [] until a future FU cascade-reads each linked file's frontmatter — the wire shape doesn't need to change when that lands. Hook implementation: - new `toRelativePath(filePath, projectPath)` helper using node:path.relative - CurateTaskAnalyticsState stores projectPath captured at onTaskCreate so per-op emits can relativize without threading task through processToolResult - all four payload sites (curate-op, curate-run-completed, query-completed read_paths_with_metadata) route through the helper Inspection test added at test/unit/.../analytics-hook-toolmode-inspection that pretty-prints every event + payload for curate-tool-mode / query / query-tool-mode flows — gives PMs a single place to verify the wire shape end-to-end. Privacy win: relative paths drop the /Users/{name} prefix from every file-touched event, keeping host-identifiable PII off the analytics wire while still letting PMs reason about which file inside a project an operation touched. Tests: full repo 9132 passing. * feat: [ENG-2964] M15.6 wire AnalyticsHook into lifecycleHooks + failure_kind on task_failed Delivers M15.6's customer-facing piece: the actual wiring of AnalyticsHook into the daemon's lifecycleHooks[] (which M14.3 built emit logic for but never plumbed) plus the coarse failure_kind classifier on task_failed. Per the M14 / M15.6 alignment (Option A): M14.1-M14.3 ship as the foundation. M15.6 adds the wiring + failure_kind. M15.6's stated "scaffolded only" stance on task_created + per-flavor schemas is intentionally relaxed — M14 already produces those events and the review-tightening on curate_operation_applied / read_paths_with_metadata delivers PM-visible value that the M15.6 description deferred. Changes: src/server/infra/daemon/brv-server.ts: - import AnalyticsHook - construct it BEFORE TransportHandlers so it can land in lifecycleHooks[] alongside curate-log / query-log / task-history (now 4 peers, not 3) - the isAnalyticsEnabled gate from setupFeatureHandlers binds in via a closure ref that defers lookup until emit-time - capture setupFeatureHandlers's return value (was discarded), then setAnalyticsClient(analyticsClient) on the pre-registered hook src/shared/analytics/events/task-failed.ts: - new FailureKindValues + FailureKind type: 'cancelled' | 'timeout' | 'agent_error' | 'unknown' (coarse vocab, ≤64 chars, never raw error.message) - failure_kind added to TaskFailedSchema as a required field - docblock notes the M15.6 privacy contract src/server/infra/process/analytics-hook.ts: - new classifyFailureKind(errorMessage) helper. Substring sentinels for 'timeout' / 'timed out' / 'deadline exceeded' → 'timeout'; 'agent' / 'llm' / 'provider' / 'tool' → 'agent_error'; default 'unknown'. Raw error string NEVER ends up on the wire. - emitTaskFailed now takes a FailureKind argument - onTaskCancelled → emitTaskFailed(..., 'cancelled') - onTaskError → emitTaskFailed(..., classifyFailureKind(errorMessage)) TDD: - task-failed.test: 3 new tests pin the failure_kind shape (accepts the 4 canonical values, rejects out-of-vocab, rejects missing) - analytics-hook-m14.test: 3 new tests verify the classifier paths land on the wire — 'kaboom' → 'unknown', timeout strings → 'timeout', agent strings → 'agent_error' - new analytics-hook-lifecycle-wiring.test integration covers the end-to-end task:create → analyticsHook.onTaskCreate path through a real TaskRouter, plus all four task-type variants going through task_created → task_completed in sequence Full suite: 9145 passing, 0 failing. Lint clean. * refactor: [ENG-2964] M15.6 address PR #722 review comments (7 of 7) #1 classifyFailureKind — word-boundary regex + pinned precedence: Substring matching let 'tooltip' / 'engagement' / 'urgent' bucket into agent_error. Tightened to /\b(agent|llm|provider|tool)\b/ and /\b(timeout|timed out|deadline exceeded)\b/. Docblock pins precedence (timeout > agent_error > unknown) so future if-order shuffles can't silently rebucket the funnel. #2 toRelativePath outside-project guard (privacy bug): path.relative('/proj','/Users/dev/other/x.md') returned '../../Users/dev/other/x.md' — still encoded the host layout. Same hole when projectPath was undefined (fell back to raw absolute path). Now returns '<outside-project>/<basename>' in both cases; preserves enough signal for backend grouping without becoming PII. Same guard also runs against absolute-path tails (Windows drive-letter switches). #3 toAnalyticsTaskType drift guard: Replaced `daemonType as TaskType` (per CLAUDE.md anti-cast rule) with a TASK_TYPE_SET membership check + processLog warning + 'unknown' sentinel fallback. A future un-enumerated dispatch now lands a debuggable warning at the daemon instead of disappearing at the backend Zod check. #4 dumpEvents test pollution: Gated behind DUMP_ANALYTICS=1. `npm test` runs the shape assertions silently; opt-in dump still works via `DUMP_ANALYTICS=1 npx mocha test/unit/.../analytics-hook-toolmode-inspection.test.ts`. #5 FU-1 forward-compat comment: Added a block-comment note above the tool-mode "operations_*: 0" assertions so when FU-1 lands and the counters flip non-zero, the failure reads as "feature, update the test" rather than a regression. #6 curate_operation_applied.related asymmetry: Added a TODO(M15.x) marker on the `related` field flagging the wire-shape asymmetry with `query_completed.read_paths_with_metadata .related_paths` (structured). Restructure is consumer-migration territory — own ticket. #7 brv-server.ts loud-fail assertion: Added an explicit throw if setupFeatureHandlers returns without analyticsClient. Future refactors that drop the field will explode here instead of silently no-op'ing every emit forever. Tests added to lock in #1, #2, #3 behavior so the review intent doesn't drift. Existing stress-test fixtures moved from `/A/op-N.md` under-the-root paths to `/proj/A/op-N.md` so they exercise the in-project case (out-of-project is now its own focused test). Full suite: 9227 passing, 0 failing. Lint clean. * fix: [ENG-2964] M15.6 drift sentinel must be on the wire vocabulary (PR #722 re-review) Re-review caught a real bug in the previous fixup: `toAnalyticsTaskType` returned `'unknown' as TaskType` for un-enumerated daemon types, but `'unknown'` was NOT in `TASK_TYPE_VALUES`. The docblock promised "keeps the event on the wire instead of silently failing the Zod check at the backend" — but the wire-side `z.enum(TASK_TYPE_VALUES)` actually still rejected the row. The local Sinon trackStub doesn't run Zod, which is why the m14 test passed even though the runtime path was broken. Also the `as TaskType` cast moved sites but didn't disappear — the exact anti-pattern the original review flagged. Fix: - Add `TaskTypes.UNKNOWN: 'unknown'` to the canonical vocabulary. - Append `TaskTypes.UNKNOWN` to `TASK_TYPE_VALUES` so every task_* schema accepts the sentinel via `z.enum(TASK_TYPE_VALUES)`. - Introduce `isCanonicalTaskType()` predicate so `toAnalyticsTaskType` narrows without the `as TaskType` cast. Sentinel returned as `TaskTypes.UNKNOWN` directly. - Update `task-types.test` to assert the new `UNKNOWN` key/value pair. Existing negative test for an unknown task_type already uses `'not-a-real-type'` (not `'unknown'`), so it stays green. Cosmetic: dropped the extra-space drift in the OUTSIDE_PROJECT_PATH docblock noted in the same re-review. Tests: 9227 passing, 0 failing. * fix: [ENG-2964] M15.6 restore analyticsFlushScheduler.start() + Phase A e2e E2E shake-down caught a regression my own M15.6 commit (6983e88) introduced: when I refactored `setupFeatureHandlers`'s return capture from destructured `{analyticsClient, analyticsFlushScheduler, isAnalyticsEnabled}` to a bundled `featureHandlers` object, I dropped the call to `analyticsFlushScheduler.start()` that lived right after the destructure (originally landed in M4.3 / ENG-2645) AND lost the `analyticsFinalFlush` shutdown hook bound to the ShutdownHandler. Symptom: every `track()` call still landed in JSONL with status='pending', but the periodic 30s flush tick never armed (start() un-called) and shutdown never force-flushed (analyticsFinalFlush undefined). Events sat on disk forever. Production-affecting; the existing dev-beta.e2e.ts kept passing only because its CI run uses a real backend that the periodic flush never reached. Fix in brv-server.ts: - Re-add `let analyticsFinalFlush: (() => Promise<void>) | undefined` before ShutdownHandler construction so the shutdown closure can reference it via late-bind (same mutable-holder pattern M4.3 used). - Pass `analyticsFinalFlush: () => analyticsFinalFlush?.() ?? Promise.resolve()` to ShutdownHandler. - After the M15.6 late-bind of setAnalyticsClient, call `featureHandlers.analyticsFlushScheduler.start()` and assign the final-flush closure (`stop() + flushFinal({timeoutMs: 3000})`). E2E coverage (Phase A — 5 cancel scenarios): - New test/e2e/analytics/lifecycle-wire.e2e.ts drives real daemon + transport client + in-process HTTP capture stub. - Each scenario: task:create → 50ms wait → task:cancel → wait for the natural 30s flush tick → assert wire payload. - Coverage: curate-html-direct (alias → curate-tool-mode on wire), curate-folder (via the same flow), query-tool-mode, dream-scan, dream-finalize, search. - Assertions: task_created + task_failed{failure_kind=cancelled} reach the wire with the right task_type. dream / search assert no per-flavor M12 row (only the generic task_* pair). - Wire-shape sanity per request: device-id header, brv-cli/* UA, super-properties (cli_version, os, node_version, environment, device_id) on every event payload. Gated by `npm run test:e2e:lifecycle` (180s timeout) — not part of default `npm test` glob. ~2m end-to-end. Matches dev-beta.e2e.ts precedent. Phase B (super-props deep checks, JSONL/HTTP parity) deferred to a follow-up commit. * test: [ENG-2964] M15.6 add DB-roundtrip e2e for analytics lifecycle Adds `npm run test:e2e:db` covering the FULL chain from `brv` CLI through real HTTP to the local telemetry container into postgres `raw_events`. Complements the Phase A `test:e2e:lifecycle` in-process capture suite — that one stops at the daemon's HTTP request body; this one proves the records actually land in the partitioned `raw_events` table with the documented event names, task_type alias translation, failure_kind, and shared device_id. Three scenarios: - curate-tool-mode (alias-translated from curate-html-direct) - query-tool-mode - search (no per-flavor M12 event, only task_created + task_failed) Test reaches postgres via `docker exec ... psql -t -A` against the existing byterover-telemetry compose stack (no host psql dependency). A `before()` precondition skips the suite with a clear migration hint when raw_events is missing, and `afterEach` preserves the scenario temp dirs on failure so daemon logs + JSONL can be inspected post-hoc. Not picked up by `npm test` (default glob excludes test/e2e/). * test: [ENG-2964] M15.6 expand DB e2e — dream/isolation/wire-shape coverage Grows `npm run test:e2e:db` from 3 → 7 scenarios. Adds dream-scan + dream-finalize roundtrips (previously absent from DB coverage), a deep wire-shape sanity check on a curate scenario, and a two-task isolation guard that confirms task_id selectors don't bleed events across concurrent runs. New per-row sanity helper (`assertRowShape`) is invoked from every scenario and pins: - schema_version = 1 - cli_version matches semver - os ∈ {darwin, linux, win32} - node_version starts with `v<digit>` - environment ∈ {development, production} - device_id non-empty + identical across all rows of a task - user_id null (anonymous emits — test never authenticates) Per-flavor payload assertions added: - curate_run_completed: outcome='cancelled', duration_ms ≥ 0, all operations_* counters and pending_review_count = 0 (we cancel before any tool calls land) - query_completed: outcome='cancelled', duration_ms ≥ 0, read_tool_call_count = 0, search_call_count = 0 - task_created: has_files=false, has_folder=false - task_failed: failure_kind='cancelled' Wire-shape sanity scenario adds: - client_timestamp ≤ received_at + 5s budget - both timestamps fall in [test_start - 10s, test_end + 10s] - properties.device_id === identity_device_id column (M4.1 stamp matches the request header) Isolation scenario: - drives a curate-tool-mode task and a query-tool-mode task in sequence against the same daemon - asserts no task_id bleed, task_type partitions cleanly along task_id, both share device_id (single daemon → single config) Cutover note: switched all e2e dispatches from 'curate-html-direct' to 'curate-tool-mode' after the proj branch merge — TaskTypeSchema only accepts the canonical name on the wire now. The legacy alias remains in AnalyticsHook's `toAnalyticsTaskType` and is exercised by the integration test directly, not via the transport. Suite runtime: ~5 min (7 scenarios × ~30s each), all green. * fix: [ENG-2964] use 'curate-tool-mode' through transport in lifecycle tests After the proj/analytics-system-tool-mode merge, TaskTypeSchema only accepts the canonical 'curate-tool-mode' over the wire — ENG-2925's rename fully landed and 'curate-html-direct' is no longer enumerated on `TaskCreateRequest` / `TaskExecute`. Tests that dispatch through TaskRouter (the integration lifecycle-wiring suite + the e2e wire capture suite) were rejected at the Zod boundary, never reached AnalyticsHook, and asserted on undefined emits. The merge resolution I picked for the integration test was wrong — I kept the alias in HEAD on the assumption it exercised the alias translation, but the transport boundary now strips it long before AnalyticsHook gets a chance to translate. The AnalyticsHook unit tests (analytics-hook-m14.test.ts, et al) still exercise the alias path by constructing TaskInfo directly and invoking `hook.onTaskCreate(task)` — those bypass TaskRouter, so 'curate-html-direct' remains a valid test input there and the `toAnalyticsTaskType` aliasing is still covered.

cuongdo-byterover added 6 commits May 27, 2026 15:26

Merge branch 'proj/analytics-system-tool-mode' into feat/ENG-2964

99bfb89

cuongdo-byterover requested review from bao-byterover and leehpham as code owners May 27, 2026 13:43