
feat(parser): migrate bespoke providers (openhands, cursor, vibe, hermes, claude, cowork) #882

Open
mariusvniekerk wants to merge 7 commits into fam/opencode-family from fam/bespoke

mariusvniekerk commented Jun 26, 2026

Collaborator

Migrates the providers with bespoke source shapes onto concrete facade providers: openhands (directory snapshot), cursor, vibe, hermes, claude (recursive transcripts with incremental append parsing and subagents), and cowork (reuses the Claude transcript format).

Because Claude's migration routes parse-diff through the provider path, this branch also folds in the parse-diff raced-source reclassification (gating reliability on parseDiffAgentDiscoverable), so a concurrent daemon write is classified Raced rather than Changed and cannot trip --fail-on-change.

Where to look: claude_provider.go (the largest change) and the parse-diff change in internal/sync/parsediff.go.

OpenHands stores each conversation as a directory with metadata and event files, so the provider needs a directory source facade rather than a JSONL file wrapper. This keeps the legacy discovery and dashed/undashed ID lookup behavior while making the composite snapshot fingerprint explicit at the provider boundary.

The provider uses the existing OpenHands parser and snapshot helpers so freshness, shallow watch planning, changed-path classification, and normalized parse output stay aligned with the legacy sync path.

test(parser): opt openhands into provider shadow

OpenHands now has a concrete facade provider on this branch, so its migration mode should enter shadow comparison instead of remaining legacy-only and additive.

Earlier provider opt-ins stay inherited and later provider branches own their modes.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare openhands shadow parity

OpenHands is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseOpenHandsSession.

The test uses the directory snapshot source shape so the provider fingerprint path and planned data-version behavior stay visible while the branch migrates away from legacy dispatch.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesOpenHandsLegacyParser|TestOpenHandsProvider|TestParseOpenHands|TestDiscoverAndFindOpenHands|TestClassifyOnePath_OpenHands|TestProcessFileOpenHandsUsesSnapshotMtimeForRetryCache' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold openhands into provider

Move OpenHands discovery, source lookup, and parse ownership onto the
concrete provider and delete the package-level DiscoverOpenHandsSessions,
FindOpenHandsSourceFile, and ParseOpenHandsSession free functions.
Discovery now walks conversation roots directly in the provider source
set, raw-session-ID lookup folds the literal/dash-stripped/normalized
matching into sessionDirForID, and parsing runs on a provider receiver
method. The provider-neutral snapshot, session-dir predicate, and event
parse helpers stay as shared free functions.
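
As a rough illustration of the literal/dash-stripped/normalized lookup order described above, the sketch below matches a raw session ID against directory names in three tiers. The function name matchSessionDir and the exact normalization (ignoring dashes and case on both sides) are assumptions made for illustration; they are not taken from sessionDirForID.

```go
package main

import (
	"fmt"
	"strings"
)

// stripDashes removes every '-' so dashed and undashed ID spellings compare equal.
func stripDashes(s string) string {
	return strings.ReplaceAll(s, "-", "")
}

// matchSessionDir is a hypothetical stand-in for the provider's lookup.
// It returns the first directory name that matches rawID, trying the
// strictest comparison first so an exact name always wins.
func matchSessionDir(rawID string, dirNames []string) (string, bool) {
	if rawID == "" {
		return "", false
	}
	// Tier 1: literal match.
	for _, name := range dirNames {
		if name == rawID {
			return name, true
		}
	}
	// Tier 2: the caller passed a dashed ID but the directory is undashed.
	stripped := stripDashes(rawID)
	for _, name := range dirNames {
		if name == stripped {
			return name, true
		}
	}
	// Tier 3 (assumed normalization): ignore dashes and case on both sides.
	normalized := strings.ToLower(stripped)
	for _, name := range dirNames {
		if strings.ToLower(stripDashes(name)) == normalized {
			return name, true
		}
	}
	return "", false
}

func main() {
	dirs := []string{"0a1b2c3d4e5f", "AA-BB-CC"}
	for _, id := range []string{"0a1b-2c3d-4e5f", "aabbcc", "missing"} {
		name, ok := matchSessionDir(id, dirs)
		fmt.Printf("%q -> %q (found=%v)\n", id, name, ok)
	}
}
```

Trying the tiers in order of strictness means a looser match can never shadow an exact directory name.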

Make OpenHands provider-authoritative and remove its legacy sync
dispatch: the classifyOnePath block, the processFile case arm, the
OpenHands snapshot-mtime branch, and processOpenHands are gone. Sync now
classifies and processes OpenHands through provider changed-path
handling, which preserves the base_state.json/TASKS.json/events companion
remap to the session directory and keeps the snapshot mtime driving the
skip-retry cache via the provider fingerprint.

Drop the OpenHands AgentDef DiscoverFunc/FindSourceFunc hooks, remove the
shadow baseline test, exempt the provider file from the shim scan, and
add a guard asserting the legacy entrypoints stay deleted.

Cursor transcript sources have two legacy layouts and select .jsonl over .txt when both exist for a session. Moving Cursor behind a concrete provider keeps that selection policy explicit at the provider boundary instead of relying on the legacy parser adapter.

The provider preserves recursive project discovery, raw/full ID lookup, stale .txt path promotion, changed-path classification, content-hash fingerprinting, and parser output normalization while using the same Cursor discovery and parsing helpers as the previous sync path.
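
The .jsonl-over-.txt selection can be sketched as follows. This is a minimal illustration, not the provider's code: selectTranscripts is a hypothetical name, and keying on the full path minus its extension is an assumption chosen so that the same stem in two project directories is treated as two separate sessions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// selectTranscripts keeps one transcript per (directory, stem) pair and
// prefers the .jsonl file when a .txt sibling exists. Paths with other
// extensions are ignored. The result is sorted for determinism.
func selectTranscripts(paths []string) []string {
	chosen := map[string]string{}
	for _, p := range paths {
		ext := filepath.Ext(p)
		if ext != ".jsonl" && ext != ".txt" {
			continue
		}
		// The key includes the directory, so selection stays project-scoped.
		key := strings.TrimSuffix(p, ext)
		if prev, ok := chosen[key]; ok && filepath.Ext(prev) == ".jsonl" {
			continue // a .jsonl already won for this session
		}
		chosen[key] = p
	}
	out := make([]string, 0, len(chosen))
	for _, p := range chosen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	paths := []string{
		"projA/s1.txt",
		"projA/s1.jsonl",
		"projB/s1.txt", // same stem, different project: kept separately
		"projB/notes.md",
	}
	for _, p := range selectTranscripts(paths) {
		fmt.Println(p)
	}
}
```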

fix(parser): preserve cursor project-scoped source selection

Cursor session IDs are only unique within an encoded project directory, but the provider was resolving stored and changed paths through a root-wide lookup. That could silently select the same transcript stem from a different project and drop valid sources during discovery.

Resolve Cursor source promotion inside the project derived from the incoming path, add duplicate-stem coverage, and mark model output unsupported until the parser actually fills message models. This lets the Cursor branch enter shadow comparison as a real migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CursorProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare cursor shadow parity

Cursor is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseCursorSession.

The test uses duplicate transcript stems in different encoded project directories to lock in the current parser ID behavior while proving provider source observation stays project-scoped.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCursorLegacyParser|TestCursorProvider|TestParseCursor|TestCursorSessionID' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

test(sync): assert cursor provider hash parity

Roborev job 2709 caught that the Cursor shadow parity fixture normalized the legacy session hash before proving the provider fingerprint matched the legacy parser hash. That left the test unable to detect a provider fingerprint regression that propagated into parsed output.

Assert hash parity before normalizing the legacy session for the full struct comparison, keeping the existing duplicate-stem fixture focused on provider/legacy equivalence.

Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesCursorLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check

refactor(parser): fold cursor into provider

Move Cursor source discovery, lookup, and parse ownership onto the
concrete cursorProvider and remove the package-level
DiscoverCursorSessions, FindCursorSourceFile, and ParseCursorSession
free functions. Discovery and find-source bodies now live as
provider-owned helpers (discoverTranscriptPaths, cursorAddSeen,
cursorFindSourceFile) on the cursor source set, and parseSession is a
receiver method.

Make Cursor provider-authoritative and drop its legacy sync dispatch:
the classifyOnePath transcript block, the processFile case arm, the
processCursor method, and its now-orphaned validateCursorContainment
and findContainingDir helpers. Source classification, containment,
.txt/.jsonl precedence, and project-hint decoding are all reproduced
through the provider's changed-path and discovery paths, so runtime
behavior is preserved. ParseCursorTranscriptRelPath stays a shared
provider-neutral path validator used by both the engine's project
enrichment and the provider.

Replace the shadow-baseline test with provider API coverage plus a
guard asserting the legacy entrypoints stay gone, and remove cursor
from the pending-shim list.

fix(parser): cap cursor provider fingerprinting

Cursor parsing already rejects transcripts over 10 MiB, but the migrated provider fingerprint path still hashed the full source before parse. That made oversized files pay an unbounded read cost in the provider freshness path even though parse would never accept them.

Keep normal-size content hashing intact and return only metadata for oversized Cursor transcripts so parse remains the sole place that reads up to the guarded cap.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCursorProvider' -count=1; go vet ./...; git diff --check
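
A minimal sketch of the capped fingerprint idea, assuming a SHA-256 content hash: the source is opened and hashed only when its size is within the cap, and oversized sources get a metadata-only fingerprint. The fingerprint struct, the opener type, cappedFingerprint, and countingOpener are hypothetical names for illustration; only the 10 MiB figure comes from the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// maxTranscriptBytes mirrors the 10 MiB parse cap named in the commit message.
const maxTranscriptBytes = 10 << 20

// fingerprint is a hypothetical shape: Hash stays empty when hashing is skipped.
type fingerprint struct {
	Size int64
	Hash string
}

// opener defers opening the source so oversized files are never read.
type opener func() (io.Reader, error)

// cappedFingerprint hashes content only when size is within limit.
func cappedFingerprint(size int64, open opener, limit int64) (fingerprint, error) {
	fp := fingerprint{Size: size}
	if size > limit {
		return fp, nil // metadata only: no read cost for oversized sources
	}
	r, err := open()
	if err != nil {
		return fp, err
	}
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return fp, err
	}
	fp.Hash = hex.EncodeToString(h.Sum(nil))
	return fp, nil
}

// countingOpener serves content from memory and counts how often it is opened.
func countingOpener(content string, opens *int) opener {
	return func() (io.Reader, error) {
		*opens++
		return strings.NewReader(content), nil
	}
}

func main() {
	opens := 0
	small, _ := cappedFingerprint(5, countingOpener("hello", &opens), maxTranscriptBytes)
	big, _ := cappedFingerprint(maxTranscriptBytes+1, countingOpener("ignored", &opens), maxTranscriptBytes)
	fmt.Println("small hashed:", small.Hash != "")
	fmt.Println("big hashed:", big.Hash != "")
	fmt.Println("opens:", opens)
}
```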

Vibe stores transcript content in messages.jsonl while canonical session identity, title, timestamps, model, and usage can live in a sibling meta.json. Moving it behind a concrete provider keeps that companion relationship explicit at the provider boundary.

The provider preserves recursive session discovery, symlinked session directories, raw and full ID lookup through meta.json, meta-sidecar changed-path classification, effective size and mtime freshness, transcript hashing, fallback-ID exclusion, and parser output normalization through the existing Vibe parser wrapper.

fix(parser): classify removed vibe transcripts

Vibe source events need to keep working after the primary messages.jsonl has already disappeared. Routing deletion and rename-style events through the existing file check meant the watcher could ignore the exact event that should refresh or remove the stored session.

Synthesize source refs only for missing-path removal semantics, keep ordinary lookups existence-checked, and pin the intentionally shallow session directory layout in provider tests. This lets the Vibe provider enter shadow comparison as a real migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(VibeProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare vibe shadow parity

Vibe is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseVibeSessionWrapper.

The test includes meta.json canonical ID promotion, provider-adjusted fingerprint metadata, usage events, and excluded fallback IDs so reviewers can see the migration preserves the composite source behavior.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesVibeLegacyParser|TestVibeProvider|TestParseVibe|TestClassifyOnePath_Vibe|TestSyncVibe|TestSourceMtimeVibe|TestProcessVibe' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

test(sync): cover vibe provider usage parity

Roborev job 2711 caught that the Vibe shadow parity fixture compared empty usage slices, so it could not detect regressions in aggregate usage emission.

Seed the fixture with real Vibe metadata fields for active model and nonzero stats, then assert both legacy and provider paths emit usage before comparing them.

Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesVibeLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check

refactor(parser): fold vibe into provider

Move Vibe source discovery, lookup, and parse ownership onto the concrete
vibeProvider and delete the package-level DiscoverVibeSessions,
FindVibeSourceFile, and ParseVibeSessionWrapper free functions. Discovery
and find-source bodies now live as provider-owned helpers
(discoverSessionPaths, findSourceFile) on the vibe source set, the
isVibeMessagesFile guard moves to the provider file, and the messages.jsonl
parser becomes the provider parseVibeResult/parseSession methods.

Make Vibe provider-authoritative and drop its legacy sync dispatch: the
classifyContainerPath classifyVibePath call and method, the processFile case
arm, the processVibe method, and its now-orphaned isSessionBlocked and
isSessionTrashed helpers. vibeEffectiveInfo stays as a shared composite-mtime
helper used by the skip-cache and fingerprint paths.

Because a provider has no database handle, the engine reproduces Vibe's
DB-aware, file-path-scoped bookkeeping in applyProviderFilePathPolicies for
single-session-per-file providers: stale stored IDs at the same source path
are excluded, and a freshly parsed row is suppressed when the user already
removed (trashed or deleted) the session occupying that path, so a canonical
ID flipping between the meta.json session_id and the directory-name fallback
no longer resurrects a hidden session. This is a no-op for stable-ID
providers and skipped for multi-session sources.

Drop the Vibe AgentDef DiscoverFunc/FindSourceFunc hooks, remove it from the
pending shim scan list, replace the shadow-baseline test with provider API
coverage plus a guard that the legacy entrypoints stay gone, and route the
package and engine tests through the provider methods. The obsolete
classifyOnePath Vibe test is removed; the provider's SourcesForChangedPath
coverage replaces it.

Hermes can represent a configured root as either individual transcript files or as a state.db archive that fans out into multiple sessions. Moving it behind a concrete provider makes that source choice explicit instead of leaving archive behavior inside the legacy adapter path.

The provider preserves transcript discovery and lookup while treating state.db as a multi-session, force-replace source. Its fingerprint covers the archive database plus sibling transcripts so transcript-quality changes can refresh the archive source that ParseHermesArchive reads.

fix(parser): preserve hermes archive event coverage

Hermes archive discovery can normalize a configured sessions directory or direct state.db path into a sibling archive source, but the watch plan and changed-path classifier still assumed the configured root was the only event root. That left state.db updates and removed primary files invisible to provider-path sync.

Normalize archive watch roots, map delete and rename-style events syntactically when primary files are gone, and cover archive-parent, sessions-directory, and direct-state roots. This lets Hermes enter shadow comparison as an actual migration branch.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(HermesProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): watch hermes archive roots syntactically

Hermes archive configs can point at the archive parent, its sessions directory, or the state.db file before the sibling archive components have been created. Watch planning needs to treat those shapes as archive roots from their paths, not from startup-time existence checks, otherwise late-created metadata or transcripts are invisible until a full sync.

The transcript watch root is now retained for archive-shaped roots even when sessions/ is not present yet, while ordinary transcript-only roots keep their recursive file watch.

Validation: go test -tags "fts5" ./internal/parser -run 'TestHermesProvider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

fix(parser): feed hermes archive roots to runtime watcher

Hermes provider watch planning now knows how to follow archive-shaped roots, but the actual serve-time watcher still reads registry watch resolvers. Without a matching Hermes resolver there, the default .hermes/sessions config can miss sibling state.db creation or updates in live sync.

Expose Hermes shallow archive-parent watch roots through the registry while keeping transcript roots recursive, and add shadow parity coverage so this branch remains a migration rather than an additive provider implementation.

Validation: go test -tags "fts5" ./cmd/agentsview ./internal/parser ./internal/sync -run 'TestCollectWatchRootsHermesSessionsWatchesStateDBParent|TestHermesProvider|TestParseHermes|TestProviderMigrationModes|TestObserveProviderSourceMatchesHermesLegacyParser' -count=1; go test -tags "fts5" ./cmd/agentsview ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./cmd/agentsview/... ./internal/parser/... ./internal/sync/...; git diff --check

fix(sync): classify hermes archive watcher events

Roborev jobs 2715 and 2716 caught that Hermes archive watch roots were subscribed but the legacy SyncPaths classifier still ignored sibling state.db events. That meant live sync could wait for a periodic full sync even though the watcher saw the change.

Map configured Hermes archive roots, state.db events, and direct archive transcript events back to the state.db source that processHermes already parses, while preserving transcript-only root classification for standalone Hermes session files.

Validation: go test -tags "fts5" ./internal/sync -run TestSyncPathsHermesStateDBEventRefreshesArchive -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Test(HermesProvider|ObserveProviderSourceMatchesHermesLegacyParser|SyncPathsHermesStateDBEventRefreshesArchive)' -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): include hermes transcripts in archive skips

Roborev job 2803 caught that Hermes transcript watcher events could still be suppressed by state.db-only skip metadata after being routed to the archive source. In mixed state-db/transcript archives, state.db can be unchanged while a sibling transcript is new or updated.

Use archive-effective size and mtime for state.db skip checks by folding direct transcript files from the sibling sessions directory into the snapshot, and add a regression where a transcript event refreshes an already-indexed archive.

Validation: go test -tags "fts5" ./internal/sync -run 'TestSyncPathsHermes(ArchiveTranscriptEventRefreshesArchive|StateDBEventRefreshesArchive)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Test(HermesProvider|ObserveProviderSourceMatchesHermesLegacyParser|SyncPathsHermes)' -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): use aggregate hermes archive fingerprints

Hermes archive freshness needs the state.db sync path to compare the same aggregate fingerprint it persists. Discovering through the public Hermes session lister reselected state.db and missed sibling transcripts, so state.db events could avoid real skip-cache parity.

Enumerate direct transcript files for the archive snapshot and stamp archive parse results with the aggregate state.db fingerprint before writing. This keeps unchanged archive syncs comparable while still refreshing when sibling transcripts change.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync; go vet ./...; make nilaway

fix(sync): apply hermes archive fingerprints consistently

Hermes archive refresh paths need to compare and persist the same aggregate fingerprint for state.db plus sibling transcripts. Otherwise cached parse skips and single-session refreshes can fall back to raw state.db metadata and miss transcript-only archive changes.

Use the aggregate archive file info before generic skip-cache checks and share the archive parse-and-stamp helper between full archive processing and single-session refreshes. The regression coverage now persists the metadata, checks unchanged archive skips, and covers transcript discovery/removal behavior.

Validation: go test -tags "fts5" ./internal/sync -run 'TestHermesArchive|TestProcessFileHermes|TestProcessHermesArchive|TestSyncSingleHermesArchive' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync; go vet ./...; make nilaway

refactor(parser): fold hermes into provider

Move Hermes source discovery, lookup, and parse ownership onto the
concrete hermesProvider and delete the package-level
DiscoverHermesSessions, FindHermesSourceFile, ParseHermesArchive, and
ParseHermesSession free functions. Discovery and find-source bodies now
live as provider-owned helpers (discoverHermesSessions,
findHermesSourceFile); parse, archive parse, the state-db reader, and the
transcript-archive fallback become hermesProvider methods (parseSession,
parseArchive, parseStateDB, parseTranscriptArchive).

Reproduce Hermes archive behavior on the provider. The provider's archive
Parse now stamps every state.db session with the state.db path plus the
aggregate (state.db + direct transcripts) size and mtime, replacing the
engine's stampHermesArchiveResults/hermesArchiveEffectiveInfo so a
transcript-only change still refreshes the archive's stored freshness. The
new provider helpers hermesArchiveEffectiveFileInfo and
hermesArchiveTranscriptFiles mirror the legacy engine aggregation (every
.jsonl and session_*.json directly under the sessions directory, no
dedup). The existing composite archive Fingerprint and archive watch/
classify source-set methods already carried the rest.
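
The aggregation described above can be sketched as a pure function over file metadata. The commit message names the inputs (state.db plus every .jsonl and session_*.json directly under the sessions directory) but not the combining rule, so summing sizes and taking the newest mtime are assumptions here, as are the fileMeta type and the function names.

```go
package main

import (
	"fmt"
	"strings"
)

// fileMeta is a hypothetical flattened view of one directory entry.
type fileMeta struct {
	Name    string
	Size    int64
	ModUnix int64 // modification time as Unix seconds
}

// isArchiveTranscript matches *.jsonl and session_*.json by name.
func isArchiveTranscript(name string) bool {
	if strings.HasSuffix(name, ".jsonl") {
		return true
	}
	return strings.HasPrefix(name, "session_") && strings.HasSuffix(name, ".json")
}

// archiveEffectiveInfo folds direct transcript files into the state.db
// metadata. Assumed rule: sizes are summed and the newest mtime wins, so a
// transcript-only change still alters the archive's effective freshness.
func archiveEffectiveInfo(stateDB fileMeta, sessionsDir []fileMeta) (size, modUnix int64) {
	size, modUnix = stateDB.Size, stateDB.ModUnix
	for _, f := range sessionsDir {
		if !isArchiveTranscript(f.Name) {
			continue
		}
		size += f.Size
		if f.ModUnix > modUnix {
			modUnix = f.ModUnix
		}
	}
	return size, modUnix
}

func main() {
	db := fileMeta{Name: "state.db", Size: 100, ModUnix: 10}
	entries := []fileMeta{
		{Name: "a.jsonl", Size: 5, ModUnix: 20},
		{Name: "session_1.json", Size: 7, ModUnix: 15},
		{Name: "notes.txt", Size: 1000, ModUnix: 99}, // not a transcript: ignored
	}
	size, mod := archiveEffectiveInfo(db, entries)
	fmt.Println(size, mod)
}
```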

Make Hermes provider-authoritative and drop its legacy sync dispatch:
remove classifyHermesPath (and its hermesSyncArchivePaths,
hermesSyncDirExists, hermesSyncTranscriptPath helpers), the processFile
hermesArchiveEffectiveInfo stat hook and case arm, processHermes,
parseHermesArchive, stampHermesArchiveResults, hermesArchiveEffectiveInfo,
hermesArchiveTranscriptFiles, hermesArchiveSourcePaths, and the
syncSingleHermesArchive special-case plus its method. Single-session
resync of an archive now falls through to the generic provider path, which
reparses the whole archive (ForceReplace) the same way a full sync does.

Drop the Hermes AgentDef DiscoverFunc/FindSourceFunc hooks (the
provider-owned WatchRootsFunc/ShallowWatchRootsFunc stay), remove
hermes_provider.go from the pending shim scan list, replace the
shadow-baseline test with provider-API coverage plus a guard that the
legacy entrypoints stay gone, and route the package and engine archive
tests through provider methods and the provider-authoritative processFile/
SyncPaths paths.

Add internal/sync/provider_shadow_support_test.go defining the shared
writeProviderShadowSourceFile test helper that the remaining vibe shadow
test still references, which was orphaned by a predecessor commit.

test(sync): drop unused shadow source-file helper

The hermes fold left writeProviderShadowSourceFile in a dedicated test
support file, but every shadow test writes its fixtures inline, so the
helper has no callers and trips the unused linter. Remove the dead
scaffolding.

Claude has both regular project transcripts and nested subagent transcripts, plus an existing append-only incremental parser. Moving it behind a concrete provider keeps those source shapes and optional incremental capability explicit at the provider boundary.

The provider preserves recursive project discovery, symlinked project directories, standard and subagent raw-ID lookup, changed-path classification, content hashing, project-name normalization, excluded-session reporting, relationship inference, and incremental append parsing for linear JSONL growth.

fix(parser): preserve claude provider edge events

Claude provider sync must distinguish true append idleness from files that were truncated or replaced, and watcher classification must still identify deleted primary and subagent transcripts after the file is gone. Otherwise provider-path sync can retain stale messages or miss removals.

Return full-parse status for truncated incremental inputs, add missing-path classification for valid Claude source shapes, and make raw subagent lookup follow symlinked project directories like discovery does. This branch now opts Claude into shadow comparison.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(ClaudeProvider|FindClaudeSourceFile|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(sync): replace claude content after file rewrites

Claude incremental parsing is append-oriented, so any fallback caused by truncation or file replacement must replace persisted messages instead of flowing through the append-preserving write path. Otherwise stale higher ordinals or stale tool rows can survive a full parse fallback.

The provider now marks truncated incremental inputs as force-replace, and the legacy engine path carries forceReplace when file identity changes or the file shrinks before falling back to a full parse.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

fix(sync): replace claude same-size rewrites

A same-size rewrite can reach the full-parse fallback when the normal skip check did not skip the file, which means the content changed even though the byte count did not. That fallback must replace persisted rows, or stale higher ordinals and tool rows can survive the parse.

The regression rewrites a Claude file in place to the same byte length with fewer logical messages and verifies the stale assistant row is deleted.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesClaudeLegacyParser|TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|SameSizeInPlaceRewriteClearsStaleRows|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check
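
The truncation, replacement, and same-size cases from the last two commits can be summarized as one decision, sketched below. It assumes the function runs only after the normal skip check has declined to skip the file, so an unchanged size still implies changed content. The fileIdentity and parsePlan types and planClaudeParse are hypothetical names, not the engine's API.

```go
package main

import "fmt"

// fileIdentity is a hypothetical record of what was stored at the last sync
// versus what the file looks like now.
type fileIdentity struct {
	Size   int64
	Inode  uint64
	Device uint64
}

// parsePlan says whether to append from an offset or reparse and replace.
type parsePlan struct {
	Incremental  bool
	ResumeOffset int64
	ForceReplace bool
}

// planClaudeParse allows an append-only parse only when the file is the same
// inode/device and has strictly grown. Everything else (replaced file,
// shrunken file, same-size rewrite) falls back to a full parse that must
// replace persisted rows, so stale higher ordinals cannot survive.
func planClaudeParse(stored, current fileIdentity) parsePlan {
	sameFile := stored.Inode == current.Inode && stored.Device == current.Device
	if sameFile && current.Size > stored.Size {
		return parsePlan{Incremental: true, ResumeOffset: stored.Size}
	}
	return parsePlan{ForceReplace: true}
}

func main() {
	stored := fileIdentity{Size: 100, Inode: 7, Device: 1}
	cases := map[string]fileIdentity{
		"appended":  {Size: 150, Inode: 7, Device: 1},
		"truncated": {Size: 0, Inode: 7, Device: 1},
		"same-size": {Size: 100, Inode: 7, Device: 1},
		"replaced":  {Size: 150, Inode: 8, Device: 1},
	}
	for name, cur := range cases {
		fmt.Printf("%s: %+v\n", name, planClaudeParse(stored, cur))
	}
}
```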

test(sync): compare claude shadow parity

Claude is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseClaudeSessionWithExclusions.

The fixture exercises the project-directory source shape and verifies session, message, usage, exclusion, and data-version planning parity while preserving provider-computed file hashes.

Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1

test(sync): cover claude provider usage exclusions

Roborev job 2721 caught that the Claude shadow parity fixture only compared a plain exchange, so it did not prove provider parity for per-message token usage or /usage-only session exclusions.

Add assistant message usage metadata to the normal fixture and a separate /usage-only source discovered by the provider, then assert non-empty token metadata and excluded IDs against the legacy parser.

Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check

refactor(parser): fold claude into provider

Move Claude source discovery, lookup, full parse, exclusion handling,
and append-only incremental parse ownership onto the concrete
claudeProvider and delete the package-level DiscoverClaudeProjects,
FindClaudeSourceFile, ParseClaudeSessionFrom, and
ParseClaudeSessionWithExclusions free functions. The discover and
find-source bodies stay as provider-neutral helpers
(ClaudeProjectSessionFiles, claudeFindSourceFile) and the parse bodies
become claudeParseWithExclusions and claudeParseSessionFrom; the public
ParseClaudeSession wrapper and the Cowork parser (which reuses the
Claude transcript format) call the shared helper, so no provider file
references a legacy Discover/Find/Parse entrypoint.

Make Claude provider-authoritative and drop its legacy sync dispatch:
the classifyOnePath Claude block, the processFile case arm, and the
processClaude method. Source classification, project resolution, and
exclusion handling are reproduced through the provider's changed-path
and parse paths. The provider's SourcesForChangedPath also reproduces
the legacy "classify despite a transient stat error" behavior so a
changed path under a momentarily unreadable parent is not dropped.

Wire the provider-authoritative engine path to preserve Claude's
DB-aware single-file semantics, which a stateless provider cannot do
alone:
- tryProviderIncrementalAppend drives the provider's ParseIncremental
  through the shared tryIncrementalJSONL bookkeeping (session lookup,
  data-version and inode/device identity guards, ordinal resume,
  cross-sync split detection, cumulative counters, and forceReplace
  fallback), so append-only syncs keep the stored file hash and append
  rows instead of recomputing and rewriting.
- providerSingleSessionFresh reproduces the shouldSkipFile gate so an
  unchanged, already-synced session is skipped instead of re-parsed
  every full sync and a single-session resync does not reapply a
  worktree project mapping to an unchanged file.
- stampProviderFileIdentity stamps inode/device on parsed results so
  the incremental path can later detect an atomic file replacement.
- processProviderFile honors a caller-supplied file.Project as the
  source ProjectHint when no explicit ProviderSource was given, so a
  SyncSingleSession does not revert a user's project override.

The engine's expandClaudeDuplicateCandidates and
dedupeClaudeDiscoveredFiles stay as provider-neutral engine-level dedup
plumbing; expansion now enumerates via ClaudeProjectSessionFiles. The
duplicate-candidate expansion and session-ID dedup/precedence behavior
is unchanged.

Because dropping the Claude DiscoverFunc would otherwise remove Claude
from surfaces that gate on DiscoverFunc != nil, parse-diff (engine and
CLI flag validation) and the SSH remote resolve script now also include
file-based agents that have left legacy-only mode through the provider
facade, restoring Claude (and the other already-folded agents) to those
surfaces.

Drop the Claude AgentDef DiscoverFunc/FindSourceFunc hooks, set its
provider migration mode to ProviderAuthoritative, remove
claude_provider.go from the pending shim scan list, replace the shadow
baseline test with provider-API coverage plus a guard asserting the
four legacy entrypoints stay gone, and retarget the generic
shadow-mechanism caller tests at the still-legacy Cowork agent, since
Claude no longer has a legacy process arm to observe in shadow.

refactor(parser): fold ParseClaudeSession onto the Claude provider

Delete the ParseClaudeSession free function and route its only production
caller (the session upload handler) plus the test suite through the Claude
provider's new ParseUploadedTranscript method, exposed via the
ClaudeUploadParser interface. Uploads live outside any configured root, so
the method parses the staged transcript directly under the caller-supplied
project. That project stays authoritative rather than being overridden by
the transcript's recorded cwd, matching the prior upload behavior and
unlike the discovered-session Parse path.

Unexport ClassifyClaudeSystemMessage to classifyClaudeSystemMessage; it is
a Claude-internal classifier with no callers outside the package. Both
removals clear the last provider-specific legacy parse/classify entrypoints
this branch owned.

fix(sync): skip fresh claude before fingerprinting

The Claude provider migration preserved DB freshness skipping, but only after provider fingerprinting had already hashed the whole transcript. That lost the legacy cheap size/mtime/data-version gate for unchanged files.

Run the single-session freshness check before provider fingerprinting, and pass the computed fingerprint into incremental parsing so truncation detection can distinguish appended files from zero-byte rewrites. Zero-byte truncation now forces a full replacement parse instead of reporting no new data.

Validation: go test -tags "fts5" ./internal/parser -run 'TestClaudeProviderParseIncremental(Truncated|EmptyTruncation)NeedsFullParse' -count=1; go test -tags "fts5" ./internal/sync -run 'TestIncrementalSync_ClaudeAppend|TestProcessFileProviderAuthoritativeSkipsFreshClaudeBeforeFingerprint' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check

Cowork stores Claude-shaped transcripts behind local-agent metadata, so the provider boundary needs to preserve that metadata-to-transcript relationship instead of treating the files as plain Claude JSONL sources.

The concrete provider keeps shallow metadata watching, metadata change classification, subagent transcript discovery, raw/full ID lookup, composite mtime freshness, and hash propagation explicit for the sync path.

fix(parser): cover cowork nested watch events

Cowork metadata and transcripts live below org/workspace/session directories, so a shallow root watch could not deliver the paths the provider claimed to classify. Deleted metadata also lost the JSON needed to resolve the transcript, leaving stale provider state after remove or rename events.

Make the watch plan recursive for Cowork source globs, recover deleted metadata from the local session directory shape, cover removed metadata/main/subagent paths, and move Cowork into shadow comparison as its branch-local migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): reject ambiguous cowork metadata removal

Deleted Cowork metadata can only be recovered from the local session directory shape. If that directory contains multiple main transcripts, choosing the first filesystem match would attach the event to an arbitrary source and leave the real stale source unresolved.

Refuse ambiguous deleted-metadata recovery unless exactly one main transcript is present, and cover the multi-transcript case. The regular single-transcript metadata removal path remains supported.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
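The "exactly one main transcript" rule from the ambiguous-removal fix can be sketched as below. The names `recoverMainTranscript` and `errAmbiguousRecovery` are hypothetical and chosen for illustration; the real provider code is not shown in this description.

```go
package main

import (
	"errors"
	"fmt"
)

// errAmbiguousRecovery is a hypothetical sentinel for refused recoveries.
var errAmbiguousRecovery = errors.New("ambiguous deleted-metadata recovery")

// recoverMainTranscript returns the session's main transcript only when the
// directory scan produced exactly one candidate. Zero or several candidates
// are refused rather than resolved to the first filesystem match.
func recoverMainTranscript(candidates []string) (string, error) {
	if len(candidates) != 1 {
		return "", fmt.Errorf("%w: found %d main transcripts", errAmbiguousRecovery, len(candidates))
	}
	return candidates[0], nil
}
```

Returning an error instead of picking a candidate keeps the event unattached, so an arbitrary source is never marked while the real stale source goes unresolved.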

fix(parser): validate cowork deleted metadata candidates

Cowork metadata deletion recovery scans project directories after the metadata file is gone, so it cannot rely on the normal metadata-guided resolution path. It still needs the same transcript validity rules as normal discovery: regular files only, and symlink targets must stay inside the local session directory.

Apply that validation before selecting or counting fallback candidates so symlink escapes are ignored and broken symlinks do not create false ambiguity.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCoworkProvider|TestResolveCoworkSessionRejectsSymlinkEscape|TestClassifyCoworkPath|TestParseCowork' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check
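A minimal sketch of the candidate validation described in the deleted-metadata fix, assuming candidates are checked one at a time before they are counted. `insideDir`, `validCandidate`, and `filterCandidates` are hypothetical helpers, not the provider's actual functions.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// insideDir reports whether target lies strictly below dir. Both paths are
// expected to be already resolved (no symlinks) and in the same form.
func insideDir(dir, target string) bool {
	rel, err := filepath.Rel(dir, target)
	if err != nil || rel == "." || rel == ".." {
		return false
	}
	return !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// validCandidate accepts a path only if it resolves to a regular file that
// stays inside sessionDir. Broken symlinks fail EvalSymlinks and are dropped.
func validCandidate(sessionDir, path string) bool {
	root, err := filepath.EvalSymlinks(sessionDir)
	if err != nil {
		return false
	}
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return false
	}
	if !insideDir(root, resolved) {
		return false
	}
	info, err := os.Stat(resolved)
	return err == nil && info.Mode().IsRegular()
}

// filterCandidates applies the validation before any selection or counting,
// so invalid entries cannot create false ambiguity.
func filterCandidates(sessionDir string, paths []string) []string {
	var valid []string
	for _, p := range paths {
		if validCandidate(sessionDir, p) {
			valid = append(valid, p)
		}
	}
	return valid
}
```

Filtering first matters for the ambiguity rule: a directory holding one real transcript plus one broken symlink should still count as having a single candidate.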

test(sync): compare cowork shadow parity

Cowork is a sidecar-backed Claude transcript provider, so add source-level migration coverage that compares provider observation with ParseCoworkSession.

The fixture includes local-agent metadata plus the nested Claude transcript and verifies session, messages, usage, excluded IDs, and data-version planning parity while preserving provider-computed hashes.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCoworkLegacyParser|TestCoworkProvider|TestParseCowork|TestClassifyCoworkPath' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

refactor(parser): fold cowork into provider

Move Cowork source discovery, lookup, parse, and changed-path classification onto the concrete coworkProvider and delete the package-level DiscoverCoworkSessions, FindCoworkSourceFile, ParseCoworkSession, and ClassifyCoworkPath free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverTranscriptPaths, coworkFindSourceFile), parseSession is a receiver method, and the metadata-to-transcript classifier moves onto SourcesForChangedPath as classifyCoworkPath so a sibling local_<uuid>.json change still resolves to the session's main transcript.

Make Cowork provider-authoritative and drop its legacy sync dispatch: the classifyOnePath cowork block, the processFile case arm, and the processCowork method. The sibling-meta composite freshness is preserved on the provider's Fingerprint, which already folds CoworkSessionMtime (the max of transcript and metadata mtime) into the freshness identity so a title-only rename triggers a reparse through processProviderFile. CoworkSessionMtime stays exported and the engine's skip-cache and SourceMtime watcher-fallback blocks keep calling it, mirroring how the commandcode fold retained commandCodeEffectiveInfo.

Replace the legacy free-function tests with provider API coverage plus a guard asserting the four entrypoints stay gone, drop the shadow-baseline comparison test, relocate the shared writeProviderShadowSourceFile helper into provider_shadow_support_test.go, and remove cowork_provider.go from the pending-shim scan list.

test(sync): drop obsolete cowork shadow-legacy tests

Folding cowork into its provider removes its legacy processFile arm, so
the two shadow-compare tests that built fixtures via the deleted
parser.ParseCoworkSession and asserted a legacy result coexisting with
the shadow provider can no longer pass: a non-authoritative cowork file
now falls through to the unknown-agent default. The shadow machinery
keeps coverage through provider_shadow_test.go and the cached-skip
not-comparable case.

fix(sync): skip fresh cowork provider sources

Cowork moved behind the provider-authoritative sync path, but the migrated path still fingerprinted and parsed unchanged transcripts before checking the stored file metadata. That dropped the cheap DB freshness gate the legacy Cowork path relied on and made full syncs rewrite fresh sessions unnecessarily.

Restore that gate for Cowork before provider fingerprinting, using the same transcript size plus CoworkSessionMtime identity stored in the database. Per-file force parses still bypass the gate so metadata-driven refreshes and explicit reparses continue to reach the provider.

Validation: go test -tags "fts5" ./internal/sync -run 'TestProcessFileProviderAuthoritative(SkipsFreshCoworkBeforeFingerprint|ForceParseBypassesFreshCoworkSkip)|TestSyncAllSinceCoworkMetaUpdateTriggersResync|TestSyncPathsCoworkReplacesUpdatedMessageOrdinal' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
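The Cowork freshness gate, with its composite mtime and force-parse bypass, might look roughly like this. `compositeMtime` only mirrors the idea attributed to CoworkSessionMtime (the max of transcript and metadata mtime); its signature, `storedIdentity`, and `shouldSkip` are assumptions made for the sketch.

```go
package main

// compositeMtime returns the later of the transcript and metadata mtimes,
// expressed here as Unix nanoseconds. Hypothetical helper.
func compositeMtime(transcriptNanos, metadataNanos int64) int64 {
	if metadataNanos > transcriptNanos {
		return metadataNanos
	}
	return transcriptNanos
}

// storedIdentity is a hypothetical view of the freshness identity in the database.
type storedIdentity struct {
	Size       int64
	MtimeNanos int64
}

// shouldSkip reports whether a source can be skipped before fingerprinting.
// A forced parse always bypasses the gate.
func shouldSkip(stored storedIdentity, size, transcriptNanos, metadataNanos int64, force bool) bool {
	if force {
		return false
	}
	return stored.Size == size && stored.MtimeNanos == compositeMtime(transcriptNanos, metadataNanos)
}
```

Because the identity uses the composite mtime, a metadata-only update (such as a title change) fails the gate even though the transcript itself is untouched.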
…scan list

The Claude provider migration routes parse-diff through the provider path,
which regressed live-write skew detection: a concurrently rewritten source
was classified as Changed instead of Raced, tripping --fail-on-change on a
daemon write. Gate raced-source reliability on parseDiffAgentDiscoverable so
provider-folded agents keep the raced reclassification.
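One way to picture the raced reclassification is the sketch below. It assumes the source is stat'ed before and after the comparison parse and that `discoverable` stands in for whatever parseDiffAgentDiscoverable reports; the real signature and logic in internal/sync/parsediff.go are not shown in this description, so every name here is hypothetical.

```go
package main

// diffClass is a hypothetical classification of one parse-diff result.
type diffClass int

const (
	unchanged diffClass = iota
	changed
	raced
)

// sourceStat is a hypothetical snapshot of the source file's size and mtime.
type sourceStat struct {
	Size       int64
	MtimeNanos int64
}

// classifyDiff treats a differing parse as raced when the source moved
// underneath the comparison and the agent supports reliable raced detection.
func classifyDiff(parsedDiffers bool, before, after sourceStat, discoverable bool) diffClass {
	if !parsedDiffers {
		return unchanged
	}
	if discoverable && before != after {
		// The file was rewritten during the comparison, so the difference
		// reflects a live write rather than a parser regression.
		return raced
	}
	return changed
}
```

Under these assumptions, only the `changed` result would count toward --fail-on-change, so a concurrent daemon write no longer fails the run.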

Also clear pendingShimProviderFiles: every provider in this stack is folded
on the branch that introduces it, so no provider file is a standing shim and
the exempt list must be empty.