feat(parser): migrate SQLite/container providers #884
Open
mariusvniekerk wants to merge 5 commits into
Zed and Shelley both store multiple conversations in a shared SQLite database, so their provider boundary needs to model the physical database source separately from per-session virtual paths. Keeping that source shape explicit makes lookup, watch classification, and force-replace parse behavior available through the shared facade instead of the legacy adapter path.

The providers preserve physical DB discovery, WAL/SHM change classification, raw and full ID lookup, virtual source fingerprints, multi-session parse fan-out, per-session Shelley content fingerprints, and parser output normalization.

fix(parser): define sqlite provider deletion semantics

Shared SQLite providers need to treat deletion events as source-level state, not as unclassifiable paths. Without that, a deleted Zed or Shelley database can disappear before the provider facade reports a complete empty source, leaving future cleanup behavior under-specified.

Classifying syntactically valid DB, WAL, and SHM paths even after the main DB is gone preserves the watcher-to-parse path. A missing backing DB now produces a complete force-replace SkipNoSession outcome, and the Zed fingerprint comment records the intentional legacy whole-DB hash tradeoff.

fix(parser): align zed shelley capabilities

Provider capabilities are used as a contract for what normalized content a provider can actually emit. Shelley currently records token totals from messages but does not emit aggregate usage events, and Zed filters child threads before relationship fields can surface.

Keep the declarations conservative so the facade does not advertise unsupported content features while the providers continue to preserve the parser behavior they actually expose today.

fix(parser): tighten sqlite sidecar classification

Deleted DB sidecar events need to remain classifiable, but basename-only matching is too broad once missing DB parses become complete force-replace outcomes. Unrelated files under the same provider root should not synthesize the canonical shared DB source.

Restrict Zed sidecars to the watched threads directory and Shelley sidecars to the provider root, with regression tests for unrelated matching basenames.

fix(parser): shadow zed shelley providers

The Zed and Shelley branch had concrete providers and passing provider coverage, but still left both agents marked legacy-only. That kept the stack additive and prevented provider changed-path classification from participating while both shapes run.

Move both agents into shadow compare on their migration branch so the runtime bridge can exercise the concrete providers before the legacy path is removed later in the stack.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(ZedProvider|ShelleyProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/sync -run 'TestSync.*(Zed|Shelley)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

fix(sync): tolerate deleted sqlite provider sources

Zed and Shelley providers can classify removed physical database paths while shadow comparison is enabled. Legacy sync still owns writes on this branch, so those forced remove events must not fail at the pre-parse stat step.

Treat provider-classified deleted physical SQLite sources as an OK no-result parse. This keeps the archive-preserving legacy behavior while avoiding watcher failures until provider-authoritative deletion semantics are implemented at the stack tip.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestEngine_(ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|ProcessFileProviderDeletedSQLiteSourcesDoNotFail)|Test(Zed|Shelley)ProviderClassifiesDeletedPhysicalDB|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(sync): compare zed shelley shadow parity

Zed and Shelley are shadow-compared database-backed providers, so provider method tests are not enough to prove the sync migration preserves legacy parser output.

Cover physical DB source observation for both providers and compare sessions, messages, force-replace intent, and data-version planning against the legacy DB parsers.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(parser): cover sqlite provider stored hints

SQLite fan-out providers need to treat stored virtual source paths differently depending on caller intent. Fresh lookups should reject deleted rows or malformed/stale virtual paths, while non-fresh lookup still needs to preserve the virtual source identity so changed-path cleanup can observe a SkipNoSession tombstone.

This closes part of the stored-hint compatibility gap for the Zed/Shelley branch without making provider writes authoritative. The tests document row deletion, invalid virtual paths, stale DB-path hints, and tombstone parse behavior while legacy sync remains the write path.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1 -v; go test -tags "fts5" ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|TestEngine_ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|TestEngine_ProcessFileProviderDeletedSQLiteSourcesDoNotFail' -count=1 -v; go test -tags "fts5" ./internal/parser ./internal/sync -count=1 (sync fails only known TestSyncPathsCodexIndexEventRefreshesStoredDuplicate on this branch); go fmt ./...; go vet ./...; manual ./custom-gcl package loop with GOMAXPROCS=1 GOGC=5 GOMEMLIMIT=128MiB; git diff --check

Generated with Codex
Co-authored-by: Codex <codex@openai.com>

refactor(parser): fold zed and shelley providers

Move Zed and Shelley source ownership onto their concrete providers and delete the ten package-level legacy entrypoints (DiscoverZedSessions, FindZedSourceFile, ParseZedSQLiteVirtualPath, ParseZedThreadDirect, ParseZedThreadFromDB, DiscoverShelleySessions, FindShelleySourceFile, ParseShelleyConversationDirect, ParseShelleyConversationFromDB, ParseShelleyVirtualPath) plus the now-orphaned FindShelleyDBPath helper. Both agents become provider-authoritative so runtime sync routes through provider discovery, changed-path classification, and processProviderFile instead of the removed processZed/processShelley methods and syncSingleZed.

Both providers keep multiple conversations in one shared SQLite database addressed by a "<dbPath>#<id>" virtual path. The fold preserves that shape:

- Discovery surfaces the single physical threads.db / shelley.db as one source; Parse fans it out to one session per thread/conversation.
- Virtual-path resolution flows through the provider-neutral ParseVirtualSourcePathForBase helper. parseZedVirtualPath restores the legacy IsValidSessionID guard the bespoke parser enforced; parseShelleyVirtualPath maps directly onto the shared helper. Every engine call site that split a virtual path now uses the neutral helper too, and the surviving ZedSQLiteSourceMtime / ShelleySourceMtime watchers were repointed at it.
- The Zed and Shelley single-conversation direct parses move onto the providers as parseThreadDirect / parseConversationDirect over the unexported parseZedThreadFromDB / parseShelleyConversationFromDB.

Because a provider has no database handle, the engine reproduces the per-session skip the legacy fan-out loops performed in dropUnchangedSharedSQLiteResults: the provider re-parses every session on any database change, and the engine drops results whose stored file_mtime (plus the content fingerprint in file_hash for Shelley's second-precision timestamps) and data_version already match, applying the path rewriter so remote stored paths resolve. Force-parse runs keep every result. A forced parse on a deleted shared database now completes as an empty force-replace in processProviderFile so the engine retires the removed sessions instead of failing.

ParseDiff synthesizes the Zed/Shelley database source the way it already does for Kiro/OpenCode/Kilo so --agent zed/shelley keeps working without a DiscoverFunc.

Tests move from the deleted free functions to provider API coverage, add a guard asserting the legacy entrypoints stay gone, drop both provider files from the pending shim scan list, and remove the shadow comparison test. The shared writeProviderShadowSourceFile helper is rehomed into a dedicated support file so the sync package keeps compiling after the shadow test is deleted.

refactor(parser): delete zed legacy whole-database parser

ParseZedSessions parsed every top-level thread in a Zed threads.db, but the provider routes per-thread through parseZedThreadFromDB, so the free function survived only as test-exercised dead production code.

Delete ParseZedSessions; the retained parse tests reproduce the whole-database walk with the provider's own primitives (ListZedThreadMetas + parseZedThreadFromDB), which share the top-level parent_id filter and ordering, so they exercise the production path without the deleted shim.

fix(sync): preserve zed shelley force replaces

Zed and Shelley share one physical SQLite database across many virtual session paths. Provider-authoritative sync needs source-level force-replace behavior to retire those virtual sessions when a physical database disappears or when a provider reports a complete empty outcome.

SyncSingleSession also marks the discovered source with ForceParse, so unchanged-result filtering must respect the per-file flag instead of only the engine-wide flag. Otherwise a targeted resync can silently keep corrupted stored rows because the shared SQLite fingerprint has not changed.

Validation: go test -tags "fts5" ./internal/sync -run 'TestSync(SingleSessionZedForce|PathsZedDeleted|SingleSessionShelleyForce|PathsShelleyDeleted)' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*(Zed|Shelley).*' -count=1; go vet ./...; git diff --check
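The "<dbPath>#<id>" virtual path shape described in the fold commit above could be split roughly as follows. This is a minimal sketch, not the repository's ParseVirtualSourcePathForBase: the function name splitVirtualPath, its signature, and the example paths are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitVirtualPath is a hypothetical helper (not the real
// ParseVirtualSourcePathForBase). It splits "<dbPath>#<id>" into the
// physical database path and the session ID, accepting the path only when
// the database file is named exactly base (for example "threads.db").
func splitVirtualPath(virtual, base string) (dbPath, id string, ok bool) {
	marker := base + "#"
	i := strings.LastIndex(virtual, marker)
	if i < 0 {
		return "", "", false // no "<base>#" separator at all
	}
	dbPath = virtual[:i+len(base)]
	id = virtual[i+len(marker):]
	// Reject empty IDs and files that merely end in the base name,
	// such as "notthreads.db".
	if id == "" || filepath.Base(dbPath) != base {
		return "", "", false
	}
	return dbPath, id, true
}

func main() {
	dbPath, id, ok := splitVirtualPath("/data/zed/threads/threads.db#abc123", "threads.db")
	fmt.Println(dbPath, id, ok)
}
```

A real implementation would likely also validate the ID itself, as the legacy IsValidSessionID guard mentioned above did for Zed; that check is left out here.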
Kiro has two source families that were still coupled to the legacy sync adapter: CLI JSONL plus current-store SQLite for Kiro, and old .chat plus workspace-session JSON for Kiro IDE. Moving them behind concrete providers keeps those source shapes explicit at the facade boundary.

The Kiro provider preserves current-store fan-out, per-session SQLite virtual lookup, legacy JSONL shadowing, source hashing, changed-path classification, force-replace SQLite parses, per-session source errors, and Kiro IDE old/new session parsing through the existing parsers.

fix(parser): prefer kiro sqlite lookup

Kiro sessions can migrate from legacy JSONL files into the current-store SQLite database while the persisted row still points at the old source path. Source lookup needs to treat the session ID as authoritative in that case, otherwise explicit resyncs can resolve the shadowed JSONL file and skip the current SQLite session.

fix(parser): align kiro provider shadowing

The legacy Kiro sync path treated current-store SQLite sessions as globally shadowing legacy JSONL files across all configured roots. The provider needs the same behavior so multi-root setups do not parse the same logical session from both source families.

Deleted SQLite DBs and per-session rows also need to fingerprint as tombstones so the provider caller can still reach parse and produce the force-replace SkipNoSession outcome used for archive cleanup.

fix(parser): shadow kiro providers

The Kiro provider branch had concrete Kiro and Kiro IDE providers, but the migration manifest still held both agents on legacy-only mode. That prevented the provider bridge from exercising their changed-path behavior during the dual-run phase.

Move both Kiro agents into shadow compare on their migration branch so the stack remains a runtime migration instead of an additive provider implementation.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(KiroProvider|KiroIDEProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*Kiro' -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

test(sync): compare kiro family shadow parity

Kiro and Kiro IDE are shadow-compared on this branch, so the migration should prove provider observation still matches the legacy parsers that currently feed sync writes.

Cover Kiro SQLite database sources and Kiro IDE workspace-session JSON sources through ObserveProviderSource, including force-replace intent and data-version planning.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatches(KiroSQLite|KiroIDE)LegacyParser|TestKiroProvider|TestKiroIDEProvider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(parser): cover kiro stored source hints

Kiro's SQLite provider already supports tombstone parsing for missing database rows and deleted databases, but fresh stored-source lookup still accepted those stale hints. During the dual-run migration, callers use RequireFreshSource to distinguish explicit fresh lookup from changed-path cleanup, so the provider needs to honor that contract before legacy dispatch can be removed.

Fresh stored SQLite paths now require the physical DB or virtual row to exist, while non-fresh lookup still preserves source identity for SkipNoSession tombstones. The tests also reject malformed and stale SQLite virtual paths under the Kiro root.

Validation: go test -tags "fts5" ./internal/parser -run 'TestKiro' -count=1; go test -tags "fts5" ./internal/sync -run 'TestObserveProviderSourceMatchesKiro(SQLite|IDE)LegacyParser|TestProcessFileProviderAuthoritativeSourceErrorsOnlyForceParse' -count=1 -v; go fmt ./...; go vet ./...; GOMAXPROCS=1 GOGC=5 GOMEMLIMIT=128MiB ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser; git diff --check

Generated with Codex
Co-authored-by: Codex <codex@openai.com>

refactor(parser): fold kiro providers

Kiro and Kiro IDE were still dual-running: concrete providers existed but the migration manifest held both on shadow-compare, so the legacy package-level entrypoints and a large legacy sync dispatch still owned writes. Promote both agents to provider-authoritative and delete that legacy surface so the providers are the single source of truth.

The eight legacy free functions are removed: DiscoverKiroSessions, FindKiroSourceFile, ParseKiroSession, FindKiroSQLiteDBPath, ParseKiroSQLiteVirtualPath, ParseKiroSQLiteSession (Kiro) and FindKiroIDESourceFile, ParseKiroIDESession (Kiro IDE). Discovery, legacy-JSONL source lookup, and both parse paths move onto the concrete providers; the orphaned DiscoverKiroIDESessions helper goes with them.

SQLite virtual-path handling is preserved through the provider-neutral resolver. The Kiro provider continues to give each conversation row a stable identity via KiroSQLiteVirtualPath/VirtualSourcePath and resolves a "<db>#<sessionID>" path back through ParseVirtualSourcePathForBase (now via the unexported kiroSQLiteVirtualPathParts in the parser and a sync-package equivalent). Current-store fan-out, per-session virtual lookup, cross-root legacy shadowing, source hashing, force-replace SQLite parses, and per-session source errors all keep their existing behavior.

The engine loses its kiro legacy dispatch: the bulk syncKiroSQLite phase, classifyKiroSQLitePath plus the legacy-JSONL classifyOnePath block, the processKiro/processKiroIDE arms and methods, syncSingleKiroSQLite, and the now-redundant per-session count/shadow helpers. Provider discovery now emits the data.sqlite3 source and processProviderFile fans it out, so the DB is counted once via normal file sync instead of the separate DB-backed accounting. The cross-root legacy shadow filter stays in the engine because a scoped sync configures the provider with only the in-scope roots and cannot otherwise see a current-store DB in an out-of-scope root.

providerChangedPathEventKind now resolves a virtual source path to its physical container before the existence check so a per-session SQLite resync via SyncPaths is treated as a write rather than a phantom remove. Parse-diff discovers kiro through the provider facade (parseDiffProviderDiscover) now that it carries no DiscoverFunc hook.

The two shadow-baseline assertions that encoded the old bulk-sync idempotency (a no-op resync counting zero) are updated for the authoritative model, where the database is rediscovered and re-parsed every full sync; archive preservation on a malformed update is unchanged. The shadow parity test is replaced with provider-API coverage, a parser guard asserts the legacy entrypoints stay gone, and both provider files leave the pending-shim scan list.

fix(parser): thread ctx through kiro_ide source lookups
Move Antigravity IDE and CLI source discovery, lookup, and parse ownership onto concrete antigravityProvider and antigravityCLIProvider types, deleting the package-level legacy free functions and their legacy sync dispatch. Both agents become provider-authoritative.

Sidecar and freshness semantics are preserved through the providers' SourcesForChangedPath fan-out and composite fingerprints rather than engine-level classifiers: the IDE provider maps annotations and brain artifacts back to the conversation DB, and the CLI provider maps history, brain, trajectory, and db/pb-precedence sidecars to every affected source.

Drop the obsolete engine-level TestClassifyOnePath_AntigravityCLI, which exercised the removed classifyOnePath antigravity arm. The antigravity provider unit tests cover the per-path sidecar-to-source mappings and the engine integration tests cover the engine-to-provider routing, so the test asserted removed behavior without adding coverage.

fix(parser): preserve antigravity history invalidation

Antigravity CLI history changes are watched and classified through fresh provider instances, so provider-local history snapshots cannot reliably detect rows that were removed or retagged. Treat history.jsonl writes conservatively and fan out to all current CLI sources, which preserves stale-metadata cleanup at the cost of a broader reparse on history-only updates.

The file watcher now consumes provider watch plans for agents that only had plain WatchSubdirs wiring, so provider-owned roots such as Antigravity CLI's history.jsonl parent are observed by the real watcher setup while bespoke legacy watch-root functions keep their existing behavior.

Validation: go test -tags fts5 ./internal/parser -count=1; go test -tags fts5 ./internal/sync -run 'Test.*AntigravityCLI|TestProcessAntigravity|TestSyncPathsAntigravity' -count=1; go test -tags fts5 ./cmd/agentsview -run TestCollectWatchRoots -count=1; go vet ./...; git diff --check
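The conservative history.jsonl handling described above can be pictured with a small sketch. It assumes a much simpler model than the real SourcesForChangedPath: the function name sourcesForChangedPath, its signature, and the file paths are hypothetical, and the brain, trajectory, and db/pb-precedence sidecar mappings are omitted.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sourcesForChangedPath is a hypothetical, simplified classifier. A write to
// history.jsonl fans out to every current source, because a fresh provider
// instance has no earlier snapshot to diff against. Any other path maps only
// to the source it names directly.
func sourcesForChangedPath(changed string, current []string) []string {
	if filepath.Base(changed) == "history.jsonl" {
		// Conservative choice: reparse everything rather than guess
		// which rows were removed or retagged.
		return append([]string(nil), current...)
	}
	for _, s := range current {
		if s == changed {
			return []string{s}
		}
	}
	return nil // unrelated path: no sources affected
}

func main() {
	current := []string{"/root/conv/a.pb", "/root/conv/b.db"}
	fmt.Println(sourcesForChangedPath("/root/history.jsonl", current))
	fmt.Println(sourcesForChangedPath("/root/conv/a.pb", current))
}
```

The trade-off is the one the commit message names: a history-only update triggers a broader reparse, in exchange for never missing stale metadata.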
Move Forge, Piebald, and Warp DB discovery and per-session parse ownership onto their shared db-backed provider implementation, deleting the package-level legacy entrypoints and per-agent engine sync dispatch. The three become provider-authoritative.

Full-sync change detection runs through syncProviderDBBackedAgent, which enumerates provider sources and skips those whose fingerprint mtime matches the stored data-version mtime, so a repeat sync of unchanged data stays a no-op. FindSourceFile, SourceMtime, and SyncSingleSession route these agents through the provider facade, preserving Piebald's chat-source-resolves-fork semantics including rejection of unknown forks.

Assert the provider-authoritative skip in the Piebald process test: an unchanged chat skips on its per-chat updated_at fingerprint, matching the legacy piebaldPendingSessionIDs skip and the Forge sibling. The prior test asserted the opposite, a stale shadow-compare expectation that reparsed an untouched session on every full sync.

refactor(parser): delete db-backed legacy whole-database parsers

The db-backed provider migration left the exported whole-database and single-session parse free functions (ParseForgeDB, ParsePiebaldDB, ParsePiebaldSession, ParseWarpDB) in place: the provider routes through the lowercase per-session helpers (parseForgeSession, parsePiebaldSessionResults, parseWarpSession), so these survived only as dead production code kept alive by their own tests. ParsePiebaldDB had no references at all.

Delete the four functions and the now-orphaned chain (loadForgeConversations, loadWarpConversations, loadPiebaldChats, and the ForgeSession/WarpSession/PiebaldSession bundle types). The retained parse tests now drive the provider facade (Discover + Fingerprint + Parse) via a shared parseDBBackedAll helper instead of the deleted free functions, so they exercise the production path. Extend the db-backed deletion guard to assert the four names stay gone.
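The full-sync skip attributed to syncProviderDBBackedAgent above amounts to comparing each source's fingerprint mtime with the stored one. The sketch below shows that comparison in isolation; the source type, the pendingSources function, and the map of stored mtimes are assumptions made for illustration and do not mirror the engine's actual types.

```go
package main

import "fmt"

// source is a hypothetical stand-in for a provider-discovered, db-backed
// source together with the mtime its fingerprint reports.
type source struct {
	Path       string
	MtimeNanos int64
}

// pendingSources returns the sources that still need parsing: those with no
// stored mtime, or whose fingerprint mtime differs from the stored value.
// Sources whose mtime matches are skipped, so a repeat sync of unchanged
// data does no work.
func pendingSources(discovered []source, stored map[string]int64) []source {
	var pending []source
	for _, s := range discovered {
		if prev, ok := stored[s.Path]; ok && prev == s.MtimeNanos {
			continue // unchanged since the last sync
		}
		pending = append(pending, s)
	}
	return pending
}

func main() {
	discovered := []source{
		{Path: "forge.db#1", MtimeNanos: 100},
		{Path: "forge.db#2", MtimeNanos: 250},
	}
	stored := map[string]int64{"forge.db#1": 100, "forge.db#2": 200}
	for _, s := range pendingSources(discovered, stored) {
		fmt.Println("reparse:", s.Path)
	}
}
```

A per-file force flag, such as the ForceParse marking mentioned for Zed and Shelley, would bypass this comparison; it is left out to keep the sketch small.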
fix(parser): honor sqlite fanout watch roots

SQLite fanout providers can publish a filesystem watch root that differs from the configured source root when FindDB resolves a canonical database under a subdirectory. Changed-path classification still compared WatchRoot to the configured root, so real DB/WAL/SHM events from the planned watch root produced no sources.

Accept the emitted canonical DB directory as the matching watch root while keeping the configured-root compatibility path, and cover the FindDB subdirectory case with a WatchPlan-driven WAL event regression. The commit also removes two unused Codex fixture restats that blocked the existing staticcheck hook.

Validation: go test -tags "fts5" ./internal/parser -run 'TestSQLiteFanoutSourceSet|TestDBBacked' -count=1 -v; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*Codex' -count=1; go fmt ./...; go vet ./...

style(docs): mdformat provider facade design spec
A Kiro SQLite store is discovered as one container source but fans out into one session per row, so the file tally counted it once. Add the extra sessions it produced to keep TotalSessions a session count, matching the per-session tally the legacy syncKiroSQLite phase reported.
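The tally adjustment described above can be illustrated as follows. This is a sketch under assumptions: the parseOutcome type and totalSessions function are hypothetical, and it assumes every source contributes one to the file tally regardless of how many sessions it yields.

```go
package main

import "fmt"

// parseOutcome is a hypothetical record of how many sessions one discovered
// source produced when parsed.
type parseOutcome struct {
	Path     string
	Sessions int
}

// totalSessions starts from the file tally (one per source) and adds the
// extra sessions a fan-out container produced, so the result counts
// sessions rather than files.
func totalSessions(outcomes []parseOutcome) int {
	total := 0
	for _, o := range outcomes {
		total++ // file tally: each source counted once
		if o.Sessions > 1 {
			total += o.Sessions - 1 // sessions beyond the first
		}
	}
	return total
}

func main() {
	outcomes := []parseOutcome{
		{Path: "session.jsonl", Sessions: 1},
		{Path: "data.sqlite3", Sessions: 4}, // one container, four rows
	}
	fmt.Println(totalSessions(outcomes))
}
```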
This change is part of the following stack:
Change managed by git-spice.
This was referenced Jun 26, 2026
Migrates the container and DB-backed providers — zed, shelley, kiro, antigravity, and the db-backed family — onto facade providers.
Includes the Kiro SQLite fan-out fix so each row a store produces counts toward
TotalSessions, matching the per-session tally the legacy sync phase reported.