feat(parser): migrate SQLite/container providers#884

Open
mariusvniekerk wants to merge 5 commits into fam/codex-editors from fam/sqlite-containers

@mariusvniekerk (Collaborator) commented Jun 26, 2026

Migrates the container and DB-backed providers — zed, shelley, kiro, antigravity, and the db-backed family — onto facade providers.

Includes the Kiro SQLite fan-out fix so each row a store produces counts toward TotalSessions, matching the per-session tally the legacy sync phase reported.

Zed and Shelley both store multiple conversations in a shared SQLite database, so their provider boundary needs to model the physical database source separately from per-session virtual paths. Keeping that source shape explicit makes lookup, watch classification, and force-replace parse behavior available through the shared facade instead of the legacy adapter path.

The providers preserve physical DB discovery, WAL/SHM change classification, raw and full ID lookup, virtual source fingerprints, multi-session parse fan-out, per-session Shelley content fingerprints, and parser output normalization.

fix(parser): define sqlite provider deletion semantics

Shared SQLite providers need to treat deletion events as source-level state, not as unclassifiable paths. Without that, a deleted Zed or Shelley database can disappear before the provider facade reports a complete empty source, leaving future cleanup behavior under-specified.

Classifying syntactically valid DB, WAL, and SHM paths even after the main DB is gone preserves the watcher-to-parse path. A missing backing DB now produces a complete force-replace SkipNoSession outcome, and the Zed fingerprint comment records the intentional legacy whole-DB hash tradeoff.

fix(parser): align zed shelley capabilities

Provider capabilities are used as a contract for what normalized content a provider can actually emit. Shelley currently records token totals from messages but does not emit aggregate usage events, and Zed filters child threads before relationship fields can surface.

Keep the declarations conservative so the facade does not advertise unsupported content features while the providers continue to preserve the parser behavior they actually expose today.

fix(parser): tighten sqlite sidecar classification

Deleted DB sidecar events need to remain classifiable, but basename-only matching is too broad once missing DB parses become complete force-replace outcomes. Unrelated files under the same provider root should not synthesize the canonical shared DB source.

Restrict Zed sidecars to the watched threads directory and Shelley sidecars to the provider root, with regression tests for unrelated matching basenames.
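The restriction above can be illustrated with a minimal sketch. It is not the provider's code: `classifySidecar`, its parameters, and the example paths are hypothetical. It assumes the sidecars use SQLite's `-wal` and `-shm` suffixes and that classification is purely path-based, so it still works after the main DB file is gone.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// classifySidecar maps a changed path to the canonical shared DB source.
// It accepts only the DB file or its -wal/-shm sidecars, and only when the
// file sits directly inside watchDir. All names here are hypothetical.
func classifySidecar(changed, watchDir, dbName string) (string, bool) {
	changed = filepath.Clean(changed)
	watchDir = filepath.Clean(watchDir)
	if filepath.Dir(changed) != watchDir {
		// A matching basename elsewhere under the provider root is ignored.
		return "", false
	}
	base := filepath.Base(changed)
	for _, suffix := range []string{"", "-wal", "-shm"} {
		if base == dbName+suffix {
			// No stat call: a deleted DB must remain classifiable.
			return filepath.Join(watchDir, dbName), true
		}
	}
	return "", false
}

func main() {
	fmt.Println(classifySidecar("/zed/threads/threads.db-wal", "/zed/threads", "threads.db"))
	fmt.Println(classifySidecar("/zed/other/threads.db-wal", "/zed/threads", "threads.db"))
}
```

Matching on the parent directory rather than the basename alone is what keeps an unrelated `threads.db-wal` elsewhere under the root from synthesizing the shared source.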

fix(parser): shadow zed shelley providers

The Zed and Shelley branch had concrete providers and passing provider coverage, but still left both agents marked legacy-only. That kept the stack additive and prevented provider changed-path classification from participating while both shapes run.

Move both agents into shadow compare on their migration branch so the runtime bridge can exercise the concrete providers before the legacy path is removed later in the stack.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(ZedProvider|ShelleyProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/sync -run 'TestSync.*(Zed|Shelley)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

fix(sync): tolerate deleted sqlite provider sources

Zed and Shelley providers can classify removed physical database paths while shadow comparison is enabled. Legacy sync still owns writes on this branch, so those forced remove events must not fail at the pre-parse stat step.

Treat provider-classified deleted physical SQLite sources as an OK no-result parse. This keeps the archive-preserving legacy behavior while avoiding watcher failures until provider-authoritative deletion semantics are implemented at the stack tip.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestEngine_(ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|ProcessFileProviderDeletedSQLiteSourcesDoNotFail)|Test(Zed|Shelley)ProviderClassifiesDeletedPhysicalDB|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(sync): compare zed shelley shadow parity

Zed and Shelley are shadow-compared database-backed providers, so provider method tests are not enough to prove the sync migration preserves legacy parser output.

Cover physical DB source observation for both providers and compare sessions, messages, force-replace intent, and data-version planning against the legacy DB parsers.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(parser): cover sqlite provider stored hints

SQLite fan-out providers need to treat stored virtual source paths differently depending on caller intent. Fresh lookups should reject deleted rows or malformed/stale virtual paths, while non-fresh lookup still needs to preserve the virtual source identity so changed-path cleanup can observe a SkipNoSession tombstone.

This closes part of the stored-hint compatibility gap for the Zed/Shelley branch without making provider writes authoritative. The tests document row deletion, invalid virtual paths, stale DB-path hints, and tombstone parse behavior while legacy sync remains the write path.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1 -v; go test -tags "fts5" ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|TestEngine_ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|TestEngine_ProcessFileProviderDeletedSQLiteSourcesDoNotFail' -count=1 -v; go test -tags "fts5" ./internal/parser ./internal/sync -count=1 (sync fails only known TestSyncPathsCodexIndexEventRefreshesStoredDuplicate on this branch); go fmt ./...; go vet ./...; manual ./custom-gcl package loop with GOMAXPROCS=1 GOGC=5 GOMEMLIMIT=128MiB; git diff --check

Generated with Codex
Co-authored-by: Codex <codex@openai.com>

refactor(parser): fold zed and shelley providers

Move Zed and Shelley source ownership onto their concrete providers and
delete the ten package-level legacy entrypoints
(DiscoverZedSessions, FindZedSourceFile, ParseZedSQLiteVirtualPath,
ParseZedThreadDirect, ParseZedThreadFromDB, DiscoverShelleySessions,
FindShelleySourceFile, ParseShelleyConversationDirect,
ParseShelleyConversationFromDB, ParseShelleyVirtualPath) plus the
now-orphaned FindShelleyDBPath helper. Both agents become
provider-authoritative so runtime sync routes through provider
discovery, changed-path classification, and processProviderFile instead
of the removed processZed/processShelley methods and syncSingleZed.

Both providers keep multiple conversations in one shared SQLite
database addressed by a "<dbPath>#<id>" virtual path. The fold preserves
that shape:

- Discovery surfaces the single physical threads.db / shelley.db as one
  source; Parse fans it out to one session per thread/conversation.
- Virtual-path resolution flows through the provider-neutral
  ParseVirtualSourcePathForBase helper. parseZedVirtualPath restores the
  legacy IsValidSessionID guard the bespoke parser enforced;
  parseShelleyVirtualPath maps directly onto the shared helper. Every
  engine call site that split a virtual path now uses the neutral helper
  too, and the surviving ZedSQLiteSourceMtime / ShelleySourceMtime
  watchers were repointed at it.
- The Zed and Shelley single-conversation direct parses move onto the
  providers as parseThreadDirect / parseConversationDirect over the
  unexported parseZedThreadFromDB / parseShelleyConversationFromDB.
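
As an illustration of the "<dbPath>#<id>" shape described above, here is a minimal sketch of building and splitting a virtual path. `virtualPath` and `splitVirtualPath` are hypothetical stand-ins, not the real ParseVirtualSourcePathForBase signature, and the rule that the physical part must equal the expected database path is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// virtualPath builds the "<dbPath>#<id>" form for one session in a shared DB.
func virtualPath(dbPath, id string) string {
	return dbPath + "#" + id
}

// splitVirtualPath returns the session ID when virtual has the form
// "<dbPath>#<id>" for the expected dbPath. Hypothetical helper: the real
// resolver may apply extra guards (for example a session-ID validity check).
func splitVirtualPath(virtual, dbPath string) (string, bool) {
	prefix := dbPath + "#"
	if !strings.HasPrefix(virtual, prefix) {
		return "", false // different database, or not a virtual path at all
	}
	id := strings.TrimPrefix(virtual, prefix)
	if id == "" {
		return "", false // "<dbPath>#" with no session ID is malformed
	}
	return id, true
}

func main() {
	v := virtualPath("/zed/threads/threads.db", "thread-1")
	id, ok := splitVirtualPath(v, "/zed/threads/threads.db")
	fmt.Println(v, id, ok)
}
```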

Because a provider has no database handle, the engine reproduces the
per-session skip the legacy fan-out loops performed in
dropUnchangedSharedSQLiteResults: the provider re-parses every session on
any database change, and the engine drops results whose stored file_mtime
(plus the content fingerprint in file_hash for Shelley's second-precision
timestamps) and data_version already match, applying the path rewriter so
remote stored paths resolve. Force-parse runs keep every result. A forced
parse on a deleted shared database now completes as an empty force-replace
in processProviderFile so the engine retires the removed sessions instead
of failing. ParseDiff synthesizes the Zed/Shelley database source the way
it already does for Kiro/OpenCode/Kilo so --agent zed/shelley keeps
working without a DiscoverFunc.
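
The engine-side skip described above can be sketched roughly as follows. The `storedRow` and `parsedResult` types and the `dropUnchanged` function are hypothetical simplifications; the real dropUnchangedSharedSQLiteResults also applies the path rewriter, which this sketch omits.

```go
package main

import "fmt"

// storedRow is a hypothetical view of what the archive already holds.
type storedRow struct {
	FileMtime   int64
	FileHash    string
	DataVersion int
}

// parsedResult is a hypothetical per-session result from the provider fan-out.
type parsedResult struct {
	SessionID   string
	FileMtime   int64
	FileHash    string // content fingerprint, compared only when useHash is set
	DataVersion int
}

// dropUnchanged keeps results that are new or differ from the stored row.
// useHash models the Shelley case, where second-precision timestamps are not
// enough on their own. force keeps every result.
func dropUnchanged(results []parsedResult, stored map[string]storedRow, useHash, force bool) []parsedResult {
	if force {
		return results
	}
	kept := make([]parsedResult, 0, len(results))
	for _, r := range results {
		s, ok := stored[r.SessionID]
		unchanged := ok &&
			s.FileMtime == r.FileMtime &&
			s.DataVersion == r.DataVersion &&
			(!useHash || s.FileHash == r.FileHash)
		if !unchanged {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	stored := map[string]storedRow{"a": {FileMtime: 10, FileHash: "h1", DataVersion: 2}}
	results := []parsedResult{
		{SessionID: "a", FileMtime: 10, FileHash: "h1", DataVersion: 2}, // unchanged
		{SessionID: "b", FileMtime: 11, FileHash: "h2", DataVersion: 2}, // new
	}
	fmt.Println(len(dropUnchanged(results, stored, true, false)))
}
```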

Tests move from the deleted free functions to provider API coverage, add
a guard asserting the legacy entrypoints stay gone, drop both provider
files from the pending shim scan list, and remove the shadow comparison
test. The shared writeProviderShadowSourceFile helper is rehomed into a
dedicated support file so the sync package keeps compiling after the
shadow test is deleted.

refactor(parser): delete zed legacy whole-database parser

ParseZedSessions parsed every top-level thread in a Zed threads.db, but
the provider routes per-thread through parseZedThreadFromDB, so the
free function survived only as test-exercised dead production code.

Delete ParseZedSessions; the retained parse tests reproduce the
whole-database walk with the provider's own primitives (ListZedThreadMetas
+ parseZedThreadFromDB), which share the top-level parent_id filter and
ordering, so they exercise the production path without the deleted shim.

fix(sync): preserve zed shelley force replaces

Zed and Shelley share one physical SQLite database across many virtual session paths. Provider-authoritative sync needs source-level force-replace behavior to retire those virtual sessions when a physical database disappears or when a provider reports a complete empty outcome.

SyncSingleSession also marks the discovered source with ForceParse, so unchanged-result filtering must respect the per-file flag instead of only the engine-wide flag. Otherwise a targeted resync can silently keep corrupted stored rows because the shared SQLite fingerprint has not changed.
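
The flag interaction described above reduces to a small predicate. The sketch below uses hypothetical `engineOptions` and `sourceFile` types; only the idea that the filter must consult both flags comes from the commit message.

```go
package main

import "fmt"

// engineOptions and sourceFile are hypothetical shapes for this sketch.
type engineOptions struct {
	ForceParse bool // engine-wide force flag
}

type sourceFile struct {
	Path       string
	ForceParse bool // per-file flag, set for a targeted single-session resync
}

// mayDropUnchanged reports whether unchanged-result filtering may run for f.
// Checking only opts.ForceParse would let a targeted resync skip rows whose
// shared SQLite fingerprint has not changed.
func mayDropUnchanged(opts engineOptions, f sourceFile) bool {
	return !opts.ForceParse && !f.ForceParse
}

func main() {
	targeted := sourceFile{Path: "/zed/threads/threads.db", ForceParse: true}
	fmt.Println(mayDropUnchanged(engineOptions{}, targeted))
}
```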

Validation: go test -tags "fts5" ./internal/sync -run 'TestSync(SingleSessionZedForce|PathsZedDeleted|SingleSessionShelleyForce|PathsShelleyDeleted)' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*(Zed|Shelley).*' -count=1; go vet ./...; git diff --check

Kiro has two source families that were still coupled to the legacy sync adapter: CLI JSONL plus current-store SQLite for Kiro, and old .chat plus workspace-session JSON for Kiro IDE. Moving them behind concrete providers keeps those source shapes explicit at the facade boundary.

The Kiro provider preserves current-store fan-out, per-session SQLite virtual lookup, legacy JSONL shadowing, source hashing, changed-path classification, force-replace SQLite parses, per-session source errors, and Kiro IDE old/new session parsing through the existing parsers.

fix(parser): prefer kiro sqlite lookup

Kiro sessions can migrate from legacy JSONL files into the current-store SQLite database while the persisted row still points at the old source path. Source lookup needs to treat the session ID as authoritative in that case, otherwise explicit resyncs can resolve the shadowed JSONL file and skip the current SQLite session.

fix(parser): align kiro provider shadowing

The legacy Kiro sync path treated current-store SQLite sessions as globally shadowing legacy JSONL files across all configured roots. The provider needs the same behavior so multi-root setups do not parse the same logical session from both source families.

Deleted SQLite DBs and per-session rows also need to fingerprint as tombstones so the provider caller can still reach parse and produce the force-replace SkipNoSession outcome used for archive cleanup.

fix(parser): shadow kiro providers

The Kiro provider branch had concrete Kiro and Kiro IDE providers, but the migration manifest still held both agents on legacy-only mode. That prevented the provider bridge from exercising their changed-path behavior during the dual-run phase.

Move both Kiro agents into shadow compare on their migration branch so the stack remains a runtime migration instead of an additive provider implementation.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(KiroProvider|KiroIDEProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*Kiro' -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

test(sync): compare kiro family shadow parity

Kiro and Kiro IDE are shadow-compared on this branch, so the migration should prove provider observation still matches the legacy parsers that currently feed sync writes.

Cover Kiro SQLite database sources and Kiro IDE workspace-session JSON sources through ObserveProviderSource, including force-replace intent and data-version planning.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatches(KiroSQLite|KiroIDE)LegacyParser|TestKiroProvider|TestKiroIDEProvider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(parser): cover kiro stored source hints

Kiro's SQLite provider already supports tombstone parsing for missing database rows and deleted databases, but fresh stored-source lookup still accepted those stale hints. During the dual-run migration, callers use RequireFreshSource to distinguish explicit fresh lookup from changed-path cleanup, so the provider needs to honor that contract before legacy dispatch can be removed.

Fresh stored SQLite paths now require the physical DB or virtual row to exist, while non-fresh lookup still preserves source identity for SkipNoSession tombstones. The tests also reject malformed and stale SQLite virtual paths under the Kiro root.

Validation: go test -tags "fts5" ./internal/parser -run 'TestKiro' -count=1; go test -tags "fts5" ./internal/sync -run 'TestObserveProviderSourceMatchesKiro(SQLite|IDE)LegacyParser|TestProcessFileProviderAuthoritativeSourceErrorsOnlyForceParse' -count=1 -v; go fmt ./...; go vet ./...; GOMAXPROCS=1 GOGC=5 GOMEMLIMIT=128MiB ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser; git diff --check

Generated with Codex
Co-authored-by: Codex <codex@openai.com>

refactor(parser): fold kiro providers

Kiro and Kiro IDE were still dual-running: concrete providers existed but the migration manifest held both on shadow-compare, so the legacy package-level entrypoints and a large legacy sync dispatch still owned writes. Promote both agents to provider-authoritative and delete that legacy surface so the providers are the single source of truth.

The eight legacy free functions are removed: DiscoverKiroSessions, FindKiroSourceFile, ParseKiroSession, FindKiroSQLiteDBPath, ParseKiroSQLiteVirtualPath, ParseKiroSQLiteSession (Kiro) and FindKiroIDESourceFile, ParseKiroIDESession (Kiro IDE). Discovery, legacy-JSONL source lookup, and both parse paths move onto the concrete providers; the orphaned DiscoverKiroIDESessions helper goes with them.

SQLite virtual-path handling is preserved through the provider-neutral resolver. The Kiro provider continues to give each conversation row a stable identity via KiroSQLiteVirtualPath/VirtualSourcePath and resolves a "<db>#<sessionID>" path back through ParseVirtualSourcePathForBase (now via the unexported kiroSQLiteVirtualPathParts in the parser and a sync-package equivalent). Current-store fan-out, per-session virtual lookup, cross-root legacy shadowing, source hashing, force-replace SQLite parses, and per-session source errors all keep their existing behavior.

The engine loses its kiro legacy dispatch: the bulk syncKiroSQLite phase, classifyKiroSQLitePath plus the legacy-JSONL classifyOnePath block, the processKiro/processKiroIDE arms and methods, syncSingleKiroSQLite, and the now-redundant per-session count/shadow helpers. Provider discovery now emits the data.sqlite3 source and processProviderFile fans it out, so the DB is counted once via normal file sync instead of the separate DB-backed accounting. The cross-root legacy shadow filter stays in the engine because a scoped sync configures the provider with only the in-scope roots and cannot otherwise see a current-store DB in an out-of-scope root.

providerChangedPathEventKind now resolves a virtual source path to its physical container before the existence check so a per-session SQLite resync via SyncPaths is treated as a write rather than a phantom remove. Parse-diff discovers kiro through the provider facade (parseDiffProviderDiscover) now that it carries no DiscoverFunc hook.
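
A minimal sketch of the ordering fix described above, under stated assumptions: `changedPathEventKind` and its injected `exists` callback are hypothetical, and splitting on the last `#` is a simplification of the real virtual-path resolver.

```go
package main

import (
	"fmt"
	"strings"
)

// changedPathEventKind classifies a changed path as "write" or "remove".
// A virtual "<db>#<sessionID>" path is first resolved to its physical
// container, because the virtual path itself never exists on disk.
// exists is injected so the sketch stays free of filesystem access.
func changedPathEventKind(p string, exists func(string) bool) string {
	physical := p
	if i := strings.LastIndex(p, "#"); i >= 0 {
		physical = p[:i]
	}
	if exists(physical) {
		return "write"
	}
	return "remove"
}

func main() {
	onDisk := func(p string) bool { return p == "/kiro/data.sqlite3" }
	fmt.Println(changedPathEventKind("/kiro/data.sqlite3#session-1", onDisk))
	fmt.Println(changedPathEventKind("/kiro/gone.sqlite3#session-1", onDisk))
}
```

Without the resolution step, the existence check would run against the virtual path and report every per-session resync as a remove.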

The two shadow-baseline assertions that encoded the old bulk-sync idempotency (a no-op resync counting zero) are updated for the authoritative model, where the database is rediscovered and re-parsed every full sync; archive preservation on a malformed update is unchanged. The shadow parity test is replaced with provider-API coverage, a parser guard asserts the legacy entrypoints stay gone, and both provider files leave the pending-shim scan list.

fix(parser): thread ctx through kiro_ide source lookups

Move Antigravity IDE and CLI source discovery, lookup, and parse
ownership onto concrete antigravityProvider and antigravityCLIProvider
types, deleting the package-level legacy free functions and their legacy
sync dispatch. Both agents become provider-authoritative.

Sidecar and freshness semantics are preserved through the providers'
SourcesForChangedPath fan-out and composite fingerprints rather than
engine-level classifiers: the IDE provider maps annotations and brain
artifacts back to the conversation DB, and the CLI provider maps history,
brain, trajectory, and db/pb-precedence sidecars to every affected source.

Drop the obsolete engine-level TestClassifyOnePath_AntigravityCLI, which
exercised the removed classifyOnePath antigravity arm. The antigravity
provider unit tests cover the per-path sidecar-to-source mappings and the
engine integration tests cover the engine-to-provider routing, so the
test asserted removed behavior without adding coverage.

fix(parser): preserve antigravity history invalidation

Antigravity CLI history changes are watched and classified through fresh provider instances, so provider-local history snapshots cannot reliably detect rows that were removed or retagged. Treat history.jsonl writes conservatively and fan out to all current CLI sources, which preserves stale-metadata cleanup at the cost of a broader reparse on history-only updates.

The file watcher now consumes provider watch plans for agents that only had plain WatchSubdirs wiring, so provider-owned roots such as Antigravity CLI's history.jsonl parent are observed by the real watcher setup while bespoke legacy watch-root functions keep their existing behavior.

Validation: go test -tags fts5 ./internal/parser -count=1; go test -tags fts5 ./internal/sync -run 'Test.*AntigravityCLI|TestProcessAntigravity|TestSyncPathsAntigravity' -count=1; go test -tags fts5 ./cmd/agentsview -run TestCollectWatchRoots -count=1; go vet ./...; git diff --check

Move Forge, Piebald, and Warp DB discovery and per-session parse ownership
onto their shared db-backed provider implementation, deleting the
package-level legacy entrypoints and per-agent engine sync dispatch. The
three become provider-authoritative.

Full-sync change detection runs through syncProviderDBBackedAgent, which
enumerates provider sources and skips those whose fingerprint mtime
matches the stored data-version mtime, so a repeat sync of unchanged data
stays a no-op. FindSourceFile, SourceMtime, and SyncSingleSession route
these agents through the provider facade, preserving Piebald's
chat-source-resolves-fork semantics including rejection of unknown forks.

Assert the provider-authoritative skip in the Piebald process test: an
unchanged chat skips on its per-chat updated_at fingerprint, matching the
legacy piebaldPendingSessionIDs skip and the Forge sibling. The prior test
asserted the opposite, a stale shadow-compare expectation that reparsed an
untouched session on every full sync.

refactor(parser): delete db-backed legacy whole-database parsers

The db-backed provider migration left the exported whole-database and
single-session parse free functions (ParseForgeDB, ParsePiebaldDB,
ParsePiebaldSession, ParseWarpDB) in place: the provider routes through
the lowercase per-session helpers (parseForgeSession,
parsePiebaldSessionResults, parseWarpSession), so these survived only as
dead production code kept alive by their own tests. ParsePiebaldDB had no
references at all.

Delete the four functions and the now-orphaned chain
(loadForgeConversations, loadWarpConversations, loadPiebaldChats, and the
ForgeSession/WarpSession/PiebaldSession bundle types). The retained parse
tests now drive the provider facade (Discover + Fingerprint + Parse) via a
shared parseDBBackedAll helper instead of the deleted free functions, so
they exercise the production path. Extend the db-backed deletion guard to
assert the four names stay gone.

fix(parser): honor sqlite fanout watch roots

SQLite fanout providers can publish a filesystem watch root that differs from the configured source root when FindDB resolves a canonical database under a subdirectory. Changed-path classification still compared WatchRoot to the configured root, so real DB/WAL/SHM events from the planned watch root produced no sources.

Accept the emitted canonical DB directory as the matching watch root while keeping the configured-root compatibility path, and cover the FindDB subdirectory case with a WatchPlan-driven WAL event regression. The commit also removes two unused Codex fixture restats that blocked the existing staticcheck hook.

Validation: go test -tags "fts5" ./internal/parser -run 'TestSQLiteFanoutSourceSet|TestDBBacked' -count=1 -v; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*Codex' -count=1; go fmt ./...; go vet ./...
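
The watch-root comparison described in this commit can be sketched as below. `matchesWatchRoot` and its parameters are hypothetical; the sketch only shows the two accepted cases, the configured root and the directory of the canonical database.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesWatchRoot reports whether an event's watch root belongs to this
// provider. It accepts the configured root (compatibility path) and the
// directory holding the canonical DB, which may be a subdirectory of it.
func matchesWatchRoot(watchRoot, configuredRoot, canonicalDB string) bool {
	w := filepath.Clean(watchRoot)
	if w == filepath.Clean(configuredRoot) {
		return true
	}
	return canonicalDB != "" && w == filepath.Dir(filepath.Clean(canonicalDB))
}

func main() {
	// The DB was resolved under a subdirectory of the configured root.
	fmt.Println(matchesWatchRoot("/forge/data", "/forge", "/forge/data/forge.db"))
}
```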

style(docs): mdformat provider facade design spec

A Kiro SQLite store is discovered as one container source but fans out into
one session per row, so the file tally counted it once. Add the extra
sessions it produced to keep TotalSessions a session count, matching the
per-session tally the legacy syncKiroSQLite phase reported.
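
The tally adjustment amounts to simple arithmetic, sketched here with a hypothetical `syncStats` type. It assumes the file tally contributes one count per source and that only the sessions beyond the first need to be added.

```go
package main

import "fmt"

// syncStats is a hypothetical stand-in for the sync result counters.
type syncStats struct {
	TotalSessions int
}

// recordSource counts one synced source that produced sessionsProduced
// sessions. The file tally adds 1 per source; a container that fans out
// adds the remaining sessions so the total stays a session count.
func (s *syncStats) recordSource(sessionsProduced int) {
	s.TotalSessions++ // file tally: one per discovered source
	if sessionsProduced > 1 {
		s.TotalSessions += sessionsProduced - 1 // extra sessions from fan-out
	}
}

func main() {
	var s syncStats
	s.recordSource(3) // one SQLite store holding three session rows
	s.recordSource(1) // one plain single-session file
	fmt.Println(s.TotalSessions)
}
```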