Skip to content

feat(parser): add directory JSONL source helper#756

Closed
mariusvniekerk wants to merge 4 commits into
provider-jsonl-source-setfrom
provider-directory-jsonl-source-set
Closed

feat(parser): add directory JSONL source helper#756
mariusvniekerk wants to merge 4 commits into
provider-jsonl-source-setfrom
provider-directory-jsonl-source-set

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Several remaining parser migrations share the same one-project-directory JSONL layout. Keeping that source shape in a thin helper lets provider implementations focus on parse behavior while still explicitly forwarding the source methods they support.

The helper embeds JSONLSourceSet and only adds the directory-shape constraint plus default project hints, so it avoids introducing a second provider framework.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (7184a46)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m46s), codex_security (codex/security, done, 13s) | Total: 1m59s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (55cfa0e)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 1m59s), codex_security (claude-code/security, done, 1m0s) | Total: 2m59s

@mariusvniekerk mariusvniekerk force-pushed the provider-jsonl-source-set branch from fa9cef2 to 611917e Compare June 21, 2026 00:40
@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from 55cfa0e to 5f1b70c Compare June 21, 2026 00:40
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (5f1b70c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m15s), codex_security (codex/security, done, 34s) | Total: 2m49s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (23d383e)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m4s), codex_security (codex/security, done, 42s) | Total: 2m46s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (5ef4189)

No Medium, High, or Critical findings to report.

The only reported issue was Low severity and is omitted per instructions.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 6m16s), codex_security (codex/security, done, 1m7s) | Total: 7m27s

@mariusvniekerk mariusvniekerk force-pushed the provider-jsonl-source-set branch from 611917e to 216e363 Compare June 23, 2026 23:55
@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from 5ef4189 to a302cd4 Compare June 23, 2026 23:55
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (a302cd4)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 4m39s), codex_security (codex/security, done, 14s) | Total: 4m53s

Parser integration has grown across registry callbacks, sync-engine switches, and provider-specific source handling. This design captures a shared provider facade so future parser work has one contract for discovery, fingerprinting, parsing, capabilities, and source lookup while preserving current normalized output types and sync semantics.

It also records the decision to keep source shape provider-owned, provide reusable JSONL source helpers, migrate every existing provider, and use enumer-generated capability support values for readable JSON reporting.

docs(parser): clarify provider embedding pattern

The provider facade design was intended to use embedded defaults, but the written spec did not show that pattern concretely enough. Add explicit provider examples so new provider implementations start from ProviderBase and embed or delegate source helper types instead of treating the facade as a loose collection of functions.

docs(parser): harden provider facade contract

The provider facade spec needed sharper boundaries before implementation: providers must be root-bound instances, source lookup cannot assume persisted paths are real files, and parse outcomes need to preserve retry eligibility without corrupting data-version state.

This keeps ProviderBase as the single embedded no-op surface while making reusable source discovery plain composition with explicit forwarding, so new provider work has a predictable contract without a hidden hook layer.

Validation: rg stale contract terms; git diff --check.

docs(parser): clarify provider facade edge cases

The provider facade contract still had ambiguity around mixed multi-session outcomes, fresh source lookup, changed-path filtering, and migration staging. Those gaps would let different provider migrations handle retries, diagnostics, and lookup callers differently.

This follow-up makes data-version state per parsed result, defines fresh-source resolution without assuming filesystem paths, gives fingerprint performance criteria, and splits caller migration into reviewable stages.

Validation: rg stale contract terms; git diff --check. Go tests not run because this is docs-only.

docs(parser): require provider config snapshots

Provider instances are meant to be root-bound and immutable after construction. The examples now show ProviderConfig cloning so helper roots and ProviderBase.Config cannot share a mutable caller slice.

Validation: rg stale contract terms; git diff --check. Go tests not run because this is docs-only.

docs(parser): define partial source failure semantics

Multi-session providers need an unambiguous rule for retryable session failures. The spec now requires SourceError.SessionID for per-session errors, routes unknown-scope failures to whole-source errors, and preserves absent rows during partial parses unless the provider explicitly excludes or cleanly replaces them.

Validation: rg stale contract terms; git diff --check. Go tests not run because this is docs-only.

docs(parser): tighten provider retry cache semantics

Provider results now have per-session retry state, but skip-cache persistence remains source-scoped. The spec now makes that aggregate rule explicit and also requires independent root-slice ownership between ProviderBase config and source helpers.

Validation: rg stale contract terms; git diff --check. Go tests not run because this is docs-only.

docs(parser): require complete results for clean source skips

Per-result retry state does not make skip-cache entries per-result. The spec now requires providers to declare a complete result set before the engine can persist a clean source/fingerprint skip, and keeps failures as diagnostic or failure-cache state instead of clean source state.

Validation: rg stale contract terms; git diff --check. Go tests not run because this is docs-only.

docs(parser): define provider contract edge cases

Provider migration depends on source references being fingerprintable, outcome IDs being comparable to persisted rows, and incremental parsing having unambiguous fallback semantics. Without those rules, a new provider could satisfy the facade shape while diverging in skip-cache, retry, or caller migration behavior.

This also records provider concurrency and SourceRef lifetime requirements so helpers can stay plain data holders while engine callers can safely share one provider instance.

docs(parser): require dual-run provider migration

Provider branches need to prove migration, not just add parallel provider implementations. Documenting the root dual-run harness, manifest opt-in, and stack-tip-only legacy removal gives each PR an obvious review surface while preserving legacy sync as the writer during the stack.

Validation: git diff --check; spec self-review for placeholders and stale defaulting language; mdformat hook formatting.
Introduce the Provider interface, ProviderBase/ProviderFactory, and source-set
helpers; own provider discovery and lookup at the root; and add the
legacy-call shim scan that gates provider files.

fix(parser): cover Aider, OMP, Reasonix in migration manifest

These agents live in the registry but were absent from the provider
migration manifest, so ValidateProviderMigrationModes failed once the
registry began enforcing that every agent has a mode. They remain on the
legacy path here; later stack commits migrate them to concrete providers
and flip these entries to provider-authoritative.

fix(sync): keep shadow provider discovery observational

Shadow provider mode must not add provider-only work or satisfy source lookups that the legacy runtime would miss. Otherwise a migration comparison can change live sync behavior before the provider becomes authoritative.\n\nProvider-authoritative discovery now reports discovery failures as sync failures and suppresses the provider completion watermark for that run, preserving the next incremental pass. The shim scan also keeps pending exemptions honest by failing stale entries while ignoring provider-owned selector methods.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'TestProviderFilesDoNotReferenceLegacyEntrypoints' -count=1; go test -tags "fts5" ./internal/sync -run 'Test(DiscoverProviderSourcesOnlyRunsAuthoritativeProviders|SyncAllProviderDiscoveryFailureSkipsFinishedWatermark|FindSourceFileFallsBackToAuthoritativeNonFileProvider|ClassifyProviderChangedPath|ProcessFileShadow|ProcessFileProviderAuthoritative|ProviderVirtualSourceBackedByEvent)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check

docs(parser): clarify provider freshness contract

The facade spec still described successful parses as eligible for a clean skip-cache entry, which conflicts with the no-schema-change data-version model and can leave unchanged sessions stale after parser upgrades.\n\nDocument stored changed-path hints explicitly and keep successful unchanged-source freshness tied to DB metadata plus parser data version, reserving skipped_files for retry, failure, and explicit skip cases.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'TestProviderFilesDoNotReferenceLegacyEntrypoints' -count=1; git diff --check. mdformat ran via commit hook.

docs(parser): pin provider source identity semantics

The facade contract needs to say exactly which provider source key is persisted because the migration intentionally avoids a schema change. Without that rule, providers could diverge between SourceRef, SourceFingerprint, and sessions.file_path identities.\n\nAlso define capability conformance by meaningful field presence so unsupported zero-value fields are treated consistently in provider tests.\n\nValidation: git diff --check. mdformat is unavailable on PATH, but the commit hook ran.

style(docs): mdformat provider dual-run harness plan
Most provider migrations will need the same source plumbing before they can focus on parser-specific behavior: list JSONL files, map changed paths back to sources, resolve stored source hints, and produce freshness fingerprints. Keeping that logic in one composable helper reduces the chance that every provider reimplements subtly different discovery and lookup rules.

The helper intentionally stops short of being a Provider. Concrete providers can hold it as a named field and explicitly forward the source methods they support, preserving the non-hook provider model from the facade design.

fix(parser): harden JSONL source lookup

Provider migrations will depend on this helper for persisted source lookup and watcher events, so stale fingerprint hints and delete/rename notifications need to produce the same source identity as discovery. Without that, a migrated provider could leave rows stale or fail to resolve custom fingerprint keys after sync state is persisted.

This also documents the helper's key, display, fingerprint, include, and symlink traversal contracts so provider-specific wrappers do not have to infer those rules independently.

fix(parser): reject non-regular JSONL event paths

Watcher events may arrive for directories or symlinks whose names match a provider's JSONL pattern. The helper should only synthesize a source from a path-only event when the path is missing or the event is explicitly a remove/rename, otherwise migrated providers can try to parse sources discovery would never return.

fix(parser): align jsonl duplicate source lookup

JSONLSourceSet discovery keeps the first Provider+Key owner when multiple physical files produce the same logical source identity. Direct stored-path lookup and changed-path classification bypassed that owner rule, so provider-authoritative sync could depend on whether a file was reached by full discovery or by an event.\n\nResolve direct lookups back through the discovered source owner for the computed key. Later duplicate paths now resolve to the first configured source for persisted lookup, and watcher events from the dropped duplicate do not emit a source under the wrong watch root.\n\nValidation: go test -tags "fts5" ./internal/parser -run TestJSONLSourceSetDuplicateKeysKeepFirstConfiguredRoot -count=1 -v; go test -tags "fts5" ./internal/parser -run TestJSONLSourceSet -count=1; go test -tags "fts5" ./internal/parser -count=1; go fmt ./...; go vet ./...; git diff --check
Several remaining parser migrations share the same one-project-directory JSONL layout. Keeping that source shape in a thin helper lets provider implementations focus on parse behavior while still explicitly forwarding the source methods they support.

The helper embeds JSONLSourceSet and only adds the directory-shape constraint plus default project hints, so it avoids introducing a second provider framework.

feat(parser): share virtual source path helpers

SQLite and trace-backed providers use the same physical-container plus logical-source path shape. Keeping each parser's split/join logic local makes stored hint compatibility more fragile as providers migrate onto the facade.

Introduce one reusable virtual source path helper and move the existing Kiro SQLite virtual path handling onto it on the low helper branch, so later provider branches can reuse the same contract as the stack restacks.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(VirtualSourcePath|ParseVirtualSourcePath|ParseKiroSQLiteVirtualPath|JSONLSourceSet|DirectoryJSONLSourceSet)' -count=1 -v; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; go fmt ./...; git diff --check

feat(parser): add sibling metadata source helper

Several provider migrations have a primary transcript source plus sibling metadata files that affect watch routing and freshness. Leaving that pattern entirely provider-local keeps the facade migration dependent on ad hoc sidecar handling even after the JSONL helper exists.

Add a small composable source helper that embeds JSONLSourceSet, maps sibling file events back to the primary source, and fingerprints sibling freshness alongside the source. This keeps concrete providers explicit while giving sidecar-backed migrations a shared source-layer contract.

Validation: go test -tags "fts5" ./internal/parser -run TestSiblingMetadataSourceSetMapsSiblingEventsToPrimarySource -count=1 -v; go test -tags "fts5" ./internal/parser -run TestSiblingMetadataSourceSetFingerprintsSourceWithoutOpaque -count=1 -v; go test -tags "fts5" ./internal/parser -run 'Test(SiblingMetadataSourceSet|VirtualSourcePath|ParseVirtualSourcePath|ParseKiroSQLiteVirtualPath|JSONLSourceSet|DirectoryJSONLSourceSet)' -count=1 -v; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser; go fmt ./...; git diff --check

fix(parser): thread ctx through sibling metadata source lookups
@mariusvniekerk mariusvniekerk force-pushed the provider-jsonl-source-set branch from 216e363 to 5e06aa5 Compare June 25, 2026 05:47
@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from a302cd4 to 6c48133 Compare June 25, 2026 05:47
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (6c48133)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 4m25s), codex_security (codex/security, done, 1m7s) | Total: 5m32s

@mariusvniekerk mariusvniekerk force-pushed the provider-jsonl-source-set branch from 5e06aa5 to 8a4b201 Compare June 25, 2026 17:27
@mariusvniekerk

Copy link
Copy Markdown
Collaborator Author

Superseded by #752. The two source-set helper PRs were folded into a single early "add reusable source-set bases and functional options" commit, so every provider migration now constructs its source set through the functional with*() options natively instead of a struct literal retrofitted later in the stack. The directory JSONL helper is included in that commit.

generated by a clanker

@mariusvniekerk mariusvniekerk deleted the provider-directory-jsonl-source-set branch June 25, 2026 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant