feat(parser): migrate cursor provider#769

Closed
mariusvniekerk wants to merge 1 commit into provider-openhands from provider-cursor

Conversation

@mariusvniekerk

Collaborator

Cursor now uses a concrete provider instead of the legacy adapter. The provider keeps the existing transcript selection rules, including recursive project discovery, nested transcript layouts, .jsonl preference over .txt, stale-path lookup promotion, changed-path classification, and content-hash fingerprinting.

The implementation reuses the existing Cursor discovery and parsing helpers so parser output remains aligned with the previous sync path while making Cursor source behavior explicit at the provider boundary.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

roborev: Combined Review (81ea358)

No Medium, High, or Critical findings were reported.

The only finding was Low severity, so it is omitted per review rules.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 5m4s), codex_security (codex/security, done, 1m53s) | Total: 7m1s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

roborev: Review Unavailable (01391b0)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

roborev: Combined Review (a1b0f7b)

Summary verdict: No Medium, High, or Critical findings to report.

Both reviews found no reportable issues at Medium severity or above.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 7m47s), codex_security (codex/security, done, 3m8s) | Total: 10m59s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Collaborator Author

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

roborev: Combined Review (47d5c6b)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 5m39s), codex_security (codex/security, done, 2m1s) | Total: 7m40s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (a1b425c)

Summary verdict: Two medium regressions remain; no high or critical findings were reported.

Medium

  • internal/parser/cursor_provider.go:416
    Fingerprint hashes the entire Cursor transcript before parseSession applies the 10 MiB size cap, so an oversized transcript now gets fully read on every provider parse attempt before being rejected.
    Fix: Do not compute Cursor hashes in Fingerprint; let parseSession keep setting File.Hash from the capped read, or enforce the same size limit before hashing.

  • cmd/agentsview/token_use.go:95
    resolveRawSessionID only probes AgentDef.FindSourceFunc, but Cursor’s registry entry no longer has one. agentsview token-use / session usage can no longer recognize and on-demand sync an unsynced Cursor session ID from disk.
    Fix: Add provider-backed source lookup to the disk probe path, similar to Engine.findProviderSourceFile, or retain a compatible Cursor lookup hook for this command.
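The second fix above (a provider-backed lookup in the disk probe path) could look roughly like the sketch below. Everything here is a hypothetical stand-in: `agentDef`, `sourceFinder`, `probeDisk`, and the `FindSource` signature are invented for illustration and do not mirror the real `AgentDef` or `Engine.findProviderSourceFile`. The sketch only shows the ordering: try the legacy hook when present, otherwise ask the provider.

```go
package main

import "fmt"

// sourceFinder is a hypothetical stand-in for a provider's source lookup.
type sourceFinder interface {
	FindSource(root, sessionID string) (string, bool)
}

// agentDef carries only the fields this sketch needs; the real type differs.
type agentDef struct {
	Name           string
	FindSourceFunc func(root, sessionID string) string // legacy hook, may be nil
	Provider       sourceFinder                        // provider-backed lookup, may be nil
}

// probeDisk tries the legacy hook first and falls back to the provider, so an
// agent that dropped FindSourceFunc can still be resolved from disk.
func probeDisk(defs []agentDef, root, sessionID string) (agent, path string, ok bool) {
	for _, d := range defs {
		if d.FindSourceFunc != nil {
			if p := d.FindSourceFunc(root, sessionID); p != "" {
				return d.Name, p, true
			}
		}
		if d.Provider != nil {
			if p, found := d.Provider.FindSource(root, sessionID); found {
				return d.Name, p, true
			}
		}
	}
	return "", "", false
}

// mapFinder is a fake provider backed by an in-memory map (session ID to path).
type mapFinder map[string]string

func (m mapFinder) FindSource(root, sessionID string) (string, bool) {
	p, ok := m[sessionID]
	return p, ok
}

func main() {
	defs := []agentDef{
		{Name: "cursor", Provider: mapFinder{"abc": "/roots/proj/agent-transcripts/abc.jsonl"}},
	}
	agent, path, ok := probeDisk(defs, "/roots", "abc")
	fmt.Println(agent, path, ok)
}
```

Keeping the legacy hook in the loop means agents that have not yet migrated continue to resolve the same way while provider-authoritative agents gain a lookup path.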


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m32s), codex_security (codex/security, done, 1m33s) | Total: 7m13s

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (1e17845)

Review verdict: one medium-severity compatibility regression should be addressed before merge.

Medium

  • Location: internal/parser/types.go:222, internal/ssh/resolve.go:77, cmd/agentsview/token_use.go:92
  • Problem: Cursor now drops DiscoverFunc/FindSourceFunc, but SSH remote sync and session usage disk resolution still only consider agents with those legacy hooks. Remote Cursor directories are no longer emitted by the resolve script, and unsynced local Cursor IDs can no longer be resolved for on-demand usage sync.
  • Fix: Update those callers to include provider-authoritative file-based agents and use provider discovery/FindSource, or keep compatibility hooks until all remaining callers are provider-aware.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 6m44s), codex_security (codex/security, done, 2m12s) | Total: 9m4s

Cursor transcript sources have two legacy layouts and select .jsonl over .txt when both exist for a session. Moving Cursor behind a concrete provider keeps that selection policy explicit at the provider boundary instead of relying on the legacy parser adapter.

The provider preserves recursive project discovery, raw/full ID lookup, stale .txt path promotion, changed-path classification, content-hash fingerprinting, and parser output normalization while using the same Cursor discovery and parsing helpers as the previous sync path.
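As a rough illustration of the .jsonl-over-.txt rule, the sketch below collapses sibling transcript files that share a stem. `preferJSONL` is a hypothetical helper, not the provider's code, and it assumes both variants sit in the same directory; the nested layout mentioned above is not modeled.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// preferJSONL keeps one path per transcript stem, choosing the .jsonl variant
// when both .jsonl and .txt exist. Hypothetical helper for illustration only.
func preferJSONL(paths []string) []string {
	chosen := map[string]string{} // key: path with the extension removed
	for _, p := range paths {
		ext := filepath.Ext(p)
		if ext != ".jsonl" && ext != ".txt" {
			continue // not a transcript candidate
		}
		key := strings.TrimSuffix(p, ext)
		if prev, ok := chosen[key]; ok && filepath.Ext(prev) == ".jsonl" {
			continue // a .jsonl variant already won for this session
		}
		chosen[key] = p
	}
	out := make([]string, 0, len(chosen))
	for _, p := range chosen {
		out = append(out, p)
	}
	sort.Strings(out) // deterministic order for callers and tests
	return out
}

func main() {
	got := preferJSONL([]string{
		"proj/agent-transcripts/a.txt",
		"proj/agent-transcripts/a.jsonl",
		"proj/agent-transcripts/b.txt",
	})
	fmt.Println(got)
}
```

The result does not depend on input order: a .txt seen first is replaced when its .jsonl sibling appears, and a .txt seen later is ignored.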

fix(parser): preserve cursor project-scoped source selection

Cursor session IDs are only unique within an encoded project directory, but the provider was resolving stored and changed paths through a root-wide lookup. That could silently select the same transcript stem from a different project and drop valid sources during discovery.

Resolve Cursor source promotion inside the project derived from the incoming path, add duplicate-stem coverage, and mark model output unsupported until the parser actually fills message models. This lets the Cursor branch enter shadow comparison as a real migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CursorProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
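A minimal sketch of project-scoped promotion, assuming a flat `<root>/<project>/agent-transcripts/<id>.<ext>` layout. `promoteWithinProject` and the `exists` set are hypothetical; the real provider reads the filesystem and handles more layouts. The point is that the lookup never leaves the project derived from the incoming path, so an identical stem in another project cannot be selected.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// promoteWithinProject resolves a possibly stale transcript path by searching
// only the project directory that the incoming path belongs to.
// exists is a hypothetical set of files on disk; paths use forward slashes.
func promoteWithinProject(root, incoming string, exists map[string]bool) (string, bool) {
	prefix := strings.TrimSuffix(root, "/") + "/"
	if !strings.HasPrefix(incoming, prefix) {
		return "", false // incoming path is not under this root
	}
	project, _, ok := strings.Cut(incoming[len(prefix):], "/")
	if !ok || project == "" {
		return "", false
	}
	stem := strings.TrimSuffix(path.Base(incoming), path.Ext(incoming))
	dir := path.Join(root, project, "agent-transcripts")
	// Prefer .jsonl, then fall back to .txt, within the same project only.
	for _, ext := range []string{".jsonl", ".txt"} {
		candidate := path.Join(dir, stem+ext)
		if exists[candidate] {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	exists := map[string]bool{
		"/r/projA/agent-transcripts/s1.jsonl": true,
		"/r/projB/agent-transcripts/s1.txt":   true,
	}
	got, ok := promoteWithinProject("/r", "/r/projB/agent-transcripts/s1.txt", exists)
	fmt.Println(got, ok)
}
```

A root-wide lookup keyed on the stem alone would have had two candidates for `s1` here, which is the ambiguity the commit message describes.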

test(sync): compare cursor shadow parity

Cursor is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseCursorSession.

The test uses duplicate transcript stems in different encoded project directories to lock in the current parser ID behavior while proving provider source observation stays project-scoped.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCursorLegacyParser|TestCursorProvider|TestParseCursor|TestCursorSessionID' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

test(sync): assert cursor provider hash parity

Roborev job 2709 caught that the Cursor shadow parity fixture normalized the legacy session hash before proving the provider fingerprint matched the legacy parser hash. That left the test unable to detect a provider fingerprint regression that propagated into parsed output.

Assert hash parity before normalizing the legacy session for the full struct comparison, keeping the existing duplicate-stem fixture focused on provider/legacy equivalence.

Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesCursorLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check
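The ordering issue can be shown with a small sketch. The `session` type, `normalize`, and `compareParity` are hypothetical and much simpler than the real fixture; the sketch only demonstrates why the hash must be compared before normalization clears it.

```go
package main

import "fmt"

// session is a hypothetical, reduced stand-in for a parsed session.
type session struct {
	ID   string
	Hash string
}

// normalize clears the hash so the remaining fields can be compared as a whole.
func normalize(s session) session {
	s.Hash = ""
	return s
}

// compareParity checks hash parity first, then compares the normalized rest.
// Normalizing before the hash check would hide a fingerprint regression.
func compareParity(legacy session, providerHash string, parsed session) error {
	if legacy.Hash != providerHash {
		return fmt.Errorf("hash mismatch: legacy %q, provider %q", legacy.Hash, providerHash)
	}
	if normalize(legacy) != normalize(parsed) {
		return fmt.Errorf("session mismatch after normalization")
	}
	return nil
}

func main() {
	legacy := session{ID: "s1", Hash: "aaa"}
	parsed := session{ID: "s1", Hash: "bbb"}
	// The normalized structs are equal, so only the explicit hash check
	// reports the regression.
	fmt.Println(compareParity(legacy, "bbb", parsed))
}
```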

refactor(parser): fold cursor into provider

Move Cursor source discovery, lookup, and parse ownership onto the
concrete cursorProvider and remove the package-level
DiscoverCursorSessions, FindCursorSourceFile, and ParseCursorSession
free functions. Discovery and find-source bodies now live as
provider-owned helpers (discoverTranscriptPaths, cursorAddSeen,
cursorFindSourceFile) on the cursor source set, and parseSession is a
receiver method.

Make Cursor provider-authoritative and drop its legacy sync dispatch:
the classifyOnePath transcript block, the processFile case arm, the
processCursor method, and its now-orphaned validateCursorContainment
and findContainingDir helpers. Source classification, containment,
.txt/.jsonl precedence, and project-hint decoding are all reproduced
through the provider's changed-path and discovery paths, so runtime
behavior is preserved. ParseCursorTranscriptRelPath stays a shared
provider-neutral path validator used by both the engine's project
enrichment and the provider.

Replace the shadow-baseline test with provider API coverage plus a
guard asserting the legacy entrypoints stay gone, and remove cursor
from the pending-shim list.

fix(parser): cap cursor provider fingerprinting

Cursor parsing already rejects transcripts over 10 MiB, but the migrated provider fingerprint path still hashed the full source before parse. That made oversized files pay an unbounded read cost in the provider freshness path even though parse would never accept them.

Keep normal-size content hashing intact and return only metadata for oversized Cursor transcripts so parse remains the sole place that reads up to the guarded cap.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCursorProvider' -count=1; go vet ./...; git diff --check

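A sketch of the capped fingerprint idea under stated assumptions: the `fingerprint` type, `fingerprintCapped`, and the use of SHA-256 are all illustrative choices, since the thread does not say which hash or struct the provider uses. Only the 10 MiB figure comes from the commit message. Oversized files return size metadata with an empty hash and are never read.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// maxTranscriptBytes mirrors the 10 MiB cap named in the commit message.
const maxTranscriptBytes = 10 << 20

// fingerprint is a hypothetical result type; Hash is empty for oversized files.
type fingerprint struct {
	Size int64
	Hash string
}

// fingerprintCapped hashes a file only when it is within limit bytes.
func fingerprintCapped(path string, limit int64) (fingerprint, error) {
	info, err := os.Stat(path)
	if err != nil {
		return fingerprint{}, err
	}
	fp := fingerprint{Size: info.Size()}
	if info.Size() > limit {
		return fp, nil // metadata only: skip the read entirely
	}
	f, err := os.Open(path)
	if err != nil {
		return fingerprint{}, err
	}
	defer f.Close()
	h := sha256.New()
	// LimitReader bounds the read even if the file grows after Stat.
	if _, err := io.Copy(h, io.LimitReader(f, limit)); err != nil {
		return fingerprint{}, err
	}
	fp.Hash = hex.EncodeToString(h.Sum(nil))
	return fp, nil
}

// demo fingerprints a 2-byte and a 10-byte temp file with an 8-byte limit.
func demo() (small, big fingerprint, err error) {
	dir, err := os.MkdirTemp("", "fp")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	smallPath := filepath.Join(dir, "small.jsonl")
	bigPath := filepath.Join(dir, "big.jsonl")
	if err = os.WriteFile(smallPath, []byte("hi"), 0o600); err != nil {
		return
	}
	if err = os.WriteFile(bigPath, []byte("0123456789"), 0o600); err != nil {
		return
	}
	if small, err = fingerprintCapped(smallPath, 8); err != nil {
		return
	}
	big, err = fingerprintCapped(bigPath, 8)
	return
}

func main() {
	small, big, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(small.Size, small.Hash != "", big.Size, big.Hash != "")
}
```

In real use the limit argument would be `maxTranscriptBytes`; the demo uses a tiny limit so the oversized branch can be exercised without writing a large file.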
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (e16bf2d)

Medium issue found; no high or critical findings.

Medium

  • Location: internal/parser/cursor_provider.go:318
  • Problem: FindSource only resolves stored Cursor file_path values through currently configured roots. SyncSingleSession still prefers the DB-stored path when the file exists, so a Cursor session whose root was removed or changed can now fail with source not found instead of reparsing the stored file as the legacy path did.
  • Fix: Add a stored-path fallback for Cursor that derives the implicit projects root from <root>/<project>/agent-transcripts/..., builds a SourceRef, and add a single-session regression test for a stored path outside configured roots.
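The root derivation suggested in the fix could be sketched as below. `deriveRoot` is a hypothetical helper and the example path is made up; only the `<root>/<project>/agent-transcripts/...` shape comes from the review comment. The sketch uses forward-slash paths and rejects inputs that do not match that shape.

```go
package main

import (
	"fmt"
	"strings"
)

// deriveRoot splits a stored path of the form
// <root>/<project>/agent-transcripts/... into its implicit root and project.
// Hypothetical helper; it reports ok=false when the shape does not match.
func deriveRoot(stored string) (root, project string, ok bool) {
	const marker = "/agent-transcripts/"
	i := strings.LastIndex(stored, marker)
	if i <= 0 {
		return "", "", false
	}
	projectDir := stored[:i] // <root>/<project>
	j := strings.LastIndex(projectDir, "/")
	if j <= 0 || j == len(projectDir)-1 {
		return "", "", false // no usable root or empty project name
	}
	return projectDir[:j], projectDir[j+1:], true
}

func main() {
	root, project, ok := deriveRoot("/data/projects/my-proj/agent-transcripts/s1.jsonl")
	fmt.Println(root, project, ok)
}
```

With the root and project recovered this way, a caller could build a source reference for a stored file even when its root is no longer among the configured roots, which is the single-session case the review describes.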

Panel: ci_default_security | Synthesis: codex, 13s | Members: codex_default (codex/default, done, 6m3s), codex_security (codex/security, done, 54s) | Total: 7m10s

@mariusvniekerk

Collaborator Author

Superseded by the family-grouped collapse of this stack into 10 green branches (#876 through #885), which preserves every commit while cutting the branch count for review. Closing in favor of that stack.

generated by a clanker
