Add Google Calendar sync support #418

Open
danshapiro wants to merge 7 commits into kenn-io:main from danshapiro:codex/gcal-docs-pr-readiness
@danshapiro

Contributor

Summary

  • add read-only Google Calendar sync with CLI commands, daemon scheduling, and [[gcal]] config
  • store calendar events as searchable messages with message_type filtering and scoped vector refresh support
  • update README, setup, OAuth, CLI, config, search, and changelog docs for PR readiness

Testing

  • timeout 45m env GOMAXPROCS=2 GOFLAGS=-p=1 make test
  • timeout 10m make docs-check

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (feefe2e)

Medium issues found; no Critical or High findings were reported.

Medium

  • cmd/msgvault/cmd/calendar.go:231
    sync-calendar --all-calendars does not add reader/subscribed calendars after the account already has any registered calendar. The command switches to Incremental() whenever existing is non-empty, and incremental sync only iterates stored sources, so the documented command in docs/usage/calendar.md has no effect after the first default sync.
    Fix: When selection flags can expand the calendar set, enumerate/register missing calendars or force a full sync; alternatively require and document --full.

  • docs/usage/calendar.md:103
    Docs and release notes advertise msgvault search ... --message-type calendar_event, but search only registers --limit, --offset, --json, --account, --collection, --mode, and --explain. Users following the docs will get an unknown flag instead of a filtered search.
    Fix: Add a --message-type search flag that populates search.Query.MessageTypes, or update all examples to use the implemented message_type:calendar_event query operator.

  • cmd/msgvault/cmd/build_cache.go:496
    The cache verifier still treats maxID > 0 as proof that message Parquet rows must exist, but calendar events are now excluded from the messages export. A database containing only calendar_event rows will produce no messages Parquet and build-cache will fail forever instead of recording an empty email analytics cache.
    Fix: Base the verification on exportable non-calendar live messages, or allow zero exported message rows when all messages are excluded calendar events.
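
One way to read the build-cache fix above is to verify against exportable rows instead of `maxID`. The following is a minimal sketch under that assumption; `exportCounts` and `verifyExport` are hypothetical names for illustration and are not taken from `build_cache.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// exportCounts is a hypothetical summary of the live messages table.
type exportCounts struct {
	Exportable     int // live rows that belong in the email Parquet export
	CalendarEvents int // live calendar_event rows, excluded from the export
}

// verifyExport fails only when exportable rows exist but the Parquet export
// has none. A calendar-only database is treated as a valid, empty email cache.
func verifyExport(c exportCounts, parquetRows int) error {
	if c.Exportable == 0 {
		if parquetRows != 0 {
			return fmt.Errorf("expected an empty export, found %d rows", parquetRows)
		}
		return nil
	}
	if parquetRows == 0 {
		return errors.New("exportable messages exist but no Parquet rows were written")
	}
	return nil
}

func main() {
	// Calendar-only database: no export rows are expected.
	fmt.Println(verifyExport(exportCounts{CalendarEvents: 12}, 0))
	// Email rows exist but nothing was exported: a real failure.
	fmt.Println(verifyExport(exportCounts{Exportable: 3}, 0))
}
```

The point of the sketch is that the presence of any message ID no longer implies that Parquet rows must exist.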


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 8m0s), codex_security (codex/security, done, 5m56s) | Total: 14m6s

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (6a7b30b)

This PR has several Medium issues to address before merge.

Medium

  • cmd/msgvault/cmd/calendar.go:196: sync-calendar EMAIL ignores the oauth_app stored on calendar sources created by add-calendar --oauth-app, so the printed next command can use the default OAuth app and fail to refresh/use the named-app token.
    Fix: Load existing calendar sources before building the client and default oauthApp from their stored OAuthApp when no flag/config value is provided; also include --oauth-app in the add command’s next-step output when used.

  • cmd/msgvault/cmd/deletions.go:824: Permanent-deletion re-consent requests only mail.google.com; for accounts that already have calendar.readonly, this replaces the grant set and drops Calendar access.
    Fix: Preserve existing non-deletion scopes during deletion escalation, especially oauth.ScopeCalendarReadonly, when building requiredScopes.

  • internal/query/sqlite.go:1748: Merging a MessageFilter.MessageType appends to existing q.MessageTypes, but the SQL uses IN (...), so a scoped search can widen instead of narrow. For example, SMS view plus message_type:email returns SMS and email.
    Fix: Treat the context message type as an intersection/override; if it conflicts with query message types, return no matches.

  • cmd/msgvault/cmd/add_synctech_sms_drive.go:144: A vector enqueue failure aborts Synctech SMS scheduled sync after data has already imported and before cache rebuild, unlike Gmail/Calendar where enqueue is best-effort and recoverable by a full vector rebuild.
    Fix: Log enqueue failures and continue to rebuild the cache; do not fail the completed import solely because optional embedding enqueue failed.
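
The message-type finding above asks for intersection semantics instead of appending to an `IN (...)` list. A minimal sketch of that narrowing, assuming a hypothetical helper `narrowMessageTypes` (the real code in `internal/query/sqlite.go` may be shaped differently):

```go
package main

import "fmt"

// narrowMessageTypes intersects the message types named in the query with the
// single type fixed by the surrounding view. ok == false means the two
// conflict, so the search should return no matches instead of widening.
func narrowMessageTypes(queryTypes []string, contextType string) (types []string, ok bool) {
	if contextType == "" {
		return queryTypes, true // no view scope: keep the query as written
	}
	if len(queryTypes) == 0 {
		return []string{contextType}, true // no query filter: the view scope applies
	}
	for _, t := range queryTypes {
		if t == contextType {
			return []string{contextType}, true
		}
	}
	return nil, false
}

func main() {
	// SMS view plus message_type:email: the scopes conflict.
	types, ok := narrowMessageTypes([]string{"email"}, "sms")
	fmt.Println(types, ok)
}
```

With this shape, the caller can short-circuit to an empty result when `ok` is false, so a scoped view never returns rows outside its own type.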


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 6m49s), codex_security (codex/security, done, 3m39s) | Total: 10m39s

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (3d3c4fa)

Medium-risk issues found in the default review; security review found no additional issues.

Medium

  • cmd/msgvault/cmd/calendar.go:383: Calendar token reauth uses fixed oauth.ScopesGmailCalendar, so if an expired or revoked token is reauthorized here, previously granted non-Gmail/Calendar scopes such as Drive readonly can be dropped.
    Fix: Reauthorize with the union of existing GrantedScopes(accountEmail) and required Calendar/Gmail scopes, or make getTokenSourceWithReauth preserve stored grants.

  • cmd/msgvault/cmd/tui.go:262: cacheNeedsBuild never compares state.SchemaVersion with cacheSchemaVersion, so existing v5 Parquet caches can look fresh after the v6 calendar-event exclusion and continue leaking calendar rows/junctions into analytics until some unrelated cache trigger runs.
    Fix: After decoding _last_sync.json, return a full rebuild staleness result when state.SchemaVersion != cacheSchemaVersion.

  • cmd/msgvault/cmd/calendar.go:244: sync-calendar ignores --limit, --after, and --before on already-registered accounts because the full-vs-incremental decision does not consider those flags, while incremental sync does not apply them.
    Fix: Force a full sync when limit/date bounds are supplied, or reject those flags unless --full is also supplied.
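
The `cacheNeedsBuild` finding above comes down to one extra comparison after decoding `_last_sync.json`. A minimal sketch, assuming the current schema version is 6 (per the v5 to v6 note in the finding) and a hypothetical `schema_version` JSON key; neither detail is confirmed by the source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheSchemaVersion is assumed to be 6 here, following the v5 -> v6 note.
const cacheSchemaVersion = 6

// syncState mirrors only the field this sketch needs. The JSON key name is
// an assumption for illustration.
type syncState struct {
	SchemaVersion int `json:"schema_version"`
}

// needsFullRebuild reports whether the decoded state was written under a
// different cache schema. An undecodable state is also treated as stale.
func needsFullRebuild(raw []byte) (bool, error) {
	var state syncState
	if err := json.Unmarshal(raw, &state); err != nil {
		return true, fmt.Errorf("decode sync state: %w", err)
	}
	return state.SchemaVersion != cacheSchemaVersion, nil
}

func main() {
	stale, err := needsFullRebuild([]byte(`{"schema_version": 5}`))
	fmt.Println(stale, err)
}
```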


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 10m2s), codex_security (codex/security, done, 6m41s) | Total: 16m53s

Squashed branch commits:

- feat(vector): support scoped embedding builds
- fix(vector): enqueue synctech sms imports
- fix(search): honor message_type filter in local FTS query path
- feat(store): add SetMessageMetadata and GetSourcesByTypeAndAccount
- feat(gmail): add NewRateLimiterWithCapacity + Calendar operations
- feat(gcal): add read-only Google Calendar API v3 client
- feat(oauth): add Calendar scopes + parameterize scope escalation
- feat(calsync): calendar sync orchestration (full + incremental)
- feat(build-cache): exclude calendar events from the email Parquet
- feat(config): add [[gcal]] calendar sync configuration
- feat(cli): add add-calendar and sync-calendar commands + daemon scheduling
- docs(gcal): document calendar sync commands and [[gcal]] config
- test(gcal): loopback integration test + testify-helper compliance
- fix(calsync): address adversarial review findings (data integrity)
- docs(calsync): correct incrementalCalendar header comment
- fix(oauth): never delete the existing token before re-consent; harden headless add-calendar
- fix(store): dedupe recipients to prevent UNIQUE-constraint sync abort
- fix(calsync): bounded full sync must not set incremental baseline; series title from master
- docs(gcal): add Google Calendar usage guide and surface the feature
- docs(gcal): clarify Calendar auth setup
- docs(gcal): add user-facing release notes
- fix(gcal): address PR readiness blockers
- fix(gcal): stabilize calendar-only cache freshness
- fix(gcal): count deleted hidden cache rows
- fix: address PR readiness regressions
- fix: preserve OAuth and message type scopes
- fix: close review gaps in scoped sync and search
- test: gate SQLite FTS failure regression
- fix: preserve Calendar OAuth scopes
- docs: preserve grants in headless Calendar setup
- docs: seed tokens before headless Calendar consent
- fix: forward remote message type searches
- fix: avoid stale Calendar resume tokens
- fix: version Calendar resume checkpoints

Co-authored-by: Wes McKinney <wesmckinn+git@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wesm force-pushed the codex/gcal-docs-pr-readiness branch from 3d3c4fa to 16482cb on June 25, 2026 21:54
Calendar date bounds are only applied by the full Calendar sync path, so a registered account with stored cursors must not silently run incremental sync when --after or --before is present. Treating bounded sync as full preserves the user's requested window instead of reporting success with ignored limits.
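
The full-versus-incremental decision described in this paragraph might look like the sketch below. `syncOptions` and `useFullSync` are hypothetical stand-ins for the `sync-calendar` flag handling, not the actual code in `calendar.go`.

```go
package main

import "fmt"

// syncOptions holds hypothetical fields standing in for sync-calendar flags.
type syncOptions struct {
	Full   bool
	Limit  int
	After  string // lower date bound, empty when unset
	Before string // upper date bound, empty when unset
}

// useFullSync picks the full path when nothing is registered yet, when the
// user asked for it, or when any option that only the full path honors is set.
func useFullSync(opts syncOptions, registeredSources int) bool {
	if registeredSources == 0 || opts.Full {
		return true
	}
	return opts.Limit > 0 || opts.After != "" || opts.Before != ""
}

func main() {
	// A registered account with a date bound must not run incrementally.
	fmt.Println(useFullSync(syncOptions{After: "2026-01-01"}, 3))
	// A registered account with no bounds can use its stored cursors.
	fmt.Println(useFullSync(syncOptions{}, 3))
}
```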

Remote CLI search already has server-side support for message_type filters through query syntax, so --message-type should be serialized into message_type: terms instead of being rejected before the request leaves the client.
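
As a rough illustration of that serialization, the sketch below folds flag values into `message_type:` query terms. The helper name `withMessageTypes` is hypothetical; only the `message_type:` operator comes from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// withMessageTypes appends one message_type: term per flag value so the
// remote server can apply the filter through its existing query syntax.
func withMessageTypes(query string, messageTypes []string) string {
	terms := make([]string, 0, len(messageTypes)+1)
	if q := strings.TrimSpace(query); q != "" {
		terms = append(terms, q)
	}
	for _, mt := range messageTypes {
		if mt = strings.TrimSpace(mt); mt != "" {
			terms = append(terms, "message_type:"+mt)
		}
	}
	return strings.Join(terms, " ")
}

func main() {
	// prints "standup message_type:calendar_event"
	fmt.Println(withMessageTypes("standup", []string{"calendar_event"}))
}
```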

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

roborev: Combined Review (16482cb)

Medium findings found.

  • Medium: cmd/msgvault/cmd/calendar.go:69 add-calendar requests oauth.ScopesGmailCalendar before determining whether the account already has a Gmail token. Calendar-only accounts with no existing token can end up storing Gmail read/modify scopes unnecessarily. Use a Calendar-only OAuth manager for new calendar-only tokens, and reserve combined Gmail+Calendar scopes for re-consent when preserving an existing Gmail grant.

  • Medium: cmd/msgvault/cmd/calendar.go:232 sync-calendar --after, --before, and --limit are ignored when calendar sources already exist because the command chooses incremental sync, which does not use those options. Force a full sync when those flags are present, or reject them unless --full is supplied.

  • Medium: internal/calsync/incremental.go:42 Incremental calendar sync only applies explicit calendar ID filters and ignores the current access-role selection. Reader/freeBusy calendars registered by a prior --all-calendars run can continue syncing even when the current/default config should only sync owner+writer calendars. Reconstruct the calendar from sync_config and apply includeCalendar before syncing each registered source.

  • Medium: internal/calsync/persist.go:171 Calendar sync treats FTS indexing as fatal after message/body/raw/recipients have already been written, so a broken FTS table or indexing error can wedge sync and prevent cursor advancement. Match other import paths by logging UpsertFTS failures and continuing.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 12m11s), codex_security (codex/security, done, 4m56s) | Total: 17m17s

Calendar-only onboarding should not request Gmail scopes unless there is an existing Gmail or legacy token to preserve during re-consent. Keeping the scope choice tied to the stored token avoids over-granting new Calendar accounts while still protecting existing Gmail grants.
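
A minimal sketch of tying the scope choice to the stored token. The `storedToken` type and `addCalendarScopes` helper are hypothetical, and the combined scope list is only a guess at what `oauth.ScopesGmailCalendar` contains; the scope URLs themselves are Google's published OAuth scopes.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	scopeCalendarReadonly = "https://www.googleapis.com/auth/calendar.readonly"
	scopeGmailReadonly    = "https://www.googleapis.com/auth/gmail.readonly"
	scopeGmailModify      = "https://www.googleapis.com/auth/gmail.modify"
	scopeGmailFull        = "https://mail.google.com/"
)

// storedToken is a hypothetical view of what is on disk for an account.
type storedToken struct {
	Exists bool
	Scopes []string // empty for legacy tokens saved without scope metadata
}

// hasGmailScope reports whether any stored scope grants Gmail access.
func hasGmailScope(scopes []string) bool {
	for _, s := range scopes {
		if s == scopeGmailFull || strings.HasPrefix(s, "https://www.googleapis.com/auth/gmail.") {
			return true
		}
	}
	return false
}

// addCalendarScopes requests Calendar only for new accounts and falls back to
// the combined set when an existing Gmail or legacy grant must be preserved.
func addCalendarScopes(tok storedToken) []string {
	legacy := tok.Exists && len(tok.Scopes) == 0
	if legacy || hasGmailScope(tok.Scopes) {
		return []string{scopeGmailReadonly, scopeGmailModify, scopeCalendarReadonly}
	}
	return []string{scopeCalendarReadonly}
}

func main() {
	fmt.Println(addCalendarScopes(storedToken{}))
	fmt.Println(addCalendarScopes(storedToken{Exists: true, Scopes: []string{scopeGmailReadonly}}))
}
```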

Bounded and limited calendar sync options only affect the full-sync path, so the CLI now treats any of them as full-only. Incremental sync also reapplies the configured calendar selection from stored source metadata so prior all-calendar registrations do not keep reader calendars syncing under the default selection.

Calendar event persistence now treats FTS indexing like other import paths: the durable message/body/raw/recipient writes stay authoritative, and an FTS failure is logged for later repair instead of wedging cursor advancement.
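
The best-effort indexing described here could be sketched as follows. The `eventStore` interface and `persistEvent` function are hypothetical; only the `UpsertFTS` name appears in the review, and its real signature may differ.

```go
package main

import (
	"errors"
	"log"
)

// eventStore is a hypothetical slice of the store used by calendar sync.
type eventStore interface {
	SaveEvent(id int64) error // durable message/body/raw/recipient writes
	UpsertFTS(id int64) error // search index update, repairable later
}

// persistEvent fails only when the durable writes fail. An FTS error is
// logged and swallowed so the caller can still advance its sync cursor.
func persistEvent(s eventStore, id int64) error {
	if err := s.SaveEvent(id); err != nil {
		return err
	}
	if err := s.UpsertFTS(id); err != nil {
		log.Printf("calsync: FTS upsert failed for message %d, continuing: %v", id, err)
	}
	return nil
}

// brokenFTS is a test double whose index always fails.
type brokenFTS struct{ saved []int64 }

func (s *brokenFTS) SaveEvent(id int64) error { s.saved = append(s.saved, id); return nil }
func (s *brokenFTS) UpsertFTS(id int64) error { return errors.New("fts table is corrupt") }

func main() {
	s := &brokenFTS{}
	err := persistEvent(s, 42)
	log.Printf("saved=%v err=%v", s.saved, err)
}
```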

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

roborev: Combined Review (030fa3e)

High/Medium issues found in the Calendar OAuth token handling paths.

High

  • cmd/msgvault/cmd/calendar.go:90
    The new-account add-calendar path requests only calendar.readonly, then calls oauth.Manager.Authorize, whose validation path calls the Gmail profile API. A calendar-only token cannot satisfy Gmail-profile validation, so adding Calendar for an account without existing Gmail authorization fails after consent and never saves the token.
    Fix: Use a Calendar-safe account verification path for calendar-only OAuth, such as userinfo/openid email or tokeninfo, or request an explicit verification scope and test the no-existing-token browser flow.

Medium

  • cmd/msgvault/cmd/calendar.go:74
    Existing Calendar tokens are reused based only on HasToken and HasScope, without checking TokenMatchesClient. If a user runs add-calendar --oauth-app work while the token file was minted by the default OAuth client, the source is registered with oauth_app=work but the stored refresh token belongs to a different client and can later fail refresh in scheduled sync.
    Fix: Mirror add-account’s client binding check for Calendar flows and force reauth or headless token-copy instructions when the stored token does not match the selected OAuth app.

  • cmd/msgvault/cmd/addaccount.go:211
    Calendar-only tokens are now possible, but add-account still treats any existing token as reusable. If a user archives Calendar first and later adds Gmail, the CLI creates the Gmail source without obtaining Gmail scopes, so sync-full fails; forcing reauth also risks replacing the Calendar grant with Gmail-only scopes.
    Fix: When reusing a token for Gmail, verify the required Gmail scopes or legacy-token status, and if reauth is needed request the union of existing scopes plus Gmail scopes.


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 12m59s), codex_security (codex/security, done, 4m15s) | Total: 17m25s

wesm and others added 2 commits June 25, 2026 19:33
The review-fix commit left two CI-only regressions: the calendar option test crossed the testify-helper package-call threshold, and the pgvector test schema declared message_type twice so the live Postgres lane failed before exercising backend behavior. Keeping these test fixtures valid lets CI cover the Calendar and pgvector changes instead of failing during setup.

Validation: ran the pgvector-tagged CI package set against a temporary pgvector/pgvector:pg16 container.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
Calendar-only add-calendar should not request Gmail scopes, but the shared OAuth authorize path still validated new grants through Gmail profile. A Calendar-only token cannot call that endpoint, so new Calendar accounts could fail before their token was saved.

Calendar grants now validate through the Calendar primary calendar endpoint, while Gmail-capable grants keep the existing Gmail profile validation. The Calendar OAuth manager also derives its configured reauth scopes from stored grants so Drive or other non-Gmail scopes survive replacement consent.
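
Deriving reauth scopes from stored grants amounts to an order-preserving union. A minimal sketch, with `unionScopes` as a hypothetical helper name:

```go
package main

import "fmt"

// unionScopes returns the granted scopes followed by any required scopes not
// already present, so replacement consent never drops an existing grant.
func unionScopes(granted, required []string) []string {
	seen := make(map[string]bool, len(granted)+len(required))
	out := make([]string, 0, len(granted)+len(required))
	for _, group := range [][]string{granted, required} {
		for _, s := range group {
			if s != "" && !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

func main() {
	granted := []string{
		"https://www.googleapis.com/auth/drive.readonly",
		"https://www.googleapis.com/auth/calendar.readonly",
	}
	required := []string{"https://www.googleapis.com/auth/calendar.readonly"}
	// The Drive grant survives and the Calendar scope is not duplicated.
	fmt.Println(unionScopes(granted, required))
}
```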

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

roborev: Combined Review (5c44f8d)

Medium issue found: find_similar_messages bypasses the new scoped embedding index guard.

Medium

  • internal/mcp/handlers.go:532: find_similar_messages calls backend.Search directly, bypassing the new scoped embedding index guard in hybrid.Engine.Search. With [vector.embed.scope] message_types = [...], this MCP tool can return results from a partial index as if it covered the full archive, and it has no message_type argument to make the query scope explicit.

    Fix: Add matching scope validation for find_similar_messages or route it through a shared helper. Expose/apply a message_type filter, and reject the tool when a scoped index is configured without a matching filter.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 14m10s), codex_security (codex/security, done, 5m55s) | Total: 20m12s

wesm and others added 2 commits June 25, 2026 20:07
find_similar_messages bypassed the hybrid engine's scoped-index guard by calling the vector backend directly. With a message-type-scoped embedding index, MCP clients could ask an unscoped nearest-neighbor question and receive results from a partial corpus as if the index covered every message.

The tool now exposes an explicit message_type filter and validates that filter with the same build-scope rules used by hybrid search before dispatching backend ANN work. This keeps scoped indexes from serving ambiguous similar-message queries while preserving the direct backend path for valid filtered requests.
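
The guard described above can be sketched as a small validation step that runs before any backend search. `validateScopedQuery` is a hypothetical name, and the real build-scope rules in the hybrid engine may be stricter.

```go
package main

import (
	"errors"
	"fmt"
)

// validateScopedQuery rejects a similar-message request when the embedding
// index covers only some message types and the request does not name one of
// them. An empty buildScope means the index covers every message type.
func validateScopedQuery(buildScope []string, messageType string) error {
	if len(buildScope) == 0 {
		return nil
	}
	if messageType == "" {
		return errors.New("embedding index is scoped; a message_type filter is required")
	}
	for _, t := range buildScope {
		if t == messageType {
			return nil
		}
	}
	return fmt.Errorf("message_type %q is outside the embedding scope %v", messageType, buildScope)
}

func main() {
	scope := []string{"email"}
	fmt.Println(validateScopedQuery(scope, ""))               // rejected: ambiguous
	fmt.Println(validateScopedQuery(scope, "calendar_event")) // rejected: not indexed
	fmt.Println(validateScopedQuery(scope, "email"))          // accepted
}
```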

Validation: ran the pgvector-tagged vector/scheduler/CLI package set against a temporary pgvector/pgvector:pg16 container.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
find_similar_messages now shares the same active-generation freshness contract as hybrid search. A direct ActiveGeneration lookup could serve a scoped or otherwise stale active generation after vector configuration changed, which made the previous scoped-index guard depend on current config while the backend still searched the old index.

ResolveActiveForFingerprint keeps the MCP similar-search path from answering with vectors built under a different model, preprocessing policy, or embed scope. The tests now construct matching generations for valid similar-search cases and cover the stale-generation rejection before backend search runs.
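
A minimal sketch of the freshness contract, assuming the fingerprint is a comparable value built from the model, preprocessing policy, and embed scope. The `fingerprint`, `generation`, and `resolveActive` names are hypothetical and do not describe the real `ResolveActiveForFingerprint` signature.

```go
package main

import (
	"errors"
	"fmt"
)

// fingerprint captures the configuration a generation was built under.
type fingerprint struct {
	Model      string
	Preprocess string
	Scope      string // canonical form of the embed scope, empty when unscoped
}

// generation is a hypothetical record of one built vector index.
type generation struct {
	ID          int64
	Fingerprint fingerprint
}

var errStaleGeneration = errors.New("active generation does not match the current vector config")

// resolveActive returns the active generation only when it was built under
// the fingerprint derived from the current configuration.
func resolveActive(active *generation, want fingerprint) (*generation, error) {
	if active == nil {
		return nil, errors.New("no active generation")
	}
	if active.Fingerprint != want {
		return nil, errStaleGeneration
	}
	return active, nil
}

func main() {
	built := &generation{ID: 1, Fingerprint: fingerprint{Model: "m1", Preprocess: "p1"}}
	// The config now scopes embeddings to email, so the old generation is stale.
	_, err := resolveActive(built, fingerprint{Model: "m1", Preprocess: "p1", Scope: "email"})
	fmt.Println(err)
}
```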

Validation: ran the pgvector-tagged vector/scheduler/CLI package set against a temporary pgvector/pgvector:pg16 container.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>