feat: Microsoft Teams ingestion (delegated Graph sync)#398
Two unrelated bug fixes that surfaced while building this were split out into their own PRs to keep this one scoped to Teams ingestion:
Neither depends on this PR; both branch off current |
## What

Several importers built `time.Time` values from epoch timestamps with `time.Unix`/`time.UnixMilli` but **without `.UTC()`**, leaving them in the runner's local zone, while the rest of each importer stores dates in UTC. Any code reading the calendar day (or the Parquet year partition) is then off by one in zones east of UTC.

Fixes:

- `internal/sync/sync.go`: `processBatch` oldest-message date (progress tracking).
- `internal/whatsapp/mapping.go`: message `SentAt`.
- `internal/whatsapp/importer.go`: reaction `createdAt`.

## Why it matters

`TestProcessBatch_OldestDatePropagation` fails on any machine east of UTC (e.g. NZ): the fixture `2024-01-10T12:00:00Z` reads back as Jan 11 local. The tests are correct; the production code was the bug. Adds `TestMapMessageSentAtIsUTC` (asserts the stored zone is UTC, machine-independent).

## Possible later fixes (out of scope here)

The same `time.Unix(...)`-without-`.UTC()` pattern also appears in the embedding-generation status timestamps, but these are **operator-facing status values** round-tripped from unix-int columns (not message dates), so they don't affect partitioning/dedup/cross-system date semantics. Local-time display is arguably fine; normalizing them to UTC would be a consistency-only follow-up.

Sites:

- `cmd/msgvault/cmd/embeddings_manage.go`: `StartedAt`, `SeededAt`, `CompletedAt`, `ActivatedAt`.
- `internal/vector/pgvector/backend.go`: `StartedAt`, `CompletedAt`, `ActivatedAt`.
- `internal/vector/sqlitevec/backend.go`: `StartedAt`, `CompletedAt`, `ActivatedAt`.

Left unchanged here to avoid churning working code on a style call; documented so a future pass can decide.

## Scope

Independent of the Teams PR (#398): branched from `main`, touches only `internal/sync` and `internal/whatsapp`.

Co-authored-by: Nat Torkington <njt@users.noreply.github.com>
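The off-by-one described above can be reproduced on any machine without changing its time zone. The following is a minimal sketch, not the code from this fix: `dayIn`, `dayUTC`, `fixtureSec`, and `nzSummer` are hypothetical names, and a fixed UTC+13 offset stands in for a New Zealand summer-time runner.

```go
package main

import (
	"fmt"
	"time"
)

// fixtureSec is the epoch value of the fixture 2024-01-10T12:00:00Z.
var fixtureSec = time.Date(2024, time.January, 10, 12, 0, 0, 0, time.UTC).Unix()

// nzSummer is an assumed stand-in for a runner whose local zone is UTC+13.
var nzSummer = time.FixedZone("UTC+13", 13*60*60)

// dayIn returns the calendar day of an epoch timestamp as seen in loc.
// It mimics reading time.Unix(sec, 0) on a machine whose local zone is loc.
func dayIn(sec int64, loc *time.Location) int {
	return time.Unix(sec, 0).In(loc).Day()
}

// dayUTC returns the calendar day after normalizing the value to UTC.
func dayUTC(sec int64) int {
	return time.Unix(sec, 0).UTC().Day()
}

func main() {
	fmt.Println(dayIn(fixtureSec, nzSummer)) // 11: noon UTC is 01:00 the next day at UTC+13
	fmt.Println(dayUTC(fixtureSec))          // 10: matches the fixture's UTC date
}
```

Both calls describe the same instant; only the zone attached to the `time.Time` differs, which is why calling `.UTC()` at construction is enough to make day and year reads machine-independent.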
looking at this |
13c4591 to 7e1ecd2
Squash the Teams ingestion branch into a single commit before rebasing onto origin/main.

The branch adds delegated Microsoft Graph OAuth, Teams source commands, chat/channel import, sync state, hosted-content media handling, daemon scheduling, and the recovery/backfill paths needed to repair already-imported inline media.

After rebasing onto origin/main, Teams messages are also included in the new message_type search/help surface and text-mode message-type allowlists so `message_type:teams` works consistently with the main-branch query changes.

Included branch commits:

- fix(teams): close ingestion review gaps
- fix(teams): migrate legacy raw message ids
- fix(teams): repair legacy id migration references
- fix(teams): make Teams tests portable across CI backends
- fix(teams): keep URL attachments as links
- fix(teams): constrain Graph URL requests
- fix(teams): preserve channel backfill on delta prime errors
- fix(teams): reject stale Graph token scopes

Co-authored-by: Wes McKinney <wesmckinn+git@gmail.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
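As an illustration of what rejecting stale token scopes could involve, here is a hedged sketch that compares a token's granted scopes against a required set. It is not the branch's implementation: `missingScopes` and `requiredScopes` are hypothetical names, the scope list is only the part visible in this PR's description and may be incomplete, and exact string matching on a space-delimited scope string is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredScopes holds the delegated scopes named in the PR description.
// Assumption: the real required set may contain additional entries.
var requiredScopes = []string{
	"Chat.Read",
	"ChannelMessage.Read.All",
	"Team.ReadBasic.All",
	"Channel.ReadBasic.All",
	"User.Read",
}

// missingScopes returns every required scope that does not appear in granted,
// a space-delimited scope string as carried in an OAuth 2.0 token response.
func missingScopes(granted string, required []string) []string {
	have := make(map[string]bool)
	for _, s := range strings.Fields(granted) {
		have[s] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// A token issued before the channel scopes were requested.
	stale := "User.Read Chat.Read"
	if m := missingScopes(stale, requiredScopes); len(m) > 0 {
		fmt.Println("missing scopes:", strings.Join(m, ", "))
	}
}
```

A caller would treat a non-empty result as a reason to refuse the saved token and ask for re-authorization instead of failing later on individual Graph requests.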
c74f849 to 546a95c
What
Sync your own Microsoft Teams 1:1/group/meeting chats and channel messages into msgvault via delegated Microsoft Graph, searchable alongside mail through the existing TUI / FTS / Parquet analytics.
Highlights
- `add-teams` (delegated Graph OAuth) and `sync-teams` (full + incremental, with streamed per-conversation progress) commands; Teams also runs under `serve` scheduled syncs, and the daemon now syncs all source types on an identifier (so Teams + Outlook/IMAP on one address both run).
- `to`) + `@mention` rows, identity resolution (AAD object id → email dedup, unifying with mail identities), inline images downloaded to content-addressed storage, and shared-file links recorded.
- `lastModifiedDateTime` list filtering (no delegated per-chat delta endpoint exists), channels via `/messages/delta`; per-conversation cursors persisted in `sync_runs.cursor_after`, flushed after each conversation so an interrupted long backfill resumes mid-stream.
- `teams_<email>.json` token with Graph scopes only, so IMAP and Teams can each be used alone or together.

Use

- `Chat.Read`, `ChannelMessage.Read.All`, `Team.ReadBasic.All`, `Channel.ReadBasic.All`, `User.Read`) and grant admin consent.
- `config.toml`:
- `msgvault add-teams you@tenant.com` then `msgvault sync-teams you@tenant.com` (`--no-channels` / `--limit` for scoped runs). Press `a` inside `msgvault tui` to filter to the Teams account.

Notes
🤖 Generated with Claude Code
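
The per-conversation cursor flushing described under Highlights can be sketched as follows. This is an illustrative outline only, not the PR's code: `cursors`, `syncAll`, `syncOne`, `flush`, and `errInterrupted` are hypothetical names, and an in-memory map stands in for the persisted cursor column.

```go
package main

import (
	"errors"
	"fmt"
)

// errInterrupted simulates a run that is cut short mid-backfill.
var errInterrupted = errors.New("interrupted")

// cursors maps a conversation ID to the opaque position to resume from.
type cursors map[string]string

// syncAll syncs each conversation in order and flushes the cursor map after
// every conversation, so a failure loses at most the conversation in flight.
func syncAll(ids []string, state cursors,
	syncOne func(id, from string) (string, error),
	flush func(cursors) error) error {
	for _, id := range ids {
		next, err := syncOne(id, state[id])
		if err != nil {
			return fmt.Errorf("sync %s: %w", id, err)
		}
		state[id] = next
		if err := flush(state); err != nil {
			return fmt.Errorf("flush after %s: %w", id, err)
		}
	}
	return nil
}

func main() {
	saved := cursors{} // stands in for the persisted cursor storage
	flush := func(c cursors) error {
		for k, v := range c {
			saved[k] = v
		}
		return nil
	}
	// Fake fetcher: the second conversation fails partway through the run.
	syncOne := func(id, from string) (string, error) {
		if id == "chat-2" {
			return "", errInterrupted
		}
		return id + "@done", nil
	}
	err := syncAll([]string{"chat-1", "chat-2", "chat-3"}, cursors{}, syncOne, flush)
	fmt.Println(err)
	fmt.Println(saved)
}
```

Because the flush happens inside the loop, the cursor for `chat-1` survives the failure on `chat-2`, and a later run started from `saved` would continue from there instead of repeating finished conversations.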