Background
Following #45/#46 (thanks for the quick merge + the CI/dbus fixes — both shipped in 0.5.51), credentials persist and Gmail's [Gmail] container no longer aborts sync. But on Gmail/IMAP, archived mail still isn't synced: syncable_folders excludes the \All (All Mail) folder, and All Mail is the only place archived-only messages live.
Including All Mail naively isn't safe today. The store's identity is UNIQUE (account_id, provider_id) with provider_id = "mailbox:uid" and no RFC Message-ID dedup, so the same Gmail message in INBOX and All Mail (different UIDs per folder) becomes two rows. That duplication is presumably why \All is excluded — at the cost of archived mail. The Gmail API provider avoids this entirely by being message-centric (one entity, labelIds as attributes); the IMAP path can't naturally match that.
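To make the duplication concrete, here is a minimal sketch of the identity rule described above. It is not the store's actual code: `RowKey`, `provider_id`, and `rows_for` are hypothetical names, and a `HashSet` stands in for the `UNIQUE (account_id, provider_id)` constraint.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the store's UNIQUE (account_id, provider_id) key.
type RowKey = (u32, String);

/// Build a provider_id in the "mailbox:uid" shape described above.
fn provider_id(mailbox: &str, uid: u32) -> String {
    format!("{mailbox}:{uid}")
}

/// Count the distinct rows that a set of (mailbox, uid) sightings would
/// produce for one account under that uniqueness rule.
fn rows_for(account_id: u32, sightings: &[(&str, u32)]) -> usize {
    let keys: HashSet<RowKey> = sightings
        .iter()
        .map(|(mailbox, uid)| (account_id, provider_id(mailbox, *uid)))
        .collect();
    keys.len()
}
```

Because the key contains the mailbox name and a per-folder UID, one Gmail message seen in both INBOX and All Mail yields two distinct keys, while syncing All Mail alone yields one.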
I have a working implementation and wanted to check the approach with you before opening a sizeable PR.
Proposed approach: make Gmail/IMAP message-centric via X-GM-*
For servers advertising X-GM-EXT-1:
- Sync All Mail only. Each message is in All Mail exactly once → one row per message (no folder duplicates), and archived mail is covered.
- Derive labels/threads/flags from X-GM-LABELS / X-GM-THRID rather than folder membership: \Inbox→INBOX, \Sent→[Gmail]/Sent Mail, user labels by name (nested = Parent/Child), drop \All/unknown. (Note: Gmail quotes system labels inconsistently, \Sent vs \\Sent, so the mapping normalizes leading backslashes.)
- Keep provider_id = "mailbox:uid" so the existing mutation path is unchanged.
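The label mapping could look roughly like the sketch below. This is an illustration of the rules listed above, not the implementation in my branch: `map_gmail_label` is a hypothetical name, only the two system labels named above are mapped, and matching system labels case-insensitively is an assumption.

```rust
/// Map one raw X-GM-LABELS value to a local folder/label name.
/// Returns None when the label should be dropped.
fn map_gmail_label(raw: &str) -> Option<String> {
    // Normalize leading backslashes: "\Sent" and "\\Sent" both become "Sent".
    let name = raw.trim_start_matches('\\');
    // A label that had at least one leading backslash is a system label.
    let is_system = name.len() != raw.len();

    if name.is_empty() {
        return None;
    }
    if !is_system {
        // User labels are kept by name; nesting is already "Parent/Child".
        return Some(name.to_string());
    }
    // Assumption: system labels are compared case-insensitively.
    match name.to_ascii_lowercase().as_str() {
        "inbox" => Some("INBOX".to_string()),
        "sent" => Some("[Gmail]/Sent Mail".to_string()),
        // \All and any system label without a known mapping are dropped.
        _ => None,
    }
}
```

Returning `Option<String>` keeps the "drop" case explicit, so a caller can collect a message's labels with `filter_map` and never store an unmapped system label.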
For all IMAP servers (independently useful):
- Initial sync becomes a probe + paginated backfill. Today's single-shot UID FETCH 1:* loads an entire folder into memory (multi-GB on a large All Mail). Instead, a probe records per-folder watermarks and backfill_sync walks UIDs newest→oldest via UID SEARCH, fetching ≤400/batch through the existing has_more loop: flat memory, resumable from the per-batch cursor. A generous per-batch timeout lets Gmail throttling surface as retry/backoff rather than a wedge.
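The batching step can be sketched as follows. This is a simplified model rather than the real backfill_sync: `Batch`, `next_batch`, and `BATCH_LIMIT` are hypothetical names, the `search_result` slice stands in for a UID SEARCH response, and the cursor is assumed to be the lowest UID fetched so far.

```rust
/// Upper bound on messages fetched per batch (the ≤400 mentioned above).
const BATCH_LIMIT: usize = 400;

#[derive(Debug, PartialEq)]
struct Batch {
    /// UIDs to fetch in this batch, newest (highest) first.
    uids: Vec<u32>,
    /// Lowest UID handed out so far; persisting it makes the walk resumable.
    cursor: Option<u32>,
    /// True when older UIDs remain after this batch.
    has_more: bool,
}

/// Pick the next batch of UIDs strictly below `cursor`, newest first.
/// `search_result` stands in for the UIDs returned by UID SEARCH.
fn next_batch(search_result: &[u32], cursor: Option<u32>) -> Batch {
    let mut remaining: Vec<u32> = search_result
        .iter()
        .copied()
        .filter(|uid| cursor.map_or(true, |c| *uid < c))
        .collect();
    // Walk newest to oldest: sort descending and drop any repeated UIDs.
    remaining.sort_unstable_by(|a, b| b.cmp(a));
    remaining.dedup();

    let has_more = remaining.len() > BATCH_LIMIT;
    remaining.truncate(BATCH_LIMIT);
    // Keep the old cursor when nothing was left to fetch.
    let cursor = remaining.last().copied().or(cursor);
    Batch { uids: remaining, cursor, has_more }
}
```

Only the UID list and one batch of messages are held at a time, and a restart can resume from the last persisted `cursor` instead of starting over.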
The one dependency: mxr-async-imap Gmail FETCH accessors
imap-proto already parses AttributeValue::GmailLabels/GmailMsgId/GmailThrId, but mxr-async-imap's Fetch only surfaces uid/size/modseq. It needs three small accessors on Fetch reading from self.response.parsed() (same shape as flags()):
pub fn gmail_labels(&self) -> Option<Vec<String>> { /* find AttributeValue::GmailLabels */ }
pub fn gmail_msgid(&self) -> Option<u64> { /* GmailMsgId */ }
pub fn gmail_thrid(&self) -> Option<u64> { /* GmailThrId */ }
Since mxr-async-imap is published from this project, would you prefer to add those accessors yourself, or take them as part of the change?
Status / validation
I have this implemented and running against a live Gmail account (~80k-message All Mail): full backfill with bounded memory, zero duplicate rows, labels resolving (Inbox/Sent/user labels), surviving daemon restarts. It includes unit tests (label mapping, the Gmail parse branch, backfill cursor round-trip, etc.) and updates the shared sync-conformance harness to drive has_more (verified across imap/gmail/fake).
Happy to open the PR once you're good with (a) the All-Mail-only + X-GM-LABELS model and (b) how you'd like the mxr-async-imap accessors landed. Are there constraints I'm missing (e.g. servers where you'd want to keep per-folder sync, or a preferred label-mapping source)?