feat: offline disk overflow and network-aware persistence [rn] #29
Merged
When the memory queue reaches capacity while offline, evicted events now overflow to a batched in-memory buffer that flushes to a separate disk file (via the native bridge). On reconnect, two independent flush paths run: the memory queue flushes normally, and overflow events drain directly from disk to network in batches of 100, which avoids loading 10K events into a 2K memory queue.
Key changes:
- Dispatcher: onOverflow callback redirects evicted events; sendBatchDirect bypasses the queue for disk-to-network drain
- PersistentEventQueue: handleOverflow batched buffer, flushOverflowBufferToDisk, drainDiskToNetwork with TTL filtering
- NativeQueueStorage: overflow snapshot read/write/delete methods
- MetaRouterAnalyticsClient: wires overflow callback, dual flush on reconnect, disk drain on launch, maxOfflineDiskEvents passthrough
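The disk-to-network drain described above can be sketched as a loop that sends fixed-size batches and stops at the first failure. This is a minimal illustration, not the SDK's drainDiskToNetwork: `drainInBatches`, `PersistedEvent`, and the boolean-returning `send` callback are hypothetical names, and the events are assumed to have already been read from the disk file.

```typescript
// Hypothetical event shape; the real SDK payload is richer.
type PersistedEvent = { messageId: string };

// Assumed contract: resolves true when the server accepted the batch.
type SendBatch = (batch: PersistedEvent[]) => Promise<boolean>;

// Sends `events` in batches of `batchSize` without staging them in the memory queue.
// Returns the events that were NOT sent, so the caller can write them back to disk.
async function drainInBatches(
  events: PersistedEvent[],
  send: SendBatch,
  batchSize: number = 100,
): Promise<PersistedEvent[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  let offset = 0;
  while (offset < events.length) {
    const batch = events.slice(offset, offset + batchSize);
    const ok = await send(batch);
    if (!ok) break; // stop at the first failed batch; the remainder stays persisted
    offset += batch.length;
  }
  return events.slice(offset);
}
```

With 250 persisted events and a `send` that always succeeds, the callback sees batches of 100, 100, and 50, and the returned remainder is empty.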
Collapses the dual-file (snapshot + overflow-snapshot) JS abstraction into one file backed by the existing native module's queue.v1.json. The overflow native methods were never implemented in iOS/Android, so in production the old "offline overflow" path silently dropped events; consolidation fixes this latent bug by routing all persistence through the single native API that already exists.
- NativeQueueStorage: drop read/write/deleteOverflowSnapshot; single read/write/deleteSnapshot interface
- PersistentEventQueue.flushEventsToDisk (was flushEventsToOverflowDisk) uses read-merge-cap-write semantics so crash-safety and offline-overflow writes share one disk store
- flushToDisk now drains the memory queue and appends to disk (was: overwrite snapshot with current queue contents), avoiding duplication when both memory and disk hold events
- Dispatcher.onFlushToOfflineStorage renamed to onFlushToDisk; the store is no longer offline-specific
- New Dispatcher.drainQueue() helper for the persistence layer to empty memory and reset byte counter atomically
- DEFAULT_MAX_OFFLINE_DISK_EVENTS renamed to DEFAULT_MAX_DISK_EVENTS internally; public InitOptions surface (maxOfflineDiskEvents) stays until the API rename commit
Native modules are unchanged (already target queue.v1.json and already do atomic writes). Public InitOptions surface is unchanged.
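The merge and cap steps of read-merge-cap-write can be sketched as a pure function; the read and write steps go through the native bridge and are omitted. `mergeAndCap` and `StoredEvent` are hypothetical names, and dropping the oldest events when the cap is exceeded is an assumption, since the commit does not say which end is trimmed.

```typescript
// Hypothetical event shape for illustration only.
type StoredEvent = { messageId: string };

// Appends `incoming` after what is already on disk, then keeps at most `cap` events.
// Assumption: when over the cap, the OLDEST events (front of the list) are dropped.
function mergeAndCap(
  onDisk: StoredEvent[],
  incoming: StoredEvent[],
  cap: number,
): StoredEvent[] {
  if (cap <= 0) return []; // a zero cap means nothing is persisted
  const merged = onDisk.concat(incoming);
  return merged.length > cap ? merged.slice(merged.length - cap) : merged;
}
```

Because the function never mutates its inputs, the caller can still fall back to the previous disk contents if the subsequent write fails.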
Replaces rehydrate-into-memory with a cheap file-existence probe so a large on-disk backlog no longer forces an allocation spike on cold start. Memory queue stays empty through init; disk events are drained directly to the network by the dispatcher when online.
- Native (iOS/Android): new exists() method returning a boolean without reading file contents
- PersistentEventQueue: rehydrate() removed, replaced by checkForPersistedEvents(), which sets the hasDiskData flag from the cheap exists probe; hasDiskData is kept in sync on write/delete/drain
- MetaRouterAnalyticsClient.init(): calls checkForPersistedEvents instead of rehydrate; triggers drainDiskToNetwork only when online AND hasDiskData is true (no unconditional cold-start flush)
- getDebugInfo exposes hasDiskData (boolean) in place of rehydratedEvents (count); we no longer parse the file just to report its size
- TTL filter: preserve events with unparseable timestamps (conservative, matches iOS/Android); previously they were silently dropped
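The conservative TTL rule from the last bullet can be sketched as follows. The `timestamp` field name, the `filterExpired` helper, and the use of `Date.parse` are assumptions for illustration; the SDK's actual event shape and parsing may differ.

```typescript
// Hypothetical event shape: only the timestamp matters for this sketch.
type TimedEvent = { timestamp?: string };

// Drops events older than `ttlMs`, but KEEPS events whose timestamp is
// missing or unparseable rather than silently discarding them.
function filterExpired<T extends TimedEvent>(
  events: T[],
  ttlMs: number,
  now: number = Date.now(),
): T[] {
  return events.filter((event) => {
    const ts = Date.parse(event.timestamp ?? "");
    if (Number.isNaN(ts)) return true; // unparseable: keep (conservative)
    return now - ts <= ttlMs;
  });
}
```

Passing `now` explicitly keeps the function deterministic in tests.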
Wraps the raw NetworkMonitor so consumers see:
- Offline transitions immediately (pause HTTP ASAP; no wasted retries on a down network)
- Online transitions only after 2 seconds of stable connectivity (absorb WiFi/cellular flaps that would otherwise trigger flush attempts that immediately fail)
A pending online debounce is cancelled if offline arrives during the wait, and the debounce window resets on repeated raw online signals.
MetaRouterAnalyticsClient.init() now wraps the default NetworkMonitor in DebouncedNetworkMonitor. An injected monitor (tests, custom reachability providers) is used as-is, so callers control their own debounce.
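The asymmetric debounce described above might look roughly like the sketch below. It is not the SDK's DebouncedNetworkMonitor: `AsymmetricDebouncer`, `Timers`, and the `onRaw` entry point are hypothetical, and the timer functions are injectable only so the logic can be exercised without waiting on real time.

```typescript
type Status = "online" | "offline";

// Injectable timers (assumption for testability); defaults to the global ones.
interface Timers {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class AsymmetricDebouncer {
  private pending: unknown = null;
  private current: Status;
  private readonly emit: (status: Status) => void;
  private readonly onlineDelayMs: number;
  private readonly timers: Timers;

  constructor(
    initial: Status,
    emit: (status: Status) => void,
    onlineDelayMs: number = 2000,
    timers: Timers = realTimers,
  ) {
    this.current = initial;
    this.emit = emit;
    this.onlineDelayMs = onlineDelayMs;
    this.timers = timers;
  }

  // Feed raw reachability signals in here.
  onRaw(status: Status): void {
    // Any new raw signal invalidates a pending online confirmation.
    if (this.pending !== null) {
      this.timers.clear(this.pending);
      this.pending = null;
    }
    if (status === "offline") {
      this.publish("offline"); // offline is forwarded immediately
      return;
    }
    // Online: (re)start the stability window.
    this.pending = this.timers.set(() => {
      this.pending = null;
      this.publish("online");
    }, this.onlineDelayMs);
  }

  private publish(status: Status): void {
    if (status === this.current) return; // suppress duplicate notifications
    this.current = status;
    this.emit(status);
  }
}
```

Clearing the pending timer on every raw signal covers both rules at once: an offline signal cancels the wait, and a repeated online signal restarts it.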
Aligns the RN public surface with iOS/Android: InitOptions now exposes `maxQueueEvents` (count cap, default 2000) and `maxDiskEvents` (default 10000). `maxQueueBytes` moves to an internal 5MB constant; byte caps stay internal on all platforms until a customer needs to tune them.
- InitOptions: add maxQueueEvents, add maxDiskEvents, remove maxQueueBytes, remove maxOfflineDiskEvents
- Constructor: throws on negative maxDiskEvents (use 0 to disable); silently clamps maxQueueEvents to >= 1; warns when a non-zero maxDiskEvents is smaller than maxQueueEvents (memory could exceed what disk can preserve on background flush)
- Dispatcher: enforces BOTH count AND byte caps on enqueue and enqueueFront. New isPersistenceEnabled callback; when it returns false (maxDiskEvents === 0), capacity overflow becomes a ring buffer (drop oldest one-by-one) instead of splicing the whole queue to a disk handler that would silently drop
- Dispatcher.flush() offline branch skips the splice-to-disk callback when persistence is disabled; events stay in memory for the next foreground retry
- PersistentEventQueue.flushToDisk / flushEventsToDisk are no-ops when maxDiskEvents === 0 (documented kill-loss tradeoff)
- getDebugInfo: exposes maxQueueEvents and maxDiskEvents; drops maxQueueBytes from the returned shape
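The ring-buffer behaviour used when persistence is disabled can be sketched as a queue that enforces both a count cap and a byte cap by dropping the oldest entries one at a time. `BoundedQueue` is a hypothetical class, the caller-supplied byte size is an assumption, and keeping the newest event even when it alone exceeds the byte cap is a choice made for this sketch, not something the commit states.

```typescript
class BoundedQueue<T> {
  private items: Array<{ value: T; bytes: number }> = [];
  private totalBytes = 0;
  private readonly maxEvents: number;
  private readonly maxBytes: number;

  constructor(maxEvents: number, maxBytes: number) {
    this.maxEvents = Math.max(1, maxEvents); // clamp to >= 1, as the commit describes
    this.maxBytes = maxBytes;
  }

  // Adds an event and returns whatever had to be dropped to stay within both caps.
  enqueue(value: T, bytes: number): T[] {
    this.items.push({ value, bytes });
    this.totalBytes += bytes;
    const dropped: T[] = [];
    // Drop oldest one-by-one; assumption: the newest event is always retained.
    while (
      this.items.length > 1 &&
      (this.items.length > this.maxEvents || this.totalBytes > this.maxBytes)
    ) {
      const oldest = this.items.shift()!;
      this.totalBytes -= oldest.bytes;
      dropped.push(oldest.value);
    }
    return dropped;
  }

  get length(): number {
    return this.items.length;
  }
}
```

Returning the dropped events lets a caller decide what to do with them, for example hand them to a disk handler when persistence is enabled or discard them when it is not.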
Before: the disk drain path silently deleted the disk store on 401/403/404 while the main flush path disabled the client. This was asymmetric: the client would stay "ready" but drop events until the next main flush also hit the same error. Now both paths converge on a shared Dispatcher.handleFatalConfig helper that clears the queue, stops the flush loop, and fires onFatalConfig.
Also adds test coverage for:
- drain FATAL_CONFIG triggers onFatalConfig
- enqueue at the event-count cap (count-cap overflow parity with byte-cap overflow)
- ring-buffer drop-oldest when persistence is disabled (maxDiskEvents=0)
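A rough sketch of the convergence: both response paths check the same status set and call one shared handler. `MiniDispatcher`, `onFlushResponse`, and `onDrainResponse` are hypothetical names, and the guard that reports the fatal error at most once is an assumption rather than documented behaviour.

```typescript
// Statuses the commit treats as fatal configuration errors.
const FATAL_CONFIG_STATUSES: ReadonlySet<number> = new Set([401, 403, 404]);

class MiniDispatcher {
  queue: string[] = [];
  running = true;
  private readonly onFatalConfig: (status: number) => void;

  constructor(onFatalConfig: (status: number) => void) {
    this.onFatalConfig = onFatalConfig;
  }

  // Called with the HTTP status of a main-queue flush.
  onFlushResponse(status: number): void {
    if (FATAL_CONFIG_STATUSES.has(status)) this.handleFatalConfig(status);
  }

  // Called with the HTTP status of a disk-drain batch.
  onDrainResponse(status: number): void {
    if (FATAL_CONFIG_STATUSES.has(status)) this.handleFatalConfig(status);
  }

  // Single place where the client is disabled, so the two paths cannot diverge.
  private handleFatalConfig(status: number): void {
    if (!this.running) return; // assumption: report at most once
    this.queue.length = 0; // clear queued events
    this.running = false; // stop the flush loop
    this.onFatalConfig(status);
  }
}
```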
Matches the iOS/Android wording from the port guide so support can grep the same strings across all platforms:
- "Network status changed: <old> -> <new>"
- "Dispatcher paused — device is offline"
- "Dispatcher resumed — device is online, triggering flush"
No behavior change. networkStatus + hasDiskData are already in getDebugInfo from earlier commits; log set already omits the "deleted during iteration" noise (reconcile, per-update, debounce-state).
Summary
PR 2 of 2 of Network Awareness (builds on #24). Adds offline disk overflow, cold-start drain, debounced reachability, and a cleaned-up public capacity API.
Ticket: SC-36910 · Pt. 2: SC-37597
What changed
Offline disk overflow
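- When the memory queue reaches capacity while offline, evicted events are routed to `PersistentEventQueue` via `onOverflow` instead of being dropped.
- On reconnect (or on launch, when online with `hasDiskData`), overflow events drain directly from disk to network in batches of 100 via `sendBatchDirect`, bypassing the memory queue. Prevents memory spikes when 10K events need to ship through a 2K queue.
- A 401/403/404 during disk drain now fires `onFatalConfig` and stops the client, matching the main flush path (previously silent delete).
Consolidated disk storage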
Collapsed the dual-file (snapshot + overflow-snapshot) JS abstraction into one file backed by `queue.v1.json`. The overflow-specific native methods were never implemented on iOS/Android, so in production the old "offline overflow" path silently dropped events. Consolidation fixes that latent bug by routing all persistence through the single native API that already exists.
- `NativeQueueStorage`: dropped `read/write/deleteOverflowSnapshot`; single `read/write/deleteSnapshot` interface.
- `flushToDisk` now drains memory and appends to disk (read-merge-cap-write), so memory + disk never double-hold the same events.
- `Dispatcher.onFlushToOfflineStorage` → `onFlushToDisk` (no longer offline-specific). New `Dispatcher.drainQueue()` helper.
Cold-start: no rehydration
Replaced rehydrate-into-memory with a cheap `exists()` probe so a large on-disk backlog no longer forces an allocation spike on cold start. Memory queue stays empty through `init()`; disk is drained directly to network when online.
- Native (iOS/Android): new `exists()` method returning a boolean without reading file contents.
- `PersistentEventQueue.rehydrate()` removed; replaced by `checkForPersistedEvents()`, which sets a `hasDiskData` flag kept in sync on write/delete/drain.
- `getDebugInfo` exposes `hasDiskData: boolean` (replaces `rehydratedEvents: number`).
Debounced network monitor (2s asymmetric)
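New `DebouncedNetworkMonitor` wraps the raw monitor so consumers see:
- Offline transitions immediately (pause HTTP as soon as possible).
- Online transitions only after 2 seconds of stable connectivity (absorbs WiFi/cellular flaps).
A pending online debounce is cancelled if offline arrives during the wait; the window resets on repeated raw online signals. Injected monitors (tests, custom providers) are used as-is.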
Public capacity API
Aligns the RN surface with iOS/Android:
- `maxQueueBytes` (public) → internal 5MB constant
- `maxOfflineDiskEvents` → `maxDiskEvents` (public, default 10000)
- New: `maxQueueEvents` (public, default 2000)
Behavior changes:
- Constructor throws on negative `maxDiskEvents` (use `0` to disable persistence); silently clamps `maxQueueEvents` to ≥ 1; warns when a non-zero `maxDiskEvents` < `maxQueueEvents`.
- Dispatcher enforces both count and byte caps on `enqueue`/`enqueueFront`.
- New `isPersistenceEnabled` callback: when `maxDiskEvents === 0`, capacity overflow becomes a ring buffer (drop-oldest one-by-one) instead of splicing the whole queue to a no-op disk handler.
- `flush()` offline branch skips the splice-to-disk callback when persistence is disabled; events stay in memory for the next foreground retry.
- `flushToDisk`/`flushEventsToDisk` are no-ops when `maxDiskEvents === 0`.
Log phrasing alignment
Matches iOS/Android canonical wording from the port guide so support can grep the same strings across platforms:
- `Network status changed: <old> -> <new>`
- `Dispatcher paused — device is offline`
- `Dispatcher resumed — device is online, triggering flush`
Test plan
- Jest suite (`npx jest`)
- `handleOverflow` buffers + respects disk cap; flushes at 100-event threshold
- `flushOverflowBufferToDisk` merges with existing disk overflow and enforces cap
- `drainDiskToNetwork` sends directly to network without entering memory queue
- Disk-drain FATAL_CONFIG triggers `onFatalConfig` (parity with main flush path)
- `onOverflow` fires when queue drops events (replaces warn-and-drop)
- Ring-buffer drop-oldest when persistence is disabled (`maxDiskEvents=0`)
- `hasDiskData` flag stays in sync on write/delete/drain; no unconditional rehydrate
- Constructor throws on negative `maxDiskEvents`; warns on `maxDiskEvents` < `maxQueueEvents`