feat: offline disk overflow and network-aware persistence [rn] #29

Merged: choudlet merged 11 commits into main from chrish/sc-36910/network-monitoring-dispatcher on Apr 22, 2026

@choudlet (Collaborator) commented Apr 10, 2026

Summary

PR 2 of 2 of Network Awareness (builds on #24). Adds offline disk overflow, cold-start drain, debounced reachability, and a cleaned-up public capacity API.

Ticket: SC-36910 · Pt. 2: SC-37597

What changed

Offline disk overflow

  • Dispatcher overflow callback — when the memory queue hits capacity, evicted events are redirected to PersistentEventQueue via onOverflow instead of being dropped.
  • Batched overflow buffer — events accumulate in-memory and flush at a 100-event threshold, avoiding per-event disk I/O.
  • Direct disk-to-network drain — on offline→online transition (and at cold start when hasDiskData), overflow events drain directly from disk to network in batches of 100 via sendBatchDirect, bypassing the memory queue. Prevents memory spikes when 10K events need to ship through a 2K queue.
  • Dual flush paths on reconnect — memory queue flushes via the existing path; disk overflow drains independently.
  • TTL enforcement — events older than 7 days are filtered during drain. Events with unparseable timestamps are preserved (conservative, matches iOS/Android).
  • Fatal-config parity — 401/403/404 during drain now fires onFatalConfig and stops the client, matching the main flush path (previously silent delete).
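The batched overflow buffer described above could look roughly like the following. This is a minimal sketch, not the PR's code: `QueuedEvent`, `BatchWriter`, and the `OverflowBuffer` class are hypothetical names, and only the 100-event threshold and the `handleOverflow` method name come from the description.

```typescript
// Hypothetical event shape; the SDK's real event type is not shown in this PR.
interface QueuedEvent {
  messageId: string;
  timestamp: string;
}

// Assumed stand-in for whatever persists a batch of events to disk.
type BatchWriter = (batch: QueuedEvent[]) => void;

class OverflowBuffer {
  private buffer: QueuedEvent[] = [];
  private readonly writeBatch: BatchWriter;
  private readonly threshold: number;

  constructor(writeBatch: BatchWriter, threshold: number = 100) {
    this.writeBatch = writeBatch;
    this.threshold = threshold; // 100 is the flush threshold named in the PR
  }

  // Receives events evicted from the memory queue.
  handleOverflow(evicted: QueuedEvent[]): void {
    this.buffer.push(...evicted);
    if (this.buffer.length >= this.threshold) {
      this.flush();
    }
  }

  // Writes everything buffered so far as one batch and empties the buffer.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.writeBatch(batch);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

With this shape, 99 single-event evictions cost no disk writes and the 100th triggers exactly one, which is the I/O saving the bullet describes.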

Consolidated disk storage

Collapsed the dual-file (snapshot + overflow-snapshot) JS abstraction into one file backed by queue.v1.json. The overflow-specific native methods were never implemented on iOS/Android — in production the old "offline overflow" path silently dropped events. Consolidation fixes that latent bug by routing all persistence through the single native API that already exists.

  • NativeQueueStorage: dropped read/write/deleteOverflowSnapshot; single read/write/deleteSnapshot interface.
  • flushToDisk now drains memory and appends to disk (read-merge-cap-write), so memory + disk never double-hold the same events.
  • Dispatcher.onFlushToOfflineStorage → onFlushToDisk (no longer offline-specific). New Dispatcher.drainQueue() helper.
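The read-merge-cap-write step can be sketched as below. The `SnapshotStorage` signatures, `mergeAndCap`, and `appendToDisk` are assumptions made for illustration (the PR names `read/write/deleteSnapshot` but does not show their types), and dropping the oldest events when the cap is exceeded is also an assumption.

```typescript
interface StoredEvent {
  messageId: string;
}

// Hypothetical storage interface; real signatures are not shown in the PR.
interface SnapshotStorage {
  readSnapshot(): Promise<StoredEvent[]>;
  writeSnapshot(events: StoredEvent[]): Promise<void>;
}

// Pure merge + cap step. Assumption: when over the cap, the oldest events go first.
function mergeAndCap(
  existing: StoredEvent[],
  incoming: StoredEvent[],
  maxDiskEvents: number,
): StoredEvent[] {
  const merged = existing.concat(incoming);
  return merged.length > maxDiskEvents
    ? merged.slice(merged.length - maxDiskEvents)
    : merged;
}

// Appends events drained from memory to the single on-disk store.
async function appendToDisk(
  storage: SnapshotStorage,
  drained: StoredEvent[],
  maxDiskEvents: number,
): Promise<void> {
  // Persistence disabled or nothing to write: leave the disk untouched.
  if (maxDiskEvents === 0 || drained.length === 0) return;
  const existing = await storage.readSnapshot();
  await storage.writeSnapshot(mergeAndCap(existing, drained, maxDiskEvents));
}
```

Because the caller drains memory before appending, an event lives either in memory or on disk, never both.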

Cold-start: no rehydration

Replaced rehydrate-into-memory with a cheap exists() probe so a large on-disk backlog no longer forces an allocation spike on cold start. Memory queue stays empty through init(); disk is drained directly to network when online.

  • Native (iOS/Android): new exists() returning a boolean without reading file contents.
  • PersistentEventQueue.rehydrate() removed; replaced by checkForPersistedEvents() which sets a hasDiskData flag kept in sync on write/delete/drain.
  • getDebugInfo exposes hasDiskData: boolean (replaces rehydratedEvents: number).
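A minimal sketch of the cold-start flow, assuming a hypothetical `ExistsProbe` interface and `DiskDataFlag` class; only `exists()`, `checkForPersistedEvents()`, and `hasDiskData` are names taken from the PR, and the real bookkeeping may differ.

```typescript
// Hypothetical slice of the native storage bridge; only exists() is named in the PR.
interface ExistsProbe {
  exists(): Promise<boolean>;
}

class DiskDataFlag {
  private flag = false;

  get hasDiskData(): boolean {
    return this.flag;
  }

  // Cold start: one cheap probe, no file read, nothing loaded into memory.
  async checkForPersistedEvents(storage: ExistsProbe): Promise<boolean> {
    this.flag = await storage.exists();
    return this.flag;
  }

  // Keep the flag in sync with later disk activity.
  onWrite(eventCount: number): void {
    if (eventCount > 0) this.flag = true;
  }

  onDeleteOrDrainComplete(): void {
    this.flag = false;
  }
}

// Drain at launch only when there is something on disk and a network to send it over.
function shouldDrainOnLaunch(isOnline: boolean, hasDiskData: boolean): boolean {
  return isOnline && hasDiskData;
}
```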

Debounced network monitor (2s asymmetric)

New DebouncedNetworkMonitor wraps the raw monitor so consumers see:

  • Offline transitions immediately (pause HTTP ASAP — no wasted retries on a down network).
  • Online transitions only after 2s of stable connectivity (absorbs WiFi/cellular flaps that would otherwise trigger flush attempts that immediately fail).

A pending online debounce is cancelled if offline arrives during the wait; the window resets on repeated raw online signals. Injected monitors (tests, custom providers) are used as-is.
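The asymmetric debounce can be illustrated with the sketch below. It is not the PR's `DebouncedNetworkMonitor`: `AsymmetricDebouncer`, `Scheduler`, and `onRawStatus` are hypothetical names, and the injectable scheduler exists only so the behavior can be exercised without real timers.

```typescript
type NetworkStatus = "online" | "offline";
type StatusListener = (status: NetworkStatus) => void;

// Injectable timer functions so a fake clock can drive the sketch in tests.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const timeoutScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class AsymmetricDebouncer {
  private current: NetworkStatus;
  private pending: { handle: unknown } | null = null;
  private readonly emit: StatusListener;
  private readonly delayMs: number;
  private readonly scheduler: Scheduler;

  constructor(
    initial: NetworkStatus,
    emit: StatusListener,
    delayMs: number = 2000, // 2s window from the PR description
    scheduler: Scheduler = timeoutScheduler,
  ) {
    this.current = initial;
    this.emit = emit;
    this.delayMs = delayMs;
    this.scheduler = scheduler;
  }

  // Feed raw reachability signals from the underlying monitor.
  onRawStatus(raw: NetworkStatus): void {
    // Any new raw signal cancels a pending online timer (offline cancels, online resets).
    if (this.pending !== null) {
      this.scheduler.clear(this.pending.handle);
      this.pending = null;
    }
    if (raw === "offline") {
      // Offline is reported immediately.
      if (this.current !== "offline") {
        this.current = "offline";
        this.emit("offline");
      }
      return;
    }
    if (this.current === "online") return;
    // Online is reported only after delayMs without another raw signal.
    const handle = this.scheduler.set(() => {
      this.pending = null;
      this.current = "online";
      this.emit("online");
    }, this.delayMs);
    this.pending = { handle };
  }
}
```

A consumer would pass its pause/resume handler as `emit` and forward every raw monitor callback to `onRawStatus`.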

Public capacity API

Aligns the RN surface with iOS/Android:

| Before | After |
| --- | --- |
| maxQueueBytes (public) | internal 5MB constant |
| maxOfflineDiskEvents | maxDiskEvents (public, default 10000) |
| (none) | maxQueueEvents (public, default 2000) |
  • Constructor throws on negative maxDiskEvents (use 0 to disable persistence); silently clamps maxQueueEvents to ≥ 1; warns when a non-zero maxDiskEvents < maxQueueEvents.
  • Dispatcher enforces both count and byte caps on enqueue / enqueueFront.
  • New isPersistenceEnabled callback: when maxDiskEvents === 0, capacity overflow becomes a ring buffer (drop-oldest one-by-one) instead of splicing the whole queue to a no-op disk handler.
  • flush() offline branch skips the splice-to-disk callback when persistence is disabled — events stay in memory for the next foreground retry.
  • flushToDisk / flushEventsToDisk are no-ops when maxDiskEvents === 0.
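The validation rules and the two overflow modes above can be sketched as follows. `resolveCapacity` and `enqueueWithCap` are hypothetical helpers, the error and warning strings are invented for illustration, warnings are returned rather than logged to keep the sketch testable, and the byte cap is omitted for brevity.

```typescript
interface CapacityOptions {
  maxQueueEvents?: number;
  maxDiskEvents?: number;
}

interface ResolvedCapacity {
  maxQueueEvents: number;
  maxDiskEvents: number;
  warnings: string[];
}

function resolveCapacity(opts: CapacityOptions = {}): ResolvedCapacity {
  const maxDiskEvents = opts.maxDiskEvents ?? 10000;
  if (maxDiskEvents < 0) {
    // Message text is illustrative, not the SDK's actual wording.
    throw new RangeError("maxDiskEvents must be >= 0 (use 0 to disable persistence)");
  }
  // Silently clamp the memory cap to at least one event.
  const maxQueueEvents = Math.max(1, opts.maxQueueEvents ?? 2000);
  const warnings: string[] = [];
  if (maxDiskEvents !== 0 && maxDiskEvents < maxQueueEvents) {
    warnings.push("maxDiskEvents is smaller than maxQueueEvents");
  }
  return { maxQueueEvents, maxDiskEvents, warnings };
}

// Count-cap handling on enqueue (byte cap left out of this sketch).
function enqueueWithCap<T>(
  queue: T[],
  event: T,
  maxQueueEvents: number,
  persistenceEnabled: boolean,
  onOverflow: (evicted: T[]) => void,
): void {
  if (queue.length >= maxQueueEvents) {
    if (persistenceEnabled) {
      // Hand the whole queue to the disk handler.
      onOverflow(queue.splice(0, queue.length));
    } else {
      // Ring buffer: drop the oldest events one by one.
      while (queue.length >= maxQueueEvents) queue.shift();
    }
  }
  queue.push(event);
}
```

With persistence disabled, a full three-slot queue `[1, 2, 3]` receiving `4` ends as `[2, 3, 4]`; with persistence enabled, `[1, 2, 3]` goes to the overflow handler and the queue holds only `4`.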

Log phrasing alignment

Matches iOS/Android canonical wording from the port guide so support can grep the same strings across platforms:

  • Network status changed: <old> -> <new>
  • Dispatcher paused — device is offline
  • Dispatcher resumed — device is online, triggering flush

Test plan

  • 237/237 tests pass across 16 suites (npx jest)
  • TypeScript compiles cleanly
  • Overflow: handleOverflow buffers + respects disk cap; flushes at 100-event threshold
  • Overflow: flushOverflowBufferToDisk merges with existing disk overflow and enforces cap
  • Drain: drainDiskToNetwork sends directly to network without entering memory queue
  • Drain: deletes file after completion; stops on network failure; filters expired events
  • Drain: FATAL_CONFIG fires onFatalConfig (parity with main flush path)
  • Dispatcher: onOverflow fires when queue drops events (replaces warn-and-drop)
  • Dispatcher: count-cap overflow parity with byte-cap overflow
  • Dispatcher: ring-buffer drop-oldest when persistence disabled (maxDiskEvents=0)
  • Cold start: hasDiskData flag stays in sync on write/delete/drain; no unconditional rehydrate
  • Debounce: offline immediate, online after 2s, pending online cancelled by offline, repeated online resets window
  • Offline→online: dual flush paths (memory + disk) run independently
  • Config: throws on negative maxDiskEvents; warns on maxDiskEvents < maxQueueEvents
  • Manual: verify iOS NWPathMonitor + disk drain end-to-end in RN app
  • Manual: verify Android ConnectivityManager + disk drain end-to-end in RN app

When the memory queue reaches capacity while offline, evicted events
now overflow to a batched in-memory buffer that flushes to a separate
disk file (via the native bridge). On reconnect, two independent flush
paths run: the memory queue flushes normally, and overflow events drain
directly from disk to network in batches of 100 — avoiding loading
10K events into a 2K memory queue.

Key changes:
- Dispatcher: onOverflow callback redirects evicted events, sendBatchDirect
  bypasses the queue for disk-to-network drain
- PersistentEventQueue: handleOverflow batched buffer, flushOverflowBufferToDisk,
  drainDiskToNetwork with TTL filtering
- NativeQueueStorage: overflow snapshot read/write/delete methods
- MetaRouterAnalyticsClient: wires overflow callback, dual flush on reconnect,
  disk drain on launch, maxOfflineDiskEvents passthrough
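As an illustration of the disk-to-network drain, here is a minimal sketch under stated assumptions: `PersistedEvent`, `SendResult`, and the callback signature for `sendBatchDirect` are hypothetical, while the 100-event batch size, the 7-day TTL, and the rule that unparseable timestamps are preserved come from the PR description.

```typescript
interface PersistedEvent {
  messageId: string;
  timestamp: string; // assumed to be an ISO-8601 string
}

const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days
const BATCH_SIZE = 100;

// Unparseable timestamps are treated as not expired (the conservative choice).
function isExpired(event: PersistedEvent, nowMs: number): boolean {
  const ts = Date.parse(event.timestamp);
  if (Number.isNaN(ts)) return false;
  return nowMs - ts > TTL_MS;
}

// Hypothetical result type; the real sendBatchDirect signature is not shown in the PR.
type SendResult = "ok" | "retryable" | "fatal_config";

async function drainDiskToNetwork(
  events: PersistedEvent[],
  sendBatchDirect: (batch: PersistedEvent[]) => Promise<SendResult>,
  nowMs: number = Date.now(),
): Promise<{ sent: number; outcome: SendResult }> {
  const live = events.filter((e) => !isExpired(e, nowMs));
  let sent = 0;
  // Batches go straight to the network; nothing is re-enqueued in memory.
  for (let i = 0; i < live.length; i += BATCH_SIZE) {
    const batch = live.slice(i, i + BATCH_SIZE);
    const result = await sendBatchDirect(batch);
    if (result !== "ok") return { sent, outcome: result }; // stop on first failure
    sent += batch.length;
  }
  return { sent, outcome: "ok" };
}
```

The caller would decide what to do with the outcome: delete the file on `ok`, keep the unsent remainder on `retryable`, and escalate on `fatal_config`.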
@choudlet changed the title from "feat: network-awarness [rn]" to "feat: network-awarness and offline queing behavior" on Apr 15, 2026

Collapses the dual-file (snapshot + overflow-snapshot) JS abstraction into
one file backed by the existing native module's queue.v1.json. The overflow
native methods were never implemented in iOS/Android, so in production the
old "offline overflow" path silently dropped events; consolidation fixes
this latent bug by routing all persistence through the single native API
that already exists.

- NativeQueueStorage: drop read/write/deleteOverflowSnapshot; single
  read/write/deleteSnapshot interface
- PersistentEventQueue.flushEventsToDisk (was flushEventsToOverflowDisk)
  uses read-merge-cap-write semantics so crash-safety and offline-overflow
  writes share one disk store
- flushToDisk now drains the memory queue and appends to disk (was:
  overwrite snapshot with current queue contents), avoiding duplication
  when both memory and disk hold events
- Dispatcher.onFlushToOfflineStorage renamed to onFlushToDisk; the store
  is no longer offline-specific
- New Dispatcher.drainQueue() helper for the persistence layer to empty
  memory and reset byte counter atomically
- DEFAULT_MAX_OFFLINE_DISK_EVENTS renamed to DEFAULT_MAX_DISK_EVENTS
  internally; public InitOptions surface (maxOfflineDiskEvents) stays
  until the API rename commit

Native modules are unchanged (already target queue.v1.json and already
do atomic writes). Public InitOptions surface is unchanged.

Replaces rehydrate-into-memory with a cheap file-existence probe so a
large on-disk backlog no longer forces an allocation spike on cold start.
Memory queue stays empty through init; disk events are drained directly
to the network by the dispatcher when online.

- Native (iOS/Android): new exists() method returning a boolean without
  reading file contents
- PersistentEventQueue: rehydrate() removed, replaced by
  checkForPersistedEvents() which sets the hasDiskData flag from the
  cheap exists probe; hasDiskData is kept in sync on write/delete/drain
- MetaRouterAnalyticsClient.init(): calls checkForPersistedEvents
  instead of rehydrate; triggers drainDiskToNetwork only when online AND
  hasDiskData is true (no unconditional cold-start flush)
- getDebugInfo exposes hasDiskData (boolean) in place of rehydratedEvents
  (count); we no longer parse the file just to report its size
- TTL filter: preserve events with unparseable timestamps (conservative,
  matches iOS/Android); previously they were silently dropped

Wraps the raw NetworkMonitor so consumers see:
- Offline transitions immediately (pause HTTP ASAP — no wasted retries
  on a down network)
- Online transitions only after 2 seconds of stable connectivity (absorb
  WiFi/cellular flaps that would otherwise trigger flush attempts that
  immediately fail)

A pending online debounce is cancelled if offline arrives during the
wait, and the debounce window resets on repeated raw online signals.

MetaRouterAnalyticsClient.init() now wraps the default NetworkMonitor
in DebouncedNetworkMonitor. An injected monitor (tests, custom reachability
providers) is used as-is, so callers control their own debounce.

Aligns the RN public surface with iOS/Android: InitOptions now exposes
`maxQueueEvents` (count cap, default 2000) and `maxDiskEvents` (default
10000). `maxQueueBytes` moves to an internal 5MB constant — byte caps
stay internal on all platforms until a customer needs to tune them.

- InitOptions: add maxQueueEvents, add maxDiskEvents, remove
  maxQueueBytes, remove maxOfflineDiskEvents
- Constructor: throws on negative maxDiskEvents (use 0 to disable);
  silently clamps maxQueueEvents to >= 1; warns when a non-zero
  maxDiskEvents is smaller than maxQueueEvents (memory could exceed
  what disk can preserve on background flush)
- Dispatcher: enforces BOTH count AND byte caps on enqueue and
  enqueueFront. New isPersistenceEnabled callback; when it returns
  false (maxDiskEvents === 0), capacity overflow becomes a ring
  buffer (drop oldest one-by-one) instead of splicing the whole
  queue to a disk handler that would silently drop
- Dispatcher.flush() offline branch skips the splice-to-disk
  callback when persistence is disabled — events stay in memory
  for the next foreground retry
- PersistentEventQueue.flushToDisk / flushEventsToDisk are no-ops
  when maxDiskEvents === 0 (documented kill-loss tradeoff)
- getDebugInfo: exposes maxQueueEvents and maxDiskEvents; drops
  maxQueueBytes from the returned shape

Before: the disk drain path silently deleted the disk store on
401/403/404 while the main flush path disabled the client. Asymmetric
— the client would stay "ready" but drop events until the next main
flush also hit the same error.

Now both paths converge on a shared Dispatcher.handleFatalConfig helper
that clears the queue, stops the flush loop, and fires onFatalConfig.
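A minimal sketch of how both paths might converge on one helper. `MiniDispatcher` and `handleResponseStatus` are hypothetical, and passing the HTTP status to `onFatalConfig` is an assumption; only `handleFatalConfig`, `onFatalConfig`, and the 401/403/404 set come from this PR.

```typescript
const FATAL_CONFIG_STATUSES = new Set<number>([401, 403, 404]);

class MiniDispatcher {
  queue: string[] = [];
  running = true;
  private readonly onFatalConfig: (status: number) => void;

  constructor(onFatalConfig: (status: number) => void) {
    this.onFatalConfig = onFatalConfig;
  }

  // Called by both the main flush path and the disk drain path.
  // Returns true when the status was fatal and the client has been stopped.
  handleResponseStatus(status: number): boolean {
    if (!FATAL_CONFIG_STATUSES.has(status)) return false;
    this.handleFatalConfig(status);
    return true;
  }

  private handleFatalConfig(status: number): void {
    this.queue.length = 0; // clear the queue
    this.running = false; // stop the flush loop
    this.onFatalConfig(status);
  }
}
```

Routing both callers through one method is what removes the asymmetry: neither path can delete data without also stopping the client.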

Also adds test coverage for:
- drain FATAL_CONFIG triggers onFatalConfig
- enqueue at the event-count cap (count-cap overflow parity with
  byte-cap overflow)
- ring-buffer drop-oldest when persistence is disabled (maxDiskEvents=0)

Matches the iOS/Android wording from the port guide so support can
grep the same strings across all platforms:

- "Network status changed: <old> -> <new>"
- "Dispatcher paused — device is offline"
- "Dispatcher resumed — device is online, triggering flush"

No behavior change. networkStatus + hasDiskData are already in
getDebugInfo from earlier commits; log set already omits the "deleted
during iteration" noise (reconcile, per-update, debounce-state).

@choudlet changed the title from "feat: network-awarness and offline queing behavior" to "feat: offline disk overflow and network-aware persistence [rn]" on Apr 21, 2026
@choudlet merged commit 5e01319 into main on Apr 22, 2026
2 checks passed