From 043d9152ea40a2dd0e9ccddd962879db703d82a0 Mon Sep 17 00:00:00 2001 From: Andrew Lake Date: Sat, 13 Jun 2026 19:58:09 -0600 Subject: [PATCH 1/5] Add local history cache and storage port --- FEATURES.md | 4 + README.md | 2 +- spec/04-clients/01-desktop.md | 2 +- spec/04-clients/02-web-app.md | 2 + spec/04-clients/04-local-cache.md | 341 ++++++++++++++++++ .../05-infrastructure/04-application-seams.md | 6 +- spec/README.md | 1 + 7 files changed, 355 insertions(+), 3 deletions(-) create mode 100644 spec/04-clients/04-local-cache.md diff --git a/FEATURES.md b/FEATURES.md index 2266d0e..eef617b 100644 --- a/FEATURES.md +++ b/FEATURES.md @@ -121,6 +121,8 @@ Everything below ships in the first usable release. The MVP is deliberately narr | Voice UI | Satellite selector, participant list, mute/deafen, per-user volume, push-to-talk, grid/speaker layouts | [04-clients/01-desktop](spec/04-clients/01-desktop.md) | | Message outbox | Messages held during disconnection, sent on reconnect, cleared after server ack | [04-clients/01-desktop](spec/04-clients/01-desktop.md) | | Memory discipline | Paginated chat (200 msgs/channel in DOM), image resizing via Rust, debounced resizes | [04-clients/01-desktop](spec/04-clients/01-desktop.md) | +| Local history cache | On-device persistent cache (IndexedDB on web, SQLite on desktop); cache-first prefill, progressive rendering, sliding window; offloads `chathistory` from Uplink | [04-clients/04-local-cache](spec/04-clients/04-local-cache.md) | +| Storage management | Settings -> Storage: per-buffer stats, eviction, JSON export, configurable per-buffer cap | [04-clients/04-local-cache](spec/04-clients/04-local-cache.md) | | `orbit://` + `satellite://` URI schemes | Deep links to server/channel/voice; standalone Satellite links | [04-clients/01-desktop](spec/04-clients/01-desktop.md) | | Screen sharing | Share screen in voice sessions | [02-satellite](spec/02-components/02-satellite.md) | | Multi-server client | Connect to multiple Orbit 
servers simultaneously | - | @@ -245,6 +247,8 @@ Federation requires server-to-server linking that is absent from stock Ergo. It | E2E encrypted file uploads | Client-side encryption before Depot upload | [03-depot](spec/02-components/03-depot.md) | | Richer presence | Presence history (last online), structured status states, server-managed avatars | [01-uplink/04-presence](spec/02-components/01-uplink/04-presence.md) | | Channel renaming | Optional, capability-gated (`draft/channel-rename`); blocked on registered channels, so `display-name` metadata is the rename path for established channels | [01-uplink/01-overview](spec/02-components/01-uplink/01-overview.md#channel-renaming) | +| Client-side search index | Local inverted index / SQLite FTS5 over the on-device cache; answers most searches offline and offloads the server's history backend | [04-clients/04-local-cache](spec/04-clients/04-local-cache.md#evaluation-performance-and-limits) | +| Chat list virtualization | Mount only the visible message range to push effective DOM cost below the live-window cap for very large buffers | [04-clients/04-local-cache](spec/04-clients/04-local-cache.md#evaluation-performance-and-limits) | ## Research diff --git a/README.md b/README.md index 0993075..21d5d97 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ The `spec/` directory contains the full design spec: | [Architecture](spec/01-architecture/) | System overview, design philosophy, component glossary, platform comparison | | [Components](spec/02-components/) | Uplink (incl. 
tag namespace and trust model), Satellite, Depot, Transponder | | [Identity and Auth](spec/03-identity/) | Authentication, permissions | -| [Clients](spec/04-clients/) | Desktop, web app, widget | +| [Clients](spec/04-clients/) | Desktop, web app, widget, local history cache | | [Infrastructure](spec/05-infrastructure/) | DNS discovery, deployment, monorepo | | [Next](spec/06-next/) | Mobile, bot API, push delivery, E2E encryption, server discovery, Satellite gateway | | [Decisions](spec/0A-decisions/) | ADRs, open questions, out-of-scope | diff --git a/spec/04-clients/01-desktop.md b/spec/04-clients/01-desktop.md index 6c249e7..2070bbf 100644 --- a/spec/04-clients/01-desktop.md +++ b/spec/04-clients/01-desktop.md @@ -174,7 +174,7 @@ DNS SRV resolution is handled by the desktop client's Rust resolver. For the ful | Concern | Strategy | |--------------------|--------------------------------------------------------------------------------------------------------------------------------------------| -| Chat history | Paginated loading. Keep at most 200 messages per channel in the DOM. Older messages are evicted and re-fetched on scroll-up via `chathistory`. | +| Chat history | Paginated loading. Keep at most 200 messages per channel in the DOM (the *live window*). Older messages are evicted from the DOM and served from the on-device [local cache](04-local-cache.md), which falls back to `chathistory` only when the cache is exhausted. | | Image rendering | Images are proxied and resized by the Rust backend (or server-side) before display. No raw multi-MB images loaded into the WebView. | | Large IPC payloads | File downloads and bulk history loads are served via Tauri's custom protocol handler (`tauri://`), not JSON-serialized IPC. | | Layout thrashing | Resize events are debounced aggressively (200ms minimum). This works around a known WebKitGTK memory leak on Linux triggered by rapid resize cycles. 
| diff --git a/spec/04-clients/02-web-app.md b/spec/04-clients/02-web-app.md index 4655111..ec5869d 100644 --- a/spec/04-clients/02-web-app.md +++ b/spec/04-clients/02-web-app.md @@ -33,6 +33,7 @@ export interface Platform { deepLinks: DeepLinkPort | null // null in the browser fileTransfer: FileTransferPort dns: DnsPort | null // null in the browser; resolver endpoint used instead + historyCache: HistoryCachePort | null // IndexedDB on web, SQLite on desktop; null when storage is blocked } ``` @@ -93,6 +94,7 @@ The complete capability comparison across all three surfaces: |------------------------------------------|--------------------------------|---------------------------------------------|----------------------------------------| | Text chat (full) | Yes | Yes | Yes | | Message history / scrollback | Yes | Yes | Yes (limited to recent on load) | +| Persistent local history cache | Yes (SQLite, disk-bounded) | Yes (IndexedDB, quota-bounded, best-effort) | Ephemeral (in-memory; cache may be `null`) | | Message retractions | MVP (via `draft/message-redaction`) | MVP (via `draft/message-redaction`) | MVP (via `draft/message-redaction`) | | Message editing | Post-Uplink | Post-Uplink | Post-Uplink | | Rich rendering (links, images, Markdown) | Yes | Yes | Yes | diff --git a/spec/04-clients/04-local-cache.md b/spec/04-clients/04-local-cache.md new file mode 100644 index 0000000..4f75715 --- /dev/null +++ b/spec/04-clients/04-local-cache.md @@ -0,0 +1,341 @@ +# Local History Cache & Storage + +Every Orbit client keeps a persistent, on-device copy of the message history it has +seen. This is distinct from the in-memory **live window** described in +[Memory Discipline](01-desktop.md#memory-discipline): the live window is what is mounted in +the DOM right now (capped, evicted aggressively); the **local cache** is the durable archive +that sits underneath it and outlives the session. 
+ +This page covers why the cache exists, how it is keyed and paged, the progressive-rendering +path that makes large buffers feel instant, the platform seam that backs it (IndexedDB on the +web, SQLite on the desktop), the storage-management surface, and the honest limits in each +environment. + +Cross-references: +- [Desktop - Memory Discipline](01-desktop.md#memory-discipline) - the in-DOM live window this cache feeds +- [Desktop - Reconnection Flow](01-desktop.md#reconnection-flow) - `CHATHISTORY` reconciliation the cache piggybacks on +- [Uplink - Interaction with Chat History](../02-components/01-uplink/01-overview.md#interaction-with-chat-history) - how retractions and replies replay through `chathistory` +- [Application Seams](../05-infrastructure/04-application-seams.md) - the capability-port pattern the cache is implemented as + +## Why a Local Cache Is the Right Move for IRC + +The IRC history model is unusually friendly to client-side caching, more so than the proprietary +APIs Orbit is measured against. Three properties make it close to ideal: + +- **Records are immutable and append-only.** A `PRIVMSG` that has been delivered never changes. + The only post-delivery mutations are retractions (which Orbit renders as a *tombstone* overlay, + never a content edit - see [Retractions](../02-components/01-uplink/01-overview.md#retractions)) + and, post-MVP, message editing. The cache is therefore a write-once store with rare overlay + events, not a cache-coherence problem. There is almost nothing to invalidate. +- **There is a stable, server-assigned dedup key.** Every message carries a `msgid` + (`message-ids`) and an authoritative `server-time`. The `msgid` is the cache primary key and the + deduplication key; `server-time` is the sort key. Live messages, `echo-message` self-copies, and + `chathistory` replays all collapse to the same record by `msgid` with no client-side guessing. 
+- **Delivery is already batched.** `chathistory` responses arrive inside a `batch`, pre-chunked by + the server. The cache writes a batch as a unit and the renderer consumes it incrementally. + +### Offloading Uplink + +Ergo's history retention is **operator-configured and intentionally bounded** - the reference +guidance is a window such as 7 days or 10,000 messages per channel (see +[Ergochat Configuration](../02-components/01-uplink/01-overview.md#ergochat-configuration)). The +server is the source of truth, but it is not a long-term archive, and every `chathistory` request +costs the server CPU, disk reads, and a round-trip. + +A local cache changes the load profile in Orbit's favour: + +- **Scrollback is served from disk, not the server.** Re-opening a channel and scrolling up a few + hundred messages hits zero network round-trips when the data is already cached. On a `$5 VPS` + serving a whole community, removing repeated `chathistory` paging from every client's normal + navigation is a material reduction in server work. +- **The cache outlives server retention.** Messages that have aged out of Ergo's retention window + remain readable on every device that saw them live. Each client becomes its own incremental + archive of the conversations it participates in, without the operator having to provision + unbounded server-side storage. +- **Reconnection fetches a delta, not a window.** On reconnect the client already knows the newest + `msgid` it has cached per target, so it asks the server only for what it missed + (`CHATHISTORY LATEST ... after the cached tip`) instead of re-pulling a full seed. See + [Reconnection Flow](01-desktop.md#reconnection-flow). +- **It is the foundation for client-side search.** Full-text search ships in the MVP via Ergo's + history backends (see the [Feature Map](../../FEATURES.md)). 
A populated local cache lets a client + answer many searches locally - instant, offline-capable, and without hammering the server's FTS + index for every keystroke. Server search remains authoritative for history the client never saw. + +> The cache never weakens the trust model. It stores what the server delivered, keyed by the +> server-asserted `account-tag` and `msgid`. It is a performance and availability layer, not a new +> source of authority. See [Tag Integrity and Trust Model](../02-components/01-uplink/02-tags/02-trust-model.md). + +## Storage Model + +The cache is scoped **per account, per target**. "Target" (buffer) means any channel or DM query - +the same notion `CHATHISTORY TARGETS` uses. Keying by account keeps multiple identities on a shared +device (or multiple servers, in the multi-server client) from colliding. + +Two logical stores: + +| Store | Key | Holds | +|------------|------------------------------|-------| +| `messages` | `msgid` (with a `[target, server_time]` index) | One record per delivered message | +| `buffers` | `target` (channel/DM name) | Per-target metadata: oldest/newest cached `msgid` and timestamp, cached count, last reconcile time, cap override | + +A `messages` record stores exactly what the renderer and search need, and nothing the trust model +forbids reconstructing: + +```ts +interface CachedMessage { + msgid: string // primary key, server-assigned (message-ids) + target: string // channel or DM this belongs to + serverTime: number // sort key, from server-time (epoch ms) + account: string | null // server-asserted author identity (account-tag), null for unauthenticated + nick: string // nick at send time (display only; account is authoritative) + type: "privmsg" | "notice" | "action" + text: string + tags: Record // surviving +orbit/* and +draft/* tags (reply ref, reactions, etc.) + redacted?: boolean // tombstone overlay; original text is NOT retained when set +} +``` + +Deduplication is always by `msgid`. 
The live socket path and the `chathistory` path both upsert into +the same store, so an overlap between a freshly received message and a cached one resolves to a +single record rather than a visible duplicate. + +## Seeding, Paging, and the Sliding Window + +Three tunables govern the data path. They are defaults, not protocol constants, and the desktop +client can run them higher than the web app (see [Environment Limits](#environment-limits-and-constraints)). + +| Knob | Role | Reference default | +|---------------------|------|-------------------| +| `CACHE_SEED_COUNT` | How many recent messages are rendered immediately from cache when a target is opened | ~150 | +| `CACHE_PAGE_SIZE` | How many older messages are pulled per scroll-up page | ~50 | +| `MAX_LIVE_MESSAGES` | Hard cap on messages kept in the reactive in-memory buffer (the DOM path) | 200 (matches [Memory Discipline](01-desktop.md#memory-discipline)) | + +The cache is a strict superset of the live window: the live window is a `MAX_LIVE_MESSAGES`-sized +view that slides over a much larger cached (and ultimately server-side) history. + +### Opening a Target (Prefill) + +```mermaid +sequenceDiagram + participant U as User + participant V as Live Window (DOM) + participant C as Local Cache + participant S as Uplink (Ergo) + + U->>V: Open #channel + V->>C: read seed (newest CACHE_SEED_COUNT) + C-->>V: cached messages + Note over V: Render instantly, no network wait + V->>S: CHATHISTORY LATEST #channel * + S-->>V: delta batch (messages since tip) + V->>C: upsert delta (dedupe by msgid) + Note over V: Reconcile tail; cache and view converge +``` + +Prefill is the key UX win: the buffer paints from disk on the same frame the user clicks, and the +network reconciliation only ever fills the small gap between the cached tip and now. A cold target +(nothing cached) falls back to a normal `CHATHISTORY LATEST ... *` seed and populates the cache from +the response. 
+ +### Scrolling Back (Backward Paging) + +`fetchOlderHistory()` is **cache-first**: when the user scrolls toward the top, it prepends the next +`CACHE_PAGE_SIZE` older messages from the cache. Only when the cache is exhausted at its oldest +cached `msgid` does it issue `CHATHISTORY BEFORE ` to the server, then writes +that page back into the cache so the next scroll is local again. + +### Scrolling Forward (Forward Paging) + +When the live window has been trimmed at the tail (the user scrolled far up, so newer messages were +evicted from the DOM to honour `MAX_LIVE_MESSAGES`), scrolling back down calls +`fetchNewerFromCache()`: it appends newer messages from the cache and trims from the head, keeping a +sliding window of constant size. A `tailTrimmed` flag tracks whether the live window's newest message +is still the buffer's true tip; when it is, new live messages append directly and no forward fetch is +needed. + +This backward/forward symmetry is what keeps a multi-thousand-message buffer navigable with a bounded +DOM: the user can scroll arbitrarily far in either direction and the renderer never holds more than +`MAX_LIVE_MESSAGES` nodes. + +## Progressive Rendering + +A naive "render the whole seed at once" approach janks on large seeds because layout and paint for +hundreds of message components block the main thread. Orbit clients render the seed **incrementally**: + +- The seed is committed to the reactive buffer in `RENDER_CHUNK`-sized increments (e.g. 25-50 + messages), one chunk per animation frame (`requestAnimationFrame`), until the seed is fully + mounted or `MAX_LIVE_MESSAGES` is reached. +- The first chunk is the visible viewport's worth of messages, so the user sees content on the first + frame; subsequent chunks fill in above/below as frames are available. +- Scroll position is anchored to a stable message (by `msgid`) across chunk commits so the viewport + does not jump while earlier messages mount in. 
+ +This staged approach (which superseded an earlier binary "render 50, then render the rest" scheme) +keeps the main thread responsive during the initial paint and during large scroll-driven page loads. + +> For extremely large buffers, list virtualization (mounting only the visible range) is the +> end-state optimization. The incremental-commit approach above is the MVP mechanism; virtualization +> is tracked as a [performance follow-up](#evaluation-performance-and-limits). + +## The Cache as a Platform Capability + +Persistent storage is exactly the kind of environment-specific capability the +[Application Seams](../05-infrastructure/04-application-seams.md) model is built for. The cache is a +**capability port** on the `Platform` contract, not an `if (isTauri)` branch inside a store. `core` +calls the port; it never knows whether records land in IndexedDB or SQLite. + +```ts +// packages/core/src/platform/index.ts (addition to the Platform contract) +export interface HistoryCachePort { + seed(target: string, limit: number): Promise + pageBefore(target: string, beforeMsgid: string, limit: number): Promise + pageAfter(target: string, afterMsgid: string, limit: number): Promise + upsert(messages: CachedMessage[]): Promise // batched, dedupes by msgid + markRedacted(msgid: string): Promise // tombstone overlay + // Storage management surface + bufferStats(): Promise + prune(target: string, keepCount: number): Promise + export(target: string): Promise + clear(): Promise // wipe this account's cache +} + +export interface BufferStats { + target: string + count: number + estimatedBytes: number + oldest: number // server-time epoch ms + newest: number +} +``` + +| Target | Adapter | Backing store | +|-------------------|--------------------|---------------| +| Web app / PWA | `web.ts` | IndexedDB (one object store per logical store, indexed on `[target, serverTime]`) | +| Desktop (Tauri) | `tauri.ts` | SQLite via the Rust backend (one table, indexed) - served over the 
custom protocol for bulk reads, not JSON IPC | +| Widget | `web.ts`, degraded | IndexedDB if available, else `null` port -> ephemeral in-memory only (recent on load) | +| Mobile (Tauri) | `tauri-mobile.ts` | reuses the desktop SQLite adapter | + +When the port is `null` (storage blocked, private-browsing IndexedDB denial, or widget mode), core +degrades explicitly: it runs with an in-memory buffer only and the seed/prefill path simply has +nothing to read. This is the same `null`-port degradation pattern used for `tray` and `deepLinks`. + +Bulk history reads on the desktop go over Tauri's custom protocol handler rather than JSON-serialized +IPC, consistent with the [Large IPC payloads](01-desktop.md#memory-discipline) rule. + +## Cache Lifecycle + +### A Detached, App-Scoped Writer + +The write path **must not be owned by a view component.** An early prototype scoped cache syncing to +the chat-surface component and only persisted the currently-mounted buffers (a "3 of 8 buffers +cached" bug appeared when the other targets were never mounted). The fix is to own the cache writer +at **app scope** - a store/composable that subscribes to the message stream for *all* joined targets +for the whole session, independent of which view is mounted. Component mount/unmount changes what is +*rendered*, never what is *persisted*. + +### Write Path + +- Live messages and `chathistory` batches are written through `upsert()` in **debounced batches** + rather than one write per message, to amortize transaction overhead (IndexedDB transaction + setup, SQLite write locks). +- Retractions call `markRedacted(msgid)`, which sets the tombstone flag and drops the stored `text`. + The original content is never retained, matching the server contract. +- Edits (post-MVP) update the record's `text` in place, keyed by the edited message's `msgid`. + +### Invalidation and Clearing + +There is almost nothing to invalidate (records are immutable). 
The two clearing paths are: + +- **User-initiated** - the [Storage management surface](#storage-management-surface) below. +- **Force refresh** - the client's hard-reset shortcut (Ctrl+Shift+R), which already clears + `localStorage` for settings, is extended to also call `HistoryCachePort.clear()` so a force refresh + wipes the on-device history archive and reloads from the server, scoped to the active account. + +## Storage Management Surface + +The cache is durable and can grow large, so it is a first-class surface in **Settings -> Storage**, +not a hidden implementation detail. It exposes: + +- **Per-buffer stats** - target name, cached message count, estimated bytes, and the cached date + range, via `bufferStats()`. +- **A live total** - aggregate message count and estimated size across all buffers, plus the + environment quota and headroom (web) reported by `navigator.storage.estimate()`. +- **Eviction** - per-buffer "Evict old messages" (`prune(target, keepCount)`, oldest-first) and a + global "Evict all". Eviction respects the live window: it never evicts what is currently mounted. +- **Export** - per-buffer "Export as JSON" (`export(target)`) so a user can keep their own archive + independent of any server or device. +- **A configurable per-buffer cap** - a setting (e.g. `chat.cacheMaxMessagesPerBuffer`, default in the + low tens of thousands) that triggers an oldest-first prune pass after writes exceed it. Changing the + cap propagates to the cache layer and can trigger a one-time prune of existing over-cap buffers. + +Eviction policy is **oldest-first within a target**, never cross-target, so heavily-used channels do +not evict each other. The cap is per-buffer rather than global to keep accounting simple and +predictable for the user. + +## Environment Limits and Constraints + +The honest constraint differences between surfaces drive the defaults above. 
+ +### Web App / PWA (IndexedDB) + +- **Quota is browser-managed and shared.** Chromium grants an origin a large slice (commonly up to + ~60% of free disk, shared across the origin's storage); Firefox uses group/eTLD+1 limits; Safari is + the tightest. The client must treat quota as finite: read `navigator.storage.estimate()`, surface it + in the Storage tab, and prune before writes when near the limit rather than letting a write throw + `QuotaExceededError`. +- **Best-effort storage can be evicted by the browser.** Under storage pressure the browser may clear + a non-persistent origin's IndexedDB. The client SHOULD request `navigator.storage.persist()` (granted + more readily for installed PWAs and engaged origins) to opt into persistent storage and reduce + surprise eviction. +- **WebKit time-based eviction.** Safari may evict script-writable storage for sites without sufficient + user engagement after a period of inactivity. Installing the PWA and the `persist()` request mitigate + this; the cache is designed to degrade gracefully (re-seed from `chathistory`) if it is wiped. +- **Structured-clone cost.** Large IndexedDB reads deserialize on the main thread. For big seeds this + is mitigated by chunked reads and, as a follow-up, moving cache I/O to a Web Worker. +- **Practical posture.** Keep web defaults moderate (`CACHE_SEED_COUNT ~150`, per-buffer cap in the low + tens of thousands). The web cache is an accelerator and a recent-history archive, not an unbounded + one. + +### Desktop / Mobile (SQLite) + +- **No browser quota and no surprise eviction.** SQLite is bounded only by disk. The standalone clients + do not suffer the web's quota or time-based eviction problems, so they can run a larger seed, a higher + per-buffer cap, and effectively a complete personal archive of every conversation they have seen. 
+- **Bulk reads are cheap and off the WebView.** Paging and search queries run in Rust against an + indexed table and stream results over the custom protocol, avoiding the WebView's main-thread + deserialization cost entirely. +- **This asymmetry is intentional.** The same `HistoryCachePort` contract backs both; only the limits + and tuning differ. A user who wants a permanent, searchable archive runs the desktop client; the web + app gives most of the benefit within the browser's constraints. + +## Evaluation: Performance and Limits + +What the design gets right today: + +- **Instant target switches** via cache prefill; the network only fills the tail delta. +- **Bounded DOM** regardless of buffer size via the sliding live window plus incremental rendering. +- **Reduced server load** - normal scrollback and reconnection are deltas, not full re-pulls. +- **Trivial coherence** - immutable, `msgid`-keyed records with tombstone overlays; no invalidation + graph. + +Where it can get faster (tracked follow-ups, not MVP blockers): + +- **List virtualization** for very large mounted ranges, to push the effective DOM cost below + `MAX_LIVE_MESSAGES` and make the live-window cap a soft target. +- **Worker-thread cache I/O** on the web, to keep structured-clone deserialization off the main + thread for large reads. +- **Batch-write tuning** - larger debounce windows and "prune-before-write" under quota pressure to + avoid `QuotaExceededError` round-trips. +- **A local search index** - an inverted index built over the cache (or SQLite FTS5 on the desktop) + so the client answers most searches locally and only escalates to the server's history backend for + history it never cached. This is the natural next layer on top of a populated cache and the biggest + future server-offload win. + +Known limits to keep in view: + +- The web cache can be evicted by the browser; treat it as a fast, best-effort layer with the server + as the durable fallback. 
The desktop cache is the durable one. +- Estimated byte sizes in the Storage tab are approximate (structured-clone/row overhead is not + exact); they are for user guidance, not billing. +- The cache cannot recover content the server never delivered or that was retracted - it stores only + what the client legitimately received. diff --git a/spec/05-infrastructure/04-application-seams.md b/spec/05-infrastructure/04-application-seams.md index 136239d..6ff3c8e 100644 --- a/spec/05-infrastructure/04-application-seams.md +++ b/spec/05-infrastructure/04-application-seams.md @@ -8,6 +8,7 @@ Cross-references: - [03-monorepo.md](03-monorepo.md) - Directory structure, build commands, CI - [../04-clients/02-web-app.md](../04-clients/02-web-app.md) - The capability matrix and the platform adapter from the client's perspective - [../04-clients/01-desktop.md](../04-clients/01-desktop.md) - The Tauri shell that backs the desktop adapter +- [../04-clients/04-local-cache.md](../04-clients/04-local-cache.md) - The `HistoryCachePort` capability (IndexedDB on web, SQLite on desktop) ## Dependency Direction @@ -57,7 +58,7 @@ import { createWebPlatform } from "platform" `packages/core/src/platform/index.ts` is the seam. It is the one file `core` and every adapter both agree on. It defines: -1. **Capability ports** - small interfaces, one per capability that differs across environments (`NotificationPort`, `TrayPort`, `AudioDevicePort`, `DeepLinkPort`, `FileTransferPort`, `DnsPort`). +1. **Capability ports** - small interfaces, one per capability that differs across environments (`NotificationPort`, `TrayPort`, `AudioDevicePort`, `DeepLinkPort`, `FileTransferPort`, `DnsPort`, `HistoryCachePort`). 2. **The `Platform` interface** - an object holding one instance of each port, plus a `target` discriminator. A port an environment cannot provide is `null`. 3. **The injection plumbing** - a Vue `InjectionKey`, `providePlatform(app, platform)`, and `usePlatform()`. 
@@ -70,6 +71,7 @@ export interface Platform { readonly deepLinks: DeepLinkPort | null // null in the browser readonly fileTransfer: FileTransferPort readonly dns: DnsPort | null // null in the browser + readonly historyCache: HistoryCachePort | null // IndexedDB (web) / SQLite (desktop); null when storage is blocked } export const PLATFORM_KEY: InjectionKey = Symbol("orbit-platform") @@ -101,6 +103,7 @@ export function createWebPlatform(): Platform { deepLinks: null, // no orbit:// handler in the browser fileTransfer: createFileTransferPort(), dns: null, // resolver endpoint used instead + historyCache: createIndexedDbCachePort(), // null if IndexedDB is unavailable } } ``` @@ -192,6 +195,7 @@ const platform: Platform = { deepLinks: null, fileTransfer: { download: async () => {} }, dns: null, + historyCache: null, } // provide(PLATFORM_KEY, platform) in the test harness, then mount the component. ``` diff --git a/spec/README.md b/spec/README.md index 482a24c..3b9ca96 100644 --- a/spec/README.md +++ b/spec/README.md @@ -35,6 +35,7 @@ This directory contains the full set of Orbit design specifications, organized i - [Desktop](04-clients/01-desktop.md) - Tauri v2 + Vue desktop client: features, URI scheme, memory discipline, reconnection - [Web App](04-clients/02-web-app.md) - Web app and PWA: platform adapter, service worker, capability matrix - [Widget](04-clients/03-widget.md) - Embeddable iframe widget mode +- [Local History Cache & Storage](04-clients/04-local-cache.md) - On-device history cache, progressive loading, IndexedDB/SQLite seam, storage management ## Infrastructure From 09d20e99f3a8b45243fa0233c5a9002ab35a3423 Mon Sep 17 00:00:00 2001 From: Andrew Lake Date: Sun, 14 Jun 2026 15:25:43 -0600 Subject: [PATCH 2/5] Shorten local-cache documentation --- spec/04-clients/04-local-cache.md | 346 +++++++++++++----------------- 1 file changed, 144 insertions(+), 202 deletions(-) diff --git a/spec/04-clients/04-local-cache.md b/spec/04-clients/04-local-cache.md 
index 4f75715..ac79369 100644 --- a/spec/04-clients/04-local-cache.md +++ b/spec/04-clients/04-local-cache.md @@ -1,15 +1,9 @@ # Local History Cache & Storage -Every Orbit client keeps a persistent, on-device copy of the message history it has -seen. This is distinct from the in-memory **live window** described in -[Memory Discipline](01-desktop.md#memory-discipline): the live window is what is mounted in -the DOM right now (capped, evicted aggressively); the **local cache** is the durable archive -that sits underneath it and outlives the session. - -This page covers why the cache exists, how it is keyed and paged, the progressive-rendering -path that makes large buffers feel instant, the platform seam that backs it (IndexedDB on the -web, SQLite on the desktop), the storage-management surface, and the honest limits in each -environment. +Every Orbit client keeps a persistent, on-device copy of the message history it has seen. This is +distinct from the in-memory **live window** in [Memory Discipline](01-desktop.md#memory-discipline): +the live window is what is mounted in the DOM right now (capped, evicted aggressively); the **local +cache** is the durable archive underneath it that outlives the session. Cross-references: - [Desktop - Memory Discipline](01-desktop.md#memory-discipline) - the in-DOM live window this cache feeds @@ -17,69 +11,55 @@ Cross-references: - [Uplink - Interaction with Chat History](../02-components/01-uplink/01-overview.md#interaction-with-chat-history) - how retractions and replies replay through `chathistory` - [Application Seams](../05-infrastructure/04-application-seams.md) - the capability-port pattern the cache is implemented as -## Why a Local Cache Is the Right Move for IRC +## Why a Local Cache Fits IRC -The IRC history model is unusually friendly to client-side caching, more so than the proprietary -APIs Orbit is measured against. 
Three properties make it close to ideal: +The IRC history model is unusually cache-friendly: -- **Records are immutable and append-only.** A `PRIVMSG` that has been delivered never changes. - The only post-delivery mutations are retractions (which Orbit renders as a *tombstone* overlay, - never a content edit - see [Retractions](../02-components/01-uplink/01-overview.md#retractions)) - and, post-MVP, message editing. The cache is therefore a write-once store with rare overlay - events, not a cache-coherence problem. There is almost nothing to invalidate. -- **There is a stable, server-assigned dedup key.** Every message carries a `msgid` - (`message-ids`) and an authoritative `server-time`. The `msgid` is the cache primary key and the - deduplication key; `server-time` is the sort key. Live messages, `echo-message` self-copies, and - `chathistory` replays all collapse to the same record by `msgid` with no client-side guessing. +- **Records are immutable and append-only.** A delivered `PRIVMSG` never changes. The only + post-delivery mutations are retractions (rendered as a *tombstone* overlay, never a content edit - + see [Retractions](../02-components/01-uplink/01-overview.md#retractions)) and, post-MVP, editing. + It is a write-once store with rare overlay events, not a cache-coherence problem. +- **There is a stable server-assigned dedup key.** Every message carries a `msgid` (`message-ids`, + the primary/dedup key) and an authoritative `server-time` (the sort key). Live messages, + `echo-message` self-copies, and `chathistory` replays all collapse to one record by `msgid`. - **Delivery is already batched.** `chathistory` responses arrive inside a `batch`, pre-chunked by - the server. The cache writes a batch as a unit and the renderer consumes it incrementally. + the server. The cache writes a batch as a unit; the renderer consumes it incrementally. 
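The dedup property in the bullets above can be illustrated with a small sketch. It is not the Orbit implementation: `DedupStore`, `Line`, and the sample ids are hypothetical names, and an in-memory `Map` stands in for the IndexedDB or SQLite store.

```ts
// Hypothetical, trimmed-down record shape; the spec's CachedMessage carries more fields.
interface Line {
  msgid: string // server-assigned message id (message-ids)
  serverTime: number // epoch ms, from server-time
  text: string
}

// In-memory stand-in for the persistent store, keyed by msgid.
class DedupStore {
  private byId = new Map<string, Line>()

  // Insert or overwrite by msgid; returns how many records were new.
  upsert(lines: Line[]): number {
    let added = 0
    for (const line of lines) {
      if (!this.byId.has(line.msgid)) added++
      this.byId.set(line.msgid, line)
    }
    return added
  }

  // Records ordered by server-time, with msgid as a tie-breaker for a stable order.
  sorted(): Line[] {
    return Array.from(this.byId.values()).sort((a, b) => {
      if (a.serverTime !== b.serverTime) return a.serverTime - b.serverTime
      return a.msgid < b.msgid ? -1 : a.msgid > b.msgid ? 1 : 0
    })
  }
}

// The same message arrives three ways: live, as an echo-message copy, and in a replay.
const store = new DedupStore()
const live: Line = { msgid: "m2", serverTime: 2000, text: "hello" }
const echo: Line = { msgid: "m2", serverTime: 2000, text: "hello" }
const replay: Line[] = [
  { msgid: "m1", serverTime: 1000, text: "earlier" },
  { msgid: "m2", serverTime: 2000, text: "hello" },
]

const addedLive = store.upsert([live])
const addedEcho = store.upsert([echo])
const addedReplay = store.upsert(replay)
const orderedIds = store.sorted().map((l) => l.msgid)
```

Only the first copy of `m2` and the previously unseen `m1` create records; the echo and the replayed copy overwrite in place, which is the "no client-side guessing" behaviour the bullet describes.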
### Offloading Uplink -Ergo's history retention is **operator-configured and intentionally bounded** - the reference -guidance is a window such as 7 days or 10,000 messages per channel (see -[Ergochat Configuration](../02-components/01-uplink/01-overview.md#ergochat-configuration)). The -server is the source of truth, but it is not a long-term archive, and every `chathistory` request -costs the server CPU, disk reads, and a round-trip. - -A local cache changes the load profile in Orbit's favour: - -- **Scrollback is served from disk, not the server.** Re-opening a channel and scrolling up a few - hundred messages hits zero network round-trips when the data is already cached. On a `$5 VPS` - serving a whole community, removing repeated `chathistory` paging from every client's normal - navigation is a material reduction in server work. -- **The cache outlives server retention.** Messages that have aged out of Ergo's retention window - remain readable on every device that saw them live. Each client becomes its own incremental - archive of the conversations it participates in, without the operator having to provision - unbounded server-side storage. -- **Reconnection fetches a delta, not a window.** On reconnect the client already knows the newest - `msgid` it has cached per target, so it asks the server only for what it missed - (`CHATHISTORY LATEST ... after the cached tip`) instead of re-pulling a full seed. See +Ergo's history retention is **operator-configured and intentionally bounded** (reference guidance: +~7 days or 10,000 messages per channel - see +[Ergochat Configuration](../02-components/01-uplink/01-overview.md#ergochat-configuration)). It is +the source of truth but not a long-term archive, and every `chathistory` request costs CPU, disk, +and a round-trip. 
A local cache shifts the load profile: + +- **Scrollback is served from disk.** Re-opening a channel and scrolling up hits zero round-trips + when the data is cached - a material reduction in work on a `$5 VPS` serving a whole community. +- **The cache outlives server retention.** Messages aged out of Ergo's window stay readable on every + device that saw them live, with no unbounded server-side storage. +- **Reconnection fetches a delta, not a window.** The client knows its newest cached `msgid` per + target and asks only for what it missed (`CHATHISTORY LATEST ... after the cached tip`). See [Reconnection Flow](01-desktop.md#reconnection-flow). -- **It is the foundation for client-side search.** Full-text search ships in the MVP via Ergo's - history backends (see the [Feature Map](../../FEATURES.md)). A populated local cache lets a client - answer many searches locally - instant, offline-capable, and without hammering the server's FTS - index for every keystroke. Server search remains authoritative for history the client never saw. +- **It is the foundation for client-side search.** A populated cache answers many searches locally - + instant, offline-capable, without hammering the server FTS index. Server search stays + authoritative for history the client never saw. > The cache never weakens the trust model. It stores what the server delivered, keyed by the -> server-asserted `account-tag` and `msgid`. It is a performance and availability layer, not a new -> source of authority. See [Tag Integrity and Trust Model](../02-components/01-uplink/02-tags/02-trust-model.md). +> server-asserted `account-tag` and `msgid` - a performance/availability layer, not a new source of +> authority. See [Tag Integrity and Trust Model](../02-components/01-uplink/02-tags/02-trust-model.md). ## Storage Model -The cache is scoped **per account, per target**. "Target" (buffer) means any channel or DM query - -the same notion `CHATHISTORY TARGETS` uses. 
Keying by account keeps multiple identities on a shared -device (or multiple servers, in the multi-server client) from colliding. - -Two logical stores: +The cache is scoped **per account, per target** ("target" = any channel or DM query, as in +`CHATHISTORY TARGETS`). Keying by account keeps multiple identities (or servers) on a shared device +from colliding. Two logical stores: | Store | Key | Holds | |------------|------------------------------|-------| | `messages` | `msgid` (with a `[target, server_time]` index) | One record per delivered message | | `buffers` | `target` (channel/DM name) | Per-target metadata: oldest/newest cached `msgid` and timestamp, cached count, last reconcile time, cap override | -A `messages` record stores exactly what the renderer and search need, and nothing the trust model -forbids reconstructing: +A `messages` record stores exactly what the renderer and search need: ```ts interface CachedMessage { @@ -95,23 +75,22 @@ interface CachedMessage { } ``` -Deduplication is always by `msgid`. The live socket path and the `chathistory` path both upsert into -the same store, so an overlap between a freshly received message and a cached one resolves to a -single record rather than a visible duplicate. +Dedup is always by `msgid`: the live socket path and the `chathistory` path upsert into the same +store, so overlaps resolve to a single record rather than a visible duplicate. ## Seeding, Paging, and the Sliding Window -Three tunables govern the data path. They are defaults, not protocol constants, and the desktop -client can run them higher than the web app (see [Environment Limits](#environment-limits-and-constraints)). +Three tunables govern the data path - defaults, not protocol constants. The desktop client can run +them higher than the web app (see [Environment Limits](#environment-limits-and-constraints)). 
| Knob | Role | Reference default | |---------------------|------|-------------------| -| `CACHE_SEED_COUNT` | How many recent messages are rendered immediately from cache when a target is opened | ~150 | -| `CACHE_PAGE_SIZE` | How many older messages are pulled per scroll-up page | ~50 | -| `MAX_LIVE_MESSAGES` | Hard cap on messages kept in the reactive in-memory buffer (the DOM path) | 200 (matches [Memory Discipline](01-desktop.md#memory-discipline)) | +| `CACHE_SEED_COUNT` | Recent messages rendered immediately from cache on open | ~150 | +| `CACHE_PAGE_SIZE` | Older messages pulled per scroll-up page | ~50 | +| `MAX_LIVE_MESSAGES` | Hard cap on messages in the reactive in-memory buffer (DOM path) | 200 (matches [Memory Discipline](01-desktop.md#memory-discipline)) | -The cache is a strict superset of the live window: the live window is a `MAX_LIVE_MESSAGES`-sized -view that slides over a much larger cached (and ultimately server-side) history. +The cache is a strict superset of the live window: a `MAX_LIVE_MESSAGES`-sized view sliding over a +much larger cached (and ultimately server-side) history. ### Opening a Target (Prefill) @@ -122,67 +101,58 @@ sequenceDiagram participant C as Local Cache participant S as Uplink (Ergo) - U->>V: Open #channel + U->>V: Open channel V->>C: read seed (newest CACHE_SEED_COUNT) C-->>V: cached messages Note over V: Render instantly, no network wait - V->>S: CHATHISTORY LATEST #channel * + V->>S: CHATHISTORY LATEST channel newest-cached-msgid * S-->>V: delta batch (messages since tip) V->>C: upsert delta (dedupe by msgid) Note over V: Reconcile tail; cache and view converge ``` -Prefill is the key UX win: the buffer paints from disk on the same frame the user clicks, and the -network reconciliation only ever fills the small gap between the cached tip and now. A cold target -(nothing cached) falls back to a normal `CHATHISTORY LATEST ... *` seed and populates the cache from -the response. 
- -### Scrolling Back (Backward Paging) +The buffer paints from disk on the frame the user clicks; reconciliation only fills the gap between +the cached tip and now. A cold target falls back to a normal `CHATHISTORY LATEST ... *` seed and +populates the cache from the response. -`fetchOlderHistory()` is **cache-first**: when the user scrolls toward the top, it prepends the next -`CACHE_PAGE_SIZE` older messages from the cache. Only when the cache is exhausted at its oldest -cached `msgid` does it issue `CHATHISTORY BEFORE ` to the server, then writes -that page back into the cache so the next scroll is local again. +### Backward and Forward Paging -### Scrolling Forward (Forward Paging) +`fetchOlderHistory()` is **cache-first**: scrolling toward the top prepends the next +`CACHE_PAGE_SIZE` older messages from cache. Only when the cache is exhausted at its oldest `msgid` +does it issue `CHATHISTORY BEFORE `, writing the page back so the next scroll is +local again. -When the live window has been trimmed at the tail (the user scrolled far up, so newer messages were -evicted from the DOM to honour `MAX_LIVE_MESSAGES`), scrolling back down calls -`fetchNewerFromCache()`: it appends newer messages from the cache and trims from the head, keeping a -sliding window of constant size. A `tailTrimmed` flag tracks whether the live window's newest message -is still the buffer's true tip; when it is, new live messages append directly and no forward fetch is -needed. +When the live window has been trimmed at the tail (user scrolled far up, newer messages evicted to +honour `MAX_LIVE_MESSAGES`), scrolling back down calls `fetchNewerFromCache()`: append newer +messages, trim from the head, keeping a constant-size window. A `tailTrimmed` flag tracks whether the +live window's newest message is still the buffer's true tip; when it is, live messages append +directly and no forward fetch is needed. 
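The backward and forward paging described above can be sketched as index arithmetic over a server-time-ordered cache. This is an illustration under stated assumptions, not the client's code: `LiveWindow`, `pageOlder`, `pageNewer`, and `isTailTrimmed` are hypothetical names, the constants use the reference defaults from the knob table, and the server fallback (`CHATHISTORY BEFORE`) is left out.

```ts
const MAX_LIVE_MESSAGES = 200 // reference default from the knob table
const CACHE_PAGE_SIZE = 50 // reference default from the knob table

// Half-open index range [start, end) into the cached buffer, oldest first.
interface LiveWindow {
  start: number
  end: number
}

// Scroll up: prepend one page from cache, then trim the tail to respect the cap.
// When start is already 0 the cache is exhausted and the window is returned unchanged;
// a real client would fall back to the server at that point.
function pageOlder(w: LiveWindow, max = MAX_LIVE_MESSAGES, page = CACHE_PAGE_SIZE): LiveWindow {
  const start = Math.max(0, w.start - page)
  const end = Math.min(w.end, start + max)
  return { start, end }
}

// Scroll down: append one page from cache, then trim the head to respect the cap.
function pageNewer(
  w: LiveWindow,
  cachedCount: number,
  max = MAX_LIVE_MESSAGES,
  page = CACHE_PAGE_SIZE,
): LiveWindow {
  const end = Math.min(cachedCount, w.end + page)
  const start = Math.max(w.start, end - max)
  return { start, end }
}

// True when the window's newest message is no longer the buffer's true tip.
function isTailTrimmed(w: LiveWindow, cachedCount: number): boolean {
  return w.end < cachedCount
}

// A buffer with 1000 cached messages, opened with a 150-message seed.
const cachedCount = 1000
const seeded: LiveWindow = { start: 850, end: 1000 }
const afterOneUp = pageOlder(seeded) // grows to the cap, tail intact
const afterTwoUp = pageOlder(afterOneUp) // cap reached, tail is trimmed
const afterDown = pageNewer(afterTwoUp, cachedCount) // back at the tip
```

The window never exceeds `MAX_LIVE_MESSAGES` entries in either direction, and `isTailTrimmed` plays the role of the `tailTrimmed` flag: while it is false, live messages can append directly.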
-This backward/forward symmetry is what keeps a multi-thousand-message buffer navigable with a bounded -DOM: the user can scroll arbitrarily far in either direction and the renderer never holds more than -`MAX_LIVE_MESSAGES` nodes. +This symmetry keeps a multi-thousand-message buffer navigable with a bounded DOM - the user scrolls +arbitrarily far either way and the renderer never holds more than `MAX_LIVE_MESSAGES` nodes. ## Progressive Rendering -A naive "render the whole seed at once" approach janks on large seeds because layout and paint for -hundreds of message components block the main thread. Orbit clients render the seed **incrementally**: +Rendering a whole seed at once janks because layout/paint for hundreds of components blocks the main +thread. Clients render the seed **incrementally**: -- The seed is committed to the reactive buffer in `RENDER_CHUNK`-sized increments (e.g. 25-50 - messages), one chunk per animation frame (`requestAnimationFrame`), until the seed is fully - mounted or `MAX_LIVE_MESSAGES` is reached. -- The first chunk is the visible viewport's worth of messages, so the user sees content on the first - frame; subsequent chunks fill in above/below as frames are available. -- Scroll position is anchored to a stable message (by `msgid`) across chunk commits so the viewport - does not jump while earlier messages mount in. - -This staged approach (which superseded an earlier binary "render 50, then render the rest" scheme) -keeps the main thread responsive during the initial paint and during large scroll-driven page loads. +- Committed to the reactive buffer in `RENDER_CHUNK`-sized increments (e.g. 25-50 messages), one + chunk per `requestAnimationFrame`, until fully mounted or `MAX_LIVE_MESSAGES` is reached. +- The first chunk is the visible viewport's worth, so content appears on the first frame; later + chunks fill above/below as frames allow. 
+- Scroll position is anchored to a stable message (by `msgid`) across commits so the viewport does + not jump. > For extremely large buffers, list virtualization (mounting only the visible range) is the -> end-state optimization. The incremental-commit approach above is the MVP mechanism; virtualization -> is tracked as a [performance follow-up](#evaluation-performance-and-limits). +> end-state optimization, tracked as a [performance follow-up](#evaluation-performance-and-limits). +> Incremental commit is the MVP mechanism. ## The Cache as a Platform Capability -Persistent storage is exactly the kind of environment-specific capability the -[Application Seams](../05-infrastructure/04-application-seams.md) model is built for. The cache is a -**capability port** on the `Platform` contract, not an `if (isTauri)` branch inside a store. `core` -calls the port; it never knows whether records land in IndexedDB or SQLite. +Persistent storage is exactly the environment-specific capability the +[Application Seams](../05-infrastructure/04-application-seams.md) model targets. The cache is a +**capability port** on the `Platform` contract, not an `if (isTauri)` branch. `core` calls the port +and never knows whether records land in IndexedDB or SQLite. 
```ts // packages/core/src/platform/index.ts (addition to the Platform contract) @@ -211,131 +181,103 @@ export interface BufferStats { | Target | Adapter | Backing store | |-------------------|--------------------|---------------| | Web app / PWA | `web.ts` | IndexedDB (one object store per logical store, indexed on `[target, serverTime]`) | -| Desktop (Tauri) | `tauri.ts` | SQLite via the Rust backend (one table, indexed) - served over the custom protocol for bulk reads, not JSON IPC | -| Widget | `web.ts`, degraded | IndexedDB if available, else `null` port -> ephemeral in-memory only (recent on load) | +| Desktop (Tauri) | `tauri.ts` | SQLite via the Rust backend, served over the custom protocol for bulk reads, not JSON IPC | +| Widget | `web.ts`, degraded | IndexedDB if available, else `null` port -> ephemeral in-memory only | | Mobile (Tauri) | `tauri-mobile.ts` | reuses the desktop SQLite adapter | -When the port is `null` (storage blocked, private-browsing IndexedDB denial, or widget mode), core -degrades explicitly: it runs with an in-memory buffer only and the seed/prefill path simply has -nothing to read. This is the same `null`-port degradation pattern used for `tray` and `deepLinks`. - -Bulk history reads on the desktop go over Tauri's custom protocol handler rather than JSON-serialized -IPC, consistent with the [Large IPC payloads](01-desktop.md#memory-discipline) rule. +When the port is `null` (storage blocked, private-browsing denial, widget mode), core degrades +explicitly to an in-memory buffer and the seed/prefill path has nothing to read - the same +`null`-port pattern used for `tray` and `deepLinks`. Bulk desktop reads go over Tauri's custom +protocol handler rather than JSON IPC, per the +[Large IPC payloads](01-desktop.md#memory-discipline) rule. ## Cache Lifecycle -### A Detached, App-Scoped Writer +**A detached, app-scoped writer.** The write path must not be owned by a view component. 
An early +prototype scoped syncing to the chat surface and only persisted mounted buffers (a "3 of 8 buffers +cached" bug). The writer is owned at **app scope** - a store/composable subscribed to the message +stream for *all* joined targets for the whole session. Mount/unmount changes what is *rendered*, +never what is *persisted*. -The write path **must not be owned by a view component.** An early prototype scoped cache syncing to -the chat-surface component and only persisted the currently-mounted buffers (a "3 of 8 buffers -cached" bug appeared when the other targets were never mounted). The fix is to own the cache writer -at **app scope** - a store/composable that subscribes to the message stream for *all* joined targets -for the whole session, independent of which view is mounted. Component mount/unmount changes what is -*rendered*, never what is *persisted*. +**Write path.** -### Write Path +- Live messages and `chathistory` batches are written through `upsert()` in **debounced batches** to + amortize transaction overhead (IndexedDB setup, SQLite write locks). +- Retractions call `markRedacted(msgid)`, setting the tombstone flag and dropping stored `text`; + original content is never retained, matching the server contract. +- Edits (post-MVP) update the record's `text` in place, keyed by `msgid`. -- Live messages and `chathistory` batches are written through `upsert()` in **debounced batches** - rather than one write per message, to amortize transaction overhead (IndexedDB transaction - setup, SQLite write locks). -- Retractions call `markRedacted(msgid)`, which sets the tombstone flag and drops the stored `text`. - The original content is never retained, matching the server contract. -- Edits (post-MVP) update the record's `text` in place, keyed by the edited message's `msgid`. - -### Invalidation and Clearing - -There is almost nothing to invalidate (records are immutable). 
The two clearing paths are: - -- **User-initiated** - the [Storage management surface](#storage-management-surface) below. -- **Force refresh** - the client's hard-reset shortcut (Ctrl+Shift+R), which already clears - `localStorage` for settings, is extended to also call `HistoryCachePort.clear()` so a force refresh - wipes the on-device history archive and reloads from the server, scoped to the active account. +**Invalidation and clearing.** Records are immutable, so there is almost nothing to invalidate. Two +clearing paths: **user-initiated** (the [Storage management surface](#storage-management-surface)) +and **force refresh** (Ctrl+Shift+R, already clearing `localStorage` for settings, extended to call +`HistoryCachePort.clear()` scoped to the active account). ## Storage Management Surface -The cache is durable and can grow large, so it is a first-class surface in **Settings -> Storage**, -not a hidden implementation detail. It exposes: +The cache is durable and can grow large, so it is a first-class surface in **Settings -> Storage**: -- **Per-buffer stats** - target name, cached message count, estimated bytes, and the cached date - range, via `bufferStats()`. -- **A live total** - aggregate message count and estimated size across all buffers, plus the - environment quota and headroom (web) reported by `navigator.storage.estimate()`. +- **Per-buffer stats** - target, cached count, estimated bytes, cached date range (`bufferStats()`). +- **A live total** - aggregate count and size across buffers, plus environment quota and headroom + (web) from `navigator.storage.estimate()`. - **Eviction** - per-buffer "Evict old messages" (`prune(target, keepCount)`, oldest-first) and a - global "Evict all". Eviction respects the live window: it never evicts what is currently mounted. -- **Export** - per-buffer "Export as JSON" (`export(target)`) so a user can keep their own archive - independent of any server or device. -- **A configurable per-buffer cap** - a setting (e.g. 
`chat.cacheMaxMessagesPerBuffer`, default in the - low tens of thousands) that triggers an oldest-first prune pass after writes exceed it. Changing the - cap propagates to the cache layer and can trigger a one-time prune of existing over-cap buffers. + global "Evict all". Eviction never removes what is currently mounted. +- **Export** - per-buffer "Export as JSON" (`export(target)`) for a user-owned archive. +- **A configurable per-buffer cap** - e.g. `chat.cacheMaxMessagesPerBuffer` (default low tens of + thousands) triggering an oldest-first prune after writes exceed it; changing it can trigger a + one-time prune of over-cap buffers. -Eviction policy is **oldest-first within a target**, never cross-target, so heavily-used channels do -not evict each other. The cap is per-buffer rather than global to keep accounting simple and -predictable for the user. +Eviction is **oldest-first within a target**, never cross-target, so heavily-used channels do not +evict each other. The cap is per-buffer to keep accounting predictable. ## Environment Limits and Constraints -The honest constraint differences between surfaces drive the defaults above. - ### Web App / PWA (IndexedDB) -- **Quota is browser-managed and shared.** Chromium grants an origin a large slice (commonly up to - ~60% of free disk, shared across the origin's storage); Firefox uses group/eTLD+1 limits; Safari is - the tightest. The client must treat quota as finite: read `navigator.storage.estimate()`, surface it - in the Storage tab, and prune before writes when near the limit rather than letting a write throw - `QuotaExceededError`. -- **Best-effort storage can be evicted by the browser.** Under storage pressure the browser may clear - a non-persistent origin's IndexedDB. The client SHOULD request `navigator.storage.persist()` (granted - more readily for installed PWAs and engaged origins) to opt into persistent storage and reduce - surprise eviction. 
-- **WebKit time-based eviction.** Safari may evict script-writable storage for sites without sufficient - user engagement after a period of inactivity. Installing the PWA and the `persist()` request mitigate - this; the cache is designed to degrade gracefully (re-seed from `chathistory`) if it is wiped. -- **Structured-clone cost.** Large IndexedDB reads deserialize on the main thread. For big seeds this - is mitigated by chunked reads and, as a follow-up, moving cache I/O to a Web Worker. -- **Practical posture.** Keep web defaults moderate (`CACHE_SEED_COUNT ~150`, per-buffer cap in the low - tens of thousands). The web cache is an accelerator and a recent-history archive, not an unbounded - one. +- **Quota is browser-managed and shared.** Chromium grants ~60% of free disk (shared across the + origin); Firefox uses group/eTLD+1 limits; Safari is tightest. Treat quota as finite: read + `navigator.storage.estimate()`, surface it, and prune before writes near the limit rather than + letting a write throw `QuotaExceededError`. +- **Best-effort storage can be evicted.** Under pressure the browser may clear a non-persistent + origin's IndexedDB. The client SHOULD request `navigator.storage.persist()` (granted more readily + for installed PWAs) to reduce surprise eviction. +- **WebKit time-based eviction.** Safari may evict script-writable storage for low-engagement sites + after inactivity. Installing the PWA and `persist()` mitigate this; the cache re-seeds from + `chathistory` if wiped. +- **Structured-clone cost.** Large IndexedDB reads deserialize on the main thread; mitigated by + chunked reads and, as a follow-up, Web Worker cache I/O. +- **Posture.** Keep web defaults moderate (`CACHE_SEED_COUNT ~150`, cap in the low tens of + thousands). The web cache is an accelerator and recent-history archive, not an unbounded one. ### Desktop / Mobile (SQLite) -- **No browser quota and no surprise eviction.** SQLite is bounded only by disk. 
The standalone clients - do not suffer the web's quota or time-based eviction problems, so they can run a larger seed, a higher - per-buffer cap, and effectively a complete personal archive of every conversation they have seen. -- **Bulk reads are cheap and off the WebView.** Paging and search queries run in Rust against an - indexed table and stream results over the custom protocol, avoiding the WebView's main-thread - deserialization cost entirely. -- **This asymmetry is intentional.** The same `HistoryCachePort` contract backs both; only the limits - and tuning differ. A user who wants a permanent, searchable archive runs the desktop client; the web - app gives most of the benefit within the browser's constraints. +- **No browser quota and no surprise eviction.** SQLite is bounded only by disk, so the standalone + clients can run a larger seed, a higher cap, and effectively a complete personal archive. +- **Bulk reads are cheap and off the WebView.** Paging/search run in Rust against an indexed table + and stream over the custom protocol, avoiding main-thread deserialization. +- **The asymmetry is intentional.** The same `HistoryCachePort` contract backs both; only limits and + tuning differ. A user wanting a permanent searchable archive runs the desktop client. ## Evaluation: Performance and Limits -What the design gets right today: +What the design gets right: -- **Instant target switches** via cache prefill; the network only fills the tail delta. -- **Bounded DOM** regardless of buffer size via the sliding live window plus incremental rendering. -- **Reduced server load** - normal scrollback and reconnection are deltas, not full re-pulls. -- **Trivial coherence** - immutable, `msgid`-keyed records with tombstone overlays; no invalidation - graph. +- **Instant target switches** via prefill; network only fills the tail delta. +- **Bounded DOM** regardless of buffer size via the sliding window plus incremental rendering. 
+- **Reduced server load** - scrollback and reconnection are deltas, not full re-pulls. +- **Trivial coherence** - immutable, `msgid`-keyed records with tombstone overlays; no invalidation graph. -Where it can get faster (tracked follow-ups, not MVP blockers): +Tracked follow-ups (not MVP blockers): -- **List virtualization** for very large mounted ranges, to push the effective DOM cost below - `MAX_LIVE_MESSAGES` and make the live-window cap a soft target. -- **Worker-thread cache I/O** on the web, to keep structured-clone deserialization off the main - thread for large reads. -- **Batch-write tuning** - larger debounce windows and "prune-before-write" under quota pressure to - avoid `QuotaExceededError` round-trips. -- **A local search index** - an inverted index built over the cache (or SQLite FTS5 on the desktop) - so the client answers most searches locally and only escalates to the server's history backend for - history it never cached. This is the natural next layer on top of a populated cache and the biggest - future server-offload win. +- **List virtualization** for very large mounted ranges, making the live-window cap a soft target. +- **Worker-thread cache I/O** on the web, keeping structured-clone off the main thread. +- **Batch-write tuning** - larger debounce windows and prune-before-write under quota pressure. +- **A local search index** - an inverted index over the cache (or SQLite FTS5 on desktop) so the + client answers most searches locally, escalating to the server only for uncached history. The + biggest future server-offload win. -Known limits to keep in view: +Known limits: - The web cache can be evicted by the browser; treat it as a fast, best-effort layer with the server - as the durable fallback. The desktop cache is the durable one. -- Estimated byte sizes in the Storage tab are approximate (structured-clone/row overhead is not - exact); they are for user guidance, not billing. 
-- The cache cannot recover content the server never delivered or that was retracted - it stores only - what the client legitimately received. + as durable fallback. The desktop cache is the durable one. +- Estimated byte sizes are approximate (clone/row overhead) - for user guidance, not billing. +- The cache cannot recover content the server never delivered or that was retracted. From 1d6266cf6f6620a055807f9a20d391868312d965 Mon Sep 17 00:00:00 2001 From: Andrew Lake Date: Sun, 14 Jun 2026 16:33:53 -0600 Subject: [PATCH 3/5] Update 04-local-cache.md --- spec/04-clients/04-local-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/04-clients/04-local-cache.md b/spec/04-clients/04-local-cache.md index ac79369..8dcfe5e 100644 --- a/spec/04-clients/04-local-cache.md +++ b/spec/04-clients/04-local-cache.md @@ -108,7 +108,7 @@ sequenceDiagram V->>S: CHATHISTORY LATEST channel newest-cached-msgid * S-->>V: delta batch (messages since tip) V->>C: upsert delta (dedupe by msgid) - Note over V: Reconcile tail; cache and view converge + Note over V: Reconcile tail - cache and view converge ``` The buffer paints from disk on the frame the user clicks; reconciliation only fills the gap between From c519553448047bc695c1d06875df840f4bf00e13 Mon Sep 17 00:00:00 2001 From: Andrew Lake Date: Mon, 15 Jun 2026 07:34:49 -0600 Subject: [PATCH 4/5] Document synthetic keys for keyless lines --- spec/04-clients/04-local-cache.md | 37 ++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/spec/04-clients/04-local-cache.md b/spec/04-clients/04-local-cache.md index 8dcfe5e..66e50d1 100644 --- a/spec/04-clients/04-local-cache.md +++ b/spec/04-clients/04-local-cache.md @@ -19,9 +19,13 @@ The IRC history model is unusually cache-friendly: post-delivery mutations are retractions (rendered as a *tombstone* overlay, never a content edit - see [Retractions](../02-components/01-uplink/01-overview.md#retractions)) and, post-MVP, 
editing. It is a write-once store with rare overlay events, not a cache-coherence problem. -- **There is a stable server-assigned dedup key.** Every message carries a `msgid` (`message-ids`, - the primary/dedup key) and an authoritative `server-time` (the sort key). Live messages, - `echo-message` self-copies, and `chathistory` replays all collapse to one record by `msgid`. +- **There is a stable server-assigned dedup key for content.** Every `PRIVMSG`/`NOTICE` carries a + `msgid` (`message-ids`, the primary/dedup key) and an authoritative `server-time` (the sort key). + Live messages, `echo-message` self-copies, and `chathistory` replays all collapse to one record by + `msgid`. The exceptions are lines with no `msgid` - presence events (`JOIN`/`PART`) from + `event-playback`, and replays from servers that omit the tag. Those get a deterministic synthetic + key and dedupe by a `type + author + text + server-time` signature, so the same event from any + delivery path still collapses to one record. - **Delivery is already batched.** `chathistory` responses arrive inside a `batch`, pre-chunked by the server. The cache writes a batch as a unit; the renderer consumes it incrementally. @@ -56,27 +60,32 @@ from colliding. 
 Two logical stores:
 | Store | Key | Holds |
 |------------|------------------------------|-------|
-| `messages` | `msgid` (with a `[target, server_time]` index) | One record per delivered message |
+| `messages` | `msgid` - server `msgid`, or a synthetic `evt:*` key for keyless lines (with a `[target, server_time]` index) | One record per delivered line |
 | `buffers` | `target` (channel/DM name) | Per-target metadata: oldest/newest cached `msgid` and timestamp, cached count, last reconcile time, cap override |
 
 A `messages` record stores exactly what the renderer and search need:
 
 ```ts
 interface CachedMessage {
-  msgid: string // primary key, server-assigned (message-ids)
+  msgid: string // primary key: server msgid, or a synthetic evt:* key for keyless lines
   target: string // channel or DM this belongs to
   serverTime: number // sort key, from server-time (epoch ms)
   account: string | null // server-asserted author identity (account-tag), null for unauthenticated
   nick: string // nick at send time (display only; account is authoritative)
-  type: "privmsg" | "notice" | "action"
+  type: "privmsg" | "notice" | "action" | "join" | "part"
   text: string
   tags: Record<string, string> // surviving +orbit/* and +draft/* tags (reply ref, reactions, etc.)
   redacted?: boolean // tombstone overlay; original text is NOT retained when set
+  edited?: boolean // set when text was edited in place (post-MVP)
 }
 ```
 
-Dedup is always by `msgid`: the live socket path and the `chathistory` path upsert into the same
-store, so overlaps resolve to a single record rather than a visible duplicate.
+Dedup is by `msgid` wherever the server provides one: the live socket path and the `chathistory`
+path upsert into the same store, so overlaps resolve to a single record rather than a visible
+duplicate. Keyless lines (presence events, msgid-less replays) carry a deterministic synthetic key
+and additionally collapse by a content + `server-time` signature, giving the same single-record
+guarantee. The synthetic key is never surfaced as a real `msgid`, so it cannot anchor a
+`CHATHISTORY` request.
 
 ## Seeding, Paging, and the Sliding Window
 
@@ -205,7 +214,15 @@ never what is *persisted*.
   amortize transaction overhead (IndexedDB setup, SQLite write locks).
 - Retractions call `markRedacted(msgid)`, setting the tombstone flag and dropping stored `text`;
   original content is never retained, matching the server contract.
-- Edits (post-MVP) update the record's `text` in place, keyed by `msgid`.
+- Edits (post-MVP) update the record's `text` in place, keyed by `msgid`. The `edited` flag is
+  reserved on `CachedMessage` so the overlay is representable ahead of the feature.
+- Presence events (`JOIN`/`PART`) from `event-playback` are persisted too, keyed on their synthetic
+  `evt:*` id, so scrollback renders them consistently rather than only when a live replay happens to
+  include them.
+- `server-time` is the sort key, not arrival order. `event-playback` can deliver an old-stamped line
+  live, so the live window inserts it at its `server-time` position rather than appending (see
+  [Memory Discipline](01-desktop.md#memory-discipline)); cache reads are already
+  `[target, server_time]`-ordered, so prefill and paging return correct order for free.
 
 **Invalidation and clearing.** Records are immutable, so there is almost nothing to invalidate. Two
 clearing paths: **user-initiated** (the [Storage management surface](#storage-management-surface))
@@ -264,7 +281,7 @@ What the design gets right:
 - **Instant target switches** via prefill; network only fills the tail delta.
 - **Bounded DOM** regardless of buffer size via the sliding window plus incremental rendering.
 - **Reduced server load** - scrollback and reconnection are deltas, not full re-pulls.
-- **Trivial coherence** - immutable, `msgid`-keyed records with tombstone overlays; no invalidation graph.
+- **Trivial coherence** - immutable, stably-keyed records (`msgid`, or a synthetic key for keyless lines) with tombstone overlays; no invalidation graph.
 
 Tracked follow-ups (not MVP blockers):
 
From 3e69b1bdd2e4e78e5344bd626c3b25f113f3d9ec Mon Sep 17 00:00:00 2001
From: Andrew Lake
Date: Wed, 1 Jul 2026 22:38:01 -0600
Subject: [PATCH 5/5] Add note on synthetic key fallback for servers

---
 spec/04-clients/04-local-cache.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/spec/04-clients/04-local-cache.md b/spec/04-clients/04-local-cache.md
index fd96341..211a92f 100644
--- a/spec/04-clients/04-local-cache.md
+++ b/spec/04-clients/04-local-cache.md
@@ -21,11 +21,13 @@ The IRC history model is unusually cache-friendly:
   It is a write-once store with rare overlay events, not a cache-coherence problem.
 - **There is a stable server-assigned dedup key for content.** Every `PRIVMSG`/`NOTICE` carries a
   `msgid` (`message-ids`, the primary/dedup key) and an authoritative `server-time` (the sort key).
-  Live messages, `echo-message` self-copies, and `chathistory` replays all collapse to one record by
-  `msgid`. The exceptions are lines with no `msgid` - presence events (`JOIN`/`PART`) from
+  Dedup is per message, not per batch. A given line keeps the same `msgid` whether it arrives live,
+  as an `echo-message` self-copy, or in a later `chathistory` replay, so those copies of it merge
+  into one cached record. A replay of 200 messages still writes 200 records; it just skips the ones
+  already cached. Lines with no `msgid` are the exception: presence events (`JOIN`/`PART`) from
   `event-playback`, and replays from servers that omit the tag. Those get a deterministic synthetic
-  key and dedupe by a `type + author + text + server-time` signature, so the same event from any
-  delivery path still collapses to one record.
+  key and dedupe by a `type + author + text + server-time` signature, so the same event still
+  collapses to one record whichever path delivered it.
 - **Delivery is already batched.** `chathistory` responses arrive inside a `batch`, pre-chunked by
   the server. The cache writes a batch as a unit; the renderer consumes it incrementally.
 
@@ -87,6 +89,12 @@ and additionally collapse by a content + `server-time` signature, giving the sam
 guarantee. The synthetic key is never surfaced as a real `msgid`, so it cannot anchor a
 `CHATHISTORY` request.
 
+Servers without `message-ids` support fall back to that same synthetic key for every line, so
+caching still works, just on a best-effort content signature instead of an authoritative id. A
+server that old is unlikely to implement the other IRCv3 features anyway, so the experience stays
+close to plain IRC. Bouncer playback (ZNC and similar) isn't an MVP target; supporting it later
+means reconciling the timestamps the bouncer stamps on replay and matching best-effort.
+
 ## Seeding, Paging, and the Sliding Window
 
 Three tunables govern the data path - defaults, not protocol constants. The desktop client can run