Skip to content

refactor(data): split data.ts into domain repos + review the DAL architecture #327

@chipi

Description

@chipi

Deferred from #326 Action 4 / Batch 3. The other 9 actions in that issue landed cleanly on fixes (Batches 1, 2, 4, 5); the data.ts split is its own scope and benefits from a clean-slate session — see #326 for that history.

What src/lib/data.ts is today

It's our client-side data-access layer. One file, 1,565 lines, 83 exports. Every route + every component talks to it for missions, planets, fleet entries, science articles, Mars sites, ISS modules, scenarios, rockets, image provenance, audio provenance, episode sources, source logos, satellite catalogs, belt catalogs, site stories, porkchop grids, …

The actual "database" is static/data/*.json on disk — those files are already split (missions/index.json, planets.json, fleet/dragonfly.json, etc.). data.ts is the repository / DAL that fetches those JSON files, caches in memory, merges locale overlays, returns typed objects.

Internally it has three layers:

core           cache (Map<string, unknown>) + get<T>(path)             ~25 lines
domain getters 83 exports across 8+ unrelated subject areas           ~1,480 lines
module state   lazy singletons (provenanceManifest, audioProvenance,
               episodeSources, sourceLogos, …)                         ~60 lines

Why split

Two reasons. Both real, neither urgent.

1. Single responsibility violated. When /explore wants a satellite catalog, import { getSatellites } from '$lib/data' carries the cognitive weight of "this file owns everything." Future readers scan a 1,565-line file to find the right section. Per-subject repos make reading + modifying obvious.

2. Tree-shaking is currently dead. import { getSatellites } from '$lib/data' makes the bundler keep getMissions, getScienceTab, etc. in the dep graph because they live in the same module. Per-domain modules let the bundler drop unused getters per route.

Caveat on #2. Tree-shaking only fires if callers migrate to the per-domain imports. If we keep data.ts as a barrel re-export and never sweep callers, this benefit doesn't materialize — only the readability win does. See "open architectural questions" below for the trade-off.

This is more than a mechanical split — it's an architecture review

@chipi framed this explicitly: "use this to review our data approach". So treating this as just-split-the-file misses the point. Concrete open questions worth answering before writing code:

  1. Per-domain caches vs shared cache singleton? Today every fetch hits one Map<string, unknown> keyed by URL. Splitting to per-domain modules makes "should each domain own its cache?" a real question. Shared keeps invalidation policy in one place; per-domain makes the dependency direction one-way.

  2. Lazy module singletons or query-key pattern? Four getters depend on lazy-initialized module state (provenanceManifest, audioProvenance, episodeSources, source logos). This pattern works but interacts poorly with tests (cold-path bugs) and with future SSR. A TanStack-style query-key pattern is more idiomatic Svelte 5 / SvelteKit 2 — but it's a much bigger change. Worth exploring vs status quo before committing.

  3. Locale-overlay merge — inline or shared transformer? Each getter that supports locales (missions, planets, fleet, science, station, …) re-implements the same fetch-base + fetch-overlay + deep-merge pattern. A shared withLocaleOverlay<T>(basePath, overlayPath, locale) helper would be a real DRY win and likely worth doing alongside the split.

  4. Barrel forever, or sweep callers? 37 caller files currently from '$lib/data'. Keep the barrel forever = zero blast radius but no tree-shaking gain. Sweep callers = real win but ~37 import-line edits + e2e re-validation per route. The split itself is independent of this decision — the question is what happens after.

  5. Provenance / source-logos / audio-provenance — same DAL, or separate concern? These are meta-data about other data (which mission a photo belongs to, who licensed it, which TTS provider generated it). Lumping them with "fetch mission JSON" feels like a category error. Consider extracting to $lib/provenance/* rather than to provenance-data.ts.

  6. Cache invalidation policy. Today the cache lives forever (memory-bound). At static-site scale that's fine; at scale it's a memory leak. Do we want a TTL, an LRU bound, or an explicit "reset on route change"? __resetCache only exists for tests today.

  7. The cache singleton as a footgun. If a domain module accidentally re-declares const cache = new Map() (easy copy-paste mistake), it gets its own empty cache and the runtime is silently slow. Worth designing the import shape so this can't happen — e.g. data-core.ts exports cache only via a useCache() accessor that hides the Map identity.

The right answer to these is what the review session should produce before the file split lands.

Proposed module breakdown (illustrative)

The original plan in docs/guides/non-fly-modules-audit-plan.md listed 8 target modules. Realistic count once you include provenance, audio, source logos, small-bodies, site stories, porkchop grids is 10-11:

New file Owns Approx LOC
data-core.ts cache + get<T> + (__resetCache umbrella) + (maybe) shared withLocaleOverlay ~50
missions-data.ts getMission*, MissionFilter, filterMissions, getMissionsForLibrary, getMissionGallery, getPorkchopGrid ~250
fleet-data.ts getFleet*, getFleetGallery, getFleetIndex, getFleetByCategory ~200
science-data.ts SCIENCE_TABS, getScienceTab, getScienceSection, getScienceLanding, getScienceTabIntro ~150
surface-site-data.ts planets, getPlanets, earthObjects, getEarthObjects, moonSites, getMoonSites, marsSites, getMarsSites, getMarsTraverse, getSun, all surface galleries ~400
station-data.ts getIss* (8 getters), getTiangong* (8 getters), station galleries ~250
scenarios-data.ts getScenario ~30
rockets-data.ts rockets, getRockets ~30
provenance-data.ts (maybe → $lib/provenance/) getImageProvenanceManifest, getImageProvenance, getSourceLogos, getTextSources, getAudioProvenanceManifest, getEpisodeSourcesManifest (+ their module-level singletons) ~250
small-bodies-data.ts getSatellites, getSatelliteI18n, getSatelliteGallery, getBelts, getBeltI18n, getBeltGallery, getSmallBodyGallery, getSiteStory ~200

data.ts itself becomes a thin barrel that re-exports from each new module so the 37 existing from '$lib/data' callers keep working through the transition.

Risks (in roughly descending order)

  1. Module-level singletons + closures (provenanceManifest, audioProvenance, episodeSources, source logos). These lets have to move with their getter. If they stay in data.ts while their getter moves, the getter sees an always-null variable. __resetCache clears these today → the umbrella reset has to call each domain's reset hook, which means each domain has to export one. Easy to get wrong, easy to miss in tests because tests rarely exercise the cold path.

  2. Shared cache singleton. All 60+ JSON fetches hit the same Map. Every domain module has to import the same cache instance from data-core.ts, not a copy. Accidental re-declaration in a domain module = silently slow runtime, no test catches it. (See open question ci(deps): Bump peaceiris/actions-gh-pages from 3 to 4 #7 — design the import shape so this can't happen.)

  3. Import cycles. Some getters call others (e.g. getMission looks up a destination via planets()). If missions-data.ts imports from surface-site-data.ts, and the surface module ever needs a mission helper, you get a cycle. Vite tolerates cycles imperfectly — sometimes works, sometimes undefined exports at runtime. Audit the inter-getter call graph before splitting to pick a directed-acyclic layout.

  4. 37 caller files import from $lib/data. Barrel-keep strategy mitigates this (callers untouched), but one missed re-export in the barrel breaks call sites silently. Typecheck catches most of it; runtime-only interfaces don't always.

  5. Test ordering + cache leakage. src/lib/data.test.ts calls __resetCache() between tests. If the umbrella reset only clears data-core's JSON cache but forgets to reset provenanceManifest, a test asserting on a fresh manifest gets a stale one from a previous test.

Suggested approach

Two-phase, both in this issue:

Phase 0 — design review (no code). Answer the 7 open architectural questions above. Land an ADR or RFC capturing the chosen approach. Specifically pin down:

  • Cache shape (shared vs per-domain, accessor pattern to avoid foot-gun)
  • Module-singleton handling (whose reset hook calls whose)
  • Whether provenance + source logos move out of the DAL entirely
  • Whether withLocaleOverlay<T> becomes a shared helper

Phase 1 — file split, barrel-keep. Once the design is locked, mechanical execution:

  1. Create data-core.ts (cache, get, umbrella __resetCache, maybe withLocaleOverlay). data.ts imports from it. Verify tests pass.
  2. Per-domain extraction, one module per commit. Move exports + matching tests + their module-level state + their reset hook. Run vitest after each. Suggested order: smallest/least-connected first (rockets, scenarios, then science, fleet, missions, station, surface-site, small-bodies, provenance last because it owns the most state).
  3. data.ts is now a pure barrel — every export re-exported from a domain module.
  4. (Optional, separate commits) Sweep callers from $lib/data to the per-domain modules. ~37 import-line edits + browser-verify per route. Could be its own follow-up issue if scope creeps.

Acceptance

  • Phase 0 — ADR/RFC capturing answers to the 7 open questions, signed off by @chipi
  • Phase 1 — all data.ts exports moved to per-domain modules; data.ts is a thin barrel
  • npx svelte-check 0 errors
  • npx vitest run passes (count grows by however many new tests Phase 0 prescribes)
  • npm run preflight exit 0
  • Browser-verify every touched route after each domain extraction
  • (Optional follow-up issue or commits) — caller sweep + delete the barrel

Cross-links

Working on this issue

@chipi to schedule a fresh agent session once the seven Batch 1/2/4/5 commits are merged to main. Don't start Phase 1 (the file split) before Phase 0 (the design review) is signed off — the plan author already noted "best done with no other in-flight refactors on the same branch," and the architectural questions affect what the right split actually looks like.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions