You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Deferred from #326 Action 4 / Batch 3. The other 9 actions in that issue landed cleanly on fixes (Batches 1, 2, 4, 5); the data.ts split is its own scope and benefits from a clean-slate session — see #326 for that history.
What src/lib/data.ts is today
It's our client-side data-access layer. One file, 1,565 lines, 83 exports. Every route + every component talks to it for missions, planets, fleet entries, science articles, Mars sites, ISS modules, scenarios, rockets, image provenance, audio provenance, episode sources, source logos, satellite catalogs, belt catalogs, site stories, porkchop grids, …
The actual "database" is static/data/*.json on disk — those files are already split (missions/index.json, planets.json, fleet/dragonfly.json, etc.). data.ts is the repository / DAL that fetches those JSON files, caches in memory, merges locale overlays, returns typed objects.
Internally it has three layers:
core cache (Map<string, unknown>) + get<T>(path) ~25 lines
domain getters 83 exports across 8+ unrelated subject areas ~1,480 lines
module state lazy singletons (provenanceManifest, audioProvenance,
episodeSources, sourceLogos, …) ~60 lines
Why split
Two reasons. Both real, neither urgent.
1. Single responsibility violated. When /explore wants a satellite catalog, import { getSatellites } from '$lib/data' carries the cognitive weight of "this file owns everything." Future readers scan a 1,565-line file to find the right section. Per-subject repos make reading + modifying obvious.
2. Tree-shaking is currently dead.import { getSatellites } from '$lib/data' makes the bundler keep getMissions, getScienceTab, etc. in the dep graph because they live in the same module. Per-domain modules let the bundler drop unused getters per route.
⚠ Caveat on #2. Tree-shaking only fires if callers migrate to the per-domain imports. If we keep data.ts as a barrel re-export and never sweep callers, this benefit doesn't materialize — only the readability win does. See "open architectural questions" below for the trade-off.
This is more than a mechanical split — it's an architecture review
@chipi framed this explicitly: "use this to review our data approach". So treating this as just-split-the-file misses the point. Concrete open questions worth answering before writing code:
Per-domain caches vs shared cache singleton? Today every fetch hits one Map<string, unknown> keyed by URL. Splitting to per-domain modules makes "should each domain own its cache?" a real question. Shared keeps invalidation policy in one place; per-domain makes the dependency direction one-way.
Lazy module singletons or query-key pattern? Four getters depend on lazy-initialized module state (provenanceManifest, audioProvenance, episodeSources, source logos). This pattern works but interacts poorly with tests (cold-path bugs) and with future SSR. A TanStack-style query-key pattern is more idiomatic Svelte 5 / SvelteKit 2 — but it's a much bigger change. Worth exploring vs status quo before committing.
Locale-overlay merge — inline or shared transformer? Each getter that supports locales (missions, planets, fleet, science, station, …) re-implements the same fetch-base + fetch-overlay + deep-merge pattern. A shared withLocaleOverlay<T>(basePath, overlayPath, locale) helper would be a real DRY win and likely worth doing alongside the split.
Barrel forever, or sweep callers? 37 caller files currently from '$lib/data'. Keep the barrel forever = zero blast radius but no tree-shaking gain. Sweep callers = real win but ~37 import-line edits + e2e re-validation per route. The split itself is independent of this decision — the question is what happens after.
Provenance / source-logos / audio-provenance — same DAL, or separate concern? These are meta-data about other data (which mission a photo belongs to, who licensed it, which TTS provider generated it). Lumping them with "fetch mission JSON" feels like a category error. Consider extracting to $lib/provenance/* rather than to provenance-data.ts.
Cache invalidation policy. Today the cache lives forever (memory-bound). At static-site scale that's fine; at scale it's a memory leak. Do we want a TTL, an LRU bound, or an explicit "reset on route change"? __resetCache only exists for tests today.
The cache singleton as a footgun. If a domain module accidentally re-declares const cache = new Map() (easy copy-paste mistake), it gets its own empty cache and the runtime is silently slow. Worth designing the import shape so this can't happen — e.g. data-core.ts exports cache only via a useCache() accessor that hides the Map identity.
The right answer to these is what the review session should produce before the file split lands.
Proposed module breakdown (illustrative)
The original plan in docs/guides/non-fly-modules-audit-plan.md listed 8 target modules. Realistic count once you include provenance, audio, source logos, small-bodies, site stories, porkchop grids is 10-11:
data.ts itself becomes a thin barrel that re-exports from each new module so the 37 existing from '$lib/data' callers keep working through the transition.
Risks (in roughly descending order)
Module-level singletons + closures (provenanceManifest, audioProvenance, episodeSources, source logos). These lets have to move with their getter. If they stay in data.ts while their getter moves, the getter sees an always-null variable. __resetCache clears these today → the umbrella reset has to call each domain's reset hook, which means each domain has to export one. Easy to get wrong, easy to miss in tests because tests rarely exercise the cold path.
Shared cache singleton. All 60+ JSON fetches hit the same Map. Every domain module has to import the samecache instance from data-core.ts, not a copy. Accidental re-declaration in a domain module = silently slow runtime, no test catches it. (See open question ci(deps): Bump peaceiris/actions-gh-pages from 3 to 4 #7 — design the import shape so this can't happen.)
Import cycles. Some getters call others (e.g. getMission looks up a destination via planets()). If missions-data.ts imports from surface-site-data.ts, and the surface module ever needs a mission helper, you get a cycle. Vite tolerates cycles imperfectly — sometimes works, sometimes undefined exports at runtime. Audit the inter-getter call graph before splitting to pick a directed-acyclic layout.
37 caller files import from $lib/data. Barrel-keep strategy mitigates this (callers untouched), but one missed re-export in the barrel breaks call sites silently. Typecheck catches most of it; runtime-only interfaces don't always.
Test ordering + cache leakage.src/lib/data.test.ts calls __resetCache() between tests. If the umbrella reset only clears data-core's JSON cache but forgets to reset provenanceManifest, a test asserting on a fresh manifest gets a stale one from a previous test.
Suggested approach
Two-phase, both in this issue:
Phase 0 — design review (no code). Answer the 7 open architectural questions above. Land an ADR or RFC capturing the chosen approach. Specifically pin down:
Cache shape (shared vs per-domain, accessor pattern to avoid foot-gun)
Whether provenance + source logos move out of the DAL entirely
Whether withLocaleOverlay<T> becomes a shared helper
Phase 1 — file split, barrel-keep. Once the design is locked, mechanical execution:
Create data-core.ts (cache, get, umbrella __resetCache, maybe withLocaleOverlay). data.ts imports from it. Verify tests pass.
Per-domain extraction, one module per commit. Move exports + matching tests + their module-level state + their reset hook. Run vitest after each. Suggested order: smallest/least-connected first (rockets, scenarios, then science, fleet, missions, station, surface-site, small-bodies, provenance last because it owns the most state).
data.ts is now a pure barrel — every export re-exported from a domain module.
(Optional, separate commits) Sweep callers from $lib/data to the per-domain modules. ~37 import-line edits + browser-verify per route. Could be its own follow-up issue if scope creeps.
Acceptance
Phase 0 — ADR/RFC capturing answers to the 7 open questions, signed off by @chipi
Phase 1 — all data.ts exports moved to per-domain modules; data.ts is a thin barrel
npx svelte-check 0 errors
npx vitest run passes (count grows by however many new tests Phase 0 prescribes)
npm run preflight exit 0
Browser-verify every touched route after each domain extraction
(Optional follow-up issue or commits) — caller sweep + delete the barrel
@chipi to schedule a fresh agent session once the seven Batch 1/2/4/5 commits are merged to main. Don't start Phase 1 (the file split) before Phase 0 (the design review) is signed off — the plan author already noted "best done with no other in-flight refactors on the same branch," and the architectural questions affect what the right split actually looks like.
What
src/lib/data.tsis todayIt's our client-side data-access layer. One file, 1,565 lines, 83 exports. Every route + every component talks to it for missions, planets, fleet entries, science articles, Mars sites, ISS modules, scenarios, rockets, image provenance, audio provenance, episode sources, source logos, satellite catalogs, belt catalogs, site stories, porkchop grids, …
The actual "database" is
static/data/*.jsonon disk — those files are already split (missions/index.json,planets.json,fleet/dragonfly.json, etc.).data.tsis the repository / DAL that fetches those JSON files, caches in memory, merges locale overlays, returns typed objects.Internally it has three layers:
Why split
Two reasons. Both real, neither urgent.
1. Single responsibility violated. When
/explorewants a satellite catalog,import { getSatellites } from '$lib/data'carries the cognitive weight of "this file owns everything." Future readers scan a 1,565-line file to find the right section. Per-subject repos make reading + modifying obvious.2. Tree-shaking is currently dead.
import { getSatellites } from '$lib/data'makes the bundler keepgetMissions,getScienceTab, etc. in the dep graph because they live in the same module. Per-domain modules let the bundler drop unused getters per route.This is more than a mechanical split — it's an architecture review
@chipiframed this explicitly: "use this to review our data approach". So treating this as just-split-the-file misses the point. Concrete open questions worth answering before writing code:Per-domain caches vs shared cache singleton? Today every fetch hits one
Map<string, unknown>keyed by URL. Splitting to per-domain modules makes "should each domain own its cache?" a real question. Shared keeps invalidation policy in one place; per-domain makes the dependency direction one-way.Lazy module singletons or query-key pattern? Four getters depend on lazy-initialized module state (
provenanceManifest,audioProvenance,episodeSources, source logos). This pattern works but interacts poorly with tests (cold-path bugs) and with future SSR. A TanStack-style query-key pattern is more idiomatic Svelte 5 / SvelteKit 2 — but it's a much bigger change. Worth exploring vs status quo before committing.Locale-overlay merge — inline or shared transformer? Each getter that supports locales (missions, planets, fleet, science, station, …) re-implements the same fetch-base + fetch-overlay + deep-merge pattern. A shared
withLocaleOverlay<T>(basePath, overlayPath, locale)helper would be a real DRY win and likely worth doing alongside the split.Barrel forever, or sweep callers? 37 caller files currently
from '$lib/data'. Keep the barrel forever = zero blast radius but no tree-shaking gain. Sweep callers = real win but ~37 import-line edits + e2e re-validation per route. The split itself is independent of this decision — the question is what happens after.Provenance / source-logos / audio-provenance — same DAL, or separate concern? These are meta-data about other data (which mission a photo belongs to, who licensed it, which TTS provider generated it). Lumping them with "fetch mission JSON" feels like a category error. Consider extracting to
$lib/provenance/*rather than toprovenance-data.ts.Cache invalidation policy. Today the cache lives forever (memory-bound). At static-site scale that's fine; at scale it's a memory leak. Do we want a TTL, an LRU bound, or an explicit "reset on route change"?
__resetCacheonly exists for tests today.The cache singleton as a footgun. If a domain module accidentally re-declares
const cache = new Map()(easy copy-paste mistake), it gets its own empty cache and the runtime is silently slow. Worth designing the import shape so this can't happen — e.g.data-core.tsexportscacheonly via auseCache()accessor that hides the Map identity.The right answer to these is what the review session should produce before the file split lands.
Proposed module breakdown (illustrative)
The original plan in docs/guides/non-fly-modules-audit-plan.md listed 8 target modules. Realistic count once you include provenance, audio, source logos, small-bodies, site stories, porkchop grids is 10-11:
data-core.tscache+get<T>+ (__resetCacheumbrella) + (maybe) sharedwithLocaleOverlaymissions-data.tsgetMission*,MissionFilter,filterMissions,getMissionsForLibrary,getMissionGallery,getPorkchopGridfleet-data.tsgetFleet*,getFleetGallery,getFleetIndex,getFleetByCategoryscience-data.tsSCIENCE_TABS,getScienceTab,getScienceSection,getScienceLanding,getScienceTabIntrosurface-site-data.tsplanets,getPlanets,earthObjects,getEarthObjects,moonSites,getMoonSites,marsSites,getMarsSites,getMarsTraverse,getSun, all surface galleriesstation-data.tsgetIss*(8 getters),getTiangong*(8 getters), station galleriesscenarios-data.tsgetScenariorockets-data.tsrockets,getRocketsprovenance-data.ts(maybe →$lib/provenance/)getImageProvenanceManifest,getImageProvenance,getSourceLogos,getTextSources,getAudioProvenanceManifest,getEpisodeSourcesManifest(+ their module-level singletons)small-bodies-data.tsgetSatellites,getSatelliteI18n,getSatelliteGallery,getBelts,getBeltI18n,getBeltGallery,getSmallBodyGallery,getSiteStorydata.tsitself becomes a thin barrel that re-exports from each new module so the 37 existingfrom '$lib/data'callers keep working through the transition.Risks (in roughly descending order)
Module-level singletons + closures (
provenanceManifest,audioProvenance,episodeSources, source logos). Theselets have to move with their getter. If they stay indata.tswhile their getter moves, the getter sees an always-null variable.__resetCacheclears these today → the umbrella reset has to call each domain's reset hook, which means each domain has to export one. Easy to get wrong, easy to miss in tests because tests rarely exercise the cold path.Shared
cachesingleton. All 60+ JSON fetches hit the same Map. Every domain module has to import the samecacheinstance fromdata-core.ts, not a copy. Accidental re-declaration in a domain module = silently slow runtime, no test catches it. (See open question ci(deps): Bump peaceiris/actions-gh-pages from 3 to 4 #7 — design the import shape so this can't happen.)Import cycles. Some getters call others (e.g.
getMissionlooks up a destination viaplanets()). Ifmissions-data.tsimports fromsurface-site-data.ts, and the surface module ever needs a mission helper, you get a cycle. Vite tolerates cycles imperfectly — sometimes works, sometimesundefinedexports at runtime. Audit the inter-getter call graph before splitting to pick a directed-acyclic layout.37 caller files import from
$lib/data. Barrel-keep strategy mitigates this (callers untouched), but one missed re-export in the barrel breaks call sites silently. Typecheck catches most of it; runtime-only interfaces don't always.Test ordering + cache leakage.
src/lib/data.test.tscalls__resetCache()between tests. If the umbrella reset only clearsdata-core's JSON cache but forgets to resetprovenanceManifest, a test asserting on a fresh manifest gets a stale one from a previous test.Suggested approach
Two-phase, both in this issue:
Phase 0 — design review (no code). Answer the 7 open architectural questions above. Land an ADR or RFC capturing the chosen approach. Specifically pin down:
withLocaleOverlay<T>becomes a shared helperPhase 1 — file split, barrel-keep. Once the design is locked, mechanical execution:
data-core.ts(cache,get, umbrella__resetCache, maybewithLocaleOverlay).data.tsimports from it. Verify tests pass.rockets,scenarios, thenscience,fleet,missions,station,surface-site,small-bodies,provenancelast because it owns the most state).data.tsis now a pure barrel — every export re-exported from a domain module.$lib/datato the per-domain modules. ~37 import-line edits + browser-verify per route. Could be its own follow-up issue if scope creeps.Acceptance
data.tsis a thin barrelnpx svelte-check0 errorsnpx vitest runpasses (count grows by however many new tests Phase 0 prescribes)npm run preflightexit 0Cross-links
fixes; this Action 4 deferred here)Working on this issue
@chipi to schedule a fresh agent session once the seven Batch 1/2/4/5 commits are merged to
main. Don't start Phase 1 (the file split) before Phase 0 (the design review) is signed off — the plan author already noted "best done with no other in-flight refactors on the same branch," and the architectural questions affect what the right split actually looks like.