refactor(data): split data.ts into domain repos + review the DAL architecture

> Deferred from #326 Action 4 / Batch 3. The other 9 actions in that issue landed cleanly on `fixes` (Batches 1, 2, 4, 5); the data.ts split is its own scope and benefits from a clean-slate session — see #326 for that history.

## What `src/lib/data.ts` is today

It's our **client-side data-access layer**. One file, **1,565 lines, 83 exports**. Every route + every component talks to it for missions, planets, fleet entries, science articles, Mars sites, ISS modules, scenarios, rockets, image provenance, audio provenance, episode sources, source logos, satellite catalogs, belt catalogs, site stories, porkchop grids, …

The actual "database" is `static/data/*.json` on disk — those files are already split (`missions/index.json`, `planets.json`, `fleet/dragonfly.json`, etc.). `data.ts` is the **repository / DAL** that fetches those JSON files, caches in memory, merges locale overlays, returns typed objects.

Internally it has three layers:

```
core           cache (Map<string, unknown>) + get<T>(path)             ~25 lines
domain getters 83 exports across 8+ unrelated subject areas           ~1,480 lines
module state   lazy singletons (provenanceManifest, audioProvenance,
               episodeSources, sourceLogos, …)                         ~60 lines
```

## Why split

Two reasons. Both real, neither urgent.

**1. Single responsibility violated.** When `/explore` wants a satellite catalog, `import { getSatellites } from '$lib/data'` carries the *cognitive* weight of "this file owns everything." Future readers scan a 1,565-line file to find the right section. Per-subject repos make reading + modifying obvious.

**2. Tree-shaking is currently dead.** `import { getSatellites } from '$lib/data'` makes the bundler keep `getMissions`, `getScienceTab`, etc. in the dep graph because they live in the same module. Per-domain modules let the bundler drop unused getters per route.

> ⚠ **Caveat on #2.** Tree-shaking only fires if callers migrate to the per-domain imports. If we keep `data.ts` as a barrel re-export and never sweep callers, this benefit doesn't materialize — only the readability win does. See "open architectural questions" below for the trade-off.

## This is more than a mechanical split — it's an architecture review

`@chipi` framed this explicitly: *"use this to review our data approach"*. So treating this as just-split-the-file misses the point. Concrete open questions worth answering **before** writing code:

1. **Per-domain caches vs shared cache singleton?** Today every fetch hits one `Map<string, unknown>` keyed by URL. Splitting to per-domain modules makes "should each domain own its cache?" a real question. Shared keeps invalidation policy in one place; per-domain makes the dependency direction one-way.

2. **Lazy module singletons or query-key pattern?** Four getters depend on lazy-initialized module state (`provenanceManifest`, `audioProvenance`, `episodeSources`, source logos). This pattern works but interacts poorly with tests (cold-path bugs) and with future SSR. A TanStack-style query-key pattern is more idiomatic Svelte 5 / SvelteKit 2 — but it's a much bigger change. Worth exploring vs status quo before committing.

3. **Locale-overlay merge — inline or shared transformer?** Each getter that supports locales (missions, planets, fleet, science, station, …) re-implements the same fetch-base + fetch-overlay + deep-merge pattern. A shared `withLocaleOverlay<T>(basePath, overlayPath, locale)` helper would be a real DRY win and likely worth doing alongside the split.

4. **Barrel forever, or sweep callers?** 37 caller files currently `from '$lib/data'`. Keep the barrel forever = zero blast radius but no tree-shaking gain. Sweep callers = real win but ~37 import-line edits + e2e re-validation per route. The split itself is independent of this decision — the question is what happens after.

5. **Provenance / source-logos / audio-provenance — same DAL, or separate concern?** These are *meta-data about other data* (which mission a photo belongs to, who licensed it, which TTS provider generated it). Lumping them with "fetch mission JSON" feels like a category error. Consider extracting to `$lib/provenance/*` rather than to `provenance-data.ts`.

6. **Cache invalidation policy.** Today the cache lives forever (memory-bound). At static-site scale that's fine; at scale it's a memory leak. Do we want a TTL, an LRU bound, or an explicit "reset on route change"? `__resetCache` only exists for tests today.

7. **The cache singleton as a footgun.** If a domain module accidentally re-declares `const cache = new Map()` (easy copy-paste mistake), it gets its own empty cache and the runtime is silently slow. Worth designing the import shape so this can't happen — e.g. `data-core.ts` exports `cache` only via a `useCache()` accessor that hides the Map identity.

The right answer to these is what the review session should produce **before** the file split lands.

## Proposed module breakdown (illustrative)

The original plan in [docs/guides/non-fly-modules-audit-plan.md](docs/guides/non-fly-modules-audit-plan.md) listed 8 target modules. Realistic count once you include provenance, audio, source logos, small-bodies, site stories, porkchop grids is **10-11**:

| New file | Owns | Approx LOC |
|---|---|---|
| `data-core.ts` | `cache` + `get<T>` + (`__resetCache` umbrella) + (maybe) shared `withLocaleOverlay` | ~50 |
| `missions-data.ts` | `getMission*`, `MissionFilter`, `filterMissions`, `getMissionsForLibrary`, `getMissionGallery`, `getPorkchopGrid` | ~250 |
| `fleet-data.ts` | `getFleet*`, `getFleetGallery`, `getFleetIndex`, `getFleetByCategory` | ~200 |
| `science-data.ts` | `SCIENCE_TABS`, `getScienceTab`, `getScienceSection`, `getScienceLanding`, `getScienceTabIntro` | ~150 |
| `surface-site-data.ts` | `planets`, `getPlanets`, `earthObjects`, `getEarthObjects`, `moonSites`, `getMoonSites`, `marsSites`, `getMarsSites`, `getMarsTraverse`, `getSun`, all surface galleries | ~400 |
| `station-data.ts` | `getIss*` (8 getters), `getTiangong*` (8 getters), station galleries | ~250 |
| `scenarios-data.ts` | `getScenario` | ~30 |
| `rockets-data.ts` | `rockets`, `getRockets` | ~30 |
| `provenance-data.ts` *(maybe → `$lib/provenance/`)* | `getImageProvenanceManifest`, `getImageProvenance`, `getSourceLogos`, `getTextSources`, `getAudioProvenanceManifest`, `getEpisodeSourcesManifest` (+ their module-level singletons) | ~250 |
| `small-bodies-data.ts` | `getSatellites`, `getSatelliteI18n`, `getSatelliteGallery`, `getBelts`, `getBeltI18n`, `getBeltGallery`, `getSmallBodyGallery`, `getSiteStory` | ~200 |

`data.ts` itself becomes a thin barrel that re-exports from each new module so the 37 existing `from '$lib/data'` callers keep working through the transition.

## Risks (in roughly descending order)

1. **Module-level singletons + closures (`provenanceManifest`, `audioProvenance`, `episodeSources`, source logos).** These `let`s have to move with their getter. If they stay in `data.ts` while their getter moves, the getter sees an always-null variable. `__resetCache` clears these today → the umbrella reset has to call each domain's reset hook, which means each domain has to export one. Easy to get wrong, easy to miss in tests because tests rarely exercise the cold path.

2. **Shared `cache` singleton.** All 60+ JSON fetches hit the same Map. Every domain module has to import the *same* `cache` instance from `data-core.ts`, not a copy. Accidental re-declaration in a domain module = silently slow runtime, no test catches it. (See open question #7 — design the import shape so this can't happen.)

3. **Import cycles.** Some getters call others (e.g. `getMission` looks up a destination via `planets()`). If `missions-data.ts` imports from `surface-site-data.ts`, and the surface module ever needs a mission helper, you get a cycle. Vite tolerates cycles imperfectly — sometimes works, sometimes `undefined` exports at runtime. Audit the inter-getter call graph **before** splitting to pick a directed-acyclic layout.

4. **37 caller files import from `$lib/data`.** Barrel-keep strategy mitigates this (callers untouched), but one missed re-export in the barrel breaks call sites silently. Typecheck catches most of it; runtime-only interfaces don't always.

5. **Test ordering + cache leakage.** `src/lib/data.test.ts` calls `__resetCache()` between tests. If the umbrella reset only clears `data-core`'s JSON cache but forgets to reset `provenanceManifest`, a test asserting on a fresh manifest gets a stale one from a previous test.

## Suggested approach

Two-phase, both in this issue:

**Phase 0 — design review (no code).** Answer the 7 open architectural questions above. Land an ADR or RFC capturing the chosen approach. Specifically pin down:
- Cache shape (shared vs per-domain, accessor pattern to avoid foot-gun)
- Module-singleton handling (whose reset hook calls whose)
- Whether provenance + source logos move out of the DAL entirely
- Whether `withLocaleOverlay<T>` becomes a shared helper

**Phase 1 — file split, barrel-keep.** Once the design is locked, mechanical execution:

1. Create `data-core.ts` (`cache`, `get`, umbrella `__resetCache`, maybe `withLocaleOverlay`). `data.ts` imports from it. Verify tests pass.
2. Per-domain extraction, one module per commit. Move exports + matching tests + their module-level state + their reset hook. Run vitest after each. Suggested order: smallest/least-connected first (`rockets`, `scenarios`, then `science`, `fleet`, `missions`, `station`, `surface-site`, `small-bodies`, `provenance` last because it owns the most state).
3. `data.ts` is now a pure barrel — every export re-exported from a domain module.
4. (Optional, separate commits) Sweep callers from `$lib/data` to the per-domain modules. ~37 import-line edits + browser-verify per route. Could be its own follow-up issue if scope creeps.

## Acceptance

- [ ] Phase 0 — ADR/RFC capturing answers to the 7 open questions, signed off by @chipi
- [ ] Phase 1 — all data.ts exports moved to per-domain modules; `data.ts` is a thin barrel
- [ ] `npx svelte-check` 0 errors
- [ ] `npx vitest run` passes (count grows by however many new tests Phase 0 prescribes)
- [ ] `npm run preflight` exit 0
- [ ] Browser-verify every touched route after each domain extraction
- [ ] (Optional follow-up issue or commits) — caller sweep + delete the barrel

## Cross-links

- Parent audit issue: #326 (Batches 1, 2, 4, 5 landed on `fixes`; this Action 4 deferred here)
- Original plan: [docs/guides/non-fly-modules-audit-plan.md](docs/guides/non-fly-modules-audit-plan.md) §Batch 3
- Companion: #324 (closed — /fly state refactor), #325 (raf+Three.js harness)

## Working on this issue

@chipi to schedule a fresh agent session once the seven Batch 1/2/4/5 commits are merged to `main`. Don't start Phase 1 (the file split) before Phase 0 (the design review) is signed off — the plan author already noted "best done with no other in-flight refactors on the same branch," and the architectural questions affect what the right split actually looks like.


New file	Owns	Approx LOC
`data-core.ts`	`cache` + `get<T>` + (`__resetCache` umbrella) + (maybe) shared `withLocaleOverlay`	~50
`missions-data.ts`	`getMission*`, `MissionFilter`, `filterMissions`, `getMissionsForLibrary`, `getMissionGallery`, `getPorkchopGrid`	~250
`fleet-data.ts`	`getFleet*`, `getFleetGallery`, `getFleetIndex`, `getFleetByCategory`	~200
`science-data.ts`	`SCIENCE_TABS`, `getScienceTab`, `getScienceSection`, `getScienceLanding`, `getScienceTabIntro`	~150
`surface-site-data.ts`	`planets`, `getPlanets`, `earthObjects`, `getEarthObjects`, `moonSites`, `getMoonSites`, `marsSites`, `getMarsSites`, `getMarsTraverse`, `getSun`, all surface galleries	~400
`station-data.ts`	`getIss` (8 getters), `getTiangong` (8 getters), station galleries	~250
`scenarios-data.ts`	`getScenario`	~30
`rockets-data.ts`	`rockets`, `getRockets`	~30
`provenance-data.ts` (maybe → `$lib/provenance/`)	`getImageProvenanceManifest`, `getImageProvenance`, `getSourceLogos`, `getTextSources`, `getAudioProvenanceManifest`, `getEpisodeSourcesManifest` (+ their module-level singletons)	~250
`small-bodies-data.ts`	`getSatellites`, `getSatelliteI18n`, `getSatelliteGallery`, `getBelts`, `getBeltI18n`, `getBeltGallery`, `getSmallBodyGallery`, `getSiteStory`	~200

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(data): split data.ts into domain repos + review the DAL architecture #327

What `src/lib/data.ts` is today

Why split

This is more than a mechanical split — it's an architecture review

Proposed module breakdown (illustrative)

Risks (in roughly descending order)

Suggested approach

Acceptance

Cross-links

Working on this issue

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

refactor(data): split data.ts into domain repos + review the DAL architecture #327

Description

What src/lib/data.ts is today

Why split

This is more than a mechanical split — it's an architecture review

Proposed module breakdown (illustrative)

Risks (in roughly descending order)

Suggested approach

Acceptance

Cross-links

Working on this issue

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

What `src/lib/data.ts` is today