For sources whose file lists are fragile to resolve from the client, namely Lumbra's Gigabase (JS-rendered .7z download links) and the Austrian Bundesliga (files scattered across pages with no consistent structure), add a small server-side manifest to the existing LPDO data service (the one that already caches player normalization, normalise.lpdo.com).
Why
If scraping/URL-reconstruction lives in the shipped daemon, any source-site change breaks every installed copy until users update the app. Server-side, we fix the scraper once and all clients keep working — they just fetch a refreshed manifest. Also: politeness (we hit the source once, fan out a cached list), a place to validate URLs (HEAD-check, sizes, sha, dates), and reuse of existing infra.
Shape
- New endpoint(s): a JSON manifest per source: [{ label, url, covers:[from,to], size, sha256?, published }].
- Server runs source-specific resolvers on a schedule: Lumbra's (resolve the real wp-content/uploads/.../OTB_<era>_v<date>.7z URLs, curate to complete-year OTB), Bundesliga AT (crawl the scattered pages into a clean per-season file list).
- Client side: a generic "manifest-backed feed" driver: GET the manifest, map entries to FeedItems (with covers, reusing the B2 window file-skip), download each file straight from the origin, decompress, import. The service is a thin index, never a mirror.
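A minimal sketch of the client-side mapping, in Python for illustration only. It assumes the manifest dates are ISO-8601 strings and that the B2 window file-skip amounts to a range-overlap check; neither is confirmed here. FeedItem, parse_manifest, items_in_window and the example.invalid URLs are hypothetical stand-ins, not the daemon's real names.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FeedItem:
    # Hypothetical stand-in for the client's real FeedItem type.
    label: str
    url: str
    covers_from: date
    covers_to: date
    size: int
    sha256: Optional[str]  # optional in the manifest ("sha256?")
    published: date


def parse_manifest(text: str) -> List[FeedItem]:
    """Map raw manifest entries to FeedItems (assumes ISO-8601 dates)."""
    items = []
    for entry in json.loads(text):
        start, end = entry["covers"]
        items.append(FeedItem(
            label=entry["label"],
            url=entry["url"],
            covers_from=date.fromisoformat(start),
            covers_to=date.fromisoformat(end),
            size=int(entry["size"]),
            sha256=entry.get("sha256"),
            published=date.fromisoformat(entry["published"]),
        ))
    return items


def items_in_window(items: List[FeedItem], window_from: date,
                    window_to: date) -> List[FeedItem]:
    """Keep only files whose covered range overlaps the import window.

    Assumption: this approximates the B2 window file-skip.
    """
    return [
        item for item in items
        if item.covers_from <= window_to and item.covers_to >= window_from
    ]


# Made-up sample data; the URLs are placeholders, not real origin paths.
SAMPLE = json.dumps([
    {"label": "sample-2000s", "url": "https://example.invalid/a.7z",
     "covers": ["2000-01-01", "2009-12-31"], "size": 1024,
     "published": "2024-01-15"},
    {"label": "sample-2010s", "url": "https://example.invalid/b.7z",
     "covers": ["2010-01-01", "2019-12-31"], "size": 2048,
     "sha256": "0" * 64, "published": "2024-01-15"},
])
```

With an import window of 2015 to 2016, only the second sample entry would be downloaded; the first is skipped without touching the origin.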
Guardrails
- Thin index only: URLs + metadata, never proxy the bytes (esp. for Lumbra's, given its CC BY-NC-SA non-commercial licence, plus bandwidth).
- Not a hard dependency — ship a bundled last-known-good manifest in the app and cache the last fetched one, so a service outage only blocks discovery of new files, not import of known ones.
- Keep the high-level catalog compiled-in (defines what sources exist + how to acquire); only the volatile file list/URLs come from the manifest.
- TWIC + Lichess stay self-resolving (they publish clean indexes) — not routed through the service.
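The "not a hard dependency" guardrail could be sketched as a three-step fallback: fetched, then cached, then bundled. This is an illustration under assumptions, not the daemon's actual code: load_manifest, the injected fetch callable, and the two file paths are hypothetical, and the real client may store its cache differently.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def _parse(text: str) -> List[dict]:
    """Parse a manifest and reject anything that is not a JSON array."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("manifest must be a JSON array")
    return entries


def load_manifest(fetch: Callable[[], str], cache_path: Path,
                  bundled_path: Path) -> Tuple[List[dict], str]:
    """Return (entries, origin); origin is 'fetched', 'cached' or 'bundled'.

    fetch is a hypothetical callable that GETs the manifest text and
    raises OSError (e.g. urllib's URLError, a timeout) on failure.
    """
    # 1. Try the service; on success, refresh the local cache.
    try:
        text = fetch()
        entries = _parse(text)
        cache_path.write_text(text, encoding="utf-8")
        return entries, "fetched"
    except (OSError, ValueError):
        pass  # service down or bad payload: fall through

    # 2. Fall back to the last successfully fetched manifest.
    try:
        return _parse(cache_path.read_text(encoding="utf-8")), "cached"
    except (OSError, ValueError):
        pass  # no cache yet, or cache unreadable

    # 3. Last resort: the last-known-good manifest shipped with the app.
    return _parse(bundled_path.read_text(encoding="utf-8")), "bundled"
```

Because the cache is only overwritten after a payload parses, a malformed response from the service cannot replace a good cached manifest. An outage therefore costs only the discovery of new files.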
Sequencing
B3 ships Lumbra's with a bundled static manifest wired through the manifest-backed driver (proves 7z + bulk import). This issue then moves manifest generation to the service as a fast follow — the client code doesn't change, only where the manifest comes from (bundled → fetched-with-bundled-fallback). Relates to #40.