feat(domain-rer): RER d'Île-de-France — first multi-domain ship #62
Merged
The daily-grid generator gives up after max_attempts CSP attempts. 200 was fine for the paris-metro pack (~36 predicates, 310 entities, broad intersections). Denser packs (RER ships with 230 entities and a narrower set of valid candidates) were occasionally failing publish on their first attempt.

100-seed stress test:
- paris-metro @ 200: 0/100 failures
- rer @ 200: 17/100 failures
- rer @ 1000: 0/100 failures

Each attempt is sub-ms on the runner, so 1000 still costs under a second per (domain, day). No need to expose the value as a config field yet; flip the constant when a future pack needs more.
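The retry budget and the seed sweep described above can be sketched roughly as follows. This is an illustration only, not the code in core::generator: `generate_grid`, `stress_test` and `toy_attempt` are hypothetical names, and the toy attempt simply succeeds with a small fixed probability in place of a real CSP attempt.

```python
import random
from typing import Callable, Optional


def generate_grid(seed: int, max_attempts: int,
                  try_once: Callable[[random.Random], Optional[list]]) -> Optional[list]:
    """Retry a randomised attempt up to max_attempts times; None means the generator gave up."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        grid = try_once(rng)
        if grid is not None:
            return grid
    return None


def stress_test(seeds: range, max_attempts: int,
                try_once: Callable[[random.Random], Optional[list]]) -> int:
    """Count the seeds for which no grid was found within the attempt budget."""
    return sum(generate_grid(s, max_attempts, try_once) is None for s in seeds)


def toy_attempt(rng: random.Random) -> Optional[list]:
    # Stand-in for one CSP attempt (assumption): succeeds with a 1% chance.
    return ["grid"] if rng.random() < 0.01 else None


if __name__ == "__main__":
    for budget in (200, 1000):
        failures = stress_test(range(100), budget, toy_attempt)
        print(f"budget={budget}: {failures}/100 failures")
```

Because each seed replays the same random sequence, a seed that succeeds within 200 attempts also succeeds within 1000, so raising the budget can only lower the failure count.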
Two Python scripts (stdlib only, no external deps), mirroring the
paris-metro pipeline but adapted to the rer constraints.
scripts/ingest/build_rer_dataset.py
- One Overpass query pulls every relation[type=route][network=RER] in
Île-de-France plus their stop-role node members. Members tagged
'stop_entry_only' / 'platform' are intentionally filtered out via
the role selector.
- Each station node carries name + geo. The trunk line letter (A..E)
is harvested from the relation's ref tag.
- Names like 'Châtelet - Voie 1' / 'Brétigny - Voie 6' are stripped
of their platform suffix so the 4-5 platform nodes of a station
collapse into one entity.
- Centroid geo per group, in_paris=true when inside the Paris bbox.
- Wikidata's zone tarifaire (P5031) isn't exposed by OSM so 'zone' is
left at 0; a follow-up enrich pass can wire it later.
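As a rough sketch of the dedup step described in the bullets above (not the actual build_rer_dataset.py), the snippet below strips a platform suffix, groups nodes by canonical name and averages their coordinates. The suffix pattern, the `PARIS_BBOX` values and the input dict shape are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from statistics import fmean

# Assumed suffix shape: " - Voie 6" or " - Quai 2"; the real script may handle more variants.
PLATFORM_SUFFIX = re.compile(r"\s*-\s*(?:Voie|Quai)\s+\w+$", re.IGNORECASE)

# Hypothetical bounding box (south, west, north, east); not the script's actual values.
PARIS_BBOX = (48.815, 2.224, 48.902, 2.470)


def canonical_name(raw: str) -> str:
    """Drop a trailing platform suffix so all platforms of a station share one name."""
    return PLATFORM_SUFFIX.sub("", raw).strip()


def in_bbox(lat: float, lon: float, bbox=PARIS_BBOX) -> bool:
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def collapse(nodes):
    """nodes: dicts with name, lat, lon, line. Returns one entity per canonical name."""
    groups = defaultdict(list)
    for node in nodes:
        groups[canonical_name(node["name"])].append(node)
    entities = []
    for name, members in sorted(groups.items()):
        lat = fmean(m["lat"] for m in members)  # centroid of the platform nodes
        lon = fmean(m["lon"] for m in members)
        entities.append({
            "name": name,
            "lines": sorted({m["line"] for m in members}),
            "geo": {"lat": lat, "lon": lon},
            "in_paris": in_bbox(lat, lon),
            "zone": 0,  # placeholder, as in the commit message
        })
    return entities
```

Anchoring the pattern on the `Voie` / `Quai` keyword keeps hyphenated station names such as 'Saint-Michel - Notre-Dame' intact.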
scripts/ingest/fame_score_rer.py
- WDQS rate-limits at 1 req/min for repeat clients since the 2024
outage — too aggressive for a 230-entity batch. So this version
skips SPARQL entirely and hits the Wikipedia REST endpoints
directly:
1. summary/<name> | summary/<name> (RER) | summary/Gare de <name>
— first 200-OK whose description matches a transit keyword
wins. Word-boundary anchored regex so 'métropole' / 'commune'
don't smuggle in unrelated commune pages (the false positives
that were inflating Issy, Antony, etc.).
2. /metrics/pageviews/per-article/.../monthly/ → 365-day sum.
3. percentile-rank → fame_score 0..=100.
- 136/230 stations resolved (the rest are small banlieue stops
without a dedicated frwiki page) — they stay at fame_score=null and
the scoring engine treats them as neutral 50. Coverage gap is
acceptable for v1 of the pack.
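The three steps above could look roughly like this offline sketch, which leaves out the HTTP calls. The keyword list in `TRANSIT_RE`, the helper names and the exact percentile formula are assumptions rather than the contents of fame_score_rer.py; `total_views` assumes the pageviews response carries an `items` list with one `views` count per month.

```python
import re
from bisect import bisect_left
from typing import Dict, List, Optional

# Assumed keyword list; the script's real keywords are not given in the text.
TRANSIT_RE = re.compile(r"\b(?:gare|station|RER|métro)\b", re.IGNORECASE)


def candidate_titles(name: str) -> List[str]:
    """Titles tried in order; the first page with a transit description wins."""
    return [name, f"{name} (RER)", f"Gare de {name}"]


def is_transit(description: str) -> bool:
    # \b keeps 'métro' from matching inside 'métropole'.
    return TRANSIT_RE.search(description) is not None


def total_views(payload: dict) -> int:
    """Sum the monthly view counts of one pageviews response (assumed shape)."""
    return sum(item["views"] for item in payload.get("items", []))


def percentile_scores(views: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """Map each known view count to 0..=100 by rank; unresolved stations stay None."""
    known = sorted(v for v in views.values() if v is not None)
    n = len(known)
    scores: Dict[str, Optional[int]] = {}
    for name, v in views.items():
        if v is None:
            scores[name] = None
        elif n == 1:
            scores[name] = 50
        else:
            scores[name] = round(100 * bisect_left(known, v) / (n - 1))
    return scores


def effective_fame(score: Optional[int]) -> int:
    """Neutral 50 for stations without a score, as the scoring engine is described to do."""
    return 50 if score is None else score
```

Ranking against only the resolved stations means the 94 unresolved ones do not pull the distribution down; they are handled separately through the neutral default.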
Bootstrap of the rer domain pack via the two ingest scripts in
scripts/ingest/. Shapes:
domains/rer/entities.json (230 stations)
- Pulled from OSM Overpass: every relation[network=RER] in IDF, then
stop-role nodes deduped by canonical name (platform suffixes
stripped).
- Attributes: lines (subset of {A,B,C,D,E}), geo, in_paris (Paris
inner-ring bbox), zone (placeholder 0 — Wikidata's P5031 to come).
- fame_score populated on 136 stations (those with an unambiguous
frwiki article matching a transit-topic keyword). The remaining 94
small banlieue stops sit at null → core::scoring treats them as
neutral 50.
domains/rer/predicates.json (31 predicates)
- Mirror of the paris-metro pack with rer-specific tweaks:
- 4 ends_with, 4 starts_with, 2 contains_letter (Y, U)
- name_length_max=7, name_length_min=13, name_length_eq=9
- name_word_count_eq={1, 3}, name_word_count_min=4
- attr_list_size_eq=1 (single-line stations only)
- on_attr_in_set on every individual line A..E + A|B
- in_paris true/false
- within_km anchors: Louvre, Tour Eiffel, Gare de Lyon, CDG,
Versailles
- Two predicates that paris-metro carries were dropped on rer because
coverage was < 8 stations (contains_letter Z = 6, attr_list_size_gte
n=3 = 1 station) — under that threshold the CSP solver thrashes.
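The coverage rule described above can be checked mechanically. The sketch below is a hypothetical illustration: the predicate factories and the entity shape are assumptions, and only the threshold of 8 comes from the commit message.

```python
from typing import Callable, Dict, List, Tuple

MIN_COVERAGE = 8  # threshold quoted in the commit message

Entity = dict
Predicate = Callable[[Entity], bool]


def contains_letter(letter: str) -> Predicate:
    return lambda e: letter.lower() in e["name"].lower()


def attr_list_size_gte(attr: str, n: int) -> Predicate:
    return lambda e: len(e[attr]) >= n


def coverage(pred: Predicate, entities: List[Entity]) -> int:
    """Number of entities the predicate accepts."""
    return sum(1 for e in entities if pred(e))


def split_by_coverage(predicates: Dict[str, Predicate], entities: List[Entity],
                      min_cov: int = MIN_COVERAGE) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (kept, dropped), each mapping predicate id to its coverage count."""
    kept: Dict[str, int] = {}
    dropped: Dict[str, int] = {}
    for pid, pred in predicates.items():
        count = coverage(pred, entities)
        (kept if count >= min_cov else dropped)[pid] = count
    return kept, dropped
```

Running such a check against entities.json before committing a pack would surface thin predicates like contains_letter Z without waiting for the generator stress test to fail.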
domains/rer/metadata.json
- version 0.2.0 (0.1.0 seeded, 0.2.0 after fame ingest)
- sources_versions records both the Overpass query date and the
Wikipedia pageviews window for auditability.
100-seed generator stress test passes at the worker's new 1000
attempts threshold.
server:
- main.rs declares rer in active_domains so /api/domains lists it and the start_game / solo / duel endpoints accept domain='rer'. The Dockerfile already COPYs domains/* into the image; the worker auto-discovers any subdirectory of domains/, so once this PR ships the next daily run produces a rer grid automatically.
- paris-metro version field bumped 0.3.0 → 0.4.0 to match the fame score patch from #57, and 'duel' added to its available_modes (was missing since #55 made duels free).

web:
- grid_domain_rer i18n key in fr + en. The Grid component reads the active domain label from this key; the rest of the UI is domain-agnostic, so no other change is needed.

A domain picker on the home page isn't part of this PR: the URL still hardcodes paris-metro on /. Multi-domain selection will land as a follow-up once we have at least two playable domains in prod.
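For readers unfamiliar with the auto-discovery convention, here is a minimal Python sketch of the idea; the real worker is not written in Python and its rules are not shown here. `discover_domains` is a hypothetical name, and requiring the three JSON files is an added assumption (the text only says any subdirectory of domains/ is picked up).

```python
from pathlib import Path
from typing import List

# Assumption: a directory counts as a domain pack only if these files exist.
REQUIRED_FILES = ("entities.json", "predicates.json", "metadata.json")


def discover_domains(root: Path) -> List[str]:
    """Return the names of subdirectories of root that look like domain packs."""
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and all((child / f).is_file() for f in REQUIRED_FILES)
    )
```

Called as `discover_domains(Path("domains"))`, this would list rer next to paris-metro as soon as the pack directory is present in the image.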
domains/paris-metro/tmp/ was the original local-only scratch dir for ingest audit reports. The rer domain reuses the same convention, so the ignore rule now matches any domain (domains/*/tmp/). Also drops the accidentally committed domains/rer/tmp/fame_score_report.json (an audit output, not a build artefact).
Calixteair added a commit that referenced this pull request on May 13, 2026
Adding a new domain in main.rs used to require running INSERT INTO domains (...) manually on prod; otherwise the grids.domain foreign key rejects any solo/daily/duel start_game with a 500. We hit this on rer's first deploy (#62).

Run a one-shot upsert pass right after migrations from state.domains:
- Existing rows get version + active updated to match the code
- New rows are inserted with empty metadata, active=true, created_at=now()

Idempotent, so safe to re-run every boot. A failure is logged and the server keeps booting (falling back to the previous behaviour) so a typo in the upsert path doesn't block a redeploy.
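The idempotent upsert can be illustrated with a small self-contained sketch. It uses Python's sqlite3 as a stand-in for the production database, so the SQL dialect, the `id` key column and the `sync_domains` name are assumptions; only the columns version, active, metadata and created_at come from the description above.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    id TEXT PRIMARY KEY,
    version TEXT NOT NULL,
    active INTEGER NOT NULL,
    metadata TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""

# New rows get empty metadata and a fresh created_at; existing rows only
# have version and active refreshed, so re-running changes nothing else.
UPSERT = """
INSERT INTO domains (id, version, active, metadata, created_at)
VALUES (?, ?, ?, '{}', ?)
ON CONFLICT(id) DO UPDATE SET
    version = excluded.version,
    active = excluded.active
"""


def sync_domains(conn: sqlite3.Connection, domains: list) -> bool:
    """Upsert every declared domain; log and return False instead of raising."""
    now = datetime.now(timezone.utc).isoformat()
    rows = [(d["id"], d["version"], int(d.get("active", True)), now) for d in domains]
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(UPSERT, rows)
        return True
    except sqlite3.Error as exc:
        print(f"domain upsert failed, continuing boot: {exc}")
        return False
```

Returning a flag rather than raising mirrors the stated choice to keep booting when the upsert fails.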
What
First step of phase 3 from the roadmap: add the rer domain pack alongside paris-metro.
Why
paris-metro proved the engine works. Phase 3 measures how cheap it is to clone the pattern for a new dataset. This PR is the answer: about a half-day of work, mostly ingest tooling + tuning the predicate pack until the CSP solver consistently finds grids.
How
5 commits in increasing scope:

1. 09f95c4 feat(worker): bump generator max_attempts 200 → 1000

The CSP solver in `core::generator` retries up to `max_attempts` random predicate selections before giving up. paris-metro is wide enough that 200 always succeeds; rer is denser and was failing 17% of seeds at 200. Bumping to 1000 gets us 0/100 failures and costs sub-second extra runtime per (domain, day).

2. b0fbc3c feat(scripts): ingest tooling for the rer domain

Two stdlib-only Python scripts under `scripts/ingest/`:
- `build_rer_dataset.py` pulls `relation[network=RER]` + their `stop`-role node members from OSM Overpass, deduplicates by canonical name (strips `Voie X` / `Quai X` platform suffixes), centroidises per group, packages as `entities.json`.
- `fame_score_rer.py` skips Wikidata entirely (WDQS rate-limits at 1 req/min since the 2024 outage, too aggressive for batch). Goes straight at the Wikipedia REST endpoints: `summary` to resolve `<name>` / `<name> (RER)` / `Gare de <name>`, filters by topic keyword (word-boundary regex so 'métropole' / 'commune' don't smuggle in unrelated commune pages), then `pageviews/per-article` for the 365-day sum, percentile-ranks.

Coverage: 136/230 stations matched to a frwiki article. The remaining 94 are small banlieue stops without a dedicated page; they stay at `fame_score: null` and `core::scoring` treats them as neutral 50.

3. a490c45 feat(domain-rer): seed pack — 230 stations, 31 predicates

The artefacts the two scripts produce, committed as-is:
- `entities.json`: 230 stations, sorted by id, with attributes `lines`, `geo`, `in_paris`, `zone` (placeholder 0, Wikidata's `P5031` to wire later)
- `predicates.json`: mirror of paris-metro with rer tweaks. Two predicates from paris-metro were dropped because coverage was < 8 stations (`contains_letter Z` = 6, `attr_list_size_gte n=3` = 1); under that threshold the CSP solver thrashes.
- `metadata.json`: version 0.2.0, both ingestion sources tracked in `sources_versions`

4. d72101d feat(server,web): register rer domain + i18n labels

- `server/src/main.rs`: declares `rer` in `active_domains`. Same shape as paris-metro entry. While here, bumps paris-metro version field 0.3.0 → 0.4.0 (the fame snapshot from #57) and adds `duel` to its `available_modes` (was missing since #55).
- `web/messages/{fr,en}.json`: `grid_domain_rer` label.

Worker auto-discovers any subdirectory of `domains/`, so once this lands the next nightly cron produces a rer daily grid automatically. No worker config change needed.

A domain picker on the home page is not part of this PR: the URL still hardcodes paris-metro on `/`. Multi-domain switching will land as a follow-up once we have at least two playable domains in prod (i.e. once this PR ships).

5. 97a43e5 chore: generalise tmp/ gitignore

Domain-pack scratch dirs are now `domains/*/tmp/` (was specifically `domains/paris-metro/tmp/`). Drops an audit report that slipped in.

Calibration sanity check
Top fame_score on rer:
Bottom:
Checklist
- `cargo fmt --check` passes
- `cargo clippy --all-targets -- -D warnings` passes
- `cargo test --workspace` passes
- `pnpm --dir web lint && typecheck && test` passes
- `docs/agents/agent-f-domains.md` already covers generalisation (paris-metro PR), no update needed for this domain

Test plan
- `rer` daily grid
- `GET /api/grids/rer/today` returns a `PublicGrid` shape
- `POST /api/games {domain:"rer", mode:"solo"}` returns a fresh seed + grid with cells that can be solved
- `POST /api/duels {domain:"rer"}` succeeds, share link plays cleanly
- `GET /api/leaderboard/rer?period=daily` returns an empty list (no plays yet) without erroring
- `GET /api/grids/paris-metro/today` unchanged