feat(domain-rer): RER d'Île-de-France — first multi-domain ship by Calixteair · Pull Request #62 · Calixteair/kalidoku

Calixteair · 2026-05-13T21:07:05Z

What

First step of phase 3 from the roadmap: add the rer domain pack alongside paris-metro.

230 stations covering RER A–E
31 predicates (letters, length, word count, lines, geo anchors)
fame_score on 136 stations from French Wikipedia pageviews
Generator validated on 100 seeds with 0 failures
Worker auto-discovers the new domain — no extra wiring needed once this lands

Why

paris-metro proved the engine works. Phase 3 measures how cheap it is to clone the pattern for a new dataset. This PR is the answer: about a half-day of work, mostly ingest tooling + tuning the predicate pack until the CSP solver consistently finds grids.

How

5 commits in increasing scope:

1. `09f95c4 feat(worker): bump generator max_attempts 200 → 1000`

The CSP solver in core::generator retries up to max_attempts random predicate selections before giving up. paris-metro is wide enough that 200 always succeeds; rer is denser and was failing 17% of seeds at 200. Bumping to 1000 gets us 0/100 failures and costs sub-second extra runtime per (domain, day).
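The retry behaviour described above can be sketched as follows. This is a minimal Python illustration, not the Rust code in core::generator: `generate_grid`, `count_failures`, the `is_valid` callback and the 3×3 grid size are all hypothetical names and assumptions made for the example.

```python
import random

def generate_grid(predicates, is_valid, seed, max_attempts=1000, size=3):
    """Try up to max_attempts random predicate selections; None means giving up.

    is_valid(rows, cols) is a stand-in for the real CSP feasibility check.
    """
    rng = random.Random(seed)  # seeded so a given (domain, day) is reproducible
    for _ in range(max_attempts):
        picked = rng.sample(list(predicates), 2 * size)
        rows, cols = picked[:size], picked[size:]
        if is_valid(rows, cols):
            return rows, cols
    return None

def count_failures(predicates, is_valid, seeds, max_attempts):
    """Stress-test helper: how many seeds exhaust their attempt budget."""
    return sum(
        generate_grid(predicates, is_valid, seed, max_attempts) is None
        for seed in seeds
    )
```

Running `count_failures` over 100 seeds at two different `max_attempts` values is the shape of comparison reported in this PR (17/100 at 200 versus 0/100 at 1000 for rer).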

2. `b0fbc3c feat(scripts): ingest tooling for the rer domain`

Two stdlib-only Python scripts under scripts/ingest/:

build_rer_dataset.py pulls relation[network=RER] + their stop-role node members from OSM Overpass, deduplicates by canonical name (strips Voie X / Quai X platform suffixes), centroidises per group, packages as entities.json.
fame_score_rer.py skips Wikidata entirely (WDQS rate-limits at 1 req/min since the 2024 outage, which is too aggressive for a batch run). It goes straight to the Wikipedia REST endpoints: summary to resolve <name> / <name> (RER) / Gare de <name>, filtered by topic keyword (word-boundary regex so 'métropole' / 'commune' don't smuggle in unrelated commune pages), then pageviews/per-article for the 365-day sum, which is percentile-ranked.
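As an illustration of the dedup step in build_rer_dataset.py, here is a small sketch of stripping platform suffixes and centroiding per group. The regex and the node dict shape (`name`, `lat`, `lon`, `lines`) are assumptions for the example; the actual script may parse Overpass output differently.

```python
import re
from collections import defaultdict

# Assumed suffix shapes: " - Voie 1", " - Quai B", or the same without the dash.
_PLATFORM_SUFFIX = re.compile(r"\s*(?:-\s*)?\b(?:Voie|Quai)\s+\w+\s*$", re.IGNORECASE)

def canonical_name(raw):
    """Drop a trailing platform suffix so platform nodes share one key."""
    return _PLATFORM_SUFFIX.sub("", raw).strip()

def group_stations(nodes):
    """Collapse platform nodes into one entity per canonical name."""
    groups = defaultdict(list)
    for node in nodes:
        groups[canonical_name(node["name"])].append(node)
    stations = []
    for name in sorted(groups):
        members = groups[name]
        stations.append({
            "name": name,
            # Centroid: plain mean of the member coordinates.
            "geo": {
                "lat": sum(n["lat"] for n in members) / len(members),
                "lon": sum(n["lon"] for n in members) / len(members),
            },
            # Union of the line letters seen on any member node.
            "lines": sorted({line for n in members for line in n["lines"]}),
        })
    return stations
```

A name that merely starts with "Quai" or "Voie" is left alone, because the pattern only matches when the suffix is followed by a single token at the end of the string.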

Coverage: 136/230 stations matched to a frwiki article. The remaining 94 are small banlieue stops without a dedicated page; they stay at fame_score: null and core::scoring treats them as neutral 50.
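A rough sketch of the percentile-rank step and the neutral fallback follows. The exact ranking formula is an assumption (linear rank scaled to 0..100 and rounded, which would explain ties between neighbouring stations); `percentile_rank`, `effective_fame` and `NEUTRAL_FAME` are hypothetical names, not identifiers from core::scoring.

```python
NEUTRAL_FAME = 50  # value the scoring engine is said to use for fame_score: null

def percentile_rank(views):
    """Map {station: 365-day pageview sum} to {station: score in 0..100}."""
    # Ties on views are broken by name so the output is deterministic.
    ordered = sorted(views, key=lambda name: (views[name], name))
    if len(ordered) == 1:
        return {ordered[0]: 100}
    top = len(ordered) - 1
    return {name: round(100 * i / top) for i, name in enumerate(ordered)}

def effective_fame(fame_score):
    """Unmatched stations (None) are treated as neutral rather than as 0."""
    return NEUTRAL_FAME if fame_score is None else fame_score
```

Treating null as 50 rather than 0 keeps the 94 unmatched stations from being ranked as the most obscure answers simply because they lack a frwiki page.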

3. `a490c45 feat(domain-rer): seed pack — 230 stations, 31 predicates`

The artefacts the two scripts produce, committed as-is:

entities.json — 230 stations, sorted by id, with attributes lines, geo, in_paris, zone (placeholder 0 — Wikidata's P5031 to wire later)
predicates.json — mirror of paris-metro with rer tweaks. Two predicates from paris-metro were dropped because coverage was < 8 stations (contains_letter Z = 6, attr_list_size_gte n=3 = 1) — under that threshold the CSP solver thrashes.
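The coverage rule above (drop a predicate when fewer than 8 stations satisfy it) could be checked with something like the sketch below. The predicate representation as plain callables and the helper names are assumptions for illustration; predicates.json uses its own schema.

```python
MIN_COVERAGE = 8  # threshold quoted in this PR

def coverage(predicate, entities):
    """Number of entities that satisfy the predicate."""
    return sum(1 for entity in entities if predicate(entity))

def contains_letter(letter):
    """Hypothetical predicate factory: case-insensitive letter membership."""
    needle = letter.casefold()
    return lambda entity: needle in entity["name"].casefold()

def split_by_coverage(predicates, entities, min_coverage=MIN_COVERAGE):
    """Return (kept, dropped) dicts of predicate name -> coverage count."""
    kept, dropped = {}, {}
    for name, predicate in predicates.items():
        count = coverage(predicate, entities)
        (kept if count >= min_coverage else dropped)[name] = count
    return kept, dropped
```

Reporting the counts alongside the decision makes it easy to see near-misses, such as a predicate sitting at 6 or 7 stations.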
metadata.json — version 0.2.0, both ingestion sources tracked in sources_versions

4. `d72101d feat(server,web): register rer domain + i18n labels`

server/src/main.rs: declares rer in active_domains. Same shape as paris-metro entry. While here, bumps paris-metro version field 0.3.0 → 0.4.0 (the fame snapshot from feat: ingest fame_score on paris-metro (Wikipedia pageviews) #57) and adds duel to its available_modes (was missing since feat: phase 2 finale — originality score + async duel #55).
web/messages/{fr,en}.json: grid_domain_rer label.

Worker auto-discovers any subdirectory of domains/, so once this lands the next nightly cron produces a rer daily grid automatically. No worker config change needed.

A domain picker on the home page is not part of this PR — the URL still hardcodes paris-metro on /. Multi-domain switching will land as a follow-up once we have at least two playable domains in prod (i.e. once this PR ships).

5. `97a43e5 chore: generalise tmp/ gitignore`

Domain-pack scratch dirs are now domains/*/tmp/ (was specifically domains/paris-metro/tmp/). Drops an audit report that slipped in.

Calibration sanity check

Top fame_score on rer:

score	station
100	Musée d'Orsay
99	Gare de Lyon
98	Paris Gare du Nord
97	Paris Austerlitz, Châtelet - Les Halles
96	Massy-Palaiseau
95	Versailles-Chantiers
94	La Défense - Grande Arche, Magenta
93	Juvisy-sur-Orge

Bottom:

score	station
0	Boutigny
1	Sermaise
2	Buno - Gironville
3	Boissise-le-Roi, Villabé

Checklist

Conventional commit titles, no AI / Claude mention
cargo fmt --check passes
cargo clippy --all-targets -- -D warnings passes
cargo test --workspace passes
100-seed generator run on rer: 0/100 failures @ max_attempts=1000
pnpm --dir web lint && typecheck && test passes
No new secret
No contract change (the new pack uses existing schemas)
Docs: docs/agents/agent-f-domains.md already covers generalisation (paris-metro PR), no update needed for this domain

Test plan

After merge + deploy, the worker's next nightly run publishes a rer daily grid
GET /api/grids/rer/today returns a PublicGrid shape
POST /api/games {domain:"rer", mode:"solo"} returns a fresh seed + grid with cells that can be solved
POST /api/duels {domain:"rer"} succeeds, share link plays cleanly
GET /api/leaderboard/rer?period=daily returns an empty list (no plays yet) without erroring
paris-metro flows still work — GET /api/grids/paris-metro/today unchanged

The daily-grid generator backs off after max_attempts CSP attempts. 200 was fine for the paris-metro pack (~36 predicates, 310 entities, broad intersections). Denser packs (RER ships with 230 entities and a narrower set of valid candidates) were occasionally failing publish on their first attempt.

100-seed stress test:

paris-metro @ 200: 0/100 failures
rer @ 200: 17/100 failures
rer @ 1000: 0/100 failures

Each attempt is sub-ms on the runner, so 1000 still costs under a second per (domain, day). No need to expose the value as a config field yet; flip the constant when a future pack needs more.

Two Python scripts (stdlib only, no external deps), mirroring the paris-metro pipeline but adapted to the rer constraints.

scripts/ingest/build_rer_dataset.py
- One Overpass query pulls every relation[type=route][network=RER] in Île-de-France plus their stop-role node members. Members tagged 'stop_entry_only' / 'platform' are intentionally filtered out via the role selector.
- Each station node carries name + geo. The trunk line letter (A..E) is harvested from the relation's ref tag.
- Names like 'Châtelet - Voie 1' / 'Brétigny - Voie 6' are stripped of their platform suffix so the 4-5 platform nodes of a station collapse into one entity.
- Centroid geo per group, in_paris=true when inside the Paris bbox.
- Wikidata's zone tarifaire (P5031) isn't exposed by OSM so 'zone' is left at 0; a follow-up enrich pass can wire it later.

scripts/ingest/fame_score_rer.py
- WDQS rate-limits at 1 req/min for repeat clients since the 2024 outage, which is too aggressive for a 230-entity batch. So this version skips SPARQL entirely and hits the Wikipedia REST endpoints directly:
  1. summary/<name> | summary/<name> (RER) | summary/Gare de <name>: the first 200-OK whose description matches a transit keyword wins. Word-boundary anchored regex so 'métropole' / 'commune' don't smuggle in unrelated commune pages (the false positives that were inflating Issy, Antony, etc.).
  2. /metrics/pageviews/per-article/.../monthly/ → 365-day sum.
  3. percentile-rank → fame_score 0..=100.
- 136/230 stations resolved. The rest are small banlieue stops without a dedicated frwiki page; they stay at fame_score=null and the scoring engine treats them as neutral 50. Coverage gap is acceptable for v1 of the pack.
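To make the word-boundary point concrete, here is a sketch of the article resolution step. The keyword list is an assumption (the script's real list is not shown in this PR), `fetch_description` is a hypothetical callable standing in for the HTTP call to the summary endpoint, and the descriptions in the usage note are invented strings.

```python
import re

# Assumed keyword list, for illustration only.
TRANSIT_KEYWORDS = ("gare", "station", "RER", "métro")

_TRANSIT_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TRANSIT_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def looks_like_transit(description):
    """Word-boundary match: 'métro' must be a whole word, not a prefix."""
    return _TRANSIT_RE.search(description) is not None

def naive_match(description):
    """Substring match, shown only to contrast with the anchored version."""
    lowered = description.casefold()
    return any(k.casefold() in lowered for k in TRANSIT_KEYWORDS)

def candidate_titles(name):
    """Title variants tried in order, as listed in the commit message."""
    return [name, f"{name} (RER)", f"Gare de {name}"]

def resolve_article(name, fetch_description):
    """Return the first candidate title whose description passes the filter.

    fetch_description(title) returns the page description, or None when the
    summary endpoint does not answer 200.
    """
    for title in candidate_titles(name):
        description = fetch_description(title)
        if description is not None and looks_like_transit(description):
            return title
    return None
```

With an invented description such as "commune française de la métropole du Grand Paris", `naive_match` returns True because "métro" is a substring of "métropole", while `looks_like_transit` returns False, so resolution moves on to the next candidate title.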

Bootstrap of the rer domain pack via the two ingest scripts in scripts/ingest/. Shapes:

domains/rer/entities.json (230 stations)
- Pulled from OSM Overpass: every relation[network=RER] in IDF, then stop-role nodes deduped by canonical name (platform suffixes stripped).
- Attributes: lines (subset of {A,B,C,D,E}), geo, in_paris (Paris inner-ring bbox), zone (placeholder 0, Wikidata's P5031 to come).
- fame_score populated on 136 stations (those with an unambiguous frwiki article matching a transit-topic keyword). The remaining 94 small banlieue stops sit at null → core::scoring treats them as neutral 50.

domains/rer/predicates.json (31 predicates)
- Mirror of the paris-metro pack with rer-specific tweaks:
  - 4 ends_with, 4 starts_with, 2 contains_letter (Y, U)
  - name_length_max=7, name_length_min=13, name_length_eq=9
  - name_word_count_eq={1, 3}, name_word_count_min=4
  - attr_list_size_eq=1 (single-line stations only)
  - on_attr_in_set on every individual line A..E + A|B
  - in_paris true/false
  - within_km anchors: Louvre, Tour Eiffel, Gare de Lyon, CDG, Versailles
- Two predicates that paris-metro carries were dropped on rer because coverage was < 8 stations (contains_letter Z = 6, attr_list_size_gte n=3 = 1 station); under that threshold the CSP solver thrashes.

domains/rer/metadata.json
- version 0.2.0 (0.1.0 seeded, 0.2.0 after fame ingest)
- sources_versions records both the Overpass query date and the Wikipedia pageviews window for auditability.

100-seed generator stress test passes at the worker's new 1000 attempts threshold.
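For the within_km geo anchors, a predicate of this kind typically reduces to a great-circle distance check. The sketch below uses the haversine formula; the function names, the entity shape, the approximate anchor coordinates and the radii in the usage note are all assumptions, since the pack's actual radii and evaluation code are not shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_km(entity, anchor, radius_km):
    """True when the entity's geo lies within radius_km of the anchor."""
    geo = entity["geo"]
    return haversine_km(geo["lat"], geo["lon"], anchor["lat"], anchor["lon"]) <= radius_km

# Approximate coordinates, for illustration only.
LOUVRE = {"lat": 48.8606, "lon": 2.3376}
GARE_DE_LYON = {"name": "Gare de Lyon", "geo": {"lat": 48.8443, "lon": 2.3743}}
```

With these approximate coordinates the two points are a little over 3 km apart, so `within_km(GARE_DE_LYON, LOUVRE, 5)` holds and `within_km(GARE_DE_LYON, LOUVRE, 2)` does not.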

server:
- main.rs declares rer in active_domains so /api/domains lists it and the start_game / solo / duel endpoints accept domain='rer'. The Dockerfile already COPYs domains/* into the image; the worker auto-discovers any subdirectory of domains/, so once this PR ships the next daily run produces a rer grid automatically.
- paris-metro version field bumped 0.3.0 → 0.4.0 to match the fame score patch from #57, and 'duel' added to its available_modes (was missing since #55 made duels free).

web:
- grid_domain_rer i18n key in fr + en. The Grid component reads the active domain label from this key; the rest of the UI is domain-agnostic, so no other change is needed.

A domain picker on the home page isn't part of this PR; the URL still hardcodes paris-metro on /. Multi-domain selection will land as a follow-up once we have at least two playable domains in prod.

domains/paris-metro/tmp/ was the original local-only scratch dir for ingest audit reports. The rer domain reuses the same convention so the ignore rule should match any domain. Also drops the accidentally committed domains/rer/tmp/fame_score_report.json (the audit output, not a build artefact).

Adding a new domain in main.rs used to require running INSERT INTO domains (...) manually on prod; otherwise the grids.domain foreign key rejects any solo/daily/duel start_game with a 500. We hit this on rer's first deploy (#62).

Run a one-shot upsert pass from state.domains right after migrations:
- Existing rows get version + active updated to match the code
- New rows are inserted with empty metadata, active=true, created_at=now()

Idempotent, so safe to re-run every boot. A failure is logged and the server keeps booting (downgraded to the previous behaviour) so a typo in the upsert path doesn't block a redeploy.
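The boot-time upsert can be sketched as below. This uses Python's sqlite3 purely as a stand-in so the example is self-contained; the real server is Rust and its database, column names and types are not shown in this PR, so the `domains` schema, `UPSERT_SQL` and `upsert_active_domains` are all assumptions.

```python
import logging
import sqlite3

log = logging.getLogger("boot")

# Assumed schema: domains(id PRIMARY KEY, version, active, metadata, created_at).
UPSERT_SQL = """
INSERT INTO domains (id, version, active, metadata, created_at)
VALUES (?, ?, 1, '{}', CURRENT_TIMESTAMP)
ON CONFLICT(id) DO UPDATE SET
    version = excluded.version,
    active  = excluded.active
"""

def upsert_active_domains(conn, active_domains):
    """Sync the domains table with the (id, version) pairs declared in code.

    Returns False instead of raising, so a broken upsert never blocks boot.
    """
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.executemany(UPSERT_SQL, active_domains)
        return True
    except sqlite3.Error as exc:
        log.error("domain upsert failed, continuing boot: %s", exc)
        return False
```

Because the statement only touches version and active on conflict, re-running it on every boot leaves metadata and created_at of existing rows unchanged.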

Calixteair added 5 commits May 13, 2026 23:05

Calixteair merged commit 652dece into main May 13, 2026
7 checks passed

Calixteair deleted the feat/domain-rer branch May 13, 2026 21:25

github-actions Bot mentioned this pull request May 13, 2026

chore(main): release 1.1.0 #48

Open

Calixteair mentioned this pull request May 13, 2026

feat(server): auto-upsert active domains rows on boot #63

Merged

9 tasks

Calixteair mentioned this pull request May 13, 2026

feat(web): domain picker via /[domain]/* routing + hub at / #64

Merged

15 tasks
