
chore: scrap geocoding-based NUTS resolution direction (#45) #70

Merged
bk86a merged 1 commit into main from chore/scrap-geocoding-direction on May 1, 2026


@bk86a bk86a commented May 1, 2026

Summary

Removes the geocoding-based validation workflow and script introduced in e8948b7, and abandons the broader direction explored under #45 (Nominatim/Zippopotam → coordinates → GISCO find-nuts.py for low-confidence postal codes).

Why

First and only run of the daily validation workflow (2026-05-01 report on #69) showed:

  • ~6-7% coverage — only 26 of 391 low-confidence entries got a geocoded NUTS3 result. Nominatim is biased against rural / small-locality postcodes, i.e. the exact postcodes that need a fallback.
  • High noise in the geocoded slice: 8 of the 26 geocoded entries (31%) disagree with our estimate, and those disagreements are dominated by Nominatim mis-geocodes and GISCO border-pixel artefacts, not real estimate errors.
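The two percentages above can be re-derived from the raw counts. A minimal sketch, assuming the 8 is the number of geocoded entries that disagree with the estimate; the helper name `share` and the variable names are illustrative and not part of the repo:

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted from the 2026-05-01 report.
low_confidence = 391  # low-confidence entries in the pool
geocoded = 26         # entries that got a geocoded NUTS3 result
disagreements = 8     # assumed: geocoded entries that disagree with the estimate

coverage_pct = share(geocoded, low_confidence)
disagreement_pct = share(disagreements, geocoded)

print(f"coverage: {coverage_pct:.1f}%")          # coverage: 6.6%
print(f"disagreement: {disagreement_pct:.1f}%")  # disagreement: 30.8%
```

With one run and only 26 geocoded entries, both figures are rough indications rather than stable rates.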

Net signal is not worth the operational cost (daily noise email per run, plus a future hot-path latency hit if the live fallback ever shipped). All three #45 sub-features (validation, live fallback for unknown postcodes, better monitor NUTS estimation) die together — they share the same geocoding mechanism.

If postal-code coverage gaps need addressing again, the agreed alternative is GeoNames postcode bulk data (#54/#56/#57), not geocoding.

Changes

  • Delete .github/workflows/validate-estimates.yml
  • Delete scripts/validate_estimates.py
  • Drop the #45 (happyGISCO outbound geocoding) re-baseline bullet from docs/performance.md

.github/data/validation_state.json was never created (the workflow only ran once via the public output and didn't push state). The repo's existing GISCO TERCET references in app/data_loader.py, app/main.py, README, and CHANGELOG concern the authoritative TERCET flat files and the NUTS region names CSV; they are unrelated to the geocoding direction and are kept.

Closes #45.
Closes #69.

Test plan

  • Confirm .github/workflows/validate-estimates.yml no longer appears in the Actions tab once merged
  • Confirm the next 03:00 UTC tick produces no scheduled run
  • Spot-check docs/performance.md reads cleanly without the dropped bullet
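A local complement to the checks above: a minimal sketch that confirms the two deleted files are gone from a checkout. The function name `still_present` is hypothetical, and this only inspects the working tree; it does not replace looking at the Actions tab or waiting for the 03:00 UTC tick.

```python
from pathlib import Path

# Paths this PR deletes (taken from the Changes list).
REMOVED_PATHS = (
    ".github/workflows/validate-estimates.yml",
    "scripts/validate_estimates.py",
)


def still_present(repo_root, paths=REMOVED_PATHS):
    """Return the subset of `paths` that still exist under `repo_root`."""
    root = Path(repo_root)
    return [p for p in paths if (root / p).exists()]
```

Calling `still_present(".")` from the root of a checkout of main after the merge should return an empty list; any path it returns is a leftover.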

🤖 Generated with Claude Code

The geocoding direction explored under #45 — using Nominatim/Zippopotam
to geocode a postal code to coordinates and then GISCO find-nuts.py to
look up NUTS3 — is abandoned. First and only run of the daily validation
workflow showed Nominatim/Zippopotam coverage at ~6-7% of the
low-confidence pool (Nominatim is biased against rural / small-locality
postcodes — exactly the postcodes that need a fallback) and the
disagreements within the geocoded slice were dominated by
Nominatim mis-geocodes and GISCO border-pixel artefacts rather than
real errors in our estimates. Net signal is not worth the operational
cost.

Removes:
  - .github/workflows/validate-estimates.yml (daily 03:00 UTC validation)
  - scripts/validate_estimates.py (the script the workflow ran)
  - the #45 happyGISCO bullet from docs/performance.md (no longer a
    future re-baseline scenario)

Closes #45 (parent investigation — all three sub-features die together:
validation, live fallback, monitor estimation).
Closes #69 (rolling tracking issue — moot once the workflow is gone).

If postal-code coverage gaps need addressing again, the agreed alternative
direction is GeoNames postcode bulk data (#54/#56/#57), not geocoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bk86a bk86a merged commit 0fe157c into main May 1, 2026
11 checks passed
@bk86a bk86a deleted the chore/scrap-geocoding-direction branch May 1, 2026 09:15


Development

Successfully merging this pull request may close these issues.

  • Daily estimate validation — rolling report
  • feat: investigate using happyGISCO for improved NUTS estimation
