Releases: bk86a/PostalCode2NUTS
v0.17.1 — libsql Hrana wire-protocol fix
Fixed
- TokenDB wire protocol (#61): the v0.17.0 client assumed a generic `POST /query` body shape; the actual deployment target speaks libsql/Hrana v2 (`POST /v2/pipeline` with statements wrapped as `{requests: [{type: "execute", stmt: {sql, args}}]}` and rows returned as arrays of typed value objects). `TokenDB.execute` now speaks Hrana correctly, automatically rewrites `libsql://` URLs to `https://`, and accepts a Bearer auth token via the new `PC2NUTS_TOKEN_DB_AUTH_TOKEN` env var (and matching `--auth-token` CLI flag).
Verified end-to-end against a real database instance: typed-value round-trip (str/int/null/float/blob), schema initialisation (idempotent), add with RETURNING id, idempotent revoke, and list_active/list_all projections.
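The request shape and typed-value round-trip described above can be sketched as follows. This is an illustration only, not the project's `TokenDB` code: the helper names are hypothetical, and the typed-value field names (`type`, `value`, `base64`, integers carried as strings) follow my reading of the Hrana-over-HTTP format and should be checked against the protocol specification.

```python
import base64


def to_http_url(url: str) -> str:
    """Rewrite a libsql:// URL to https://; other schemes pass through unchanged."""
    prefix = "libsql://"
    return "https://" + url[len(prefix):] if url.startswith(prefix) else url


def encode_value(value):
    """Encode a Python value as a typed-value object (assumed Hrana shape)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # bool is an int subclass; send it as 0/1
        return {"type": "integer", "value": str(int(value))}
    if isinstance(value, int):
        return {"type": "integer", "value": str(value)}  # integers travel as strings
    if isinstance(value, float):
        return {"type": "float", "value": value}
    if isinstance(value, str):
        return {"type": "text", "value": value}
    if isinstance(value, (bytes, bytearray)):
        return {"type": "blob", "base64": base64.b64encode(bytes(value)).decode("ascii")}
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def decode_value(obj):
    """Decode a typed-value object back into a Python value."""
    kind = obj["type"]
    if kind == "null":
        return None
    if kind == "integer":
        return int(obj["value"])
    if kind == "float":
        return float(obj["value"])
    if kind == "text":
        return obj["value"]
    if kind == "blob":
        data = obj["base64"]
        # Tolerate base64 with or without trailing padding.
        return base64.b64decode(data + "=" * (-len(data) % 4))
    raise ValueError(f"unknown value type: {kind}")


def build_pipeline_body(sql: str, args=()):
    """Wrap one statement in the body sent to POST /v2/pipeline."""
    return {
        "requests": [
            {"type": "execute", "stmt": {"sql": sql, "args": [encode_value(a) for a in args]}}
        ]
    }
```

The body would be sent as JSON with an `Authorization: Bearer <auth-token>` header; the HTTP call itself is left out so the sketch stays independent of any particular client library.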
Activation
To bring the feature live in production, the operator must (one-time):
- Provision a libsql-compatible database with the configured provider; obtain the `libsql://` URL and the auth token.
- Set both `PC2NUTS_TOKEN_DB_URL` and `PC2NUTS_TOKEN_DB_AUTH_TOKEN` on the running container's environment configuration.
- Restart the container.
- From an operator laptop with both env vars set: `python -m scripts.tokens init` (creates schema, idempotent).
- Migrate any existing v1 env-var tokens with `python -m scripts.tokens add --label "..." --value "<existing-token>"` (preserves the audit `token_id`).
v0.17.0 — DB-backed trusted tokens
What's new
- DB-backed trusted tokens (#61): trusted-token storage moved from the `PC2NUTS_TRUSTED_TOKENS` env var to a managed SQLite-compatible HTTP database. Configure with `PC2NUTS_TOKEN_DB_URL`. Tokens issued via `python -m scripts.tokens add --label "..."` take effect within ~60 s (configurable via `PC2NUTS_TOKEN_REFRESH_SECONDS`); no container restart is required. The env var continues to work as a union with the DB and serves as a disaster-recovery fallback when the DB is unreachable.
- Operator CLI at `python -m scripts.tokens`: subcommands `init`, `add`, `list`, `revoke`. `add --value <token>` preserves audit-id continuity when migrating v1 env-var tokens. `--value` is validated to be ≥32 hex chars to prevent accidentally minting weak tokens.
- `/health` field `token_db_stale`: flags when the most recent DB refresh failed and the in-memory set is stale.
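A minimal sketch of the refresh behaviour described above, assuming a periodic pull from the database, a union with the env-var tokens, and a staleness flag that is set when a refresh fails. The class and parameter names are hypothetical and not taken from the project; the error type caught (`OSError`) is likewise an assumption.

```python
import time
from typing import Callable, Iterable, Optional


class TrustedTokenSet:
    """Union of env-var tokens and a periodically refreshed DB snapshot."""

    def __init__(
        self,
        env_tokens: Iterable[str],
        fetch_db_tokens: Callable[[], Iterable[str]],
        refresh_seconds: float = 60.0,
    ) -> None:
        self._env = frozenset(env_tokens)
        self._fetch = fetch_db_tokens
        self._refresh_seconds = refresh_seconds
        self._db: frozenset = frozenset()
        self._last_attempt: Optional[float] = None
        self.stale = False  # would back a health field such as token_db_stale

    def refresh(self) -> None:
        try:
            self._db = frozenset(self._fetch())
            self.stale = False
        except OSError:
            # DB unreachable: keep the last good snapshot and flag it as stale.
            self.stale = True

    def is_trusted(self, token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        due = self._last_attempt is None or now - self._last_attempt >= self._refresh_seconds
        if due:
            self._last_attempt = now
            self.refresh()
        # Env-var tokens stay valid even when the DB cannot be reached.
        return token in self._env or token in self._db
```

Refreshing lazily on lookup keeps the sketch free of background threads; a real service might refresh from a scheduled task instead.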
Backwards compatibility
Fully backwards-compatible with v0.16.0. With `PC2NUTS_TOKEN_DB_URL` unset, behaviour is identical to v0.16.0. The `PC2NUTS_TRUSTED_TOKENS` env var is not deprecated in this release.
Operator runbook
See README → Authentication & rate-limit bypass.
Activation
The feature is live in code but requires a one-time database setup before it takes effect: provision a database instance, set `PC2NUTS_TOKEN_DB_URL`, and run `python -m scripts.tokens init`. Until then, behaviour is unchanged from v0.16.0.
Closes #61.
v0.16.0 — Auth-token rate-limit bypass
What's new
- Auth-token bypass (#60): trusted callers can bypass the per-IP rate limit by presenting `Authorization: Bearer <token>`. Tokens are managed via the new `PC2NUTS_TRUSTED_TOKENS` comma-separated env var (restart applies changes). Invalid tokens return `401`; malformed `Authorization` headers return `400`. When the env var is unset/empty, the feature is fully disabled and any `Authorization` header is ignored, so behaviour is identical to pre-feature.
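The decision table above can be sketched as a small pure function. This is an illustration under assumptions: the function name is hypothetical, and what counts as "malformed" (anything other than `Bearer` followed by exactly one token) is my reading rather than the project's documented rule.

```python
import hmac
from typing import AbstractSet, Optional, Tuple


def check_authorization(
    header: Optional[str], trusted: AbstractSet[str]
) -> Tuple[Optional[int], bool]:
    """Return (error_status, bypass_rate_limit) for one request."""
    if not trusted:
        return None, False  # feature disabled: any Authorization header is ignored
    if header is None:
        return None, False  # anonymous caller: normal per-IP rate limit applies
    parts = header.split()
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return 400, False  # malformed header
    presented = parts[1].encode("utf-8")
    # Constant-time comparison against each trusted token.
    if any(hmac.compare_digest(presented, t.encode("utf-8")) for t in trusted):
        return None, True
    return 401, False  # well-formed but unknown token
```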
Audit logging
Trusted requests are recorded in the access log with a `token_id=<8hex>` field: the first 8 hex chars of `sha256(token)`. Token values themselves never appear in logs. To match a logged token id back to a token: `echo -n "<token>" | sha256sum | cut -c1-8`.
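The same identifier can be computed in Python; a short sketch equivalent to the shell one-liner above (the function name is illustrative):

```python
import hashlib


def token_id(token: str) -> str:
    """First 8 hex characters of the SHA-256 digest of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]
```

Like `echo -n`, this hashes the token without a trailing newline, so both approaches yield the same id.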
Operator runbook
See README → Authentication & rate-limit bypass for copy-pasteable steps to issue, verify, revoke, and disable tokens.
Out of scope
No per-token quotas, OAuth, JWT, signatures, or self-service signup — opaque bearer tokens with full exemption only. Future iterations can add quotas if usage demands.
Closes #60.
v0.15.0 — Montenegro support
What's new
- Montenegro (ME) support (#53): postal-code lookups for Montenegro return `ME000`/`ME00`/`ME0` via the existing single-NUTS3 fallback (Tier 5). Eurostat treats Montenegro as a single nationwide unit at every NUTS level, and GISCO publishes no TERCET file for it; ME is therefore served entirely from the new `single_nuts3_fallback` map in `app/settings.json` (no external data download). Pattern: 5 digits starting with `8`, optional `ME-` / `ME ` prefix accepted.
- `single_nuts3_fallback` settings field: data-driven seed for the Tier 5 single-NUTS3 set, allowing countries with no GISCO TERCET coverage but a single nationwide NUTS3 unit to be added via configuration alone. Auto-detected single-NUTS3 entries derived from real data take precedence on conflict. Pre-positions cleanly for #55 (Faroe Islands).
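A rough sketch of how a single-NUTS3 fallback lookup for Montenegro might work. The map layout, regex, and function name are assumptions made for illustration; the actual structure in `app/settings.json` and the curated ME pattern may differ.

```python
import re
from typing import Dict, Optional

# Hypothetical shape of the `single_nuts3_fallback` map: country -> NUTS3 code.
SINGLE_NUTS3_FALLBACK = {"ME": "ME000"}

# 5 digits starting with 8, with an optional "ME-" or "ME " prefix.
ME_PATTERN = re.compile(r"^(?:ME[- ])?(8\d{4})$")


def lookup_single_nuts3(country: str, postal_code: str) -> Optional[Dict[str, str]]:
    """Return NUTS1/2/3 for a country with one nationwide NUTS3 unit."""
    nuts3 = SINGLE_NUTS3_FALLBACK.get(country)
    if nuts3 is None:
        return None
    if country == "ME" and not ME_PATTERN.match(postal_code.strip().upper()):
        return None
    # NUTS codes are hierarchical prefixes: NUTS1 = 3 chars, NUTS2 = 4 chars.
    return {"nuts1": nuts3[:3], "nuts2": nuts3[:4], "nuts3": nuts3}
```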
Changed
- `patterns_version` bumped to 1.1 (additive change — new ME entry, no existing pattern altered).
- `get_loaded_countries()` now includes countries served only via the single-NUTS3 fallback, so `/lookup` accepts them without a 400.
Coverage
35 countries (was 34): EU-27 + EFTA-4 + four candidates (MK, ME, RS, TR).
Closes #53.
v0.14.0
v0.13.0
What's Changed
Added
- Automated test suite (#25): 69 pytest tests covering `postal_patterns.py` (preprocessing, tercet_map, extraction), `data_loader.py` (normalize functions, all 5 lookup tiers), and FastAPI endpoints (`/lookup`, `/pattern`, `/health`). CI now runs tests before publish.
- Makefile (#24): standard targets for `lint`, `format`, `test`, `run`, `docker-build`, `docker-run`.
- Pre-commit hooks (#24): ruff lint + format via `.pre-commit-config.yaml`.
- `requirements-dev.txt` (#22): dev/test dependencies (ruff, bandit, pip-audit, pytest).
- `ruff format` CI check (#24): enforces consistent code formatting in CI.
Changed
- Centralized duplicated logic (#22): `normalize_country()` replaces duplicate GR→EL blocks, `_db_connection()` context manager replaces 6 manual SQLite connect/close patterns, `_build_result()` helper replaces repetitive result dict construction across all lookup tiers.
- Narrowed exception handling (#23): 9 bare `except Exception` blocks in `data_loader.py` replaced with specific types (`sqlite3.Error`, `httpx.RequestError`, `OSError`, `csv.Error`, etc.). Silent catch in `import_estimates.py` now logs a message.
- Return type hints added to `dispatch()` and `_rate_limit_handler()` in `main.py`.
- Branch protection enabled on `main`: required status checks + PR reviews.
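For illustration, a connection context manager and a narrowed `except` clause in the spirit of the changes above might look like this. It is a sketch only: the names `db_connection` and `count_rows` are hypothetical and the project's `_db_connection()` may be implemented differently.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def db_connection(path: str) -> Iterator[sqlite3.Connection]:
    """Open a SQLite connection and always close it, even on error."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()


def count_rows(path: str, table: str) -> Optional[int]:
    """Count rows in a table, returning None on database errors."""
    try:
        with db_connection(path) as conn:
            # The table name is treated as trusted input in this sketch.
            return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    except sqlite3.Error:
        # Catch only database errors instead of a blanket `except Exception`.
        return None
```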
Full Changelog: v0.12.0...v0.13.0
v0.12.0
What's Changed
Fixed
- MT regex (#14): separator between alpha prefix and digits is now optional (`MST1000` accepted alongside `MST 1000` and `MST-1000`). Previously, codes without a space failed regex extraction and fell to approximate matching with lower confidence.
Added
- Country-level majority-vote fallback: new Tier 4 in the lookup chain for countries where all postal codes map to the same NUTS1/NUTS2 but NUTS3 has a dominant winner. Returns `match_type: "approximate"` with NUTS1/NUTS2 confidence 1.0 and NUTS3 confidence based on agreement ratio (capped at 0.80). Naturally captures MT (MT0/MT00/MT001 at ~77%). Digit-only MT codes like `1043` that previously returned 404 now get a valid approximate result.
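The confidence arithmetic for this tier can be sketched as below. The function name and result field names are hypothetical, and the sketch applies no minimum threshold for what counts as a "dominant" winner, since the release notes do not state one.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Union


def majority_vote(
    nuts3_codes: Iterable[str], cap: float = 0.80
) -> Optional[Dict[str, Union[str, float]]]:
    """Pick the most common NUTS3 code for a country and score the agreement."""
    counts = Counter(nuts3_codes)
    if not counts:
        return None
    # The tier only applies when every code shares one NUTS2 (first 4 characters).
    if len({code[:4] for code in counts}) != 1:
        return None
    winner, votes = counts.most_common(1)[0]
    ratio = votes / sum(counts.values())
    return {
        "match_type": "approximate",
        "nuts1": winner[:3],
        "nuts1_confidence": 1.0,
        "nuts2": winner[:4],
        "nuts2_confidence": 1.0,
        "nuts3": winner,
        "nuts3_confidence": min(ratio, cap),  # agreement ratio, capped
    }
```

With 77 of 100 codes agreeing, the NUTS3 confidence is 0.77; with 95 of 100 it is capped at 0.80.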
Full Changelog: v0.11.0...v0.12.0
v0.11.0
Added
- FR CEDEX estimates (#8): ~511 French CEDEX postal codes (enterprise/university mail routing) added to `tercet_missing_codes.csv` with high-confidence département→NUTS3 mappings.
- FR DOM-TOM estimates (#9): 15 French overseas territory postal codes (Guadeloupe, Martinique, Guyane, La Réunion, Mayotte) added with high-confidence mappings. French Polynesia (987xx) and New Caledonia (988xx) excluded: these are OCTs with no valid NUTS mapping.
- NL missing code estimates (#13): 8 Dutch postal codes for major cities (Amsterdam, The Hague, Utrecht, Maastricht, Arnhem, Apeldoorn, Zwolle) added with high-confidence mappings. Willemstad (3059) excluded — belongs to Curaçao, not the Netherlands.
v0.10.1 — Preprocessing order fix and regex relaxations
Fixes
- Preprocessing order: dot thousand-separator removal now runs before `.0` stripping, so locale-formatted codes like `13.000` correctly become `13000` instead of `13` (regression from v0.10.0).
- IE regex (#10): space between Eircode routing key and identifier is now optional; `D02X285` is accepted alongside `D02 X285`.
- PT regex (#12): space accepted as separator between digit groups; `1000 001` is accepted alongside `1000-001` and `1000001`.
- NO (#11): closed as already handled; all regexes are compiled with `re.IGNORECASE` and input is uppercased before matching.
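To illustrate the kind of relaxation described for IE and PT, here is a sketch with simplified patterns. These regexes are assumptions written for this example, not the project's curated patterns, and the canonical output forms (`D02 X285`, `1000-001`) are likewise illustrative.

```python
import re
from typing import Optional

# Simplified Eircode shape: 3-char routing key, optional space, 4-char identifier.
IE_EIRCODE = re.compile(r"^([A-Z]\d[\dW]) ?([A-Z0-9]{4})$", re.IGNORECASE)

# Portuguese codes: 4 digits, optional hyphen or space, 3 digits.
PT_CODE = re.compile(r"^(\d{4})[- ]?(\d{3})$")


def normalize_ie(raw: str) -> Optional[str]:
    """Return the Eircode with a single space, or None if it does not match."""
    m = IE_EIRCODE.match(raw.strip().upper())
    return f"{m.group(1)} {m.group(2)}" if m else None


def normalize_pt(raw: str) -> Optional[str]:
    """Return the PT code in hyphenated form, or None if it does not match."""
    m = PT_CODE.match(raw.strip())
    return f"{m.group(1)}-{m.group(2)}" if m else None
```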
Backward compatibility
Fully backward compatible — all previously valid inputs continue to work. Only adds acceptance of additional input formats.
v0.10.0 — Input preprocessing for Excel artifacts
What's new
Generic input preprocessing for postal codes mangled by Excel, CSV exports, or database dumps. Three country-agnostic steps are applied automatically before regex matching:
| Step | Problem | Example | Result |
|---|---|---|---|
| Strip `.0` suffix | Excel stores numbers as floats | `28040.0` | `28040` |
| Remove dot thousands | Dot-as-thousand-separator formatting | `13.600` | `13600` |
| Restore leading zeros | Excel strips leading zeros from numbers | `8461` (ES) | `08461` |
This recovers an estimated 2,000–4,000 additional postal code mappings from real-world datasets without any changes to the curated regex patterns.
New metadata: expected_digits
A new `expected_digits` field in `postal_patterns.json` enables country-aware leading-zero restoration for 30 countries with fixed-length all-numeric postal codes. Countries with non-numeric formats (IE, MT, NL) are excluded.
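The three preprocessing steps and the `expected_digits` lookup can be sketched as follows. This is not the project's `_preprocess()`: the regexes, the `EXPECTED_DIGITS` excerpt, and the step order (dot-thousands removal before `.0` stripping, as in the v0.10.1 fix) are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical excerpt of the per-country `expected_digits` metadata.
EXPECTED_DIGITS = {"ES": 5}

_DOT_THOUSANDS = re.compile(r"^\d{1,3}(?:\.\d{3})+$")  # e.g. 13.600
_FLOAT_SUFFIX = re.compile(r"^(\d+)\.0+$")  # e.g. 28040.0


def preprocess(raw: str, country: Optional[str] = None) -> str:
    """Undo common spreadsheet artifacts before regex matching."""
    code = raw.strip()
    # 1. Remove dot thousand separators first, so 13.000 is not read as 13.
    if _DOT_THOUSANDS.match(code):
        code = code.replace(".", "")
    # 2. Strip a float-style ".0" suffix.
    m = _FLOAT_SUFFIX.match(code)
    if m:
        code = m.group(1)
    # 3. Restore leading zeros for countries with fixed-length numeric codes.
    width = EXPECTED_DIGITS.get(country) if country else None
    if width and code.isdigit() and len(code) < width:
        code = code.zfill(width)
    return code
```

Correctly formatted codes fall through all three steps unchanged, which is what keeps the preprocessing backward compatible.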
Backward compatibility
- Fully backward compatible — correctly formatted postal codes pass through preprocessing unchanged
- No regex patterns were modified
- No API contract changes
Files changed
| File | Change |
|---|---|
| `app/postal_patterns.py` | New `_preprocess()` function, updated `extract_postal_code()` |
| `app/postal_patterns.json` | Added `expected_digits` to 30 country entries |
| `app/__init__.py` | Version bump to 0.10.0 |
| `CHANGELOG.md` | New 0.10.0 entry |
| `README.md` | Documented preprocessing steps and `expected_digits` |