Releases: bk86a/PostalCode2NUTS

v0.17.1 — libsql Hrana wire-protocol fix

29 Apr 16:39

Fixed

  • TokenDB wire protocol (#61): the v0.17.0 client assumed a generic POST /query body shape; the actual deployment target speaks libsql/Hrana v2 (POST /v2/pipeline with statements wrapped as {requests: [{type: "execute", stmt: {sql, args}}]} and rows returned as arrays of typed value objects). TokenDB.execute now speaks Hrana correctly, automatically rewrites libsql:// URLs to https://, and accepts a Bearer auth token via the new PC2NUTS_TOKEN_DB_AUTH_TOKEN env var (and matching --auth-token CLI flag).

Verified end-to-end against a real database instance: typed-value round-trip (str/int/null/float/blob), schema initialisation (idempotent), add with RETURNING id, idempotent revoke, and list_active/list_all projections.
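
The request shape described above can be sketched roughly as follows. This is a minimal illustration, not the project's `TokenDB.execute`: the helper names (`to_https`, `encode_value`, `decode_value`, `build_request`) are hypothetical, and the typed-value shapes (for example, integers carried as strings) are an assumption based on the Hrana v2 specification that should be checked against it.

```python
import base64


def to_https(url: str) -> str:
    """Rewrite a libsql:// URL to https://; leave other schemes untouched."""
    prefix = "libsql://"
    return "https://" + url[len(prefix):] if url.startswith(prefix) else url


def encode_value(value):
    """Encode a Python value as a typed value object (assumed Hrana shapes)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):
        # bool is a subclass of int; send it as 0/1
        return {"type": "integer", "value": str(int(value))}
    if isinstance(value, int):
        # integers travel as strings so 64-bit values survive JSON
        return {"type": "integer", "value": str(value)}
    if isinstance(value, float):
        return {"type": "float", "value": value}
    if isinstance(value, bytes):
        return {"type": "blob", "base64": base64.b64encode(value).decode("ascii")}
    return {"type": "text", "value": str(value)}


def decode_value(obj):
    """Decode a typed value object from a result row back into Python."""
    kind = obj["type"]
    if kind == "null":
        return None
    if kind == "integer":
        return int(obj["value"])
    if kind == "float":
        return float(obj["value"])
    if kind == "blob":
        data = obj["base64"]
        # tolerate unpadded base64 by restoring the padding
        return base64.b64decode(data + "=" * (-len(data) % 4))
    return obj["value"]


def build_request(db_url, auth_token, sql, args=()):
    """Return (endpoint, headers, body) for one statement sent via /v2/pipeline."""
    endpoint = to_https(db_url).rstrip("/") + "/v2/pipeline"
    headers = {"Authorization": f"Bearer {auth_token}"}
    stmt = {"sql": sql, "args": [encode_value(a) for a in args]}
    body = {"requests": [{"type": "execute", "stmt": stmt}]}
    return endpoint, headers, body
```

The returned tuple can be handed to any HTTP client as a JSON POST; the database URL and token would come from PC2NUTS_TOKEN_DB_URL and PC2NUTS_TOKEN_DB_AUTH_TOKEN.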

Activation

To bring the feature live in production, the operator must complete the following one-time steps:

  1. Provision a libsql-compatible database with the configured provider; obtain the libsql:// URL and the auth token.
  2. Set both PC2NUTS_TOKEN_DB_URL and PC2NUTS_TOKEN_DB_AUTH_TOKEN on the running container's environment configuration.
  3. Restart the container.
  4. From an operator laptop with both env vars set: python -m scripts.tokens init (creates schema, idempotent).
  5. Migrate any existing v1 env-var tokens with python -m scripts.tokens add --label "..." --value "<existing-token>" (preserves the audit token_id).

v0.17.0 — DB-backed trusted tokens

29 Apr 15:31

What's new

  • DB-backed trusted tokens (#61): trusted-token storage moved from PC2NUTS_TRUSTED_TOKENS env var to a managed SQLite-compatible HTTP database. Configure with PC2NUTS_TOKEN_DB_URL. Tokens issued via python -m scripts.tokens add --label "..." take effect within ~60 s (configurable via PC2NUTS_TOKEN_REFRESH_SECONDS) — no container restart required. The env var continues to work as a union with the DB and serves as a disaster-recovery fallback when the DB is unreachable.
  • Operator CLI at python -m scripts.tokens — subcommands init, add, list, revoke. add --value <token> preserves audit-id continuity when migrating v1 env-var tokens. --value is validated to be ≥32 hex chars to prevent accidentally minting weak tokens.
  • /health field token_db_stale — flags when the most recent DB refresh failed and the in-memory set is stale.
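
The refresh behaviour described above (periodic reload, union with the env var, fallback when the DB is unreachable, and a stale flag) might look roughly like this. It is a sketch under assumptions: the class name `TrustedTokenCache`, the `fetch_db_tokens` callable, and the injectable clock are all hypothetical and not taken from the project's code.

```python
import time
from typing import Callable, Iterable


class TrustedTokenCache:
    """In-memory token set: env-var tokens unioned with periodically refreshed DB tokens."""

    def __init__(
        self,
        env_tokens: Iterable[str],
        fetch_db_tokens: Callable[[], Iterable[str]],
        refresh_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._env = frozenset(env_tokens)
        self._fetch = fetch_db_tokens
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._db = frozenset()
        self._last_attempt = None
        self.stale = False  # would back a /health field such as token_db_stale

    def _refresh_if_due(self) -> None:
        now = self._clock()
        if self._last_attempt is not None and now - self._last_attempt < self._refresh_seconds:
            return
        self._last_attempt = now
        try:
            self._db = frozenset(self._fetch())
            self.stale = False
        except Exception:
            # DB unreachable: keep the last known DB tokens and flag staleness;
            # env-var tokens keep working as the disaster-recovery fallback.
            self.stale = True

    def is_trusted(self, token: str) -> bool:
        self._refresh_if_due()
        return token in self._env or token in self._db
```

Refreshing lazily on lookup keeps the sketch free of background threads; a real service might instead refresh on a timer so that request latency never includes a database round-trip.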

Backwards compatibility

Fully backwards-compatible with v0.16.0. With PC2NUTS_TOKEN_DB_URL unset, behaviour is identical to v0.16.0. The env var is not deprecated in this release.

Operator runbook

See README → Authentication & rate-limit bypass.

Activation

The feature is live in code but requires a one-time database setup before it takes effect: provision a database instance, set PC2NUTS_TOKEN_DB_URL, run python -m scripts.tokens init. Until then, behaviour is unchanged from v0.16.0.

Closes #61.

v0.16.0 — Auth-token rate-limit bypass

29 Apr 12:49

What's new

  • Auth-token bypass (#60): trusted callers can bypass the per-IP rate limit by presenting Authorization: Bearer <token>. Tokens are managed via the new PC2NUTS_TRUSTED_TOKENS comma-separated env var (restart applies changes). Invalid tokens return 401; malformed Authorization headers return 400. When the env var is unset/empty, the feature is fully disabled and any Authorization header is ignored — behaviour identical to pre-feature.
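
The decision table above can be sketched as a small pure function. The name `check_authorization` and the exact parsing rules (case-insensitive scheme, exactly one space, any non-Bearer header treated as malformed) are assumptions for illustration; the project's middleware may differ.

```python
import hmac
from typing import Iterable, Optional, Tuple


def check_authorization(header: Optional[str], trusted: Iterable[str]) -> Tuple[int, bool]:
    """Return (status, bypass_rate_limit) for an Authorization header value."""
    tokens = list(trusted)
    if not tokens or header is None:
        # Feature disabled, or no header sent: normal per-IP rate limiting applies.
        return 200, False
    parts = header.split(" ")
    if len(parts) != 2 or parts[0].lower() != "bearer" or not parts[1]:
        return 400, False  # malformed Authorization header
    presented = parts[1].encode("utf-8")
    # Constant-time comparison avoids leaking token contents via timing.
    if any(hmac.compare_digest(presented, t.encode("utf-8")) for t in tokens):
        return 200, True
    return 401, False  # well-formed header, unknown token
```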

Audit logging

Trusted requests are recorded in the access log with a token_id=<8hex> field — the first 8 chars of sha256(token). Token values themselves never appear in logs. To match a logged token id back to a token: echo -n "<token>" | sha256sum | cut -c1-8.
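
For operators who prefer Python to the shell pipeline, the same derivation is a few lines with the standard library. The function name `token_id` is chosen here for illustration; UTF-8 encoding is assumed, which matches `echo -n` for ASCII tokens.

```python
import hashlib


def token_id(token: str) -> str:
    """First 8 hex characters of sha256(token), as recorded in the access log."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]
```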

Operator runbook

See README → Authentication & rate-limit bypass for copy-pasteable steps to issue, verify, revoke, and disable tokens.

Out of scope

No per-token quotas, OAuth, JWT, signatures, or self-service signup — opaque bearer tokens with full exemption only. Future iterations can add quotas if usage demands.

Closes #60.

v0.15.0 — Montenegro support

29 Apr 07:16

What's new

  • Montenegro (ME) support (#53): postal-code lookups for Montenegro return ME000 / ME00 / ME0 via the existing single-NUTS3 fallback (Tier 5). Eurostat treats Montenegro as a single nationwide unit at every NUTS level, and GISCO publishes no TERCET file for it; ME is therefore served entirely from the new `single_nuts3_fallback` map in `app/settings.json` (no external data download). Pattern: 5 digits starting with `8`, optional `ME-` / `ME ` prefix accepted.

  • `single_nuts3_fallback` settings field: data-driven seed for the Tier 5 single-NUTS3 set, allowing countries with no GISCO TERCET coverage but a single nationwide NUTS3 unit to be added via configuration alone. Auto-detected single-NUTS3 entries derived from real data take precedence on conflict. Pre-positions cleanly for #55 (Faroe Islands).
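
A minimal sketch of how the ME pattern and the fallback map could combine. The regex, the `SINGLE_NUTS3_FALLBACK` dictionary shape, and the result keys are assumptions for illustration; the actual entries in `postal_patterns.json` and `app/settings.json` may be structured differently.

```python
import re

# Assumed pattern: 5 digits starting with 8, optional "ME-" or "ME " prefix.
ME_PATTERN = re.compile(r"^(?:ME[- ])?(8\d{4})$")

# Hypothetical shape of the single_nuts3_fallback map.
SINGLE_NUTS3_FALLBACK = {"ME": "ME000"}


def lookup_me(raw: str):
    """Return NUTS codes for a Montenegrin postal code, or None if it does not match."""
    match = ME_PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    nuts3 = SINGLE_NUTS3_FALLBACK["ME"]
    # NUTS codes are hierarchical, so NUTS2 and NUTS1 are prefixes of NUTS3.
    return {
        "postal_code": match.group(1),
        "nuts3": nuts3,
        "nuts2": nuts3[:4],
        "nuts1": nuts3[:3],
    }
```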

Changed

  • `patterns_version` bumped to 1.1 (additive change — new ME entry, no existing pattern altered).
  • `get_loaded_countries()` now includes countries served only via the single-NUTS3 fallback, so `/lookup` accepts them without a 400.

Coverage

35 countries (was 34): EU-27 + EFTA-4 + four candidates (MK, ME, RS, TR).

Closes #53.

v0.14.0

03 Mar 13:09

What's new

  • Add _meta block (version, date) to postal_patterns.json for change detection by external consumers
  • Surface patterns_version in /health endpoint response

Closes #34

v0.13.0

23 Feb 20:35

What's Changed

Added

  • Automated test suite (#25): 69 pytest tests covering postal_patterns.py (preprocessing, tercet_map, extraction), data_loader.py (normalize functions, all 5 lookup tiers), and FastAPI endpoints (/lookup, /pattern, /health). CI now runs tests before publish.
  • Makefile (#24): standard targets for lint, format, test, run, docker-build, docker-run.
  • Pre-commit hooks (#24): ruff lint + format via .pre-commit-config.yaml.
  • requirements-dev.txt (#22): dev/test dependencies (ruff, bandit, pip-audit, pytest).
  • ruff format CI check (#24): enforces consistent code formatting in CI.

Changed

  • Centralized duplicated logic (#22): normalize_country() replaces duplicate GR→EL blocks, _db_connection() context manager replaces 6 manual SQLite connect/close patterns, _build_result() helper replaces repetitive result dict construction across all lookup tiers.
  • Narrowed exception handling (#23): 9 bare except Exception blocks in data_loader.py replaced with specific types (sqlite3.Error, httpx.RequestError, OSError, csv.Error, etc.). Silent catch in import_estimates.py now logs a message.
  • Return type hints added to dispatch() and _rate_limit_handler() in main.py.
  • Branch protection enabled on main: required status checks + PR reviews.
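
As an illustration of the connect/close centralisation mentioned above, a context manager along these lines removes the need for manual cleanup at each call site. This is a generic sketch, not the project's `_db_connection()`; the name `db_connection`, its signature, and the commit-on-success behaviour are assumptions.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_connection(path: str):
    """Open a SQLite connection, commit on success, and always close it."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    finally:
        # Runs on both success and error, so no call site can leak a connection.
        conn.close()
```

A call site then reduces to `with db_connection(path) as conn: ...`, and catching `sqlite3.Error` around the block fits the narrowed exception handling listed above.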

Full Changelog: v0.12.0...v0.13.0

v0.12.0

23 Feb 19:21

What's Changed

Fixed

  • MT regex (#14): separator between alpha prefix and digits is now optional (MST1000 accepted alongside MST 1000 and MST-1000). Previously, codes without a space failed regex extraction and fell to approximate matching with lower confidence.
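
The effect of the optional separator can be illustrated with a simplified pattern. The regex below is an assumption written for this example, not the pattern shipped in postal_patterns.json.

```python
import re

# Alpha prefix, optional space or hyphen, then four digits (simplified MT shape).
MT_PATTERN = re.compile(r"^([A-Z]{2,3})[ -]?(\d{4})$")


def normalize_mt(raw: str):
    """Return the code in 'ABC 1234' form, or None if it does not match."""
    match = MT_PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"
```

With this shape, MST1000, MST 1000, and MST-1000 all normalise to the same value, while a digit-only input such as 1043 still fails the regex and has to be handled by a later lookup tier.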

Added

  • Country-level majority-vote fallback: new Tier 4 in the lookup chain for countries where all postal codes map to the same NUTS1/NUTS2 but NUTS3 has a dominant winner. Returns match_type: "approximate" with NUTS1/NUTS2 confidence 1.0 and NUTS3 confidence based on agreement ratio (capped at 0.80). Naturally captures MT (MT0/MT00/MT001 at ~77%). Digit-only MT codes like 1043 that previously returned 404 now get a valid approximate result.
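
The confidence rule for this tier can be sketched as follows. The function name, the input (one NUTS3 code per known postal code in the country), and the result keys are assumptions for illustration; only the 1.0 confidences, the agreement ratio, and the 0.80 cap come from the description above.

```python
from collections import Counter


def majority_vote(nuts3_codes, cap=0.80):
    """Pick the dominant NUTS3 for a country whose codes all share one NUTS2."""
    counts = Counter(nuts3_codes)
    if not counts:
        return None
    # NUTS codes are hierarchical: the first 4 characters of a NUTS3 are its NUTS2.
    if len({code[:4] for code in counts}) != 1:
        return None  # NUTS2 is not unanimous, so this tier does not apply
    winner, votes = counts.most_common(1)[0]
    ratio = votes / sum(counts.values())
    return {
        "match_type": "approximate",
        "nuts1": winner[:3],
        "nuts1_confidence": 1.0,
        "nuts2": winner[:4],
        "nuts2_confidence": 1.0,
        "nuts3": winner,
        "nuts3_confidence": round(min(ratio, cap), 2),
    }
```

For a country where 77 of 100 codes map to MT001 and the rest to MT002, the sketch returns MT001 with a NUTS3 confidence of 0.77; a 90% agreement would be capped at 0.80.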

Full Changelog: v0.11.0...v0.12.0

v0.11.0

23 Feb 19:14

Added

  • FR CEDEX estimates (#8): ~511 French CEDEX postal codes (enterprise/university mail routing) added to tercet_missing_codes.csv with high-confidence département→NUTS3 mappings.
  • FR DOM-TOM estimates (#9): 15 French overseas territory postal codes (Guadeloupe, Martinique, Guyane, La Réunion, Mayotte) added with high-confidence mappings. French Polynesia (987xx) and New Caledonia (988xx) excluded — these are OCTs with no valid NUTS mapping.
  • NL missing code estimates (#13): 8 Dutch postal codes for major cities (Amsterdam, The Hague, Utrecht, Maastricht, Arnhem, Apeldoorn, Zwolle) added with high-confidence mappings. Willemstad (3059) excluded — belongs to Curaçao, not the Netherlands.

v0.10.1 — Preprocessing order fix and regex relaxations

23 Feb 18:57

Fixes

  • Preprocessing order: dot thousand-separator removal now runs before .0 stripping, so locale-formatted codes like 13.000 correctly become 13000 instead of 13 (regression from v0.10.0).
  • IE regex (#10): space between Eircode routing key and identifier is now optional — D02X285 accepted alongside D02 X285.
  • PT regex (#12): space accepted as separator between digit groups — 1000 001 alongside 1000-001 and 1000001.
  • NO (#11): closed as already handled — all regexes are compiled with re.IGNORECASE and input is uppercased before matching.
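
Why the order matters for the preprocessing fix can be seen in a two-step sketch. The regexes and the function name are assumptions for illustration, not the project's `_preprocess()`.

```python
import re


def strip_excel_artifacts(code: str) -> str:
    """Apply dot-thousands removal first, then strip a trailing .0 float suffix."""
    code = code.strip()
    # Step 1: "13.000" -> "13000" (one to three digits, a dot, exactly three digits).
    code = re.sub(r"^(\d{1,3})\.(\d{3})$", r"\1\2", code)
    # Step 2: "28040.0" -> "28040". If this ran first, "13.000" would collapse to "13".
    code = re.sub(r"\.0+$", "", code)
    return code
```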

Backward compatibility

Fully backward compatible — all previously valid inputs continue to work. Only adds acceptance of additional input formats.

v0.10.0 — Input preprocessing for Excel artifacts

23 Feb 18:51

What's new

Generic input preprocessing for postal codes mangled by Excel, CSV exports, or database dumps. Three country-agnostic steps are applied automatically before regex matching:

| Step | Problem | Example | Result |
| --- | --- | --- | --- |
| Strip .0 suffix | Excel stores numbers as floats | 28040.0 | 28040 |
| Remove dot thousands | Dot-as-thousand-separator formatting | 13.600 | 13600 |
| Restore leading zeros | Excel strips leading zeros from numbers | 8461 (ES) | 08461 |

This recovers an estimated 2,000–4,000 additional postal code mappings from real-world datasets without any changes to the curated regex patterns.

New metadata: expected_digits

A new expected_digits field in postal_patterns.json enables country-aware leading-zero restoration for 30 countries with fixed-length all-numeric postal codes. Countries with non-numeric formats (IE, MT, NL) are excluded.
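
Leading-zero restoration driven by expected_digits might look like the sketch below. The `EXPECTED_DIGITS` excerpt and the function name are hypothetical; the real values live in postal_patterns.json, and the real implementation may limit how many zeros it restores.

```python
# Hypothetical excerpt: country code -> fixed length of its all-numeric postal codes.
EXPECTED_DIGITS = {"ES": 5}


def restore_leading_zeros(code: str, country: str) -> str:
    """Left-pad an all-digit code to the country's expected length."""
    width = EXPECTED_DIGITS.get(country.upper())
    if width is None or not code.isdigit() or len(code) >= width:
        # Unknown or non-numeric formats (e.g. IE, MT, NL) pass through unchanged.
        return code
    return code.zfill(width)
```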

Backward compatibility

  • Fully backward compatible — correctly formatted postal codes pass through preprocessing unchanged
  • No regex patterns were modified
  • No API contract changes

Files changed

| File | Change |
| --- | --- |
| app/postal_patterns.py | New _preprocess() function, updated extract_postal_code() |
| app/postal_patterns.json | Added expected_digits to 30 country entries |
| app/__init__.py | Version bump to 0.10.0 |
| CHANGELOG.md | New 0.10.0 entry |
| README.md | Documented preprocessing steps and expected_digits |

Closes #16. Subsumes #15.