feat: Azure Key Vault native Certificate-object storage mode #118
rocogamer wants to merge 41 commits into
Adds a configurable storage_mode to the Azure Key Vault backend so it can
persist certificates as native Certificate objects (PKCS12, issuer
'Unknown') in addition to — or instead of — the existing per-PEM Secrets
layout. Native Certificate objects are consumed directly by App Service,
Application Gateway, Front Door, API Management and AKS Ingress without
manual PFX export.
Modes (default 'secrets' preserves existing behaviour):
secrets — current per-PEM Secrets layout, unchanged
certificate — single Certificate object, metadata stored as tags
both — write to both surfaces, prefer Secrets on read
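As a rough illustration of the default-preserving mode selection, a hedged sketch; the `azure_keyvault` / `storage_mode` key names are assumptions, not necessarily the PR's exact settings schema:

```python
# Illustrative only: resolve the backend's storage mode from settings,
# defaulting to 'secrets' so upgraded installs keep the old layout.
VALID_STORAGE_MODES = {"secrets", "certificate", "both"}

def resolve_storage_mode(settings: dict) -> str:
    mode = settings.get("azure_keyvault", {}).get("storage_mode", "secrets")
    if mode not in VALID_STORAGE_MODES:
        raise ValueError(f"unknown storage_mode: {mode!r}")
    return mode
```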
The new POST /api/storage/azure-keyvault/backfill-certificates admin
endpoint imports Certificate objects for domains already stored as
Secrets (skipping ones already imported); restricted to
storage_mode='both' so the legacy Secrets are still listed during the
walk.
Implementation notes:
- Composition: AzureKeyVaultBackend keeps auth/naming/retry; the new
private _AzureKeyVaultCertificateImporter encapsulates Certificate-
API specifics. Both clients are lazy.
- Tags ↔ metadata round-trip lives in a single class-level helper with
a strict allow-list, so vault-level tags from Azure Policy do not
pollute the rehydrated metadata.
- Truncated SAN lists drop the (incomplete-by-construction) trailing
fragment on rehydrate to avoid exposing a malformed FQDN.
- The metadata-secret list filter is anchored to the CRC32 suffix that
_sanitize_secret_name appends, fixing a pre-existing bug in main
where endswith('-metadata') matched zero secrets in production.
- Settings migration backfills storage_mode='secrets' for upgraded
installs.
- StorageBackendTest probes the Certificate API explicitly when the
target mode requires it; failure messages expose only the exception
type to the client (full detail in server logs).
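The tags ↔ metadata allow-list mentioned above can be sketched as follows; the helper name and the allow-list contents are illustrative, not the PR's actual code:

```python
# Strict allow-list filter: vault-level tags injected by Azure Policy
# (cost-center labels, owner tags, etc.) never reach the rehydrated
# metadata. Allow-list contents here are invented for the demo.
METADATA_TAG_ALLOWLIST = {"domains", "dns_provider", "created_at", "key_type"}

def tags_to_metadata(tags: dict) -> dict:
    return {k: v for k, v in (tags or {}).items() if k in METADATA_TAG_ALLOWLIST}
```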
Tests: 32 unit tests with mocked SDK clients cover all three modes,
backfill flow pre-conditions, retrieve fallback paths in 'both' mode,
SAN truncation edge cases, settings migration and external-tag
filtering. Suite of 58 unit tests passes with no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docstring previously said "rehydrated from the Certificate object's
tags". The actual code reads secret.properties.tags from the companion
Secret — Azure mirrors them, so the behaviour matches the description in
practice, but the docstring promised something the code doesn't
literally do. Spelling out the mirroring (and the deliberate
read-from-Certificate in the 'both' fallback) makes the contract
auditable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The backfill endpoint was returning f'error: {per_domain}' to the client,
which exposes the full Azure exception text (tenant id, request id,
WWW-Authenticate headers). The companion StorageBackendTest endpoint
already only returns the exception type. Match the pattern here so an
admin-only surface still doesn't leak SDK internals into the response
body — full detail stays in the server log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The backend rejects backfill unless storage_mode='both' (in
'certificate' mode list_certificates() does not surface the legacy
Secrets, so the walk would silently no-op). Match the UI to that
contract instead of letting the user click into a 400 — the original
heuristic toggled visibility when mode != 'secrets', which falsely
advertised the action in certificate-only mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sanitiser never touched self — it's a pure transformation of its
name argument. Marking it @staticmethod lets callers (including the
importer's ``sanitize_name`` callback wiring and the test helper) reach
it without a backend instance, removing the
AzureKeyVaultBackend.__new__(...) trick the round-trip test was using
to call it during fixture setup. The bound-method-as-callback wiring at
_get_cert_importer() keeps working because Python returns the underlying
function for staticmethods accessed via an instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
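The Python behaviour that last sentence relies on can be demonstrated in a few lines (the class and method names here are made up for the demo, not the backend's):

```python
# A @staticmethod accessed through an instance yields the plain
# function, not a bound method, so it works unchanged as a callback.
class Backend:
    @staticmethod
    def sanitize(name: str) -> str:
        # Pure transformation of its argument; no self needed.
        return name.lower().replace(".", "-")

callback = Backend().sanitize  # same underlying function as Backend.sanitize
```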
In 'both' mode the Secrets and Certificate surfaces are independent and
a write to one should not block the other. The previous flow caught a
Secrets-surface exception with the outer try/except, which
short-circuited the method before the Certificate-object branch ran. A
False return from the same call (no exception) by contrast did fall
through and import the cert — asymmetric, surprising, and worse during
a real outage of the Secrets API, since it would silently fail to
update App Gateway-bound certs that consume the Certificate object.

Each surface now runs under its own try/except. The method returns True
only when every active surface succeeded, so partial failures still
propagate to the caller. Domain validation still aborts up-front (no
point attempting either write with an invalid domain).

Adds a regression test that simulates a Secrets API outage and confirms
the Certificate import is still attempted and the method returns False.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
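The surface-isolated flow described here can be sketched as a small helper; the function names are illustrative, not the backend's actual methods:

```python
# Sketch: each surface runs under its own try/except, so an exception
# on one surface cannot skip the other; the result is True only when
# every surface succeeded.
import logging

def store_both(write_secrets, import_certificate,
               log=logging.getLogger(__name__)):
    ok = True
    try:
        ok = write_secrets() and ok
    except Exception as exc:  # Secrets outage must not block the cert import
        log.error("secrets surface failed: %s", type(exc).__name__)
        ok = False
    try:
        ok = import_certificate() and ok
    except Exception as exc:
        log.error("certificate surface failed: %s", type(exc).__name__)
        ok = False
    return ok
```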
The previous flow wrapped both deletion surfaces in a single try/except,
so a Secrets-API outage would skip the Certificate-object deletion and
leave the cert orphaned in the vault (and vice versa). store_certificate
already runs each surface under its own try/except after the previous
commit; delete now matches.
Additional fix: _delete_secrets returns a bool to signal partial per-file
failures, but delete_certificate was discarding it and always logging
"successfully deleted". The bool now propagates so the overall result is
False whenever any surface (or any per-file delete inside the secrets
walk) reports a failure.
Tests added:
- test_both_mode_certificate_exception_does_not_skip_secrets_delete
(symmetric to the store regression test)
- test_both_mode_double_failure_returns_false
- test_both_mode_partial_secret_failure_returns_false (validates the
bool propagation from _delete_secrets)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous regression test only exercised "Secrets fails, Certificate
succeeds". The 'both' surface-independence contract has three failure
shapes worth pinning:
- Cert fails, Secrets succeeds (mirror of the existing test)
- Both surfaces fail (each is still attempted, return False)
These are easy to add now that the store path has its surface-isolated
try/except blocks. Closes the asymmetric coverage gap left after the
49b3bec fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… ambiguous flag
certbot-dns-azure exposes multiple --dns-azure-* options
(--dns-azure-credentials, --dns-azure-config,
--dns-azure-propagation-seconds). The default
DNSProviderStrategy.configure_certbot_arguments appended --{plugin_name},
which certbot's argparse rejected as ambiguous:
certbot: error: ambiguous option: --dns-azure could match
--dns-azure-propagation-seconds, --dns-azure-config,
--dns-azure-credentials
This is the same class of bug that 4ea7269 fixed for DuckDNS via a
per-strategy override. Generalising the fix in the base method makes
every plugin that lacks an override (Azure, Cloudflare, Google,
DigitalOcean, Linode, Gandi, OVH, Namecheap, ArvanCloud, Infomaniak,
EdgeDNS, Hetzner, …) immune to the same trap going forward, since
``--authenticator <name>`` is the canonical, prefix-collision-free
selector documented by certbot.
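A minimal sketch of the generalised selector, with illustrative names; the flag names are certbot's documented `--authenticator` and the plugin's full-length credentials option:

```python
# Sketch of the base-method behaviour after the fix: select the plugin
# via --authenticator (collision-free) instead of the --{plugin_name}
# shorthand, which argparse can reject as an ambiguous prefix.
def configure_certbot_arguments(plugin_name: str, credentials_path: str) -> list:
    return [
        "--authenticator", plugin_name,                    # e.g. dns-azure
        f"--{plugin_name}-credentials", credentials_path,  # full option name
    ]
```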
The existing per-provider overrides (DuckDNS, AcmeDNS, PowerDNS) keep
working — they already use --authenticator — and are left untouched in
this commit to keep the diff minimal.
Tests:
- test_azure_uses_authenticator_not_shorthand pins the exact failure
mode reported in fabriziosalmi#113.
- test_authenticator_selector_used[Cloudflare/Google] guards the base
method against regression on providers without an override.
- test_domain_alias_logs_cname_hint confirms the CNAME-delegation log
line still fires after the selector switch (the original report's
setup uses CNAME delegation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
requirements-azure.txt pinned certbot-dns-azure==2.11.0, a version that
was never published to PyPI. ``pip install -r requirements-azure.txt``
failed with "No matching distribution found", forcing operators to drop
the pin or downgrade by hand (the issue reporter ended up on 2.2.0).
The latest version actually published on PyPI is 2.6.1, which is the
one already documented in docs/installation.md. Aligning the
requirements file to that version restores ``pip install`` and matches
the documentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure validator + module-level constants (VALID_KEY_TYPES,
VALID_RSA_KEY_SIZES, VALID_ELLIPTIC_CURVES) used by the upcoming
configurable-key-type/size feature. The validator accepts
``(None, None, None)`` to mean "caller did not pick anything yet" so
subsequent layers can hand it untouched API payloads.

RSA sizes capped at 2048/3072/4096 (1024 is insecure, 8192 buys nothing
in practice and slows handshakes). ECDSA curves limited to secp256r1
and secp384r1 — secp521r1 is excluded because Let's Encrypt rejects it
and most consumers (browsers, load balancers) only implement the first
two.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
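A sketch of a validator following the stated contract; the exact return shape `(ok, reason)` is an assumption, since the commit message does not show the real signature:

```python
# Illustrative validator: (None, None, None) means "use defaults";
# RSA needs a size and no curve; ECDSA needs a curve and no size.
VALID_KEY_TYPES = {"rsa", "ecdsa"}
VALID_RSA_KEY_SIZES = {2048, 3072, 4096}
VALID_ELLIPTIC_CURVES = {"secp256r1", "secp384r1"}

def validate_key_options(key_type, key_size, curve):
    if key_type is None and key_size is None and curve is None:
        return True, None  # caller picked nothing yet
    if key_type == "rsa":
        if key_size not in VALID_RSA_KEY_SIZES or curve is not None:
            return False, "rsa requires key_size in {2048,3072,4096} and no curve"
        return True, None
    if key_type == "ecdsa":
        if curve not in VALID_ELLIPTIC_CURVES or key_size is not None:
            return False, "ecdsa requires curve in {secp256r1,secp384r1} and no size"
        return True, None
    return False, f"unknown key_type: {key_type!r}"
```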
Three new top-level keys in the settings template: default_key_type
(rsa|ecdsa), default_key_size (2048|3072|4096) and
default_elliptic_curve (secp256r1|secp384r1). They control the
public-key shape applied to any cert that does not pick a per-domain
override. The migration path lists them in essential_keys so an
upgraded install silently picks up rsa/2048 — the implicit certbot
default that CertMate emitted before this setting existed. No behaviour
change unless the operator opts in.

save_settings now validates the triple via validate_key_options before
persisting. The check is run only on the active branch (RSA → key_size,
ECDSA → elliptic_curve) so toggling one to the other in the UI does not
require both to be wiped on every switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_certbot_command now accepts key_type, key_size and elliptic_curve
kwargs (all default None for backwards compat) and appends the
corresponding certbot flags right after the SAN -d flags. When a caller
omits all three the command is byte-identical to the pre-feature
output, so existing call sites keep working unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
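The backwards-compatible flag emission can be illustrated with a small helper; this is a sketch, not the repo's build_certbot_command itself, though the flag names are certbot's documented `--key-type` / `--rsa-key-size` / `--elliptic-curve` options:

```python
# Sketch: omitting all kwargs yields no key flags, so the resulting
# command stays byte-identical to the pre-feature output.
def key_flags(key_type=None, key_size=None, elliptic_curve=None):
    flags = []
    if key_type:
        flags += ["--key-type", key_type]
        if key_type == "rsa" and key_size:
            flags += ["--rsa-key-size", str(key_size)]
        elif key_type == "ecdsa" and elliptic_curve:
            flags += ["--elliptic-curve", elliptic_curve]
    return flags
```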
create_certificate accepts the three key kwargs and propagates them to
build_certbot_command. When all three are None it falls back to the
global defaults from settings — this lets legacy callers (web routes,
scripts, scheduled renewal jobs that don't carry per-domain state)
inherit the configured default for free without having to fetch it.

The fallback path that runs when ca_manager isn't available also
appends the key flags manually so a stale ca_manager can't silently
downgrade certs to certbot's RSA-2048 default. Validation runs once
inside create_certificate too, in addition to the API-layer validation,
so renew_certificate (which can pass values read from disk) never feeds
a contradictory shape into certbot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_cert_model gains key_type / key_size / elliptic_curve as
optional fields with proper enum constraints, and settings_model gains
the matching default_key_* triple so swagger reflects what the backend
accepts. CreateCertificate.post extracts the three values, validates
them up-front (returning 400 with the field-specific reason on failure
instead of letting certbot fail later with a stack trace), passes them
to create_certificate, and persists the operator's explicit overrides
into the domain entry so the UI can roundtrip them.

Inheritors (i.e. certs that did not pick anything per-cert)
intentionally do NOT have their effective shape persisted in the
entry — that lets a later change to the global default apply to those
certs while leaving explicit overrides untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A new sub-block inside the existing "Advanced Options" panel of the
Create Certificate form lets the operator pick a per-cert shape. The
Key Type selector defaults to "Use global default" so the form sends no
key fields by default and the backend inherits the configured default;
picking RSA reveals the size selector, picking ECDSA reveals the curve
selector. The two extra selectors are wired to a single
toggleCertKeyOptions() helper that mirrors the show/hide pattern of the
existing storage-backend selector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Settings → General now exposes the three default_key_* selectors under
Certificate Management Settings. The size and curve pickers are
mutually exclusive (toggleDefaultKeyOptions hides whichever is
inactive) so the form never posts a contradictory pair. Load and save
flows wire the selectors into the existing populate/collect helpers and
the values roundtrip cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20 unit tests in tests/test_key_options.py:
- validate_key_options accept matrix (3 RSA sizes + 2 ECDSA curves +
the all-None "use defaults" case)
- validate_key_options rejection matrix (unknown type, RSA with bad
size, RSA missing size, RSA with curve, ECDSA with size, ECDSA
missing curve, ECDSA with unsupported curve, partial shape with
type=None)
- build_certbot_command emits --key-type / --rsa-key-size for RSA
- build_certbot_command emits --key-type / --elliptic-curve for ECDSA
- build_certbot_command with no key kwargs emits no key flags (the
backwards-compat regression guard)
- build_certbot_command with key_type='rsa' but key_size=None does
not emit a half-flag that would crash certbot
- SettingsManager.load_settings backfills rsa/2048/secp256r1 on a
legacy settings.json that has no default_key_* keys
- SettingsManager.save_settings rejects an inconsistent triple
(key_type=rsa with key_size=1024)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
architecture.md gets the three new settings keys in the Configuration
Structure example plus a small table explaining when each applies and
how per-domain overrides interact with the globals. RELEASE_NOTES.md
gets an Unreleased entry covering the operator-facing surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
certbot keeps the per-cert config (key shape, DNS plugin, CA, etc.) in
``certificates/{domain}/renewal/<domain>.conf`` and reads it back during
``certbot renew --cert-name``. That works fine when the certificates/
directory lives on a persistent volume, but in Docker/K8s deployments
where the local filesystem is ephemeral and the cert PEMs are restored
from a remote storage backend on pod restart, the renewal config is
lost. The renewal then either fails outright (cert not found) or
silently downgrades the cert to certbot's RSA-2048 default — losing
the shape the operator configured.
Persist the cert-config inputs (key_type / key_size / elliptic_curve /
ca_provider / domain_alias) into metadata.json at create time so a
later ``renew_certificate`` can reconstruct the original config from
the storage backend even when the renewal/<domain>.conf is gone.
``store_certificate`` already syncs metadata.json to the configured
backend (AzureKeyVault, AWS, Vault, Infisical, …), so the persisted
shape survives ephemeral filesystems.
This commit only widens the metadata payload; the renewal path that
consumes the new fields ships in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
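A hypothetical shape for the widened metadata.json payload described above; only the five persisted inputs are named in the commit, so the `domains` field and the concrete values here are assumptions:

```python
import json

# Hypothetical metadata.json content after this commit; an ECDSA cert
# leaves key_size as None, mirroring the validator's exclusivity rule.
cert_metadata = {
    "domains": ["example.com", "*.example.com"],
    "key_type": "ecdsa",
    "key_size": None,
    "elliptic_curve": "secp384r1",
    "ca_provider": "letsencrypt",
    "domain_alias": None,
}

# The payload must survive a JSON round-trip through the storage backend.
restored = json.loads(json.dumps(cert_metadata))
```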
create_certificate now accepts a ``force`` flag (default False). When
True it skips the "already exists" guard and appends
``--force-renewal`` to the certbot command so the existing cert is
replaced. This is the entry point that ``renew_certificate`` will use
in the next commit when the certbot renewal/<domain>.conf is missing —
the typical Docker/K8s ephemeral-filesystem case where the PEMs were
restored from a remote storage backend but the renewal conf was not.

The flag is wired through all three command-build paths (ca_manager
happy path, ca_manager fallback for older signatures, and the
no-ca_manager branch) so a force renew works identically regardless of
how the cert was originally created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
renew_certificate now detects a missing
``certificates/{domain}/renewal/<domain>.conf`` and rebuilds the cert
from scratch using the persisted metadata.json instead of letting
``certbot renew`` fail or silently regenerate with the wrong shape.
This is the Docker/K8s ephemeral-filesystem story: containers/pods
without a PVC on certificates/ rehydrate the cert PEMs from the remote
storage backend on startup, but certbot's own renewal conf does not
travel through the storage backend — only the PEMs and metadata.json
do. A naïve ``certbot renew --cert-name`` either errors out ("No
matching certificates found") or, worse, regenerates the cert with
certbot's RSA-2048 default and loses the original key shape.
The new ``_renew_from_metadata`` helper:
- reads metadata.json (including the key_type / key_size /
elliptic_curve / SAN list / DNS plugin / CA persisted at create
time by the previous commit),
- falls back to the per-domain settings entry if metadata is gone,
- falls back to the global default key shape as a last resort,
- delegates to create_certificate(force=True), which already knows
how to bypass the "already exists" guard and pass --force-renewal.
The PVC-backed setup keeps using the cheap ``certbot renew`` fast
path; the rebuild only kicks in when the conf is truly missing.
The non-reentrant per-domain lock had to grow a ``lock_owned`` flag
because the rebuild path hands the lock over to create_certificate
(which acquires it again) — the finally must not double-release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three scenarios exercise the new ephemeral-filesystem renewal path:
- renewal/<domain>.conf missing AND metadata.json present (typical
K8s pod restart with remote storage backend): rebuild from
metadata, key shape and DNS plugin preserved, force=True passed
through to create_certificate.
- renewal/<domain>.conf AND metadata.json both missing, but the
per-domain settings entry still has dns_provider/account/key
overrides: rebuild from settings, overrides honoured.
- filesystem fully wiped, no email anywhere: hard-fail with a
descriptive RuntimeError instead of silently doing the wrong
thing.
The first test caught a real bug while being written: the rebuild
path released the per-domain lock from inside _renew_from_metadata
while the caller's finally block still expected to own it, causing a
"release unlocked lock" RuntimeError on the happy path. Now the
caller releases the lock before delegating and tracks the handoff
with a flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both RELEASE_NOTES.md (operator-facing) and docs/architecture.md (reference) now describe the PVC-fast-path vs ephemeral-rebuild fallback so deployers know what to expect under Docker/K8s without persistent volumes on certificates/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous renew_certificate fix released the per-domain Lock before
delegating to _renew_from_metadata so that create_certificate could
reacquire it without deadlock. That release+reacquire window was a
race: a concurrent delete_certificate(domain) on another thread could
slip in, blow away cert.pem, and let the rebuild path then resurrect a
cert the operator had just asked to remove.

Switching the per-domain lock to threading.RLock makes it reentrant on
the same thread, so renew_certificate can hold the lock continuously
through _renew_from_metadata → create_certificate. Cross-thread
exclusion is unchanged (RLock blocks other threads exactly like Lock
does), so a concurrent delete still has to wait for the renew to
finish — closing the race. The lock_owned flag and the manual
release/reacquire dance disappear with the change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
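The RLock property the fix relies on is easy to demonstrate; the function names below are stand-ins for the real renew/create pair:

```python
import threading

# An RLock is reentrant on the owning thread: the outer call can hold
# it continuously while the inner call reacquires it, with no deadlock.
# Other threads still block until the outermost release.
domain_lock = threading.RLock()

def create_certificate():
    with domain_lock:  # reacquire on the same thread
        return "created"

def renew_certificate():
    with domain_lock:  # held continuously through the delegation
        return create_certificate()
```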
The metadata-driven renewal rebuild from the previous commit only fires
when ``cert.pem`` is on disk but ``renewal/<domain>.conf`` is gone. In
the actual Docker/K8s scenario the feature documented (pod on emptyDir
with a remote storage backend as the source of truth) ALL local files
are missing on restart — including cert.pem — so renew_certificate
would still hard-fail with FileNotFoundError before the rebuild path
was even reached.

This commit closes that gap.
``CertificateManager.hydrate_from_storage`` walks ``settings.domains``
at startup, calls ``retrieve_certificate`` on the configured backend
for every domain whose cert.pem is missing locally, and writes the four
PEMs plus metadata.json into ``cert_dir/<domain>/``. The certbot
``renewal/<domain>.conf`` is intentionally NOT recreated — the storage
backend doesn't carry it, and the metadata-driven rebuild path handles
its absence on the first post-restart renew.

Wired in ``modules/core/factory.py`` after the CertificateManager is
constructed. Failures are logged but never raised: a single unreachable
cert must not block app startup. The private key is chmod-ed to 0600 on
restore so the rehydrated copy doesn't widen perms vs the original
certbot-issued file.

Tests cover: happy-path restore (with key shape persisted in metadata),
local-copy-already-present skip, backend-has-no-record marker, partial
failure isolation across domains, storage_manager=None no-op, and the
privkey.pem 0600 chmod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
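The hydration walk can be sketched as below; the names and the backend protocol (a callable returning a filename-to-bytes dict, or None when the backend has no record) are assumptions for illustration:

```python
import logging
import os

def hydrate_from_storage(domains, cert_dir, backend,
                         log=logging.getLogger(__name__)):
    """Restore missing local cert files from a remote backend at startup."""
    for domain in domains:
        local = os.path.join(cert_dir, domain, "cert.pem")
        if os.path.exists(local):
            continue  # local copy already present: nothing to do
        try:
            files = backend(domain)  # e.g. {"cert.pem": b"...", ...} or None
            if not files:
                continue  # backend has no record for this domain
            os.makedirs(os.path.dirname(local), exist_ok=True)
            for name, blob in files.items():
                path = os.path.join(cert_dir, domain, name)
                with open(path, "wb") as fh:
                    fh.write(blob)
                if name == "privkey.pem":
                    os.chmod(path, 0o600)  # don't widen perms on restore
        except Exception as exc:
            # Logged, never raised: one unreachable cert must not block startup.
            log.error("hydration failed for %s: %s", domain, type(exc).__name__)
```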
Original RELEASE_NOTES and architecture.md described a two-tier story
(certbot fast path vs metadata-driven rebuild) and claimed the rebuild
"covered Docker without a PVC". A reviewer correctly pointed out that
without a startup-hydration step the rebuild path could not actually
fire on a fresh volume — cert.pem itself is missing, so
``renew_certificate`` would die with FileNotFoundError before the
rebuild even ran. The previous commit added that hydration step; this
commit updates the docs to reflect the real three-layer design:
1. PVC fast path (certbot renew, conf present)
2. Startup rehydration (PEMs and metadata.json restored from the
remote storage backend; renewal conf intentionally not restored)
3. Metadata-driven rebuild on first post-restart renew
The "always preserves the historical shape" claim is also corrected:
there's a single edge case — fresh volume AND empty storage backend —
where the rebuild has to fall back to whatever per-domain overrides
survive in settings.domains and, last-resort, the current global
default. Documenting that honestly is cheaper than promising more
than the code can deliver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "persist only explicit overrides" rule lived inline inside the
CreateCertificate.post body, which made it hard to pin via tests
without standing up the whole Flask-RESTX stack. Extracted to a pure
``build_domain_entry`` helper in modules/core/utils.py — the endpoint
delegates to it. Same behaviour, now testable.
Adds three test classes a previous reviewer flagged as missing:
- TestDomainEntryPersistence (T2): inheritor entries have no
key_* fields; RSA override stores type+size only; ECDSA
override stores type+curve only.
- TestEndpointPayloadValidation (T1): six negative cases that the
endpoint must reject with a 400 (rsa+1024, rsa+curve, ecdsa+size,
ecdsa+secp521r1, dsa, size-without-type), plus the all-None
happy-path that must NOT 400.
- TestCorruptMetadataInRebuildPath (T3): if metadata.json is
malformed, _renew_from_metadata logs and falls back to the
settings.domains entry instead of letting json.JSONDecodeError
propagate. The test loads a corrupt blob and asserts
create_certificate is still invoked with the per-domain
settings overrides.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Native Certificate-object storage mode for the Azure Key Vault backend (secrets / certificate / both), plus surface-independence in store and delete, sanitised error messages and 38 unit tests. PR fabriziosalmi#118 against upstream is open and waiting on review.
Generalises the --authenticator selector across all DNS plugins so a plugin with several --<plugin>-* options does not break with 'ambiguous option' (Azure case in upstream issue fabriziosalmi#113). Pins certbot-dns-azure to 2.6.1 since the previous 2.11.0 pin was a non-existent PyPI version. PR fabriziosalmi#119 against upstream is open and waiting on review.
Configurable cert key type/size (RSA 2048/3072/4096 or ECDSA
secp256r1/secp384r1) with a global default plus per-cert override.
Includes the Docker/K8s-aware renewal flow: startup hydration from the
storage backend, metadata.json carries the original cert config, and
renew_certificate rebuilds via certbot certonly --force-renewal when
the renewal conf is gone. Held back from upstream PR pending local
smoke tests.

# Conflicts:
#	RELEASE_NOTES.md
…bot 2.x compat)
Previous commit pinned to 2.6.1 because that's what docs/installation.md
listed and what PyPI's "latest" pointed at. Caught at install time:
2.6.0 and 2.6.1 require ``certbot>=3.0,<4.0``, but the repo's main
``requirements.txt`` is on ``certbot==2.10.0``. ``pip install -r
requirements-azure.txt`` (or building the Docker image with
EXTRA_REQUIREMENTS=requirements-storage-all.txt) hit a
ResolutionImpossible:
certbot-dns-cloudflare 2.10.0 depends on certbot>=2.10.0
certbot-dns-azure 2.6.1 depends on certbot<4.0 and >=3.0
Verified via pypi.org/pypi/certbot-dns-azure/<v>/json that 2.5.0 is
the latest that still declares ``certbot<3.0,>=2.0`` — perfect for the
2.10.0 line. 2.4.0 / 2.3.0 / 2.2.0 (the version the issue reporter
fell back to manually) are all in the same compatibility band; 2.5.0
is the most recent of them.
Note: ``docs/installation.md`` keeps documenting 2.6.1 because that
section describes an alternative install with ``certbot==4.1.1``,
where 2.6.1 IS compatible. Only the default install path (which uses
the pinned ``certbot==2.10.0``) needed adjusting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous merge pinned to 2.6.1, which requires certbot 3.x and broke pip install when combined with the repo's certbot==2.10.0. 2.5.0 is the latest in the certbot-dns-azure line that still declares certbot<3.0,>=2.0. PR fabriziosalmi#119 already updated.
The Dockerfile builder stage only ever copied requirements.txt and
requirements-minimal.txt, so building the image with REQUIREMENTS_FILE
pointed at a different file (e.g. requirements-storage-all.txt) failed
with "No such file or directory" — the file simply wasn't in the build
context inside the layer.
Two changes:
1. Copy every requirements*.txt into the builder. The COPY pattern
is cached on its own layer, so adding more files there does not
rebuild the heavy pip-install layer below it.
2. New build-arg EXTRA_REQUIREMENTS (default empty). When set, a
second pip install runs after the main one. Lets operators bake
in the optional remote storage backends (Azure Key Vault, AWS
Secrets Manager, HashiCorp Vault, Infisical) at image build
time without having to maintain a hand-merged "all-in-one"
requirements file.
docker-compose.yml exposes the new arg via ${EXTRA_REQUIREMENTS:-} so
``EXTRA_REQUIREMENTS=requirements-storage-all.txt docker compose
build`` Just Works. The default remains empty, preserving the slim
build for users that do not need the extra backends.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-file EXTRA_REQUIREMENTS forced operators to choose between
storage backends and extra DNS plugins; bundling both into one image
required a hand-merged requirements file. The arg now accepts a
list — quote it on the command line so the shell preserves the
spaces:
--build-arg EXTRA_REQUIREMENTS="requirements-azure.txt requirements-storage-all.txt"
The RUN step iterates with a shell for-loop and runs ``pip install``
once per file. Layer caching is unaffected — the COPY of all
requirements*.txt above is the only thing that invalidates this layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Azure DNS is one of the four "Major cloud providers" CertMate already
ships out of the box (Cloudflare / Route53 / DigitalOcean / Google).
Splitting it into the optional ``requirements-azure.txt`` produced
runtime errors for users that picked Azure DNS in the UI of a default
build:
Certificate creation failed: The certbot plugin 'dns-azure' is not
installed. Install it with: pip install certbot-dns-azure
Pinned to 2.5.0 — the latest of the certbot-dns-azure 2.x line that
still declares ``certbot<3.0,>=2.0`` and is therefore compatible with
the repo's ``certbot==2.10.0`` pin. Versions 2.6.0/2.6.1 jump to
certbot 3.x and would block ``pip install -r requirements.txt``.
``requirements-azure.txt`` is left in place for users who build the
slim image (``REQUIREMENTS_FILE=requirements-minimal.txt``) and want
Azure DNS layered on top.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous Azure DNS support shipped a credentials INI with keys that
certbot-dns-azure does not recognise — ``dns_azure_subscription_id``,
``dns_azure_resource_group``, ``dns_azure_client_id``,
``dns_azure_client_secret``. The plugin silently ignored them and
aborted at validation time with the misleading
/etc/letsencrypt/.../azure.ini: No authentication methods have
been configured for Azure DNS. Either configure a service
principal, system/user assigned managed identity or configure the
use of azure cli or workload identity credentials
Inspecting certbot_dns_azure._internal.dns_azure shows the real
schema:
dns_azure_tenant_id = ...
dns_azure_sp_client_id = ... (NOT 'client_id')
dns_azure_sp_client_secret = ... (NOT 'client_secret')
dns_azure_zoneN = <FQDN>:<RESOURCE_GROUP_RESOURCE_ID>
The zone mapping is required — without it the plugin raises "At least
one zone mapping needs to be provided". The mapping is built from the
cert's primary domain (with any leading ``*.`` stripped, since the
wildcard cert lives in the apex zone), pointing at
``/subscriptions/<SUB>/resourceGroups/<RG>``. Azure's plugin matches
the longest FQDN suffix at validation time, so a single mapping at
the apex zone covers every SAN that lives in it.
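The zone-mapping derivation described above can be sketched in a few lines; the helper name is illustrative, while the INI key and the `<FQDN>:<RESOURCE_ID>` shape follow the schema quoted from the plugin:

```python
# Sketch: build the dns_azure_zone1 line from the cert's primary
# domain. A wildcard cert lives in the apex zone, so a leading '*.'
# is stripped before the mapping is written.
def azure_zone_mapping(primary_domain: str,
                       subscription_id: str,
                       resource_group: str) -> str:
    zone = primary_domain[2:] if primary_domain.startswith("*.") else primary_domain
    resource_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return f"dns_azure_zone1 = {zone}:{resource_id}"
```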
Plumbing changes to make ``domain`` reachable inside
``create_azure_config``:
- DNSProviderStrategy.create_config_file gains an optional
``domain: Optional[str] = None`` argument. The base ABC keeps the
default so non-Azure strategies do not have to thread the value
through; bulk-applied to all 17 concrete strategies for signature
consistency.
- certificates.py:574 propagates the cert's primary domain when it
invokes the strategy.
- create_azure_config raises ValueError when domain is missing —
surfaces a misuse early rather than letting certbot fail later
with a vague message.
Tests:
- test_writes_sp_keys_and_zone_mapping pins the exact INI shape and
the absence of the legacy keys.
- test_wildcard_domain_strips_leading_asterisk_in_zone_mapping
covers the wildcard case.
- test_missing_domain_raises pins the defensive check.
- The two existing AzureStrategy tests now pass ``domain=`` so they
exercise the same code path as production.
Closes the runtime half of issue fabriziosalmi#113 (the package-not-installed half
was already addressed by 4ea7269 + the certbot-dns-azure pin).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is a substantive 1700-line feature with a careful test surface (38 unit tests, mocked SDK clients, backwards-compat migration, three modes, CRC-aware list filter fix in passing). I want to give it a proper review rather than a rushed one — the architecture decisions (composition pattern, surface-independent store/delete, the metadata allow-list, sanitised error messages) deserve careful reading. Realistic timeline: I'll target this as a v2.5.0 candidate, after the smaller in-flight items (#119 fix #113, #122 dns alias rebase, #120 *_FILE secrets, #126 download). Quick first-pass comments (not blockers — putting them here so they're not lost):
Will do the full pass — checked that the test file already exists in the diff and has good coverage of the helper class. Ping me if you want me to start the review now and just defer merging, vs deferring everything.
APScheduler jobs run in background threads where Flask's thread-local `current_app` proxy is unbound, causing `RuntimeError` when the jobs tried to access `current_app.config`. Fix by:
- keeping a module-level `_flask_app` reference
- setting `app.config['MANAGERS']` so jobs can resolve managers
- extracting a shared `_run_manager_job()` helper with logging and exception handling
- ordering `create_app()` so `_flask_app` is ready before the scheduler starts (fixes race on recovered persistent jobs)
Synchronize with upstream fabriziosalmi/certmate main:
- Domain alias mode (dns_alias_hook.py, CNAME chain verification)
- v2.4.7 base image bump bookworm → trixie
- SECRET_KEY_FILE / API_BEARER_TOKEN_FILE support
- `?file=` param on `/api/certificates/<domain>/download`
- CertMate.html auto-escape helper + UI refactor
- Security bumps (requests CVE, postcss 8.5.10)
- CI workflow fix for fork-originated PRs

Merge preserves our fork features:
- Configurable certificate key type/size (RSA/ECDSA)
- Azure Key Vault native Certificate-object support
- APScheduler app_context fix (background jobs)
- Hydrate-from-storage on startup
- Metadata-driven renewal rebuild

Resolved conflicts:
- modules/core/certificates.py (domain alias + key type + Azure KV)
- static/js/dashboard.js (alias indicator + CertMate.html)
- tests/test_issue113_azure_dns_ambiguous.py
- RELEASE_NOTES.md (combined unreleased + upstream versions)
- provide localhost fallback email when metadata.json lacks one, to avoid RuntimeError in `_renew_from_metadata` during domain-alias renewal rebuild (merge artefact from upstream+ours code paths)
- fix manager test mock to include the required email field
Address maintainer feedback from fabriziosalmi#118 review:
1. Stale read detection in 'both' mode: `retrieve_certificate` now compares `updated_on` timestamps of the Secrets and Certificate surfaces, returning whichever is freshest. Prevents returning stale data when one surface succeeds but the other fails during renewal.
2. New `_retrieve_from_secrets` returns a `latest_update` timestamp alongside `cert_files` and metadata so callers can compare.
3. New `get_certificate_update_time()` helper on the importer.
4. Tests: 2 new cases covering Certificate-newer-than-Secrets and Secrets-newer-than-Certificate divergence (40 total).
5. Release notes: stale read detection entry + separate CRC-aware regex fix entry (per maintainer request).
6. Updated the existing 3 tests to mock `updated_on` with real datetimes.
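The stale-read comparison in point 1 boils down to picking the surface with the later `updated_on`; a minimal sketch (the tuple shapes and the function name are assumptions, not the PR's exact API):

```python
from datetime import datetime
from typing import Any, Optional, Tuple

def pick_freshest(
    secrets: Optional[Tuple[Any, datetime]],
    certificate: Optional[Tuple[Any, datetime]],
) -> Optional[Any]:
    """Each argument is a (payload, updated_on) pair, or None if that
    surface failed; return the payload with the later timestamp."""
    candidates = [s for s in (secrets, certificate) if s is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s[1])[0]
```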
Thanks for the thoughtful first-pass review! I've addressed all three points you raised, starting with stale read detection in `both` mode — details in the commit above.
Apologies — I just noticed that the diff has picked up unrelated commits. Here's what leaked in and what each thing is:

Leaked features (not part of this PR):
What belongs to this PR:
Plan: I'll clean this up — either by doing a proper rebase of just the Azure KV commits on top of upstream/main, or by opening a fresh PR with only the relevant changes. Let me know which you prefer and I'll get it sorted quickly. Really sorry for the noise in the diff.
Full Review: Azure Key Vault Certificate-object Storage

Thanks for the thorough work on this — the architecture is genuinely impressive. The composition pattern, surface-independent writes, and stale-read detection (addressing my earlier comment about version skew in `both` mode) all hold up.

🔴 Blocker: PR is contaminated

As you noted in your latest comment, the diff contains commits that do not belong to this feature.
Please open a fresh PR with only the Azure KV commits (cherry-pick or interactive rebase). The 3413-line diff is not reviewable in its current state — I need to see only the ~2200 lines that belong to this feature.

✅ Architecture positives (confirmed on code review)
🟡 Non-blocking suggestions for the clean PR
Timeline

This is a v2.5.0 candidate — once the PR is rebased clean, I'll do a final pass and merge. No rush, take your time getting the rebase right.
@fabriziosalmi Opened the clean rebase as #139 — same 12 Azure-KV commits cherry-picked onto current `main`. I also addressed all three non-blocker suggestions from your review in #139:
Tests: 40/40 pass. Closing this one in favour of #139.
Summary
Adds a configurable `storage_mode` to the Azure Key Vault storage backend so it can persist certificates as native `Certificate` objects (PKCS12, `issuer_name="Unknown"`) in addition to — or instead of — the existing per-PEM Secrets layout. Native Certificate objects are consumed directly by App Service, Application Gateway, Front Door, API Management and AKS Ingress without manual PFX export.

- `storage_mode: secrets` (default) — current per-PEM Secrets layout, fully backwards-compatible.
- `storage_mode: certificate` — single Certificate object, metadata stored as tags.
- `storage_mode: both` — write to both surfaces; reads prefer Secrets (cheaper).

A new admin endpoint
`POST /api/storage/azure-keyvault/backfill-certificates` imports Certificate objects for domains already stored as Secrets (skipping ones already imported); restricted to `storage_mode='both'` so the legacy Secrets are still listed during the walk. Exposed as a "Backfill Certificate objects" button in Settings → Storage when the backend mode allows it.

Implementation notes
- Composition: `AzureKeyVaultBackend` keeps auth/naming/retry; the new private `_AzureKeyVaultCertificateImporter` encapsulates Certificate-API specifics. Both Azure SDK clients are lazy, so an install without `azure-keyvault-certificates` only fails when a Certificate-mode call is actually attempted.
- The tags ↔ metadata round-trip lives in a single class-level helper with a strict allow-list, so vault-level tags from Azure Policy (`Environment`, `CostCenter`, …) do not pollute the rehydrated metadata. Truncated SAN lists drop the (incomplete-by-construction) trailing fragment to avoid exposing a malformed FQDN.
- In `both` mode, `store_certificate` and `delete_certificate` run each surface under its own try/except. A Secrets-API outage no longer aborts the Certificate-object write (and vice versa); the method returns False on partial failure so callers can react.
- Pre-existing bug fix: `_list_secret_domains` filtered with `endswith('-metadata')`, which never matched anything in production because `_sanitize_secret_name` always appends an 8-char CRC32 suffix. The filter is now anchored as `^cert-.+-metadata-[0-9a-f]{8}$`. Without this, `list_certificates()` and the new backfill endpoint would have walked an empty list.
- Settings migration backfills `storage_mode='secrets'` for upgraded installs at load time, so existing settings.json files keep working without manual edits.
- Service Principal permissions are documented in `docs/architecture.md`: `secrets` mode is unchanged; `certificate`/`both` modes additionally require Certificates `Get/List/Import/Delete` and keep Secrets `Get/List` (Key Vault exposes the imported PFX, including the private key, only via the Secret with the same name as the Certificate object).
- `StorageBackendTest` and the new backfill endpoint return only the exception type to the client; full Azure SDK detail (tenant ids, request ids, WWW-Authenticate headers) stays in `logger.error`.

Files changed
- `modules/core/storage_backends.py` — new helper class, three modes, surface-independent store/delete, CRC-aware list filter.
- `modules/core/settings.py` — nested-default migration for `azure_keyvault.storage_mode`.
- `modules/api/{models,resources}.py` — `storage_mode` field on the model, new backfill endpoint, Certificate-API probe in `StorageBackendTest`.
- `templates/partials/settings_storage.html` + `static/js/settings.js` — storage-mode selector and Backfill button (visible only when the active mode is `both`).
- `requirements-azure-storage.txt`, `requirements-storage-all.txt` — `azure-keyvault-certificates>=4.7.0`.
- `docs/architecture.md` — three-mode table + Service Principal permissions matrix.
- `RELEASE_NOTES.md` — Unreleased entry.
- `tests/test_azure_keyvault_certificate_storage.py` — 38 new unit tests with mocked SDK clients.

Test plan
- `pytest tests/test_azure_keyvault_certificate_storage.py -v` — 38/38 pass (2 skip cleanly when `azure-keyvault-certificates` is not installed).
- `pytest tests/ -m unit -v` — full unit suite, 0 regressions.
- Upgrade an install whose settings lack `storage_mode`; confirm it migrates to `secrets` silently.
- Set `storage_mode=both`; press Test connection — both Secrets and Certificate API access are verified.
- Issue a cert in `both` mode — the `cert-…-{cert,chain,fullchain,privkey}-pem-{crc}` Secrets AND a `cert-{domain}-{crc}` Certificate object (with domain/dns_provider/staging tags) both appear.
- Set `storage_mode=certificate`; issue a new cert — only the Certificate object is created, no Secrets.
- `GET /api/certificates/{domain}` — returns the four PEMs reconstructed from the PFX export.
- `DELETE /api/certificates/{domain}` — both surfaces removed in `both` mode.
- In `both` mode with legacy Secrets-only domains, press Backfill Certificate objects — every legacy domain gets an imported Certificate object; already-imported ones are reported as `skipped`.

Notes for reviewers
- Default `storage_mode='secrets'` means existing installs see no behaviour change after upgrade.
- The `azure-keyvault-certificates` SDK is in the optional storage extras (`requirements-azure-storage.txt`); CI does not install it by default, so the two SDK-touching tests skip cleanly under `pytest.importorskip`.
- The `_with_retry` decorator is currently shadowed by inner `try/except` blocks across all Azure KV methods (and other backends). This is pre-existing in `main` and out of scope here, but worth a follow-up issue.

🤖 Generated with Claude Code
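As an illustration of the CRC-anchored list filter described in the implementation notes (the pattern is the one quoted in the PR; the example secret names are made up):

```python
import re

# Anchored pattern from the fix: metadata secrets always carry the
# 8-hex-char CRC32 suffix that _sanitize_secret_name appends.
METADATA_NAME = re.compile(r"^cert-.+-metadata-[0-9a-f]{8}$")

names = [
    "cert-example-com-metadata-1a2b3c4d",     # real metadata secret: matches
    "cert-example-com-metadata",              # shape endswith('-metadata') expected; never exists
    "cert-example-com-privkey-pem-1a2b3c4d",  # PEM secret, not metadata
]
matched = [n for n in names if METADATA_NAME.match(n)]
# matched == ["cert-example-com-metadata-1a2b3c4d"]
```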