
Expose angie ACME state and reuse cert expiry #13

Merged
sergeyfast merged 1 commit into master from feat/angie-acme-collector on May 13, 2026

Summary

Adds visibility into angie's built-in ACME client (1.5+) without inventing a separate expiry metric — the existing topsrv_ssl_certificate_expiry_seconds series picks up ACME-managed PEMs the same way it picks up static ones.

New: angie.ACMECollector

  • Polls /status/http/acme_clients/?date=epoch on every scrape against the same angie process as APICollector; the endpoint is derived from StatusURL.
  • Emits two gauges:
    • topsrv_angie_acme_state{name, state, certificate}: value 1 per active acme_client tuple. state ∈ {ready, requesting, disabled, failed}; certificate ∈ {valid, expired, missing, mismatch, error}.
    • topsrv_angie_acme_next_run_seconds{name}: Unix timestamp of the next scheduled action (omitted when state is disabled or requesting).
  • Silent on 404: hosts without acme_client configured emit no series and no log spam.
  • URL derivation guards against a misconfigured StatusURL that already points at the ACME endpoint (no double /http/acme_clients/).
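A minimal sketch of the URL derivation, assuming StatusURL is an absolute URL such as http://127.0.0.1:8080/status. The helper name deriveACMEURL and the example hosts are hypothetical; the real NewACMECollector may be structured differently. It trims a trailing slash, appends /http/acme_clients only when the path doesn't already end with it (the double-suffix guard), and sets ?date=epoch:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// acmePath is the Angie API sub-path for ACME client state (no trailing slash).
const acmePath = "/http/acme_clients"

// deriveACMEURL (hypothetical) builds the ACME endpoint from a status base URL.
func deriveACMEURL(statusURL string) (string, error) {
	u, err := url.Parse(statusURL)
	if err != nil {
		return "", fmt.Errorf("parse status URL %q: %w", statusURL, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("status URL %q must be absolute", statusURL)
	}
	// Normalize "/status" and "/status/" to the same base.
	base := strings.TrimSuffix(u.Path, "/")
	// Guard: a StatusURL that already points at the ACME endpoint is not extended again.
	if !strings.HasSuffix(base, acmePath) {
		base += acmePath
	}
	u.Path = base + "/"
	u.RawQuery = url.Values{"date": {"epoch"}}.Encode()
	return u.String(), nil
}

func main() {
	for _, in := range []string{
		"http://127.0.0.1:8080/status",
		"http://[::1]:8080/status/",
		"http://angie.local/status/http/acme_clients/",
		"http://[::1",
	} {
		out, err := deriveACMEURL(in)
		fmt.Println(out, err)
	}
}
```

Comparing against the cleaned path means the trailing-slash, no-slash, IPv6 and already-suffixed forms all converge on one endpoint, while a malformed URL surfaces as a constructor error.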

Reuse: ACME PEM auto-discovery in nginx.DiscoverConfig

  • Parses acme_client <name> directives from angie.conf (line-by-line, skips # acme_client comments).
  • Adds <defaultACMEStatePath>/<name>/certificate.pem to SSLCertificates for the existing nginx.SSLCollector — ACME-cert expiry flows through topsrv_ssl_certificate_expiry_seconds automatically. No new expiry metric, no dashboard split.
  • Defense in depth: rejects names containing /, \, or .. (path-traversal into filepath.Join).
  • Diagnostic: app logs "angie: ACME certs missing at default state path" when the config has more acme_client directives than certs found on disk — hints at a custom acme_client_path build option we don't auto-discover.

Docs reshuffle

  • docs/metrics.md keeps the metric reference clean (no inline PromQL anymore).
  • New docs/promql-recipes.md holds the recipe blocks (Netstat listening ports, Netstat connection scope, Angie ACME).

Verification

  • Real angie 1.x JSON shape captured against a production host: a map keyed by client name, each entry carrying {state, certificate, details, next_run}; with date=epoch, next_run comes back as a plain Unix integer.
  • Real angie filesystem layout: /var/lib/angie/acme/<client_name>/certificate.pem (plus private.key, account.key).
  • Endpoint corrected from initially guessed /api/http/acme_clients/ to the documented /status/http/acme_clients/.

Test plan

  • make fmt lint test — 0 lint issues, all unit tests green
  • make test-integration — full docker-compose stack passes
  • New TestIntegrationACMECollectorSurvivesNoACME — against real angie:minimal in docker-compose: 404 path is silent, no series emitted, no panic
  • Unit coverage: collector happy path (with angie 1.x response shape), 404-silent, next_run omission for state=requesting, URL derivation (trailing slash / no slash / IPv6 / named host / double-suffix guard), malformed URL → constructor error
  • Discovery coverage: real-layout PEM pickup, missing-file skip, dedup against pinned certs, commented-line skip, traversal-name reject
  • Smoke-check on a production angie host after deploy — confirm topsrv_angie_acme_state shows up with state="ready", certificate="valid" and that topsrv_ssl_certificate_expiry_seconds carries the ACME certificate.pem paths

Out of scope / follow-ups

  • Custom acme_client_path parsing: operators on non-default builds get a warning log instead of silently missing expiry coverage. Adding directive parsing is a small follow-up if it shows up in practice.
  • Active TLS probe collector — would close the gap for both custom ACME paths and externally-managed certs, but it's a new code path, separate PR.

- New angie ACMECollector polls /status/http/acme_clients/?date=epoch
  and emits topsrv_angie_acme_state{name,state,certificate} plus
  topsrv_angie_acme_next_run_seconds, so dashboards can alert on
  certificate!=valid, state=failed, or stuck renewal. 404-silent on
  hosts without acme_client configured
- Extend nginx.DiscoverConfig to parse acme_client directives and
  pick up <state_path>/<name>/certificate.pem from angie's default
  /var/lib/angie/acme location; the file lands in the existing
  topsrv_ssl_certificate_expiry_seconds metric so ACME-cert expiry
  reuses the same series as static certs — no new metric, no
  dashboard split
- Skip commented `# acme_client` lines and reject names with path
  separators or `..` components — defense in depth around the
  filepath.Join into defaultACMEStatePath
- Guard NewACMECollector URL derivation against a misconfigured
  statusURL that already points at the ACME endpoint (no double
  /http/acme_clients/ suffix)
- Surface "angie: ACME certs missing at default state path" at
  startup when the config has more acme_client directives than
  certs on disk — operators on custom acme_client_path builds get
  a hint instead of silent expiry gaps
- Extract PromQL recipe blocks from docs/metrics.md into a new
  docs/promql-recipes.md so the metric reference stays a reference
- 10 tests: collector (happy path with angie 1.x response shape,
  404-silent, next_run omission, URL derivation incl. double-suffix
  guard, malformed URL) and discovery (real-layout pickup, missing
  file skip, dedup, commented-line skip, traversal-name reject)
sergeyfast merged commit 5e719e4 into master on May 13, 2026
2 checks passed
