Operator observability: metrics, Events, richer status #37

Merged
joy-software merged 1 commit into main from feat/43-observability on Jun 7, 2026

@joy-software

Contributor

What

Makes the Elpio operator observable (issue #43): a Prometheus metrics surface, Kubernetes Events at reconcile transitions, and the wiring to turn the exporter on.

Metrics

New src/elpio/operator/metrics.py exposing:

  • elpio_reconcile_total{kind,result} counter
  • elpio_reconcile_duration_seconds{kind} histogram
  • elpio_services_ready gauge
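
As a rough sketch, the three series above could be declared with prometheus_client along these lines. Only the metric names, the label names, and the record_reconcile(kind, result, seconds) signature come from this PR; the help strings, the private CollectorRegistry, and the helper body are assumptions for illustration, and the real module may well use the default registry.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

# Assumption: a private registry keeps this sketch isolated from other metrics.
registry = CollectorRegistry()

RECONCILE_TOTAL = Counter(
    "elpio_reconcile_total",
    "Reconcile attempts by resource kind and result.",  # hypothetical help text
    ["kind", "result"],
    registry=registry,
)
RECONCILE_DURATION = Histogram(
    "elpio_reconcile_duration_seconds",
    "Time spent in one reconcile, by resource kind.",  # hypothetical help text
    ["kind"],
    registry=registry,
)
SERVICES_READY = Gauge(
    "elpio_services_ready",
    "Number of services currently Ready.",  # hypothetical help text
    registry=registry,
)


def record_reconcile(kind: str, result: str, seconds: float) -> None:
    """Count one reconcile and observe its duration; no network or cluster access."""
    RECONCILE_TOTAL.labels(kind=kind, result=result).inc()
    RECONCILE_DURATION.labels(kind=kind).observe(seconds)
```

Because everything here lives in the in-process registry, a test can call record_reconcile() and read the samples back with registry.get_sample_value() without starting an exporter.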

record_reconcile(kind, result, seconds) only touches the in-process registry, so it can be unit-tested without a running server. start_metrics_server() is opt-in via ELPIO_METRICS=1 and binds the exporter to the port given by ELPIO_METRICS_PORT (default 9095). The HTTP server import is lazy, so importing the module never opens a socket. prometheus-client is added to the core dependencies.
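
The opt-in behaviour described above might look roughly like the sketch below. The helper names metrics_enabled() and metrics_port(), the boolean return value, and the choice to treat only the literal value "1" as enabled are assumptions; the PR only states the environment variable names, the default port, and that the import is lazy.

```python
import os


def metrics_enabled() -> bool:
    """True only when ELPIO_METRICS is set to "1" (assumed parsing rule)."""
    return os.environ.get("ELPIO_METRICS") == "1"


def metrics_port() -> int:
    """Port from ELPIO_METRICS_PORT, falling back to 9095."""
    return int(os.environ.get("ELPIO_METRICS_PORT", "9095"))


def start_metrics_server() -> bool:
    """Start the exporter if enabled; return whether a server was started."""
    if not metrics_enabled():
        return False  # disabled: nothing imported, no socket opened
    # Lazy import: the HTTP server helper is only pulled in when it is needed.
    from prometheus_client import start_http_server

    start_http_server(metrics_port())
    return True
```

With the flag unset, start_metrics_server() returns immediately, which is what lets a unit test assert the disabled path without binding a port.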

Events

The ElpioService and ElpioFunction reconcilers emit Events at the key transitions via kopf.event / kopf.warn: Reconciled, Ready / Progressing, ReconcileFailed for services, plus BuildStarted, Reconciled, BuildFailed for functions. The shared apply_all and conditions helpers in common.py are unchanged, and handler return values / status writes keep their existing shape.
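
To make the service transitions concrete, here is a minimal sketch of how one reconcile outcome could map to Events. The function name emit_service_events, its parameters, and the message strings are hypothetical; the reasons (Reconciled, Ready, Progressing, ReconcileFailed) are the ones listed above. The emitters are passed in so the mapping can be exercised without a running operator; in a handler they would be kopf.event and kopf.warn, which accept these keyword arguments.

```python
from typing import Any, Callable, Optional

Emitter = Callable[..., None]


def emit_service_events(
    body: Any,
    *,
    ready: bool,
    error: Optional[Exception],
    event: Emitter,
    warn: Emitter,
) -> None:
    """Emit the Events for one ElpioService reconcile outcome (illustrative)."""
    if error is not None:
        # Failures surface as a Warning event and nothing else is emitted.
        warn(body, reason="ReconcileFailed", message=str(error))
        return
    event(body, type="Normal", reason="Reconciled", message="Resources applied.")
    state = "Ready" if ready else "Progressing"
    event(body, type="Normal", reason=state, message=f"Service is {state}.")
```

Inside a kopf handler the call would be along the lines of emit_service_events(body, ready=..., error=None, event=kopf.event, warn=kopf.warn), leaving the handler's return value and status writes untouched.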

Wiring

An @kopf.on.startup hook starts the exporter behind the flag. The service reconciler times its work and records a success or error count with the duration.
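
The timing part of the wiring can be sketched as a small wrapper. The name timed_reconcile and the "success" / "error" label values are assumptions; the recorder argument stands in for record_reconcile so the sketch stays free of operator imports.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
Recorder = Callable[[str, str, float], None]


def timed_reconcile(kind: str, work: Callable[[], T], record: Recorder) -> T:
    """Run work(), then record its outcome and duration exactly once."""
    start = time.perf_counter()
    try:
        result = work()
    except Exception:
        # Record the failure, then re-raise so the caller's error handling still runs.
        record(kind, "error", time.perf_counter() - start)
        raise
    record(kind, "success", time.perf_counter() - start)
    return result
```

Re-raising after recording keeps the reconciler's existing failure behaviour intact while still counting the error.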

Tests

tests/unit/test_operator_metrics.py covers the counter / histogram / gauge increments, label separation by kind and result, the env helpers, and that the server is a no-op when disabled. Full suite: 165 passed, ruff clean.

Make the Elpio operator observable from outside the cluster.

Metrics: new src/elpio/operator/metrics.py exposes a Prometheus surface
(elpio_reconcile_total{kind,result}, elpio_reconcile_duration_seconds{kind},
elpio_services_ready) built on prometheus_client. record_reconcile() is pure
with respect to the cluster so it unit-tests against the in-process registry.
start_metrics_server() is opt-in via ELPIO_METRICS=1 and binds an exporter on
ELPIO_METRICS_PORT (default 9095). The HTTP server import is lazy so importing
the module never opens a socket.

Events: the ElpioService and ElpioFunction reconcilers now emit Kubernetes
Events at the key transitions via kopf.event / kopf.warn (Reconciled,
BuildStarted, Ready/Progressing, ReconcileFailed, BuildFailed). The shared
apply_all/conditions helpers in common.py are untouched.

Wiring: an @kopf.on.startup hook starts the exporter behind the flag, and the
service reconciler records a success/error count plus duration around its work.

Tests: test_operator_metrics.py covers the counter/histogram/gauge increments,
the env helpers, and that the server is a no-op when disabled.

@joy-software merged commit c828faf into main on Jun 7, 2026
7 of 8 checks passed
@joy-software deleted the feat/43-observability branch on June 7, 2026 21:42