Operator observability: metrics, Events, richer status #37
Merged
Make the Elpio operator observable from outside the cluster.
Metrics: new src/elpio/operator/metrics.py exposes a Prometheus surface
(elpio_reconcile_total{kind,result}, elpio_reconcile_duration_seconds{kind},
elpio_services_ready) built on prometheus_client. record_reconcile() is pure
with respect to the cluster so it unit-tests against the in-process registry.
start_metrics_server() is opt-in via ELPIO_METRICS=1 and binds an exporter on
ELPIO_METRICS_PORT (default 9095). The HTTP server import is lazy so importing
the module never opens a socket.
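The metrics surface described above might look roughly like the following. This is a minimal sketch, not the contents of metrics.py: the metric names, record_reconcile(), start_metrics_server(), and the two environment variables come from the description, while the private registry, the help strings, the helper names metrics_enabled() and metrics_port(), and the boolean return value are assumptions made for illustration.

```python
import os

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

# Assumption: a private registry keeps this sketch isolated. The real module
# may use prometheus_client's default registry instead.
REGISTRY = CollectorRegistry()

RECONCILE_TOTAL = Counter(
    "elpio_reconcile_total",
    "Reconcile passes, labelled by resource kind and result.",
    ["kind", "result"],
    registry=REGISTRY,
)
RECONCILE_DURATION = Histogram(
    "elpio_reconcile_duration_seconds",
    "Time spent in one reconcile pass.",
    ["kind"],
    registry=REGISTRY,
)
SERVICES_READY = Gauge(
    "elpio_services_ready",
    "Services currently Ready (meaning assumed from the metric name).",
    registry=REGISTRY,
)


def record_reconcile(kind: str, result: str, seconds: float) -> None:
    """Update in-process metrics only; no cluster or network access."""
    RECONCILE_TOTAL.labels(kind=kind, result=result).inc()
    RECONCILE_DURATION.labels(kind=kind).observe(seconds)


def metrics_enabled() -> bool:  # hypothetical helper name
    return os.environ.get("ELPIO_METRICS") == "1"


def metrics_port() -> int:  # hypothetical helper name
    return int(os.environ.get("ELPIO_METRICS_PORT", "9095"))


def start_metrics_server() -> bool:
    """Start the exporter if the flag is set; report whether it was started."""
    if not metrics_enabled():
        return False
    # Deferred import, mirroring the lazy HTTP-server import described above.
    from prometheus_client import start_http_server

    start_http_server(metrics_port(), registry=REGISTRY)
    return True
```

Because record_reconcile() only mutates the registry, a test can call it and read values back with REGISTRY.get_sample_value(...) without binding a port.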
Events: the ElpioService and ElpioFunction reconcilers now emit Kubernetes
Events at the key transitions via kopf.event / kopf.warn (Reconciled,
BuildStarted, Ready/Progressing, ReconcileFailed, BuildFailed). The shared
apply_all/conditions helpers in common.py are untouched.
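As an illustration of the Event emission for a service, the sketch below maps one reconcile outcome to the reasons listed above. It does not import kopf: the event and warn parameters are stand-ins that are assumed to be called the way kopf.event(obj, type=..., reason=..., message=...) and kopf.warn(obj, reason=..., message=...) are. The function name, the message strings, and the choice to emit Reconciled followed by Ready or Progressing in the same pass are assumptions, not the reconciler's actual logic.

```python
from typing import Any, Callable, Optional

Emit = Callable[..., None]


def emit_service_events(
    body: Any,
    *,
    ready: bool,
    error: Optional[Exception],
    event: Emit,  # stand-in for kopf.event
    warn: Emit,   # stand-in for kopf.warn
) -> None:
    """Emit Events for one service reconcile pass (hypothetical helper)."""
    if error is not None:
        # Failure path: a single Warning-type Event carrying the error text.
        warn(body, reason="ReconcileFailed", message=str(error))
        return
    event(body, type="Normal", reason="Reconciled", message="Resources applied.")
    if ready:
        event(body, type="Normal", reason="Ready", message="Service is ready.")
    else:
        event(body, type="Normal", reason="Progressing", message="Waiting for readiness.")
```

Passing the emitters in keeps the mapping testable with plain callables; inside a handler one would presumably pass kopf.event and kopf.warn directly.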
Wiring: an @kopf.on.startup hook starts the exporter behind the flag, and the
service reconciler records a success/error count plus duration around its work.
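One way to record a success or error count plus duration around the reconcile work is a small wrapper like the one below. It is a sketch under assumptions: timed_reconcile is a hypothetical name, and the record argument stands in for record_reconcile(kind, result, seconds). The @kopf.on.startup hook itself is not shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed_reconcile(
    kind: str,
    work: Callable[[], T],
    record: Callable[[str, str, float], None],
) -> T:
    """Run work() and report its result label and elapsed seconds to record."""
    started = time.perf_counter()
    result = "success"
    try:
        return work()
    except Exception:
        result = "error"
        raise  # the caller still sees the original failure
    finally:
        # Runs on both paths, so every pass is counted exactly once.
        record(kind, result, time.perf_counter() - started)
```

Re-raising keeps the operator's existing error handling and retry behaviour in charge; the wrapper only observes.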
Tests: test_operator_metrics.py covers the counter/histogram/gauge increments,
the env helpers, and that the server is a no-op when disabled.
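A test for the counter increments could read values straight from a registry, along these lines. This is an illustrative sketch rather than the content of test_operator_metrics.py: the factory make_counter and the test name are hypothetical, and a fresh CollectorRegistry per test is an assumption chosen to avoid duplicate-metric errors between tests.

```python
from prometheus_client import CollectorRegistry, Counter


def make_counter(registry: CollectorRegistry) -> Counter:
    """Build the reconcile counter against the given registry (hypothetical)."""
    return Counter(
        "elpio_reconcile_total",
        "Reconcile passes, labelled by resource kind and result.",
        ["kind", "result"],
        registry=registry,
    )


def test_labels_are_counted_separately() -> None:
    registry = CollectorRegistry()
    counter = make_counter(registry)

    counter.labels(kind="ElpioService", result="success").inc()
    counter.labels(kind="ElpioService", result="success").inc()
    counter.labels(kind="ElpioFunction", result="error").inc()

    value = registry.get_sample_value
    name = "elpio_reconcile_total"
    assert value(name, {"kind": "ElpioService", "result": "success"}) == 2.0
    assert value(name, {"kind": "ElpioFunction", "result": "error"}) == 1.0
    # A label combination that was never touched has no sample at all.
    assert value(name, {"kind": "ElpioService", "result": "error"}) is None
```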
What
Makes the Elpio operator observable (issue #43): a Prometheus metrics surface, Kubernetes Events at reconcile transitions, and the wiring to turn the exporter on.
Metrics
New src/elpio/operator/metrics.py exposing:
- elpio_reconcile_total{kind,result} counter
- elpio_reconcile_duration_seconds{kind} histogram
- elpio_services_ready gauge
record_reconcile(kind, result, seconds) only touches the in-process registry, so it is unit-tested without a running server.
start_metrics_server() is opt-in via ELPIO_METRICS=1 and binds on ELPIO_METRICS_PORT (default 9095). The HTTP server import is lazy, so importing the module never opens a socket.
prometheus-client is added to the core dependencies.
Events
The ElpioService and ElpioFunction reconcilers emit Events at the key transitions via kopf.event / kopf.warn: Reconciled, Ready/Progressing, ReconcileFailed for services, plus BuildStarted, Reconciled, BuildFailed for functions. The shared apply_all and conditions helpers in common.py are unchanged, and handler return values / status writes keep their existing shape.
Wiring
An @kopf.on.startup hook starts the exporter behind the flag. The service reconciler times its work and records a success or error count with the duration.
Tests
tests/unit/test_operator_metrics.py covers the counter / histogram / gauge increments, label separation by kind and result, the env helpers, and that the server is a no-op when disabled. Full suite: 165 passed, ruff clean.