Skip to content

Monitoring - metrics#117

Open
imol-ai wants to merge 3 commits into
mainfrom
feat/monitoring-metrics
Open

Monitoring - metrics#117
imol-ai wants to merge 3 commits into
mainfrom
feat/monitoring-metrics

Conversation

@imol-ai

@imol-ai imol-ai commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds monitoring for metrics on the python and spring services, collects them via prometheus, and displays them in grafana.

Closes #119

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Infrastructure / CI
  • Documentation

API changes

  • This PR does not affect the API
  • This PR changes the API → api/openapi.yaml updated and api/scripts/gen-all.sh re-run

Definition of Done

  • CI passes
  • Pre-commit hooks pass locally
  • Relevant tests added or updated

@imol-ai imol-ai force-pushed the feat/monitoring-metrics branch from c4377fb to 04fcd80 Compare June 26, 2026 11:29
@imol-ai imol-ai requested review from jschoedl and paulwiese June 26, 2026 13:32
@imol-ai imol-ai changed the title Feat/monitoring metrics Monitoring - metrics Jun 26, 2026

@jschoedl jschoedl left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also setup Prometheus Operator, right? but this could be a later PR

namespace: monitoring
spec:
accessModes:
- ReadWriteOnce

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per default, k8s does a rolling upgrade, so during upgrade 2 instances are up at the same time. ReadWriteOnce means that the first instance holds a lock, so the second one waits -> deadlock. To not use rolling upgrade, set

spec:
    strategy:
        type: Recreate

in the Deployment spec. Same for Prometheus.

- name: Upsert Grafana secret
run: |
kubectl create secret generic grafana-secret -n monitoring \
--from-literal=admin-password="${{ secrets.GRAFANA_ADMIN_PASSWORD }}" \

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRAFANA_ADMIN_PASSWORD is missing in the comment at the top of this file

name: grafana-ingress
namespace: monitoring
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cert-manager.io/cluster-issuer: letsencrypt-prod
# no TLS certificate, ingress.yaml already requests one

& drop spec.tls

(otherwise, we need more Let's Encrypt calls, which are rate limited)

Comment on lines +115 to +116
- name: Deploy monitoring
run: kubectl apply -f infra/k8s/monitoring/

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add a status check too / make sure that deployment worked (see "wait for rollouts" a few lines below)

Comment on lines +4 to +15
metadata:
name: grafana-datasources
namespace: monitoring
data:
prometheus.yaml: |
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus.monitoring.svc.cluster.local:9090
isDefault: true
access: proxy

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get it correctly that we have a persistent datasource, but there are no graphs in it (except when they are created manually in the UI)? Or did you want to add them later? (course requirement is to have persistent dashboards)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Monitoring - metrics

2 participants