9 changes: 9 additions & 0 deletions .github/workflows/deploy-k8s.yml
@@ -106,6 +106,15 @@ jobs:
          kubectl apply -f infra/k8s/web-client/
          kubectl apply -f infra/k8s/ingress.yaml
      - name: Upsert Grafana secret
        run: |
          kubectl create secret generic grafana-secret -n monitoring \
            --from-literal=admin-password="${{ secrets.GRAFANA_ADMIN_PASSWORD }}" \

Collaborator: `GRAFANA_ADMIN_PASSWORD` is missing from the comment at the top of this file.

            --dry-run=client -o yaml | kubectl apply -f -
      - name: Deploy monitoring
        run: kubectl apply -f infra/k8s/monitoring/
Comment on lines +115 to +116

Collaborator: You could add a status check here too, to make sure that the deployment worked (see "wait for rollouts" a few lines below).
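
A minimal sketch of such a check, assuming the Deployment names `prometheus` and `grafana` from the manifests in this PR. The step name and the 120s timeout are placeholders, not taken from the existing workflow:

```yaml
      # Hypothetical step; it would go right after "Deploy monitoring".
      - name: Wait for monitoring rollouts
        run: |
          # Each command exits non-zero if the Deployment is not rolled out
          # within the timeout (assumed value), which fails the job.
          kubectl rollout status deployment/prometheus -n monitoring --timeout=120s
          kubectl rollout status deployment/grafana -n monitoring --timeout=120s
```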


      - name: Restart deployments to pull latest images
        if: github.event_name != 'push'
        run: |
2 changes: 2 additions & 0 deletions README.md
@@ -9,6 +9,8 @@ Azure deployment (Docker Compose): http://135.116.196.120/
Coverage reports: https://aet-devops26.github.io/team-devsecops/

API scheme (Swagger UI): https://devsecops.stud.k8s.aet.cit.tum.de/swagger-ui/index.html

Monitoring (Grafana): https://devsecops.stud.k8s.aet.cit.tum.de/grafana
## Local development

The full stack runs under Docker Compose with live-reload:
29 changes: 29 additions & 0 deletions infra/k8s/monitoring/grafana-ingress.yaml
@@ -0,0 +1,29 @@
---
# Exposes Grafana at https://devsecops.stud.k8s.aet.cit.tum.de/grafana/
# Grafana is configured with GF_SERVER_SERVE_FROM_SUB_PATH=true so it handles
# the /grafana/ prefix itself; no rewrite-target is needed.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod

Collaborator: Suggested change: replace

    cert-manager.io/cluster-issuer: letsencrypt-prod

with

    # no TLS certificate, ingress.yaml already requests one

and drop `spec.tls`. Otherwise we need more Let's Encrypt calls, which are rate limited.
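
With the suggestion applied, the manifest might look like the sketch below. It assumes that the certificate requested in `ingress.yaml` covers the same host, which this diff does not show:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    # no TLS certificate, ingress.yaml already requests one
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  ingressClassName: nginx
  # spec.tls dropped (assumption: ingress.yaml already covers this host)
  rules:
    - host: devsecops.stud.k8s.aet.cit.tum.de
      http:
        paths:
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```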

    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - devsecops.stud.k8s.aet.cit.tum.de
      secretName: grafana-tls-cert
  rules:
    - host: devsecops.stud.k8s.aet.cit.tum.de
      http:
        paths:
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
92 changes: 92 additions & 0 deletions infra/k8s/monitoring/grafana.yaml
@@ -0,0 +1,92 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus.monitoring.svc.cluster.local:9090
        isDefault: true
        access: proxy
Comment on lines +4 to +15

Collaborator: Do I understand correctly that we have a persistent datasource, but there are no graphs in it (except when they are created manually in the UI)? Or did you want to add them later? (The course requirement is to have persistent dashboards.)
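
One way to get persistent dashboards is Grafana's file-based dashboard provisioning, in the same style as the datasource ConfigMap above. The following is only a sketch: the ConfigMap names, the mount path `/etc/grafana/dashboards`, and the placeholder dashboard JSON are assumptions, not part of this PR.

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-provider   # hypothetical name
  namespace: monitoring
data:
  provider.yaml: |
    apiVersion: 1
    providers:
      - name: default
        type: file
        options:
          path: /etc/grafana/dashboards   # assumed mount path for the JSON files
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards   # hypothetical name
  namespace: monitoring
data:
  # Placeholder dashboard; a real one would be exported from the Grafana UI.
  overview.json: |
    {
      "uid": "overview",
      "title": "Service overview",
      "panels": []
    }
# Additions to the grafana Deployment (fragment, not a complete manifest):
#   volumeMounts:
#     - name: dashboard-provider
#       mountPath: /etc/grafana/provisioning/dashboards
#     - name: dashboards
#       mountPath: /etc/grafana/dashboards
#   volumes:
#     - name: dashboard-provider
#       configMap:
#         name: grafana-dashboard-provider
#     - name: dashboards
#       configMap:
#         name: grafana-dashboards
```

Keeping the dashboard JSON in the repository means the dashboards are restored on every deploy, independently of the contents of the `grafana-data` volume.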

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce

Collaborator: By default, Kubernetes does a rolling upgrade, so two instances are up at the same time during an upgrade. ReadWriteOnce means that the first instance holds a lock on the volume, so the second one waits for it and the rollout deadlocks. To avoid the rolling upgrade, set

    spec:
        strategy:
            type: Recreate

in the Deployment spec. The same applies to Prometheus.

  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
      containers:
        - name: grafana
          image: grafana/grafana:11.6.1
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-secret
                  key: admin-password
            - name: GF_SERVER_ROOT_URL
              value: https://devsecops.stud.k8s.aet.cit.tum.de/grafana/
            - name: GF_SERVER_SERVE_FROM_SUB_PATH
              value: "true"
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: data
              mountPath: /var/lib/grafana
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
      volumes:
        - name: datasources
          configMap:
            name: grafana-datasources
        - name: data
          persistentVolumeClaim:
            claimName: grafana-data
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - name: http
      port: 3000
      targetPort: 3000
102 changes: 102 additions & 0 deletions infra/k8s/monitoring/prometheus.yaml
@@ -0,0 +1,102 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: spring-api
        metrics_path: /actuator/prometheus
        static_configs:
          - targets: ['spring-api.app.svc.cluster.local:8080']
            labels:
              service: spring-api

      - job_name: py-help-service
        static_configs:
          - targets: ['py-help-service.app.svc.cluster.local:8080']
            labels:
              service: py-help-service

      - job_name: py-recipe-service
        static_configs:
          - targets: ['py-recipe-service.app.svc.cluster.local:8080']
            labels:
              service: py-recipe-service
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 65534
      containers:
        - name: prometheus
          image: prom/prometheus:v3.4.1
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=7d
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: 9090
2 changes: 2 additions & 0 deletions services/py-help-service/main.py
@@ -10,6 +10,7 @@
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

from prometheus_fastapi_instrumentator import Instrumentator
from client.cooking_assistant_gen_ai_services_api_internal_client.models.help_request_forwarded import (
    HelpRequestForwarded,
)
@@ -21,6 +22,7 @@
load_dotenv()

app = FastAPI(title="Cooking Assistant GenAI Service")
Instrumentator().instrument(app).expose(app)


@app.exception_handler(HTTPException)
1 change: 1 addition & 0 deletions services/py-help-service/requirements.txt
@@ -8,3 +8,4 @@ pydantic==2.7.4
python-dotenv==1.0.1
attrs==23.2.0
python-dateutil==2.9.0.post0
prometheus-fastapi-instrumentator==7.1.0
2 changes: 2 additions & 0 deletions services/py-recipe-service/main.py
@@ -12,6 +12,7 @@
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

from prometheus_fastapi_instrumentator import Instrumentator
from client.cooking_assistant_gen_ai_services_api_internal_client.models.recipe_request_forwarded import (
    RecipeRequestForwarded,
)
@@ -20,6 +21,7 @@
load_dotenv()

app = FastAPI(title="Cooking Assistant GenAI Service")
Instrumentator().instrument(app).expose(app)


@app.exception_handler(HTTPException)
1 change: 1 addition & 0 deletions services/py-recipe-service/requirements.txt
@@ -8,3 +8,4 @@ pydantic==2.7.4
python-dotenv==1.0.1
attrs==23.2.0
python-dateutil==2.9.0.post0
prometheus-fastapi-instrumentator==7.1.0