This project contains a Docker Compose-based observability stack with Prometheus and Grafana.
Start the stack:
docker compose up -d
Prometheus: http://localhost:9090
Sample App: http://localhost:8080
Node Exporter: http://localhost:9100/metrics
Grafana: http://localhost:3000
Dashboards:
- Prometheus Overview
- App Service Metrics
- SRE Golden Signals
- Availability Multiples
Default Grafana login:
- User: admin
- Password: admin
Stop the stack:
docker compose down
To remove persisted Prometheus and Grafana data:
docker compose down -v
To scrape an additional service, edit prometheus/prometheus.yml and add another entry under scrape_configs.
For a service running directly on the host, containers under Docker Desktop can usually reach it at host.docker.internal.
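As a rough illustration of what such an entry might look like, the helper below prints a static scrape job that can be pasted under scrape_configs. The function name, the job name, and port 9200 are hypothetical, and the two-space indentation assumes the existing entries in prometheus/prometheus.yml are indented the same way. Prometheus scrapes the /metrics path by default, so no path is set here.

```shell
# Print a scrape_configs entry for a job name and a host:port target.
# scrape_job_yaml is a hypothetical helper, not part of this project.
scrape_job_yaml() {
  job_name="$1"
  target="$2"
  cat <<EOF
  - job_name: "${job_name}"
    static_configs:
      - targets: ["${target}"]
EOF
}
```

For example, `scrape_job_yaml host-service host.docker.internal:9200` prints an entry for a host service assumed to listen on port 9200. Prometheus has to reload its configuration before the new target appears.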
curl http://localhost:8080/
curl http://localhost:8080/health
curl "http://localhost:8080/work?delayMs=250"
curl http://localhost:8080/error
curl http://localhost:8080/metrics
For dashboard panels that use rate(), generate traffic for at least 30 seconds:
for i in $(seq 1 30); do
  curl -s "http://localhost:8080/work?delayMs=150" >/dev/null
  if [ $((i % 10)) -eq 0 ]; then curl -s http://localhost:8080/error >/dev/null; fi
  sleep 1
done
Prometheus scrapes the app through the app:8080 target. Useful queries:
rate(http_requests_total{job="app"}[5m])
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="app"}[5m])) by (le, route))
sum(rate(http_requests_total{job="app",status_code=~"5.."}[5m])) / clamp_min(sum(rate(http_requests_total{job="app"}[5m])), 0.001)
nodejs_eventloop_lag_p99_seconds{job="app"}
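These queries can also be run from a terminal through the Prometheus HTTP API instead of the web UI. The sketch below wraps the instant-query endpoint (/api/v1/query). The function name and the PROM_URL override are assumptions for illustration; the default URL is the one listed earlier in this README.

```shell
# prom_query EXPR: send an instant query to the Prometheus HTTP API.
# PROM_URL is a hypothetical override for a non-default address.
prom_query() {
  # -G sends the data as a URL query string; --data-urlencode escapes
  # braces, quotes, and brackets in the PromQL expression.
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, `prom_query 'rate(http_requests_total{job="app"}[5m])'` returns a JSON document whose data field holds the current result for each series.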
The sample app exposes lab-only metrics for practicing critical-situation detection:
lab_capacity_limit_rps{job="app"}
lab_node_count{job="app"}
The Availability Multiples dashboard uses these queries:
# Maximum availability multiple = capacity limit / current usage
max(lab_capacity_limit_rps{job="app"}) / clamp_min(sum(rate(http_requests_total{job="app"}[5m])), 0.001)
# Load increase multiple = n / (n - 1)
max(lab_node_count{job="app"}) / (max(lab_node_count{job="app"}) - 1)
# Critical situation = load increase multiple > maximum availability multiple
(max(lab_node_count{job="app"}) / (max(lab_node_count{job="app"}) - 1)) > bool (max(lab_capacity_limit_rps{job="app"}) / clamp_min(sum(rate(http_requests_total{job="app"}[5m])), 0.001))
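To get a feel for the arithmetic behind these queries, the sketch below recomputes the same three values offline from numbers you supply. The function name and the sample inputs are hypothetical and are not read from the app. It mirrors the 0.001 clamp on the request rate, but it rejects a node count of 1 or less, whereas the PromQL expression would divide by zero and yield +Inf.

```shell
# availability_multiples LIMIT_RPS CURRENT_RPS NODE_COUNT
# Prints: max_availability_multiple load_increase_multiple critical(0|1)
availability_multiples() {
  awk -v limit="$1" -v current="$2" -v n="$3" 'BEGIN {
    if (current < 0.001) current = 0.001      # same floor as clamp_min(..., 0.001)
    if (n <= 1) {                             # n / (n - 1) is undefined for one node
      print "node count must be greater than 1" > "/dev/stderr"
      exit 1
    }
    max_avail = limit / current               # capacity limit / current usage
    load_inc  = n / (n - 1)                   # load on survivors if one node is lost
    critical  = (load_inc > max_avail) ? 1 : 0
    printf "%.3f %.3f %d\n", max_avail, load_inc, critical
  }'
}

availability_multiples 20 5 3    # prints: 4.000 1.500 0
availability_multiples 20 15 3   # prints: 1.333 1.500 1
```

In the second call, losing one of three nodes would raise the per-node load by a factor of 1.5, which exceeds the remaining headroom of about 1.333, so the situation is flagged as critical.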
Generate enough traffic to trigger the critical-situation alert:
for i in $(seq 1 120); do
  curl -s "http://localhost:8080/work?delayMs=50" >/dev/null &
  curl -s "http://localhost:8080/work?delayMs=50" >/dev/null &
  sleep 0.2
done
wait