Add Prometheus Metrics Monitoring Documentation #9

@pizofreude

Description

I noticed that GRAB has no docs/ directory, even though one appears in the project structure. I'm therefore opening this issue to propose comprehensive documentation for the Prometheus metrics monitoring feature recently implemented in the main boilerplate repository.

The documentation should provide users with complete guidance on configuring, accessing, and utilizing the observability stack.

Related

Motivation

Users need detailed documentation to:

  • Understand the monitoring architecture
  • Configure metrics collection
  • Access and use Prometheus/Grafana
  • Write custom queries for their use cases
  • Troubleshoot common issues
  • Extend metrics for business needs

Scope

1. Monitoring Overview Page

Create a new documentation page covering:

  • What is Metrics Monitoring?

    • Benefits of observability
    • When to use metrics vs logs
    • Production readiness considerations
  • Architecture Overview

    • Component diagram (API → Prometheus → Grafana)
    • Data flow visualization
    • Port mappings and network layout
  • Available Metrics

    • http_requests_total - Request counter
    • http_request_duration_seconds - Latency histogram
    • http_requests_in_progress - Concurrent requests gauge
    • http_request_size_bytes - Request payload size histogram
    • http_response_size_bytes - Response payload size histogram

2. Configuration Guide

Document configuration options:

YAML Configuration:

metrics:
  enabled: true
  path: /metrics

Environment Variables:

METRICS_ENABLED=true
METRICS_PATH=/metrics

Skip Paths Configuration:

  • How to customize skip paths
  • Why /health and /metrics are excluded
  • Adding custom skip paths
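Exact-match skip logic could look like the following Go sketch. Excluding /metrics keeps Prometheus's own scrapes from being counted as API traffic, and excluding /health keeps frequent liveness probes from skewing request rates and latency. The skipSet type and its methods are hypothetical names:

```go
package main

import "fmt"

// skipSet holds paths the metrics middleware should not instrument.
type skipSet map[string]struct{}

// newSkipSet starts from the default exclusions and adds custom paths.
func newSkipSet(extra ...string) skipSet {
	s := skipSet{"/health": {}, "/metrics": {}}
	for _, p := range extra {
		s[p] = struct{}{}
	}
	return s
}

// shouldSkip reports whether requests to path bypass metrics collection.
// Matching is exact; prefix matching would need strings.HasPrefix instead.
func (s skipSet) shouldSkip(path string) bool {
	_, ok := s[path]
	return ok
}

func main() {
	skips := newSkipSet("/ready") // "/ready" is an example custom path
	for _, p := range []string{"/metrics", "/ready", "/api/v1/users"} {
		fmt.Printf("%-14s skip=%v\n", p, skips.shouldSkip(p))
	}
}
```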

3. Prometheus Setup

Accessing Prometheus:

  • URL: http://localhost:9090
  • Verifying target health
  • Exploring available metrics

Prometheus Configuration:

  • Explain prometheus.yml structure
  • Scrape interval configuration
  • Adding additional scrape targets
  • Service discovery options

Example Queries:

Basic queries:

# Total requests
sum(http_requests_total)

# Request rate (per second)
rate(http_requests_total[5m])

# Requests by endpoint
sum by (path) (http_requests_total)

# Requests by status code
sum by (status) (http_requests_total)

Performance queries:

# 95th percentile latency
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Average latency
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

# Slow endpoints (p99 > 1s)
histogram_quantile(0.99, sum by (le, path) (rate(http_request_duration_seconds_bucket[5m]))) > 1

Error tracking:

# Error rate (5xx responses)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# 4xx rate
sum(rate(http_requests_total{status=~"4.."}[5m])) / sum(rate(http_requests_total[5m]))

# Failed requests count
sum(http_requests_total{status=~"[45].."})

Resource monitoring:

# Concurrent requests
http_requests_in_progress

# Request size average
rate(http_request_size_bytes_sum[5m]) / rate(http_request_size_bytes_count[5m])

# Response size average
rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])

4. Grafana Setup

Accessing Grafana:

  • URL: http://localhost:3000
  • Default credentials: admin/admin
  • First-time setup guide

Adding Prometheus Data Source:

  1. Navigate to Connections → Data sources (Configuration → Data Sources in older Grafana versions)
  2. Click "Add data source"
  3. Select "Prometheus"
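  4. Set URL: http://prometheus:9090 (the Prometheus service name on the Docker network; from inside the Grafana container, localhost refers to Grafana itself)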
  5. Click "Save & Test"

Creating Dashboards:

Pre-built panels to include:

  • Request rate over time (graph)
  • Latency percentiles (graph)
  • Status code distribution (pie chart)
  • Endpoint performance (table)
  • Error rate (stat panel)
  • Concurrent requests (gauge)

Dashboard JSON Export:

  • Provide ready-to-import dashboard JSON
  • Include panels for all key metrics
  • Add alerts configuration examples

5. Architecture Diagram

Create visual diagrams showing:

System Architecture:

┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP Requests
       ▼
┌─────────────────────────┐
│   Go REST API           │
│   (port 8080)           │
│                         │
│  ┌──────────────────┐   │
│  │ Metrics          │   │
│  │ Middleware       │   │
│  └────────┬─────────┘   │
│           │             │
│  /metrics endpoint ◄────┼─── Prometheus Scraping
│           │             │
└───────────┼─────────────┘
            │
            ▼
    ┌───────────────┐
    │  Prometheus   │
    │  (port 9090)  │
    │               │
    │  - Storage    │
    │  - Querying   │
    │  - Alerting   │
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │   Grafana     │
    │  (port 3000)  │
    │               │
    │  - Dashboards │
    │  - Alerts     │
    └───────────────┘

Data Flow Diagram:

HTTP Request → Middleware → Metrics Collection → Prometheus Exposition Format → /metrics Endpoint
                                                                                        ↓
Prometheus Scraper (every 15s) → Time Series Database → PromQL Queries ← Grafana Dashboards

Metrics Collection Flow:

1. Request arrives
2. Middleware increments in_progress gauge
3. Request processed
4. Middleware records:
   - Duration (histogram)
   - Status code (counter)
   - Request size (histogram)
   - Response size (histogram)
5. Middleware decrements in_progress gauge

6. Best Practices

Metrics Naming:

  • Follow Prometheus naming conventions
  • Use consistent label names
  • Avoid high cardinality labels
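Raw URL paths that embed IDs create one time series per ID. A minimal Go sketch of path normalization for cardinality control, assuming numeric and UUID segments are the only dynamic parts of a path. normalizePath and the :id placeholder are hypothetical; if the router exposes the matched route template, using that as the label is usually more robust:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	numericSegment = regexp.MustCompile(`^[0-9]+$`)
	uuidSegment    = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
)

// normalizePath replaces ID-like path segments with ":id" so that, e.g.,
// /users/42 and /users/43 share a single label value.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if numericSegment.MatchString(s) || uuidSegment.MatchString(s) {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	for _, p := range []string{
		"/api/v1/users/42",
		"/api/v1/orders/550e8400-e29b-41d4-a716-446655440000/items",
		"/health",
	} {
		fmt.Println(p, "->", normalizePath(p))
	}
}
```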

Performance Considerations:

  • Impact of metrics collection (~1-2ms overhead per request, to be confirmed by benchmarking)
  • When to disable metrics
  • Cardinality management

Production Deployment:

  • Persistent storage for Prometheus
  • Retention policies
  • Backup strategies
  • Security considerations (authentication for Prometheus/Grafana)

Alerting Rules:

Example alerts to configure:

groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m

      - alert: HighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m

7. Troubleshooting Guide

Common issues and solutions:

Metrics endpoint returns 404:

  • Check metrics.enabled configuration
  • Verify endpoint path matches metrics.path
  • Ensure middleware is registered

Prometheus can't scrape metrics:

  • Check Docker network configuration
  • Verify app container is healthy
  • Check prometheus.yml target configuration
  • Review Prometheus logs

No data in Grafana:

  • Verify Prometheus data source connection
  • Check time range in dashboard
  • Ensure metrics are being collected

High memory usage:

  • Review label cardinality
  • Adjust retention policies
  • Optimize scrape intervals

Windows line ending issues:

  • Covered by .gitattributes
  • Dockerfile sed command explanation
  • How to verify line endings

8. Extension Guide

Adding Custom Metrics:

Example: Track user registrations

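import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// promauto registers the counter with the default registry at package init.
var userRegistrations = promauto.NewCounter(prometheus.CounterOpts{
    Name: "user_registrations_total",
    Help: "Total number of user registrations",
})

// In the registration handler, after the user has been created successfully:
userRegistrations.Inc()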

Adding Database Metrics:

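var dbConnections = promauto.NewGauge(prometheus.GaugeOpts{
    Name: "db_connections_active",
    Help: "Number of active database connections",
})

// Sample the pool periodically (e.g. on a ticker); db is the app's *sql.DB (assumed):
dbConnections.Set(float64(db.Stats().InUse))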

Business Metrics Examples:

  • Active users
  • API key usage
  • Rate limit hits
  • JWT token operations

9. Integration Examples

Kubernetes Deployment:

  • ServiceMonitor configuration
  • Pod annotations
  • Ingress for Prometheus/Grafana

Cloud Deployments:

  • AWS (CloudWatch integration)
  • GCP (Cloud Monitoring)
  • Azure (Application Insights)

CI/CD Integration:

  • Metrics validation in tests
  • Performance regression detection
  • Alert rule validation

Deliverables

  • New documentation page: docs/MONITORING_OVERVIEW.md
  • Configuration guide: docs/MONITORING_CONFIGURATION.md
  • Prometheus guide: docs/MONITORING_PROMETHEUS.md
  • Grafana guide: docs/MONITORING_GRAFANA.md
  • Architecture diagrams (SVG/PNG format)
  • Troubleshooting guide: docs/MONITORING_TROUBLESHOOTING.md
  • Extension guide: docs/MONITORING_EXTENDING_METRICS.md
  • Dashboard JSON export file
  • Update main documentation index/navigation

Acceptance Criteria

  • All sections listed above are documented
  • Architecture diagrams are clear and accurate
  • Example queries are tested and working
  • Grafana dashboard is provided and importable
  • Troubleshooting covers common issues
  • Code examples are syntax-highlighted and correct
  • Screenshots/diagrams enhance understanding
  • Documentation follows existing style guide
  • Internal links between pages work correctly
  • Documentation is searchable

Additional Context

The metrics implementation includes:

  • 5 core HTTP metrics (counter, histograms, gauge)
  • Configurable via YAML and environment variables
  • Docker compose setup with Prometheus and Grafana
  • Skip paths (/health, /metrics) so that health checks and Prometheus scrapes are not recorded as API traffic
  • Path normalization for cardinality control

Target audience:

  • Developers implementing the boilerplate
  • DevOps engineers deploying to production
  • Users wanting to extend metrics

Priority

Medium - Documentation should follow shortly after implementation to ensure users can effectively utilize the new feature.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests