Description
I noticed that GRAB has no docs/ directory, even though the project structure lists one. This issue proposes adding comprehensive documentation for the Prometheus metrics monitoring feature recently implemented in the main boilerplate repository.
The documentation should provide users with complete guidance on configuring, accessing, and utilizing the observability stack.
Related
Motivation
Users need detailed documentation to:
- Understand the monitoring architecture
- Configure metrics collection
- Access and use Prometheus/Grafana
- Write custom queries for their use cases
- Troubleshoot common issues
- Extend metrics for business needs
Scope
1. Monitoring Overview Page
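Create a new documentation page covering:
- What is Metrics Monitoring?
- Architecture Overview
- Available Metrics:
  - http_requests_total - Request counter
  - http_request_duration_seconds - Latency histogram
  - http_requests_in_progress - Concurrent requests gauge
  - http_request_size_bytes - Request payload size
  - http_response_size_bytes - Response payload size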
2. Configuration Guide
Document configuration options:
YAML Configuration:
metrics:
  enabled: true
  path: /metrics
Environment Variables:
METRICS_ENABLED=true
METRICS_PATH=/metrics
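The configuration guide could show how the two environment variables are read. The following is a minimal sketch, not the boilerplate's actual loader: the `MetricsConfig` type, the `loadMetricsConfig` function, and the defaults (enabled, `/metrics`) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// MetricsConfig mirrors the two documented settings (hypothetical type).
type MetricsConfig struct {
	Enabled bool
	Path    string
}

// loadMetricsConfig starts from assumed defaults and applies METRICS_ENABLED
// and METRICS_PATH when they are set. The lookup function is injected so the
// logic can be exercised without touching the process environment.
func loadMetricsConfig(lookup func(string) (string, bool)) (MetricsConfig, error) {
	cfg := MetricsConfig{Enabled: true, Path: "/metrics"} // assumed defaults

	if raw, ok := lookup("METRICS_ENABLED"); ok {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return cfg, fmt.Errorf("METRICS_ENABLED: %w", err)
		}
		cfg.Enabled = enabled
	}
	if path, ok := lookup("METRICS_PATH"); ok && path != "" {
		cfg.Path = path
	}
	return cfg, nil
}

func main() {
	cfg, err := loadMetricsConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid metrics configuration:", err)
		return
	}
	fmt.Printf("enabled=%t path=%s\n", cfg.Enabled, cfg.Path)
}
```

Rejecting an unparsable METRICS_ENABLED value, rather than silently falling back to the default, is a design choice of this sketch; the docs should state what the real loader does.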
Skip Paths Configuration:
- How to customize skip paths
- Why /health and /metrics are excluded
- Adding custom skip paths
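To illustrate the skip-path items above, here is a hedged sketch of a matching function. The name `shouldSkip`, the `defaultSkipPaths` variable, and the rule that sub-paths of a skip entry are also skipped are assumptions; the real middleware may match differently.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSkipPaths lists the two endpoints this issue says are excluded.
var defaultSkipPaths = []string{"/health", "/metrics"}

// shouldSkip reports whether a request path should bypass instrumentation.
// A path matches when it equals a skip entry or sits beneath it, so
// "/health/ready" is skipped by "/health" but "/healthcheck" is not.
// (The sub-path behaviour is an assumption made for this sketch.)
func shouldSkip(path string, skipPaths []string) bool {
	for _, skip := range skipPaths {
		prefix := strings.TrimSuffix(skip, "/") + "/"
		if path == skip || strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Copy the defaults, then add a custom skip path.
	skip := append([]string{}, defaultSkipPaths...)
	skip = append(skip, "/debug")

	for _, p := range []string{"/metrics", "/health/ready", "/debug/pprof", "/api/users"} {
		fmt.Printf("%-14s skip=%t\n", p, shouldSkip(p, skip))
	}
}
```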
3. Prometheus Setup
Accessing Prometheus:
- URL: http://localhost:9090
- Verifying target health
- Exploring available metrics
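Target health can also be checked programmatically through the Prometheus HTTP API endpoint `/api/v1/targets`. The sketch below assumes Prometheus is reachable at the URL listed above; `unhealthyTargets` and `targetsResponse` are illustrative names, and only the response fields used here are modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// targetsResponse models only the fields of /api/v1/targets used here.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			ScrapeURL string `json:"scrapeUrl"`
			Health    string `json:"health"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// unhealthyTargets parses a targets response body and returns the scrape
// URLs of all active targets whose health is not "up".
func unhealthyTargets(body []byte) ([]string, error) {
	var resp targetsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected API status %q", resp.Status)
	}
	var down []string
	for _, t := range resp.Data.ActiveTargets {
		if t.Health != "up" {
			down = append(down, t.ScrapeURL)
		}
	}
	return down, nil
}

func main() {
	// Assumes the local Prometheus instance from this section.
	res, err := http.Get("http://localhost:9090/api/v1/targets")
	if err != nil {
		fmt.Println("Prometheus not reachable:", err)
		return
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	down, err := unhealthyTargets(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	if len(down) == 0 {
		fmt.Println("all active targets are up")
		return
	}
	for _, u := range down {
		fmt.Println("target not up:", u)
	}
}
```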
Prometheus Configuration:
- Explain prometheus.yml structure
- Scrape interval configuration
- Adding additional scrape targets
- Service discovery options
Example Queries:
Basic queries:
# Total requests
sum(http_requests_total)
# Request rate (per second)
rate(http_requests_total[5m])
# Requests by endpoint
sum by (path) (http_requests_total)
# Requests by status code
sum by (status) (http_requests_total)
Performance queries:
# 95th percentile latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Average latency
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
# Slow endpoints (p99 > 1s)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 1
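The docs could explain why `histogram_quantile` returns estimates rather than exact values: it interpolates linearly inside the bucket that contains the requested rank. The following is a simplified sketch of that interpolation, not Prometheus's implementation; the `bucket` type, the `quantile` function, and the example bucket boundaries are assumptions, and edge cases such as negative bounds are ignored.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket.
type bucket struct {
	UpperBound float64 // value of the "le" label; +Inf for the last bucket
	Count      float64 // observations (or rate) with value <= UpperBound
}

// quantile estimates the q-quantile from cumulative buckets by linear
// interpolation within the bucket that contains the target rank.
// It expects the highest bucket to be the +Inf bucket.
func quantile(q float64, buckets []bucket) float64 {
	if q < 0 || q > 1 || len(buckets) < 2 {
		return math.NaN()
	}
	sorted := append([]bucket(nil), buckets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].UpperBound < sorted[j].UpperBound })

	last := len(sorted) - 1
	total := sorted[last].Count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// First finite bucket whose cumulative count reaches the rank.
	idx := sort.Search(last, func(i int) bool { return sorted[i].Count >= rank })
	if idx == last {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return sorted[last-1].UpperBound
	}

	lower, below := 0.0, 0.0
	if idx > 0 {
		lower = sorted[idx-1].UpperBound
		below = sorted[idx-1].Count
	}
	upper := sorted[idx].UpperBound
	return lower + (upper-lower)*(rank-below)/(sorted[idx].Count-below)
}

// exampleBuckets returns made-up latency buckets (seconds) for the demo.
func exampleBuckets() []bucket {
	return []bucket{{0.1, 50}, {0.5, 90}, {1, 98}, {math.Inf(1), 100}}
}

func main() {
	for _, q := range []float64{0.5, 0.95, 0.99} {
		fmt.Printf("q=%.2f estimate=%.4fs\n", q, quantile(q, exampleBuckets()))
	}
}
```

With these example buckets the 0.99 quantile lands in the +Inf bucket, so the estimate is capped at the highest finite bound. This is why bucket boundaries should cover the latencies the alerts care about.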
Error tracking:
# Error rate (5xx responses)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# 4xx rate
sum(rate(http_requests_total{status=~"4.."}[5m])) / sum(rate(http_requests_total[5m]))
# Failed requests count
sum(http_requests_total{status=~"[45].."})
Resource monitoring:
# Concurrent requests
http_requests_in_progress
# Request size average
rate(http_request_size_bytes_sum[5m]) / rate(http_request_size_bytes_count[5m])
# Response size average
rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])
4. Grafana Setup
Accessing Grafana:
- URL: http://localhost:3000
- Default credentials: admin/admin
- First-time setup guide
Adding Prometheus Data Source:
- Navigate to Configuration → Data Sources
- Click "Add data source"
- Select "Prometheus"
- Set URL: http://prometheus:9090
- Click "Save & Test"
Creating Dashboards:
Pre-built panels to include:
- Request rate over time (graph)
- Latency percentiles (graph)
- Status code distribution (pie chart)
- Endpoint performance (table)
- Error rate (stat panel)
- Concurrent requests (gauge)
Dashboard JSON Export:
- Provide ready-to-import dashboard JSON
- Include panels for all key metrics
- Add alerts configuration examples
5. Architecture Diagram
Create visual diagrams showing:
System Architecture:
     ┌─────────────┐
     │   Client    │
     └──────┬──────┘
            │ HTTP Requests
            ▼
┌─────────────────────────┐
│       Go REST API       │
│       (port 8080)       │
│                         │
│  ┌──────────────────┐   │
│  │     Metrics      │   │
│  │    Middleware    │   │
│  └────────┬─────────┘   │
│           │             │
│  /metrics endpoint ◄────┼─── Prometheus Scraping
│           │             │
└───────────┼─────────────┘
            │
            ▼
    ┌───────────────┐
    │  Prometheus   │
    │  (port 9090)  │
    │               │
    │ - Storage     │
    │ - Querying    │
    │ - Alerting    │
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │    Grafana    │
    │  (port 3000)  │
    │               │
    │ - Dashboards  │
    │ - Alerts      │
    └───────────────┘
Data Flow Diagram:
HTTP Request → Middleware → Metrics Collection → Prometheus Exposition Format → /metrics Endpoint
↓
Prometheus Scraper (every 15s) → Time Series Database → PromQL Queries ← Grafana Dashboards
Metrics Collection Flow:
1. Request arrives
2. Middleware increments in_progress gauge
3. Request processed
4. Middleware records:
- Duration (histogram)
- Status code (counter)
- Request size (histogram)
- Response size (histogram)
5. Middleware decrements in_progress gauge
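The collection flow above can be sketched with the standard library alone. This is an illustration of the ordering of the steps, not the boilerplate's middleware: the `recorder` type stands in for the Prometheus gauge, counter, and histograms, request size is omitted for brevity, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// recorder is a stdlib-only stand-in for the Prometheus collectors.
type recorder struct {
	inProgress int64 // gauge: requests currently being handled

	mu            sync.Mutex
	statusCounts  map[int]int     // stand-in for the status counter
	durations     []time.Duration // stand-in for the duration histogram
	responseBytes []int           // stand-in for the response size histogram
}

func newRecorder() *recorder {
	return &recorder{statusCounts: make(map[int]int)}
}

// statusWriter captures the status code and the number of body bytes written.
type statusWriter struct {
	http.ResponseWriter
	status int
	bytes  int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

func (w *statusWriter) Write(b []byte) (int, error) {
	n, err := w.ResponseWriter.Write(b)
	w.bytes += n
	return n, err
}

// middleware follows the numbered steps of the collection flow.
func (r *recorder) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		atomic.AddInt64(&r.inProgress, 1)        // step 2: increment gauge
		defer atomic.AddInt64(&r.inProgress, -1) // step 5: decrement gauge

		start := time.Now()
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(sw, req) // step 3: process the request

		r.mu.Lock() // step 4: record the observations
		r.statusCounts[sw.status]++
		r.durations = append(r.durations, time.Since(start))
		r.responseBytes = append(r.responseBytes, sw.bytes)
		r.mu.Unlock()
	})
}

// demoHandler answers "ok" on /ok and 404 on every other path.
func demoHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.URL.Path != "/ok" {
			http.NotFound(w, req)
			return
		}
		fmt.Fprint(w, "ok")
	})
}

// serveOnce sends one in-memory GET request through h and returns the status.
func serveOnce(h http.Handler, path string) int {
	rw := httptest.NewRecorder()
	h.ServeHTTP(rw, httptest.NewRequest(http.MethodGet, path, nil))
	return rw.Code
}

func main() {
	rec := newRecorder()
	h := rec.middleware(demoHandler())
	serveOnce(h, "/ok")
	serveOnce(h, "/missing")
	fmt.Println("status counts:", rec.statusCounts, "in progress:", atomic.LoadInt64(&rec.inProgress))
}
```

Using `defer` for the decrement keeps the gauge accurate even if the wrapped handler panics, which is one reason the docs may want to call out step 5 explicitly.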
6. Best Practices
Metrics Naming:
- Follow Prometheus naming conventions
- Use consistent label names
- Avoid high cardinality labels
Performance Considerations:
- Impact of metrics collection (~1-2ms overhead)
- When to disable metrics
- Cardinality management
Production Deployment:
- Persistent storage for Prometheus
- Retention policies
- Backup strategies
- Security considerations (authentication for Prometheus/Grafana)
Alerting Rules:
Example alerts to configure:
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
7. Troubleshooting Guide
Common issues and solutions:
Metrics endpoint returns 404:
- Check metrics.enabled configuration
- Verify endpoint path matches metrics.path
- Ensure middleware is registered
Prometheus can't scrape metrics:
- Check Docker network configuration
- Verify app container is healthy
- Check prometheus.yml target configuration
- Review Prometheus logs
No data in Grafana:
- Verify Prometheus data source connection
- Check time range in dashboard
- Ensure metrics are being collected
High memory usage:
- Review label cardinality
- Adjust retention policies
- Optimize scrape intervals
Windows line ending issues:
- Covered by .gitattributes
- Dockerfile sed command explanation
- How to verify line endings
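For the line-ending check, the troubleshooting page could include a small helper along these lines. This is a sketch under the assumption that the affected files are passed on the command line; `hasCRLF` is a hypothetical name and the issue does not say which files are affected.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// hasCRLF reports whether data contains at least one Windows line ending.
func hasCRLF(data []byte) bool {
	return bytes.Contains(data, []byte("\r\n"))
}

func main() {
	// Check every file named on the command line.
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Println("cannot read", name+":", err)
			continue
		}
		if hasCRLF(data) {
			fmt.Println(name, "contains CRLF line endings")
		} else {
			fmt.Println(name, "uses LF line endings only")
		}
	}
}
```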
8. Extension Guide
Adding Custom Metrics:
Example: Track user registrations
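import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// userRegistrations counts completed user registrations.
var userRegistrations = promauto.NewCounter(prometheus.CounterOpts{
    Name: "user_registrations_total",
    Help: "Total number of user registrations",
})

// In the registration handler, after the user has been created:
userRegistrations.Inc()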
Adding Database Metrics:
var dbConnections = promauto.NewGauge(prometheus.GaugeOpts{
    Name: "db_connections_active",
    Help: "Number of active database connections",
})
Business Metrics Examples:
- Active users
- API key usage
- Rate limit hits
- JWT token operations
9. Integration Examples
Kubernetes Deployment:
- ServiceMonitor configuration
- Pod annotations
- Ingress for Prometheus/Grafana
Cloud Deployments:
- AWS (CloudWatch integration)
- GCP (Cloud Monitoring)
- Azure (Application Insights)
CI/CD Integration:
- Metrics validation in tests
- Performance regression detection
- Alert rule validation
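Metrics validation in tests could be as simple as checking that the exposition output declares every expected metric family. The sketch below parses `# TYPE` lines from the Prometheus text format; `missingMetrics` and `coreMetrics` are hypothetical names, and the five metric names are the ones listed in this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// coreMetrics lists the five HTTP metric families named in this issue.
var coreMetrics = []string{
	"http_requests_total",
	"http_request_duration_seconds",
	"http_requests_in_progress",
	"http_request_size_bytes",
	"http_response_size_bytes",
}

// missingMetrics returns the required metric families that have no
// "# TYPE <name> <type>" line in the given text exposition.
func missingMetrics(exposition string, required []string) []string {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 4 && fields[0] == "#" && fields[1] == "TYPE" {
			seen[fields[2]] = true
		}
	}
	var missing []string
	for _, name := range required {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// A shortened, made-up scrape result for demonstration.
	sample := "# HELP http_requests_total Total requests\n" +
		"# TYPE http_requests_total counter\n" +
		"http_requests_total{status=\"200\"} 3\n"
	fmt.Println("missing:", missingMetrics(sample, coreMetrics))
}
```

One caveat worth documenting: labelled metrics usually appear in the output only after their first observation, so a test should send at least one request through the middleware before scraping.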
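Deliverables
- docs/MONITORING_OVERVIEW.md
- docs/MONITORING_CONFIGURATION.md
- docs/MONITORING_PROMETHEUS.md
- docs/MONITORING_GRAFANA.md
- docs/MONITORING_TROUBLESHOOTING.md
- docs/MONITORING_EXTENDING_METRICS.md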
Acceptance Criteria
Additional Context
The metrics implementation includes:
- 5 core HTTP metrics (counter, histograms, gauge)
- Configurable via YAML and environment variables
- Docker compose setup with Prometheus and Grafana
- Skip paths to prevent infinite loops
- Path normalization for cardinality control
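Path normalization deserves a worked example in the docs, since it is what keeps the path label from growing without bound. The sketch below shows one possible rule set, replacing numeric and UUID-shaped segments with a placeholder; `normalizePath`, the `:id` placeholder, and the rules themselves are assumptions, not the implemented behaviour.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// uuidPattern matches a canonical 8-4-4-4-12 hexadecimal UUID.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// isNumeric reports whether s is a non-empty run of ASCII digits.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// normalizePath replaces identifier-like segments with ":id" so that
// /api/users/1 and /api/users/2 share a single label value.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, seg := range segments {
		if isNumeric(seg) || uuidPattern.MatchString(seg) {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	for _, p := range []string{
		"/api/users/42/orders/7",
		"/api/items/123e4567-e89b-12d3-a456-426614174000",
		"/health",
	} {
		fmt.Println(p, "->", normalizePath(p))
	}
}
```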
Target audience:
- Developers implementing the boilerplate
- DevOps engineers deploying to production
- Users wanting to extend metrics
Priority
Medium - Documentation should follow shortly after implementation to ensure users can effectively utilize the new feature.