feat: implement Prometheus monitoring, Grafana dashboards, and Telegram alerting #79
Open
memreo wants to merge 10 commits into
… code readability
…variables to inline definitions
…add resource monitoring panels to DevPulse dashboard
Summary
This PR implements the Prometheus and Grafana monitoring stack and Telegram alerting for the DevPulse application. It is tuned for local Docker Compose development and prepared for Kubernetes deployment (Rancher and Azure).
Key Features Implemented:
- Grafana dashboards (devpulse_dashboard.json, devpulse_resources_dashboard.json, azure_sizing_dashboard.json) loaded automatically at startup.
- Container metrics (container_memory_working_set_bytes and container_cpu_usage_seconds_total) from the setops namespace, ensuring database (postgres) and message broker (rabbitmq) metrics are tracked accurately.
- Alerts for the setops namespace, routed through custom HTML templates to a Telegram chat point.
Visual Evidence:
1. System Monitoring & Client Log Ingestion
Split-screen displaying log ingestion throughput/latency metrics on Grafana (left) alongside the DevPulse React UI displaying AI insights (right):

2. Azure VM Sizing & Capacity Planning
Dynamic Azure VM size recommender calculating resource budgets by aggregating actual container memory footprints:
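The sizing idea described here (sum the containers' working-set memory, add headroom, pick the smallest VM that fits) can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual logic: the `pod` grouping label, the 30% headroom default, and the short candidate list of B-series sizes are all assumptions and should be checked against the real dashboard and current Azure documentation.

```python
from dataclasses import dataclass

# PromQL using the metric and namespace named in this PR. Grouping by "pod"
# is an assumption about how the containers are labeled.
MEMORY_QUERY = 'sum by (pod) (container_memory_working_set_bytes{namespace="setops"})'

GIB = 1024 ** 3


@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    memory_gib: int


# Illustrative subset of Azure B-series sizes (assumption: verify before use).
CANDIDATES = [
    VmSize("Standard_B2s", 2, 4),
    VmSize("Standard_B2ms", 2, 8),
    VmSize("Standard_B4ms", 4, 16),
    VmSize("Standard_B8ms", 8, 32),
]


def required_memory_gib(working_set_bytes, headroom=0.3):
    """Total working-set memory in GiB, inflated by a headroom fraction."""
    total = sum(working_set_bytes.values())
    return total * (1 + headroom) / GIB


def recommend(working_set_bytes, headroom=0.3):
    """Return the smallest candidate whose memory covers the budget, or None."""
    needed = required_memory_gib(working_set_bytes, headroom)
    for size in sorted(CANDIDATES, key=lambda s: s.memory_gib):
        if size.memory_gib >= needed:
            return size
    return None


# Hypothetical per-container readings, as the query above might return them.
sample = {"postgres": 1 * GIB, "rabbitmq": 1 * GIB, "spring-ingestion": 2 * GIB}
recommendation = recommend(sample)
```

Returning `None` when nothing fits keeps the "no suitable size" case explicit, so a panel or script can flag it instead of silently picking the largest VM.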

3. Service Runtime Resource Utilization
Granular monitoring of JVM heap usage, Hikari connection pools, thread counts, and FastAPI resource consumption:
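As a rough sketch of the kind of PromQL such panels might use, the helper below builds query strings for heap usage, Hikari pool saturation, live threads, and Python process memory. The metric names are the defaults exposed by Micrometer and the Python `prometheus_client` process collector; whether this PR's dashboards use exactly these expressions, and the `job` label values, are assumptions.

```python
def selector(labels):
    """Render a PromQL label selector with keys in sorted order."""
    inner = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return "{" + inner + "}"


def heap_usage_ratio(job):
    """Used heap divided by max heap for one JVM service."""
    sel = selector({"job": job, "area": "heap"})
    return f"sum(jvm_memory_used_bytes{sel}) / sum(jvm_memory_max_bytes{sel})"


def hikari_pool_saturation(job):
    """Active Hikari connections as a fraction of the pool maximum."""
    sel = selector({"job": job})
    return f"hikaricp_connections_active{sel} / hikaricp_connections_max{sel}"


def live_threads(job):
    """Current number of live JVM threads."""
    return f"jvm_threads_live_threads{selector({'job': job})}"


def resident_memory(job):
    """Resident memory of a Python process (for example a FastAPI service)."""
    return f"process_resident_memory_bytes{selector({'job': job})}"


# Hypothetical job names taken from the service directories listed in this PR.
queries = {
    "heap": heap_usage_ratio("spring-ingestion"),
    "pool": hikari_pool_saturation("spring-logbook"),
    "threads": live_threads("spring-alerts"),
    "rss": resident_memory("py-intelligence"),
}
```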

4. Custom Telegram Notifications in Action
Telegram channel displaying firing system outage alerts and instant recovery notifications formatted with the custom HTML template:
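To illustrate what an HTML-formatted alert message involves, here is a minimal sketch that renders an alert as Telegram-compatible HTML and builds a `sendMessage` payload with `parse_mode` set to `HTML`. The field names (`alert_name`, `status`, `summary`), the message layout, and the chat id are hypothetical; the PR's actual template is not shown here and may differ.

```python
import html


def format_alert(alert_name, status, summary):
    """Render one alert as Telegram HTML, escaping user-supplied text."""
    label = "FIRING" if status == "firing" else "RESOLVED"
    # Telegram's HTML mode requires <, > and & to be escaped in plain text.
    safe_name = html.escape(alert_name, quote=False)
    safe_summary = html.escape(summary, quote=False)
    return f"<b>[{label}] {safe_name}</b>\n{safe_summary}"


def build_send_message_payload(chat_id, text):
    """Build the JSON body for the Telegram Bot API sendMessage method."""
    return {"chat_id": chat_id, "text": text, "parse_mode": "HTML"}


# Hypothetical alert data and chat id, for illustration only.
message = format_alert("ServiceDown", "firing", "spring-alerts <unreachable> & down")
payload = build_send_message_payload("-1000000000000", message)
```

Escaping before interpolation matters because an unescaped `<` in an alert summary would otherwise make Telegram reject the message as malformed HTML.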

Component
- client/
- api/
- services/spring-ingestion/
- services/spring-logbook/
- services/spring-alerts/
- services/py-intelligence/
- infra/
- .github/workflows/
API Impact
- api/openapi.yaml was updated.
Testing
Checklist
- (feat|fix)/(issue_id)/(name_of_issue).
Related Issue
Closes #15