Add App Insights AI Monitoring Agent with telemetry integration #24
devin-ai-integration[bot] wants to merge 1 commit
- Add Shared.Monitoring library with Application Insights SDK integration, custom telemetry initializers, processors, and service metrics collection
- Add Monitoring.Agent service with AI-powered anomaly detection engine, health scoring engine, and insights generator
- Integrate App Insights monitoring into all microservices (Identity, Customer, Order, Product, Notification) and API Gateway
- Fix incorrect relative paths to Shared projects in service csproj files (../../Shared -> ../../../Shared)
- Add REST API endpoints for monitoring dashboard, anomaly reports, AI insights, and service health summaries
- Add background service for periodic telemetry collection and analysis
    public static MonitoringDashboard? LatestDashboard =>
        RecentDashboards.TryPeek(out var dashboard) ? dashboard : null;
🔴 LatestDashboard returns the oldest dashboard instead of the most recent
`ConcurrentQueue<T>.TryPeek` returns the item at the head of the queue, which is the oldest item still enqueued. Because dashboards are added with `Enqueue` (at the tail), `TryPeek` returns the oldest retained dashboard, not the latest. The `LatestDashboard` property and the controller endpoint at MonitoringController.cs:36-43 (`GET dashboard/latest`) are both meant to return the most recent snapshot, so API consumers receive stale data.
Prompt for agents
In MonitoringBackgroundService.cs, the LatestDashboard property uses ConcurrentQueue.TryPeek which returns the head (oldest) element. To get the most recent dashboard, either: (1) change the backing store from ConcurrentQueue to a pattern that tracks the last added item (e.g. a volatile field updated on each Enqueue), or (2) use RecentDashboards.Reverse().FirstOrDefault() / RecentDashboards.ToArray().LastOrDefault(). Option 1 is cleaner: add a `private static volatile MonitoringDashboard? _latestDashboard;` field, assign it in ExecuteAsync after Enqueue, and return it from LatestDashboard.
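A minimal sketch of option 1, assuming a simplified setup: `DashboardSnapshot`, `DashboardHistory`, `Record`, and the retention cap of 100 are hypothetical stand-ins, not names from this PR. It keeps the queue for history and tracks the newest item in a separate volatile field.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the PR's MonitoringDashboard type.
public sealed record DashboardSnapshot(DateTimeOffset GeneratedAt, double OverallScore);

public static class DashboardHistory
{
    // Assumed retention cap; the PR's actual limit is not shown in the review.
    private const int MaxRetained = 100;

    private static readonly ConcurrentQueue<DashboardSnapshot> Recent = new();

    // Assigned after every Enqueue; volatile so readers on other threads
    // observe the most recently published reference.
    private static volatile DashboardSnapshot? _latest;

    // Newest snapshot, or null before the first one is recorded.
    public static DashboardSnapshot? Latest => _latest;

    // Head of the queue: this is what TryPeek returns, i.e. the oldest retained item.
    public static DashboardSnapshot? Oldest =>
        Recent.TryPeek(out var head) ? head : null;

    public static void Record(DashboardSnapshot snapshot)
    {
        Recent.Enqueue(snapshot);
        _latest = snapshot;

        // Trim from the head so only the newest MaxRetained snapshots remain.
        while (Recent.Count > MaxRetained && Recent.TryDequeue(out _))
        {
        }
    }
}
```

After two calls to `Record`, `Latest` refers to the second snapshot while `Oldest` (the `TryPeek` path) still refers to the first, which is the behavior the review comment describes.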
    var gcInfo = GC.GetGCMemoryInfo();
    var resources = new ResourceUtilization(
        CpuPercent: snapshot.CpuTimeSeconds,
🔴 CpuPercent field populated with total CPU time in seconds instead of a percentage
The ResourceUtilization.CpuPercent field (defined at AnomalyReport.cs:53) expects a CPU utilization percentage (0–100), but it is assigned snapshot.CpuTimeSeconds which is the cumulative total processor time in seconds (from Process.TotalProcessorTime). This value grows monotonically (e.g. 45.32, 120.7, …) and is not a percentage. This produces nonsensical CPU utilization readings in the monitoring dashboard and health reports.
Prompt for agents
In ServiceHealthAggregator.cs line 135, CpuPercent is assigned snapshot.CpuTimeSeconds which is a cumulative time value, not a percentage. To compute actual CPU percentage, you need to measure CPU time delta over a wall-clock time delta between two consecutive probes. For example, store the previous snapshot's CpuTimeSeconds and Timestamp, then compute: cpuPercent = (currentCpuTime - prevCpuTime) / (currentTimestamp - prevTimestamp).TotalSeconds / Environment.ProcessorCount * 100. This requires tracking state across probes per service. As a simpler alternative, you could report the raw CpuTimeSeconds value and rename the field, but that changes the API contract.
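The delta-based calculation described above could be sketched as follows. `CpuPercentTracker` and its `Update` method are hypothetical names, and the sketch assumes each service is probed by a single loop, so updates for one service never race. The core count is passed in explicitly because `Environment.ProcessorCount` reflects the machine running the monitoring agent, which may differ from the host of the monitored service.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical helper: converts cumulative CPU seconds into a utilization
// percentage by comparing consecutive probes of the same service.
public sealed class CpuPercentTracker
{
    private readonly ConcurrentDictionary<string, (double CpuSeconds, DateTimeOffset Timestamp)> _previous = new();
    private readonly int _processorCount;

    public CpuPercentTracker(int processorCount)
    {
        if (processorCount <= 0) throw new ArgumentOutOfRangeException(nameof(processorCount));
        _processorCount = processorCount;
    }

    // Returns null when no baseline exists yet, or when the counter or clock
    // moved backwards (for example after the monitored process restarted).
    public double? Update(string serviceName, double cpuTimeSeconds, DateTimeOffset timestamp)
    {
        bool hadPrevious = _previous.TryGetValue(serviceName, out var prev);
        _previous[serviceName] = (cpuTimeSeconds, timestamp);

        if (!hadPrevious) return null;

        double wallSeconds = (timestamp - prev.Timestamp).TotalSeconds;
        double cpuDelta = cpuTimeSeconds - prev.CpuSeconds;
        if (wallSeconds <= 0 || cpuDelta < 0) return null;

        // CPU seconds used per wall-clock second, averaged over all cores.
        double percent = cpuDelta / wallSeconds / _processorCount * 100.0;
        return Math.Clamp(percent, 0.0, 100.0);
    }
}
```

With 4 cores, a probe reporting 10.0 CPU seconds followed ten seconds later by one reporting 12.0 works out to 2 / 10 / 4 * 100, roughly 5%. The first probe returns null, so the caller needs a policy for the initial reading (for example, omit the value or report 0).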
Summary
Adds a comprehensive Azure Application Insights AI Monitoring Agent to the microservices platform. This includes:
New Projects
`Shared.Monitoring` (`src/Shared/Shared.Monitoring/`): shared library providing:
- `ServiceTelemetryInitializer`: sets cloud role name/instance for Application Map grouping
- `CorrelationTelemetryInitializer`: propagates X-Correlation-ID headers into telemetry properties
- `AiDiagnosticTelemetryProcessor`: enriches request/dependency telemetry with AI diagnostic metadata (slow request flagging, duration bucketing, failed dependency tagging)
- `ServiceMetricsCollector`: rolling-window request/error metrics with P95 calculation
- `AiTelemetryMiddleware`: per-request telemetry capture middleware
- `AppInsightsHealthCheck`: health check for App Insights connectivity
- `AppInsightsServiceExtensions`: single-call DI registration for all monitoring features

`Monitoring.Agent` (`src/Services/Monitoring/Monitoring.Agent/`): standalone AI monitoring service providing:
- `AnomalyDetectionEngine`: statistical anomaly detection (error rate spikes, response time degradation, memory pressure, dependency failures, 2-sigma response time spike detection)
- `HealthScoringEngine`: composite health scoring (0-100) across availability, performance, error rate, and resource utilization dimensions
- `AiInsightsEngine`: cross-service pattern analysis generating actionable recommendations for performance, reliability, scalability, and cost optimization
- `ServiceHealthAggregator`: polls all service health endpoints and produces unified dashboards
- `MonitoringBackgroundService`: periodic telemetry collection with App Insights metric publishing
- REST API endpoints: `/api/monitoring/dashboard`, `/api/monitoring/services`, `/api/monitoring/anomalies`, `/api/monitoring/insights`, `/api/monitoring/history`, `/api/monitoring/summary`

Integration
- All microservices and the API Gateway integrate `Shared.Monitoring` via `AddAppInsightsMonitoring()` and `UseAppInsightsMonitoring()`

Bug Fix
- Fixed incorrect relative paths to Shared projects in service `.csproj` files (`../../Shared/` → `../../../Shared/`), which resolves pre-existing MSB9008 warnings

Review & Testing Checklist for Human
- `ApplicationInsights:ConnectionString` in `appsettings.json` is configured with a valid Azure Application Insights connection string before deploying to a real environment
- `MonitoredServices` URLs in `Monitoring.Agent/appsettings.json` match your deployment topology
- `GET /api/monitoring/dashboard` returns valid JSON with health scores for all services
- `AnomalyDetection` config values are appropriate for your workload patterns
- Run `docker compose up --build` to verify all services start correctly with the new monitoring middleware

Notes
- The `.csproj` path fix (`../../Shared/` → `../../../Shared/`) resolves a known issue documented in the repo where builds would fail with MSB9008 warnings

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/92d2181c6896487687232a75e23c35d0
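For reviewers checking the anomaly thresholds, the 2-sigma response time spike detection listed under `AnomalyDetectionEngine` could look roughly like the sketch below. This is an illustration of the general technique only; `SpikeDetector`, `IsSpike`, the use of population variance, and the minimum sample count are assumptions, not the PR's actual implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating a mean + N * sigma spike test.
public static class SpikeDetector
{
    // Returns true when `latest` exceeds the historical mean by more than
    // `sigmas` standard deviations. With fewer than two samples there is no
    // meaningful spread, so nothing is flagged.
    public static bool IsSpike(IReadOnlyList<double> history, double latest, double sigmas = 2.0)
    {
        if (history.Count < 2) return false;

        double mean = history.Average();

        // Population variance of the baseline window (an assumed choice).
        double variance = history.Sum(x => (x - mean) * (x - mean)) / history.Count;
        double stdDev = Math.Sqrt(variance);

        // Note: a perfectly flat baseline has stdDev == 0, so any increase is
        // flagged; a real detector may also want a minimum absolute delta.
        return latest > mean + sigmas * stdDev;
    }
}
```

For a baseline of 100, 110, 90, 105, and 95 ms, the mean is 100 and the standard deviation is about 7.07, so the threshold sits near 114 ms: a 120 ms reading is flagged and a 110 ms reading is not.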