
Add App Insights AI Monitoring Agent with telemetry integration#24

Open
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1777559703-app-insights-monitoring-agent



devin-ai-integration (bot) commented Apr 30, 2026

Summary

Adds a comprehensive Azure Application Insights AI Monitoring Agent to the microservices platform. This includes:

New Projects

  • Shared.Monitoring (src/Shared/Shared.Monitoring/) — Shared library providing:

    • ServiceTelemetryInitializer — Sets cloud role name/instance for Application Map grouping
    • CorrelationTelemetryInitializer — Propagates X-Correlation-ID headers into telemetry properties
    • AiDiagnosticTelemetryProcessor — Enriches request/dependency telemetry with AI diagnostic metadata (slow request flagging, duration bucketing, failed dependency tagging)
    • ServiceMetricsCollector — Rolling-window request/error metrics with P95 calculation
    • AiTelemetryMiddleware — Per-request telemetry capture middleware
    • AppInsightsHealthCheck — Health check for App Insights connectivity
    • AppInsightsServiceExtensions — Single-call DI registration for all monitoring features
  • Monitoring.Agent (src/Services/Monitoring/Monitoring.Agent/) — Standalone AI monitoring service providing:

    • AnomalyDetectionEngine — Statistical anomaly detection (error rate spikes, response time degradation, memory pressure, dependency failures, 2-sigma response time spike detection)
    • HealthScoringEngine — Composite health scoring (0-100) across availability, performance, error rate, and resource utilization dimensions
    • AiInsightsEngine — Cross-service pattern analysis generating actionable recommendations for performance, reliability, scalability, and cost optimization
    • ServiceHealthAggregator — Polls all service health endpoints and produces unified dashboards
    • MonitoringBackgroundService — Periodic telemetry collection with App Insights metric publishing
    • REST API endpoints: /api/monitoring/dashboard, /api/monitoring/services, /api/monitoring/anomalies, /api/monitoring/insights, /api/monitoring/history, /api/monitoring/summary
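The rolling-window P95 that ServiceMetricsCollector is described as computing can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the class name `RollingRequestMetrics`, the nearest-rank percentile definition, and the assumption that samples are recorded in timestamp order are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of rolling-window request metrics; not the PR's ServiceMetricsCollector.
// Assumes Record is called with non-decreasing timestamps so the oldest sample is at the head.
public sealed class RollingRequestMetrics
{
    private readonly TimeSpan _window;
    private readonly Queue<(DateTimeOffset At, double DurationMs, bool Failed)> _samples = new();
    private readonly object _gate = new();

    public RollingRequestMetrics(TimeSpan window) => _window = window;

    public void Record(DateTimeOffset at, double durationMs, bool failed)
    {
        lock (_gate)
        {
            _samples.Enqueue((at, durationMs, failed));
            Evict(at);
        }
    }

    // Drops samples that are older than the window, measured back from 'now'.
    private void Evict(DateTimeOffset now)
    {
        while (_samples.Count > 0 && now - _samples.Peek().At > _window)
            _samples.Dequeue();
    }

    public (int Count, double ErrorRate, double P95Ms) Snapshot(DateTimeOffset now)
    {
        lock (_gate)
        {
            Evict(now);
            if (_samples.Count == 0) return (0, 0, 0);

            double[] sorted = _samples.Select(s => s.DurationMs).OrderBy(d => d).ToArray();
            // Nearest-rank percentile: 1-based rank = ceil(0.95 * n), computed in integers
            // to avoid floating-point rounding at exact multiples.
            int rank = (95 * sorted.Length + 99) / 100;
            double p95 = sorted[rank - 1];
            double errorRate = _samples.Count(s => s.Failed) / (double)_samples.Count;
            return (_samples.Count, errorRate, p95);
        }
    }
}
```

A queue keeps eviction cheap because expired samples always sit at the head; the sort happens only when a snapshot is requested, which suits a periodic publisher.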

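The 2-sigma response time spike check listed under AnomalyDetectionEngine could look roughly like the sketch below. `ResponseTimeSpikeDetector` is a hypothetical name, and using the population standard deviation over a caller-supplied baseline is an assumption; the PR's engine may define its baseline window differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a k-sigma spike check (k = 2 by default); not the PR's AnomalyDetectionEngine.
public static class ResponseTimeSpikeDetector
{
    public static bool IsSpike(IReadOnlyList<double> baselineMs, double currentMs, double sigmas = 2.0)
    {
        if (baselineMs.Count < 2) return false; // not enough history to judge

        double mean = baselineMs.Average();
        // Population variance over the baseline window.
        double variance = baselineMs.Sum(x => (x - mean) * (x - mean)) / baselineMs.Count;
        double stdDev = Math.Sqrt(variance);

        // Only upward deviations count: a latency drop is not a degradation.
        return currentMs > mean + sigmas * stdDev;
    }
}
```

With a flat baseline the standard deviation is 0, so any increase is flagged; a production detector would likely add a minimum-deviation floor.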
Integration

  • All existing microservices (Identity, Customer, Order, Product, Notification) and API Gateway now integrate Shared.Monitoring via AddAppInsightsMonitoring() and UseAppInsightsMonitoring()
  • Each service is assigned a unique cloud role name for Application Map visualization

Bug Fix

  • Fixed incorrect relative paths to Shared projects in all service .csproj files (../../Shared/ -> ../../../Shared/), which resolves pre-existing MSB9008 warnings

Review & Testing Checklist for Human

  • Verify the ApplicationInsights:ConnectionString in appsettings.json is configured with a valid Azure Application Insights connection string before deploying to a real environment
  • Confirm the MonitoredServices URLs in Monitoring.Agent/appsettings.json match your deployment topology
  • Test that the monitoring dashboard endpoint (GET /api/monitoring/dashboard) returns valid JSON with health scores for all services
  • Validate that the anomaly detection thresholds in the AnomalyDetection config section are appropriate for your workload patterns
  • Run docker compose up --build to verify all services start correctly with the new monitoring middleware

Notes

  • The monitoring agent runs on its own port (configurable, follows the existing service port pattern)
  • App Insights works in a degraded mode when no connection string is provided — telemetry is collected locally but not sent to Azure
  • The csproj path fix (../../Shared/ -> ../../../Shared/) resolves a known issue documented in the repo where builds would fail with MSB9008 warnings

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/92d2181c6896487687232a75e23c35d0



- Add Shared.Monitoring library with Application Insights SDK integration,
  custom telemetry initializers, processors, and service metrics collection
- Add Monitoring.Agent service with AI-powered anomaly detection engine,
  health scoring engine, and insights generator
- Integrate App Insights monitoring into all microservices (Identity,
  Customer, Order, Product, Notification) and API Gateway
- Fix incorrect relative paths to Shared projects in service csproj files
  (../../Shared -> ../../../Shared)
- Add REST API endpoints for monitoring dashboard, anomaly reports,
  AI insights, and service health summaries
- Add background service for periodic telemetry collection and analysis
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


devin-ai-integration (bot) left a comment


Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +36 to +37
```csharp
public static MonitoringDashboard? LatestDashboard =>
    RecentDashboards.TryPeek(out var dashboard) ? dashboard : null;
```

🔴 LatestDashboard returns the oldest dashboard instead of the most recent

ConcurrentQueue.TryPeek retrieves the item at the head (beginning) of the queue, which is the oldest enqueued item. Since dashboards are added via Enqueue (to the tail), TryPeek returns the very first dashboard ever recorded, not the latest. The property LatestDashboard and the controller endpoint at MonitoringController.cs:36-43 (GET dashboard/latest) both intend to return the most recent snapshot, so this returns stale/incorrect data to API consumers.

Prompt for agents
In MonitoringBackgroundService.cs, the LatestDashboard property uses ConcurrentQueue.TryPeek which returns the head (oldest) element. To get the most recent dashboard, either: (1) change the backing store from ConcurrentQueue to a pattern that tracks the last added item (e.g. a volatile field updated on each Enqueue), or (2) use RecentDashboards.Reverse().FirstOrDefault() / RecentDashboards.ToArray().LastOrDefault(). Option 1 is cleaner: add a `private static volatile MonitoringDashboard? _latestDashboard;` field, assign it in ExecuteAsync after Enqueue, and return it from LatestDashboard.
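Option (1) from the prompt above could be sketched as follows. `DashboardStore`, the `MaxHistory` cap, and the stripped-down `MonitoringDashboard` record are placeholders rather than the PR's actual types. The sketch also assumes a single writer (the background loop); with concurrent writers the volatile field would hold whichever write landed last, not necessarily the last item enqueued.

```csharp
using System;
using System.Collections.Concurrent;

// Placeholder for the PR's dashboard type; only a timestamp is modeled here.
public sealed record MonitoringDashboard(DateTimeOffset GeneratedAt);

// Hypothetical store: the bounded queue still serves history, while the newest
// snapshot is tracked in a separate volatile field instead of TryPeek (which returns the oldest).
public static class DashboardStore
{
    private const int MaxHistory = 100; // assumption: history cap
    private static readonly ConcurrentQueue<MonitoringDashboard> RecentDashboards = new();
    private static volatile MonitoringDashboard? _latestDashboard;

    public static MonitoringDashboard? LatestDashboard => _latestDashboard;

    // Intended to be called from the single background loop after each collection cycle.
    public static void Record(MonitoringDashboard dashboard)
    {
        RecentDashboards.Enqueue(dashboard);
        _latestDashboard = dashboard;

        // Trim from the head so only the newest MaxHistory snapshots remain.
        while (RecentDashboards.Count > MaxHistory && RecentDashboards.TryDequeue(out _)) { }
    }

    // Oldest first, matching the queue's enumeration order.
    public static MonitoringDashboard[] History() => RecentDashboards.ToArray();
}
```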



```csharp
var gcInfo = GC.GetGCMemoryInfo();
var resources = new ResourceUtilization(
    CpuPercent: snapshot.CpuTimeSeconds,
```

🔴 CpuPercent field populated with total CPU time in seconds instead of a percentage

The ResourceUtilization.CpuPercent field (defined at AnomalyReport.cs:53) expects a CPU utilization percentage (0–100), but it is assigned snapshot.CpuTimeSeconds which is the cumulative total processor time in seconds (from Process.TotalProcessorTime). This value grows monotonically (e.g. 45.32, 120.7, …) and is not a percentage. This produces nonsensical CPU utilization readings in the monitoring dashboard and health reports.

Prompt for agents
In ServiceHealthAggregator.cs line 135, CpuPercent is assigned snapshot.CpuTimeSeconds which is a cumulative time value, not a percentage. To compute actual CPU percentage, you need to measure CPU time delta over a wall-clock time delta between two consecutive probes. For example, store the previous snapshot's CpuTimeSeconds and Timestamp, then compute: cpuPercent = (currentCpuTime - prevCpuTime) / (currentTimestamp - prevTimestamp).TotalSeconds / Environment.ProcessorCount * 100. This requires tracking state across probes per service. As a simpler alternative, you could report the raw CpuTimeSeconds value and rename the field, but that changes the API contract.
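The delta-based calculation described above might look like this. `CpuPercentCalculator` and its `Update` signature are hypothetical. The sketch takes the processor count as a parameter because, when the agent probes a remote service, `Environment.ProcessorCount` on the agent reports the agent's host rather than the service's; whether the service's snapshot exposes its own core count is an assumption. It also treats a decreasing CPU time as a process restart and re-baselines instead of reporting a negative percentage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-service sampler: CPU% = (delta CPU seconds / delta wall seconds) / cores * 100.
// Not thread-safe; assumes probes for a given aggregator run sequentially.
public sealed class CpuPercentCalculator
{
    private readonly Dictionary<string, (double CpuSeconds, DateTimeOffset At)> _previous = new();

    // Returns null on the first probe for a service (no baseline yet) or after a detected restart.
    public double? Update(string service, double cpuTimeSeconds, DateTimeOffset timestamp, int processorCount)
    {
        double? result = null;
        if (_previous.TryGetValue(service, out var prev))
        {
            double wallSeconds = (timestamp - prev.At).TotalSeconds;
            double cpuDelta = cpuTimeSeconds - prev.CpuSeconds;

            // A negative delta means the process restarted and its cumulative counter reset.
            if (wallSeconds > 0 && cpuDelta >= 0 && processorCount > 0)
            {
                double percent = cpuDelta / wallSeconds / processorCount * 100.0;
                result = Math.Clamp(percent, 0.0, 100.0);
            }
        }

        // Always store the current probe as the next baseline.
        _previous[service] = (cpuTimeSeconds, timestamp);
        return result;
    }
}
```

This keeps the `CpuPercent` contract (0 to 100) intact, at the cost of one extra probe before the first reading is available.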

