Add standalone App Insights AI Monitoring Agent (single C# application)#26

Open
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1777560633-appinsights-single-app


devin-ai-integration bot commented Apr 30, 2026

Summary

Adds a standalone, self-contained Azure Application Insights AI Monitoring Agent as a single C# ASP.NET Core web application (AppInsightsMonitoringAgent). This is a single project with zero dependencies on other projects in the solution — it can be built, deployed, and run independently.

Project Structure

All components live under src/Services/Monitoring/AppInsightsMonitoringAgent/:

AppInsightsMonitoringAgent/
├── Program.cs                          # App entry point & DI wiring
├── appsettings.json                    # Configuration (App Insights, thresholds, service URLs)
├── Models/MonitoringModels.cs          # All DTOs, records, and enums
├── Telemetry/
│   ├── ServiceTelemetryInitializer.cs  # Cloud role name tagging
│   ├── CorrelationTelemetryInitializer.cs # X-Correlation-ID propagation
│   └── AiDiagnosticTelemetryProcessor.cs  # Slow request flagging & duration bucketing
├── Metrics/ServiceMetricsCollector.cs  # Rolling-window request metrics & P95 calculation
├── Middleware/AiTelemetryMiddleware.cs # Per-request timing capture
├── HealthChecks/AppInsightsHealthCheck.cs # App Insights connectivity check
├── AiEngine/
│   ├── AnomalyDetectionEngine.cs       # Statistical anomaly detection (5 detection algorithms)
│   ├── HealthScoringEngine.cs          # Composite 0-100 health scoring
│   ├── AiInsightsEngine.cs             # Cross-service pattern analysis & recommendations
│   ├── ServiceHealthAggregator.cs      # Service probing & dashboard generation
│   └── MonitoringBackgroundService.cs  # Periodic telemetry collection loop
└── Controllers/MonitoringController.cs # REST API (7 endpoints)

Key Capabilities

  • Anomaly Detection: Error rate spikes, P95 response time breaches, memory pressure, dependency failures, 2-sigma response time spike detection
  • Health Scoring: Weighted composite score across availability (30%), performance (30%), error rate (25%), resource usage (15%)
  • AI Insights: Cross-service pattern analysis generating actionable recommendations for performance, reliability, scalability
  • REST API (all under /api/monitoring): /dashboard, /dashboard/latest, /services, /anomalies, /insights, /history, /summary
  • Background Monitoring: Configurable-interval periodic collection with App Insights metric publishing

Review & Testing Checklist for Human

  • Set ApplicationInsights:ConnectionString in appsettings.json before deploying to a real Azure environment
  • Verify MonitoredServices URLs match your deployment topology
  • Run dotnet build src/Services/Monitoring/AppInsightsMonitoringAgent/AppInsightsMonitoringAgent.csproj from the repository root to confirm it builds
  • Start the agent and test GET /api/monitoring/dashboard returns valid JSON

Notes

  • This is a fully self-contained single application — no shared library dependencies on other solution projects
  • Works in degraded mode without an App Insights connection string (telemetry collected locally but not sent to Azure)
  • The background service polls all configured microservice /healthz endpoints every 60 seconds (configurable)

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/92d2181c6896487687232a75e23c35d0



- Self-contained ASP.NET Core web application with zero external project dependencies
- Includes Azure Application Insights SDK integration with custom telemetry
  initializers, processors, and per-request metrics collection
- AI-powered anomaly detection engine with statistical analysis (error rate
  spikes, response time degradation, memory pressure, 2-sigma spike detection)
- Health scoring engine computing composite 0-100 scores across availability,
  performance, error rate, and resource utilization
- AI insights engine for cross-service pattern analysis with actionable
  recommendations
- Background service for periodic telemetry collection and App Insights
  metric publishing
- REST API endpoints for dashboard, service health, anomalies, insights,
  history, and system summary
- Swagger/OpenAPI documentation enabled in development mode
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


devin-ai-integration bot left a comment


Devin Review found 4 potential issues.

View 6 additional findings in Devin Review.


Comment on lines +36 to +37
public static MonitoringDashboard? LatestDashboard =>
RecentDashboards.TryPeek(out var dashboard) ? dashboard : null;

🔴 LatestDashboard returns the oldest dashboard instead of the most recent one

ConcurrentQueue<T>.TryPeek returns the element at the head of the queue, which is the oldest item (since Enqueue adds to the tail and TryDequeue removes from the head). The LatestDashboard property is supposed to return the most recently collected dashboard snapshot, but instead returns the oldest one still in the queue. This is consumed by the GET /api/monitoring/dashboard/latest endpoint at Controllers/MonitoringController.cs:38, so API callers will receive stale data (up to 60 cycles old) instead of the current state.

Prompt for agents
The LatestDashboard property in MonitoringBackgroundService uses ConcurrentQueue.TryPeek, which returns the oldest (head) element, not the newest. The intent is to return the most recently enqueued dashboard. Since ConcurrentQueue has no method to peek at the tail, the simplest fix is to maintain a separate field (e.g. a static volatile MonitoringDashboard? _latestDashboard, static because the property itself is static) that is updated each time a new dashboard is enqueued in the ExecuteAsync loop (around line 53). The LatestDashboard property would then return that field instead of calling TryPeek. Alternatively, ConcurrentQueue could be replaced with a data structure that supports efficient access to the last element.
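
A minimal sketch of the separate-field approach described above, under stated assumptions: DashboardStore, Publish, OldestDashboard, and the one-field MonitoringDashboard record are hypothetical stand-ins for illustration, and the cap of 60 is inferred from the "up to 60 cycles old" remark rather than read from the PR's code. It also assumes a single writer, as a background service loop would be.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Usage: publish three snapshots, then compare the two accessors.
for (var cycle = 1; cycle <= 3; cycle++)
{
    DashboardStore.Publish(new MonitoringDashboard(cycle));
}
Console.WriteLine(DashboardStore.LatestDashboard?.Cycle); // 3 (newest)
Console.WriteLine(DashboardStore.OldestDashboard?.Cycle); // 1 (queue head, what TryPeek sees)

// Simplified stand-in for the PR's MonitoringDashboard type.
public sealed record MonitoringDashboard(int Cycle);

// Hypothetical holder mirroring the static members on MonitoringBackgroundService.
public static class DashboardStore
{
    private const int MaxHistory = 60; // assumed cap on retained snapshots
    private static readonly ConcurrentQueue<MonitoringDashboard> RecentDashboards = new();

    // Updated on every publish so readers never have to walk the queue.
    private static volatile MonitoringDashboard? _latestDashboard;

    public static MonitoringDashboard? LatestDashboard => _latestDashboard;

    // Kept only to show the original behavior: TryPeek returns the head (oldest).
    public static MonitoringDashboard? OldestDashboard =>
        RecentDashboards.TryPeek(out var dashboard) ? dashboard : null;

    public static void Publish(MonitoringDashboard dashboard)
    {
        RecentDashboards.Enqueue(dashboard);
        _latestDashboard = dashboard;

        // Trim from the head until the history is back under the cap.
        while (RecentDashboards.Count > MaxHistory && RecentDashboards.TryDequeue(out _)) { }
    }
}
```

The queue remains available for the history endpoint, while the latest-snapshot read becomes a single field access.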

Comment on lines +30 to +36
RecordSnapshot(snapshot);

DetectHighErrorRate(snapshot, anomalies);
DetectSlowResponseTime(snapshot, anomalies);
DetectHighMemoryUsage(snapshot, anomalies);
DetectDependencyFailures(snapshot, anomalies);
DetectResponseTimeSpike(snapshot, anomalies);

🔴 Response time spike detection baseline includes the current snapshot being tested

In AnomalyDetectionEngine.Analyze(), RecordSnapshot(snapshot) is called at line 30 before DetectResponseTimeSpike(snapshot, anomalies) at line 36. Inside DetectResponseTimeSpike, the history (now including the current snapshot) is used to compute the baseline mean and standard deviation (lines 131-140). This means the current snapshot's AverageResponseTimeMs inflates both the mean and stddev of the baseline, making it systematically harder to detect a genuine spike — the very value being tested for anomaly is part of its own reference distribution.

Suggested change
- RecordSnapshot(snapshot);
- DetectHighErrorRate(snapshot, anomalies);
- DetectSlowResponseTime(snapshot, anomalies);
- DetectHighMemoryUsage(snapshot, anomalies);
- DetectDependencyFailures(snapshot, anomalies);
- DetectResponseTimeSpike(snapshot, anomalies);
+ DetectResponseTimeSpike(snapshot, anomalies);
+ RecordSnapshot(snapshot);
+ DetectHighErrorRate(snapshot, anomalies);
+ DetectSlowResponseTime(snapshot, anomalies);
+ DetectHighMemoryUsage(snapshot, anomalies);
+ DetectDependencyFailures(snapshot, anomalies);
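
The effect of the ordering can be seen in a small worked example. This is a sketch only: IsResponseTimeSpike, the minSamples guard, and the sample values are assumptions for illustration, not the PR's DetectResponseTimeSpike implementation. It uses a population standard deviation and a mean + 2σ threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: flags `current` as a spike when it exceeds the
// baseline mean by more than two standard deviations.
static bool IsResponseTimeSpike(IReadOnlyList<double> baseline, double current,
                                int minSamples = 5)
{
    if (baseline.Count < minSamples) return false; // too little history to judge

    double mean = baseline.Average();
    double variance = baseline.Sum(v => (v - mean) * (v - mean)) / baseline.Count;
    double stdDev = Math.Sqrt(variance);

    return current > mean + 2 * stdDev;
}

var history = new List<double> { 100, 110, 90, 105, 95 };
double latest = 130;

// Detect first, against a baseline that excludes the new value:
// mean = 100, stddev ≈ 7.07, threshold ≈ 114.1, so 130 is a spike.
bool spikeBefore = IsResponseTimeSpike(history, latest);

// Record the value, then test again to mimic the original ordering:
// mean = 105, stddev ≈ 12.91, threshold ≈ 130.8, so 130 is no longer flagged.
history.Add(latest);
bool spikeAfter = IsResponseTimeSpike(history, latest);

Console.WriteLine($"{spikeBefore} {spikeAfter}"); // True False
```

With only five prior samples the self-inclusion is enough to hide a 30% jump; the distortion shrinks as the history window grows but never disappears.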

ActiveConnections: 0);

var resources = new ResourceUtilization(
CpuPercent: snapshot.CpuTimeSeconds,

🔴 CpuPercent field is populated with cumulative CPU time in seconds, not a percentage

ResourceUtilization.CpuPercent is set to snapshot.CpuTimeSeconds (ServiceHealthAggregator.cs:132), which is the total processor time since the process started (e.g. 300.5 seconds). This value is not a percentage (0–100) — it's a monotonically increasing number of seconds. This incorrect metric is then published to Application Insights via MonitoringBackgroundService and exposed in the dashboard API, leading to nonsensical CPU utilization reporting.
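
One common way to turn cumulative CPU time into a percentage is to sample it twice and divide the CPU time consumed by the wall-clock time available across all cores. The sketch below assumes that approach; the CpuPercent helper and its parameter names are hypothetical, and it samples the current process purely for demonstration rather than reading the PR's snapshot type.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: converts two cumulative CPU-time readings into a
// 0-100 utilization figure, normalized across all logical cores.
static double CpuPercent(TimeSpan startCpu, TimeSpan endCpu, TimeSpan wallElapsed,
                         int processorCount)
{
    if (wallElapsed <= TimeSpan.Zero || processorCount <= 0) return 0;

    double cpuUsedMs = (endCpu - startCpu).TotalMilliseconds;
    double availableMs = wallElapsed.TotalMilliseconds * processorCount;
    return Math.Clamp(cpuUsedMs / availableMs * 100.0, 0.0, 100.0);
}

// Sample the current process twice, 200 ms apart.
var process = Process.GetCurrentProcess();
TimeSpan firstReading = process.TotalProcessorTime;
var stopwatch = Stopwatch.StartNew();
Thread.Sleep(200);
process.Refresh(); // re-read cached process statistics
TimeSpan secondReading = process.TotalProcessorTime;

double percent = CpuPercent(firstReading, secondReading, stopwatch.Elapsed,
                            Environment.ProcessorCount);
Console.WriteLine($"CPU: {percent:F1}%");
```

In the monitoring loop, the two readings could be the CpuTimeSeconds values from consecutive collection cycles, which would require keeping the previous snapshot per service.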


Comment on lines +111 to +112
var response = await _httpClient.GetAsync($"{baseUrl}/healthz");
isReachable = response.IsSuccessStatusCode;

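🔴 HttpResponseMessage from health probe is never disposed

HttpResponseMessage implements IDisposable and should be disposed so that its content, and any connection it still holds, are released promptly. In ProbeServiceAsync, the response from _httpClient.GetAsync() at line 111 is stored in a local variable but never disposed. With the default buffered completion mode the connection normally returns to the pool once the body has been read, so the immediate cost is undisposed response objects waiting for garbage collection; if the call is ever changed to HttpCompletionOption.ResponseHeadersRead, each undisposed response would keep its connection open and could exhaust the pool. Since this method is called once per monitored service per monitoring cycle (every 60s for 6 services by default), disposing the response keeps cleanup deterministic.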

Suggested change
- var response = await _httpClient.GetAsync($"{baseUrl}/healthz");
+ using var response = await _httpClient.GetAsync($"{baseUrl}/healthz");
  isReachable = response.IsSuccessStatusCode;
