feat: Add AI Monitoring Agent with Azure Application Insights integration #23

Open
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1777559047-ai-monitoring-agent

devin-ai-integration[bot] commented Apr 30, 2026

Summary

Adds a new AI Monitoring Agent microservice (port 5006) under src/Services/Monitoring/AIMonitoringAgent/ that integrates with Azure Application Insights to provide cross-cutting observability for the microservices platform.

Key capabilities:

  • AI Model Performance Tracking: tracks latency, token usage (prompt/completion), and success/failure rates per model via POST /api/monitoring/ai-model/track
  • Service Health Monitoring: a background service polls every microservice's /healthz endpoint at a configurable interval and records availability telemetry to App Insights
  • Statistical Anomaly Detection: rolling-window standard deviation analysis flags abnormal metric values, with configurable sensitivity
  • Configurable Alerting: CRUD for alert rules with threshold conditions (GreaterThan, LessThan, etc.) and severity levels; triggered alerts are emitted as App Insights events
  • Dashboard API: GET /api/monitoring/dashboard returns aggregated AI model summaries, service health, and recent anomalies
  • Full App Insights Integration: events, custom metrics, availability telemetry, dependency tracking, and exception tracking
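
For reviewers who want a concrete picture of the rolling-window standard deviation check, here is a minimal sketch. It is not the PR's implementation: the class name, the window size of 100, and the default sensitivity of 3 standard deviations are assumptions made for illustration. Only the 10-sample minimum comes from the Notes section of this description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names and defaults are illustrative, not taken from the PR.
public sealed class RollingAnomalyDetector
{
    private readonly Queue<double> _window = new();
    private readonly int _windowSize;    // assumed default: keep the last 100 values
    private readonly int _minSamples;    // 10, per the Notes section
    private readonly double _sensitivity; // allowed distance from the mean, in standard deviations

    public RollingAnomalyDetector(int windowSize = 100, int minSamples = 10, double sensitivity = 3.0)
    {
        _windowSize = windowSize;
        _minSamples = minSamples;
        _sensitivity = sensitivity;
    }

    // Returns true when the value lies more than sensitivity * stddev away from
    // the mean of the values seen so far. The value joins the window afterwards.
    public bool IsAnomaly(double value)
    {
        bool anomalous = false;

        // Stay silent until enough history has accumulated.
        if (_window.Count >= _minSamples)
        {
            double mean = _window.Average();
            double variance = _window.Sum(v => (v - mean) * (v - mean)) / _window.Count;
            double stdDev = Math.Sqrt(variance);

            // A perfectly flat window (stdDev == 0) never flags in this sketch.
            anomalous = stdDev > 0 && Math.Abs(value - mean) > _sensitivity * stdDev;
        }

        _window.Enqueue(value);
        if (_window.Count > _windowSize)
        {
            _window.Dequeue(); // drop the oldest value to keep the window bounded
        }

        return anomalous;
    }
}
```

The sketch is not thread-safe. A real service recording metrics from concurrent requests would need a lock or one detector per metric behind a concurrent collection.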

Files added/modified:

  • src/Services/Monitoring/AIMonitoringAgent/ — Complete .NET 10 Web API project (models, services, controllers, configuration)
  • src/docker-compose.yml — Added ai-monitoring-agent service definition
  • README.md — Updated architecture table and project structure

Default alert rules included:

  • High Response Time (> 5000ms)
  • Consecutive Failures (>= 3)
  • High AI Model Latency (> 10000ms)
  • AI Model Failure Rate (> 5 failures)
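
To make the threshold semantics of these defaults concrete, the following sketch evaluates a rule against a metric value. The type names, the metric identifiers, and the GreaterThanOrEqual and LessThanOrEqual members are assumptions; the PR description only names GreaterThan and LessThan explicitly, and the real rule model may differ.

```csharp
using System;

// Hypothetical condition set; only GreaterThan and LessThan are named in the PR description.
public enum AlertCondition
{
    GreaterThan,
    GreaterThanOrEqual,
    LessThan,
    LessThanOrEqual
}

// Hypothetical rule shape: a named threshold on a single metric.
public sealed record AlertRule(string Name, string Metric, AlertCondition Condition, double Threshold);

public static class AlertEvaluator
{
    // Returns true when the observed value satisfies the rule's condition.
    public static bool IsTriggered(AlertRule rule, double value) => rule.Condition switch
    {
        AlertCondition.GreaterThan => value > rule.Threshold,
        AlertCondition.GreaterThanOrEqual => value >= rule.Threshold,
        AlertCondition.LessThan => value < rule.Threshold,
        AlertCondition.LessThanOrEqual => value <= rule.Threshold,
        _ => throw new ArgumentOutOfRangeException(nameof(rule), rule.Condition, "Unknown alert condition")
    };
}
```

Under these assumptions, the "High Response Time" default would be `new AlertRule("High Response Time", "response_time_ms", AlertCondition.GreaterThan, 5000)`. It triggers at 5200 ms but not at exactly 5000 ms, whereas "Consecutive Failures (>= 3)" triggers on the third failure.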

Review & Testing Checklist for Human

  • Verify the App Insights connection string is correctly configured for your Azure environment before deployment
  • Run dotnet build src/Services/Monitoring/AIMonitoringAgent/AIMonitoringAgent.csproj to confirm the project compiles
  • Test the API endpoints via Swagger UI at http://localhost:5006/swagger by running dotnet run from the project directory
  • Validate health check polling works by starting other microservices and checking GET /api/monitoring/health/services
  • Review default alert rule thresholds in appsettings.json and adjust for your environment

Notes

  • The agent works without an App Insights connection string (telemetry is logged locally), but full functionality requires a valid Azure connection string
  • The anomaly detection requires a minimum of 10 data points before it starts flagging anomalies
  • The Dockerfile uses mcr.microsoft.com/dotnet/sdk:10.0-preview base images to match the existing .NET 10 services

Link to Devin session: https://partner-workshops.devinenterprise.com/sessions/d05917139ee340c89cb4b7ef82266e3c


…tion

- New AIMonitoringAgent service (port 5006) under Services/Monitoring/
- Azure Application Insights telemetry: events, metrics, availability, exceptions
- AI model performance tracking: latency, token usage, success/failure rates
- Service health monitoring with background polling of all microservice endpoints
- Statistical anomaly detection using rolling-window standard deviation analysis
- Configurable alert rules with threshold conditions and severity levels
- REST API with dashboard, monitoring, and alerts controllers
- Swagger UI for development
- Dockerfile and docker-compose integration
- Default alert rules for response time, failures, and AI model metrics

devin-ai-integration[bot] commented

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration[bot] left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

builder.Services.AddSingleton<IAlertingService, AlertingService>();

// Health monitor with HttpClient
builder.Services.AddHttpClient<IServiceHealthMonitor, ServiceHealthMonitor>();

🔴 ServiceHealthMonitor registered as transient via AddHttpClient, causing all instance state to be lost between requests

AddHttpClient<IServiceHealthMonitor, ServiceHealthMonitor>() at Program.cs:28 registers ServiceHealthMonitor with a transient lifetime. However, ServiceHealthMonitor stores critical state in its instance field _latestStatuses (ServiceHealthMonitor.cs:24). Because a new instance is created for every DI resolution:

  1. GET /api/monitoring/health/services always returns an empty list — the controller (MonitoringController.cs:59) gets a fresh ServiceHealthMonitor with an empty _latestStatuses.
  2. GET /api/monitoring/dashboard always returns empty services — same reason (MonitoringController.cs:95).
  3. ConsecutiveFailures never accumulates across health check cycles — the HealthCheckBackgroundService creates a new scope (and thus a new ServiceHealthMonitor) each cycle (HealthCheckBackgroundService.cs:31-32), so the lookup at ServiceHealthMonitor.cs:69 never finds a previous status.

The _latestStatuses state needs to persist across requests. Either extract the state into a separate singleton, or register ServiceHealthMonitor differently (e.g., using AddHttpClient with an explicit singleton wrapper for the state).

Prompt for agents
The problem is that AddHttpClient<IServiceHealthMonitor, ServiceHealthMonitor>() registers ServiceHealthMonitor as transient, but the class stores state in _latestStatuses (a ConcurrentDictionary) that must persist across requests and background service cycles.

Affected file: src/Services/Monitoring/AIMonitoringAgent/Services/ServiceHealthMonitor.cs (field _latestStatuses on line 24)
Registration: src/Services/Monitoring/AIMonitoringAgent/Program.cs line 28
Consumers: MonitoringController.cs (lines 59, 95) and HealthCheckBackgroundService.cs (lines 31-33)

Approach 1: Extract _latestStatuses into a separate singleton service (e.g. IServiceHealthStateStore) that both ServiceHealthMonitor instances and controllers can share.

Approach 2: Register a named HttpClient and manually create ServiceHealthMonitor as a singleton that takes IHttpClientFactory instead of HttpClient directly. Change the constructor to accept IHttpClientFactory and create clients on demand.

Approach 2 is cleaner because it keeps the state co-located with the health monitor logic while still getting proper HttpClient management via the factory pattern.
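
Approach 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the ServiceHealthStatus shape, the interface members, the "health-checks" client name, and the AddHealthMonitor extension are all hypothetical, since the real types are not shown in this thread. It assumes an ASP.NET Core project where Microsoft.Extensions.Http is available.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical shapes; the PR's real types are not shown in this thread.
public sealed record ServiceHealthStatus(string ServiceName, bool IsHealthy, int ConsecutiveFailures);

public interface IServiceHealthMonitor
{
    Task<ServiceHealthStatus> CheckServiceAsync(string name, string url, CancellationToken ct = default);
    IReadOnlyCollection<ServiceHealthStatus> GetLatestStatuses();
}

public sealed class ServiceHealthMonitor : IServiceHealthMonitor
{
    public const string HttpClientName = "health-checks"; // assumed client name

    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ConcurrentDictionary<string, ServiceHealthStatus> _latestStatuses = new();

    public ServiceHealthMonitor(IHttpClientFactory httpClientFactory) =>
        _httpClientFactory = httpClientFactory;

    public async Task<ServiceHealthStatus> CheckServiceAsync(string name, string url, CancellationToken ct = default)
    {
        bool healthy;
        try
        {
            // Create a short-lived client per check; the factory pools the handlers.
            HttpClient client = _httpClientFactory.CreateClient(HttpClientName);
            using HttpResponseMessage response = await client.GetAsync(url, ct);
            healthy = response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            healthy = false; // connection refused, DNS failure, and similar
        }
        catch (TaskCanceledException) when (!ct.IsCancellationRequested)
        {
            healthy = false; // request timed out rather than being cancelled by the caller
        }

        // The previous status is still here because this instance is a singleton.
        return _latestStatuses.AddOrUpdate(
            name,
            _ => new ServiceHealthStatus(name, healthy, healthy ? 0 : 1),
            (_, previous) => new ServiceHealthStatus(name, healthy, healthy ? 0 : previous.ConsecutiveFailures + 1));
    }

    public IReadOnlyCollection<ServiceHealthStatus> GetLatestStatuses() =>
        _latestStatuses.Values.ToArray();
}

public static class HealthMonitorRegistration
{
    // Replaces AddHttpClient<IServiceHealthMonitor, ServiceHealthMonitor>(), which is transient.
    public static IServiceCollection AddHealthMonitor(this IServiceCollection services)
    {
        services.AddHttpClient(ServiceHealthMonitor.HttpClientName);
        services.AddSingleton<IServiceHealthMonitor, ServiceHealthMonitor>();
        return services;
    }
}
```

With a registration along these lines, the controllers and the background service would resolve the same instance, so _latestStatuses and ConsecutiveFailures carry over between health check cycles. Injecting IHttpClientFactory rather than a single HttpClient keeps handler rotation working inside the long-lived singleton.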

Comment thread src/docker-compose.yml
Comment on lines +96 to +98
environment:
- ASPNETCORE_ENVIRONMENT=Development
- Monitoring__ApplicationInsightsConnectionString=

🔴 Docker Compose health check endpoints use localhost instead of Docker service names

The ServiceEndpoints in appsettings.json default to localhost URLs (e.g., http://localhost:5001/healthz). The docker-compose.yml adds the ai-monitoring-agent service but does not override these endpoints with Docker Compose service names. On a Docker Compose network, localhost inside the monitoring agent container resolves to the container itself, not to the other services. As a result, all background health checks (HealthCheckBackgroundService.cs:33 → ServiceHealthMonitor.CheckAllServicesAsync) will fail with connection refused errors every 30 seconds. The endpoints should be http://identity-service:5001/healthz, http://customer-service:5002/healthz, etc.

Suggested change

Before:

    environment:
      - ASPNETCORE_ENVIRONMENT=Development
      - Monitoring__ApplicationInsightsConnectionString=

After:

    environment:
      - ASPNETCORE_ENVIRONMENT=Development
      - Monitoring__ApplicationInsightsConnectionString=
      - Monitoring__ServiceEndpoints__identity-service=http://identity-service:5001/healthz
      - Monitoring__ServiceEndpoints__customer-service=http://customer-service:5002/healthz
      - Monitoring__ServiceEndpoints__order-service=http://order-service:5003/healthz
      - Monitoring__ServiceEndpoints__product-service=http://product-service:5004/healthz
      - Monitoring__ServiceEndpoints__notification-service=http://notification-service:5005/healthz
      - Monitoring__ServiceEndpoints__api-gateway=http://api-gateway:5000/healthz