Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ The monolith's bounded contexts are decomposed into the following independently
| `product-service` | 5004 | Product catalog management | `ProductsController`, product models |
| `notification-service` | 5005 | Email and in-app notifications | `NotificationService`, notification models |
| `api-gateway` | 5000 | YARP reverse proxy, request routing, rate limiting | New — replaces monolith's single entry point |
| `monitoring-agent` | 7071 | AI-powered App Insights monitoring Azure Function | New — observability and anomaly detection |

## Project Structure

Expand Down Expand Up @@ -72,6 +73,16 @@ src/
│ ├── Notification.API/
│ ├── Notification.Domain/
│ └── Notification.Infrastructure/
├── Functions/
│ └── Monitoring/
│ └── Monitoring.Functions/ # Azure Function — AI monitoring agent
│ ├── Functions/ # Timer & HTTP-triggered functions
│ ├── Services/ # App Insights query, AI analysis, alerting
│ ├── Models/ # Telemetry, anomaly, alert models
│ ├── Configuration/ # Monitoring options
│ ├── Program.cs
│ ├── host.json
│ └── Dockerfile
├── Shared/
│ ├── Shared.Contracts/ # Shared DTOs, events, interfaces
│ └── Shared.Infrastructure/ # Common middleware, logging, health checks
Expand All @@ -86,6 +97,9 @@ src/
- **Entity Framework Core** — per-service database (database-per-service pattern)
- **YARP** — API gateway / reverse proxy
- **RabbitMQ** — async messaging between services
- **Azure Functions v4** — serverless monitoring agent (isolated worker)
- **Azure Application Insights** — telemetry collection and querying
- **Azure OpenAI** — AI-powered anomaly analysis and health insights
- **Docker** — containerized services
- **Kubernetes** — orchestration (see `app_dotnet_angular_containerized_decomposition_iac` for Helm charts)

Expand All @@ -102,6 +116,44 @@ cd src/Services/Identity/Identity.API
dotnet run
```

## AI Monitoring Agent

The `monitoring-agent` is an Azure Function (C#, .NET 10, isolated worker) that provides AI-powered observability for all microservices via Azure Application Insights.

### Functions

| Function | Trigger | Schedule | Description |
|----------|---------|----------|-------------|
| `AnomalyDetector` | Timer | Every 5 min | Queries App Insights telemetry and detects anomalies using rule-based + statistical (z-score) + AI analysis |
| `HealthMonitor` | Timer | Every 30 min | Generates comprehensive health reports with AI summaries and sends webhook notifications |
| `GetHealthReport` | HTTP GET | On-demand | Returns full platform health report with per-service AI insights (`/api/monitoring/health`) |
| `GetServiceTelemetry` | HTTP GET | On-demand | Returns detailed telemetry and AI analysis for a specific service (`/api/monitoring/services/{name}`) |
| `GetAnomalies` | HTTP GET | On-demand | Lists all detected anomalies across services (`/api/monitoring/anomalies`) |
| `AlertWebhook` | HTTP POST | On-demand | Receives App Insights alert webhooks and enriches with AI analysis (`/api/monitoring/alerts/webhook`) |
| `TriggerManualAnalysis` | HTTP POST | On-demand | Triggers on-demand AI analysis for a specific service (`/api/monitoring/analyze/{name}`) |

### Capabilities

- **Anomaly Detection**: Rule-based thresholds (failure rate, response time) + statistical z-score analysis on time-series data
- **AI-Powered Insights**: Azure OpenAI generates root cause analysis, health summaries, and exception pattern analysis
- **Webhook Alerts**: Sends adaptive card notifications to Microsoft Teams or Slack
- **KQL Queries**: Queries App Insights via Azure Monitor Query SDK (requests, exceptions, dependencies)

### Configuration

Set the following environment variables (or `local.settings.json` values):

| Variable | Description |
|----------|-------------|
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | App Insights connection string |
| `Monitoring__WorkspaceId` | Log Analytics workspace ID |
| `Monitoring__AzureOpenAIEndpoint` | Azure OpenAI endpoint URL |
| `Monitoring__AzureOpenAIDeployment` | Model deployment name (default: `gpt-4o`) |
| `Monitoring__AlertWebhookUrl` | Teams/Slack incoming webhook URL |
| `Monitoring__FailureRateThresholdPercent` | Failure rate alert threshold (default: `5.0`) |
| `Monitoring__ResponseTimeThresholdMs` | P95 response time threshold (default: `2000`) |
| `Monitoring__MonitoredServices` | Comma-separated service names to monitor |

## Related Repositories

| Repo | Purpose |
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
namespace Monitoring.Functions.Configuration;

public sealed class MonitoringOptions
{
public string ApplicationInsightsConnectionString { get; set; } = string.Empty;
public string WorkspaceId { get; set; } = string.Empty;
public string AzureOpenAIEndpoint { get; set; } = string.Empty;
public string AzureOpenAIDeployment { get; set; } = "gpt-4o";
public string AlertWebhookUrl { get; set; } = string.Empty;
public int AnomalyDetectionIntervalMinutes { get; set; } = 5;
public int HealthCheckIntervalMinutes { get; set; } = 2;
public double FailureRateThresholdPercent { get; set; } = 5.0;
public double ResponseTimeThresholdMs { get; set; } = 2000;
public string MonitoredServices { get; set; } = string.Empty;

public IReadOnlyList<string> GetMonitoredServiceNames() =>
MonitoredServices
.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.ToList();

public IReadOnlyList<MonitoredServiceConfig> GetServiceConfigs() =>
GetMonitoredServiceNames()
.Select(name => new MonitoredServiceConfig(name))
.ToList();
}

public record MonitoredServiceConfig(string ServiceName)
{
public string CloudRoleName => ServiceName;
}
15 changes: 15 additions & 0 deletions src/Functions/Monitoring/Monitoring.Functions/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src

COPY Shared/Shared.Contracts/Shared.Contracts.csproj Shared/Shared.Contracts/
COPY Functions/Monitoring/Monitoring.Functions/Monitoring.Functions.csproj Functions/Monitoring/Monitoring.Functions/
RUN dotnet restore Functions/Monitoring/Monitoring.Functions/Monitoring.Functions.csproj

COPY Shared/Shared.Contracts/ Shared/Shared.Contracts/
COPY Functions/Monitoring/Monitoring.Functions/ Functions/Monitoring/Monitoring.Functions/
RUN dotnet publish Functions/Monitoring/Monitoring.Functions/Monitoring.Functions.csproj \
-c Release -o /app/publish --no-restore

FROM mcr.microsoft.com/azure-functions/dotnet-isolated:4-dotnet-isolated10.0
ENV AzureWebJobsScriptRoot=/home/site/wwwroot
COPY --from=build /app/publish /home/site/wwwroot
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
using System.Text.Json;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using Monitoring.Functions.Models;
using Monitoring.Functions.Services;

namespace Monitoring.Functions.Functions;

public sealed class AlertWebhookFunction
{
private readonly IAppInsightsQueryService _queryService;
private readonly IAiAnalysisService _aiService;
private readonly IAlertService _alertService;
private readonly ILogger<AlertWebhookFunction> _logger;

public AlertWebhookFunction(
IAppInsightsQueryService queryService,
IAiAnalysisService aiService,
IAlertService alertService,
ILogger<AlertWebhookFunction> logger)
{
_queryService = queryService;
_aiService = aiService;
_alertService = alertService;
_logger = logger;
}

[Function("AlertWebhook")]
public async Task<IActionResult> RunAsync(
[HttpTrigger(AuthorizationLevel.Function, "post", Route = "monitoring/alerts/webhook")] HttpRequest req,
CancellationToken cancellationToken)
{
_logger.LogInformation("Alert webhook triggered");

try
{
var body = await new StreamReader(req.Body).ReadToEndAsync(cancellationToken);
var alertPayload = JsonSerializer.Deserialize<AppInsightsAlertPayload>(body,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

if (alertPayload is null)
return new BadRequestObjectResult(new { error = "Invalid alert payload" });

var serviceName = alertPayload.Data?.Context?.ResourceName ?? "unknown-service";

var telemetry = await _queryService.GetServiceTelemetryAsync(
serviceName, TimeSpan.FromMinutes(30), cancellationToken);

var anomalies = await _aiService.DetectAnomaliesAsync(telemetry, cancellationToken);
var exceptionAnalysis = await _aiService.AnalyzeExceptionPatternAsync(
telemetry.TopExceptions, serviceName, cancellationToken);

var alert = await _alertService.CreateAlertAsync(
serviceName,
MapSeverity(alertPayload.Data?.Context?.Severity),
alertPayload.Data?.Context?.Name ?? "App Insights Alert",
$"Alert triggered: {alertPayload.Data?.Context?.Description}. " +
$"AI Analysis: {exceptionAnalysis}",
anomalies,
cancellationToken);

await _alertService.SendAlertNotificationAsync(alert, cancellationToken);

return new OkObjectResult(new
{
alertId = alert.AlertId,
processed = true,
anomaliesDetected = anomalies.Count,
aiEnriched = true
});
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to process alert webhook");
return new StatusCodeResult(500);
}
}

[Function("TriggerManualAnalysis")]
public async Task<IActionResult> TriggerManualAnalysisAsync(
[HttpTrigger(AuthorizationLevel.Function, "post", Route = "monitoring/analyze/{serviceName}")] HttpRequest req,
string serviceName,
CancellationToken cancellationToken)
{
_logger.LogInformation("Manual analysis triggered for {Service}", serviceName);

var telemetry = await _queryService.GetServiceTelemetryAsync(
serviceName, TimeSpan.FromHours(1), cancellationToken);

var anomalies = await _aiService.DetectAnomaliesAsync(telemetry, cancellationToken);
var insight = await _aiService.GenerateHealthInsightAsync(telemetry, cancellationToken);
var exceptionAnalysis = await _aiService.AnalyzeExceptionPatternAsync(
telemetry.TopExceptions, serviceName, cancellationToken);

if (anomalies.Count > 0)
{
var maxSeverity = anomalies.Max(a => a.Severity);
var alert = await _alertService.CreateAlertAsync(
serviceName,
maxSeverity == AnomalySeverity.Critical ? AlertLevel.Critical : AlertLevel.Warning,
$"Manual analysis: {serviceName}",
insight,
anomalies,
cancellationToken);

await _alertService.SendAlertNotificationAsync(alert, cancellationToken);
}

return new OkObjectResult(new
{
serviceName,
telemetry,
healthInsight = insight,
exceptionAnalysis,
anomalies,
analyzedAt = DateTime.UtcNow
});
}

private static AlertLevel MapSeverity(string? severity) => severity?.ToLowerInvariant() switch
{
"sev0" or "critical" => AlertLevel.Critical,
"sev1" or "error" => AlertLevel.Error,
"sev2" or "warning" => AlertLevel.Warning,
_ => AlertLevel.Information
};
}

public record AppInsightsAlertPayload(
string? SchemaId,
AppInsightsAlertData? Data
);

public record AppInsightsAlertData(
AppInsightsAlertContext? Context
);

public record AppInsightsAlertContext(
string? Name,
string? Description,
string? ResourceName,
string? ResourceGroupName,
string? Severity,
string? ConditionType
);
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Monitoring.Functions.Configuration;
using Monitoring.Functions.Models;
using Monitoring.Functions.Services;

namespace Monitoring.Functions.Functions;

public sealed class AnomalyDetectorFunction
{
private readonly IAppInsightsQueryService _queryService;
private readonly IAiAnalysisService _aiService;
private readonly IAlertService _alertService;
private readonly MonitoringOptions _options;
private readonly ILogger<AnomalyDetectorFunction> _logger;

public AnomalyDetectorFunction(
IAppInsightsQueryService queryService,
IAiAnalysisService aiService,
IAlertService alertService,
IOptions<MonitoringOptions> options,
ILogger<AnomalyDetectorFunction> logger)
{
_queryService = queryService;
_aiService = aiService;
_alertService = alertService;
_options = options.Value;
_logger = logger;
}

[Function("AnomalyDetector")]
public async Task RunAsync(
[TimerTrigger("0 */5 * * * *")] TimerInfo timer,
CancellationToken cancellationToken)
{
_logger.LogInformation("Anomaly detection cycle started at {Time}", DateTime.UtcNow);

var monitoringPeriod = TimeSpan.FromMinutes(_options.AnomalyDetectionIntervalMinutes * 6);

try
{
var allTelemetry = await _queryService.GetAllServicesTelemetryAsync(
monitoringPeriod, cancellationToken);

var allAnomalies = new List<AnomalyResult>();

foreach (var telemetry in allTelemetry)
{
var anomalies = await _aiService.DetectAnomaliesAsync(telemetry, cancellationToken);

if (anomalies.Count > 0)
{
_logger.LogWarning(
"Detected {Count} anomalies for {Service}",
anomalies.Count, telemetry.ServiceName);

allAnomalies.AddRange(anomalies);

var maxSeverity = anomalies.Max(a => a.Severity);
var alertLevel = maxSeverity switch
{
AnomalySeverity.Critical => AlertLevel.Critical,
AnomalySeverity.Warning => AlertLevel.Warning,
_ => AlertLevel.Information
};

var alert = await _alertService.CreateAlertAsync(
telemetry.ServiceName,
alertLevel,
$"Anomalies detected in {telemetry.ServiceName}",
$"{anomalies.Count} anomalies detected: " +
string.Join(", ", anomalies.Select(a => a.Type.ToString()).Distinct()),
anomalies,
cancellationToken);

if (alertLevel >= AlertLevel.Warning)
{
await _alertService.SendAlertNotificationAsync(alert, cancellationToken);
}
}
}

_logger.LogInformation(
"Anomaly detection completed. Services: {ServiceCount}, Anomalies: {AnomalyCount}",
allTelemetry.Count, allAnomalies.Count);
}
catch (Exception ex)
{
_logger.LogError(ex, "Anomaly detection cycle failed");
throw;
}
}
}
Loading