diff --git a/docs/tutorial-agent-watchdog.md b/docs/tutorial-agent-watchdog.md
index d8c71666..947debc9 100644
--- a/docs/tutorial-agent-watchdog.md
+++ b/docs/tutorial-agent-watchdog.md
@@ -11,6 +11,9 @@ sources:
    queried via KQL.
 3. **Foundry control plane** — agent metadata and recent runs read
    through `azure-ai-projects`.
+4. **Azure resource posture** — a read-only WAF-AI Security pillar audit
+   of the Cognitive Services / Azure OpenAI account that hosts the agent
+   and judge model.

 The agent runs the same checks (regression, latency, errors, safety) in
 three form factors:
@@ -67,7 +70,7 @@ Exit codes are CI-friendly:
 - `2` — a finding meets the configured `--severity-fail` floor
 - `1` — runtime / configuration error

-## 2b. Security posture audit (WAF-AI)
+## 3. Security posture audit (WAF-AI)

 The watchdog can also run a **read-only audit of the Azure footprint**
 hosting your agent against the [Microsoft Well-Architected Framework
@@ -75,6 +78,12 @@ for AI workloads — Security pillar][waf-ai]. This is opt-in: the
 findings live in their own `security` category and are skipped unless
 both the `azure_resources` source and the `posture` check are enabled.

+Why is this opt-in? The telemetry checks use App Insights and Foundry
+metadata that you already configured in the previous step. Security
+posture requires management-plane reads against the Azure resource group,
+so the tutorial asks for the subscription, resource group, and Cognitive
+Services account explicitly instead of guessing them.
+
 The audit runs five high-impact rules against the Cognitive Services /
 Azure OpenAI account:

@@ -86,30 +95,113 @@ Azure OpenAI account:
 | `waf.security.diagnostic_settings` | warning | Diagnostic logs flowing to Log Analytics / storage / event hub |
 | `waf.security.content_filter` | critical | Every model deployment has a RAI policy applied |

-Required RBAC: **Reader** on the resource group (or on each
-individual resource), granted to whoever runs `agentops agent analyze`
-(your local identity locally, or the OIDC-federated identity in CI).
+Required RBAC: **Reader** on the resource group (or on each individual
+resource), granted to whoever runs `agentops agent analyze` (your own
+identity when running locally, or the OIDC-federated identity in CI).
+
+Find the account to audit:
+
+```powershell
+$env:AZURE_SUBSCRIPTION_ID = az account show --query id -o tsv
+$resourceGroup = ""
+
+az cognitiveservices account list `
+  --resource-group $resourceGroup `
+  --query "[].{name:name,kind:kind,location:location,disableLocalAuth:properties.disableLocalAuth,publicNetworkAccess:properties.publicNetworkAccess}" `
+  -o table
+```
+
+Pick the account that hosts your Azure OpenAI / AI Services deployment:
+
+```powershell
+$cognitiveAccount = ""
+```

 Enable in `.agentops/agent.yaml`:

-```yaml
+```powershell
+@"
+version: 1
+lookback_days: 7
+
 sources:
+  results_history:
+    enabled: true
+    path: .agentops/results
+    lookback_runs: 10
+  azure_monitor:
+    enabled: true
+    app_insights_resource_id: $appInsightsId
+  foundry_control:
+    enabled: true
+    project_endpoint_env: AZURE_AI_FOUNDRY_PROJECT_ENDPOINT
   azure_resources:
     enabled: true
-    subscription_id_env: AZURE_SUBSCRIPTION_ID  # or set subscription_id directly
-    resource_group: rg-myproject
-    cognitive_services_account: ai-services-myproject
+    subscription_id_env: AZURE_SUBSCRIPTION_ID
+    resource_group: $resourceGroup
+    cognitive_services_account: $cognitiveAccount

 checks:
+  latency:
+    p95_threshold_seconds: 10.0
+  errors:
+    rate_threshold: 0.05
   posture:
     enabled: true
     pillar: security
-    # Skip individual rules without disabling the whole check, e.g.
-    # exclude_rules:
-    #   - waf.security.diagnostic_settings
     exclude_rules: []
+"@ | Set-Content .agentops/agent.yaml -Encoding utf8
+```
+
+Run only the security category first:
+
+```powershell
+agentops agent analyze --categories security --severity-fail critical
+code .agentops/agent/report.md
+```
+
+In the test run for this tutorial, `azure_resources` changed from
+`disabled` to `ok` and the report produced two WAF-AI findings:
+
+```text
+## Verdict: ⚠️ Warnings found
+
+| Category | Count |
+|---|---|
+| Security posture (WAF-AI — Security pillar) | 2 |
+
+| Source | Status | Detail |
+|---|---|---|
+| azure_resources | ok |
+
+| Severity | ID | Title | Source |
+|---|---|---|---|
+| warning | waf.security.diagnostic_settings | Diagnostic settings are missing or incomplete | azure_resources |
+| warning | waf.security.public_network_access | Public network access is open and unrestricted | azure_resources |
 ```
+
+The evidence blocks in that run showed:
+
+```json
+{
+  "account": "aif-agentops-exp",
+  "diagnostic_settings": []
+}
+```
+
+```json
+{
+  "account": "aif-agentops-exp",
+  "public_network_access": "Enabled",
+  "private_endpoint_count": 0,
+  "network_acls_default_action": "Allow"
+}
+```
+
+Those are real management-plane findings: the account had Entra-only
+authentication enabled, but it still needed diagnostic settings and a
+network restriction plan.
+
 Run only the security category, or skip a specific rule from the CLI:

 ```bash
@@ -127,12 +219,12 @@ agentops agent analyze --exclude-rules waf.security.diagnostic_settings,waf.secu
 ```

 The Markdown report groups findings by category, so security findings
-appear under their own `### 🔐 Security` heading with a footer link
-back to the WAF-AI guidance.
+appear under their own `### Security posture (WAF-AI — Security pillar)`
+heading with a footer link back to the WAF-AI guidance.

 [waf-ai]: https://learn.microsoft.com/azure/well-architected/ai/security

-## 3. CI scheduled run
+## 4. CI scheduled run

 Pair the analyzer with a GitHub Actions schedule:

@@ -161,7 +253,7 @@ jobs:
           path: .agentops/agent/report.md
 ```

-## 4. Copilot Chat extension (local)
+## 5. Copilot Chat extension (local)

 ```bash
 pip install "agentops-toolkit[agent] @ git+https://github.com/Azure/agentops.git@develop"
@@ -173,7 +265,7 @@ Then point a GitHub App's Copilot Extension webhook at
 local-only** — never expose that endpoint publicly without signature
 validation.

-## 5. Hosted Copilot Extension on Azure Container Apps
+## 6. Hosted Copilot Extension on Azure Container Apps

 The repo ships a minimal scaffold:

diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md
index 22d01ee8..e5aebe34 100644
--- a/docs/tutorial-end-to-end.md
+++ b/docs/tutorial-end-to-end.md
@@ -762,9 +762,21 @@ az role assignment create `
 ### 9.3 Configure the watchdog

 Now write `.agentops/agent.yaml`. This is the file that tells the
-watchdog which signal sources to use:
+watchdog which signal sources to use. In addition to eval history,
+Application Insights, and Foundry metadata, this tutorial enables the
+read-only WAF-AI security posture audit for the Azure AI account:

 ```powershell
+$env:AZURE_SUBSCRIPTION_ID = az account show --query id -o tsv
+$cognitiveAccount = az cognitiveservices account list `
+  --resource-group $resourceGroup `
+  --query "[?kind=='AIServices' || kind=='OpenAI'].name | [0]" `
+  -o tsv
+
+if (-not $cognitiveAccount) {
+  throw "No AIServices/OpenAI account found in resource group $resourceGroup"
+}
+
 @"
 version: 1
 lookback_days: 7
@@ -780,14 +792,32 @@ sources:
   foundry_control:
     enabled: true
     project_endpoint_env: AZURE_AI_FOUNDRY_PROJECT_ENDPOINT
+  azure_resources:
+    enabled: true
+    subscription_id_env: AZURE_SUBSCRIPTION_ID
+    resource_group: $resourceGroup
+    cognitive_services_account: $cognitiveAccount
 checks:
   latency:
     p95_threshold_seconds: 5.0
   errors:
     rate_threshold: 0.05
+  posture:
+    enabled: true
+    pillar: security
+    exclude_rules: []
 "@ | Set-Content .agentops/agent.yaml -Encoding utf8
 ```

+If your resource group or account name is different, list candidates with:
+
+```powershell
+az cognitiveservices account list `
+  --resource-group $resourceGroup `
+  --query "[].{name:name,kind:kind,location:location,disableLocalAuth:properties.disableLocalAuth,publicNetworkAccess:properties.publicNetworkAccess}" `
+  -o table
+```
+
 ### 9.4 Generate telemetry, then analyze it

 Install both the Foundry runtime and the watchdog extras, set the
@@ -805,26 +835,27 @@ Start-Sleep -Seconds 90

 agentops agent analyze
 code .agentops/agent/report.md
+
+# Optional: focus only on WAF-AI security posture.
+agentops agent analyze --categories security --severity-fail critical
 ```

-The report should now show `azure_monitor` as `ok`, not `skipped`. The
-watchdog can combine:
+The report should now show `azure_monitor` and `azure_resources` as `ok`,
+not `skipped`. The watchdog can combine:

 - eval-history regressions from `.agentops/results`;
 - live p95 latency and error-rate signals from Application Insights;
-- Foundry control-plane metadata from `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`.
+- Foundry control-plane metadata from `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`;
+- WAF-AI security posture findings from the Cognitive Services / Azure
+  OpenAI account.

 If the findings table is empty, that means the configured checks passed;
 the **Sources** table still proves which signal sources were queried.

-> **Optional — WAF-AI security audit.** The watchdog can also run a
-> read-only audit of your Foundry resource group against the
-> [Well-Architected Framework for AI workloads — Security pillar][waf-ai].
-> Enable the `azure_resources` source and the `posture` check in
-> `agent.yaml` (commented stanzas are included), grant your identity
-> `Reader` on the resource group, and re-run with
-> `agentops agent analyze --categories security`. Full walkthrough:
-> [`tutorial-agent-watchdog.md`](tutorial-agent-watchdog.md#2b-security-posture-audit-waf-ai).
+In the tutorial test environment, the posture-only run produced two
+warnings: missing diagnostic settings and unrestricted public network
+access on the AI Services account. Full walkthrough:
+[`tutorial-agent-watchdog.md`](tutorial-agent-watchdog.md#3-security-posture-audit-waf-ai).

 For deeper integration (Copilot Chat extension, ACA deploy), see
 [`tutorial-agent-watchdog.md`](tutorial-agent-watchdog.md).