diff --git a/CHANGELOG.md b/CHANGELOG.md index c71a717..2559113 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,24 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres - Updated the tutorials to prefer the interactive `agentops init` wizard, explain evaluator deployment separately from initialization, and include forced regression/fix loops for prompt and hosted agent paths. +- Re-ask starter `agent` and `dataset` values during the first interactive + `agentops init` run so tutorial users replace `my-agent:1` with their target. +- Removed the interactive App Insights question from `agentops init`; runtime + commands discover it from the Foundry project when possible, and + `--appinsights-connection-string` remains available for explicit setup. +- Made `workflow analyze` output use a lighter PowerShell-friendly summary, + Markdown tables, and user-facing Foundry eval labels; also removed a + non-actionable latency warning from the normal analysis output. +- Made `workflow generate` next steps gentler for PowerShell and tutorial users: + PR/watchdog-only output now asks for only the `dev` environment, explains + that deploy setup can wait, and points users to Copilot-assisted GitHub/OIDC + setup. + +### Fixed +- **Doctor App Insights discovery.** The `azure_monitor` source now falls back + to an App Insights `ApplicationId` from `APPLICATIONINSIGHTS_CONNECTION_STRING` + or Foundry project telemetry discovery, so Doctor no longer reports runtime + telemetry as unconfigured when Cockpit can already resolve App Insights. ## [0.2.0] - 2026-05-22 diff --git a/docs/ci-github-actions.md b/docs/ci-github-actions.md index 3bdecfa..7cba1d5 100644 --- a/docs/ci-github-actions.md +++ b/docs/ci-github-actions.md @@ -10,8 +10,8 @@ deployment wiring (azd, prompt-agent, or placeholder) and the eval runner. Generate the PR gate first: `agentops workflow generate --kinds pr`. Add DEV/QA/PROD after GitHub Environments and Azure OIDC are ready. 
Repos with `azure.yaml` use azd-backed deploys; Foundry prompt agents can use -prompt-agent deploys and the official Microsoft Foundry AI Agent Evaluation -runner when the dataset is compatible. +prompt-agent deploys and the Microsoft Foundry AI Agent Evaluation runner when +the dataset is compatible. The full scaffold ships five templates: @@ -77,6 +77,32 @@ agentops workflow generate --kinds pr agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode auto --force ``` +## Copilot-assisted setup + +The GitHub setup spans repository creation, Azure OIDC, Actions variables, +GitHub Environments, and branch protection. For a smoother first run, install +the AgentOps workflow skill and hand this setup to Copilot: + +```bash +agentops skills install --platform copilot +``` + +Then open Copilot and run `/skills`. Confirm `agentops-workflow` is loaded +before continuing. + +When the skill is loaded, ask Copilot: + +```text +Use the AgentOps workflow skill to get the generated AgentOps GitHub Actions +workflows running end to end. + +This may be a new folder with no Git repo or GitHub remote yet. Create or +connect the GitHub repo if needed, wire Azure OIDC and required Actions +variables, create only the environments used by the generated workflows, show me +the plan before changing GitHub or Azure, and call out anything that needs +owner/admin permission. +``` + ## Configuration walkthrough ### 1. 
Repository variables (OIDC) @@ -89,7 +115,7 @@ In Settings → Secrets and variables → Actions → **Variables**, add: | `AZURE_TENANT_ID` | Azure AD tenant | | `AZURE_SUBSCRIPTION_ID` | Target subscription | | `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` | Foundry project URL (used by the eval step) | -| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and the official AI Agent Evaluation runner | +| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and Microsoft Foundry AI Agent Evaluation | | `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered | Then on the Azure side, configure Workload Identity Federation @@ -165,9 +191,9 @@ signals, and existing CI folders. README matches such as GPT-RAG, Live Voice, or AI Landing Zone are treated as hints; structural files drive the recommendation. `workflow generate --deploy-mode auto` uses the same recommendation, so the analysis and generated templates do not drift. The analyzer also reports the -eval runner: `official-ai-agent-evaluation` for compatible Foundry prompt -agents, otherwise `agentops-local`. If you omit `--deploy-mode`, the default is -`auto`; the command output prints the selected effective mode, for example +eval runner: Microsoft Foundry AI Agent Evaluation for compatible Foundry prompt +agents, otherwise AgentOps local eval. If you omit `--deploy-mode`, the default +is `auto`; the command output prints the selected effective mode, for example `azd (auto default)` or `placeholder (auto default)`. Use one of these modes: @@ -237,7 +263,7 @@ Each deploy workflow does this: 1. stages a candidate Foundry prompt-agent version from `prompt_file`; 2. writes `.agentops/deployments/agentops.candidate.yaml` pointing at the candidate `name:version`; -3. runs the official AI Agent Evaluation runner against that candidate version +3. 
runs Microsoft Foundry AI Agent Evaluation against that candidate version when supported, or `agentops eval run` as the local fallback; 4. runs `agentops doctor --evidence-pack` so the exact candidate has release evidence; 5. records `.agentops/deployments/foundry-agent.json` as a CI artifact only @@ -245,12 +271,12 @@ Each deploy workflow does this: This keeps the invariant clear: **the evaluated agent version is the deployed agent version**. Foundry manages the candidate agent versions; AgentOps -prepares the official-eval input under `.agentops/official-eval/` when that -runner is selected, and always supplies the repo-side gate, deployment record, -and Cockpit visibility. +prepares the Microsoft Foundry eval input under `.agentops/official-eval/` when +that runner is selected, and always supplies the repo-side gate, deployment +record, and Cockpit visibility. Preview branches can temporarily route the generated GitHub workflow to a fork -of the official eval action before an upstream action PR is merged: +of the Microsoft Foundry eval action before an upstream action PR is merged: ```powershell $env:AGENTOPS_OFFICIAL_EVAL_ACTION = "placerda/ai-agent-evals@v3-beta" @@ -370,9 +396,9 @@ contract to gate deploys: | `2` | Eval ran, one or more thresholds failed | ❌ fail (deploy never runs) | | `1` | Runtime / config error | ❌ fail | -When `official-ai-agent-evaluation` is selected, the Microsoft action/task owns -the eval job result. AgentOps still uploads the prepared input and metadata so -the release has repo-side proof of what was evaluated. +When Microsoft Foundry AI Agent Evaluation is selected, the Microsoft +action/task owns the eval job result. AgentOps still uploads the prepared input +and metadata so the release has repo-side proof of what was evaluated. 
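The exit-code contract in the table above applies to the local `agentops eval run` path and can be sketched as a small classifier. This is an illustrative sketch, not AgentOps code: `GateOutcome`, `classify_eval_exit_code`, and `deploy_may_run` are hypothetical names, exit code `0` meaning "pass" is assumed from convention because that row is outside this hunk, and treating any other unlisted code as a runtime error is also an assumption.

```python
from enum import Enum


class GateOutcome(Enum):
    """Hypothetical labels for the exit codes described in the table."""

    PASS = "pass"
    THRESHOLD_FAILURE = "threshold_failure"
    RUNTIME_ERROR = "runtime_error"


def classify_eval_exit_code(code: int) -> GateOutcome:
    """Map an `agentops eval run` exit code to a gate outcome."""
    if code == 0:
        # Assumption: 0 means the eval ran and all thresholds passed.
        return GateOutcome.PASS
    if code == 2:
        # From the table: eval ran, one or more thresholds failed.
        return GateOutcome.THRESHOLD_FAILURE
    # From the table: 1 is a runtime/config error. Assumption: any other
    # unlisted code is treated the same way so the gate fails closed.
    return GateOutcome.RUNTIME_ERROR


def deploy_may_run(code: int) -> bool:
    """The deploy step runs only when the eval gate passed."""
    return classify_eval_exit_code(code) is GateOutcome.PASS
```

Failing closed on unknown codes keeps the gate conservative: a deploy is only unblocked by an explicit pass. When Microsoft Foundry AI Agent Evaluation is selected, this mapping does not apply, because the Microsoft action/task owns the job result.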
## Artifacts @@ -383,7 +409,7 @@ Each workflow uploads (always - even on failure): - `cloud_evaluation.json` - present when using Foundry cloud evaluation; contains a deep link to the New Foundry Experience Evaluations page - `.agentops/official-eval/input.json`, `metadata.json`, and `result.json` - - present when using the official AI Agent Evaluation runner + present when using Microsoft Foundry AI Agent Evaluation - `evidence.json` and `evidence.md` - present in PR, PROD, and watchdog workflows after `agentops doctor --evidence-pack` diff --git a/docs/concepts.md b/docs/concepts.md index a744023..a572837 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -158,10 +158,10 @@ evidence outputs into a release gate. | Generic HTTP/JSON endpoint | No | Yes | Use local runner. | | Raw model deployment (`model:`) | No | Yes | Use local runner. | -For CI pipelines that only need a supported Foundry-native eval, prefer the -official AI Agent Evaluation action or Azure DevOps extension. Use AgentOps when -the repo also needs thresholds, baselines, local fallback, Doctor readiness, -release evidence, or trace-to-regression review. +For CI pipelines that only need a supported Foundry-native eval, prefer +Microsoft Foundry AI Agent Evaluation. Use AgentOps when the repo also needs +thresholds, baselines, local fallback, Doctor readiness, release evidence, or +trace-to-regression review. ## Evaluation Scenarios diff --git a/docs/how-it-works.md b/docs/how-it-works.md index 207cfda..706331e 100644 --- a/docs/how-it-works.md +++ b/docs/how-it-works.md @@ -15,8 +15,9 @@ is the proof?** It: 5. Writes release evidence with `agentops doctor --evidence-pack`. Foundry owns agent creation, deployment, runtime, traces, monitoring, -red-teaming, datasets, and official evaluation drilldown. AgentOps references -the candidate those tools produced and adds the repo-controlled release proof: +red-teaming, datasets, and Microsoft-hosted evaluation drilldown. 
AgentOps +references the candidate those tools produced and adds the repo-controlled +release proof: config, gates, artifacts, PR reports, Doctor diagnostics, release evidence, trace-to-regression promotion, and Cockpit links back to Foundry/Azure Monitor. @@ -542,9 +543,9 @@ The `execution: cloud` trade-offs (so you can decide consciously): For CI pipelines that only need a supported Foundry-native eval and do not need AgentOps artifacts, baselines, Doctor readiness, or release evidence, the -official AI Agent Evaluation GitHub Action or Azure DevOps extension may be the -cleaner entry point. AgentOps is the wrapper when the repo needs a release gate -and proof pack around those signals. +Microsoft Foundry AI Agent Evaluation GitHub Action or Azure DevOps extension +may be the cleaner entry point. AgentOps is the wrapper when the repo needs a +release gate and proof pack around those signals. Implementation lives in [src/agentops/pipeline/publisher.py](../src/agentops/pipeline/publisher.py) (Classic) and [src/agentops/pipeline/cloud_runner.py](../src/agentops/pipeline/cloud_runner.py) diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md index ebd06da..7bbf2a7 100644 --- a/docs/tutorial-end-to-end.md +++ b/docs/tutorial-end-to-end.md @@ -225,11 +225,21 @@ Answer the prompts as the wizard asks them: | Foundry project endpoint | `https://.services.ai.azure.com/api/projects/` | | Agent | The value in `$env:TRAVEL_AGENT_TARGET`, such as `travel-agent:1` or `http://127.0.0.1:8000/chat` | | Dataset path | `.agentops/data/travel-smoke.jsonl` | -| Application Insights connection string | Paste it if you have one, or press Enter to let AgentOps auto-discover/leave it blank | + +The wizard does not ask for App Insights. Later runtime commands such as eval, +Doctor, and Cockpit use the Foundry project endpoint to ask the Azure AI +Projects SDK for the App Insights resource attached to that Foundry project. 
If +discovery is unavailable and you want to force a value, run +`agentops init --appinsights-connection-string ""` or set +`APPLICATIONINSIGHTS_CONNECTION_STRING` manually in `.azure/dev/.env`. + +If the first run shows starter defaults such as `Agent [my-agent:1]` or +`Dataset path [.agentops/data/smoke.jsonl]`, replace them with your Travel Agent +target and dataset. Those defaults only come from the scaffolded starter file. The wizard saves `agent` and `dataset` to `agentops.yaml`. It saves the Foundry -project endpoint and App Insights connection string to `.azure/dev/.env`, which -is git-ignored and compatible with azd. +project endpoint to `.azure/dev/.env`, which is git-ignored and compatible with +azd. If you force an App Insights connection string later, it is saved there too. For a hosted HTTP endpoint, add the endpoint protocol fields: @@ -252,13 +262,14 @@ Expected result: | Agent target | Runner | |---|---| -| `agent: name:version` | `official-ai-agent-evaluation` | +| `agent: name:version` | Microsoft Foundry AI Agent Evaluation | | `agent: https://...` | `agentops-local` | | `agent: model:` | `agentops-local` | -This is the key alignment rule. Foundry-native prompt agents use the official -runner where possible. AgentOps keeps the local path for hosted endpoints, -models, unsupported evaluator mappings, and repo-specific threshold evidence. +This is the key alignment rule. Foundry-native prompt agents use the Microsoft +Foundry AI Agent Evaluation action/task where possible. AgentOps keeps the local +path for hosted endpoints, models, unsupported evaluator mappings, and +repo-specific threshold evidence. ## 5. Run the first eval @@ -287,10 +298,10 @@ Before running that workflow, set the CI variable: AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini ``` -This value is not an `agentops init` answer. It tells the official eval runner -which model deployment should judge responses. +This value is not an `agentops init` answer. 
It tells the Microsoft Foundry AI +Agent Evaluation runner which model deployment should judge responses. -The generated workflow prepares official eval input under: +The generated workflow prepares Microsoft Foundry eval input under: ```text .agentops/official-eval/ @@ -331,8 +342,8 @@ and rerun the same gate. version such as `travel-agent:3`, re-run `agentops init --reconfigure`, and run the pipeline again. -This exercises Foundry prompt versioning, the official AI Agent Evaluation -runner, and AgentOps evidence for the exact version under release review. +This exercises Foundry prompt versioning, Microsoft Foundry AI Agent Evaluation, +and AgentOps evidence for the exact version under release review. ### Hosted/HTTP regression @@ -388,7 +399,8 @@ The generated workflows are intentionally boring: Foundry and Azure Monitor own live observability. AgentOps only checks whether the repo and runtime are wired to those signals. -Set the Application Insights connection string in the active azd env: +If runtime discovery does not find the connected App Insights resource, set the +connection string in the active azd env: ```powershell agentops init show --reveal-secrets @@ -482,7 +494,7 @@ agentops cockpit --workspace . Use Cockpit as the local command center: - Foundry connection and deep links; -- official eval or local eval gate status; +- Microsoft Foundry eval or AgentOps local eval gate status; - Doctor findings; - release evidence; - local eval history; @@ -496,7 +508,8 @@ You are ready for a release review when: - The agent target is explicit in `agentops.yaml`. - CI uses the expected runner for the target. -- Eval results or official eval metadata are attached to the workflow artifact. +- Eval results or Microsoft Foundry eval metadata are attached to the workflow + artifact. - The workshop includes one deliberate regression and one fixed rerun, either through Foundry prompt versions or AgentOps local baseline comparison. 
- `agentops doctor --evidence-pack` writes `evidence.md`. diff --git a/docs/tutorial-hosted-agent-quickstart.md b/docs/tutorial-hosted-agent-quickstart.md index da049af..804ab1c 100644 --- a/docs/tutorial-hosted-agent-quickstart.md +++ b/docs/tutorial-hosted-agent-quickstart.md @@ -171,7 +171,7 @@ You need: | Foundry project endpoint | optional, but recommended for links and evaluators | | Azure OpenAI endpoint | `https://.openai.azure.com`, used later by local AI-assisted evaluators | | Evaluator model deployment | `gpt-4o-mini`, used later by local AI-assisted evaluators | -| Application Insights connection string | optional, but recommended | +| Application Insights connection string | optional later, for observability | If the deployed endpoint needs a bearer token: @@ -192,7 +192,17 @@ Answer the prompts as the wizard asks them: | Foundry project endpoint | `https://.services.ai.azure.com/api/projects/`, or press Enter if you are only testing the local endpoint | | Agent | The value in `$env:TRAVEL_AGENT_ENDPOINT`, for example `http://127.0.0.1:8000/chat` | | Dataset path | `.agentops/data/travel-smoke.jsonl` | -| Application Insights connection string | Paste it if you have one, or press Enter to let AgentOps auto-discover/leave it blank | + +The wizard does not ask for App Insights. Later runtime commands such as eval, +Doctor, and Cockpit use the Foundry project endpoint to ask the Azure AI +Projects SDK for the App Insights resource attached to that Foundry project. If +discovery is unavailable and you want to force a value, run +`agentops init --appinsights-connection-string ""` or set +`APPLICATIONINSIGHTS_CONNECTION_STRING` manually in `.azure/dev/.env`. + +If the first run shows starter defaults such as `Agent [my-agent:1]` or +`Dataset path [.agentops/data/smoke.jsonl]`, replace them with the hosted Travel +Agent values above. Those defaults only come from the scaffolded starter file. 
If you want an azd environment name other than the default `dev`, run `agentops init --azd-env `. @@ -208,8 +218,8 @@ request_field: message response_field: text ``` -The Foundry project endpoint and App Insights connection string live in -`.azure/dev/.env`, not in source control. +The Foundry project endpoint lives in `.azure/dev/.env`, not in source control. +If you force an App Insights connection string later, it is saved there too. For a deployed endpoint protected by a bearer token, add: diff --git a/docs/tutorial-prompt-agent-quickstart.md b/docs/tutorial-prompt-agent-quickstart.md index f74e77f..7dee919 100644 --- a/docs/tutorial-prompt-agent-quickstart.md +++ b/docs/tutorial-prompt-agent-quickstart.md @@ -7,7 +7,7 @@ and Cockpit. This path validates the Foundry-native route: -- Foundry owns the prompt agent runtime and official AI Agent Evaluation. +- Foundry owns the prompt agent runtime and Microsoft Foundry AI Agent Evaluation. - AgentOps owns repo-side readiness: `agentops.yaml`, CI gates, Doctor, release evidence, and Cockpit. @@ -106,7 +106,7 @@ You need: |---|---| | Foundry project endpoint | `https://.services.ai.azure.com/api/projects/` | | Prompt agent reference | `travel-agent:1` or the version Foundry published | -| Application Insights connection string | optional, but recommended | +| Application Insights connection string | optional later, for observability | You do not need to set an evaluator deployment before initialization. `agentops init` collects the workspace values. 
The evaluator deployment is a @@ -125,7 +125,17 @@ Answer the prompts as the wizard asks them: | Foundry project endpoint | `https://.services.ai.azure.com/api/projects/` | | Agent | `travel-agent:1`, or the exact published version from Foundry | | Dataset path | `.agentops/data/travel-smoke.jsonl` | -| Application Insights connection string | Paste it if you have one, or press Enter to let AgentOps auto-discover/leave it blank | + +The wizard does not ask for App Insights. Later runtime commands such as eval, +Doctor, and Cockpit use the Foundry project endpoint to ask the Azure AI +Projects SDK for the App Insights resource attached to that Foundry project. If +discovery is unavailable and you want to force a value, run +`agentops init --appinsights-connection-string ""` or set +`APPLICATIONINSIGHTS_CONNECTION_STRING` manually in `.azure/dev/.env`. + +If the first run shows starter defaults such as `Agent [my-agent:1]` or +`Dataset path [.agentops/data/smoke.jsonl]`, replace them with the Travel Agent +values above. Those defaults only come from the scaffolded starter file. The interactive path is intentional: you see what each value means, and each answer is saved as soon as it validates. If you want an azd environment name @@ -148,8 +158,8 @@ agent: travel-agent:1 dataset: .agentops/data/travel-smoke.jsonl ``` -The Foundry project endpoint and App Insights connection string live in -`.azure/dev/.env`, not in source control. +The Foundry project endpoint lives in `.azure/dev/.env`, not in source control. +If you force an App Insights connection string later, it is saved there too. ## 6. 
Check the selected eval runner @@ -157,14 +167,19 @@ The Foundry project endpoint and App Insights connection string live in agentops workflow analyze --format text ``` -For `agent: name:version`, AgentOps should recommend: +For `agent: name:version`, AgentOps should recommend the Foundry eval runner: ```text -recommended_eval_runner: official-ai-agent-evaluation +Recommendation + deploy prompt-agent + evaluate Microsoft Foundry AI Agent Evaluation + workflow edits not needed - generated workflow should work as-is + Copilot skills not needed - no Copilot handoff for this project shape ``` -That means generated CI uses the official Microsoft AI Agent Evaluation runner -for the eval step, then uses AgentOps to collect evidence and readiness signals. +That means generated CI uses the Microsoft Foundry AI Agent Evaluation +action/task for the eval step, then uses AgentOps to collect evidence and +readiness signals. ## 7. Generate the PR gate @@ -172,23 +187,57 @@ for the eval step, then uses AgentOps to collect evidence and readiness signals. agentops workflow generate --kinds pr,watchdog --force ``` -Before you run the generated workflow in GitHub Actions or Azure Pipelines, set -the evaluator deployment as a CI variable: +At this point the workflow files exist only on your machine. CI will not run +until the folder is a GitHub repository, pushed, and connected to Azure with +OIDC. + +Recommended path: let Copilot use the installed AgentOps workflow skill as the +guide, because this step crosses repo, GitHub, and Azure permissions. + +Refresh the Copilot skills with AgentOps instead of checking folders manually: + +```powershell +agentops skills install --platform copilot +``` + +Then open Copilot in this repo and run: + +```text +/skills +``` + +Confirm `agentops-workflow` is loaded before continuing. 
+ +When the skill is loaded, paste: ```text -AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini +Use the AgentOps workflow skill to get the generated PR gate and watchdog +workflows running on GitHub Actions for this Foundry prompt-agent project. + +This may be a brand-new folder with no Git repo or GitHub remote yet. Keep the +scope to the PR gate and watchdog only: create or connect the GitHub repo if +needed, wire Azure OIDC and required Actions variables, create only the `dev` +environment, and do not set up `qa`, `production`, or deploy workflows yet. +Show me the plan before changing GitHub or Azure, and call out anything that +needs owner/admin permission. ``` -That variable is not an `agentops init` answer. It tells the official eval -runner which model deployment should judge responses. +For the `pr,watchdog` quickstart, the generated workflows use the `dev` +environment for OIDC and variables. You do **not** need `qa` or `production` +yet; add them when you generate deploy workflows later. + +The workflow skill will copy the needed CI variables from your local +AgentOps/azd configuration into the GitHub `dev` environment. If a value such +as the evaluator model deployment is missing, it will ask you. -The PR workflow should contain the official eval action: +The PR workflow should contain the Microsoft Foundry eval action: ```text microsoft/ai-agent-evals@v3-beta ``` -It also records: +It also records provenance and release-evidence files. The detailed quality +scores stay in Foundry Evaluations: ```text .agentops/official-eval/metadata.json @@ -201,6 +250,17 @@ It also records: This step makes the quickstart more than a happy path. You will intentionally ship a worse prompt, watch the eval gate or metrics move, then recover. +This walkthrough assumes this concrete sequence: + +- `travel-agent:2` is the last good version that already has a green run. +- `travel-agent:3` is the intentionally regressed version you are about to test. 
+- `travel-agent:4` is the restored version you publish after the regression + test. + +If your Foundry project publishes different numbers because you saved or +published extra times, use the exact `travel-agent:` values shown in +your GitHub summary and Foundry run pages. + 1. In Foundry, edit the `travel-agent` instructions to this intentionally bad version: @@ -209,23 +269,78 @@ ship a worse prompt, watch the eval gate or metrics move, then recover. plans, practical notes, constraints, or booking caveats. ``` -2. Publish it as the next version, for example `travel-agent:2`. +2. Publish it as the next version, for example `travel-agent:3`. 3. Re-run the wizard and update only the agent value: ```powershell agentops init --reconfigure ``` - Keep the same endpoint and dataset, but answer `Agent` with the regressed - version such as `travel-agent:2`. -4. Run the PR workflow, or run the official eval step from your pipeline branch. - In Foundry Evaluations and the workflow summary, compare the new run with the - previous `travel-agent:1` run. The regressed prompt should lose quality - because it no longer satisfies the dataset expectations. + Keep the same endpoint and dataset, but answer `Agent` with + `travel-agent:3`. +4. Create a regression branch, push it, and open a PR to `main`: + + ```powershell + git switch -c feature/regress-travel-agent + git add agentops.yaml + git commit -m "Evaluate regressed travel agent prompt" + git push -u origin feature/regress-travel-agent + gh pr create --base main --head feature/regress-travel-agent --title "Test AgentOps regression gate" --body "Evaluates the intentionally regressed travel-agent prompt." + ``` + + The PR gate reads `agent: travel-agent:3` from `agentops.yaml` in that + branch and evaluates the regressed version. 
Open the PR in GitHub and watch + **Checks** -> **AgentOps PR / Eval (PR gate)**: + + ```powershell + gh pr view --web + ``` + + Use the GitHub Actions summary as the handoff to Foundry: + + - In GitHub, wait for **Checks** -> **AgentOps PR / Eval (PR gate)** to + finish, then click **Details**. + - Still in GitHub, on the workflow run **Summary**, find **Azure AI + Evaluation**. The table shows the exact regressed Agent ID and its pass + rates. Confirm it says `travel-agent:3`. + - Still in GitHub, click **View run results** in that table. This opens + Foundry in a new page for the regressed agent run. Keep this Foundry page + open and use **Overall metric results** as the quality source of truth; the + GitHub artifact is only provenance. + - Now in Foundry, click the back arrow to return to **Evaluations**. Open the + earlier green run for `travel-agent:2` in another browser tab. + - Compare the two Foundry pages side by side: pass rate and average score in + **Overall metric results**, then the same three rows in **Detailed metrics + result**. If you need row-level explanations, click **Analyze Results** on + each run. + The regressed run should score lower because it no longer returns + day-by-day plans, practical notes, constraints, or booking caveats. + 5. Restore the original Travel Agent instructions from step 1, publish again - as the next version, for example `travel-agent:3`. -6. Re-run `agentops init --reconfigure`, set `Agent` to the fixed version, and - run the pipeline again. + as the next version, for example `travel-agent:4`. +6. Point the repo at the fixed version: + + ```powershell + agentops init --reconfigure + ``` + + Keep the same endpoint and dataset, but answer `Agent` with + `travel-agent:4`. +7. 
Create a fix branch, push it, and open a PR to `main`: + + ```powershell + git switch main + git pull + git switch -c fix/restore-travel-agent + git add agentops.yaml + git commit -m "Restore travel agent prompt evaluation" + git push -u origin fix/restore-travel-agent + gh pr create --base main --head fix/restore-travel-agent --title "Restore Travel Agent eval target" --body "Points AgentOps at the restored travel-agent prompt version." + ``` + + The new PR gate should evaluate `travel-agent:4`. In the GitHub Actions + summary, click **View run results** and confirm the Foundry metrics recover + relative to the regressed `travel-agent:3` run. The learning loop is the point: Foundry owns prompt versioning and the managed evaluation run; AgentOps keeps the repo pointed at the exact version under @@ -257,15 +372,15 @@ agentops cockpit --workspace . ``` Open the local URL printed by the command. The Cockpit should show Foundry -connection, official eval readiness, Doctor findings, release evidence, CI/CD, -and next actions. +connection, Microsoft Foundry eval readiness, Doctor findings, release +evidence, CI/CD, and next actions. ## Success criteria You are done when: - The Travel Agent exists in Foundry and has a published `travel-agent:` reference. -- `agentops workflow analyze` selects `official-ai-agent-evaluation`. +- `agentops workflow analyze` selects Microsoft Foundry AI Agent Evaluation. - `agentops workflow generate` creates a PR workflow with `microsoft/ai-agent-evals@v3-beta`. 
- You published a deliberately regressed prompt version, saw the eval/pipeline diff --git a/plugins/agentops/skills/agentops-workflow/SKILL.md b/plugins/agentops/skills/agentops-workflow/SKILL.md index c375a49..6306a1d 100644 --- a/plugins/agentops/skills/agentops-workflow/SKILL.md +++ b/plugins/agentops/skills/agentops-workflow/SKILL.md @@ -1,6 +1,6 @@ --- name: agentops-workflow -description: Set up AgentOps release-readiness workflows: PR eval gates, Doctor/evidence artifacts, and safe deploy handoffs to azd or Foundry prompt-agent tooling. Trigger on "CI", "CD", "pipeline", "workflow", "GitHub Actions", "Azure DevOps", "ADO", "PR gate", "deploy", "environments", "GitFlow", "release branch", "promote to prod", "DevOps", "can we ship". +description: "Set up AgentOps release-readiness workflows: PR eval gates, Doctor/evidence artifacts, and safe deploy handoffs to azd or Foundry prompt-agent tooling. Trigger on CI, CD, pipeline, workflow, GitHub Actions, Azure DevOps, ADO, PR gate, deploy, environments, GitFlow, release branch, promote to prod, DevOps, can we ship." --- # AgentOps Workflow @@ -39,6 +39,80 @@ parallel deployment system. AgentOps should gate quality and record proof; `azd provision`, `azd deploy`, azd hooks, Foundry Toolkit, the `microsoft-foundry` skill, and project tooling own lifecycle actions. +## Fast path - generated GitHub setup + +Use this path when the user already generated GitHub workflows or asks to get +the PR gate/watchdog running. Stay local-first and deterministic; do not start +by discovering the whole Azure subscription. + +1. Inspect the repo before cloud discovery: + - `agentops init show --dir .` without `--reveal-secrets`. + - `agentops.yaml`. + - `.azure/config.json`, then the active `.azure//.env`. + - `azd env get-values` when `azure.yaml` exists and azd is available. + - `.github/workflows/agentops-*.yml`. +2. Read the generated workflows to determine exactly which GitHub environments + and variables are needed. 
For the prompt-agent quickstart, `pr,watchdog` + normally means only `environment: dev`. +3. Treat `dev` here as a GitHub Actions environment for OIDC and variables. It + normally points at the Foundry project already configured by `agentops init`; + it does not require creating a new Foundry project. +4. Proceed only when these values are known or deliberately chosen: + - GitHub `owner/repo`. + - workflow environment names from `jobs.*.environment`. + - `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID`. + - `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. + - `AZURE_OPENAI_DEPLOYMENT`. + - optional `APPLICATIONINSIGHTS_CONNECTION_STRING`. +5. Prefer existing values and exact checks: + - `git remote get-url origin` and `gh repo view --json nameWithOwner`. + - `gh variable list --env ` and `gh secret list --env `. + - `agentops init show`, local `.azure//.env`, and `azd env get-values` + values before `az account show`. + - `az account show` only as a proposal for tenant/subscription; confirm + before writing it to GitHub variables. +6. Copy CI variables from local AgentOps/azd configuration into the GitHub + environment used by the workflow. Reuse local values for + `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`, + `AZURE_OPENAI_DEPLOYMENT`, and optional + `APPLICATIONINSIGHTS_CONNECTION_STRING` instead of asking the user to type + them again. Explain `AZURE_OPENAI_DEPLOYMENT` only if it is missing: it is + the Azure OpenAI deployment used as the evaluator/judge model, not the + user's agent. +7. Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or + model deployments to guess missing values. If `AZURE_SUBSCRIPTION_ID`, + `AZURE_TENANT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, or + `AZURE_OPENAI_DEPLOYMENT` is absent from AgentOps/azd/local env, ask the user + to choose or provide it. Only run a scoped Azure query after the user confirms + the subscription and the exact missing value. +8. 
For GitHub OIDC, derive the federated credential subject from the generated + workflow. If the job has `environment: dev`, the subject is normally + `repo:/:environment:dev`. Do not assume branch or + `pull_request` subjects without reading the workflow. +9. Ask before creating or updating GitHub repos, GitHub environments, + variables/secrets, Entra app registrations/service principals, federated + credentials, managed identities, or Azure RBAC assignments. +10. When creating federated credentials from PowerShell, avoid fragile + interpolation. Do **not** write `"repo:$repo:environment:$envName"` because + `$repo:` can be parsed as a scoped variable. Use + `"repo:${repo}:environment:${envName}"` or + `("repo:{0}:environment:{1}" -f $repo, $envName)`, then build JSON from a + PowerShell object with `ConvertTo-Json`. +11. After creating or updating a federated credential, read it back and verify + before triggering a workflow: + - `subject` exactly matches the generated workflow subject. + - `issuer` is `https://token.actions.githubusercontent.com`. + - `audiences` includes `api://AzureADTokenExchange`. + If any value differs, fix the credential before running GitHub Actions. +12. Do not dispatch `gh workflow run` as a surprise validation step. First show + that the GitHub environment, variables/secrets, federated credential, and + Azure RBAC are ready, then ask the user before triggering workflows. +13. Avoid broad discovery unless local config is missing. Do **not** run broad + `az resource list`, `az graph query`, SDK inspection, or web search to find + the Foundry project when `agentops.yaml` or `.azure//.env` already has + `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. If the endpoint is missing, say exactly + what is missing and ask the user before scanning the subscription. 
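The read-back check in steps 8 and 11 can be sketched in a few lines. This is a minimal illustration, not part of AgentOps: the helper names `expected_subject` and `credential_mismatches` and the example `contoso/agent-demo` repo are hypothetical, and the credential is assumed to be a parsed JSON object with `subject`, `issuer`, and `audiences` keys.

```python
from typing import Any, Dict, List

# Values named in step 11 of the fast path.
GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"
TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


def expected_subject(owner_repo: str, environment: str) -> str:
    """Build the subject for a job that declares `environment: <name>`."""
    return f"repo:{owner_repo}:environment:{environment}"


def credential_mismatches(
    credential: Dict[str, Any], owner_repo: str, environment: str
) -> List[str]:
    """Return readable problems; an empty list means the credential matches."""
    problems: List[str] = []
    want = expected_subject(owner_repo, environment)
    subject = credential.get("subject")
    if subject != want:
        problems.append(f"subject is {subject!r}, expected {want!r}")
    issuer = credential.get("issuer")
    if issuer != GITHUB_OIDC_ISSUER:
        problems.append(f"issuer is {issuer!r}, expected {GITHUB_OIDC_ISSUER!r}")
    audiences = credential.get("audiences") or []
    if TOKEN_EXCHANGE_AUDIENCE not in audiences:
        problems.append(f"audiences {audiences!r} lacks {TOKEN_EXCHANGE_AUDIENCE!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical credential, shaped like the read-back JSON assumed above.
    sample = {
        "subject": "repo:contoso/agent-demo:environment:dev",
        "issuer": GITHUB_OIDC_ISSUER,
        "audiences": [TOKEN_EXCHANGE_AUDIENCE],
    }
    for problem in credential_mismatches(sample, "contoso/agent-demo", "dev"):
        print(problem)
```

Returning a list of problems rather than a boolean lets the assistant report every mismatch at once before asking the user whether to fix the credential.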
+ ## Branch model assumed ``` @@ -126,21 +200,18 @@ Useful flags: ### GitHub Actions -Walk the user through Settings → Environments and create three: +Read the generated workflow files and create only the GitHub Environments used +by `jobs.*.environment`. For `pr,watchdog`, that is usually only **`dev`**. For +the full scaffold, create **`dev`**, **`qa`**, and **`production`**. -1. **`dev`** - no extra protection. Set any DEV-specific variables here - (e.g. `ACA_APP_NAME`, `AZURE_RESOURCE_GROUP` pointing at the dev RG). -2. **`qa`** - usually no required reviewers, but isolated variables for - the QA environment. -3. **`production`** - set: - - **Required reviewers**: at least one (deploys to PROD will pause - here until approved). - - (Optional) **Wait timer** for an extra delay. - - (Optional) **Deployment branches**: restrict to `main`. - - PROD-specific variables (e.g. production resource group). +- **`dev`** - no extra protection. Store the OIDC variables here when the + generated jobs use `environment: dev`. +- **`qa`** - usually no required reviewers, but isolated variables for QA. +- **`production`** - set required reviewers, optional wait timer, optional + deployment branch restriction to `main`, and production-specific variables. -Tell the user that env-specific variables on the `production` environment -will override repo-level ones automatically inside the prod workflow. +Tell the user that environment-level variables override repository-level ones +inside jobs that declare that environment. ### Azure DevOps @@ -169,14 +240,17 @@ so the PR-comment step can post. ### GitHub Actions (OIDC) -At repository level (Settings → Secrets and variables → Actions → -**Variables** tab), set: +At the GitHub Environment level when the workflow declares an environment +(preferred for the quickstart), or at repository level when intentionally shared +across environments, set: - `AZURE_CLIENT_ID` - App registration / managed identity used for OIDC. 
- `AZURE_TENANT_ID` - `AZURE_SUBSCRIPTION_ID` - `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` - Foundry project URL used by the eval step. +- `AZURE_OPENAI_DEPLOYMENT` - existing Azure OpenAI deployment used as the + evaluator/judge model. Reuse the local AgentOps/azd value when available. - `APPLICATIONINSIGHTS_CONNECTION_STRING` - optional fallback as a variable or secret. Generated workflows first try to auto-discover App Insights from the Foundry project endpoint; this value makes eval and diff --git a/pyproject.toml b/pyproject.toml index 3605f8c..1765d1d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -80,3 +80,4 @@ local_scheme = "no-local-version" [tool.mypy] plugins = ["pydantic.mypy"] +disable_error_code = ["import-untyped"] diff --git a/src/agentops/agent/checks/errors.py b/src/agentops/agent/checks/errors.py index 6eeed8c..dba8c0d 100644 --- a/src/agentops/agent/checks/errors.py +++ b/src/agentops/agent/checks/errors.py @@ -266,10 +266,10 @@ def _check_no_runtime_telemetry( summary=summary, recommendation=( "Configure `sources.azure_monitor.app_insights_resource_id` " - "in `.agentops/agent.yaml`, install the `[agent]` extra, " - "and connect Azure Monitor OpenTelemetry on the agent " - "runtime (set `APPLICATIONINSIGHTS_CONNECTION_STRING` " - "and call `configure_azure_monitor()` on startup). " + "or set `APPLICATIONINSIGHTS_CONNECTION_STRING` with an " + "`ApplicationId`, install the `[agent]` extra, and connect " + "Azure Monitor OpenTelemetry on the agent runtime " + "(call `configure_azure_monitor()` on startup). " "See `docs/tutorial-end-to-end.md` -> " "'Wire observability'." 
), diff --git a/src/agentops/agent/cockpit.py b/src/agentops/agent/cockpit.py index 1b260c5..ee08eb8 100644 --- a/src/agentops/agent/cockpit.py +++ b/src/agentops/agent/cockpit.py @@ -22,7 +22,7 @@ import subprocess from importlib.resources import files as _pkg_files from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple +from typing import Any, Dict, List, Optional, Tuple, cast from urllib.parse import quote from agentops.agent.history import AnalysisRecord, load_analysis_history @@ -2494,8 +2494,18 @@ def _release_evidence_status(workspace: Path) -> Dict[str, Any]: if not payload: return {"status": "unreadable", "path": path} - latest_eval = payload.get("latest_eval") if isinstance(payload.get("latest_eval"), dict) else {} - official_eval = payload.get("official_eval") if isinstance(payload.get("official_eval"), dict) else {} + latest_eval_raw = payload.get("latest_eval") + latest_eval = ( + cast(Dict[str, Any], latest_eval_raw) + if isinstance(latest_eval_raw, dict) + else {} + ) + official_eval_raw = payload.get("official_eval") + official_eval = ( + cast(Dict[str, Any], official_eval_raw) + if isinstance(official_eval_raw, dict) + else {} + ) return { "status": payload.get("status") or "unknown", "path": path, @@ -2878,9 +2888,9 @@ def _sparkline_svg( return "" window = series[-12:] label_window = (labels or [])[-12:] - link_window = (links or [])[-12:] - alt_link_window = (alt_links or [])[-12:] - alt_label_window = (alt_labels or [])[-12:] + link_window: List[Optional[str]] = list((links or [])[-12:]) + alt_link_window: List[Optional[str]] = list((alt_links or [])[-12:]) + alt_label_window: List[Optional[str]] = list((alt_labels or [])[-12:]) # Align label/link count with the window. 
if len(label_window) < len(window): label_window = label_window + [""] * (len(window) - len(label_window)) diff --git a/src/agentops/agent/sources/azure_monitor.py b/src/agentops/agent/sources/azure_monitor.py index ca360f0..f3eb760 100644 --- a/src/agentops/agent/sources/azure_monitor.py +++ b/src/agentops/agent/sources/azure_monitor.py @@ -7,9 +7,13 @@ from __future__ import annotations +import json import logging +import os +import re from dataclasses import dataclass, field from typing import Any, Dict, List, Optional +from urllib import error, request from agentops.agent.config import AzureMonitorSourceConfig @@ -107,11 +111,24 @@ def collect_azure_monitor( return AzureMonitorPayload(diagnostics=diagnostics) if not config.app_insights_resource_id and not config.log_analytics_workspace_id: + application_id, source, reason = _resolve_application_id() + if application_id: + diagnostics["target"] = application_id + diagnostics["target_kind"] = "application_id" + diagnostics["target_source"] = source + return _collect_application_insights_by_app_id( + application_id, + lookback_days, + diagnostics, + ) diagnostics["status"] = "skipped" diagnostics["reason"] = ( "neither app_insights_resource_id nor log_analytics_workspace_id " - "is configured" + "is configured, and no App Insights ApplicationId could be " + "discovered from the connection string or Foundry project" ) + if reason: + diagnostics["discovery_reason"] = reason return AzureMonitorPayload(diagnostics=diagnostics) try: @@ -132,7 +149,10 @@ def collect_azure_monitor( diagnostics["target"] = workspace_or_resource try: - credential = DefaultAzureCredential(exclude_developer_cli_credential=True, process_timeout=30) + credential = DefaultAzureCredential( + exclude_developer_cli_credential=True, + process_timeout=30, + ) client = LogsQueryClient(credential) kql = _REQUESTS_KQL.format(lookback_days=int(lookback_days)) if config.log_analytics_workspace_id: @@ -302,3 +322,202 @@ def collect_azure_monitor( 
log.info("Rate-limit KQL probe failed (non-fatal): %s", exc) return payload + + +def _resolve_application_id() -> tuple[Optional[str], Optional[str], Optional[str]]: + """Resolve an App Insights ApplicationId for REST API queries.""" + last_reason: Optional[str] = None + for env_name in ( + "APPLICATIONINSIGHTS_CONNECTION_STRING", + "AGENTOPS_APPLICATIONINSIGHTS_CONNECTION_STRING", + ): + connection_string = os.getenv(env_name) + if not connection_string: + continue + application_id = _extract_application_id(connection_string) + if application_id: + return application_id, env_name, None + last_reason = f"{env_name} has no ApplicationId segment" + + try: + from agentops.utils.foundry_discovery import ( + resolve_appinsights_connection_from_env_with_reason, + ) + + connection_string, reason = resolve_appinsights_connection_from_env_with_reason() + except Exception as exc: # noqa: BLE001 + return None, None, f"Foundry App Insights discovery failed: {exc}" + + if connection_string: + application_id = _extract_application_id(connection_string) + if application_id: + return application_id, "foundry_project_telemetry", None + return ( + None, + None, + "Foundry App Insights connection string has no ApplicationId segment", + ) + return None, None, reason or last_reason + + +def _extract_application_id(connection_string: Optional[str]) -> Optional[str]: + if not connection_string: + return None + match = re.search(r"(?:^|;)ApplicationId=([^;]+)", connection_string) + return match.group(1).strip() if match else None + + +def _collect_application_insights_by_app_id( + application_id: str, + lookback_days: int, + diagnostics: Dict[str, Any], +) -> AzureMonitorPayload: + """Query App Insights by ApplicationId when no ARM resource id is configured.""" + try: + bearer = _acquire_application_insights_token() + except ImportError as exc: + diagnostics["status"] = "skipped" + diagnostics["reason"] = "azure-identity not installed (install agentops-toolkit[agent])" + 
log.info("azure-identity unavailable: %s", exc) + return AzureMonitorPayload(diagnostics=diagnostics) + except Exception as exc: # pragma: no cover - network / auth errors + diagnostics["status"] = "error" + diagnostics["reason"] = str(exc) + log.warning("App Insights token acquisition failed: %s", exc) + return AzureMonitorPayload(diagnostics=diagnostics) + + payload = AzureMonitorPayload(diagnostics=diagnostics) + kql = _REQUESTS_KQL.format(lookback_days=int(lookback_days)) + summary = _query_application_insights(application_id, bearer, kql) + if summary is None: + diagnostics["status"] = "error" + diagnostics["reason"] = "Application Insights query failed" + return payload + + diagnostics["status"] = "ok" + row = _first_rest_row(summary) + if row: + _apply_summary_row(payload, row) + + try: + safety_kql = _SAFETY_KQL.format(lookback_days=int(lookback_days)) + safety = _query_application_insights(application_id, bearer, safety_kql) + if safety is None: + diagnostics["safety_status"] = "error" + diagnostics["safety_reason"] = "query failed" + else: + hits = int((_first_rest_row(safety) or {}).get("hits", 0) or 0) + diagnostics["safety_status"] = "ok" + diagnostics["safety_hits"] = hits + if hits > 0: + payload.safety_violations.append( + {"signal": "content_filter", "hits": hits} + ) + except Exception as exc: # pragma: no cover - best effort + diagnostics["safety_status"] = "error" + diagnostics["safety_reason"] = str(exc) + log.info("Safety App Insights probe failed (non-fatal): %s", exc) + + try: + token_kql = _TOKEN_USAGE_KQL.format(lookback_days=int(lookback_days)) + token = _query_application_insights(application_id, bearer, token_kql) + if token is None: + diagnostics["token_status"] = "error" + diagnostics["token_reason"] = "query failed" + else: + token_row = _first_rest_row(token) or {} + in_t = token_row.get("input_tokens") + out_t = token_row.get("output_tokens") + payload.input_token_count = int(in_t) if in_t is not None else 0 + 
payload.output_token_count = int(out_t) if out_t is not None else 0 + diagnostics["token_status"] = "ok" + except Exception as exc: # pragma: no cover - best effort + diagnostics["token_status"] = "error" + diagnostics["token_reason"] = str(exc) + log.info("Token-usage App Insights probe failed (non-fatal): %s", exc) + + try: + rl_kql = _RATE_LIMIT_KQL.format(lookback_days=int(lookback_days)) + rate_limit = _query_application_insights(application_id, bearer, rl_kql) + if rate_limit is None: + diagnostics["rate_limit_status"] = "error" + diagnostics["rate_limit_reason"] = "query failed" + else: + hits = int((_first_rest_row(rate_limit) or {}).get("hits", 0) or 0) + payload.rate_limit_429_count = hits + diagnostics["rate_limit_status"] = "ok" + diagnostics["rate_limit_hits"] = hits + except Exception as exc: # pragma: no cover - best effort + diagnostics["rate_limit_status"] = "error" + diagnostics["rate_limit_reason"] = str(exc) + log.info("Rate-limit App Insights probe failed (non-fatal): %s", exc) + + return payload + + +def _acquire_application_insights_token() -> str: + from azure.identity import DefaultAzureCredential + + credential = DefaultAzureCredential( + exclude_developer_cli_credential=True, + process_timeout=30, + ) + token = credential.get_token("https://api.applicationinsights.io/.default") + return token.token + + +def _query_application_insights( + application_id: str, + bearer: str, + kql: str, +) -> Optional[Dict[str, Any]]: + body = json.dumps({"query": kql}).encode("utf-8") + req = request.Request( + url=f"https://api.applicationinsights.io/v1/apps/{application_id}/query", + data=body, + headers={ + "Authorization": f"Bearer {bearer}", + "Content-Type": "application/json", + }, + method="POST", + ) + try: + with request.urlopen(req, timeout=10) as resp: # noqa: S310 + parsed = json.loads(resp.read()) + except (error.URLError, ValueError, KeyError) as exc: + log.debug("App Insights REST query failed: %s", exc) + return None + + if 
isinstance(parsed, dict) and parsed.get("error"): + err = parsed["error"] + msg = err.get("message") if isinstance(err, dict) else str(err) + log.debug("App Insights REST query reported error: %s", msg) + return None + return parsed + + +def _first_rest_row(result: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]: + if not result: + return None + tables = result.get("tables") or [] + if not tables: + return None + table = tables[0] + columns = [col.get("name") for col in (table.get("columns") or [])] + rows = table.get("rows") or [] + if not rows: + return None + return dict(zip(columns, rows[0])) + + +def _apply_summary_row(payload: AzureMonitorPayload, data: Dict[str, Any]) -> None: + payload.request_count = int(data.get("request_count", 0) or 0) + payload.error_count = int(data.get("error_count", 0) or 0) + avg_ms = data.get("avg_duration_ms") + p95_ms = data.get("p95_duration_ms") + if avg_ms is not None: + payload.avg_duration_seconds = float(avg_ms) / 1000.0 + if p95_ms is not None: + payload.p95_duration_seconds = float(p95_ms) / 1000.0 + if payload.request_count > 0: + payload.error_rate = payload.error_count / payload.request_count diff --git a/src/agentops/agent/sources/results_history.py b/src/agentops/agent/sources/results_history.py index 759d348..0dd53ba 100644 --- a/src/agentops/agent/sources/results_history.py +++ b/src/agentops/agent/sources/results_history.py @@ -13,7 +13,7 @@ from dataclasses import dataclass, field from datetime import datetime, timezone from pathlib import Path -from typing import Any, Dict, List, Optional +from typing import Any, Dict, Iterable, List, Optional from agentops.agent.config import FoundryControlSourceConfig, ResultsHistorySourceConfig @@ -435,7 +435,7 @@ def _extract_item_scores(item: Dict[str, Any]) -> Dict[str, float]: scores: Dict[str, float] = {} results = item.get("results") if isinstance(results, dict): - iterator = results.items() + iterator: Iterable[tuple[Any, Any]] = results.items() elif 
isinstance(results, list): iterator = ((_get_dict(entry, "name") or _get_dict(entry, "key") or "", entry) for entry in results) else: diff --git a/src/agentops/cli/app.py b/src/agentops/cli/app.py index c0eea11..4dfc270 100644 --- a/src/agentops/cli/app.py +++ b/src/agentops/cli/app.py @@ -130,6 +130,25 @@ def _cli_value(text: str) -> str: return text +def _workflow_eval_runner_label(eval_runner: str) -> str: + if eval_runner == "official-ai-agent-evaluation": + return "Microsoft Foundry AI Agent Evaluation" + if eval_runner == "agentops-local": + return "AgentOps local eval" + return eval_runner + + +def _workflow_environment_names(kinds: list[str]) -> list[str]: + environments: list[str] = [] + if any(kind in kinds for kind in ("pr", "watchdog", "dev")): + environments.append("dev") + if "qa" in kinds: + environments.append("qa") + if "prod" in kinds: + environments.append("production") + return environments + + def _colorize_analysis_text(text: str) -> str: """Apply restrained terminal color to text analysis output only.""" lines: list[str] = [] @@ -139,24 +158,54 @@ def _colorize_analysis_text(text: str) -> str: if not stripped: lines.append(line) continue + if stripped in {"Warnings", "Warnings:"}: + section = "Warnings" + lines.append(_cli_warn(line)) + continue if stripped in { "AgentOps eval analysis", "AgentOps trace-to-dataset preview", "AgentOps workflow analysis", + "Workflow decision checklist:", + "Recommendation", + "Readiness", "Detected signals:", + "Signals", + "Foundry eval checks:", + "Foundry eval", "Recommended skills:", + "Recommended skills", "Copilot handoff:", + "Copilot handoff", "Recommended commands:", + "Commands", "Pipeline stages:", + "Pipeline plan", "Next steps:", + "Next", "Sample rows:", + "Sample rows", + "Summary", }: section = stripped.rstrip(":") lines.append(_cli_heading(line)) continue - if stripped == "Warnings:": - section = "Warnings" - lines.append(_cli_warn(line)) + if section == "Commands" and 
stripped.startswith("agentops "): + prefix = line[: len(line) - len(line.lstrip())] + lines.append(f"{prefix}{_cli_command(stripped)}") + continue + status = stripped.split(maxsplit=1)[0].lower() if stripped else "" + if status in {"ok", "hint", "todo", "warn"}: + prefix = line[: len(line) - len(line.lstrip())] + body = line[len(prefix) :] + rest = body[len(status) :] + if status == "ok": + rendered_status = _cli_ok(body[: len(status)]) + elif status == "warn": + rendered_status = _cli_warn(body[: len(status)]) + else: + rendered_status = style(body[: len(status)], "bold", "yellow") + lines.append(f"{prefix}{rendered_status}{rest}") continue if stripped.startswith("- "): bullet = style("-", "dim") @@ -283,7 +332,7 @@ class ExplainPage: "scaffolds `agentops.yaml` plus the `.agentops/` starter files, " "ensures a `.azure/` directory shaped the way `azd` expects, and " "runs an azd-style question loop that fills in project endpoint, " - "agent, dataset, and Application Insights connection string.", + "agent, and dataset.", "Every answer is persisted as soon as it is validated, so a " "Ctrl+C mid-wizard never loses values that were already entered. " "Re-running `agentops init` is idempotent: questions whose values " @@ -304,9 +353,12 @@ class ExplainPage: "`.azure//.env`, and the process environment. Each question " "shows the current value as its default; pressing Enter keeps it.", "Persists `agent` and `dataset` to `agentops.yaml` (declarative, " - "version-controlled). Persists the Foundry project endpoint and " - "Application Insights connection string to `.azure//.env` " - "(git-ignored). Canonical Azure variable names " + "version-controlled). Persists the Foundry project endpoint to " + "`.azure//.env` (git-ignored). 
App Insights is not asked in " + "the wizard; runtime commands try to discover the Foundry project's " + "attached resource through the Azure AI Projects SDK, and " + "`--appinsights-connection-string` remains available when you need " + "to force a value explicitly. Canonical Azure variable names " "(`AZURE_*`, `APPLICATIONINSIGHTS_*`) are preserved so Azure SDKs " "and azd templates read them directly. Only AgentOps-specific " "knobs use the `AGENTOPS_` prefix.", @@ -810,20 +862,20 @@ def _extend_text_section( lines.extend(_manual_item_lines(prefix, item)) -_ASCII_TRANSLITERATION = { - "\u2014": "-", # em dash - "\u2013": "-", # en dash - "\u2212": "-", # minus sign - "\u2018": "'", # left single quote - "\u2019": "'", # right single quote - "\u201c": '"', # left double quote - "\u201d": '"', # right double quote - "\u2026": "...", # horizontal ellipsis - "\u00a0": " ", # non-breaking space - "\u2192": "->", # rightwards arrow - "\u2197": "^", # north-east arrow - "\u2022": "*", # bullet - "\u00b7": "*", # middle dot +_ASCII_TRANSLITERATION: dict[int, str] = { + ord("\u2014"): "-", # em dash + ord("\u2013"): "-", # en dash + ord("\u2212"): "-", # minus sign + ord("\u2018"): "'", # left single quote + ord("\u2019"): "'", # right single quote + ord("\u201c"): '"', # left double quote + ord("\u201d"): '"', # right double quote + ord("\u2026"): "...", # horizontal ellipsis + ord("\u00a0"): " ", # non-breaking space + ord("\u2192"): "->", # rightwards arrow + ord("\u2197"): "^", # north-east arrow + ord("\u2022"): "*", # bullet + ord("\u00b7"): "*", # middle dot } @@ -837,7 +889,7 @@ def _downgrade_to_ascii(text: str) -> str: """ if not text: return text - return text.translate(str.maketrans(_ASCII_TRANSLITERATION)) + return text.translate(_ASCII_TRANSLITERATION) def _emit_manual_output( @@ -1085,7 +1137,10 @@ def cmd_init( bool, typer.Option( "--no-appinsights", - help="Skip the Application Insights question.", + help=( + "Deprecated no-op; App Insights is no longer 
asked in the " + "interactive wizard." + ), ), ] = False, project_endpoint: Annotated[ @@ -1136,10 +1191,12 @@ def cmd_init( evaluate, observe, and analyze a Foundry agent. ``agent`` and ``dataset`` land in ``agentops.yaml`` (version- - controlled). The Foundry project endpoint and the App Insights - connection string land in the active ``.azure//.env`` file - (git-ignored) — the same file ``azd`` already manages — so a single - source of truth feeds Doctor, the Cockpit, and ``agentops eval run``. + controlled). The Foundry project endpoint lands in the active + ``.azure//.env`` file (git-ignored) — the same file ``azd`` + already manages — so a single source of truth feeds Doctor, the + Cockpit, and ``agentops eval run``. App Insights can be supplied + explicitly with ``--appinsights-connection-string`` if runtime discovery + is not enough. The wizard persists each answer immediately as it is validated, so a Ctrl+C mid-wizard never discards what the user already entered. @@ -1150,7 +1207,6 @@ def cmd_init( from agentops.services.initializer import initialize_flat_workspace from agentops.services.setup_wizard import ( AGENT_TITLE, - APPINSIGHTS_TITLE, DATASET_TITLE, PROJECT_ENDPOINT_TITLE, WizardAnswers, @@ -1180,10 +1236,11 @@ def cmd_init( raise typer.Exit(code=1) from exc log.debug( - "cmd_init called force=%s dir=%s no_prompt=%s any_flag=%s", + "cmd_init called force=%s dir=%s no_prompt=%s no_appinsights=%s any_flag=%s", force, workspace, no_prompt, + no_appinsights, any( v is not None for v in (project_endpoint, agent, dataset, appinsights_connection_string) @@ -1224,6 +1281,12 @@ def cmd_init( for skipped in result.skipped_files: typer.echo(_cli_skipped(skipped)) + config_path = workspace / "agentops.yaml" + config_seeded_this_run = ( + config_path in result.created_files + or config_path in result.overwritten_files + ) + # ----- Phase 2: ensure a .azure// baseline exists ---------------- target_env_name = azd_env_name or "dev" if azd_env_name: @@ -1328,13 
+1391,15 @@ def cmd_init( # Only show the prompt hint when at least one question will actually # be asked. When everything is already configured (idempotent re-run), # the wizard emits compact confirmation lines instead. - will_prompt = reconfigure or any( + force_prompt_fields = {"agent", "dataset"} if config_seeded_this_run else set() + prompt_values = [ + defaults.project_endpoint, + defaults.agent, + defaults.dataset, + ] + will_prompt = reconfigure or bool(force_prompt_fields) or any( v is None or not str(v).strip() - for v in ( - defaults.project_endpoint, - defaults.agent, - defaults.dataset, - ) + for v in prompt_values ) if will_prompt: typer.echo(style("Press Enter to accept the value in brackets.", "dim")) @@ -1349,7 +1414,6 @@ def _prompt(question: str, default: Optional[str]) -> str: PROJECT_ENDPOINT_TITLE, AGENT_TITLE, DATASET_TITLE, - APPINSIGHTS_TITLE, } def _on_answer(field_name: str, value: str) -> None: @@ -1382,9 +1446,9 @@ def _wizard_echo(msg: str) -> None: workspace, prompt=_prompt, echo=_wizard_echo, - include_appinsights=not no_appinsights, on_answer=_on_answer, reconfigure=reconfigure, + force_prompt_fields=force_prompt_fields, ) # ----- Phase 4: apply (idempotent — covers scripted mode and any @@ -2169,13 +2233,17 @@ def cmd_workflow_generate( raise typer.Exit(code=1) from exc typer.echo(f"{_cli_label('Platform')}: {result.platform}") - deploy_mode_note = ( - f"{result.deploy_mode} (auto default)" - if deploy_mode == "auto" - else result.deploy_mode - ) + deploy_kinds = [kind for kind in result.kinds if kind in {"dev", "qa", "prod"}] + deploy_mode_note = result.deploy_mode + if deploy_mode == "auto": + deploy_mode_note = f"{deploy_mode_note} (auto default)" + if not deploy_kinds: + deploy_mode_note = f"{deploy_mode_note}; used only by deploy workflows" typer.echo(f"{_cli_label('Deploy mode')}: {_cli_value(deploy_mode_note)}") - typer.echo(f"{_cli_label('Eval runner')}: {_cli_value(result.eval_runner)}") + typer.echo( + f"{_cli_label('Eval 
runner')}: " + f"{_cli_value(_workflow_eval_runner_label(result.eval_runner))}" + ) for created in result.created_files: typer.echo(_cli_created(created)) for overwritten in result.overwritten_files: @@ -2185,59 +2253,90 @@ def cmd_workflow_generate( if result.created_files or result.overwritten_files: typer.echo("") - typer.echo(_cli_heading("Next steps:")) + typer.echo(_cli_heading("Next")) if result.platform == "github": + environments = _workflow_environment_names(result.kinds) + typer.echo(" repo publish this folder before CI can run") + typer.echo(" If this is not a GitHub repo yet:") + typer.echo(f" {_cli_command('git init')}") + typer.echo(f" {_cli_command('git add .')}") typer.echo( - " 1. Configure Azure Workload Identity Federation (OIDC) and set " - "repository variables AZURE_CLIENT_ID, AZURE_TENANT_ID, " - "AZURE_SUBSCRIPTION_ID, AZURE_AI_FOUNDRY_PROJECT_ENDPOINT, " - "AZURE_OPENAI_DEPLOYMENT." + " " + + _cli_command('git commit -m "Add AgentOps workflows"') ) typer.echo( - " 2. Create three GitHub Environments: 'dev', 'qa', 'production'. " - "Add required reviewers to 'production'." + f" {_cli_command('gh repo create --source . --private --push')}" ) - else: + typer.echo(" Copilot smoother path: use the AgentOps workflow skill") typer.echo( - " 1. Create the Azure DevOps service connection " - "'agentops-azure' and a variable group named 'agentops'." + f" {_cli_command('agentops skills install --platform copilot')}" ) + typer.echo(" In Copilot, run /skills and confirm agentops-workflow loaded.") typer.echo( - " 2. Create Azure DevOps Environments: 'dev', 'qa', " - "'production'. Add approval checks to 'production'." + " Ask it to wire GitHub, Azure OIDC, variables, " + "environments, and branch rules." ) - if result.deploy_mode == "azd": typer.echo( - " 3. Confirm azure.yaml, infra/, and azd hooks are committed. " - "The deploy workflows delegate provision/deploy to azd." 
+ " CI vars AZURE_CLIENT_ID, AZURE_TENANT_ID, " + "AZURE_SUBSCRIPTION_ID" ) typer.echo( - " Set AZURE_ENV_NAME and AZURE_LOCATION per environment if " - "your azd env names differ from dev/qa/production." + " AZURE_AI_FOUNDRY_PROJECT_ENDPOINT, " + "AZURE_OPENAI_DEPLOYMENT" ) + if environments: + typer.echo( + " envs create GitHub environment" + f"{'' if len(environments) == 1 else 's'}: " + f"{', '.join(environments)}" + ) + if "production" in environments: + typer.echo( + " add required reviewers to production before " + "enabling prod deploys" + ) + else: + environments = _workflow_environment_names(result.kinds) + typer.echo(" repo publish this folder before pipelines can run") + typer.echo(" service create service connection: agentops-azure") + typer.echo(" vars create variable group: agentops") + if environments: + typer.echo( + " envs create Azure DevOps environment" + f"{'' if len(environments) == 1 else 's'}: " + f"{', '.join(environments)}" + ) + if "production" in environments: + typer.echo(" add approval checks to production") + if result.deploy_mode == "azd": + if deploy_kinds: + typer.echo(" azd commit azure.yaml, infra/, and azd hooks") + typer.echo( + " set AZURE_ENV_NAME/AZURE_LOCATION if env names differ" + ) elif result.deploy_mode == "prompt-agent": - typer.echo( - " 3. Commit a prompt/instructions file and set `prompt_file` " - "in agentops.yaml (or AGENTOPS_AGENT_PROMPT_FILE in CI)." - ) - typer.echo( - " The deploy workflow stages a Foundry prompt-agent " - "candidate, evaluates that exact version, then records it " - "as deployed when the gate passes." 
- ) + if deploy_kinds: + typer.echo(" prompt commit a prompt/instructions file") + typer.echo( + " set prompt_file or AGENTOPS_AGENT_PROMPT_FILE in CI" + ) + typer.echo( + " deploy evaluates that exact candidate version first" + ) + else: + typer.echo(" deploy not needed yet; PR/watchdog can run first") + typer.echo( + " add deploy workflows when you are ready to deploy" + ) else: - typer.echo( - " 3. No azure.yaml was detected. Ask your coding agent to " - "generate a zero-trust azd deployment, or use " - "`--deploy-mode prompt-agent` for a Foundry prompt agent." - ) - typer.echo( - " 4. In Settings -> Branches, require the 'AgentOps PR' status check " - "on develop and main." - ) - typer.echo( - " 5. Commit and push. See docs/ci-github-actions.md for the full guide." - ) + if deploy_kinds: + typer.echo(" deploy placeholder workflows need project-specific edits") + typer.echo( + " ask your coding agent to wire azd or prompt-agent deploy" + ) + if "pr" in result.kinds: + typer.echo(" gate after the first run, require the AgentOps PR check") + typer.echo(" guide docs/ci-github-actions.md") elif result.skipped_files: typer.echo(_cli_warn("No files written. 
Use --force to overwrite existing workflows.")) @@ -3785,15 +3884,15 @@ def _read_single_key() -> str: try: import msvcrt # type: ignore[import-not-found] - ch = msvcrt.getch() - if ch in (b"\xe0", b"\x00"): + ch_win = msvcrt.getch() # type: ignore[attr-defined] + if ch_win in (b"\xe0", b"\x00"): try: - msvcrt.getch() + msvcrt.getch() # type: ignore[attr-defined] except Exception: # noqa: BLE001 pass return "" try: - return ch.decode("utf-8", errors="ignore") + return ch_win.decode("utf-8", errors="ignore") except Exception: # noqa: BLE001 return "" except Exception: # noqa: BLE001 @@ -3803,13 +3902,13 @@ def _read_single_key() -> str: import tty # type: ignore[import-not-found] fd = sys.stdin.fileno() - old = termios.tcgetattr(fd) + old = termios.tcgetattr(fd) # type: ignore[attr-defined] try: - tty.setraw(fd) - ch = sys.stdin.read(1) + tty.setraw(fd) # type: ignore[attr-defined] + ch_posix = sys.stdin.read(1) finally: - termios.tcsetattr(fd, termios.TCSADRAIN, old) - return ch + termios.tcsetattr(fd, termios.TCSADRAIN, old) # type: ignore[attr-defined] + return ch_posix except Exception: # noqa: BLE001 return "" @@ -4485,13 +4584,14 @@ def _serve() -> None: _time.sleep(0.05) if bind_error: - exc = bind_error[0] + bind_exc = bind_error[0] + bind_errno = getattr(bind_exc, "errno", None) # WinError 10048 / EADDRINUSE / EACCES on the bind syscall. 
is_port_collision = ( - isinstance(exc, OSError) + isinstance(bind_exc, OSError) and ( - getattr(exc, "winerror", None) == 10048 - or getattr(exc, "errno", None) in (48, 98, 13) + getattr(bind_exc, "winerror", None) == 10048 + or (isinstance(bind_errno, int) and bind_errno in (48, 98, 13)) ) ) if is_port_collision: @@ -4512,7 +4612,7 @@ def _serve() -> None: err=True, ) raise typer.Exit(code=1) - typer.echo(f"{_cli_error('Failed to start cockpit')}: {exc}", err=True) + typer.echo(f"{_cli_error('Failed to start cockpit')}: {bind_exc}", err=True) raise typer.Exit(code=1) try: diff --git a/src/agentops/pipeline/official_eval.py b/src/agentops/pipeline/official_eval.py index 85bef10..1b94a22 100644 --- a/src/agentops/pipeline/official_eval.py +++ b/src/agentops/pipeline/official_eval.py @@ -43,24 +43,24 @@ def official_eval_action_ref() -> str: - """Return the GitHub Action ref used for official eval workflows.""" + """Return the GitHub Action ref used for Microsoft Foundry eval workflows.""" return os.getenv(OFFICIAL_EVAL_ACTION_ENV, OFFICIAL_EVAL_ACTION) def official_eval_ado_task_ref() -> str: - """Return the Azure DevOps task ref used for official eval workflows.""" + """Return the Azure DevOps task ref used for Microsoft Foundry eval workflows.""" return os.getenv(OFFICIAL_EVAL_ADO_TASK_ENV, OFFICIAL_EVAL_ADO_TASK) class OfficialEvalUnsupported(ValueError): - """Raised when an AgentOps config cannot use official AI Agent Evaluation.""" + """Raised when an AgentOps config cannot use Microsoft Foundry AI Agent Evaluation.""" @dataclass(frozen=True) class OfficialEvalSupport: - """Eligibility result for the official Microsoft Foundry evaluation runner.""" + """Eligibility result for the Microsoft Foundry evaluation runner.""" eligible: bool runner: str @@ -73,7 +73,7 @@ class OfficialEvalSupport: @dataclass(frozen=True) class OfficialEvalPreparation: - """Prepared official evaluation input and metadata.""" + """Prepared Microsoft Foundry evaluation input and 
metadata.""" data_path: Path metadata_path: Path @@ -95,7 +95,7 @@ class _EvalPlan: def analyze_official_eval_support(config_path: Path) -> OfficialEvalSupport: - """Report whether ``config_path`` can use the official Foundry eval runner.""" + """Report whether ``config_path`` can use the Microsoft Foundry eval runner.""" try: plan = _build_plan(config_path) @@ -113,7 +113,7 @@ def analyze_official_eval_support(config_path: Path) -> OfficialEvalSupport: return OfficialEvalSupport( eligible=False, runner=AGENTOPS_LOCAL_RUNNER, - reasons=(f"agentops.yaml cannot be prepared for official eval: {exc}",), + reasons=(f"agentops.yaml cannot be prepared for Microsoft Foundry eval: {exc}",), warnings=(), official_evaluators=(), agent_ids=None, @@ -124,8 +124,8 @@ def analyze_official_eval_support(config_path: Path) -> OfficialEvalSupport: eligible=True, runner=OFFICIAL_EVAL_RUNNER, reasons=( - "agentops.yaml targets a Foundry prompt agent in name:version format.", - "The dataset can be converted to the official AI Agent Evaluation JSON shape.", + "Agent target is a Foundry prompt agent (`name:version`).", + "Dataset columns are compatible with Microsoft Foundry eval.", ), warnings=plan.warnings, official_evaluators=plan.official_evaluators, @@ -147,7 +147,7 @@ def prepare_official_eval( *, deployment_name: str | None = None, ) -> OfficialEvalPreparation: - """Convert AgentOps JSONL config into the official AI Agent Evaluation JSON.""" + """Convert AgentOps JSONL config into Microsoft Foundry AI Agent Evaluation JSON.""" plan = _build_plan(config_path) deployment = _resolve_deployment_name(deployment_name) @@ -200,7 +200,7 @@ def _build_plan(config_path: Path) -> _EvalPlan: target = classify_agent(config.agent, config.protocol) if target.kind != "foundry_prompt": raise OfficialEvalUnsupported( - "official AI Agent Evaluation only evaluates Foundry prompt agents " + "Microsoft Foundry AI Agent Evaluation only evaluates Foundry prompt agents " "(`agent: name:version`); use 
AgentOps local eval for hosted endpoints, " "HTTP agents, and model targets." ) @@ -212,7 +212,7 @@ def _build_plan(config_path: Path) -> _EvalPlan: official_evaluators, skipped, warnings = _map_evaluators(presets) if not official_evaluators: raise OfficialEvalUnsupported( - "no AgentOps evaluators could be mapped to official Foundry evaluators." + "no AgentOps evaluators could be mapped to Microsoft Foundry evaluators." ) _validate_dataset_for_official_runner(dataset_path, official_evaluators) @@ -238,11 +238,6 @@ def _map_evaluators( for preset in presets: if preset.name == _LATENCY_PRESET: skipped.append(preset.name) - warnings.append( - "avg_latency_seconds remains an AgentOps local gate; the official " - "runner records latency in its summary but does not expose a stable " - "threshold artifact yet." - ) continue official_name = _OFFICIAL_EVALUATORS.get(preset.name) @@ -316,7 +311,7 @@ def _validate_dataset_for_official_runner( if needs_ground_truth and not row.get("ground_truth"): raise OfficialEvalUnsupported( f"{dataset_path}: line {line_number} needs `expected` or " - "`ground_truth` for the selected official evaluators." + "`ground_truth` for the selected Microsoft Foundry evaluators." 
) diff --git a/src/agentops/services/cicd.py b/src/agentops/services/cicd.py index 1e5e28d..4fe2d98 100644 --- a/src/agentops/services/cicd.py +++ b/src/agentops/services/cicd.py @@ -109,6 +109,7 @@ class CicdResult: platform: str = "github" deploy_mode: str = "placeholder" eval_runner: str = AGENTOPS_LOCAL_RUNNER + kinds: List[str] = field(default_factory=list) created_files: List[Path] = field(default_factory=list) overwritten_files: List[Path] = field(default_factory=list) skipped_files: List[Path] = field(default_factory=list) @@ -493,6 +494,7 @@ def generate_cicd_workflows( if kind in seen or kind not in template_map: continue seen.add(kind) + result.kinds.append(kind) substitutions: dict[str, str] = {} eval_config = ( "${{ inputs.config || 'agentops.yaml' }}" diff --git a/src/agentops/services/eval_analysis.py b/src/agentops/services/eval_analysis.py index 132811a..f40da78 100644 --- a/src/agentops/services/eval_analysis.py +++ b/src/agentops/services/eval_analysis.py @@ -4,10 +4,11 @@ import json import os +import textwrap from dataclasses import dataclass, field from fnmatch import fnmatch from pathlib import Path -from typing import Any, Dict, Iterable, List, Optional, Set +from typing import Any, Dict, Iterable, List, Optional, Sequence, Set from agentops.core.agentops_config import classify_agent from agentops.utils.yaml import load_yaml @@ -15,6 +16,7 @@ _TEXT_LIMIT = 200_000 _SCAN_LIMIT = 80 _DATASET_ROW_LIMIT = 20 +_TEXT_WRAP_WIDTH = 92 _TEXT_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".jsx", ".bicep", ".yaml", ".yml"} _WALK_FILE_LIMIT = 2_000 _IGNORE_PARTS = { @@ -557,46 +559,171 @@ def _classification(config_info: _ConfigInfo, scenario_hint: str) -> str: def _render_text(analysis: EvalAnalysis) -> str: lines = [ "AgentOps eval analysis", - f"Directory: {analysis.directory}", - f"Classification: {analysis.classification}", - f"Config status: {analysis.config_status}", - f"Dataset status: {analysis.dataset_status}", - f"Target kind: 
{analysis.target_kind or 'unknown'}", - f"Scenario hint: {analysis.scenario_hint}", - f"Complexity: {analysis.complexity}", - f"Skill-assisted setup: {'yes' if analysis.requires_copilot_adaptation else 'no'}", - f"Copilot skills installed: {'yes' if analysis.copilot_skills_installed else 'no'}", + f"Workspace: {analysis.directory}", + f"Project: {_soften_text(analysis.classification)}", "", - "Detected signals:", + "Readiness", ] + lines.extend(_render_text_readiness(analysis)) + lines.append("") + lines.append("Signals") if analysis.signals: + lines.extend(_render_text_signals(analysis.signals)) + else: lines.extend( - f"- {s.label}: {s.detail}" + (f" ({s.path})" if s.path else "") - for s in analysis.signals + _wrapped_status_line( + "todo", + "Signals", + "No strong evaluation setup signals detected.", + ) ) - else: - lines.append("- No strong evaluation setup signals detected.") if analysis.warnings: lines.append("") - lines.append("Warnings:") - lines.extend(f"- {warning}" for warning in analysis.warnings) + lines.append("Warnings") + for warning in analysis.warnings: + lines.extend(_wrapped_status_line("warn", "warning", warning)) if analysis.recommended_skills: lines.append("") - lines.append("Recommended skills:") - lines.extend(f"- /{skill}" for skill in analysis.recommended_skills) + lines.append("Recommended skills") + for skill in analysis.recommended_skills: + lines.extend(_wrapped_status_line("todo", "skill", f"/{skill}")) if analysis.copilot_prompt: lines.append("") - lines.append("Copilot handoff:") - lines.append(f"- Copy/paste: {analysis.copilot_prompt}") + lines.append("Copilot handoff") + lines.extend(_wrapped_status_line("todo", "copy/paste", analysis.copilot_prompt)) lines.append("") - lines.append("Recommended commands:") - lines.extend(f"- {command}" for command in analysis.recommended_commands) + lines.append("Commands") + lines.extend(f" {command}" for command in analysis.recommended_commands) lines.append("") - lines.append("Next steps:") 
- lines.extend(f"- {step}" for step in analysis.next_steps) + lines.append("Next") + for index, step in enumerate(analysis.next_steps, start=1): + lines.extend(_wrapped_numbered_step(index, step)) return "\n".join(lines) + "\n" +def _render_text_readiness(analysis: EvalAnalysis) -> List[str]: + setup_value = ( + "needs setup help - use recommended skills before making eval blocking" + if analysis.requires_copilot_adaptation + else "ready - current eval setup can run directly" + ) + skills_value = ( + "installed - available for setup handoff" + if analysis.copilot_skills_installed + else ( + "missing - install if you want Copilot-guided setup" + if analysis.requires_copilot_adaptation + else "not needed - no Copilot handoff for eval setup" + ) + ) + return _render_text_fields( + [ + ("config", _friendly_status(analysis.config_status)), + ("dataset", _friendly_status(analysis.dataset_status)), + ("target", _friendly_target(analysis.target_kind)), + ("scenario", _friendly_status(analysis.scenario_hint)), + ("complexity", analysis.complexity), + ("setup help", setup_value), + ("Copilot skills", skills_value), + ] + ) + + +def _render_text_signals(signals: Sequence[EvalSignal]) -> List[str]: + lines: List[str] = [] + for signal in signals: + status = "ok" if signal.confidence == "high" else "hint" + detail = _soften_text(signal.detail + (f" ({signal.path})" if signal.path else "")) + lines.extend(_wrapped_status_line(status, _signal_label(signal.key, signal.label), detail)) + return lines + + +def _render_text_fields(rows: Sequence[tuple[str, str]]) -> List[str]: + width = max(len(label) for label, _ in rows) + lines: List[str] = [] + for label, value in rows: + lines.extend(_wrap_text(value, indent=f" {label.ljust(width)} ")) + return lines + + +def _wrapped_status_line(status: str, label: str, text: str) -> List[str]: + prefix = f" {status.ljust(4)} {label.ljust(20)} " + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + 
subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _wrapped_numbered_step(index: int, text: str) -> List[str]: + prefix = f" {index}. " + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _wrap_text(text: str, *, indent: str) -> List[str]: + return textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=indent, + subsequent_indent=indent, + break_long_words=False, + break_on_hyphens=False, + ) or [indent.rstrip()] + + +def _friendly_target(target_kind: Optional[str]) -> str: + if not target_kind: + return "unknown" + return { + "foundry_prompt": "Foundry prompt agent", + "foundry_hosted": "Foundry hosted agent", + "http_json": "HTTP/JSON agent", + "model_deployment": "model deployment", + "model_direct": "direct model", + }.get(target_kind, _friendly_status(target_kind)) + + +def _friendly_status(value: str) -> str: + return value.replace("_", " ") + + +def _soften_text(text: str) -> str: + return ( + text.replace("foundry_prompt", "Foundry prompt agent") + .replace("model_direct", "direct model") + .replace("model_quality", "model quality") + .replace("agent_workflow", "agent workflow") + .replace("http_json", "HTTP/JSON agent") + ) + + +def _signal_label(key: str, fallback: str) -> str: + return { + "agentops_config": "Config", + "dataset_ref": "Dataset", + "dataset_columns": "Columns", + "scenario_hint": "Scenario", + "azd_project": "azd", + "container_or_http_app": "Host", + "rag_signal": "RAG", + "tool_signal": "Tools", + "foundry_signal": "Foundry", + "model_signal": "Model", + }.get(key, fallback) + + def _render_markdown(analysis: EvalAnalysis) -> str: lines = [ "# AgentOps eval analysis", @@ -719,4 +846,3 @@ def _rel(root: Path, path: Optional[Path]) -> Optional[str]: return 
str(path.relative_to(root)) except ValueError: return str(path) - diff --git a/src/agentops/services/evidence_pack.py b/src/agentops/services/evidence_pack.py index 4c7cddb..38963e4 100644 --- a/src/agentops/services/evidence_pack.py +++ b/src/agentops/services/evidence_pack.py @@ -7,7 +7,7 @@ from dataclasses import dataclass from datetime import datetime, timezone from pathlib import Path -from typing import Any, Optional +from typing import Any, Optional, cast from agentops.agent.analyzer import AnalysisResult from agentops.agent.findings import Severity @@ -210,12 +210,23 @@ def _agentops_eval_status(root: Path) -> dict[str, Any]: if not isinstance(payload, dict): return {"status": "invalid", "path": str(path), "error": "expected JSON object"} - summary = payload.get("summary") if isinstance(payload.get("summary"), dict) else {} - target = payload.get("target") if isinstance(payload.get("target"), dict) else {} - config = payload.get("config") if isinstance(payload.get("config"), dict) else {} - metrics = payload.get("aggregate_metrics") or payload.get("metrics") or payload.get("run_metrics") or {} - thresholds = payload.get("thresholds") if isinstance(payload.get("thresholds"), list) else [] - cloud = config.get("cloud_evaluation") if isinstance(config.get("cloud_evaluation"), dict) else {} + summary_raw = payload.get("summary") + summary = cast(dict[str, Any], summary_raw) if isinstance(summary_raw, dict) else {} + target_raw = payload.get("target") + target = cast(dict[str, Any], target_raw) if isinstance(target_raw, dict) else {} + config_raw = payload.get("config") + config = cast(dict[str, Any], config_raw) if isinstance(config_raw, dict) else {} + raw_metrics = ( + payload.get("aggregate_metrics") + or payload.get("metrics") + or payload.get("run_metrics") + or {} + ) + metrics: dict[str, Any] = raw_metrics if isinstance(raw_metrics, dict) else {} + thresholds_raw = payload.get("thresholds") + thresholds = cast(list[Any], thresholds_raw) if 
isinstance(thresholds_raw, list) else [] + cloud_raw = config.get("cloud_evaluation") + cloud = cast(dict[str, Any], cloud_raw) if isinstance(cloud_raw, dict) else {} comparison = payload.get("comparison") passed = summary.get("overall_passed") diff --git a/src/agentops/services/setup_wizard.py b/src/agentops/services/setup_wizard.py index 6b1dee6..bfde86c 100644 --- a/src/agentops/services/setup_wizard.py +++ b/src/agentops/services/setup_wizard.py @@ -2,19 +2,21 @@ The wizard asks the user one question at a time for the values AgentOps needs to evaluate, observe, and analyze a Foundry agent — the project -endpoint, the agent identifier, the dataset path, and the Application -Insights connection string. +endpoint, the agent identifier, and the dataset path. Storage model (azd-first): * ``agent`` and ``dataset`` are declarative project config and stay in ``agentops.yaml``. They are version-controlled and rarely change between environments. -* ``AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`` and - ``APPLICATIONINSIGHTS_CONNECTION_STRING`` are environment-specific and - land in ``.azure//.env`` — the same file ``azd`` uses, so - Doctor, the Cockpit, and ``agentops eval run`` all see one source of - truth. The file is git-ignored via ``.azure/.gitignore``. +* ``AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`` is environment-specific and lands + in ``.azure//.env`` — the same file ``azd`` uses, so Doctor, + the Cockpit, and ``agentops eval run`` all see one source of truth. The + file is git-ignored via ``.azure/.gitignore``. +* ``APPLICATIONINSIGHTS_CONNECTION_STRING`` can still be saved to the same + env file when supplied non-interactively, but the interactive wizard does + not ask for it; runtime commands can discover it from the Foundry project + later. * Canonical Azure variable names are preserved so the Azure SDKs and ``azd`` templates can read them directly. 
@@ -30,7 +32,7 @@ import re from dataclasses import dataclass, field from pathlib import Path -from typing import Callable, List, Optional +from typing import Callable, Collection, List, Optional # --------------------------------------------------------------------------- @@ -58,13 +60,6 @@ "Default: .agentops/data/smoke.jsonl" ) -APPINSIGHTS_TITLE = "Application Insights connection string (optional)" -APPINSIGHTS_HELP = ( - "Press Enter to auto-discover it from the Foundry project.\n" - "Paste one only to force a specific App Insights resource." -) - - # Canonical environment-variable names AgentOps reads. We never rename # variables that the Azure SDKs and azd templates expect — only AgentOps- # specific knobs get the ``AGENTOPS_`` prefix. @@ -377,9 +372,9 @@ def run_wizard( prompt: PromptFn, echo: Callable[[str], None], *, - include_appinsights: bool = True, on_answer: Optional[OnAnswerFn] = None, reconfigure: bool = False, + force_prompt_fields: Optional[Collection[str]] = None, ) -> WizardAnswers: """Drive the interactive question loop. @@ -398,13 +393,22 @@ def run_wizard( environment, or the process env — is reused silently with a single confirmation line. Set ``reconfigure=True`` to force the wizard to re-ask every question even when defaults are present. + + ``force_prompt_fields`` is narrower than ``reconfigure``: it re-asks only + selected fields while still reusing other existing defaults. The CLI uses + this on a first interactive run so starter ``agentops.yaml`` values remain + visible defaults instead of being accepted as real user choices. 
""" defaults = discover_defaults(workspace) answers = WizardAnswers() skipped: list[str] = [] + forced_fields = set(force_prompt_fields or ()) unicode_ok = _can_encode("✓•") ok_glyph = "✓" if unicode_ok else "*" + def _should_prompt(field_name: str, value: Optional[str]) -> bool: + return reconfigure or field_name in forced_fields or not value + def _persist(field_name: str, value: str) -> None: if on_answer is not None: try: @@ -421,8 +425,8 @@ def _confirm_existing(label: str, value: str, secret: bool = False) -> None: echo(f" {ok_glyph} {label}: {display}") # 1) Foundry project endpoint - if not reconfigure and defaults.project_endpoint: - _confirm_existing(PROJECT_ENDPOINT_TITLE, defaults.project_endpoint) + if not _should_prompt("project_endpoint", defaults.project_endpoint): + _confirm_existing(PROJECT_ENDPOINT_TITLE, defaults.project_endpoint or "") skipped.append("project_endpoint") else: echo("") @@ -443,8 +447,8 @@ def _confirm_existing(label: str, value: str, secret: bool = False) -> None: break # 2) Agent - if not reconfigure and defaults.agent: - _confirm_existing(AGENT_TITLE, defaults.agent) + if not _should_prompt("agent", defaults.agent): + _confirm_existing(AGENT_TITLE, defaults.agent or "") skipped.append("agent") else: echo("") @@ -465,8 +469,8 @@ def _confirm_existing(label: str, value: str, secret: bool = False) -> None: break # 3) Dataset - if not reconfigure and defaults.dataset: - _confirm_existing(DATASET_TITLE, defaults.dataset) + if not _should_prompt("dataset", defaults.dataset): + _confirm_existing(DATASET_TITLE, defaults.dataset or "") skipped.append("dataset") else: echo("") @@ -486,41 +490,9 @@ def _confirm_existing(label: str, value: str, secret: bool = False) -> None: _persist("dataset", value) break - # 4) Application Insights - if include_appinsights: - if not reconfigure and defaults.appinsights_connection_string: - _confirm_existing( - APPINSIGHTS_TITLE, - defaults.appinsights_connection_string, - secret=True, - ) - 
skipped.append("appinsights_connection_string") - else: - echo("") - echo(APPINSIGHTS_TITLE) - echo(_indent(APPINSIGHTS_HELP)) - if defaults.appinsights_connection_string: - echo( - " Current: " - f"{_mask_secret(defaults.appinsights_connection_string)} " - "(Enter keeps it)." - ) - else: - echo(" Enter = auto-discover.") - raw = prompt( - "Application Insights connection string", - None, - ) - value = raw.strip() - if value and value != (defaults.appinsights_connection_string or ""): - answers.appinsights_connection_string = value - _persist("appinsights_connection_string", value) - # Surface a hint only when EVERY managed value was already set, so the # user knows how to edit values without thinking the wizard "did nothing". expected = ["project_endpoint", "agent", "dataset"] - if include_appinsights: - expected.append("appinsights_connection_string") if not reconfigure and set(skipped) == set(expected): echo("") echo("All values already configured. Re-run with --reconfigure to change them.") diff --git a/src/agentops/services/trace_promotion.py b/src/agentops/services/trace_promotion.py index ce78553..0c2eae9 100644 --- a/src/agentops/services/trace_promotion.py +++ b/src/agentops/services/trace_promotion.py @@ -3,11 +3,14 @@ from __future__ import annotations import json +import textwrap from dataclasses import dataclass, field from datetime import datetime, timezone from pathlib import Path from typing import Any, Iterable, Literal, Optional +_TEXT_WRAP_WIDTH = 92 + LabelMode = Literal["self-similarity", "pending"] @@ -102,25 +105,78 @@ def render_trace_promotion_preview(preview: TracePromotionPreview) -> str: lines = [ "AgentOps trace-to-dataset preview", - "", f"Source: {preview.source}", f"Output: {preview.output_path}", - f"Rows: {len(preview.rows)}", - f"Skipped: {preview.skipped}", - f"Label mode: {preview.label_mode}", + "", + "Summary", ] + lines.extend( + _render_text_fields( + [ + ("rows", str(len(preview.rows))), + ("skipped", str(preview.skipped)), 
+ ("label mode", preview.label_mode), + ] + ) + ) if preview.warnings: lines.append("") - lines.append("Warnings:") - lines.extend(f"- {warning}" for warning in preview.warnings) + lines.append("Warnings") + for warning in preview.warnings: + lines.extend(_wrapped_status_line("warn", "warning", warning)) if preview.rows: lines.append("") - lines.append("Sample rows:") - for row in preview.rows[:3]: - lines.append(f"- {row['input'][:100]}") + lines.append("Sample rows") + for index, row in enumerate(preview.rows[:3], start=1): + lines.extend(_wrapped_numbered_line(index, str(row["input"])[:100])) return "\n".join(lines) + "\n" +def _render_text_fields(rows: list[tuple[str, str]]) -> list[str]: + width = max(len(label) for label, _ in rows) + lines: list[str] = [] + for label, value in rows: + lines.extend(_wrap_text(value, indent=f" {label.ljust(width)} ")) + return lines + + +def _wrapped_status_line(status: str, label: str, text: str) -> list[str]: + prefix = f" {status.ljust(4)} {label.ljust(10)} " + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _wrapped_numbered_line(index: int, text: str) -> list[str]: + prefix = f" {index}. 
" + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _wrap_text(text: str, *, indent: str) -> list[str]: + return textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=indent, + subsequent_indent=indent, + break_long_words=False, + break_on_hyphens=False, + ) or [indent.rstrip()] + + def _load_trace_export(source: Path) -> Iterable[dict[str, Any]]: text = source.read_text(encoding="utf-8") stripped = text.strip() diff --git a/src/agentops/services/workflow_analysis.py b/src/agentops/services/workflow_analysis.py index c6c1b7a..6dbdad6 100644 --- a/src/agentops/services/workflow_analysis.py +++ b/src/agentops/services/workflow_analysis.py @@ -3,9 +3,10 @@ from __future__ import annotations import json +import textwrap from dataclasses import dataclass, field from pathlib import Path -from typing import Any, Dict, Iterable, List, Optional +from typing import Any, Dict, Iterable, List, Optional, Sequence from agentops.core.agentops_config import classify_agent from agentops.pipeline.official_eval import ( @@ -18,6 +19,7 @@ _TEXT_LIMIT = 200_000 _SCAN_LIMIT = 80 +_TEXT_WRAP_WIDTH = 92 _IGNORE_PARTS = { ".agentops", ".azure", @@ -263,15 +265,15 @@ def analyze_workflow_project(directory: Path) -> WorkflowAnalysis: signals.append( WorkflowSignal( "official_ai_agent_evaluation", - "Official AI Agent Evaluation", - "agentops.yaml can use Microsoft Foundry AI Agent Evaluation for prompt-agent CI gates.", + "Foundry eval runner", + "prompt agent and dataset are compatible; CI can use Microsoft Foundry evaluation.", "agentops.yaml", ) ) warnings.extend(official_support.warnings) else: warnings.append( - "Official AI Agent Evaluation not selected: " + "Microsoft Foundry AI Agent Evaluation not selected: " + " ".join(official_support.reasons) ) @@ -349,6 +351,14 @@ def 
recommended_eval_runner(directory: Path) -> str: return analyze_workflow_project(directory).recommended_eval_runner +def _display_eval_runner(eval_runner: str) -> str: + if eval_runner == OFFICIAL_EVAL_RUNNER: + return "Microsoft Foundry AI Agent Evaluation" + if eval_runner == AGENTOPS_LOCAL_RUNNER: + return "AgentOps local eval" + return eval_runner + + def has_ailz_preflight(directory: Path) -> bool: """Return True when the official AI Landing Zone preflight script exists.""" root = directory.resolve() @@ -369,46 +379,42 @@ def render_workflow_analysis(analysis: WorkflowAnalysis, output_format: str = "t def _render_text(analysis: WorkflowAnalysis) -> str: lines = [ "AgentOps workflow analysis", - f"Directory: {analysis.directory}", - f"Classification: {analysis.classification}", - f"Recommended deploy mode: {analysis.recommended_deploy_mode}", - f"Recommended eval runner: {analysis.recommended_eval_runner}", - f"Strategy: {analysis.deployment_strategy}", - f"Eval strategy: {analysis.eval_strategy}", - f"Complexity: {analysis.complexity}", - f"Copilot adaptation: {'yes' if analysis.requires_copilot_adaptation else 'no'}", - f"Copilot skills installed: {'yes' if analysis.copilot_skills_installed else 'no'}", + f"Workspace: {analysis.directory}", + f"Project: {analysis.classification}", "", - "Detected signals:", + "Recommendation", ] - lines.extend( - f"- {s.label}: {s.detail}" + (f" ({s.path})" if s.path else "") - for s in analysis.signals - ) + lines.extend(_render_text_recommendation(analysis)) + lines.append("") + lines.append("Signals") + lines.extend(_render_text_signal_rows(_signal_rows(analysis))) if analysis.warnings: lines.append("") - lines.append("Warnings:") - lines.extend(f"- {warning}" for warning in analysis.warnings) + lines.append("Warnings") + for warning in analysis.warnings: + lines.extend(_wrapped_status_line("warn", "warning", warning)) if analysis.official_eval_reasons: lines.append("") - lines.append("Official eval decision:") - 
lines.extend(f"- {reason}" for reason in analysis.official_eval_reasons) - if analysis.official_evaluators: - lines.append("- Evaluators: " + ", ".join(analysis.official_evaluators)) + lines.append("Foundry eval") + lines.extend(_render_text_foundry_eval_rows(_foundry_eval_rows(analysis))) if analysis.copilot_prompt: lines.append("") - lines.append("Copilot handoff:") - lines.append(f"- Copy/paste: {analysis.copilot_prompt}") + lines.append("Copilot handoff") + lines.extend(_wrapped_status_line("todo", "copy/paste", analysis.copilot_prompt)) lines.append("") - lines.append("Recommended commands:") - lines.extend(f"- {command}" for command in analysis.recommended_commands) - lines.append("") - lines.append("Pipeline stages:") - for stage in analysis.stages: - lines.append(f"- {stage.name} [{stage.owner}]: {stage.purpose}") + lines.append("Pipeline plan") + for index, stage in enumerate(analysis.stages, start=1): + lines.append(f" {index}. {stage.name}") + lines.extend(_wrap_text(stage.purpose, indent=" ")) + commands = _text_commands(analysis.recommended_commands) + if commands: + lines.append("") + lines.append("Commands") + lines.extend(f" {command}" for command in commands) lines.append("") - lines.append("Next steps:") - lines.extend(f"- {step}" for step in analysis.next_steps) + lines.append("Next") + for index, step in enumerate(analysis.next_steps, start=1): + lines.extend(_wrapped_numbered_step(index, step)) return "\n".join(lines) + "\n" @@ -418,33 +424,25 @@ def _render_markdown(analysis: WorkflowAnalysis) -> str: "", f"- **Directory:** `{analysis.directory}`", f"- **Classification:** {analysis.classification}", - f"- **Recommended deploy mode:** `{analysis.recommended_deploy_mode}`", - f"- **Recommended eval runner:** `{analysis.recommended_eval_runner}`", - f"- **Strategy:** {analysis.deployment_strategy}", - f"- **Eval strategy:** {analysis.eval_strategy}", - f"- **Complexity:** {analysis.complexity}", - f"- **Copilot adaptation:** {'yes' if 
analysis.requires_copilot_adaptation else 'no'}", - f"- **Copilot skills installed:** {'yes' if analysis.copilot_skills_installed else 'no'}", "", - "## Detected signals", + "## Workflow decision checklist", "", ] - if analysis.signals: - lines.extend( - f"- **{s.label}** ({s.confidence}): {s.detail}" - + (f" — `{s.path}`" if s.path else "") - for s in analysis.signals - ) - else: - lines.append("- No strong accelerator or deployment signals detected.") + lines.extend(_render_markdown_table(("Check", "Status", "Explanation"), _decision_checklist_rows(analysis))) + lines.extend( + [ + "", + "## Detected signals", + "", + ] + ) + lines.extend(_render_markdown_table(("Status", "Type", "Finding", "Evidence"), _signal_rows(analysis))) if analysis.warnings: lines.extend(["", "## Warnings", ""]) lines.extend(f"- {warning}" for warning in analysis.warnings) if analysis.official_eval_reasons: - lines.extend(["", "## Official eval decision", ""]) - lines.extend(f"- {reason}" for reason in analysis.official_eval_reasons) - if analysis.official_evaluators: - lines.append("- Evaluators: " + ", ".join(f"`{e}`" for e in analysis.official_evaluators)) + lines.extend(["", "## Foundry eval checks", ""]) + lines.extend(_render_markdown_table(("Status", "Check", "Explanation"), _foundry_eval_rows(analysis))) if analysis.copilot_prompt: lines.extend(["", "## Copilot handoff", ""]) lines.extend(["Copy/paste this into Copilot:", "", "```text", analysis.copilot_prompt, "```"]) @@ -466,6 +464,270 @@ def _render_markdown(analysis: WorkflowAnalysis) -> str: return "\n".join(lines).rstrip() + "\n" +def _render_text_recommendation(analysis: WorkflowAnalysis) -> List[str]: + adaptation_value = ( + "needed - review project-specific build/deploy steps" + if analysis.requires_copilot_adaptation + else "not needed - generated workflow should work as-is" + ) + skills_value = ( + "installed - available for workflow adaptation handoff" + if analysis.copilot_skills_installed + else ( + "missing - run 
`agentops skills install --platform copilot` for handoff" + if analysis.requires_copilot_adaptation + else "not needed - no Copilot handoff for this project shape" + ) + ) + return _render_text_fields( + [ + ("deploy", analysis.recommended_deploy_mode), + ("evaluate", _display_eval_runner(analysis.recommended_eval_runner)), + ("workflow edits", adaptation_value), + ("Copilot skills", skills_value), + ] + ) + + +def _text_commands(commands: Sequence[str]) -> List[str]: + return [command for command in commands if command != "agentops workflow analyze --format markdown"] + + +def _render_text_fields(rows: Sequence[tuple[str, str]]) -> List[str]: + width = max(len(label) for label, _ in rows) + lines: List[str] = [] + for label, value in rows: + lines.extend(_wrap_text(value, indent=f" {label.ljust(width)} ")) + return lines + + +def _render_text_signal_rows(rows: Sequence[Sequence[str]]) -> List[str]: + lines: List[str] = [] + for status, signal_type, _finding, evidence in rows: + marker, _ = _split_status_value(str(status)) + detail = _soften_text(str(evidence)) + lines.extend(_wrapped_status_line(_status_word(marker), str(signal_type), detail)) + return lines + + +def _render_text_foundry_eval_rows(rows: Sequence[Sequence[str]]) -> List[str]: + lines: List[str] = [] + for status, check, explanation in rows: + marker, _ = _split_status_value(str(status)) + lines.extend( + _wrapped_status_line( + _status_word(marker), + str(check), + _friendly_foundry_eval_text(str(check), str(explanation)), + ) + ) + return lines + + +def _split_status_value(status: str) -> tuple[str, str]: + if status.startswith("[") and "]" in status: + marker, _, value = status.partition(" ") + return marker, value.strip() + return status, "" + + +def _status_word(marker: str) -> str: + if marker == "[x]": + return "ok" + if marker == "[?]": + return "hint" + if marker == "[ ]": + return "todo" + return marker.strip("[]").lower() or "info" + + +def _wrapped_status_line(status: str, label: str, 
text: str) -> List[str]: + prefix = f" {status.ljust(4)} {label.ljust(13)} " + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _wrapped_numbered_step(index: int, text: str) -> List[str]: + prefix = f" {index}. " + wrapped = textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=prefix, + subsequent_indent=" " * len(prefix), + break_long_words=False, + break_on_hyphens=False, + ) + return wrapped or [prefix.rstrip()] + + +def _friendly_foundry_eval_text(check: str, text: str) -> str: + if check == "Agent target": + return "Foundry prompt agent (`name:version`)." + if check == "Evaluators": + return _friendly_evaluator_list(text.split(", ")) + return _soften_text(text) + + +def _friendly_evaluator_list(evaluators: Iterable[str]) -> str: + return ", ".join( + evaluator.removeprefix("builtin.").replace("_", " ") + for evaluator in evaluators + if evaluator + ) + + +def _soften_text(text: str) -> str: + return text.replace("foundry_prompt", "Foundry prompt agent") + + +def _wrap_text(text: str, *, indent: str) -> List[str]: + return textwrap.wrap( + text, + width=_TEXT_WRAP_WIDTH, + initial_indent=indent, + subsequent_indent=indent, + break_long_words=False, + break_on_hyphens=False, + ) or [indent.rstrip()] + + +def _render_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> List[str]: + normalized = [[_escape_markdown_cell(str(cell)) for cell in row] for row in rows] + if not normalized: + normalized = [["-" for _ in headers]] + header_line = "| " + " | ".join(_escape_markdown_cell(str(header)) for header in headers) + " |" + separator = "| " + " | ".join("---" for _ in headers) + " |" + body = ["| " + " | ".join(row) + " |" for row in normalized] + return [header_line, separator, *body] + + +def _escape_markdown_cell(value: str) -> str: + return 
value.replace("|", "\\|") + + +def _decision_checklist_rows(analysis: WorkflowAnalysis) -> List[tuple[str, str, str]]: + if analysis.requires_copilot_adaptation: + adaptation_status = "[ ] needs edits" + adaptation_detail = ( + "Detected project-specific topology or deploy signals; review generated " + "workflow before making it blocking." + ) + else: + adaptation_status = "[x] not needed" + adaptation_detail = "Generated workflow should be directly usable for this project shape." + + if analysis.requires_copilot_adaptation: + skills_status = "[x] installed" if analysis.copilot_skills_installed else "[ ] missing" + skills_detail = "Needed only for the Copilot workflow-adaptation handoff." + else: + skills_status = "[x] not required" + skills_detail = "No workflow handoff is required for the current recommendation." + + return [ + ( + "Deploy mode", + f"[x] {analysis.recommended_deploy_mode}", + _deploy_mode_check_detail(analysis.recommended_deploy_mode), + ), + ( + "Eval runner", + f"[x] {_display_eval_runner(analysis.recommended_eval_runner)}", + _eval_runner_check_detail(analysis.recommended_eval_runner), + ), + ("Complexity", "[x] " + analysis.complexity, "Used to decide whether CI needs extra review."), + ("Workflow adaptation", adaptation_status, adaptation_detail), + ("Copilot skills", skills_status, skills_detail), + ] + + +def _deploy_mode_check_detail(mode: str) -> str: + if mode == "azd": + return "Use azd for provision/deploy; AgentOps supplies gates and evidence." + if mode == "prompt-agent": + return "Stage and evaluate a Foundry prompt candidate, then record deployment." + return "Generate CI placeholders; add the project-specific build/deploy steps." + + +def _eval_runner_check_detail(eval_runner: str) -> str: + if eval_runner == OFFICIAL_EVAL_RUNNER: + return "Prompt agent plus dataset fit Foundry eval; AgentOps keeps evidence." + return "AgentOps runs local eval and writes normalized results/report artifacts." 
+ + +def _signal_rows(analysis: WorkflowAnalysis) -> List[tuple[str, str, str, str]]: + if not analysis.signals: + return [ + ( + "[ ]", + "Signals", + "No strong project signals", + "No accelerator, azd, AgentOps, or CI files were detected.", + ) + ] + return [ + ( + "[x]" if signal.confidence == "high" else "[?]", + _signal_type(signal.key), + signal.label, + signal.detail + (f" ({signal.path})" if signal.path else ""), + ) + for signal in analysis.signals + ] + + +def _foundry_eval_rows(analysis: WorkflowAnalysis) -> List[tuple[str, str, str]]: + selected = analysis.recommended_eval_runner == OFFICIAL_EVAL_RUNNER + if selected: + rows = [ + ( + "[x]", + "Agent target", + analysis.official_eval_reasons[0] + if analysis.official_eval_reasons + else "Foundry prompt agent.", + ), + ( + "[x]", + "Dataset", + analysis.official_eval_reasons[1] + if len(analysis.official_eval_reasons) > 1 + else "Compatible with Microsoft Foundry eval.", + ), + ] + if analysis.official_evaluators: + rows.append(("[x]", "Evaluators", ", ".join(analysis.official_evaluators))) + return rows + + return [ + ("[ ]", "Microsoft Foundry eval", reason) + for reason in analysis.official_eval_reasons + ] + + +def _signal_type(key: str) -> str: + return { + "agentops_config": "Config", + "official_ai_agent_evaluation": "Eval runner", + "azd_project": "Deploy mode", + "prompt_file": "Prompt source", + "bicep_infra": "Infrastructure", + "ailz_manifest": "Landing zone", + "ailz_preflight": "Preflight", + "network_isolation": "Runner topology", + "network_isolation_hint": "Runner topology", + "container_app": "Application host", + "accelerator_hint": "Accelerator", + "existing_ci": "Existing CI", + }.get(key, "Signal") + + def _agentops_signal(root: Path) -> Dict[str, Any]: path = root / "agentops.yaml" if not path.exists(): @@ -699,7 +961,7 @@ def _eval_stage(eval_runner: str) -> WorkflowStage: return WorkflowStage( "PR evaluation gate", "Microsoft Foundry + AgentOps", - "Run the official AI Agent 
Evaluation action/task and publish AgentOps-prepared inputs.", + "Run Microsoft Foundry AI Agent Evaluation and publish AgentOps-prepared inputs.", [ "python -m agentops.pipeline.official_eval prepare", official_eval_action_ref(), @@ -796,7 +1058,7 @@ def _next_steps( if eval_runner == OFFICIAL_EVAL_RUNNER: steps.insert( 1, - "Set AZURE_OPENAI_DEPLOYMENT so the official AI Agent Evaluation runner can judge responses.", + "Set AZURE_OPENAI_DEPLOYMENT so Microsoft Foundry AI Agent Evaluation can judge responses.", ) if ailz_preflight: steps.insert(0, "Run `pwsh ./scripts/Invoke-PreflightChecks.ps1 -Strict` before provisioning the AI Landing Zone.") diff --git a/src/agentops/templates/skills/agentops-workflow/SKILL.md b/src/agentops/templates/skills/agentops-workflow/SKILL.md index c375a49..6306a1d 100644 --- a/src/agentops/templates/skills/agentops-workflow/SKILL.md +++ b/src/agentops/templates/skills/agentops-workflow/SKILL.md @@ -1,6 +1,6 @@ --- name: agentops-workflow -description: Set up AgentOps release-readiness workflows: PR eval gates, Doctor/evidence artifacts, and safe deploy handoffs to azd or Foundry prompt-agent tooling. Trigger on "CI", "CD", "pipeline", "workflow", "GitHub Actions", "Azure DevOps", "ADO", "PR gate", "deploy", "environments", "GitFlow", "release branch", "promote to prod", "DevOps", "can we ship". +description: "Set up AgentOps release-readiness workflows: PR eval gates, Doctor/evidence artifacts, and safe deploy handoffs to azd or Foundry prompt-agent tooling. Trigger on CI, CD, pipeline, workflow, GitHub Actions, Azure DevOps, ADO, PR gate, deploy, environments, GitFlow, release branch, promote to prod, DevOps, can we ship." --- # AgentOps Workflow @@ -39,6 +39,80 @@ parallel deployment system. AgentOps should gate quality and record proof; `azd provision`, `azd deploy`, azd hooks, Foundry Toolkit, the `microsoft-foundry` skill, and project tooling own lifecycle actions. 
+## Fast path - generated GitHub setup
+
+Use this path when the user already generated GitHub workflows or asks to get
+the PR gate/watchdog running. Stay local-first and deterministic; do not start
+by discovering the whole Azure subscription.
+
+1. Inspect the repo before cloud discovery:
+   - `agentops init show --dir .` without `--reveal-secrets`.
+   - `agentops.yaml`.
+   - `.azure/config.json`, then the active `.azure//.env`.
+   - `azd env get-values` when `azure.yaml` exists and azd is available.
+   - `.github/workflows/agentops-*.yml`.
+2. Read the generated workflows to determine exactly which GitHub environments
+   and variables are needed. For the prompt-agent quickstart, `pr,watchdog`
+   normally means only `environment: dev`.
+3. Treat `dev` here as a GitHub Actions environment for OIDC and variables. It
+   normally points at the Foundry project already configured by `agentops init`;
+   it does not require creating a new Foundry project.
+4. Proceed only when these values are known or deliberately chosen:
+   - GitHub `owner/repo`.
+   - workflow environment names from `jobs.*.environment`.
+   - `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID`.
+   - `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`.
+   - `AZURE_OPENAI_DEPLOYMENT`.
+   - optional `APPLICATIONINSIGHTS_CONNECTION_STRING`.
+5. Prefer existing values and exact checks:
+   - `git remote get-url origin` and `gh repo view --json nameWithOwner`.
+   - `gh variable list --env ` and `gh secret list --env `.
+   - `agentops init show`, local `.azure//.env`, and `azd env get-values`
+     values before `az account show`.
+   - `az account show` only as a proposal for tenant/subscription; confirm
+     before writing it to GitHub variables.
+6. Copy CI variables from local AgentOps/azd configuration into the GitHub
+   environment used by the workflow. Reuse local values for
+   `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`,
+   `AZURE_OPENAI_DEPLOYMENT`, and optional
+   `APPLICATIONINSIGHTS_CONNECTION_STRING` instead of asking the user to type
+   them again. Explain `AZURE_OPENAI_DEPLOYMENT` only if it is missing: it is
+   the Azure OpenAI deployment used as the evaluator/judge model, not the
+   user's agent.
+7. Do not enumerate subscriptions, Foundry projects, Azure OpenAI resources, or
+   model deployments to guess missing values. If `AZURE_SUBSCRIPTION_ID`,
+   `AZURE_TENANT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`, or
+   `AZURE_OPENAI_DEPLOYMENT` is absent from AgentOps/azd/local env, ask the user
+   to choose or provide it. Only run a scoped Azure query after the user confirms
+   the subscription and the exact missing value.
+8. For GitHub OIDC, derive the federated credential subject from the generated
+   workflow. If the job has `environment: dev`, the subject is normally
+   `repo:/:environment:dev`. Do not assume branch or
+   `pull_request` subjects without reading the workflow.
+9. Ask before creating or updating GitHub repos, GitHub environments,
+   variables/secrets, Entra app registrations/service principals, federated
+   credentials, managed identities, or Azure RBAC assignments.
+10. When creating federated credentials from PowerShell, avoid fragile
+    interpolation. Do **not** write `"repo:$repo:environment:$envName"` because
+    `$repo:` can be parsed as a scoped variable. Use
+    `"repo:${repo}:environment:${envName}"` or
+    `("repo:{0}:environment:{1}" -f $repo, $envName)`, then build JSON from a
+    PowerShell object with `ConvertTo-Json`.
+11. After creating or updating a federated credential, read it back and verify
+    before triggering a workflow:
+    - `subject` exactly matches the generated workflow subject.
+    - `issuer` is `https://token.actions.githubusercontent.com`.
+    - `audiences` includes `api://AzureADTokenExchange`.
+    If any value differs, fix the credential before running GitHub Actions.
+12. Do not dispatch `gh workflow run` as a surprise validation step. First show
+    that the GitHub environment, variables/secrets, federated credential, and
+    Azure RBAC are ready, then ask the user before triggering workflows.
+13. Avoid broad discovery unless local config is missing. Do **not** run broad
+    `az resource list`, `az graph query`, SDK inspection, or web search to find
+    the Foundry project when `agentops.yaml` or `.azure//.env` already has
+    `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`. If the endpoint is missing, say exactly
+    what is missing and ask the user before scanning the subscription.
+
 ## Branch model assumed
 
 ```
@@ -126,21 +200,18 @@ Useful flags:
 
 ### GitHub Actions
 
-Walk the user through Settings → Environments and create three:
+Read the generated workflow files and create only the GitHub Environments used
+by `jobs.*.environment`. For `pr,watchdog`, that is usually only **`dev`**. For
+the full scaffold, create **`dev`**, **`qa`**, and **`production`**.
 
-1. **`dev`** - no extra protection. Set any DEV-specific variables here
-   (e.g. `ACA_APP_NAME`, `AZURE_RESOURCE_GROUP` pointing at the dev RG).
-2. **`qa`** - usually no required reviewers, but isolated variables for
-   the QA environment.
-3. **`production`** - set:
-   - **Required reviewers**: at least one (deploys to PROD will pause
-     here until approved).
-   - (Optional) **Wait timer** for an extra delay.
-   - (Optional) **Deployment branches**: restrict to `main`.
-   - PROD-specific variables (e.g. production resource group).
+- **`dev`** - no extra protection. Store the OIDC variables here when the
+  generated jobs use `environment: dev`.
+- **`qa`** - usually no required reviewers, but isolated variables for QA.
+- **`production`** - set required reviewers, optional wait timer, optional
+  deployment branch restriction to `main`, and production-specific variables.
-Tell the user that env-specific variables on the `production` environment -will override repo-level ones automatically inside the prod workflow. +Tell the user that environment-level variables override repository-level ones +inside jobs that declare that environment. ### Azure DevOps @@ -169,14 +240,17 @@ so the PR-comment step can post. ### GitHub Actions (OIDC) -At repository level (Settings → Secrets and variables → Actions → -**Variables** tab), set: +At the GitHub Environment level when the workflow declares an environment +(preferred for the quickstart), or at repository level when intentionally shared +across environments, set: - `AZURE_CLIENT_ID` - App registration / managed identity used for OIDC. - `AZURE_TENANT_ID` - `AZURE_SUBSCRIPTION_ID` - `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` - Foundry project URL used by the eval step. +- `AZURE_OPENAI_DEPLOYMENT` - existing Azure OpenAI deployment used as the + evaluator/judge model. Reuse the local AgentOps/azd value when available. - `APPLICATIONINSIGHTS_CONNECTION_STRING` - optional fallback as a variable or secret. 
Generated workflows first try to auto-discover App Insights from the Foundry project endpoint; this value makes eval and diff --git a/tests/unit/test_cicd.py b/tests/unit/test_cicd.py index 0383b2d..9370856 100644 --- a/tests/unit/test_cicd.py +++ b/tests/unit/test_cicd.py @@ -53,6 +53,7 @@ def test_kinds_filter_subset(tmp_path: Path) -> None: "agentops-pr.yml", "agentops-deploy-dev.yml", } + assert result.kinds == ["pr", "dev"] assert (tmp_path / _PR_PATH).exists() assert (tmp_path / _DEV_PATH).exists() assert not (tmp_path / _QA_PATH).exists() @@ -514,12 +515,51 @@ def test_cli_next_steps_mention_environments(tmp_path: Path) -> None: out = result.stdout assert "Deploy mode" in out assert "placeholder (auto default)" in out - assert "Next steps" in out + assert "Next" in out assert "dev" in out and "qa" in out and "production" in out assert "OIDC" in out or "Workload Identity Federation" in out assert "branch" in out.lower() +def test_cli_next_steps_for_pr_watchdog_only_do_not_request_deploy_envs( + tmp_path: Path, +) -> None: + (tmp_path / "agentops.yaml").write_text( + "version: 1\nagent: travel-agent:1\ndataset: .agentops/data/smoke.jsonl\n", + encoding="utf-8", + ) + data_path = tmp_path / ".agentops" / "data" / "smoke.jsonl" + data_path.parent.mkdir(parents=True) + data_path.write_text( + '{"input": "Plan a trip", "expected": "A useful itinerary"}\n', + encoding="utf-8", + ) + + result = runner.invoke( + app, + [ + "workflow", + "generate", + "--dir", + str(tmp_path), + "--kinds", + "pr,watchdog", + ], + ) + + assert result.exit_code == 0 + out = result.stdout + assert "Microsoft Foundry AI Agent Evaluation" in out + assert "used only by deploy workflows" in out + assert "create GitHub environment: dev" in out + assert "qa" not in out + assert "production" not in out + assert "deploy not needed yet" in out + assert "agentops skills install --platform copilot" in out + assert "/skills" in out and "agentops-workflow" in out + assert "gh repo create " in out + + 
# --------------------------------------------------------------------------- # Azure DevOps platform # --------------------------------------------------------------------------- diff --git a/tests/unit/test_eval_analysis.py b/tests/unit/test_eval_analysis.py index dd57842..239f0f6 100644 --- a/tests/unit/test_eval_analysis.py +++ b/tests/unit/test_eval_analysis.py @@ -119,8 +119,11 @@ def test_cli_eval_analyze_text(tmp_path: Path) -> None: assert result.exit_code == 0, result.stdout assert "AgentOps eval analysis" in result.stdout - assert "Config status: ready" in result.stdout - assert "Copilot skills installed: no" in result.stdout + assert "Readiness" in result.stdout + assert "config" in result.stdout + assert "ready" in result.stdout + assert "Copilot skills" in result.stdout + assert "not needed - no Copilot handoff for eval setup" in result.stdout def test_cli_eval_analyze_json(tmp_path: Path) -> None: @@ -165,4 +168,3 @@ def test_cli_eval_analyze_invalid_format_fails(tmp_path: Path) -> None: assert result.exit_code == 1 assert "--format must be text, markdown, or json" in result.output - diff --git a/tests/unit/test_init_command.py b/tests/unit/test_init_command.py index 3f23fec..2e985d7 100644 --- a/tests/unit/test_init_command.py +++ b/tests/unit/test_init_command.py @@ -319,7 +319,6 @@ def test_run_wizard_calls_on_answer_for_each_validated_input( "https://acct.services.ai.azure.com/api/projects/p", # project_endpoint "my-agent:1", # agent ".agentops/data/smoke.jsonl", # dataset - "", # appinsights -> skip ] ) @@ -332,7 +331,6 @@ def _prompt(_question: str, _default): # noqa: ANN001 tmp_path, prompt=_prompt, echo=lambda _msg: None, - include_appinsights=True, on_answer=lambda field, value: captured.append((field, value)), ) @@ -364,7 +362,9 @@ def test_run_wizard_skips_questions_when_defaults_present( monkeypatch.delenv(ENV_KEY_PROJECT_ENDPOINT, raising=False) monkeypatch.delenv(ENV_KEY_APPINSIGHTS, raising=False) - # Pre-populate agentops.yaml and the 
active azd env so all 4 values resolve. + # Pre-populate agentops.yaml and the active azd env so all interactive + # values resolve. App Insights may exist, but the wizard no longer manages + # it interactively. (tmp_path / "agentops.yaml").write_text( "version: 1\nagent: my-agent:1\ndataset: .agentops/data/smoke.jsonl\n", encoding="utf-8", @@ -391,32 +391,20 @@ def _prompt(question: str, _default): # noqa: ANN001 prompt_calls.append(question) return "" - run_wizard( - tmp_path, - prompt=_prompt, - echo=echo_lines.append, - include_appinsights=True, - ) + run_wizard(tmp_path, prompt=_prompt, echo=echo_lines.append) # Zero questions asked — every default was satisfied. assert prompt_calls == [], ( f"Wizard should not prompt when all defaults are present, asked: {prompt_calls}" ) - # All four confirmation lines emitted, plus the closing hint. + # All three interactive confirmation lines emitted, plus the closing hint. # The leading glyph is "✓" on UTF-8 stdouts and "*" on cp1252; accept either. confirmations = [ line for line in echo_lines if line.startswith((" ✓ ", " * ")) ] - assert len(confirmations) == 4, echo_lines + assert len(confirmations) == 3, echo_lines assert any("--reconfigure" in line for line in echo_lines), echo_lines - - # AppInsights confirmation must mask the secret (no raw key visible). - appinsights_line = next( - line for line in confirmations if "Application Insights" in line - ) - assert "InstrumentationKey" not in appinsights_line - # Bullet glyph is either "•" (Unicode) or "*" (cp1252 fallback). 
- assert ("•" in appinsights_line) or ("*" in appinsights_line) + assert "Application Insights" not in "\n".join(echo_lines) def test_run_wizard_reconfigure_forces_questions_even_when_defaults_present( @@ -462,7 +450,6 @@ def _prompt(question: str, _default): # noqa: ANN001 tmp_path, prompt=_prompt, echo=lambda _msg: None, - include_appinsights=True, reconfigure=True, ) @@ -470,6 +457,4 @@ def _prompt(question: str, _default): # noqa: ANN001 "Foundry project endpoint", "Agent", "Dataset path", - "Application Insights connection string", ] - diff --git a/tests/unit/test_official_eval.py b/tests/unit/test_official_eval.py index 484bba7..155f540 100644 --- a/tests/unit/test_official_eval.py +++ b/tests/unit/test_official_eval.py @@ -35,7 +35,7 @@ def test_analyze_official_eval_support_for_prompt_agent(tmp_path: Path) -> None: assert support.agent_ids == "support-agent:4" assert "builtin.coherence" in support.official_evaluators assert "builtin.text_similarity" in support.official_evaluators - assert any("avg_latency_seconds" in warning for warning in support.warnings) + assert support.warnings == () def test_prepare_official_eval_writes_data_and_metadata(tmp_path: Path) -> None: @@ -59,6 +59,7 @@ def test_prepare_official_eval_writes_data_and_metadata(tmp_path: Path) -> None: assert metadata["deployment_name"] == "gpt-4o-mini" assert metadata["items_total"] == 1 assert metadata["machine_readable_thresholds"] is False + assert metadata["skipped_agentops_evaluators"] == ["avg_latency_seconds"] def test_prepare_official_eval_records_preview_runner_refs( diff --git a/tests/unit/test_setup_wizard.py b/tests/unit/test_setup_wizard.py index 5bfb0f7..a4792b8 100644 --- a/tests/unit/test_setup_wizard.py +++ b/tests/unit/test_setup_wizard.py @@ -349,10 +349,10 @@ def _prompt(question: str, default): # noqa: ANN001 return _prompt -def test_run_wizard_collects_all_four_answers( +def test_run_wizard_collects_core_answers( tmp_path: Path, monkeypatch: pytest.MonkeyPatch ): - # 
Isolate from the developer shell so all four questions are asked. + # Isolate from the developer shell so all interactive questions are asked. monkeypatch.delenv("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT", raising=False) monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False) @@ -362,21 +362,20 @@ def test_run_wizard_collects_all_four_answers( "https://acct.services.ai.azure.com/api/projects/p", "my-bot:9", "data.jsonl", - "InstrumentationKey=zzz", ] ) answers = run_wizard(tmp_path, prompt=prompt, echo=lambda _msg: None) assert answers.project_endpoint == "https://acct.services.ai.azure.com/api/projects/p" assert answers.agent == "my-bot:9" assert answers.dataset == "data.jsonl" - assert answers.appinsights_connection_string == "InstrumentationKey=zzz" + assert answers.appinsights_connection_string is None -def test_run_wizard_skips_appinsights_when_disabled( +def test_run_wizard_does_not_prompt_for_appinsights( tmp_path: Path, monkeypatch: pytest.MonkeyPatch ): monkeypatch.delenv("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT", raising=False) - monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False) + monkeypatch.setenv("APPLICATIONINSIGHTS_CONNECTION_STRING", "InstrumentationKey=zzz") (tmp_path / "data.jsonl").write_text("{}\n", encoding="utf-8") prompt = _scripted_prompt( @@ -386,10 +385,10 @@ def test_run_wizard_skips_appinsights_when_disabled( "data.jsonl", ] ) - answers = run_wizard( - tmp_path, prompt=prompt, echo=lambda _msg: None, include_appinsights=False - ) + messages: list[str] = [] + answers = run_wizard(tmp_path, prompt=prompt, echo=messages.append) assert answers.appinsights_connection_string is None + assert "Application Insights" not in "\n".join(messages) def test_run_wizard_empty_input_keeps_current( @@ -404,9 +403,9 @@ def test_run_wizard_empty_input_keeps_current( ) (tmp_path / "keep.jsonl").write_text("{}\n", encoding="utf-8") # With idempotent skip-on-default, agent/dataset are silently reused. 
- # Only the 2 still-empty questions (project_endpoint, appinsights) get - # asked, and the user presses Enter on both — yielding zero updates. - prompt = _scripted_prompt(["", ""]) + # Only the still-empty project endpoint gets asked; App Insights is left + # for runtime discovery or explicit non-interactive configuration. + prompt = _scripted_prompt([""]) answers = run_wizard( tmp_path, prompt=prompt, echo=lambda _msg: None, reconfigure=False ) @@ -416,33 +415,79 @@ def test_run_wizard_empty_input_keeps_current( assert answers.appinsights_connection_string is None -def test_run_wizard_appinsights_prompt_guides_autodiscovery( +def test_run_wizard_force_prompt_fields_reasks_seed_agent_and_dataset( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch +): + monkeypatch.delenv("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT", raising=False) + monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False) + _seed_azd_env( + tmp_path, + "dev", + { + "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT": ( + "https://acct.services.ai.azure.com/api/projects/p" + ), + "APPLICATIONINSIGHTS_CONNECTION_STRING": "InstrumentationKey=zzz", + }, + ) + (tmp_path / "agentops.yaml").write_text( + "version: 1\nagent: my-agent:1\ndataset: .agentops/data/smoke.jsonl\n", + encoding="utf-8", + ) + data_dir = tmp_path / ".agentops" / "data" + data_dir.mkdir(parents=True, exist_ok=True) + (data_dir / "smoke.jsonl").write_text("{}\n", encoding="utf-8") + (data_dir / "travel-smoke.jsonl").write_text("{}\n", encoding="utf-8") + + replies = iter(["travel-agent:1", ".agentops/data/travel-smoke.jsonl"]) + prompt_calls: list[tuple[str, object]] = [] + + def prompt(question: str, default): # noqa: ANN001 + prompt_calls.append((question, default)) + return next(replies) + + answers = run_wizard( + tmp_path, + prompt=prompt, + echo=lambda _msg: None, + force_prompt_fields={"agent", "dataset"}, + ) + + assert prompt_calls == [ + ("Agent", "my-agent:1"), + ("Dataset path", ".agentops/data/smoke.jsonl"), + ] + assert 
answers.project_endpoint is None + assert answers.agent == "travel-agent:1" + assert answers.dataset == ".agentops/data/travel-smoke.jsonl" + assert answers.appinsights_connection_string is None + + +def test_run_wizard_appinsights_is_not_interactive_even_when_missing( tmp_path: Path, monkeypatch: pytest.MonkeyPatch ): - """The optional App Insights prompt should tell users they can press - Enter and rely on Foundry project auto-discovery instead of forcing a - connection string they may not know.""" + """The wizard should not ask for App Insights just to leave it blank.""" monkeypatch.delenv("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT", raising=False) monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False) messages: list[str] = [] - prompt_defaults: list[object] = [] + prompt_calls: list[str] = [] - def prompt(_question: str, default): # noqa: ANN001 - prompt_defaults.append(default) + def prompt(question: str, _default): # noqa: ANN001 + prompt_calls.append(question) return "" run_wizard(tmp_path, prompt=prompt, echo=messages.append) - assert any("auto-discover it from the Foundry project" in m for m in messages) - assert prompt_defaults[-1] is None + output = "\n".join(messages) + assert "Application Insights" not in output + assert "Application Insights connection string" not in prompt_calls -def test_run_wizard_appinsights_reconfigure_masks_existing_secret( +def test_run_wizard_reconfigure_does_not_echo_appinsights_secret( tmp_path: Path, monkeypatch: pytest.MonkeyPatch ): - """Reconfigure mode should not echo the full App Insights connection - string as a prompt default.""" + """Reconfigure mode should not surface App Insights in the wizard.""" monkeypatch.delenv("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT", raising=False) monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False) _seed_azd_env( @@ -464,18 +509,18 @@ def test_run_wizard_appinsights_reconfigure_masks_existing_secret( (tmp_path / "keep.jsonl").write_text("{}\n", 
encoding="utf-8") messages: list[str] = [] - prompt_defaults: list[object] = [] + prompt_calls: list[str] = [] - def prompt(_question: str, default): # noqa: ANN001 - prompt_defaults.append(default) + def prompt(question: str, _default): # noqa: ANN001 + prompt_calls.append(question) return "" run_wizard(tmp_path, prompt=prompt, echo=messages.append, reconfigure=True) - assert prompt_defaults[-1] is None + assert "Application Insights connection string" not in prompt_calls output = "\n".join(messages) assert "secret1234" not in output - assert "1234" in output + assert "Application Insights" not in output def test_run_wizard_re_prompts_on_invalid_input( @@ -492,7 +537,6 @@ def test_run_wizard_re_prompts_on_invalid_input( "bare-name", "my-bot:2", "data.jsonl", - "", ] ) errors: list[str] = [] diff --git a/tests/unit/test_telemetry.py b/tests/unit/test_telemetry.py index f5b73bc..e63dab6 100644 --- a/tests/unit/test_telemetry.py +++ b/tests/unit/test_telemetry.py @@ -536,6 +536,144 @@ def query_resource( assert payload.p95_duration_seconds == 2.5 +def test_azure_monitor_uses_connection_string_application_id( + monkeypatch: pytest.MonkeyPatch, +) -> None: + queries: list[str] = [] + + def fake_query(application_id: str, bearer: str, query: str) -> dict[str, object]: + assert application_id == "app-from-env" + assert bearer == "fake-bearer" + queries.append(query) + if "request_count" in query: + return _app_insights_result( + { + "request_count": 4, + "error_count": 1, + "avg_duration_ms": 1500.0, + "p95_duration_ms": 3200.0, + } + ) + if "content_filter" in query: + return _app_insights_result({"hits": 2}) + if "input_tokens" in query: + return _app_insights_result( + {"input_tokens": 120, "output_tokens": 45} + ) + return _app_insights_result({"hits": 3}) + + monkeypatch.setenv( + "APPLICATIONINSIGHTS_CONNECTION_STRING", + "InstrumentationKey=ikey;ApplicationId=app-from-env", + ) + monkeypatch.setattr( + azure_monitor, + "_acquire_application_insights_token", + 
lambda: "fake-bearer",
+    )
+    monkeypatch.setattr(
+        azure_monitor,
+        "_query_application_insights",
+        fake_query,
+    )
+
+    payload = azure_monitor.collect_azure_monitor(
+        AzureMonitorSourceConfig(enabled=True),
+        lookback_days=7,
+    )
+
+    assert payload.diagnostics["status"] == "ok"
+    assert payload.diagnostics["target_kind"] == "application_id"
+    assert payload.diagnostics["target_source"] == "APPLICATIONINSIGHTS_CONNECTION_STRING"
+    assert payload.request_count == 4
+    assert payload.error_count == 1
+    assert payload.error_rate == 0.25
+    assert payload.avg_duration_seconds == 1.5
+    assert payload.p95_duration_seconds == 3.2
+    assert payload.safety_violations == [{"signal": "content_filter", "hits": 2}]
+    assert payload.input_token_count == 120
+    assert payload.output_token_count == 45
+    assert payload.rate_limit_429_count == 3
+    assert any("union isfuzzy=true requests, dependencies" in q for q in queries)
+
+
+def test_azure_monitor_uses_foundry_discovered_application_id(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    from agentops.utils import foundry_discovery
+
+    monkeypatch.delenv("APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False)
+    monkeypatch.delenv("AGENTOPS_APPLICATIONINSIGHTS_CONNECTION_STRING", raising=False)
+    monkeypatch.setattr(
+        foundry_discovery,
+        "resolve_appinsights_connection_from_env_with_reason",
+        lambda: ("InstrumentationKey=ikey;ApplicationId=app-from-foundry", None),
+    )
+    monkeypatch.setattr(
+        azure_monitor,
+        "_acquire_application_insights_token",
+        lambda: "fake-bearer",
+    )
+    monkeypatch.setattr(
+        azure_monitor,
+        "_query_application_insights",
+        lambda *_args: _app_insights_result(
+            {
+                "request_count": 1,
+                "error_count": 0,
+                "avg_duration_ms": 100.0,
+                "p95_duration_ms": 100.0,
+            }
+        ),
+    )
+
+    payload = azure_monitor.collect_azure_monitor(
+        AzureMonitorSourceConfig(enabled=True),
+        lookback_days=7,
+    )
+
+    assert payload.diagnostics["status"] == "ok"
+    assert payload.diagnostics["target"] == "app-from-foundry"
+    assert payload.diagnostics["target_source"] == "foundry_project_telemetry"
+    assert payload.request_count == 1
+
+
+def test_azure_monitor_skipped_when_connection_string_lacks_application_id(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    from agentops.utils import foundry_discovery
+
+    monkeypatch.setenv(
+        "APPLICATIONINSIGHTS_CONNECTION_STRING",
+        "InstrumentationKey=ikey",
+    )
+    monkeypatch.setattr(
+        foundry_discovery,
+        "resolve_appinsights_connection_from_env_with_reason",
+        lambda: (None, "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is not set"),
+    )
+
+    payload = azure_monitor.collect_azure_monitor(
+        AzureMonitorSourceConfig(enabled=True),
+        lookback_days=7,
+    )
+
+    assert payload.diagnostics["status"] == "skipped"
+    assert "no App Insights ApplicationId" in payload.diagnostics["reason"]
+    assert payload.diagnostics["discovery_reason"]
+
+
+def _app_insights_result(row: dict[str, object]) -> dict[str, object]:
+    return {
+        "tables": [
+            {
+                "columns": [{"name": name} for name in row],
+                "rows": [list(row.values())],
+            }
+        ]
+    }
+
+
 def test_run_evaluation_flushes_telemetry_on_error(
     monkeypatch: pytest.MonkeyPatch,
     tmp_path: Path,
diff --git a/tests/unit/test_trace_promotion.py b/tests/unit/test_trace_promotion.py
index 4307b49..64c6d0f 100644
--- a/tests/unit/test_trace_promotion.py
+++ b/tests/unit/test_trace_promotion.py
@@ -109,4 +109,4 @@ def test_promote_traces_cli_does_not_double_bullet_truncated_rows(tmp_path: Path
 
     assert result.exit_code == 0, result.stdout
     assert "\n-- word" not in result.stdout
-    assert "\n- word" in result.stdout
+    assert "\n 1. word" in result.stdout
diff --git a/tests/unit/test_workflow_analysis.py b/tests/unit/test_workflow_analysis.py
index 4eb1f69..0fa2beb 100644
--- a/tests/unit/test_workflow_analysis.py
+++ b/tests/unit/test_workflow_analysis.py
@@ -65,13 +65,28 @@ def test_analysis_recommends_official_eval_for_supported_prompt_agent(tmp_path:
 
     analysis = analyze_workflow_project(tmp_path)
     rendered = render_workflow_analysis(analysis, "text")
+    markdown = render_workflow_analysis(analysis, "markdown")
 
     assert analysis.recommended_deploy_mode == "prompt-agent"
     assert analysis.recommended_eval_runner == "official-ai-agent-evaluation"
     assert recommended_eval_runner(tmp_path) == "official-ai-agent-evaluation"
     assert "builtin.coherence" in analysis.official_evaluators
     assert any(signal.key == "official_ai_agent_evaluation" for signal in analysis.signals)
-    assert "Recommended eval runner: official-ai-agent-evaluation" in rendered
+    assert "Recommendation" in rendered
+    assert "evaluate" in rendered
+    assert "Microsoft Foundry AI Agent Evaluation" in rendered
+    assert "prompt agent and dataset are compatible" in rendered
+    assert "evaluation. (agentops.yaml)" in rendered
+    assert "workflow edits" in rendered
+    assert "not needed - generated workflow should work as-is" in rendered
+    assert "Copilot skills" in rendered
+    assert "not needed - no Copilot handoff for this project shape" in rendered
+    assert "Foundry eval" in rendered
+    assert "- [x]" not in rendered
+    assert "builtin." not in rendered
+    assert "| Check" not in rendered
+    assert "JSON shape" not in rendered
+    assert "| Check | Status | Explanation |" in markdown
 
 
 def test_analysis_uses_placeholder_for_generic_repo(tmp_path: Path) -> None:
@@ -163,8 +178,11 @@ def test_cli_workflow_analyze_text(tmp_path: Path) -> None:
 
     assert result.exit_code == 0, result.stdout
     assert "AgentOps workflow analysis" in result.stdout
-    assert "Recommended deploy mode: azd" in result.stdout
-    assert "Copilot skills installed: no" in result.stdout
+    assert "Recommendation" in result.stdout
+    assert "deploy" in result.stdout
+    assert "azd" in result.stdout
+    assert "no Copilot handoff" in result.stdout
+    assert "- [x]" not in result.stdout
 
 
 def test_cli_workflow_analyze_json(tmp_path: Path) -> None: