Merged
58 changes: 29 additions & 29 deletions .claude-plugin/marketplace.json
@@ -1,29 +1,29 @@
-{
-"name": "agentops",
-"metadata": {
-"description": "AgentOps Toolkit plugin marketplace — evaluation skills for Microsoft Foundry agents",
-"version": "1.0.0"
-},
-"owner": {
-"name": "AgentOps Toolkit",
-"email": "agentops@microsoft.com"
-},
-"plugins": [
-{
-"name": "agentops-toolkit",
-"source": "../../plugins/agentops",
-"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.2.0",
-"keywords": [
-"agentops",
-"evaluation",
-"foundry",
-"copilot",
-"agent-skills",
-"ai-evaluation"
-],
-"license": "MIT",
-"repository": "https://github.com/Azure/agentops"
-}
-]
-}
+{
+"name": "agentops",
+"metadata": {
+"description": "AgentOps Toolkit plugin marketplace — evaluation skills for Microsoft Foundry agents",
+"version": "1.0.0"
+},
+"owner": {
+"name": "AgentOps Toolkit",
+"email": "agentops@microsoft.com"
+},
+"plugins": [
+{
+"name": "agentops-toolkit",
+"source": "../../plugins/agentops",
+"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
+"version": "0.2.1",
+"keywords": [
+"agentops",
+"evaluation",
+"foundry",
+"copilot",
+"agent-skills",
+"ai-evaluation"
+],
+"license": "MIT",
+"repository": "https://github.com/Azure/agentops"
+}
+]
+}
58 changes: 29 additions & 29 deletions .github/plugin/marketplace.json
@@ -1,29 +1,29 @@
-{
-"name": "agentops",
-"metadata": {
-"description": "AgentOps Toolkit plugin marketplace — evaluation skills for Microsoft Foundry agents",
-"version": "1.0.0"
-},
-"owner": {
-"name": "AgentOps Toolkit",
-"email": "agentops@microsoft.com"
-},
-"plugins": [
-{
-"name": "agentops-toolkit",
-"source": "../../plugins/agentops",
-"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
-"version": "0.2.0",
-"keywords": [
-"agentops",
-"evaluation",
-"foundry",
-"copilot",
-"agent-skills",
-"ai-evaluation"
-],
-"license": "MIT",
-"repository": "https://github.com/Azure/agentops"
-}
-]
-}
+{
+"name": "agentops",
+"metadata": {
+"description": "AgentOps Toolkit plugin marketplace — evaluation skills for Microsoft Foundry agents",
+"version": "1.0.0"
+},
+"owner": {
+"name": "AgentOps Toolkit",
+"email": "agentops@microsoft.com"
+},
+"plugins": [
+{
+"name": "agentops-toolkit",
+"source": "../../plugins/agentops",
+"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
+"version": "0.2.1",
+"keywords": [
+"agentops",
+"evaluation",
+"foundry",
+"copilot",
+"agent-skills",
+"ai-evaluation"
+],
+"license": "MIT",
+"repository": "https://github.com/Azure/agentops"
+}
+]
+}
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,9 @@
 All notable changes to this project will be documented in this file.
 This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres to [Semantic Versioning](https://semver.org/).

+## [0.2.1] - 2026-05-26
+
+
 ## [Unreleased]

 ### Changed
@@ -15,6 +18,24 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
 - Updated the tutorials to prefer the interactive `agentops init` wizard,
 explain evaluator deployment separately from initialization, and include
 forced regression/fix loops for prompt and hosted agent paths.
+- Re-ask starter `agent` and `dataset` values during the first interactive
+`agentops init` run so tutorial users replace `my-agent:1` with their target.
+- Removed the interactive App Insights question from `agentops init`; runtime
+commands discover it from the Foundry project when possible, and
+`--appinsights-connection-string` remains available for explicit setup.
+- Made `workflow analyze` output use a lighter PowerShell-friendly summary,
+Markdown tables, and user-facing Foundry eval labels; also removed a
+non-actionable latency warning from the normal analysis output.
+- Made `workflow generate` next steps gentler for PowerShell and tutorial users:
+PR/watchdog-only output now asks for only the `dev` environment, explains
+that deploy setup can wait, and points users to Copilot-assisted GitHub/OIDC
+setup.
+
+### Fixed
+- **Doctor App Insights discovery.** The `azure_monitor` source now falls back
+to an App Insights `ApplicationId` from `APPLICATIONINSIGHTS_CONNECTION_STRING`
+or Foundry project telemetry discovery, so Doctor no longer reports runtime
+telemetry as unconfigured when Cockpit can already resolve App Insights.

 ## [0.2.0] - 2026-05-22

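The Doctor fix in this changelog resolves an App Insights `ApplicationId` from `APPLICATIONINSIGHTS_CONNECTION_STRING` or, failing that, from Foundry project telemetry discovery. A minimal sketch of that fallback chain follows. It assumes the usual Application Insights connection-string shape (semicolon-separated `Key=Value` pairs, which can carry an `ApplicationId` entry) and that an explicit `--appinsights-connection-string` value wins over the environment. `parse_connection_string`, `resolve_application_id`, and the `discover_from_foundry` callback are hypothetical names, not AgentOps APIs.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split `Key1=Value1;Key2=Value2` into a dict, skipping empty segments."""
    parts: Dict[str, str] = {}
    for segment in conn.split(";"):
        segment = segment.strip()
        if "=" not in segment:
            continue  # empty or malformed segment, e.g. from a trailing ';'
        key, value = segment.split("=", 1)  # values may themselves contain '='
        parts[key.strip()] = value.strip()
    return parts


def resolve_application_id(
    explicit_conn: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    discover_from_foundry: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[str]:
    """Hypothetical precedence: explicit value, then env var, then Foundry discovery."""
    env = os.environ if env is None else env
    for conn in (explicit_conn, env.get("APPLICATIONINSIGHTS_CONNECTION_STRING")):
        if conn:
            app_id = parse_connection_string(conn).get("ApplicationId")
            if app_id:
                return app_id
            # A connection string without ApplicationId falls through to the next source.
    if discover_from_foundry is not None:
        return discover_from_foundry()  # stand-in for the Foundry project lookup
    return None
```

Passing `env` and `discover_from_foundry` explicitly keeps the sketch testable without a live Azure environment.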
56 changes: 41 additions & 15 deletions docs/ci-github-actions.md
@@ -10,8 +10,8 @@ deployment wiring (azd, prompt-agent, or placeholder) and the eval runner.
 Generate the PR gate first: `agentops workflow generate --kinds pr`. Add
 DEV/QA/PROD after GitHub Environments and Azure OIDC are ready. Repos with
 `azure.yaml` use azd-backed deploys; Foundry prompt agents can use
-prompt-agent deploys and the official Microsoft Foundry AI Agent Evaluation
-runner when the dataset is compatible.
+prompt-agent deploys and the Microsoft Foundry AI Agent Evaluation runner when
+the dataset is compatible.

 The full scaffold ships five templates:

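The recommendation described in the hunk above (azd-backed deploys for repos with `azure.yaml`, prompt-agent deploys for Foundry prompt agents, and Microsoft Foundry AI Agent Evaluation only when the dataset is compatible) can be sketched as a small decision function. This is a simplified sketch under stated assumptions, not the AgentOps analyzer. `Recommendation`, `recommend`, and `recommend_for_repo` are hypothetical, the precedence order is assumed, and the runner identifiers `official-ai-agent-evaluation` and `agentops-local` are borrowed from the pre-change wording of this doc.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    deploy_mode: str  # "azd", "prompt-agent", or "placeholder"
    eval_runner: str  # "official-ai-agent-evaluation" or "agentops-local"


def recommend(
    has_azure_yaml: bool,
    is_foundry_prompt_agent: bool,
    dataset_compatible: bool,
    deploy_mode: Optional[str] = None,
) -> Recommendation:
    """Hypothetical, simplified take on the `--deploy-mode auto` recommendation."""
    if deploy_mode is None or deploy_mode == "auto":
        if has_azure_yaml:
            mode = "azd"  # assumed: the structural azd file wins
        elif is_foundry_prompt_agent:
            mode = "prompt-agent"
        else:
            mode = "placeholder"
    else:
        mode = deploy_mode  # an explicit mode is taken as-is
    # Microsoft-hosted eval only for prompt agents with a compatible dataset;
    # everything else falls back to the local AgentOps runner.
    if is_foundry_prompt_agent and dataset_compatible:
        runner = "official-ai-agent-evaluation"
    else:
        runner = "agentops-local"
    return Recommendation(mode, runner)


def recommend_for_repo(
    repo_root: Path, is_foundry_prompt_agent: bool, dataset_compatible: bool
) -> Recommendation:
    # Only checks for azure.yaml; the real analyzer also weighs other
    # structural files, CI folders, and README hints.
    return recommend(
        (repo_root / "azure.yaml").is_file(), is_foundry_prompt_agent, dataset_compatible
    )
```

The runner choice is kept independent of the deploy mode, so an azd repo that also hosts a compatible prompt agent could still be routed to the Microsoft-hosted eval.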
@@ -77,6 +77,32 @@ agentops workflow generate --kinds pr
 agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode auto --force
 ```

+## Copilot-assisted setup
+
+The GitHub setup spans repository creation, Azure OIDC, Actions variables,
+GitHub Environments, and branch protection. For a smoother first run, install
+the AgentOps workflow skill and hand this setup to Copilot:
+
+```bash
+agentops skills install --platform copilot
+```
+
+Then open Copilot and run `/skills`. Confirm `agentops-workflow` is loaded
+before continuing.
+
+When the skill is loaded, ask Copilot:
+
+```text
+Use the AgentOps workflow skill to get the generated AgentOps GitHub Actions
+workflows running end to end.
+
+This may be a new folder with no Git repo or GitHub remote yet. Create or
+connect the GitHub repo if needed, wire Azure OIDC and required Actions
+variables, create only the environments used by the generated workflows, show me
+the plan before changing GitHub or Azure, and call out anything that needs
+owner/admin permission.
+```
+
 ## Configuration walkthrough

 ### 1. Repository variables (OIDC)
@@ -89,7 +115,7 @@ In Settings → Secrets and variables → Actions → **Variables**, add:
 | `AZURE_TENANT_ID` | Azure AD tenant |
 | `AZURE_SUBSCRIPTION_ID` | Target subscription |
 | `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` | Foundry project URL (used by the eval step) |
-| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and the official AI Agent Evaluation runner |
+| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and Microsoft Foundry AI Agent Evaluation |
 | `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |

 Then on the Azure side, configure Workload Identity Federation
@@ -165,9 +191,9 @@ signals, and existing CI folders. README matches such as GPT-RAG, Live Voice, or
 AI Landing Zone are treated as hints; structural files drive the recommendation.
 `workflow generate --deploy-mode auto` uses the same recommendation, so the
 analysis and generated templates do not drift. The analyzer also reports the
-eval runner: `official-ai-agent-evaluation` for compatible Foundry prompt
-agents, otherwise `agentops-local`. If you omit `--deploy-mode`, the default is
-`auto`; the command output prints the selected effective mode, for example
+eval runner: Microsoft Foundry AI Agent Evaluation for compatible Foundry prompt
+agents, otherwise AgentOps local eval. If you omit `--deploy-mode`, the default
+is `auto`; the command output prints the selected effective mode, for example
 `azd (auto default)` or `placeholder (auto default)`.

 Use one of these modes:
@@ -237,20 +263,20 @@ Each deploy workflow does this:
 1. stages a candidate Foundry prompt-agent version from `prompt_file`;
 2. writes `.agentops/deployments/agentops.candidate.yaml` pointing at the
 candidate `name:version`;
-3. runs the official AI Agent Evaluation runner against that candidate version
+3. runs Microsoft Foundry AI Agent Evaluation against that candidate version
 when supported, or `agentops eval run` as the local fallback;
 4. runs `agentops doctor --evidence-pack` so the exact candidate has release evidence;
 5. records `.agentops/deployments/foundry-agent.json` as a CI artifact only
 after the gate passes.

 This keeps the invariant clear: **the evaluated agent version is the deployed
 agent version**. Foundry manages the candidate agent versions; AgentOps
-prepares the official-eval input under `.agentops/official-eval/` when that
-runner is selected, and always supplies the repo-side gate, deployment record,
-and Cockpit visibility.
+prepares the Microsoft Foundry eval input under `.agentops/official-eval/` when
+that runner is selected, and always supplies the repo-side gate, deployment
+record, and Cockpit visibility.

 Preview branches can temporarily route the generated GitHub workflow to a fork
-of the official eval action before an upstream action PR is merged:
+of the Microsoft Foundry eval action before an upstream action PR is merged:

 ```powershell
 $env:AGENTOPS_OFFICIAL_EVAL_ACTION = "placerda/ai-agent-evals@v3-beta"
@@ -370,9 +396,9 @@ contract to gate deploys:
 | `2` | Eval ran, one or more thresholds failed | ❌ fail (deploy never runs) |
 | `1` | Runtime / config error | ❌ fail |

-When `official-ai-agent-evaluation` is selected, the Microsoft action/task owns
-the eval job result. AgentOps still uploads the prepared input and metadata so
-the release has repo-side proof of what was evaluated.
+When Microsoft Foundry AI Agent Evaluation is selected, the Microsoft
+action/task owns the eval job result. AgentOps still uploads the prepared input
+and metadata so the release has repo-side proof of what was evaluated.

 ## Artifacts

@@ -383,7 +409,7 @@ Each workflow uploads (always - even on failure):
 - `cloud_evaluation.json` - present when using Foundry cloud evaluation;
 contains a deep link to the New Foundry Experience Evaluations page
 - `.agentops/official-eval/input.json`, `metadata.json`, and `result.json` -
-present when using the official AI Agent Evaluation runner
+present when using Microsoft Foundry AI Agent Evaluation
 - `evidence.json` and `evidence.md` - present in PR, PROD, and watchdog
 workflows after `agentops doctor --evidence-pack`

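A custom pipeline step that wraps `agentops eval run` could consume the exit-code contract in this doc roughly as follows. This is a minimal sketch: `interpret_exit_code` and `gate` are hypothetical helpers, and treating `0` as a pass is an assumption because that table row lies outside the hunks shown. When Microsoft Foundry AI Agent Evaluation is the selected runner, the Microsoft action/task owns the result instead.

```python
import subprocess
import sys
from typing import Sequence

# Codes 1 and 2 follow the documented contract; 0 = pass is an assumption
# because that row is not part of the shown hunk.
OUTCOMES = {
    0: "pass",
    1: "runtime/config error",
    2: "threshold failure",
}


def interpret_exit_code(code: int) -> str:
    """Map an eval exit code to a human-readable outcome."""
    return OUTCOMES.get(code, f"unexpected exit code {code}")


def gate(cmd: Sequence[str]) -> bool:
    """Run the eval command; return True only when the deploy may proceed."""
    result = subprocess.run(list(cmd), check=False)
    print(f"eval gate: {interpret_exit_code(result.returncode)}", file=sys.stderr)
    # Any non-zero code (threshold failure or runtime error) blocks the deploy.
    return result.returncode == 0
```

For example, `gate(["agentops", "eval", "run"])` returns `False` on exit code `2`, so a wrapper script can skip its deploy step.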
8 changes: 4 additions & 4 deletions docs/concepts.md
@@ -158,10 +158,10 @@ evidence outputs into a release gate.
 | Generic HTTP/JSON endpoint | No | Yes | Use local runner. |
 | Raw model deployment (`model:<name>`) | No | Yes | Use local runner. |

-For CI pipelines that only need a supported Foundry-native eval, prefer the
-official AI Agent Evaluation action or Azure DevOps extension. Use AgentOps when
-the repo also needs thresholds, baselines, local fallback, Doctor readiness,
-release evidence, or trace-to-regression review.
+For CI pipelines that only need a supported Foundry-native eval, prefer
+Microsoft Foundry AI Agent Evaluation. Use AgentOps when the repo also needs
+thresholds, baselines, local fallback, Doctor readiness, release evidence, or
+trace-to-regression review.

 ## Evaluation Scenarios

11 changes: 6 additions & 5 deletions docs/how-it-works.md
@@ -15,8 +15,9 @@ is the proof?** It:
 5. Writes release evidence with `agentops doctor --evidence-pack`.

 Foundry owns agent creation, deployment, runtime, traces, monitoring,
-red-teaming, datasets, and official evaluation drilldown. AgentOps references
-the candidate those tools produced and adds the repo-controlled release proof:
+red-teaming, datasets, and Microsoft-hosted evaluation drilldown. AgentOps
+references the candidate those tools produced and adds the repo-controlled
+release proof:
 config, gates, artifacts, PR reports, Doctor diagnostics, release evidence,
 trace-to-regression promotion, and Cockpit links back to Foundry/Azure Monitor.

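The invariant that the evaluated agent version is the deployed agent version reduces to comparing two `name:version` references. That is the form the candidate file points at, for example `my-agent:1`. The minimal sketch below assumes both references are already in hand. `AgentRef`, `parse_agent_ref`, and `assert_same_candidate` are hypothetical. Reading the references from the files under `.agentops/deployments/` is left out because their schema is not shown here.

```python
from typing import NamedTuple


class AgentRef(NamedTuple):
    name: str
    version: str


def parse_agent_ref(ref: str) -> AgentRef:
    """Parse a `name:version` reference such as `my-agent:1`."""
    # rpartition splits on the last colon, so the version is whatever follows it.
    name, sep, version = ref.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected name:version, got {ref!r}")
    return AgentRef(name, version)


def assert_same_candidate(evaluated: str, deployed: str) -> None:
    """Raise if the version about to be deployed is not the one that was evaluated."""
    ev = parse_agent_ref(evaluated)
    dep = parse_agent_ref(deployed)
    if ev != dep:
        raise RuntimeError(
            f"evaluated {ev.name}:{ev.version} but deploying {dep.name}:{dep.version}"
        )
```

Failing with an exception, rather than returning a flag, keeps a mismatched deploy from proceeding silently if a caller ignores the return value.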
@@ -542,9 +543,9 @@ The `execution: cloud` trade-offs (so you can decide consciously):

 For CI pipelines that only need a supported Foundry-native eval and do not need
 AgentOps artifacts, baselines, Doctor readiness, or release evidence, the
-official AI Agent Evaluation GitHub Action or Azure DevOps extension may be the
-cleaner entry point. AgentOps is the wrapper when the repo needs a release gate
-and proof pack around those signals.
+Microsoft Foundry AI Agent Evaluation GitHub Action or Azure DevOps extension
+may be the cleaner entry point. AgentOps is the wrapper when the repo needs a
+release gate and proof pack around those signals.

 Implementation lives in [src/agentops/pipeline/publisher.py](../src/agentops/pipeline/publisher.py)
 (Classic) and [src/agentops/pipeline/cloud_runner.py](../src/agentops/pipeline/cloud_runner.py)