18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,24 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres
- Updated the tutorials to prefer the interactive `agentops init` wizard,
explain evaluator deployment separately from initialization, and include
forced regression/fix loops for prompt and hosted agent paths.
- Made the first interactive `agentops init` run re-ask the starter `agent` and
`dataset` values so tutorial users replace `my-agent:1` with their target.
- Removed the interactive App Insights question from `agentops init`; runtime
commands discover it from the Foundry project when possible, and
`--appinsights-connection-string` remains available for explicit setup.
- Made `workflow analyze` output use a lighter PowerShell-friendly summary,
Markdown tables, and user-facing Foundry eval labels; also removed a
non-actionable latency warning from the normal analysis output.
- Made `workflow generate` next steps gentler for PowerShell and tutorial users:
PR/watchdog-only output now asks for only the `dev` environment, explains
that deploy setup can wait, and points users to Copilot-assisted GitHub/OIDC
setup.

### Fixed
- **Doctor App Insights discovery.** The `azure_monitor` source now falls back
to an App Insights `ApplicationId` from `APPLICATIONINSIGHTS_CONNECTION_STRING`
or Foundry project telemetry discovery, so Doctor no longer reports runtime
telemetry as unconfigured when Cockpit can already resolve App Insights.

## [0.2.0] - 2026-05-22

56 changes: 41 additions & 15 deletions docs/ci-github-actions.md
@@ -10,8 +10,8 @@ deployment wiring (azd, prompt-agent, or placeholder) and the eval runner.
Generate the PR gate first: `agentops workflow generate --kinds pr`. Add
DEV/QA/PROD after GitHub Environments and Azure OIDC are ready. Repos with
`azure.yaml` use azd-backed deploys; Foundry prompt agents can use
prompt-agent deploys and the official Microsoft Foundry AI Agent Evaluation
runner when the dataset is compatible.
prompt-agent deploys and the Microsoft Foundry AI Agent Evaluation runner when
the dataset is compatible.

The full scaffold ships five templates:

@@ -77,6 +77,32 @@ agentops workflow generate --kinds pr
agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode auto --force
```

## Copilot-assisted setup

The GitHub setup spans repository creation, Azure OIDC, Actions variables,
GitHub Environments, and branch protection. For a smoother first run, install
the AgentOps workflow skill and hand this setup to Copilot:

```bash
agentops skills install --platform copilot
```

Then open Copilot and run `/skills`. Confirm `agentops-workflow` is loaded
before continuing.

When the skill is loaded, ask Copilot:

```text
Use the AgentOps workflow skill to get the generated AgentOps GitHub Actions
workflows running end to end.

This may be a new folder with no Git repo or GitHub remote yet. Create or
connect the GitHub repo if needed, wire Azure OIDC and required Actions
variables, create only the environments used by the generated workflows, show me
the plan before changing GitHub or Azure, and call out anything that needs
owner/admin permission.
```

## Configuration walkthrough

### 1. Repository variables (OIDC)
@@ -89,7 +115,7 @@ In Settings → Secrets and variables → Actions → **Variables**, add:
| `AZURE_TENANT_ID` | Azure AD tenant |
| `AZURE_SUBSCRIPTION_ID` | Target subscription |
| `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` | Foundry project URL (used by the eval step) |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and the official AI Agent Evaluation runner |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and Microsoft Foundry AI Agent Evaluation |
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |

Then on the Azure side, configure Workload Identity Federation
@@ -165,9 +191,9 @@ signals, and existing CI folders. README matches such as GPT-RAG, Live Voice, or
AI Landing Zone are treated as hints; structural files drive the recommendation.
`workflow generate --deploy-mode auto` uses the same recommendation, so the
analysis and generated templates do not drift. The analyzer also reports the
eval runner: `official-ai-agent-evaluation` for compatible Foundry prompt
agents, otherwise `agentops-local`. If you omit `--deploy-mode`, the default is
`auto`; the command output prints the selected effective mode, for example
eval runner: Microsoft Foundry AI Agent Evaluation for compatible Foundry prompt
agents, otherwise AgentOps local eval. If you omit `--deploy-mode`, the default
is `auto`; the command output prints the selected effective mode, for example
`azd (auto default)` or `placeholder (auto default)`.
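
The `auto` rule described above can be sketched roughly as follows. This is a minimal illustration, not the AgentOps implementation: only the `azure.yaml` check and the `(auto default)` suffix come from this page. The function names, the `is_foundry_prompt_agent` flag, and the precedence of `azd` over `prompt-agent` are assumptions made for the example.

```python
from pathlib import Path


def recommend_deploy_mode(repo_root: Path, is_foundry_prompt_agent: bool = False) -> str:
    """Pick a deploy mode from structural signals (hypothetical helper)."""
    # Structural files drive the recommendation: azure.yaml means azd-backed deploys.
    if (repo_root / "azure.yaml").is_file():
        return "azd"
    # Assumption: the caller already knows the target is a Foundry prompt agent.
    if is_foundry_prompt_agent:
        return "prompt-agent"
    # Nothing recognizable: fall back to the placeholder wiring.
    return "placeholder"


def describe_effective_mode(mode: str, explicit: bool) -> str:
    """Format the mode the way the command output labels an auto-selected mode."""
    return mode if explicit else f"{mode} (auto default)"
```

Calling `describe_effective_mode(recommend_deploy_mode(Path(".")), explicit=False)` in a repo that contains `azure.yaml` would yield the `azd (auto default)` label shown above.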

Use one of these modes:
@@ -237,20 +263,20 @@ Each deploy workflow does this:
1. stages a candidate Foundry prompt-agent version from `prompt_file`;
2. writes `.agentops/deployments/agentops.candidate.yaml` pointing at the
candidate `name:version`;
3. runs the official AI Agent Evaluation runner against that candidate version
3. runs Microsoft Foundry AI Agent Evaluation against that candidate version
when supported, or `agentops eval run` as the local fallback;
4. runs `agentops doctor --evidence-pack` so the exact candidate has release evidence;
5. records `.agentops/deployments/foundry-agent.json` as a CI artifact only
after the gate passes.

This keeps the invariant clear: **the evaluated agent version is the deployed
agent version**. Foundry manages the candidate agent versions; AgentOps
prepares the official-eval input under `.agentops/official-eval/` when that
runner is selected, and always supplies the repo-side gate, deployment record,
and Cockpit visibility.
prepares the Microsoft Foundry eval input under `.agentops/official-eval/` when
that runner is selected, and always supplies the repo-side gate, deployment
record, and Cockpit visibility.

Preview branches can temporarily route the generated GitHub workflow to a fork
of the official eval action before an upstream action PR is merged:
of the Microsoft Foundry eval action before an upstream action PR is merged:

```powershell
$env:AGENTOPS_OFFICIAL_EVAL_ACTION = "placerda/ai-agent-evals@v3-beta"
@@ -370,9 +396,9 @@ contract to gate deploys:
| `2` | Eval ran, one or more thresholds failed | ❌ fail (deploy never runs) |
| `1` | Runtime / config error | ❌ fail |

When `official-ai-agent-evaluation` is selected, the Microsoft action/task owns
the eval job result. AgentOps still uploads the prepared input and metadata so
the release has repo-side proof of what was evaluated.
When Microsoft Foundry AI Agent Evaluation is selected, the Microsoft
action/task owns the eval job result. AgentOps still uploads the prepared input
and metadata so the release has repo-side proof of what was evaluated.
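
For the local runner, the exit-code contract in the table can be read as a small decision function. The sketch below is illustrative only: codes `1` and `2` come from the rows shown here, while treating `0` as a pass is an assumption (that row is outside this hunk), and `GateOutcome`, `classify_exit_code`, and `deploy_allowed` are hypothetical names.

```python
from enum import Enum


class GateOutcome(Enum):
    PASS = "pass"
    THRESHOLD_FAILURE = "threshold_failure"
    ERROR = "error"


def classify_exit_code(code: int) -> GateOutcome:
    """Map an eval process exit code to a gate outcome."""
    if code == 0:
        # Assumption: 0 means the eval ran and every threshold passed.
        return GateOutcome.PASS
    if code == 2:
        # Eval ran, one or more thresholds failed.
        return GateOutcome.THRESHOLD_FAILURE
    # 1 is a runtime/config error; any other code is treated the same way here.
    return GateOutcome.ERROR


def deploy_allowed(code: int) -> bool:
    """The deploy job should only run when the gate passed."""
    return classify_exit_code(code) is GateOutcome.PASS
```

Keeping threshold failures and runtime errors as separate outcomes lets a CI summary tell "the agent regressed" apart from "the pipeline is misconfigured", even though both block the deploy.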

## Artifacts

@@ -383,7 +409,7 @@ Each workflow uploads (always - even on failure):
- `cloud_evaluation.json` - present when using Foundry cloud evaluation;
contains a deep link to the New Foundry Experience Evaluations page
- `.agentops/official-eval/input.json`, `metadata.json`, and `result.json` -
present when using the official AI Agent Evaluation runner
present when using Microsoft Foundry AI Agent Evaluation
- `evidence.json` and `evidence.md` - present in PR, PROD, and watchdog
workflows after `agentops doctor --evidence-pack`

8 changes: 4 additions & 4 deletions docs/concepts.md
@@ -158,10 +158,10 @@ evidence outputs into a release gate.
| Generic HTTP/JSON endpoint | No | Yes | Use local runner. |
| Raw model deployment (`model:<name>`) | No | Yes | Use local runner. |

For CI pipelines that only need a supported Foundry-native eval, prefer the
official AI Agent Evaluation action or Azure DevOps extension. Use AgentOps when
the repo also needs thresholds, baselines, local fallback, Doctor readiness,
release evidence, or trace-to-regression review.
For CI pipelines that only need a supported Foundry-native eval, prefer
Microsoft Foundry AI Agent Evaluation. Use AgentOps when the repo also needs
thresholds, baselines, local fallback, Doctor readiness, release evidence, or
trace-to-regression review.

## Evaluation Scenarios

11 changes: 6 additions & 5 deletions docs/how-it-works.md
@@ -15,8 +15,9 @@ is the proof?** It:
5. Writes release evidence with `agentops doctor --evidence-pack`.

Foundry owns agent creation, deployment, runtime, traces, monitoring,
red-teaming, datasets, and official evaluation drilldown. AgentOps references
the candidate those tools produced and adds the repo-controlled release proof:
red-teaming, datasets, and Microsoft-hosted evaluation drilldown. AgentOps
references the candidate those tools produced and adds the repo-controlled
release proof:
config, gates, artifacts, PR reports, Doctor diagnostics, release evidence,
trace-to-regression promotion, and Cockpit links back to Foundry/Azure Monitor.

@@ -542,9 +543,9 @@ The `execution: cloud` trade-offs (so you can decide consciously):

For CI pipelines that only need a supported Foundry-native eval and do not need
AgentOps artifacts, baselines, Doctor readiness, or release evidence, the
official AI Agent Evaluation GitHub Action or Azure DevOps extension may be the
cleaner entry point. AgentOps is the wrapper when the repo needs a release gate
and proof pack around those signals.
Microsoft Foundry AI Agent Evaluation GitHub Action or Azure DevOps extension
may be the cleaner entry point. AgentOps is the wrapper when the repo needs a
release gate and proof pack around those signals.

Implementation lives in [src/agentops/pipeline/publisher.py](../src/agentops/pipeline/publisher.py)
(Classic) and [src/agentops/pipeline/cloud_runner.py](../src/agentops/pipeline/cloud_runner.py)
43 changes: 28 additions & 15 deletions docs/tutorial-end-to-end.md
@@ -225,11 +225,21 @@ Answer the prompts as the wizard asks them:
| Foundry project endpoint | `https://<resource>.services.ai.azure.com/api/projects/<project>` |
| Agent | The value in `$env:TRAVEL_AGENT_TARGET`, such as `travel-agent:1` or `http://127.0.0.1:8000/chat` |
| Dataset path | `.agentops/data/travel-smoke.jsonl` |
| Application Insights connection string | Paste it if you have one, or press Enter to let AgentOps auto-discover/leave it blank |

The wizard does not ask for App Insights. Later runtime commands such as eval,
Doctor, and Cockpit use the Foundry project endpoint to ask the Azure AI
Projects SDK for the App Insights resource attached to that Foundry project. If
discovery is unavailable and you want to force a value, run
`agentops init --appinsights-connection-string "<connection-string>"` or set
`APPLICATIONINSIGHTS_CONNECTION_STRING` manually in `.azure/dev/.env`.
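
To check whether a forced value is actually present before relying on discovery, a lookup along these lines may help. It is a sketch under stated assumptions: the `.env` file is treated as plain `KEY=VALUE` lines, the process environment is given precedence over the file, and `parse_dotenv` and `forced_connection_string` are hypothetical helpers rather than AgentOps functions.

```python
import os
from typing import Dict, Mapping, Optional

ENV_KEY = "APPLICATIONINSIGHTS_CONNECTION_STRING"


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: connection strings contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def forced_connection_string(
    dotenv_text: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return an explicitly configured connection string, or None if absent."""
    environ = os.environ if environ is None else environ
    # Assumption: a value in the process environment wins over the .env file.
    return environ.get(ENV_KEY) or parse_dotenv(dotenv_text).get(ENV_KEY)
```

A `None` result means no value was forced, so the runtime commands would depend on discovery through the Foundry project.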

If the first run shows starter defaults such as `Agent [my-agent:1]` or
`Dataset path [.agentops/data/smoke.jsonl]`, replace them with your Travel Agent
target and dataset. Those defaults only come from the scaffolded starter file.

The wizard saves `agent` and `dataset` to `agentops.yaml`. It saves the Foundry
project endpoint and App Insights connection string to `.azure/dev/.env`, which
is git-ignored and compatible with azd.
project endpoint to `.azure/dev/.env`, which is git-ignored and compatible with
azd. If you force an App Insights connection string later, it is saved there too.

For a hosted HTTP endpoint, add the endpoint protocol fields:

@@ -252,13 +262,14 @@ Expected result:

| Agent target | Runner |
|---|---|
| `agent: name:version` | `official-ai-agent-evaluation` |
| `agent: name:version` | Microsoft Foundry AI Agent Evaluation |
| `agent: https://...` | `agentops-local` |
| `agent: model:<deployment>` | `agentops-local` |

This is the key alignment rule. Foundry-native prompt agents use the official
runner where possible. AgentOps keeps the local path for hosted endpoints,
models, unsupported evaluator mappings, and repo-specific threshold evidence.
This is the key alignment rule. Foundry-native prompt agents use the Microsoft
Foundry AI Agent Evaluation action/task where possible. AgentOps keeps the local
path for hosted endpoints, models, unsupported evaluator mappings, and
repo-specific threshold evidence.
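
The mapping in the table can be approximated with a small classifier. This is a rough sketch, not the analyzer's logic: the runner identifiers are the ones that appear in this page's text, while the `name:version` regular expression and the default to the local runner for unrecognized targets are assumptions. It also ignores dataset and evaluator compatibility, which the real selection takes into account.

```python
import re

OFFICIAL_RUNNER = "official-ai-agent-evaluation"
LOCAL_RUNNER = "agentops-local"

# Assumption: a Foundry prompt-agent target looks like "<name>:<integer version>".
_NAME_VERSION = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*:\d+$")


def expected_runner(agent: str) -> str:
    """Guess which eval runner a given `agent` value would select."""
    target = agent.strip()
    # Hosted HTTP endpoints stay on the local runner.
    if target.startswith(("http://", "https://")):
        return LOCAL_RUNNER
    # Raw model deployments ("model:<deployment>") also stay local.
    if target.startswith("model:"):
        return LOCAL_RUNNER
    # Foundry prompt agents ("name:version") use the Microsoft Foundry runner.
    if _NAME_VERSION.match(target):
        return OFFICIAL_RUNNER
    # Anything unrecognized falls back to the local path in this sketch.
    return LOCAL_RUNNER
```

Checking `model:` before the `name:version` pattern matters because a value such as `model:4` would otherwise match the prompt-agent pattern.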

## 5. Run the first eval

@@ -287,10 +298,10 @@ Before running that workflow, set the CI variable:
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
```

This value is not an `agentops init` answer. It tells the official eval runner
which model deployment should judge responses.
This value is not an `agentops init` answer. It tells the Microsoft Foundry AI
Agent Evaluation runner which model deployment should judge responses.

The generated workflow prepares official eval input under:
The generated workflow prepares Microsoft Foundry eval input under:

```text
.agentops/official-eval/
@@ -331,8 +342,8 @@ and rerun the same gate.
version such as `travel-agent:3`, re-run `agentops init --reconfigure`, and
run the pipeline again.

This exercises Foundry prompt versioning, the official AI Agent Evaluation
runner, and AgentOps evidence for the exact version under release review.
This exercises Foundry prompt versioning, Microsoft Foundry AI Agent Evaluation,
and AgentOps evidence for the exact version under release review.

### Hosted/HTTP regression

@@ -388,7 +399,8 @@ The generated workflows are intentionally boring:
Foundry and Azure Monitor own live observability. AgentOps only checks whether
the repo and runtime are wired to those signals.

Set the Application Insights connection string in the active azd env:
If runtime discovery does not find the connected App Insights resource, set the
connection string in the active azd env:

```powershell
agentops init show --reveal-secrets
@@ -482,7 +494,7 @@ agentops cockpit --workspace .
Use Cockpit as the local command center:

- Foundry connection and deep links;
- official eval or local eval gate status;
- Microsoft Foundry eval or AgentOps local eval gate status;
- Doctor findings;
- release evidence;
- local eval history;
@@ -496,7 +508,8 @@ You are ready for a release review when:

- The agent target is explicit in `agentops.yaml`.
- CI uses the expected runner for the target.
- Eval results or official eval metadata are attached to the workflow artifact.
- Eval results or Microsoft Foundry eval metadata are attached to the workflow
artifact.
- The workshop includes one deliberate regression and one fixed rerun, either
through Foundry prompt versions or AgentOps local baseline comparison.
- `agentops doctor --evidence-pack` writes `evidence.md`.
18 changes: 14 additions & 4 deletions docs/tutorial-hosted-agent-quickstart.md
@@ -171,7 +171,7 @@ You need:
| Foundry project endpoint | optional, but recommended for links and evaluators |
| Azure OpenAI endpoint | `https://<resource>.openai.azure.com`, used later by local AI-assisted evaluators |
| Evaluator model deployment | `gpt-4o-mini`, used later by local AI-assisted evaluators |
| Application Insights connection string | optional, but recommended |
| Application Insights connection string | optional later, for observability |

If the deployed endpoint needs a bearer token:

@@ -192,7 +192,17 @@ Answer the prompts as the wizard asks them:
| Foundry project endpoint | `https://<resource>.services.ai.azure.com/api/projects/<project>`, or press Enter if you are only testing the local endpoint |
| Agent | The value in `$env:TRAVEL_AGENT_ENDPOINT`, for example `http://127.0.0.1:8000/chat` |
| Dataset path | `.agentops/data/travel-smoke.jsonl` |
| Application Insights connection string | Paste it if you have one, or press Enter to let AgentOps auto-discover/leave it blank |

The wizard does not ask for App Insights. Later runtime commands such as eval,
Doctor, and Cockpit use the Foundry project endpoint to ask the Azure AI
Projects SDK for the App Insights resource attached to that Foundry project. If
discovery is unavailable and you want to force a value, run
`agentops init --appinsights-connection-string "<connection-string>"` or set
`APPLICATIONINSIGHTS_CONNECTION_STRING` manually in `.azure/dev/.env`.

If the first run shows starter defaults such as `Agent [my-agent:1]` or
`Dataset path [.agentops/data/smoke.jsonl]`, replace them with the hosted Travel
Agent values above. Those defaults only come from the scaffolded starter file.

If you want an azd environment name other than the default `dev`, run
`agentops init --azd-env <name>`.
@@ -208,8 +218,8 @@ request_field: message
response_field: text
```

The Foundry project endpoint and App Insights connection string live in
`.azure/dev/.env`, not in source control.
The Foundry project endpoint lives in `.azure/dev/.env`, not in source control.
If you force an App Insights connection string later, it is saved there too.

For a deployed endpoint protected by a bearer token, add:
