Test: Validate single Foundry agent evaluation #78

## Objective

Validate that a single Foundry agent can be evaluated end-to-end with AgentOps and produce stable reports, thresholds, and comparison output.

## Scope

Use a remote Foundry agent target with `type: agent`, `hosting: foundry`, `execution_mode: remote`, and `endpoint.kind: foundry_agent`. For the bundle, default to `conversational_agent_baseline` for a plain assistant/Q&A agent, or use `agent_workflow_baseline` if the chosen agent already uses tools. Foundry agents are a supported remote target, and `agent_mode` is a valid Foundry-only dimension.
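A quick pre-flight check can catch a mis-scoped run config before any evaluation is executed. The sketch below is a minimal illustration, not AgentOps code: it assumes the run config has already been loaded into a plain dictionary with hypothetical top-level `target` and `bundle` keys, and the helper names `get_path` and `check_scope` are invented for this example. Only the field values come from the scope above.

```python
# Field values taken from the Scope section; the dictionary layout
# (top-level "target" and "bundle" keys) is an assumption for illustration.
REQUIRED_TARGET = {
    "type": "agent",
    "hosting": "foundry",
    "execution_mode": "remote",
    "endpoint.kind": "foundry_agent",
}
ALLOWED_BUNDLES = {"conversational_agent_baseline", "agent_workflow_baseline"}


def get_path(node, dotted):
    """Follow a dotted key path through nested dicts; return None if absent."""
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_scope(config):
    """Return a list of human-readable problems; an empty list means in scope."""
    problems = []
    target = config.get("target", {})
    for path, expected in REQUIRED_TARGET.items():
        actual = get_path(target, path)
        if actual != expected:
            problems.append(f"target.{path}: expected {expected!r}, got {actual!r}")
    bundle = config.get("bundle")
    if bundle not in ALLOWED_BUNDLES:
        problems.append(f"bundle: {bundle!r} is not one of {sorted(ALLOWED_BUNDLES)}")
    return problems


if __name__ == "__main__":
    example = {
        "target": {
            "type": "agent",
            "hosting": "foundry",
            "execution_mode": "remote",
            "endpoint": {"kind": "foundry_agent"},
        },
        "bundle": "conversational_agent_baseline",
    }
    print(check_scope(example))
```

Returning a list of problems instead of raising on the first mismatch makes it easier to log every config gap in one pass.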

## Tasks

- [ ] Run `agentops init`
- [ ] Create a dedicated run config for the selected Foundry agent
- [ ] Configure `agent_id`, judge `model`, and `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT`
- [ ] Execute `agentops eval run`
- [ ] Review `results.json` and `report.md`
- [ ] Make one small instruction/config change and rerun
- [ ] Run `agentops eval compare --runs <baseline>,latest`
- [ ] Log bugs, usability issues, and docs gaps
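The command sequence in the tasks above can be written down as data so the test session is easy to repeat and to log. This is a sketch under stated assumptions: the three commands are copied from the task list, while the helper names `missing_env` and `planned_commands` and the example run identifier are hypothetical. Nothing is executed here; the script only prints what would be run.

```python
import os
import shlex

# Environment variable named in the task list.
REQUIRED_ENV = ["AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"]


def missing_env(env=None):
    """Return required variables that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def planned_commands(baseline_run):
    """Build the CLI steps from the task list as argument lists.

    baseline_run stands in for the <baseline> placeholder in the issue;
    its real format is not specified there.
    """
    return [
        ["agentops", "init"],
        ["agentops", "eval", "run"],  # baseline run
        ["agentops", "eval", "run"],  # rerun after the small change
        ["agentops", "eval", "compare", "--runs", f"{baseline_run},latest"],
    ]


if __name__ == "__main__":
    for name in missing_env():
        print(f"missing environment variable: {name}")
    # "my-baseline-run" is a made-up identifier for illustration only.
    for argv in planned_commands("my-baseline-run"):
        print(shlex.join(argv))
```

Keeping the steps as argument lists means they can later be passed to `subprocess.run` unchanged, without shell quoting concerns.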

## Acceptance Criteria

- A Foundry agent can be evaluated successfully end-to-end
- The run produces `results.json` and `report.md`
- Threshold behavior is clear and repeatable
- Comparison output is understandable and useful for regression analysis
- Follow-up backlog items are captured
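The artifact criterion can be checked mechanically. The following is a minimal sketch that only confirms `results.json` exists and parses as JSON and that `report.md` exists and is non-empty. The location of the run output directory is an assumption (the issue does not say where AgentOps writes these files), and `check_run_artifacts` is a hypothetical helper name. It makes no claims about the schema of `results.json`.

```python
import json
import sys
from pathlib import Path


def check_run_artifacts(run_dir):
    """Return a list of problems with the two expected run artifacts."""
    run_dir = Path(run_dir)
    problems = []

    results = run_dir / "results.json"
    if not results.is_file():
        problems.append("results.json is missing")
    else:
        try:
            json.loads(results.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"results.json is not valid JSON: {exc}")

    report = run_dir / "report.md"
    if not report.is_file():
        problems.append("report.md is missing")
    elif not report.read_text(encoding="utf-8").strip():
        problems.append("report.md is empty")

    return problems


if __name__ == "__main__":
    # The run directory is passed by the tester; its default is an assumption.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    found = check_run_artifacts(directory)
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```

Running the same check on the baseline run and on the rerun gives a simple, repeatable signal for the first two criteria; threshold and comparison quality still need manual review.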

## Out of Scope

- Improving the agent itself
- Adding new evaluators in this sprint
- CI/CD integration for this scenario

