Objective
Validate that a single Foundry agent can be evaluated end-to-end with AgentOps and produce stable reports, thresholds, and comparison output.
Scope
Use a remote Foundry agent target with type: agent, hosting: foundry, execution_mode: remote, and endpoint.kind: foundry_agent. For the bundle, default to conversational_agent_baseline for a plain assistant/Q&A agent, or use agent_workflow_baseline if the chosen agent already uses tools. Foundry agents are a supported remote target, and agent_mode is a valid Foundry-only dimension.
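As a rough sketch of the target settings and bundle choice described above, the snippet below models them as a plain Python dictionary. The keys type, hosting, execution_mode, and endpoint.kind and the two bundle names come from the text. The nesting of endpoint and the helper names build_target and choose_bundle are assumptions made for illustration, not the AgentOps configuration schema.

```python
from typing import Dict


def choose_bundle(agent_uses_tools: bool) -> str:
    """Pick the bundle name following the rule stated in the Scope text."""
    if agent_uses_tools:
        return "agent_workflow_baseline"
    return "conversational_agent_baseline"


def build_target() -> Dict[str, object]:
    """Return the target settings named in the Scope text.

    The nested layout of "endpoint" is an assumption; only the
    key/value pairs themselves come from the source.
    """
    return {
        "type": "agent",
        "hosting": "foundry",
        "execution_mode": "remote",
        "endpoint": {"kind": "foundry_agent"},
    }


if __name__ == "__main__":
    print(build_target())
    print(choose_bundle(agent_uses_tools=False))
```

Keeping the bundle decision in one small function makes the default explicit: a plain assistant or Q&A agent gets conversational_agent_baseline unless the agent is known to use tools.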
Tasks
- Run agentops init
- Configure agent_id, judge model, and AZURE_AI_FOUNDRY_PROJECT_ENDPOINT
- Run agentops eval run
- Confirm that results.json and report.md are produced
- Run agentops eval compare --runs <baseline>,latest
Acceptance Criteria
- A Foundry agent can be evaluated successfully end-to-end
- The run produces results.json and report.md
- Threshold behavior is clear and repeatable
- Comparison output is understandable and useful for regression analysis
- Follow-up backlog items are captured
Out of Scope
- Improving the agent itself
- Adding new evaluators in this sprint
- CI/CD integration for this scenario
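
The command sequence and artifact check for this scenario can be sketched as a small pre-flight helper. The three commands, the environment variable name, and the two artifact file names come from the text. The helper names, the idea of a single run directory holding both artifacts, and the example baseline id are assumptions for illustration only. The sketch builds the command lines without executing them.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

REQUIRED_ENV = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"
EXPECTED_ARTIFACTS = ("results.json", "report.md")


def planned_commands(baseline: str) -> List[List[str]]:
    """Return the CLI invocations in order, as argv lists (not executed)."""
    return [
        ["agentops", "init"],
        ["agentops", "eval", "run"],
        ["agentops", "eval", "compare", "--runs", f"{baseline},latest"],
    ]


def missing_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [] if env.get(REQUIRED_ENV) else [REQUIRED_ENV]


def missing_artifacts(run_dir: str) -> List[str]:
    """List expected artifacts not found in run_dir (location is assumed)."""
    root = Path(run_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    # "baseline-run" is a placeholder id, not a real run name.
    for argv in planned_commands("baseline-run"):
        print(" ".join(argv))
    print("missing env:", missing_env())
```

Each argv list could be passed to subprocess.run once the environment check passes, and missing_artifacts could then be called on the run's output directory to confirm the acceptance criterion about results.json and report.md.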