
Test: Validate single Foundry agent evaluation #78

@placerda

Objective

Validate that a single Foundry agent can be evaluated end-to-end with AgentOps and produce stable reports, thresholds, and comparison output.

Scope

Target a remote Foundry agent configured with type: agent, hosting: foundry, execution_mode: remote, and endpoint.kind: foundry_agent. For the bundle, default to conversational_agent_baseline when the agent is a plain assistant or Q&A agent; use agent_workflow_baseline if the chosen agent already uses tools. Foundry agents are a supported remote target, and agent_mode is a valid dimension that applies to Foundry only.
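To make the scope concrete, here is a minimal Python sketch of the target settings and the bundle choice described above. It is illustrative only: the setting names and values come from this issue, but the nesting of the dictionary, the placement of `agent_id`, and the helper names `choose_bundle` and `build_target` are assumptions, not the AgentOps config schema.

```python
def choose_bundle(agent_uses_tools: bool) -> str:
    """Pick the bundle named in the scope: workflow baseline for tool-using agents."""
    if agent_uses_tools:
        return "agent_workflow_baseline"
    return "conversational_agent_baseline"


def build_target(agent_id: str) -> dict:
    """Return the target settings listed in the scope as a plain dict.

    The nesting (endpoint as a sub-dict, agent_id inside it) is an
    assumption made for illustration, not the real AgentOps schema.
    """
    return {
        "type": "agent",
        "hosting": "foundry",
        "execution_mode": "remote",
        "endpoint": {"kind": "foundry_agent", "agent_id": agent_id},
    }


if __name__ == "__main__":
    # "my-agent-id" is a placeholder, not a real agent identifier.
    print(choose_bundle(agent_uses_tools=False))
    print(build_target("my-agent-id"))
```

Keeping the bundle choice in one small function makes it easy to record in the test notes which bundle was used and why.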

Tasks

  • Run agentops init
  • Create a dedicated run config for the selected Foundry agent
  • Configure agent_id, judge model, and AZURE_AI_FOUNDRY_PROJECT_ENDPOINT
  • Execute agentops eval run
  • Review results.json and report.md
  • Make one small instruction/config change and rerun
  • Run agentops eval compare --runs <baseline>,latest
  • Log bugs, usability issues, and docs gaps
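The task list above could be scripted roughly as follows. This is a hedged sketch: the commands are the ones named in this issue, with no extra flags added, and the helpers `missing_env` and `planned_commands` are hypothetical names. The sketch only builds the command lists and checks the environment variable; it does not run anything.

```python
import os

# Environment variable named in the task list.
REQUIRED_ENV = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def missing_env(environ=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in (REQUIRED_ENV,) if not env.get(name)]


def planned_commands(baseline_run: str) -> list:
    """Return the CLI invocations from the task list, in order.

    baseline_run stands in for the <baseline> placeholder; its format
    is not specified in this issue.
    """
    return [
        ["agentops", "init"],
        ["agentops", "eval", "run"],  # baseline run
        ["agentops", "eval", "run"],  # rerun after the small change
        ["agentops", "eval", "compare", "--runs", f"{baseline_run},latest"],
    ]


if __name__ == "__main__":
    for name in missing_env():
        print(f"missing environment variable: {name}")
    for cmd in planned_commands("<baseline>"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the run config is in place; checking the environment first avoids a failed run caused only by a missing endpoint.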

Acceptance Criteria

  • A Foundry agent can be evaluated successfully end-to-end
  • The run produces results.json and report.md
  • Threshold behavior is clear and repeatable
  • Comparison output is understandable and useful for regression analysis
  • Follow-up backlog items are captured
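One way to check the artifact and threshold criteria above is sketched below. The layout of results.json is not described in this issue, so the sketch assumes a flat mapping of metric name to score and assumes that higher scores are better; the metric name `relevance` and the helper names are hypothetical.

```python
from pathlib import Path

# Artifact names taken from the acceptance criteria.
EXPECTED_ARTIFACTS = ("results.json", "report.md")


def missing_artifacts(run_dir) -> list:
    """Return the expected artifact names that are absent from run_dir."""
    return [n for n in EXPECTED_ARTIFACTS if not (Path(run_dir) / n).is_file()]


def threshold_verdicts(scores: dict, thresholds: dict) -> dict:
    """Map each thresholded metric to True (pass) or False (fail).

    Assumes higher is better; a metric missing from scores counts as a fail.
    """
    return {
        metric: scores.get(metric, float("-inf")) >= minimum
        for metric, minimum in thresholds.items()
    }


def is_repeatable(first: dict, second: dict, thresholds: dict) -> bool:
    """True when two runs produce the same pass/fail verdict per metric."""
    return threshold_verdicts(first, thresholds) == threshold_verdicts(second, thresholds)


if __name__ == "__main__":
    thresholds = {"relevance": 0.7}  # hypothetical metric and threshold
    print(is_repeatable({"relevance": 0.80}, {"relevance": 0.75}, thresholds))
```

Comparing verdicts rather than raw scores tolerates small score variation between runs while still flagging a run that crosses a threshold.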

Out of Scope

  • Improving the agent itself
  • Adding new evaluators in this sprint
  • CI/CD integration for this scenario

Metadata


Labels

sprint3: Sprint 3 test scenarios
