fix: Foundry E2E quick-wins (#111, #115, #116, #117) by Dongbumlee · Pull Request #118 · Azure/agentops

Dongbumlee · 2026-04-29T21:21:09Z

Bundles the four lowest-risk findings from the Foundry agent end-to-end validation in #78 into a single PR. No backend changes; all fixes are docs, templates, or CLI surface.

Fixes

- Closes #115 ("Implement or hide 'agentops config validate' (currently a stub)"): implement agentops config validate -c <path>. Loads the given run.yaml through the same Pydantic models the runner uses, prints a one-line success summary or a clear error, and exits non-zero on failure.
- Closes #116 ("Default avg_latency_seconds threshold (10s) is unrealistic for Foundry cloud eval"): raise default avg_latency_seconds thresholds across baseline bundles to match Foundry cloud-evaluation reality (15–25 s/row observed):
  - 10 s → 30 s for conversational_agent_baseline, model_quality_baseline, rag_quality_baseline, safe_agent_baseline
  - 15 s → 45 s for agent_workflow_baseline
  - Tutorial tables updated to match.
- Closes #117 ("agent_workflow_baseline silently scores ~0 against tool-less agents"): document in docs/bundles.md that agent_workflow_baseline is only meaningful against agents that expose tool definitions, and that avg_latency_seconds measures the full pipeline rather than the agent alone.
- Closes #111 ("Document agent_id resolution when version is omitted"): document in the Foundry agent tutorial and docs/how-it-works.md that omitting the :version suffix on a named Foundry agent resolves to latest at run time, and that CI / baseline runs should pin a version.
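For reviewers, here is a minimal sketch of what the threshold change (#116) and the pinning guidance (#111) mean in practice. It is illustrative only and is not the agentops implementation: the dictionaries restate the defaults listed above, while `latency_passes`, `split_agent_id`, and `is_pinned` are hypothetical helpers, and the "at or below the threshold" comparison is an assumption.

```python
from typing import Optional, Tuple

# Default avg_latency_seconds thresholds (seconds) before and after this PR,
# copied from the list above.
OLD_DEFAULTS = {
    "conversational_agent_baseline": 10,
    "model_quality_baseline": 10,
    "rag_quality_baseline": 10,
    "safe_agent_baseline": 10,
    "agent_workflow_baseline": 15,
}
NEW_DEFAULTS = {
    "conversational_agent_baseline": 30,
    "model_quality_baseline": 30,
    "rag_quality_baseline": 30,
    "safe_agent_baseline": 30,
    "agent_workflow_baseline": 45,
}


def latency_passes(avg_latency_seconds: float, threshold: float) -> bool:
    """Hypothetical check: pass when the average is at or below the threshold."""
    return avg_latency_seconds <= threshold


def split_agent_id(agent_id: str) -> Tuple[str, Optional[str]]:
    """Split 'name:version' into (name, version); version is None when omitted."""
    name, sep, version = agent_id.partition(":")
    return name, (version if sep and version else None)


def is_pinned(agent_id: str) -> bool:
    """True when the agent id carries an explicit :version suffix."""
    return split_agent_id(agent_id)[1] is not None


if __name__ == "__main__":
    observed = 20.0  # inside the 15-25 s/row range reported for Foundry cloud eval
    bundle = "conversational_agent_baseline"
    print(latency_passes(observed, OLD_DEFAULTS[bundle]))  # False: old 10 s default
    print(latency_passes(observed, NEW_DEFAULTS[bundle]))  # True: new 30 s default
    print(is_pinned("my-agent"))    # False: would resolve to latest at run time
    print(is_pinned("my-agent:3"))  # True: reproducible for CI / baseline runs
```

A CI job could use a check like `is_pinned` to refuse unpinned agent ids before starting a baseline run; the agent name `my-agent` is a placeholder.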

Out of scope (separate follow-up issues)

The following require backend / runner changes and are kept for separate review:
- #110, "Foundry cloud-eval: agent response missing from report.md" (empty Response in report.md)
- #112, "agentops eval compare: per-run column uses baseline's threshold for all runs"
- #113, "Foundry cloud-eval: per-row latency is faked (run-level avg duplicated)"

Validation

$ python3 -m pytest tests/ -x -q
285 passed in 1.27s

3 new unit tests for config validate (success, missing file, invalid schema).

Manual smoke:

$ agentops config validate -c .agentops/run-foundry-conversational.yaml
✅ ... is valid (version=1, target.type=agent, target.hosting=foundry, target.execution_mode=remote, bundle=conversational_agent_baseline, dataset=smoke-conversational)
$ agentops config validate -c does/not/exist.yaml; echo $?
Error: config file not found: does/not/exist.yaml
1
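The control flow exercised by the smoke test can be sketched as follows. This is a stdlib-only approximation under stated assumptions, not the code in this PR: dataclasses stand in for the project's Pydantic models, the flat `bundle` / `dataset` fields and the nested `target` mapping are guessed from the summary line above, and `validate`, `parse_run_config`, and `ConfigError` are hypothetical names. The parser is passed in (`json.loads` in the demo) so the sketch runs without a YAML dependency.

```python
import json
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Mapping


class ConfigError(Exception):
    """Raised when the parsed config does not match the expected shape."""


@dataclass(frozen=True)
class Target:
    type: str
    hosting: str
    execution_mode: str


@dataclass(frozen=True)
class RunConfig:
    version: int
    target: Target
    bundle: str
    dataset: str


def parse_run_config(raw: Mapping[str, Any]) -> RunConfig:
    """Build a RunConfig, turning missing or mistyped fields into ConfigError."""
    try:
        target = raw["target"]
        return RunConfig(
            version=int(raw["version"]),
            target=Target(
                type=str(target["type"]),
                hosting=str(target["hosting"]),
                execution_mode=str(target["execution_mode"]),
            ),
            bundle=str(raw["bundle"]),
            dataset=str(raw["dataset"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ConfigError(f"invalid config: {exc!r}") from exc


def validate(path: str, load: Callable[[str], Any]) -> int:
    """Return an exit code: 0 if the file exists and validates, 1 otherwise."""
    file = Path(path)
    if not file.is_file():
        print(f"Error: config file not found: {path}", file=sys.stderr)
        return 1
    try:
        cfg = parse_run_config(load(file.read_text(encoding="utf-8")))
    except (ConfigError, ValueError) as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    print(
        f"{path} is valid (version={cfg.version}, target.type={cfg.target.type}, "
        f"target.hosting={cfg.target.hosting}, "
        f"target.execution_mode={cfg.target.execution_mode}, "
        f"bundle={cfg.bundle}, dataset={cfg.dataset})"
    )
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "run.json"
        good.write_text(json.dumps({
            "version": 1,
            "target": {"type": "agent", "hosting": "foundry", "execution_mode": "remote"},
            "bundle": "conversational_agent_baseline",
            "dataset": "smoke-conversational",
        }), encoding="utf-8")
        exit_code = validate(str(good), json.loads)
    sys.exit(exit_code)
```

Returning the exit code from `validate` instead of calling `sys.exit` inside it keeps the three cases named above (success, missing file, invalid schema) easy to unit test.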

Bundles together the four lowest-risk findings from the Foundry agent end-to-end validation in #78. No backend changes; all fixes are docs, templates, or CLI surface.

* #115 - implement `agentops config validate -c <path>`. Loads the given run.yaml through the same Pydantic models the runner uses; prints a one-line success summary or a clear validation error and exits non-zero on failure. Replaces the previous '(planned)' stub.
* #116 - raise default `avg_latency_seconds` thresholds in the baseline bundles to reflect Foundry cloud-evaluation reality (~15-25s/row including the judge-evaluator pipeline). 10s -> 30s for conversational/model/rag/safe; 15s -> 45s for agent_workflow. Tutorial tables updated to match.
* #117 - document in docs/bundles.md that the agent_workflow_baseline is only meaningful against agents that actually expose tool definitions, and that avg_latency_seconds measures the full pipeline rather than the agent alone.
* #111 - document in docs/tutorial-basic-foundry-agent.md and docs/how-it-works.md that omitting the `:version` suffix on a named Foundry agent resolves to 'latest' at run time, and that CI / baseline runs should pin a version for reproducibility.

Tests: 285 passed (3 new for config validate).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Dongbumlee merged commit 72b457e into develop Apr 29, 2026
12 checks passed

Dongbumlee added a commit that referenced this pull request Apr 29, 2026

Merge pull request #119 from Azure/revert/foundry-eval-quick-wins

45b324f

Revert "fix: Foundry E2E quick-wins (#118)"

Dongbumlee merged 1 commit into develop from fix/foundry-eval-quick-wins
