fix: Foundry E2E quick-wins (#111, #115, #116, #117) #118

Merged
Dongbumlee merged 1 commit into develop from fix/foundry-eval-quick-wins on Apr 29, 2026

@Dongbumlee
Collaborator

Bundles the four lowest-risk findings from the Foundry agent end-to-end validation in #78 into a single PR. No backend changes; every fix touches only docs, templates, or the CLI surface.

Fixes

Out of scope (separate follow-up issues)

Validation

$ python3 -m pytest tests/ -x -q
285 passed in 1.27s

Adds 3 new unit tests for `agentops config validate` (success, missing file, invalid schema).

Manual smoke:

$ agentops config validate -c .agentops/run-foundry-conversational.yaml
✅ ... is valid (version=1, target.type=agent, target.hosting=foundry, target.execution_mode=remote, bundle=conversational_agent_baseline, dataset=smoke-conversational)
$ agentops config validate -c does/not/exist.yaml; echo $?
Error: config file not found: does/not/exist.yaml
1
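
The behaviour exercised above (load the file, validate it, print a one-line summary or an error, exit non-zero on failure) can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: `Target`, `RunConfig`, `load_run_config`, and `validate` are hypothetical names, plain dataclasses stand in for the Pydantic models the runner uses, and `json.loads` stands in for a YAML loader so the sketch has no third-party dependencies.

```python
import json
import sys
from dataclasses import dataclass
from pathlib import Path


class ConfigError(Exception):
    """Raised when a run config is missing or does not match the schema."""


@dataclass
class Target:
    # Hypothetical subset of the schema, named after the summary fields above.
    type: str
    hosting: str
    execution_mode: str


@dataclass
class RunConfig:
    version: int
    target: Target
    bundle: str
    dataset: str


def load_run_config(path: str) -> RunConfig:
    """Load and structurally check a run config, raising ConfigError on failure."""
    file = Path(path)
    if not file.is_file():
        raise ConfigError(f"config file not found: {path}")
    try:
        # Assumption: JSON stands in for YAML to keep the sketch stdlib-only.
        raw = json.loads(file.read_text(encoding="utf-8"))
        return RunConfig(
            version=int(raw["version"]),
            target=Target(**raw["target"]),
            bundle=raw["bundle"],
            dataset=raw["dataset"],
        )
    except (ValueError, KeyError, TypeError) as exc:
        # Dataclasses only catch missing or unexpected keys; Pydantic would
        # additionally check field types and report each failing field.
        raise ConfigError(f"invalid config {path}: {exc!r}") from exc


def validate(path: str) -> int:
    """Return a process exit code: 0 if the config is valid, 1 otherwise."""
    try:
        cfg = load_run_config(path)
    except ConfigError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    print(
        f"{path} is valid (version={cfg.version}, "
        f"target.type={cfg.target.type}, target.hosting={cfg.target.hosting}, "
        f"target.execution_mode={cfg.target.execution_mode}, "
        f"bundle={cfg.bundle}, dataset={cfg.dataset})"
    )
    return 0
```

A CLI entry point would pass the return value of `validate(path)` to `sys.exit`, which is what gives the `echo $?` output of `1` for a missing file. The three test cases listed above map onto the three paths through `validate`: a well-formed file, a nonexistent path, and a file with a missing or malformed field.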

Bundles the four lowest-risk findings from the Foundry agent
end-to-end validation in #78. No backend changes; all fixes are docs,
templates, or CLI surface.

* #115 - implement `agentops config validate -c <path>`. Loads the
  given run.yaml through the same Pydantic models the runner uses;
  prints a one-line success summary or a clear validation error and
  exits non-zero on failure. Replaces the previous '(planned)' stub.

* #116 - raise default `avg_latency_seconds` thresholds in the four
  baseline bundles to reflect Foundry cloud-evaluation reality
  (~15-25s/row including the judge-evaluator pipeline). 10s -> 30s
  for conversational/model/rag/safe; 15s -> 45s for agent_workflow.
  Tutorial tables updated to match.

* #117 - document in docs/bundles.md that the agent_workflow_baseline
  is only meaningful against agents that actually expose tool
  definitions, and that avg_latency_seconds measures the full
  pipeline rather than the agent alone.

* #111 - document in docs/tutorial-basic-foundry-agent.md and
  docs/how-it-works.md that omitting the `:version` suffix on a
  named Foundry agent resolves to 'latest' at run time, and that
  CI / baseline runs should pin a version for reproducibility.
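
To make the #116 reasoning concrete, here is a small worked example of why the old defaults failed. It is a sketch under assumptions: the per-row latencies are illustrative values picked from the ~15-25s range quoted above, the function names are hypothetical, and the pass rule (average less than or equal to the threshold) is assumed rather than taken from the evaluator.

```python
from statistics import mean


def avg_latency_seconds(row_latencies: list[float]) -> float:
    """Average end-to-end latency per row, in seconds."""
    return mean(row_latencies)


def passes_threshold(row_latencies: list[float], threshold_seconds: float) -> bool:
    # Assumption: a run passes when the average is at or below the threshold.
    return avg_latency_seconds(row_latencies) <= threshold_seconds


# Illustrative rows within the ~15-25s/row range (agent plus judge-evaluator pipeline).
rows = [15.0, 20.0, 25.0]

old_default = 10.0  # previous threshold for the 10s bundles
new_default = 30.0  # threshold after this change

print(avg_latency_seconds(rows))             # 20.0
print(passes_threshold(rows, old_default))   # False
print(passes_threshold(rows, new_default))   # True
```

With these numbers, a healthy run averaging 20s per row fails a 10s threshold purely because of pipeline overhead, while the 30s default leaves headroom above the upper end of the quoted range.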

Tests: 285 passed (3 new for config validate).
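
The version-pinning guidance in the #111 item can be illustrated with a minimal sketch. The `name:version` reference format follows the `:version` suffix described above; `parse_agent_ref` and `is_pinned` are hypothetical helpers written for this example, not functions from the codebase, and actual resolution of 'latest' happens on the Foundry side at run time.

```python
def parse_agent_ref(ref: str) -> tuple[str, str]:
    """Split 'name' or 'name:version' into (name, version).

    A reference without a ':version' suffix is reported as 'latest',
    mirroring the run-time behaviour described above.
    """
    name, sep, version = ref.partition(":")
    if not name:
        raise ValueError(f"agent reference has no name: {ref!r}")
    if sep and not version:
        raise ValueError(f"agent reference has an empty version: {ref!r}")
    if not sep:
        version = "latest"
    return name, version


def is_pinned(ref: str) -> bool:
    """True when the reference names an explicit version other than 'latest'."""
    return parse_agent_ref(ref)[1] != "latest"


print(parse_agent_ref("my-agent"))    # ('my-agent', 'latest')
print(parse_agent_ref("my-agent:3"))  # ('my-agent', '3')
print(is_pinned("my-agent"))          # False
print(is_pinned("my-agent:3"))        # True
```

A CI job could run a check like `is_pinned` over its run configs and fail early when a baseline run would otherwise float to whatever version is latest on that day.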

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>