Default avg_latency_seconds threshold (10s) is unrealistic for Foundry cloud eval

## Source
Discovered during E2E validation of #78.

## Problem
`bundles/conversational_agent_baseline.yaml` ships with:
```yaml
- evaluator: avg_latency_seconds
  criteria: "<="
  value: 10.0
```
But Foundry cloud evaluation latency includes the entire judge-evaluator pipeline. Real values measured against `FoundryAgent` on `gpt-5.1`:
- Run 1: 22.47s/row
- Run 2: 14.04s/row
Anyone who follows the Foundry tutorial and runs the smoke dataset against a real Foundry agent will hit immediate FAIL on the latency rule, even though scoring is fine across all 4 quality evaluators.

## Options

**A. Raise the default for Foundry-targeted bundles** to e.g. 30s. Same value can be too lax for HTTP/local backends though.

**B. Split into two metrics**:
- `agent_latency_seconds` — only the agent invocation time
- `eval_latency_seconds` — full pipeline including evaluators
…and apply tighter thresholds only to the agent metric.

**C. Document prominently** in the Foundry tutorial that `avg_latency_seconds` measures the full pipeline and users should tune the threshold per backend.

Recommendation: **C** as a stop-gap, **B** as the proper fix (also addresses #ISSUE_FAKED_LATENCY).

## Severity
Medium (every Foundry tutorial run currently fails on this rule)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default avg_latency_seconds threshold (10s) is unrealistic for Foundry cloud eval #116

Source

Problem

Options

Severity

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Default avg_latency_seconds threshold (10s) is unrealistic for Foundry cloud eval #116

Description

Source

Problem

Options

Severity

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions