Add cost-protection pause for suspicious long-running sessions #2957

## Problem

A real WebUI session on the stable local runtime showed how a long-running agent turn can keep consuming model calls and time after it has likely entered a low-value recovery path. The observed turn was manually cancelled after about 6 minutes. By that point it had already made repeated model calls, context-compression attempts, and tool retries. The task had extracted the target article body but had not converged on writing the file or returning a final answer.

This is not a user-error problem. Users should not need to diagnose live agent logs to know whether a turn is productively working, retrying, compressing, or stuck in a high-cost recovery path.

## Desired behavior

WebUI should not automatically decide that the task failed. Instead, when objective high-risk signals accumulate, WebUI should pause before the next expensive step and ask the user what to do.

Example copy:

> This run may be stuck in a high-cost recovery path: 13 model calls, 11 context compressions, 2 tool errors, and no final assistant output yet. Continue?

Actions:

- Continue
- Stop
- Summarize and stop
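A minimal sketch of how the confirmation copy could be built from objective per-turn counters alone, so the card never claims the task failed. `TurnCounters`, `PauseAction`, and `format_pause_prompt` are hypothetical names used for illustration. They are not existing Hermes or WebUI APIs.

```python
from dataclasses import dataclass
from enum import Enum


class PauseAction(str, Enum):
    """Hypothetical choices offered on the pause card."""
    CONTINUE = "continue"
    STOP = "stop"
    SUMMARIZE_AND_STOP = "summarize_and_stop"


@dataclass
class TurnCounters:
    """Hypothetical per-turn counters collected by the run loop."""
    model_calls: int = 0
    context_compressions: int = 0
    tool_errors: int = 0
    has_final_output: bool = False


def _count(n: int, singular: str, plural: str) -> str:
    # Pick the singular or plural noun so the copy reads naturally.
    return f"{n} {singular if n == 1 else plural}"


def format_pause_prompt(c: TurnCounters) -> str:
    """Build the card text from counters only; it reports facts, not a verdict."""
    parts = [
        _count(c.model_calls, "model call", "model calls"),
        _count(c.context_compressions, "context compression", "context compressions"),
        _count(c.tool_errors, "tool error", "tool errors"),
    ]
    if not c.has_final_output:
        parts.append("no final assistant output yet")
    # Join as "a, b, c, and d" to match the example copy.
    return (
        "This run may be stuck in a high-cost recovery path: "
        + ", ".join(parts[:-1])
        + ", and "
        + parts[-1]
        + ". Continue?"
    )
```

With the incident's counts, `format_pause_prompt(TurnCounters(13, 11, 2, False))` reproduces the example copy above word for word. Mapping `PauseAction` values onto buttons keeps the card's actions and the backend's accepted decisions in one place.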

## Candidate risk signals

- Long active run age with no final assistant output
- Repeated context compression in one turn
- Compression timeout or repeated compression failure
- Repeated API retry or provider connection failures
- Repeated tool errors for the same target
- High model-call or tool-call count for one user turn
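One way to turn the candidate signals into a gate decision is sketched below, under assumed thresholds. Every threshold value, field name, and the `min_signals=2` rule are hypothetical and would need tuning against real runs. The point is the shape: each signal is checked independently, and the gate pauses only when several accumulate.

```python
from dataclasses import dataclass, field


@dataclass
class RiskThresholds:
    """Hypothetical defaults for illustration; real values need tuning on observed runs."""
    max_seconds_without_output: float = 180.0
    max_compressions: int = 3
    max_compression_failures: int = 2
    max_api_retries: int = 3
    max_tool_errors_same_target: int = 2
    max_model_calls: int = 10
    max_tool_calls: int = 30


@dataclass
class TurnSignals:
    """Hypothetical snapshot of one user turn, read at a run-loop boundary."""
    run_seconds: float = 0.0
    has_final_output: bool = False
    compressions: int = 0
    compression_timeouts: int = 0
    compression_failures: int = 0
    api_retries: int = 0
    tool_errors_by_target: dict[str, int] = field(default_factory=dict)
    model_calls: int = 0
    tool_calls: int = 0


def tripped_signals(s: TurnSignals, t: RiskThresholds) -> list[str]:
    """Return the names of signals at or over threshold; empty means keep running."""
    tripped = []
    if not s.has_final_output and s.run_seconds >= t.max_seconds_without_output:
        tripped.append("long_run_without_output")
    if s.compressions >= t.max_compressions:
        tripped.append("repeated_compression")
    # Any compression timeout counts; plain failures count only when repeated.
    if s.compression_timeouts > 0 or s.compression_failures >= t.max_compression_failures:
        tripped.append("compression_trouble")
    if s.api_retries >= t.max_api_retries:
        tripped.append("api_retries")
    if any(n >= t.max_tool_errors_same_target for n in s.tool_errors_by_target.values()):
        tripped.append("repeated_tool_errors")
    if s.model_calls >= t.max_model_calls or s.tool_calls >= t.max_tool_calls:
        tripped.append("high_call_count")
    return tripped


def should_pause(s: TurnSignals, t: RiskThresholds, min_signals: int = 2) -> bool:
    """Pause only when several independent signals accumulate, never on a single one."""
    return len(tripped_signals(s, t)) >= min_signals
```

Consider the local incident's counts: about 360 s, 11 compressions, at least one compression timeout, and 13 model calls. With those inputs, this sketch reports a long run, repeated compression, compression trouble, and a high call count, so `should_pause` returns `True`. Requiring more than one signal is meant to keep a single slow but healthy step from interrupting the user.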

## Scope for first PR

A first slice should be deliberately narrow:

- Add a backend cost-protection/risk gate at safe run-loop boundaries, before issuing the next model/tool call where feasible.
- Emit a structured event to WebUI when the gate pauses a run.
- Render a confirmation card in the existing chat flow with Continue and Stop actions.
- Keep auto-stop out of scope unless the user has configured a hard budget.
- Preserve existing Stop/cancel behavior.
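The following is a minimal sketch of the first-PR gate. It runs at a safe boundary before the next model/tool call. It auto-stops only when a user-configured hard budget is exhausted. Otherwise, when the run looks risky, it emits a structured Server-Sent Events frame and waits for a Continue or Stop decision. `GateState`, `gate_before_next_call`, `is_risky`, the event names `run.paused_for_confirmation` and `run.budget_exhausted`, and the re-arm window are all hypothetical. Real names and payloads would presumably need to follow the contracts listed under Contract routing.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateState:
    """Hypothetical per-turn state owned by the run loop."""
    run_id: str
    model_calls: int = 0
    hard_budget_model_calls: Optional[int] = None  # set only if the user configured a hard budget
    rearm_after: int = 5  # calls allowed after "Continue" before the gate may pause again
    resume_until: int = 0  # no pause while model_calls < resume_until


def sse_frame(event: str, payload: dict) -> str:
    """Serialize one SSE frame: event line, single-line JSON data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def gate_before_next_call(
    state: GateState,
    is_risky: Callable[[GateState], bool],
    emit: Callable[[str], None],
    decisions: "queue.Queue[str]",
) -> str:
    """Call at a safe boundary, before the next model/tool call. Returns "proceed" or "stop"."""
    # Auto-stop applies only to a user-configured hard budget.
    budget = state.hard_budget_model_calls
    if budget is not None and state.model_calls >= budget:
        emit(sse_frame("run.budget_exhausted", {"run_id": state.run_id, "model_calls": state.model_calls}))
        return "stop"
    if not is_risky(state) or state.model_calls < state.resume_until:
        return "proceed"
    emit(sse_frame("run.paused_for_confirmation", {
        "run_id": state.run_id,
        "model_calls": state.model_calls,
        "actions": ["continue", "stop"],
    }))
    # Block this turn until the card answers; the existing Stop/cancel path can also post "stop".
    decision = decisions.get()
    if decision == "continue":
        # Give the user's choice some room before asking again.
        state.resume_until = state.model_calls + state.rearm_after
        return "proceed"
    return "stop"
```

A `queue.Queue` stands in here for whatever channel carries the user's decision back to the run loop. An asyncio-based loop would more likely use an `asyncio.Event` or similar. Because the check happens only between steps, it never interrupts a model or compression call that is already blocking.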

## Non-goals

- Do not let WebUI silently judge a task as failed.
- Do not pause inside an already-blocking model/compression call unless that call already supports cancellation.
- Do not introduce a new runner process or large runtime-adapter migration in this slice.

## Evidence from local incident

Observed in a session on the stable local runtime (port `8787`) while clipping a WeChat article:

- Active for about 6 minutes before manual cancellation
- 13 model API calls
- 11 context-compression attempts
- 2 tool errors
- At least one 120s auxiliary compression timeout
- Article body had been extracted, but no final saved Markdown was produced

## Contract routing

Task type: runtime / streaming / user-facing safety UX
Touched areas: run lifecycle, SSE events, chat UI, cancellation/continue controls
Relevant docs:

- `AGENTS.md`
- `CONTRIBUTING.md`
- `docs/CONTRACTS.md`
- `docs/rfcs/webui-run-state-consistency-contract.md`
- `docs/rfcs/hermes-run-adapter-contract.md`
- `docs/UIUX-GUIDE.md`
- `DESIGN.md`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cost-protection pause for suspicious long-running sessions #2957

Problem

Desired behavior

Candidate risk signals

Scope for first PR

Non-goals

Evidence from local incident

Contract routing

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Add cost-protection pause for suspicious long-running sessions #2957

Description

Problem

Desired behavior

Candidate risk signals

Scope for first PR

Non-goals

Evidence from local incident

Contract routing

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions