Fix evaluation workers hanging due to missing exec_run() timeouts #93
Open
sarvanithin wants to merge 1 commit into withmartian:main from
Conversation
Several `exec_run()` call sites were missing `timeout_s`, causing evaluations to hang indefinitely when the container runtime does not properly propagate asyncio cancellation. Reported symptom: 18 agent steps observed, but wall-clock time far exceeding the expected 18 × 2 min = 36 min.

Root causes and fixes:

1. `mini_swe_agent.py` — Agent setup runs 4 `exec_run()` calls (`uname -a/r/v/m`) with no timeout. If Daytona is unresponsive during container setup, the `code_agent_task` hangs before making any LLM request, blocking `_get_time_step()` indefinitely. Fix: add `setup_timeout_s=30`.
2. `code_env.py` (`_compute_reward`) — Test evaluation and reward file reads use `exec_run()` with no timeout. If a test script contains an infinite loop or the container hangs during scoring, the entire episode hangs after the agent completes. Fix: add `_REWARD_EXEC_TIMEOUT_S = 300` (5 min) to all exec calls in `_compute_reward()` and `_parse_reward_file()`.
3. `code_env.py` (`_get_time_step`) — `asyncio.wait()` had no timeout argument. Even with per-operation timeouts, if the underlying Daytona SDK catches and delays `CancelledError` (e.g. during server-side cleanup), the effective timeout can far exceed the configured value. Fix: add `timeout=_GET_TIME_STEP_TIMEOUT_S` (10 min) and raise `TimeoutError` with a descriptive message if neither the code agent nor the LLM queue responds in time.
4. `examples/03_parallel_eval_with_api.py` — No per-task wall-clock limit. A hung task holds a semaphore slot indefinitely, reducing effective parallelism. Fix: wrap each `evaluate_task()` with `asyncio.wait_for(timeout=args.task_timeout_s)` (default 30 min). Timed-out tasks propagate `TimeoutError` through gather's `return_exceptions=True` and are counted as errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
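The per-call timeouts in fixes 1 and 2 can be sketched as a thin wrapper around the exec call. This is a minimal sketch, not the repository's actual code: `sandbox.exec()` is a hypothetical stand-in for the real Daytona `exec_run()` (whose signature may differ), and `exec_with_timeout` is an illustrative helper name.

```python
import asyncio

SETUP_TIMEOUT_S = 30          # agent-setup (uname) execs, value from this PR
_REWARD_EXEC_TIMEOUT_S = 300  # reward/test execs (5 min), value from this PR

async def exec_with_timeout(sandbox, cmd: str, timeout_s: float) -> str:
    """Bound a container exec with a wall-clock limit.

    sandbox.exec() stands in for the real exec_run(); the actual Daytona
    signature may differ.
    """
    try:
        return await asyncio.wait_for(sandbox.exec(cmd), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Surface a descriptive error instead of hanging the episode
        raise TimeoutError(f"exec timed out after {timeout_s}s: {cmd!r}") from None
```

`asyncio.wait_for` cancels the inner awaitable on timeout, so even an exec against an unresponsive container releases control back to the caller.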
User description
Problem
Evaluation workers hang indefinitely when running example 3 (`03_parallel_eval_with_api.py`). Symptom: 18 agent steps observed, but wall-clock time far exceeding the expected 18 × 2 min = 36 min. Reported by Josh.

Root Cause
Four `exec_run()` call sites were missing `timeout_s`, so a single unresponsive container exec could block the entire evaluation indefinitely — even though per-operation timeouts appear to exist elsewhere.

1. `mini_swe_agent.py` — agent setup has no timeout. The 4 `uname` commands run with no `timeout_s`. If Daytona is slow during container startup, the `code_agent_task` hangs before making any LLM request, blocking `_get_time_step()` forever via `asyncio.wait()`.
2. `code_env.py` (`_compute_reward`) — test execution has no timeout. `bash {test_path}` and reward file `cat` calls use no timeout. If a test script contains an infinite loop or the container hangs during scoring, the episode hangs indefinitely after the agent finishes all its steps.
3. `code_env.py` (`_get_time_step`) — `asyncio.wait()` has no timeout. No backstop at the step level. Even when per-operation timeouts are set, if the underlying Daytona SDK delays propagating `CancelledError` (e.g. during server-side cancel cleanup), the effective timeout can far exceed the configured value.
4. `examples/03_parallel_eval_with_api.py` — no per-task wall-clock limit. A hung task holds a semaphore slot indefinitely, reducing effective parallelism from `num_parallel_workers` down as workers pile up.

Fix

- `mini_swe_agent.py`: add `timeout_s=30` to the 4 setup `exec_run()` calls
- `code_env.py`: add `_REWARD_EXEC_TIMEOUT_S = 300` (5 min) to all exec calls in `_compute_reward()` and `_parse_reward_file()`
- `code_env.py`: add `timeout=_GET_TIME_STEP_TIMEOUT_S` (10 min) to `asyncio.wait()` in `_get_time_step()` with a clear `TimeoutError`
- `examples/03_parallel_eval_with_api.py`: wrap `evaluate_task()` with `asyncio.wait_for(timeout=args.task_timeout_s)` (default 30 min); timed-out tasks surface as errors via `gather(return_exceptions=True)`

Testing
All 163 existing tests pass. End-to-end testing with Daytona in progress.
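The step-level backstop for `_get_time_step` (fix 3) can be sketched as follows. This is a minimal sketch, not the repository's code: `wait_for_first` is a hypothetical helper, and `_GET_TIME_STEP_TIMEOUT_S = 600` reflects the 10-minute value stated in this PR.

```python
import asyncio

_GET_TIME_STEP_TIMEOUT_S = 600  # 10 min step-level backstop, value from this PR

async def wait_for_first(code_agent_task, llm_queue_task,
                         timeout_s=_GET_TIME_STEP_TIMEOUT_S):
    """Wait for either task, but never longer than the backstop."""
    # asyncio.wait() does NOT raise on timeout; it simply returns with an
    # empty `done` set, so the caller must detect that and raise explicitly.
    done, pending = await asyncio.wait(
        {code_agent_task, llm_queue_task},
        return_when=asyncio.FIRST_COMPLETED,
        timeout=timeout_s,
    )
    if not done:
        raise TimeoutError(
            f"_get_time_step exceeded {timeout_s}s: neither the code agent "
            "nor the LLM queue responded in time"
        )
    return done, pending
```

The explicit check matters because, unlike `asyncio.wait_for`, `asyncio.wait` neither raises nor cancels anything on timeout; without it, a delayed `CancelledError` inside the SDK would stall the step indefinitely.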
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Implements comprehensive timeouts across container execution calls and task management to prevent evaluation workers from hanging indefinitely. These changes ensure that unresponsive container operations or delayed cancellations do not block system resources or parallel execution slots.
Introduces `timeout_s` into `exec_run` calls within `MiniSWECodeAgent` and `CodeEnvironment` to prevent blocking during agent setup, test execution, and reward parsing.
Adds a step-level timeout to `_get_time_step` using `asyncio.wait`, and introduces a configurable `task_timeout_s` in the parallel evaluation script to safeguard against hung tasks.
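The per-task wall-clock limit in the parallel evaluation script (fix 4) follows the semaphore-plus-`wait_for` pattern the PR describes. This is a minimal sketch, not the example script itself: `run_all` and the factory-based interface are hypothetical, and the 1800 s default mirrors the PR's 30-minute `task_timeout_s`.

```python
import asyncio

async def run_all(task_factories, num_parallel_workers=4, task_timeout_s=1800):
    """Run evaluate_task-style coroutines with a per-task wall-clock limit."""
    sem = asyncio.Semaphore(num_parallel_workers)

    async def bounded(factory):
        async with sem:  # a hung task would otherwise hold this slot forever
            return await asyncio.wait_for(factory(), timeout=task_timeout_s)

    # return_exceptions=True turns a TimeoutError into a result entry
    # instead of cancelling the whole gather
    results = await asyncio.gather(
        *(bounded(f) for f in task_factories), return_exceptions=True
    )
    errors = [r for r in results if isinstance(r, Exception)]
    return results, errors
```

Because `wait_for` cancels the hung coroutine, its semaphore slot is released on timeout and the remaining tasks keep the full `num_parallel_workers` worth of parallelism.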