Skip to content

feat(loops): loop-level wall-clock deadline (max_duration_seconds) (#1156)#1182

Open
dolho wants to merge 3 commits into
devfrom
feature/1156-loop-max-duration
Open

feat(loops): loop-level wall-clock deadline (max_duration_seconds) (#1156)#1182
dolho wants to merge 3 commits into
devfrom
feature/1156-loop-max-duration

Conversation

@dolho

@dolho dolho commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds an optional loop-level wall-clock deadline (max_duration_seconds) to sequential agent loops — the third hard stop alongside the max_runs iteration cap (and the separately-tracked cost budget). A loop legally configured today (max_runs=100 × timeout_per_run up to 2h + delay_seconds) can run for days; this bounds total duration.

Related to #1156.

What changed (end-to-end)

  • Schema/migration: agent_loops.max_duration_seconds INTEGER (nullable) in schema.py + idempotent agent_loops_max_duration migration for existing DBs.
  • Runner (loop_service._run): deadline measured from started_at, checked only at iteration boundaries — before the next run and before/after the inter-run delay (the delay_seconds sleep is capped to the remaining budget, never sleeping past the deadline). An in-flight run is never killed mid-turn, so overshoot is bounded by one timeout_per_run. Expiry → terminal status stopped, stop_reason="deadline_exceeded".
  • Router: optional max_duration_seconds (1–604800 = 7d); 400 when smaller than the effective per-run timeout (timeout_per_run, else the agent's execution_timeout_seconds). GET /api/loops/{id} now returns max_duration_seconds + a computed elapsed_seconds.
  • MCP run_agent_loop + Loops UI (form field + deadline/elapsed in detail, deadline_exceeded label).
  • Docs: architecture.md (feature + schema) and requirements.md §38.2.

Acceptance criteria

  • POST /loops accepts optional max_duration_seconds; persisted on agent_loops
  • Deadline from started_at, checked at iteration boundaries; in-flight run completes
  • Expired deadline → stop_reason="deadline_exceeded", status stopped
  • Validation: reject < timeout_per_run (or agent execution timeout) with 400
  • GET /loops/{id} returns deadline + elapsed; MCP run_agent_loop exposes the param
  • Loops UI shows the deadline when set
  • Schema migration + tests (deadline stop, boundary semantics, validation error)

Testing

tests/unit/test_loop_service.py (deadline suite) + tests/unit/test_loops_router_validation.py24 passed locally. Also verified the migration is idempotent and create_loop/get_loop round-trips the column on a fresh schema and via the ALTER path.

🤖 Generated with Claude Code

@dolho dolho force-pushed the feature/1156-loop-max-duration branch from be4c6f5 to bc23675 Compare June 12, 2026 11:50
@github-actions

Copy link
Copy Markdown

⚠️ Nightly unit-suite check skipped — merge conflict against dev.

Resolve by running git merge dev locally and pushing the result. The next nightly run will re-test once the conflict is gone.

@dolho dolho requested a review from vybe June 16, 2026 08:48
@vybe

vybe commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@dependabot rebase

@vybe vybe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the feature itself looks good (clean SQLite migration, schema.py/tables.py updated, docs + tests present). One blocker before merge:

Missing the PostgreSQL Alembic revision (dual-track migration, CLAUDE.md §9 / Invariant #9).

This PR adds agent_loops.max_duration_seconds via the SQLite track (_migrate_agent_loops_max_duration in db/migrations.py) but there is no matching Alembic revision under src/backend/migrations/versions/. On a PostgreSQL deployment the column would never be created, so the loop deadline feature would crash there.

Please add a revision (down_revision = current head, e.g. 0001_baseline) that runs op.add_column('agent_loops', sa.Column('max_duration_seconds', sa.Integer(), nullable=True)). Once that's in, this is good to go.

dolho and others added 2 commits June 25, 2026 13:23
…1156)

Adds an optional total-duration bound to sequential agent loops, the third
hard stop alongside the max_runs iteration cap. A loop legally configured
today (max_runs=100 × timeout_per_run up to 2h + delays) could run for days;
max_duration_seconds caps total wall-clock time.

- Schema: agent_loops.max_duration_seconds INTEGER (nullable) + idempotent
  migration (agent_loops_max_duration).
- Runner (loop_service): deadline measured from started_at, checked only at
  iteration boundaries (before the next run + before/after the inter-run
  delay, which is capped to the remaining budget). An in-flight run is never
  killed mid-turn — overshoot is bounded by one timeout_per_run. Expiry stops
  with stop_reason="deadline_exceeded", terminal status "stopped".
- Router: optional max_duration_seconds (1..604800); 400 when smaller than the
  effective per-run timeout (timeout_per_run, else agent execution_timeout).
  GET /api/loops/{id} returns max_duration_seconds + computed elapsed_seconds.
- MCP run_agent_loop + UI Loops form/detail expose the parameter and deadline.
- Tests: deadline stop at boundary, in-flight run completes (not killed),
  delay capped to remaining budget, no-deadline regression, and the four
  validation paths.

Docs: architecture.md (feature + schema) and requirements.md §38.2.

Closes #1156

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1156)

The SQLite path got the column via db/migrations.py + schema.py, but the
PostgreSQL backend is Alembic-owned and the sqlite PRAGMA migrations are
skipped — so existing PG deployments never received the column, and
loops.py's insert(agent_loops).values(max_duration_seconds=...) would fail
on Postgres.

Adds 0004_agent_loops_max_duration (down_revision 0003) mirroring the
0002/0003 precedent: idempotent ADD COLUMN IF NOT EXISTS so it's a no-op on
fresh PG builds (baseline already creates it from schema.py:TABLES) and adds
it in place on existing PG deployments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dolho dolho force-pushed the feature/1156-loop-max-duration branch from 5546a4d to 30bbaf9 Compare June 25, 2026 10:25
…#1156)

The feature added `max_duration_seconds` but the run-agent-loop feature flow
wasn't updated (flagged by /validate-pr). Weave the wall-clock deadline through
the flow: tool/UI params, the create-time 400 validation + GET elapsed_seconds,
the iteration-boundary deadline check and `deadline_exceeded` terminal reason,
the dual-track migration (SQLite + Alembic 0005), error-handling rows, the
604800s limit, and the new tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dolho

dolho commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

PR Validation Report — /validate-pr

Rebased onto dev (was CONFLICTING) and fixed an Alembic revision-number collision — the new revision was 0004 off 0003, colliding with dev's 0004_agent_ownership_voice_name; renumbered to 0005 chained after the current head. Now MERGEABLE.

Category Status Notes
Commit messages conventional (feat/fix/docs)
Base branch dev
PR size 15 files
Issue link resolves public #1156 (p1, type-feature); "Related to" per relate-don't-close convention
Requirements requirements.md updated (param 1–604800)
Architecture loops section + agent_loops DDL + endpoint behavior
Feature flow run-agent-loop.md updated in 53ed5aa (was the one ⚠️)
Security no secrets / emails / IPs / .env / cred files
Build/Config packaging no new top-level module; no new os.getenv()
Dual-track migration (Inv. #9) SQLite + Alembic 0005 + schema.py + tables.py consistent
Code quality iteration-boundary deadline, overshoot ≤ one timeout_per_run, 400 if deadline < per-run timeout

Verification: schema-parity pytest ✅ (4) · loop tests ✅ (test_loop_service + test_loops_router_validation, 25).

Recommendation: APPROVE

Clean, well-tested, dual-track migration correct after the rebase fix, docs complete. P1 feature — /review + /cso --diff remain part of the pipeline if not already run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants