feat: mark rate-limited automation runs as Skipped – Limit Reached #174

Merged: hieptl merged 3 commits into main from hieptl/app-2280 on Jun 9, 2026

hieptl (Contributor) commented Jun 9, 2026

Summary

Automation runs that can't start because the organization (incl. Personal Workspaces) is at its concurrent-sandbox limit are now marked with a new terminal status SKIPPED and shown in the Activity Log as "Skipped – Limit Reached" — instead of being retried and then marked FAILED.

Why

When the org is at its limit, POST /api/v1/sandboxes returns 429 with detail.error == "CONCURRENCY_LIMIT_REACHED". The cloud backend treated this like any transient rate limit: retry 5× (10–60s each), then mark the run FAILED ("Failed to get execution context"). That's slow and misleading — retrying can't free a slot, and it's not really a failure. (This mirrors what the OpenHands integration managers already do for ConcurrencyLimitError on the conversation side — APP-1031.)

What changed

Backend (openhands/automation/)

  • models.py / schemas.py: add SKIPPED to AutomationRunStatus and the API RunStatus.
  • exceptions.py: new ConcurrencyLimitReachedError (a plain Exception, not a PermanentDispatchError, so the automation is not disabled).
  • backends/cloud.py: _concurrency_limit_detail() recognizes the limit 429 (tolerating both FastAPI's {"detail": {…}} nesting and a flat {…}). ConcurrencyLimitReachedError is raised inside the retried request closure; because it is not an HTTPStatusError, it bypasses the retry predicate (no backoff), while transient 429s still retry unchanged.
  • dispatcher.py: catch ConcurrencyLimitReachedError around get_execution_context and mark the run SKIPPED via mark_run_terminal (no disable).
  • utils/run.py: SKIPPED joins the terminal set that stamps completed_at.
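
The discriminator and the retry bypass described above can be sketched roughly as follows. This is a minimal illustration, not the code in backends/cloud.py: `TransientHTTPError` stands in for the HTTP client's status error, and `call_with_retry` and the function signatures are hypothetical. Only the 429 status, the `detail.error == "CONCURRENCY_LIMIT_REACHED"` marker, the nested/flat payload shapes, and the exception name come from the PR description.

```python
import json
from typing import Callable, Optional, Tuple

LIMIT_MARKER = "CONCURRENCY_LIMIT_REACHED"


class TransientHTTPError(Exception):
    """Hypothetical stand-in for the HTTP status error the retry logic accepts."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class ConcurrencyLimitReachedError(Exception):
    """Plain Exception, so the retry logic below never catches it."""


def concurrency_limit_detail(status_code: int, body: str) -> Optional[dict]:
    """Return the limit payload if this is the limit 429, otherwise None."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # non-JSON body
        return None
    if not isinstance(payload, dict):
        return None
    # Accept FastAPI's {"detail": {...}} nesting as well as a flat {...}.
    detail = payload.get("detail", payload)
    if isinstance(detail, dict) and detail.get("error") == LIMIT_MARKER:
        return detail
    return None


def call_with_retry(request: Callable[[], Tuple[int, str]], attempts: int = 5) -> str:
    """Retry transient 429s; let the limit error propagate on the first attempt."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            status, body = request()
            if concurrency_limit_detail(status, body) is not None:
                # Raised inside the retried closure, but not retryable.
                raise ConcurrencyLimitReachedError(body)
            if status == 429:
                raise TransientHTTPError(status)
            return body
        except TransientHTTPError as exc:
            last_error = exc  # a real implementation would back off here
    assert last_error is not None
    raise last_error
```

In this sketch the limit case costs exactly one request, while an ordinary 429 still uses all retry attempts before the error surfaces.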

Frontend (frontend/)

  • types/automation.ts: add SKIPPED to the enum (the badge's exhaustive Record then requires the entry below).
  • components/automations/detail/run-status-badge.tsx: map SKIPPED → AUTOMATIONS$DETAIL$SKIPPED_LIMIT_REACHED (neutral/muted style).
  • i18n/translation.json: new key, all 15 locales, en = "Skipped – Limit Reached" (en‑dash, U+2013). declaration.ts and public/locales/** are gitignored and regenerated by make-i18n.

Tests

  • tests/test_backends.py: _concurrency_limit_detail discriminator (nested/flat/transient/no-marker/non-JSON/non-429), plus a check that _create_sandbox raises ConcurrencyLimitReachedError and is not retried.
  • tests/test_dispatcher.py: _execute_run marks the run SKIPPED (automation stays enabled, no error_detail); regression test that a generic context failure still marks the run FAILED.
  • frontend/.../run-status-badge.test.tsx: badge renders the skipped label.

Behavior

| Scenario | Before | After |
|---|---|---|
| Org at concurrency limit | retry 5× → FAILED ("Failed to get execution context") | immediate SKIPPED → "Skipped – Limit Reached" |
| Transient rate-limit 429 | retried | retried (unchanged) |
| Automation after a skip | | stays enabled |
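
The dispatcher side of this behavior can be sketched as below. This is an illustration under stated assumptions, not the code in dispatcher.py: the `Run` and `Automation` dataclasses, the "RUNNING" initial status, and the function signatures are hypothetical, and the `PermanentDispatchError` branch is inferred from the note that this error type disables the automation. The status names, `mark_run_terminal`, `get_execution_context`, and the failure message come from the PR description.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Terminal statuses named in the PR; the real set may contain more entries.
TERMINAL_STATUSES = {"FAILED", "CANCELLED", "SKIPPED"}


class ConcurrencyLimitReachedError(Exception):
    """Org is at its concurrent-sandbox limit; not a failure."""


class PermanentDispatchError(Exception):
    """Assumed here to disable the automation."""


@dataclass
class Automation:  # hypothetical shape
    enabled: bool = True


@dataclass
class Run:  # hypothetical shape; field names follow the PR text
    status: str = "RUNNING"  # assumed initial status
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    error_detail: Optional[str] = None


def mark_run_terminal(run: Run, status: str, error_detail: Optional[str] = None) -> None:
    """Set a terminal status and stamp completed_at."""
    if status not in TERMINAL_STATUSES:
        raise ValueError(f"{status} is not a terminal status")
    run.status = status
    run.error_detail = error_detail
    run.completed_at = datetime.now(timezone.utc)


def execute_run(
    automation: Automation, run: Run, get_execution_context: Callable[[], Any]
) -> Optional[Any]:
    """Return the execution context, or None if the run ended early."""
    run.started_at = datetime.now(timezone.utc)
    try:
        return get_execution_context()
    except ConcurrencyLimitReachedError:
        # Skipped, no error detail, automation left enabled.
        mark_run_terminal(run, "SKIPPED")
    except PermanentDispatchError as exc:
        automation.enabled = False
        mark_run_terminal(run, "FAILED", str(exc))
    except Exception:
        # Generic context failures still mark the run FAILED.
        mark_run_terminal(run, "FAILED", "Failed to get execution context")
    return None
```

The order of the `except` clauses matters: the limit error has to be caught before the generic `Exception` handler, otherwise it would fall through to FAILED again.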

Notes

  • No DB migration: automation_runs.status is VARCHAR(20) with no constraint (same as the CANCELLED precedent).
  • Dependency: requires an OpenHands deployment that emits the CONCURRENCY_LIMIT_REACHED 429 (APP-1031/conversation-limits).
  • Out of scope: the frontend run-status enum/badge already lacks CANCELLED (a CANCELLED run would render an undefined badge). This gap predates the PR; a separate follow-up is recommended.

Testing

# backend
uv run pytest tests/test_backends.py tests/test_dispatcher.py -q   # 65 passed
uv run ruff check openhands/automation tests

# frontend
npm run typecheck                      # exit 0 (exhaustive badge Record compiles)
npm run check-translation-completeness # all 15 locales present
npx vitest run run-status-badge        # 5 passed

Verification (Activity Log)

  1. With an org/workspace at its concurrent-sandbox limit, trigger an automation (cron-due or POST /v1/{id}/dispatch).
  2. Confirm the run row is SKIPPED (not FAILED), with started_at/completed_at set and error_detail null.
  3. Open the automation's Activity Log and confirm the badge reads "Skipped – Limit Reached".
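
Step 2 can also be checked programmatically. The helper below is a hypothetical sketch that assumes the run row is available as a mapping (for example a decoded API response or a DB row converted to a dict); the field names status, started_at, completed_at and error_detail are the ones named in the steps above, and everything else is illustrative.

```python
from typing import Any, List, Mapping


def skipped_run_problems(row: Mapping[str, Any]) -> List[str]:
    """List the ways a run row deviates from the expected SKIPPED shape."""
    problems: List[str] = []
    if row.get("status") != "SKIPPED":
        problems.append(f"status is {row.get('status')!r}, expected 'SKIPPED'")
    for field in ("started_at", "completed_at"):
        if row.get(field) is None:
            problems.append(f"{field} is not set")
    if row.get("error_detail") is not None:
        problems.append("error_detail should be null")
    return problems
```

An empty list means the row matches the expected shape; any entries describe what to investigate.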

hieptl self-assigned this Jun 9, 2026

tofarr (Contributor) left a comment

🍰

hieptl merged commit 6560c97 into main Jun 9, 2026
4 checks passed