feat: mark rate-limited automation runs as Skipped – Limit Reached #174
Merged
Summary
Automation runs that can't start because the organization (incl. Personal Workspaces) is at its concurrent-sandbox limit are now marked with a new terminal status, `SKIPPED`, and shown in the Activity Log as "Skipped – Limit Reached", instead of being retried and then marked `FAILED`.

Why
When the org is at its limit, `POST /api/v1/sandboxes` returns `429` with `detail.error == "CONCURRENCY_LIMIT_REACHED"`. The cloud backend treated this like any transient rate limit: retry 5× (10–60s each), then mark the run `FAILED` ("Failed to get execution context"). That's slow and misleading: retrying can't free a slot, and it's not really a failure. (The new behavior mirrors what the OpenHands integration managers already do for `ConcurrencyLimitError` on the conversation side; see `APP-1031`.)

What changed
Backend (`openhands/automation/`)
- `models.py` / `schemas.py`: add `SKIPPED` to `AutomationRunStatus` and the API `RunStatus`.
- `exceptions.py`: new `ConcurrencyLimitReachedError` (a plain `Exception`, not `PermanentDispatchError`, so the automation is not disabled).
- `backends/cloud.py`: `_concurrency_limit_detail()` recognizes the limit `429` (tolerating FastAPI's `{"detail": {…}}` nesting and a flat `{…}`). The error is raised inside the retried request closure, so, because it is not an `HTTPStatusError`, it bypasses the retry predicate (no backoff) while transient `429`s still retry unchanged.
- `dispatcher.py`: catch `ConcurrencyLimitReachedError` around `get_execution_context` and mark the run `SKIPPED` via `mark_run_terminal` (no disable).
- `utils/run.py`: `SKIPPED` joins the terminal set that stamps `completed_at`.

Frontend (`frontend/`)
- `types/automation.ts`: add `SKIPPED` to the enum (the badge's exhaustive `Record` then requires the entry below).
- `components/automations/detail/run-status-badge.tsx`: map `SKIPPED → AUTOMATIONS$DETAIL$SKIPPED_LIMIT_REACHED` (neutral/muted style).
- `i18n/translation.json`: new key, all 15 locales, `en = "Skipped – Limit Reached"` (en dash, U+2013). `declaration.ts` and `public/locales/**` are gitignored and regenerated by `make-i18n`.

Tests
- `tests/test_backends.py`: `_concurrency_limit_detail` discriminator (nested/flat/transient/no-marker/non-JSON/non-429), plus `_create_sandbox` raises `ConcurrencyLimitReachedError` and is not retried.
- `tests/test_dispatcher.py`: `_execute_run` marks the run `SKIPPED` (enabled, no `error_detail`); regression test that a generic context failure still marks `FAILED`.
- `frontend/.../run-status-badge.test.tsx`: badge renders the skipped label.

Behavior
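- Run blocked by the concurrent-sandbox limit: before, `FAILED` ("Failed to get execution context"); after, `SKIPPED` → "Skipped – Limit Reached".
- Transient `429`: still retried, unchanged.

Notes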
- `automation_runs.status` is `VARCHAR(20)` with no constraint (same as the `CANCELLED` precedent).
- `CONCURRENCY_LIMIT_REACHED` `429` (`APP-1031` / `conversation-limits`).
- `CANCELLED` (a `CANCELLED` run would render an undefined badge). Not introduced here; recommend a separate follow-up.

Testing
Verification (Activity Log)
- `POST /v1/{id}/dispatch`).
- `SKIPPED` (not `FAILED`), with `started_at`/`completed_at` set and `error_detail` null.
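
For readers who want to see the retry bypass in isolation, here is a minimal standalone sketch of the idea described under "What changed". It is not the code from `backends/cloud.py`: `TransientHTTPError`, `concurrency_limit_detail`, `call_with_retry`, and `create_sandbox` are hypothetical stand-ins written for illustration, the HTTP call is replaced by a `send` callable returning `(status, body)`, and the backoff sleep is omitted. Only the error marker, the nested/flat payload shapes, and the 5-attempt retry count come from the description above.

```python
import json
from typing import Callable, Optional, Tuple

LIMIT_MARKER = "CONCURRENCY_LIMIT_REACHED"


class TransientHTTPError(Exception):
    """Hypothetical stand-in for an HTTP status error that the retry loop retries."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.body = body


class ConcurrencyLimitReachedError(Exception):
    """Plain Exception, so the retry loop below never retries it."""


def concurrency_limit_detail(status_code: int, body: str) -> Optional[dict]:
    """Return the error detail if this is the limit 429, otherwise None."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # non-JSON body
        return None
    if not isinstance(payload, dict):
        return None
    # Accept both {"detail": {...}} nesting and a flat {...} payload.
    for candidate in (payload.get("detail"), payload):
        if isinstance(candidate, dict) and candidate.get("error") == LIMIT_MARKER:
            return candidate
    return None


def call_with_retry(request: Callable[[], str], attempts: int = 5) -> str:
    """Retry only TransientHTTPError; any other exception propagates at once."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TransientHTTPError:
            if attempt == attempts:
                raise
            # A real implementation would back off here before retrying.
    raise RuntimeError("attempts must be at least 1")


def create_sandbox(send: Callable[[], Tuple[int, str]], attempts: int = 5) -> str:
    def request() -> str:
        status, body = send()
        # Raised inside the retried closure: it is not a TransientHTTPError,
        # so call_with_retry lets it through without another attempt.
        if concurrency_limit_detail(status, body) is not None:
            raise ConcurrencyLimitReachedError("concurrent-sandbox limit reached")
        if status >= 400:
            raise TransientHTTPError(status, body)
        return body

    return call_with_retry(request, attempts)
```

A caller in the dispatcher role would catch `ConcurrencyLimitReachedError` separately from other failures and record the run as skipped rather than failed. Keeping the limit error outside the retried exception type means the retry predicate itself does not need to change.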