Add RetryPolicy CRUD API and armadactl support #4805

Open
dejanzele wants to merge 1 commit into armadaproject:master from dejanzele:retry-policy-crud

@dejanzele
Member

What type of PR is this?

Feature (retry policy PR 2 of 4)

What this PR does / why we need it

Adds RetryPolicy as a first-class API resource with full CRUD operations, following the same patterns as Queue. Operators can create retry policies and assign them to queues by name.

  • Adds RetryPolicy, RetryRule, RetryExitCodeMatcher proto messages and RetryAction enum to submit.proto
  • Adds RetryPolicyService gRPC service with Create/Update/Delete/Get/List RPCs
  • Adds REST gateway bindings for all operations (/v1/retry-policy/...)
  • Adds retry_policy field (string, references policy by name) to the Queue proto message
  • Adds internal/server/retrypolicy/ - repository (PostgreSQL) and gRPC handler, mirroring internal/server/queue/
  • Adds pkg/client/retrypolicy/ - client library for all CRUD operations
  • Adds cmd/armadactl/cmd/retrypolicy.go - CLI commands for managing retry policies
  • Adds --retry-policy flag to queue create/update commands
  • Adds RetryPolicy resource kind for file-based creation (armadactl create -f policy.yaml)

Usage:

armadactl create retry-policy -f policy.yaml
armadactl get retry-policy ml-training
armadactl get retry-policies
armadactl delete retry-policy ml-training
armadactl create queue ml-queue --retry-policy ml-training

Which issue(s) this PR fixes

Part of #4683 (Retry Policy)

Special notes for your reviewer

  • This is retry policy PR 2 of 4. The sequence is: Engine + config (PR 1, independent) and CRUD + armadactl (this PR) -> Scheduler wiring -> Backoff + pod naming
  • No dependency on error categorization PRs - this is pure CRUD infrastructure
  • Independent of the engine PR (PR 1) - these can be reviewed in parallel
  • Follows the queue CRUD pattern exactly: proto -> server handler -> PostgreSQL repository -> client library -> armadactl
  • Repository uses INSERT with unique constraint check (not read-then-write) for concurrent-safe creates
  • Update uses single UPDATE with RowsAffected check instead of read-then-upsert
  • The retry_policy table (name text PK, definition bytea) stores serialized proto, same as the queue table
  • File-based create/update because rules are too complex for CLI flags
  • No scheduling behavior change in this PR
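The two repository notes above (a single INSERT instead of read-then-write, and a single UPDATE with a RowsAffected check) can be sketched as follows. This is a minimal illustration, not the PR's code: the SQL text, the sentinel error names, and the helpers `createOutcome` and `updateOutcome` are assumptions. Only the table name, its two columns, and the `ON CONFLICT DO NOTHING` clause come from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed SQL. Only the table name, the (name, definition) columns, and the
// ON CONFLICT DO NOTHING clause are taken from the PR page.
const (
	insertPolicySQL = `INSERT INTO retry_policy (name, definition) VALUES ($1, $2) ON CONFLICT (name) DO NOTHING`
	updatePolicySQL = `UPDATE retry_policy SET definition = $2 WHERE name = $1`
)

// Hypothetical sentinel errors; the real repository may name them differently.
var (
	ErrAlreadyExists = errors.New("retry policy already exists")
	ErrNotFound      = errors.New("retry policy not found")
)

// createOutcome interprets the row count reported for the INSERT.
// Zero rows means the primary key was already present and the database
// skipped the write, so no separate existence check is needed and two
// concurrent creates cannot both succeed.
func createOutcome(rowsAffected int64) error {
	if rowsAffected == 0 {
		return ErrAlreadyExists
	}
	return nil
}

// updateOutcome interprets the row count reported for the UPDATE.
// Zero rows means no policy with that name exists.
func updateOutcome(rowsAffected int64) error {
	if rowsAffected == 0 {
		return ErrNotFound
	}
	return nil
}

func main() {
	fmt.Println(insertPolicySQL)
	fmt.Println(updatePolicySQL)
	fmt.Println("create, 1 row affected:", createOutcome(1))
	fmt.Println("create, 0 rows affected:", createOutcome(0))
	fmt.Println("update, 0 rows affected:", updateOutcome(0))
}
```

The point of the pattern is that the existence check and the write happen in one statement, so there is no window between a read and a write in which another request can slip in.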

@greptile-apps
Contributor

greptile-apps Bot commented Mar 30, 2026

Greptile Summary

This PR adds RetryPolicy as a first-class API resource with full CRUD operations (proto messages, gRPC service, PostgreSQL repository, client library, and armadactl commands), following the established Queue pattern exactly. All concerns from the previous review round have been addressed: UpdateRetryPolicy now uses a dedicated permission, ExitCodeOperator is an enum, the migration (032_create_retry_policy.sql) is present and correctly sequenced, empty-name validation is in place on all endpoints, and idempotent delete is explicitly documented.

Confidence Score: 5/5

Safe to merge — all prior P0/P1 concerns are resolved; remaining findings are P2 design suggestions for future PRs.

All blocking issues from the previous review round are addressed. The remaining inline comments are P2: missing repository integration tests (no runtime breakage) and proto design considerations around retry_limit = 0 ambiguity and on_conditions typing (both can be addressed before the scheduler wiring lands in PRs 3/4). No P0 or P1 issues remain.

Files these findings concern: pkg/api/submit.proto (retry_limit uint32 ambiguity, on_conditions string typing) and internal/server/retrypolicy/repository.go (no integration tests).
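The retry_limit concern can be illustrated with a small Go sketch. A proto3 scalar without explicit presence becomes a plain `uint32` in generated Go code, so an unset limit and a limit of 0 look identical; a field declared `optional` is generated as a pointer and keeps the two apart. The struct and function names below are hypothetical stand-ins, not the generated types from this PR.

```go
package main

import "fmt"

// ruleImplicit mimics a proto3 scalar field without presence tracking:
// "not set" and "explicitly 0" are the same value.
type ruleImplicit struct {
	RetryLimit uint32
}

// ruleExplicit mimics a field with explicit presence, which Go code
// generated from a proto3 `optional` scalar exposes as a pointer.
type ruleExplicit struct {
	RetryLimit *uint32
}

// describeImplicit cannot tell an unset limit from a limit of zero.
func describeImplicit(r ruleImplicit) string {
	if r.RetryLimit == 0 {
		return "unset or zero (indistinguishable)"
	}
	return fmt.Sprintf("limit=%d", r.RetryLimit)
}

// describeExplicit uses the nil pointer to represent "unset".
func describeExplicit(r ruleExplicit) string {
	if r.RetryLimit == nil {
		return "unset"
	}
	return fmt.Sprintf("limit=%d", *r.RetryLimit)
}

func main() {
	zero := uint32(0)
	fmt.Println(describeImplicit(ruleImplicit{}))
	fmt.Println(describeImplicit(ruleImplicit{RetryLimit: 0}))
	fmt.Println(describeExplicit(ruleExplicit{}))
	fmt.Println(describeExplicit(ruleExplicit{RetryLimit: &zero}))
}
```

Whether 0 should mean "never retry" or "no limit configured" is a design decision for the later scheduler-wiring work; the sketch only shows why the current typing cannot express both.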

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/server/retrypolicy/service.go | New gRPC service with correct auth guards on write endpoints, explicit unauthenticated-read policy documented, and good empty-name validation on all endpoints. All prior review concerns resolved. |
| internal/server/retrypolicy/repository.go | PostgreSQL repository using pgxpool with correct ON CONFLICT DO NOTHING + RowsAffected pattern for create, and RowsAffected check for update. Idempotent delete documented. No integration tests exercise the SQL against a real schema. |
| internal/server/retrypolicy/service_test.go | Comprehensive unit tests for all service methods using gomock; covers permission-denied, unavailable, invalid-argument, not-found, already-exists, and success cases. |
| pkg/api/submit.proto | Adds RetryPolicy/RetryRule/RetryExitCodeMatcher messages, ExitCodeOperator and RetryAction enums, and RetryPolicyService. retry_limit uint32 is ambiguous with proto3 default-zero; on_conditions uses free-form strings for a finite Kubernetes pod-condition set. |
| internal/lookout/schema/migrations/032_create_retry_policy.sql | Adds retry_policy table (name PK, definition bytea) mirroring the queue table schema. Migration is correctly sequenced after 031. |
| internal/server/submit/submit.go | Adds retryPolicyService field to Server and thin delegation methods, mirroring the existing queue delegation pattern. |
| cmd/armadactl/cmd/retrypolicy.go | New CLI commands for create/update/delete/get/get-all retry policies, file-based create and update, consistent with queue CLI patterns. |
| internal/server/permissions/permissions.go | Adds CreateRetryPolicy, UpdateRetryPolicy, DeleteRetryPolicy permission constants. All three are distinct, resolving the prior shared-permission concern. |
| internal/armadactl/retrypolicy.go | App methods for retry policy CRUD, file-based parsing, YAML output with resource header. Delete message documents the idempotent behaviour. |
| internal/server/servertest/grpc.go | Extracts RequireGrpcCode helper shared by queue and retry-policy test packages; eliminates code duplication. |

Sequence Diagram

sequenceDiagram
    participant CLI as armadactl
    participant REST as REST Client
    participant Submit as Submit gRPC Service
    participant RPSS as RetryPolicyService gRPC
    participant Repo as PostgresRetryPolicyRepository
    participant DB as PostgreSQL (retry_policy table)

    Note over CLI,REST: Write operations require auth (create/update/delete)
    Note over REST,Submit: HTTP REST via Submit gateway (/v1/retry-policy/...)

    CLI->>RPSS: CreateRetryPolicy(RetryPolicy)
    REST->>Submit: POST /v1/retry-policy
    Submit->>RPSS: CreateRetryPolicy(ctx, policy)
    RPSS->>RPSS: AuthorizeAction(CreateRetryPolicy)
    RPSS->>Repo: CreateRetryPolicy(ctx, policy)
    Repo->>DB: INSERT INTO retry_policy ON CONFLICT DO NOTHING
    DB-->>Repo: RowsAffected
    Repo-->>RPSS: nil or ErrAlreadyExists
    RPSS-->>Submit: Empty or status error
    Submit-->>REST: 200 OK or error

    CLI->>RPSS: GetRetryPolicy(name)
    REST->>Submit: GET /v1/retry-policy/{name}
    Submit->>RPSS: GetRetryPolicy(ctx, req)
    Note over RPSS: No auth check (intentional, mirrors GetQueue)
    RPSS->>Repo: GetRetryPolicy(ctx, name)
    Repo->>DB: SELECT definition FROM retry_policy WHERE name=$1
    DB-->>Repo: definition bytes or ErrNoRows
    Repo-->>RPSS: RetryPolicy or ErrNotFound
    RPSS-->>Submit: RetryPolicy or NotFound status
    Submit-->>REST: 200 + JSON or 404
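The create and get flows in the diagram can be sketched in Go as follows. This is a simplified stand-in, not the PR's handler: it uses plain sentinel errors instead of gRPC status codes, an in-memory map instead of PostgreSQL, and hypothetical names (`Service`, `Repository`, `Authorizer`, `memRepo`). The order of the permission check relative to name validation is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for gRPC status codes.
var (
	ErrPermissionDenied = errors.New("permission denied")
	ErrInvalidArgument  = errors.New("invalid argument: name must not be empty")
	ErrAlreadyExists    = errors.New("already exists")
	ErrNotFound         = errors.New("not found")
)

// Policy is a stand-in for the RetryPolicy message; only the name matters here.
type Policy struct{ Name string }

// Repository is the storage contract the service delegates to.
type Repository interface {
	Create(p Policy) error
	Get(name string) (Policy, error)
}

// Authorizer reports whether the caller holds the named permission.
type Authorizer func(permission string) bool

// Service mirrors the diagram: writes are authorized, reads are not.
type Service struct {
	repo  Repository
	allow Authorizer
}

// Create authorizes, validates the name, then delegates to the repository.
func (s *Service) Create(p Policy) error {
	if !s.allow("CreateRetryPolicy") {
		return ErrPermissionDenied
	}
	if p.Name == "" {
		return ErrInvalidArgument
	}
	return s.repo.Create(p)
}

// Get performs no authorization check, matching the note in the diagram.
func (s *Service) Get(name string) (Policy, error) {
	if name == "" {
		return Policy{}, ErrInvalidArgument
	}
	return s.repo.Get(name)
}

// memRepo is an in-memory stand-in for the PostgreSQL repository.
// It is not safe for concurrent use.
type memRepo map[string]Policy

func (m memRepo) Create(p Policy) error {
	if _, ok := m[p.Name]; ok {
		return ErrAlreadyExists
	}
	m[p.Name] = p
	return nil
}

func (m memRepo) Get(name string) (Policy, error) {
	p, ok := m[name]
	if !ok {
		return Policy{}, ErrNotFound
	}
	return p, nil
}

func main() {
	svc := &Service{repo: memRepo{}, allow: func(string) bool { return true }}
	fmt.Println("first create:", svc.Create(Policy{Name: "ml-training"}))
	fmt.Println("second create:", svc.Create(Policy{Name: "ml-training"}))
	_, err := svc.Get("missing")
	fmt.Println("get missing:", err)
}
```

In the real service these outcomes would surface as gRPC statuses (for example AlreadyExists and NotFound), which the REST gateway then translates to HTTP responses such as the 404 shown in the diagram.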

Reviews (11): Last reviewed commit: "Add RetryPolicy CRUD API and armadactl s..."

Comment thread internal/server/retrypolicy/service.go
Comment thread internal/server/retrypolicy/service.go Outdated
Comment thread internal/server/retrypolicy/repository.go
Comment thread internal/server/retrypolicy/service.go
Comment thread pkg/api/submit.proto
Comment thread internal/server/retrypolicy/service.go
@dejanzele dejanzele force-pushed the retry-policy-crud branch 2 times, most recently from 1d4f603 to 45023c8 Compare March 30, 2026 14:06
Comment thread internal/server/retrypolicy/repository.go
@dejanzele
Member Author

@greptileai

@dejanzele dejanzele force-pushed the retry-policy-crud branch 3 times, most recently from a797d8b to 63f1c89 Compare March 30, 2026 16:15
@dejanzele
Member Author

@greptileai

@dejanzele dejanzele force-pushed the retry-policy-crud branch 2 times, most recently from 513e46f to 07bffb3 Compare March 31, 2026 19:43
@dejanzele dejanzele force-pushed the retry-policy-crud branch from 07bffb3 to aed9ba2 Compare April 7, 2026 13:18
Introduce RetryPolicy as a first-class API resource with full CRUD
operations. This is pure infrastructure with no scheduling behavior
changes.

Proto: Add RetryPolicy, RetryRule, RetryExitCodeMatcher messages and
RetryPolicyService gRPC service with REST gateway bindings on the
Submit service. Add retry_policy field to Queue message.

Server: Add retrypolicy package with PostgresRetryPolicyRepository
(stores serialized proto in retry_policy table) and Server handler
with authorization checks. Wire into server startup and register
the gRPC service. Add CreateRetryPolicy/DeleteRetryPolicy permissions.

Client: Add pkg/client/retrypolicy with Create/Update/Delete/Get/GetAll
functions matching the queue client pattern.

CLI: Add armadactl commands for create/update/delete/get retry-policy
and get retry-policies, all using file-based input for create/update.
Add --retry-policy flag to queue create and update commands.
Add RetryPolicy as a valid ResourceKind for file-based creation.

Signed-off-by: Dejan Zele Pejchev <pejchev@gmail.com>
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>