Discussion: attempt_number semantics diverge from Hookdeck API #662

@alexluong

Current outpost behavior

  • attempt_number is 0-indexed (starts at 0)
  • Automated retries increment sequentially (0, 1, 2, ...)
  • Manual retries always produce attempt_number=0, because the manual retry path has no context on how many prior attempts exist for the event+destination pair. A manual retry after automated retries (0, 1, 2) therefore produces a duplicate attempt_number=0.
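To make the duplicate concrete, here is a minimal sketch that simulates the behavior described above. It is not outpost code: `nextAutomated`, `manualAttemptNumber`, and `simulate` are hypothetical helpers that only mirror the rules in the list.

```go
package main

import "fmt"

// nextAutomated mirrors the described automated path: each retry carries the
// previous attempt number plus one. (Hypothetical helper, not outpost code.)
func nextAutomated(prev int) int { return prev + 1 }

// manualAttemptNumber mirrors the described manual path, which has no
// knowledge of prior attempts and therefore always yields 0.
func manualAttemptNumber() int { return 0 }

// simulate returns the attempt numbers recorded for one event+destination
// pair: an initial delivery, `automated` automated retries, then one manual retry.
func simulate(automated int) []int {
	numbers := []int{0} // initial delivery is attempt 0
	for i := 0; i < automated; i++ {
		numbers = append(numbers, nextAutomated(numbers[len(numbers)-1]))
	}
	return append(numbers, manualAttemptNumber())
}

func main() {
	fmt.Println(simulate(2)) // [0 1 2 0]
}
```

The trailing 0 collides with the initial delivery, so attempt_number alone no longer identifies an attempt within the pair.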

Hookdeck API behavior

  • attempt_number is 1-indexed (starts at 1)
  • Both automated and manual retries increment the number

Questions to resolve

1. Indexing

Should we align with Hookdeck's 1-indexed convention?

2. Manual retry incrementing and retry schedule interaction

Should manual retries look up the prior attempt count and continue incrementing? If so, how does this affect the retry schedule? E.g. if the schedule allows 5 retries over 24 hours and a user manually retries 4 times (all failing), do those manual retries consume the schedule, leaving only 1 automated retry? If manual retries should not count toward the schedule limit, we effectively need two separate concepts: the automated retry counter (used by the scheduler to determine remaining retries) and the total attempt number (which includes manual retries).
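One possible shape for the two-counter option, as a sketch only. `AttemptCounters`, `AttemptKind`, and their methods are hypothetical names, and the 1-indexed attempt number returned by `Record` assumes question 1 is resolved in favor of Hookdeck's convention.

```go
package main

import "fmt"

// AttemptKind distinguishes how an attempt was triggered. (Hypothetical type.)
type AttemptKind int

const (
	Initial AttemptKind = iota
	AutomatedRetry
	ManualRetry
)

// AttemptCounters keeps the two concepts separate for one event+destination pair.
type AttemptCounters struct {
	AutomatedRetries int // consumed from the retry schedule
	TotalAttempts    int // every delivery attempt, manual or automated
}

// Record registers one attempt and returns its attempt number.
// Assumption: attempt numbers are 1-indexed, so the number equals the new total.
func (c *AttemptCounters) Record(kind AttemptKind) int {
	c.TotalAttempts++
	if kind == AutomatedRetry {
		c.AutomatedRetries++
	}
	return c.TotalAttempts
}

// RemainingAutomated reports how many automated retries the schedule still allows.
func (c AttemptCounters) RemainingAutomated(limit int) int {
	if remaining := limit - c.AutomatedRetries; remaining > 0 {
		return remaining
	}
	return 0
}

func main() {
	var c AttemptCounters
	c.Record(Initial)
	for i := 0; i < 4; i++ {
		c.Record(ManualRetry) // four failing manual retries
	}
	// Total attempts grew to 5, but the schedule of 5 retries is untouched.
	fmt.Println(c.TotalAttempts, c.RemainingAutomated(5)) // 5 5
}
```

Under this split, the scenario above leaves all 5 automated retries available while the attempt number keeps increasing across manual retries.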

3. Persistence vs derivation

Currently attempt_number is persisted on the Attempt record. Should we instead derive it at read time (e.g. by ordering attempts for an event+destination by time and assigning a sequence number)? Deriving avoids needing writers to agree on the correct value, but adds read-time complexity.
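A rough sketch of the derivation option, assuming attempts carry a timestamp and a stable ID. The `Attempt` fields, `NumberedAttempt`, and `DeriveAttemptNumbers` are hypothetical and do not reflect the actual outpost model; the ID tie-break for equal timestamps is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Attempt is a hypothetical stored record with no persisted attempt_number.
type Attempt struct {
	ID                string
	EventID           string
	DestinationID     string
	AttemptedAtUnixMs int64
}

// NumberedAttempt is the read-time view with the derived sequence number.
type NumberedAttempt struct {
	Attempt
	AttemptNumber int
}

// DeriveAttemptNumbers orders attempts by time (ID as tie-break) and assigns a
// sequence number per event+destination pair, starting at base (0 or 1).
func DeriveAttemptNumbers(attempts []Attempt, base int) []NumberedAttempt {
	sorted := make([]Attempt, len(attempts))
	copy(sorted, attempts) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].AttemptedAtUnixMs != sorted[j].AttemptedAtUnixMs {
			return sorted[i].AttemptedAtUnixMs < sorted[j].AttemptedAtUnixMs
		}
		return sorted[i].ID < sorted[j].ID
	})

	type pair struct{ event, destination string }
	next := map[pair]int{} // attempts seen so far per pair
	out := make([]NumberedAttempt, 0, len(sorted))
	for _, a := range sorted {
		k := pair{a.EventID, a.DestinationID}
		out = append(out, NumberedAttempt{Attempt: a, AttemptNumber: base + next[k]})
		next[k]++
	}
	return out
}

func main() {
	attempts := []Attempt{
		{ID: "att_3", EventID: "evt_1", DestinationID: "dst_1", AttemptedAtUnixMs: 3000},
		{ID: "att_1", EventID: "evt_1", DestinationID: "dst_1", AttemptedAtUnixMs: 1000},
		{ID: "att_2", EventID: "evt_1", DestinationID: "dst_1", AttemptedAtUnixMs: 2000},
	}
	for _, n := range DeriveAttemptNumbers(attempts, 1) {
		fmt.Printf("%s -> %d\n", n.ID, n.AttemptNumber)
	}
}
```

Writers never need to agree on a number with this approach, but every read that exposes attempt_number must see the full attempt history for the pair, which is the read-time cost mentioned above.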

4. RetryTask.attempt_number reliability

Currently the retry message carries the next attempt_number (set by RetryTaskFromDeliveryTask as task.Attempt + 1). Should the retrymq handler instead calculate it at execution time (e.g. by counting existing attempts)? The carried value can go stale for two reasons: manual retries bypass RetryTask entirely, and a manual and an automated retry can both be enqueued to deliverymq at the same time. The race is an extreme edge case, but it means the value on RetryTask cannot be trusted.
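The execution-time alternative could look roughly like the sketch below. `AttemptStore`, `memoryStore`, and `ResolveAttemptNumber` are hypothetical, the `RetryTask` struct is a simplified stand-in for the real message, and the result is 0-indexed to match current outpost behavior.

```go
package main

import "fmt"

// AttemptStore is a hypothetical read interface over stored attempts.
type AttemptStore interface {
	CountAttempts(eventID, destinationID string) int
}

type pairKey struct{ EventID, DestinationID string }

// memoryStore is an in-memory stand-in used only for this sketch.
type memoryStore map[pairKey]int

func (m memoryStore) CountAttempts(eventID, destinationID string) int {
	return m[pairKey{eventID, destinationID}]
}

// RetryTask is a simplified stand-in for the retry message.
type RetryTask struct {
	EventID       string
	DestinationID string
	AttemptNumber int // set at enqueue time; may be stale by execution time
}

// ResolveAttemptNumber ignores the carried value and counts existing attempts
// when the task executes. With 0-indexing, the next number equals that count.
func ResolveAttemptNumber(store AttemptStore, task RetryTask) int {
	return store.CountAttempts(task.EventID, task.DestinationID)
}

func main() {
	// The task was enqueued after the initial attempt, so it carries 1.
	task := RetryTask{EventID: "evt_1", DestinationID: "dst_1", AttemptNumber: 1}
	// A manual retry ran in the meantime, so two attempts now exist.
	store := memoryStore{pairKey{"evt_1", "dst_1"}: 2}
	fmt.Println(task.AttemptNumber, ResolveAttemptNumber(store, task)) // 1 2
}
```

Counting and then writing is not atomic, so two concurrent deliveries could still resolve the same number unless the count and insert are serialized in some way.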

5. Delivery queue exclusivity (related)

Should we enforce that only one delivery task for the same event+destination can be in the queue at a time? E.g. if an automated retry is pending in deliverymq, reject a manual retry, and vice versa: if a manual delivery task is queued, block automated retries from being enqueued. This would prevent duplicate concurrent deliveries and simplify the attempt_number correctness problem. Without this, we may need to calculate attempt_number in the deliverymq handler by querying the logstore for prior attempts, which adds a dependency from deliverymq to logstore.
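As an illustration of the exclusivity idea, here is a single-process sketch. `InflightGuard` and `ErrAlreadyQueued` are hypothetical names; a real deployment with multiple workers would presumably need the same check backed by shared storage rather than an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyQueued is returned when a delivery task for the pair is pending.
var ErrAlreadyQueued = errors.New("delivery task already queued for event+destination")

type pairKey struct{ EventID, DestinationID string }

// InflightGuard tracks which event+destination pairs have a queued delivery.
type InflightGuard struct {
	mu       sync.Mutex
	inflight map[pairKey]struct{}
}

func NewInflightGuard() *InflightGuard {
	return &InflightGuard{inflight: make(map[pairKey]struct{})}
}

// Acquire must succeed before enqueueing to the delivery queue, whether the
// caller is the manual retry path or the automated retry scheduler.
func (g *InflightGuard) Acquire(eventID, destinationID string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	k := pairKey{eventID, destinationID}
	if _, busy := g.inflight[k]; busy {
		return ErrAlreadyQueued
	}
	g.inflight[k] = struct{}{}
	return nil
}

// Release is called once the delivery handler has finished with the task.
func (g *InflightGuard) Release(eventID, destinationID string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.inflight, pairKey{eventID, destinationID})
}

func main() {
	guard := NewInflightGuard()
	fmt.Println(guard.Acquire("evt_1", "dst_1")) // automated retry enqueued
	fmt.Println(guard.Acquire("evt_1", "dst_1")) // manual retry rejected
	guard.Release("evt_1", "dst_1")
	fmt.Println(guard.Acquire("evt_1", "dst_1")) // accepted after completion
}
```

With at most one task in flight per pair, the attempt number could be resolved at enqueue time without racing against a concurrent delivery for the same pair.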

Status: Backlog