Description
Current outpost behavior
- attempt_number is 0-indexed (starts at 0)
- Automated retries increment sequentially (0, 1, 2, ...)
- Manual retries always produce attempt_number=0 because the retry path doesn't have context of how many prior attempts exist for the event+destination pair, so a manual retry after automated retries (0, 1, 2) produces a duplicate attempt_number=0
Hookdeck API behavior
- attempt_number is 1-indexed (starts at 1)
- Both automated and manual retries increment the number
Questions to resolve
1. Indexing
Should we align with Hookdeck's 1-indexed convention?
2. Manual retry incrementing and retry schedule interaction
Should manual retries look up the prior attempt count and continue incrementing? If so, how does this affect the retry schedule? E.g. if the schedule allows 5 retries over 24 hours and a user manually retries 4 times (all failing), do those manual retries consume the schedule, leaving only 1 automated retry? If manual retries should not count toward the schedule limit, we effectively need two separate concepts: the automated retry counter (used by the scheduler to determine remaining retries) and the total attempt number (including manual).
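To make the two-counter idea concrete, here is a minimal Go sketch. All names in it (Trigger, Attempt, Counters, Count, RemainingAutomatedRetries) are hypothetical and not taken from the outpost codebase. It assumes each attempt records how it was triggered.

```go
package main

// Trigger is a hypothetical field describing what caused an attempt.
type Trigger int

const (
	Initial        Trigger = iota // first delivery of the event
	AutomatedRetry                // enqueued by the retry scheduler
	ManualRetry                   // requested by a user
)

// Attempt is a simplified stand-in for the persisted Attempt record.
type Attempt struct {
	Trigger Trigger
}

// Counters keeps the two concepts apart.
type Counters struct {
	TotalAttempts    int // every attempt, manual or automated
	AutomatedRetries int // only retries issued by the scheduler
}

// Count tallies the prior attempts for one event+destination pair.
func Count(attempts []Attempt) Counters {
	c := Counters{TotalAttempts: len(attempts)}
	for _, a := range attempts {
		if a.Trigger == AutomatedRetry {
			c.AutomatedRetries++
		}
	}
	return c
}

// RemainingAutomatedRetries is what the scheduler would consult.
// Manual retries never reduce it.
func RemainingAutomatedRetries(c Counters, retryLimit int) int {
	if r := retryLimit - c.AutomatedRetries; r > 0 {
		return r
	}
	return 0
}
```

Under this split, an initial delivery followed by one automated retry and four manual retries gives a total of six attempts but leaves four of five scheduled retries available.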
3. Persistence vs derivation
Currently attempt_number is persisted on the Attempt record. Should we instead derive it at read time (e.g. by ordering attempts for an event+destination by time and assigning a sequence number)? Deriving avoids needing writers to agree on the correct value, but adds read-time complexity.
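As a rough illustration of the derivation option, the Go sketch below assigns numbers at read time. The Attempt fields, NumberedAttempt, and NumberAttempts are hypothetical names, and the sketch assumes each attempt has a creation timestamp and a unique ID to break ties. The base parameter leaves the 0- vs 1-indexing question open.

```go
package main

import "sort"

// Attempt is a simplified, hypothetical view of a stored attempt.
type Attempt struct {
	ID            string
	EventID       string
	DestinationID string
	CreatedAt     int64 // Unix nanoseconds
}

// NumberedAttempt pairs an attempt with its derived sequence number.
type NumberedAttempt struct {
	Attempt
	AttemptNumber int
}

// NumberAttempts orders attempts by time (ID breaks ties) and assigns a
// per event+destination sequence number starting at base.
func NumberAttempts(attempts []Attempt, base int) []NumberedAttempt {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]Attempt, len(attempts))
	copy(sorted, attempts)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].CreatedAt == sorted[j].CreatedAt {
			return sorted[i].ID < sorted[j].ID
		}
		return sorted[i].CreatedAt < sorted[j].CreatedAt
	})

	type pair struct{ eventID, destinationID string }
	seen := make(map[pair]int) // attempts already numbered per pair
	out := make([]NumberedAttempt, 0, len(sorted))
	for _, a := range sorted {
		k := pair{a.EventID, a.DestinationID}
		out = append(out, NumberedAttempt{Attempt: a, AttemptNumber: base + seen[k]})
		seen[k]++
	}
	return out
}
```

Writers no longer need to agree on a value, but every read has to load (or window over) all attempts for the pair, which is the read-time cost mentioned above.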
4. RetryTask.attempt_number reliability
Currently the retry message carries the next attempt_number (set by RetryTaskFromDeliveryTask as task.Attempt + 1). Should the retrymq handler instead calculate it at execution time (e.g. by counting existing attempts)? The carried value can go stale for two reasons: manual retries bypass RetryTask entirely, and there is a race condition where a manual and an automated retry could both be enqueued to deliverymq. The race is an extreme edge case, but together these mean the value on RetryTask cannot be relied on.
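A minimal Go sketch of calculating the number at execution time follows. The AttemptCounter interface, the trimmed-down RetryTask, NextAttemptNumber, and the in-memory memoryCounter are all hypothetical; none of them reflect the real retrymq or logstore APIs.

```go
package main

// AttemptCounter is a hypothetical read interface over stored attempts.
type AttemptCounter interface {
	CountAttempts(eventID, destinationID string) (int, error)
}

// RetryTask here carries only identifiers, not an attempt number.
type RetryTask struct {
	EventID       string
	DestinationID string
}

// NextAttemptNumber derives the number when the task executes.
// base is 0 or 1 depending on the indexing convention chosen.
func NextAttemptNumber(store AttemptCounter, task RetryTask, base int) (int, error) {
	n, err := store.CountAttempts(task.EventID, task.DestinationID)
	if err != nil {
		return 0, err
	}
	return base + n, nil
}

// memoryCounter is an in-memory stand-in used only for illustration.
type memoryCounter map[[2]string]int

func (m memoryCounter) CountAttempts(eventID, destinationID string) (int, error) {
	return m[[2]string{eventID, destinationID}], nil
}
```

Counting at execution time picks up manual retries that never passed through RetryTask. It does not by itself close the race: two deliveries that run concurrently for the same pair can still read the same count.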
5. Delivery queue exclusivity (related)
Should we enforce that only one delivery task for the same event+destination can be in the queue at a time? E.g. if an automated retry is pending in deliverymq, reject a manual retry (and vice versa — if a manual delivery task is queued, block automated retries from being enqueued). This would prevent duplicate concurrent deliveries and simplify the attempt_number correctness problem. Without this, we may need to calculate attempt_number during the deliverymq handler by querying the logstore for prior attempts — adding a dependency from deliverymq to logstore.
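One possible shape for the exclusivity check is sketched below in Go. InFlight, TryAcquire, and Release are hypothetical names. The sketch assumes a single process holding the guard in memory; a multi-process deployment would need the same check in shared storage, which is not shown.

```go
package main

import "sync"

type pairKey struct{ eventID, destinationID string }

// InFlight tracks event+destination pairs that already have a
// delivery task queued.
type InFlight struct {
	mu      sync.Mutex
	pending map[pairKey]struct{}
}

func NewInFlight() *InFlight {
	return &InFlight{pending: make(map[pairKey]struct{})}
}

// TryAcquire reports whether the caller may enqueue a delivery task.
// It returns false when a task for the pair is already queued, whether
// that task was automated or manual.
func (f *InFlight) TryAcquire(eventID, destinationID string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	k := pairKey{eventID, destinationID}
	if _, queued := f.pending[k]; queued {
		return false
	}
	f.pending[k] = struct{}{}
	return true
}

// Release is called once the delivery attempt has finished, so the
// next retry (manual or automated) can be enqueued.
func (f *InFlight) Release(eventID, destinationID string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.pending, pairKey{eventID, destinationID})
}
```

Both the manual retry path and the automated retry path would call TryAcquire before publishing to deliverymq and reject or skip the task when it returns false.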