Description
When a retry task's executor fails (e.g., event not found in logstore, transient errors), the message sits in the queue and becomes visible again after a fixed 30s visibility timeout. This repeats indefinitely with no limit.
Problems
- A permanently failing retry message cycles forever with no cap
- No dead-letter path to detect or surface stuck messages
- Fixed visibility timeout on re-fetch failures, with no backoff between attempts
The underlying queue already tracks receive count and supports per-message visibility changes, so the primitives are there.
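Those two primitives are enough to build the cap and backoff discussed below. A minimal sketch in Python for illustration (the queue interface, `FakeQueue`, and `handle_executor_failure` are hypothetical names, not the real consumer API):

```python
# Sketch: compose the queue's receive count and per-message visibility
# change into a capped, backed-off internal retry.

class FakeQueue:
    """Stand-in for the real queue; only models change_visibility."""
    def __init__(self):
        self.last_timeout = None

    def change_visibility(self, msg, seconds):
        self.last_timeout = seconds

def handle_executor_failure(queue, msg, max_receive_count=5, base_timeout=30):
    """Decide what to do with a retry message whose executor just failed."""
    attempts = msg["receive_count"]  # the queue already tracks this per message
    if attempts >= max_receive_count:
        return "give_up"  # stop cycling; what happens next is an open question
    # Back off between attempts instead of a fixed 30s: 30s, 60s, 120s, ...
    queue.change_visibility(msg, base_timeout * 2 ** (attempts - 1))
    return "retry_later"
```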
Open questions
Max receive count
What should the default be?
Suggestion: 5 internal re-fetch attempts before giving up. This is separate from the delivery retry limit, which controls how many times we re-deliver to the destination.
Backoff on re-fetch
Should we apply exponential backoff on internal failures (e.g., 30s → 60s → 120s), or is a fixed interval fine since these are typically short-lived transient issues?
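If we did go exponential, a capped schedule keeps the worst case bounded. The base and cap below are illustrative, not existing config:

```python
def refetch_backoff(attempt, base=30, cap=300):
    """Visibility timeout in seconds for the Nth internal re-fetch (1-based).

    Doubles per attempt and caps, so a short-lived transient failure still
    retries quickly while a longer outage stops hammering the executor.
    """
    return min(base * 2 ** (attempt - 1), cap)

# First five attempts: 30s, 60s, 120s, 240s, 300s (capped)
```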
What happens when max is exceeded
Suggestion: Route to a DLQ. Gives observability into stuck messages and the ability to replay them.
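A hypothetical sketch of the hand-off (queue names and payload shape are assumptions): copy the message plus failure context to the DLQ for observability and replay, then ack it off the retry queue so it stops cycling.

```python
import json
import time

class MemoryQueue:
    """Minimal in-memory stand-in for a queue/DLQ (illustrative only)."""
    def __init__(self):
        self.published, self.deleted = [], []

    def publish(self, body):
        self.published.append(body)

    def delete(self, msg):
        self.deleted.append(msg)

def route_to_dlq(retry_queue, dlq, msg, error):
    """Move a message that exhausted its re-fetch attempts to the DLQ."""
    dlq.publish(json.dumps({
        "body": msg["body"],
        "receive_count": msg["receive_count"],
        "last_error": str(error),          # surfaced for debugging/replay
        "dead_lettered_at": time.time(),
    }))
    retry_queue.delete(msg)  # ack so it stops becoming visible again
```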
Configuration
Suggestion: Expose as a `retrymq` config, similar to how `deliverymq` is configured. e.g., `RETRYMQ_MAX_RECEIVE_COUNT`, `RETRYMQ_VISIBILITY_TIMEOUT_SECONDS`.
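The env var names come from the suggestion above; the defaults and the parsing helper are illustrative (the default of 30 mirrors the current fixed visibility timeout):

```python
import os

def load_retrymq_config(env=os.environ):
    """Read the proposed retrymq settings from the environment with defaults."""
    return {
        # internal re-fetch attempts before dead-lettering (suggested default: 5)
        "max_receive_count": int(env.get("RETRYMQ_MAX_RECEIVE_COUNT", "5")),
        # base visibility timeout between attempts (currently fixed at 30s)
        "visibility_timeout_seconds": int(
            env.get("RETRYMQ_VISIBILITY_TIMEOUT_SECONDS", "30")
        ),
    }
```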