feat: add exponential backoff retry for webhook delivery #97

Open
bianbiandashen wants to merge 1 commit into benjitaylor:main from bianbiandashen:feat/webhook-retry-with-backoff

@bianbiandashen

Summary

Add robust retry mechanism for webhook delivery using exponential backoff with jitter.

Problem

Currently, webhooks are fire-and-forget. If a webhook endpoint is temporarily unavailable (network glitch, server restart, rate limiting), the notification is lost forever.

Solution

Implement exponential backoff retry with the following features:

Retry Logic

  • Retryable errors: Network failures, 5xx server errors, 429 rate limiting
  • Non-retryable errors: 4xx client errors other than 429
  • Exponential backoff: delay is baseDelay * 2^attempt, capped at a configurable maximum
  • Jitter: a random 0-25% is added to each delay to prevent the thundering herd problem
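
The rules above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper names `isRetryableStatus` and `computeBackoffDelay` are hypothetical, and the sketch assumes that attempts are counted from 0 and that jitter is applied after the cap (so a delay may exceed the cap by up to 25%). The `random` parameter is injectable purely to make the function easy to test.

```typescript
// Hypothetical helpers; the names and exact behavior in http.ts may differ.

/** True when a response status is worth retrying: 429 or any 5xx. */
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

/**
 * Delay before retry number `attempt` (0-based, an assumption):
 * baseDelayMs * 2^attempt, capped at maxDelayMs, plus 0-25% jitter.
 * Assumption: jitter is added after capping.
 */
function computeBackoffDelay(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): number {
  const capped = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
  const jitter = capped * 0.25 * random();
  return Math.round(capped + jitter);
}
```

With the default settings (base 1000 ms, cap 30000 ms) and jitter ignored, this sketch yields 1000, 2000 and 4000 ms for attempts 0, 1 and 2.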

Configuration (Environment Variables)

| Variable | Default | Description |
| --- | --- | --- |
| AGENTATION_WEBHOOK_MAX_RETRIES | 3 | Maximum retry attempts |
| AGENTATION_WEBHOOK_BASE_DELAY_MS | 1000 | Base delay for backoff (ms) |
| AGENTATION_WEBHOOK_MAX_DELAY_MS | 30000 | Maximum delay cap (ms) |
| AGENTATION_WEBHOOK_TIMEOUT_MS | 10000 | Request timeout (ms) |
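
One way these variables could be read, with the listed defaults as fallbacks, is sketched below. The `WebhookRetryConfig` interface and the `readIntEnv` and `loadWebhookRetryConfig` functions are hypothetical names for illustration. Falling back to the default on a missing or non-numeric value is also an assumption; the PR does not say how invalid values are handled.

```typescript
// Hypothetical config loader; only the variable names and defaults come from the PR.
interface WebhookRetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

/** Parses a non-negative integer, or returns `fallback` (assumed behavior). */
function readIntEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return fallback;
  return parseInt(raw.trim(), 10);
}

function loadWebhookRetryConfig(env: Env): WebhookRetryConfig {
  return {
    maxRetries: readIntEnv(env, "AGENTATION_WEBHOOK_MAX_RETRIES", 3),
    baseDelayMs: readIntEnv(env, "AGENTATION_WEBHOOK_BASE_DELAY_MS", 1000),
    maxDelayMs: readIntEnv(env, "AGENTATION_WEBHOOK_MAX_DELAY_MS", 30000),
    timeoutMs: readIntEnv(env, "AGENTATION_WEBHOOK_TIMEOUT_MS", 10000),
  };
}
```

In a Node.js server this would typically be called once at startup as `loadWebhookRetryConfig(process.env)`.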

Non-blocking

Retries run in the background without blocking the HTTP response to the client.
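
A minimal sketch of the non-blocking pattern, under stated assumptions, might look like this. All names (`SendFn`, `deliverWithRetry`, `deliverInBackground`) are hypothetical and not taken from the PR. The sketch assumes that "max retries" counts retries after the first attempt, and it treats any status other than 2xx, 429 or 5xx as non-retryable. The actual HTTP call and its timeout are abstracted behind `send`.

```typescript
// Hypothetical names throughout; this is not the code from mcp/src/server/http.ts.

/** Resolves with the HTTP status code; rejects on a network failure or timeout. */
type SendFn = () => Promise<number>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Tries once, then retries up to `maxRetries` times. Resolves true on a 2xx. */
async function deliverWithRetry(
  send: SendFn,
  maxRetries: number,
  delayFor: (attempt: number) => number,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<boolean> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const status = await send();
      if (status >= 200 && status < 300) return true;
      const retryable = status === 429 || status >= 500;
      if (!retryable) return false; // e.g. other 4xx: give up immediately
    } catch (_err) {
      // Network failures are treated as retryable.
    }
    if (attempt < maxRetries) await sleep(delayFor(attempt));
  }
  return false;
}

/** Starts delivery without awaiting it, so the caller can respond right away. */
function deliverInBackground(
  send: SendFn,
  maxRetries: number,
  delayFor: (attempt: number) => number,
): void {
  void deliverWithRetry(send, maxRetries, delayFor)
    .then((delivered) => {
      if (!delivered) console.warn("webhook delivery failed after retries");
    })
    .catch((err) => console.error("webhook delivery error", err));
}
```

Because `deliverInBackground` returns `void` and never awaits the promise, a request handler that calls it can send its HTTP response immediately while retries continue in the background.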

Changes

  • mcp/src/server/http.ts: Add retry logic with helper functions

Test plan

  • Verified successful webhook delivery works as before
  • Verified retry logic triggers on 5xx errors
  • Verified non-retryable 4xx errors are not retried
  • Verified exponential backoff delays are calculated correctly

Webhooks now retry on transient failures with exponential backoff:

- Retries on network errors, 5xx server errors, and 429 rate limiting
- Exponential backoff with jitter to prevent thundering herd
- Configurable via environment variables:
  - AGENTATION_WEBHOOK_MAX_RETRIES (default: 3)
  - AGENTATION_WEBHOOK_BASE_DELAY_MS (default: 1000)
  - AGENTATION_WEBHOOK_MAX_DELAY_MS (default: 30000)
  - AGENTATION_WEBHOOK_TIMEOUT_MS (default: 10000)
- Non-blocking: retries run in background
- Detailed logging for debugging retry behavior
@vercel

vercel bot commented Feb 14, 2026

@bianbiandashen is attempting to deploy a commit to the Benji Taylor's Projects Team on Vercel.

A member of the Team first needs to authorize it.
