Exponential backoff with jitter for retries #581

@ysknsid25

Summary

Add a new retryBackoff option that lets users configure exponential backoff with jitter for automatic retries, following the strategies described in the AWS Architecture Blog post "Exponential Backoff And Jitter" by Marc Brooker.

Motivation

The current retry mechanism (src/fetch.ts) only supports a fixed retryDelay (a number or a callback returning a number). This is fine for a single client, but it scales poorly when many clients fail simultaneously:

  1. With no delay (retryDelay: 0, the default), every client retries immediately and stampedes the server while it is still recovering.
  2. With a fixed delay (e.g. 500ms), all failed clients still retry in lockstep: the herd is preserved, just shifted in time, and the server sees the same pulsing load pattern.

This is the classic "Thundering Herd" problem. The AWS post shows via simulation that adding exponential backoff alone is not enough; the key insight is that randomizing each client's retry schedule (jitter) is what actually decorrelates the herd. With jitter, both the peak load on the server and the overall completion time across all clients improve significantly.

Today, users who want this behavior have to implement it themselves inside a retryDelay callback, which is error-prone: it requires tracking the attempt count, picking the right formula, managing prev_sleep for decorrelated jitter, and so on. It would be much nicer if ofetch provided this out of the box.

Current behavior

// src/types.ts
retry?: number | false;
retryDelay?: number | ((context: FetchContext<T, R>) => number);
retryStatusCodes?: number[];

// src/fetch.ts (onError)
const retryDelay =
  typeof context.options.retryDelay === "function"
    ? context.options.retryDelay(context)
    : context.options.retryDelay || 0;

The FetchContext does not expose the current attempt number, so even a user-defined callback cannot easily implement backoff without external state.
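For illustration, the workaround available today looks roughly like the sketch below, where the attempt count lives in a closure outside ofetch. makeFullJitterDelay is a hypothetical helper name, and the sketch assumes the retryDelay callback is invoked exactly once per retry of a single request.

```typescript
// Hypothetical user-land helper: full jitter driven by a closure counter.
// Assumes the callback is invoked exactly once per retry of one request.
function makeFullJitterDelay(
  base: number,
  cap: number,
  random: () => number = Math.random,
): () => number {
  let attempt = 0; // external state that FetchContext does not provide today
  return () => {
    const ceiling = Math.min(cap, base * 2 ** attempt);
    attempt += 1;
    // Uniform sample in [0, ceiling), rounded down to whole milliseconds.
    return Math.floor(random() * ceiling);
  };
}

// Usage (not executed here): a fresh helper is needed for every request,
// otherwise the counter leaks from one request into the next.
// await ofetch("/api", { retry: 5, retryDelay: makeFullJitterDelay(100, 3000) });
```

The need to create a new helper per request is exactly the kind of detail that is easy to get wrong, which is part of the motivation for built-in support.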

Proposal

Introduce a new opt-in option retryBackoff that selects a backoff strategy and supplies base / cap delays in milliseconds:

await ofetch("/api", {
  retry: 5,
  retryBackoff: {
    strategy: "full-jitter", // "full-jitter" | "equal-jitter" | "decorrelated-jitter"
    base: 100,  // base delay (ms) that the exponential term starts from
    cap: 3000,  // upper bound on any single delay (ms)
  },
});

Strategies

All three strategies from the AWS post would be supported so users can pick the one that matches their constraints:

  • full-jitter (recommended in the AWS post):
    sleep = random(0, min(cap, base * 2 ** attempt))
    Best overall in the AWS simulation: the least total work for the server, with completion time close to the best. No state beyond the attempt count is required.

  • equal-jitter:
    temp = min(cap, base * 2 ** attempt); sleep = temp / 2 + random(0, temp / 2)
    Guarantees a minimum wait time (temp / 2), useful when "retry immediately" is undesirable.

  • decorrelated-jitter:
    sleep = min(cap, random(base, prev_sleep * 3)) (with prev_sleep = base on the first retry)
    Comparable to full-jitter in performance and matches the AWS SDK's default. Requires tracking the previous sleep value.
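The three formulas above could be expressed as one pure function, sketched below. The names follow the implementation sketch later in this issue, but the exact shape is an assumption, not existing ofetch code. In particular, the sketch assumes attempt is 0 for the first retry and that random returns a value in [0, 1); how attempt maps to the proposed retryAttempt field is left open.

```typescript
type RetryBackoffStrategy = "full-jitter" | "equal-jitter" | "decorrelated-jitter";

interface RetryBackoffOptions {
  strategy: RetryBackoffStrategy;
  base: number; // ms
  cap: number; // ms
}

interface BackoffInput {
  options: RetryBackoffOptions;
  attempt: number; // assumption: 0 for the first retry
  prevDelay?: number; // only read by decorrelated-jitter
  random?: () => number; // assumption: returns a value in [0, 1)
}

function computeBackoffDelay({
  options,
  attempt,
  prevDelay,
  random = Math.random,
}: BackoffInput): number {
  const { strategy, base, cap } = options;
  // Uniform sample in [min, max).
  const between = (min: number, max: number): number => min + random() * (max - min);

  if (strategy === "decorrelated-jitter") {
    // sleep = min(cap, random(base, prev_sleep * 3)), with prev_sleep = base at first
    return Math.min(cap, between(base, (prevDelay ?? base) * 3));
  }

  // Both remaining strategies share the capped exponential ceiling.
  const ceiling = Math.min(cap, base * 2 ** attempt);
  if (strategy === "equal-jitter") {
    // Half of the ceiling is guaranteed, the other half is randomized.
    return ceiling / 2 + between(0, ceiling / 2);
  }
  // full-jitter: anywhere between 0 and the ceiling.
  return between(0, ceiling);
}
```

Because random is injectable, boundaries are easy to check: with random fixed at 0.5, base 100, cap 3000 and attempt 3, full-jitter yields 400 and equal-jitter yields 600.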

Expose attempt count on the context

To keep callback-based use cases working and to let users observe retries, add a public retryAttempt?: number field to FetchContext:

  • undefined on the first attempt
  • 1, 2, ... on subsequent retries
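With such a field, a callback could derive its delay from the context alone. The sketch below uses a minimal stand-in type rather than the real FetchContext, and it assumes the callback receives the context of the attempt that just failed, so retryAttempt is undefined when the first retry is being scheduled. Both exponentialDelay and RetryContextLike are hypothetical names.

```typescript
// Minimal stand-in for the part of FetchContext this sketch needs.
// retryAttempt is the field proposed in this issue; it does not exist in ofetch today.
interface RetryContextLike {
  retryAttempt?: number; // undefined on the first attempt, then 1, 2, ...
}

// Capped exponential delay (no jitter), derived only from the context.
function exponentialDelay(
  context: RetryContextLike,
  base = 100,
  cap = 3000,
): number {
  const retriesSoFar = context.retryAttempt ?? 0;
  return Math.min(cap, base * 2 ** retriesSoFar);
}

// Usage (not executed here, depends on the proposed field):
// await ofetch("/api", { retry: 5, retryDelay: (ctx) => exponentialDelay(ctx) });
```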

Coexistence with retryDelay

When both retryBackoff and retryDelay are set, retryBackoff takes precedence and retryDelay is ignored. This precedence would be documented explicitly to avoid silently mixing the two semantics.
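The precedence rule could be captured in a small resolver like the one sketched below. resolveRetryDelay and the stand-in types are hypothetical, and the backoff computation is passed in as a function so the sketch stays independent of any particular strategy implementation.

```typescript
interface BackoffLike {
  strategy: string;
  base: number;
  cap: number;
}

// Stand-in for the relevant slice of the fetch options.
interface RetryDelayOptionsLike<C> {
  retryDelay?: number | ((context: C) => number);
  retryBackoff?: BackoffLike; // proposed option, not part of ofetch today
}

function resolveRetryDelay<C>(
  options: RetryDelayOptionsLike<C>,
  context: C,
  backoffDelay: (backoff: BackoffLike) => number,
): number {
  if (options.retryBackoff) {
    // retryBackoff wins; retryDelay is deliberately not consulted.
    return backoffDelay(options.retryBackoff);
  }
  // Existing behavior, unchanged when retryBackoff is absent.
  return typeof options.retryDelay === "function"
    ? options.retryDelay(context)
    : options.retryDelay || 0;
}
```

Keeping the fallback branch identical to the current onError logic is what makes the option purely additive.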

Backward compatibility

Fully opt-in. If retryBackoff is not specified, the existing retryDelay behavior is preserved exactly. No default delay is applied, and none of the public type changes are breaking.

Sketch of implementation

  • New module src/retry.ts exporting RetryBackoffOptions, RetryBackoffStrategy, and a pure computeBackoffDelay({ options, attempt, prevDelay, random }) function (with random injectable for deterministic tests).
  • Extend FetchOptions with retryBackoff?: RetryBackoffOptions, extend GlobalOptions accordingly, and add retryAttempt?: number to FetchContext.
  • In src/fetch.ts onError, branch on retryBackoff before falling back to retryDelay. Pass attempt count and previous delay across recursive retries via internal options fields (_retryAttempt, _retryPrevDelay).
  • Unit tests for computeBackoffDelay (deterministic via injected random) plus 1–2 end-to-end retry tests in test/index.test.ts.
  • README update under "Auto Retry" with a new "Exponential Backoff with Jitter" subsection.

Happy to open a PR if this direction sounds good.
