Skip to content

fix(utils): guard against re-entrant progress dispatch in Job#3485

Open
Bojan131 wants to merge 1 commit intolibp2p:mainfrom
Bojan131:fix/job-onprogress-reentry-guard
Open

fix(utils): guard against re-entrant progress dispatch in Job#3485
Bojan131 wants to merge 1 commit intolibp2p:mainfrom
Bojan131:fix/job-onprogress-reentry-guard

Conversation

@Bojan131
Copy link
Copy Markdown

What

Fixes #3484Job in @libp2p/utils crashes the process with RangeError: Maximum call stack size exceeded when two jobs in different queues end up with progress callbacks that point at each other.

The crash was first reported by users running OriginTrail/dkg but the bug is general; any consumer that has two Queue instances whose recipients can transitively call back into each other's synthesised onProgress will hit it. We see it in practice on the dial path when two parallel dials of the same peer share a job via join().

How

Add a per-Job dispatchingProgress flag. The synthesised onProgress checks the flag, returns early if it's already dispatching, otherwise sets it, runs the recipient forEach, and clears it in a finally.

onProgress: (evt: any): void => {
  if (this.dispatchingProgress) {
    return
  }
  this.dispatchingProgress = true
  try {
    this.recipients.forEach(recipient => {
      recipient.onProgress?.(evt)
    })
  } finally {
    this.dispatchingProgress = false
  }
}

This is the smallest change that fixes the cycle without altering any non-cyclic behaviour: a normal dispatch (one entry, never re-enters before returning) sees the flag flip on then off again with no observable difference. Only a synchronous re-entry into the same job's dispatcher is short-circuited.

Why this approach

Two alternatives were considered in the issue and rejected:

  • Per-event visited set — works, but every progress event carries new allocation/cleanup overhead, even on the hot path where there is no cycle. The bug is a self-recursion guard, so a per-instance flag is enough.
  • queueMicrotask to defer dispatch — changes the synchronous contract of onProgress. The existing test 'should consume synchronous progress events' (queue.spec.ts) explicitly asserts that synchronous events are observed before the awaited delay. Deferring would break that.

The flag is per-Job, not per-Queue, so two unrelated jobs in the same queue can dispatch concurrently without interference.

Tests

Added 'should not recurse infinitely when two jobs progress-feed each other (issue #3484)' to packages/utils/test/queue.spec.ts. The test sets up two queues whose recipients reference each other's synthesised onProgress, kicks a single event into the cycle, and asserts:

  1. The dispatch does not throw.
  2. Each recipient observes the event exactly once (the second hop sees the flag set and short-circuits).

I verified the test is load-bearing by temporarily removing the guard and re-running it — without the fix it fails with the original RangeError: Maximum call stack size exceeded.

The full @libp2p/utils node test suite (217 tests) passes with the change, and aegir lint, aegir doc-check, aegir build, and aegir dep-check are all clean.

When two jobs share progress callbacks that form a cycle, a single
progress event would recurse synchronously through the recipients
forEach loop until V8 hit `Maximum call stack size exceeded` and the
process crashed. The cycle most often appears in DialQueue, where two
parallel dials of the same peer share a job via `join()` and propagate
progress events back to each other.

Add a per-Job `dispatchingProgress` flag so a synchronous re-entry into
the same job's synthesised onProgress short-circuits instead of
recursing. Non-cyclic dispatches behave identically; only the cycle is
broken.

Includes a regression test that reproduces the recursion — without the
guard it fails with `RangeError: Maximum call stack size exceeded`.

Fixes libp2p#3484
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Job onProgress can recurse infinitely when two queues' progress callbacks reference each other

1 participant