fix(utils): guard against re-entrant progress dispatch in Job #3485
Open
Bojan131 wants to merge 1 commit into libp2p:main from
When two jobs share progress callbacks that form a cycle, a single progress event would recurse synchronously through the recipients `forEach` loop until V8 hit `Maximum call stack size exceeded` and the process crashed. The cycle most often appears in DialQueue, where two parallel dials of the same peer share a job via `join()` and propagate progress events back to each other.

Add a per-Job `dispatchingProgress` flag so a synchronous re-entry into the same job's synthesised `onProgress` short-circuits instead of recursing. Non-cyclic dispatches behave identically; only the cycle is broken.

Includes a regression test that reproduces the recursion; without the guard it fails with `RangeError: Maximum call stack size exceeded`.

Fixes libp2p#3484
What
Fixes #3484: `Job` in `@libp2p/utils` crashes the process with `RangeError: Maximum call stack size exceeded` when two jobs in different queues end up with progress callbacks that point at each other.

The crash was first reported by users running `OriginTrail/dkg`, but the bug is general: any consumer that has two `Queue` instances whose recipients can transitively call back into each other's synthesised `onProgress` will hit it. We see it in practice on the dial path when two parallel dials of the same peer share a job via `join()`.

How
Add a per-`Job` `dispatchingProgress` flag. The synthesised `onProgress` checks the flag, returns early if it's already dispatching, otherwise sets it, runs the recipient `forEach`, and clears it in a `finally`.

This is the smallest change that fixes the cycle without altering any non-cyclic behaviour: a normal dispatch (one entry, never re-enters before returning) sees the flag flip on then off again with no observable difference. Only a synchronous re-entry into the same job's dispatcher is short-circuited.
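
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchJob`, `ProgressHandler`, and `dispatchCount` are hypothetical names, and the real `Job` in `@libp2p/utils` carries more state than is shown here.

```typescript
// Hypothetical, simplified stand-in for a Job (not the @libp2p/utils class).
type ProgressHandler = (event: string) => void

class SketchJob {
  // Progress callbacks of every caller that joined this job.
  readonly recipients: ProgressHandler[] = []
  // True while this job is in the middle of fanning out an event.
  private dispatchingProgress = false
  // Illustration only: counts how many fan-outs actually ran.
  dispatchCount = 0

  // Synthesised handler: fans one event out to every recipient.
  readonly onProgress: ProgressHandler = (event) => {
    if (this.dispatchingProgress) {
      return // synchronous re-entry into the same job: break the cycle
    }
    this.dispatchingProgress = true
    try {
      this.dispatchCount++
      this.recipients.forEach((recipient) => { recipient(event) })
    } finally {
      // Always reset, even if a recipient throws.
      this.dispatchingProgress = false
    }
  }
}

// Two jobs whose recipients point at each other form the problematic cycle.
const jobA = new SketchJob()
const jobB = new SketchJob()
jobA.recipients.push(jobB.onProgress)
jobB.recipients.push(jobA.onProgress)

// A dispatches to B, B tries to dispatch back to A, and A's guard returns early.
jobA.onProgress('dial:start')
```

Because the flag is reset in `finally`, a later, independent event on the same job dispatches normally; only the nested call made while the first dispatch is still on the stack is dropped.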
Why this approach
Two alternatives were considered in the issue and rejected:

- `queueMicrotask` to defer dispatch: changes the synchronous contract of `onProgress`. The existing test `'should consume synchronous progress events'` (`queue.spec.ts`) explicitly asserts that synchronous events are observed before the awaited `delay`. Deferring would break that.

The flag is per-`Job`, not per-`Queue`, so two unrelated jobs in the same queue can dispatch concurrently without interference.

Tests
Added `'should not recurse infinitely when two jobs progress-feed each other (issue #3484)'` to `packages/utils/test/queue.spec.ts`. The test sets up two queues whose recipients reference each other's synthesised `onProgress`, kicks a single event into the cycle, and asserts:

I verified the test is load-bearing by temporarily removing the guard and re-running it; without the fix it fails with the original `RangeError: Maximum call stack size exceeded`.

The full `@libp2p/utils` node test suite (217 tests) passes with the change, and `aegir lint`, `aegir doc-check`, `aegir build`, and `aegir dep-check` are all clean.
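
The load-bearing check can be imitated outside the test suite with a standalone sketch. It does not use the real `Queue` or `Job` classes: `makeCycle`, `runOnce`, and the `guarded` switch are hypothetical helpers that only mimic two dispatchers forwarding to each other, with the guard toggled on or off. It assumes a V8-based runtime such as Node.js, where stack exhaustion surfaces as a catchable `RangeError`.

```typescript
type Handler = (event: string) => void

// Builds two hypothetical dispatchers that forward every event to each other.
function makeCycle (guarded: boolean): Handler {
  const makeDispatcher = (getPeer: () => Handler): Handler => {
    let dispatching = false
    return (event) => {
      // With the guard enabled, a nested call into the same dispatcher returns early.
      if (guarded && dispatching) return
      dispatching = true
      try {
        getPeer()(event)
      } finally {
        dispatching = false
      }
    }
  }
  // The peers are looked up lazily, so each dispatcher can reference the other.
  const a: Handler = makeDispatcher(() => b)
  const b: Handler = makeDispatcher(() => a)
  return a
}

// Kicks one event into the cycle and reports how the call ended.
function runOnce (guarded: boolean): string {
  try {
    makeCycle(guarded)('progress')
    return 'completed'
  } catch (err) {
    return err instanceof RangeError ? 'RangeError' : 'other error'
  }
}

const withGuard = runOnce(true)
const withoutGuard = runOnce(false)
console.log({ withGuard, withoutGuard })
```

With the guard on, the single event makes one pass around the cycle and returns; with it off, the two dispatchers call each other until the stack is exhausted, which is the failure mode the regression test pins down.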