Skip to content

TaskManager: reset() during nesting can corrupt the concurrency counter #1315

@isamu

Description

@isamu

Background

Found during cross-review of #1314 (snakajima's audit comment).

Problem

prepareForNesting increments this.concurrency, and restoreAfterNesting decrements it. They are paired by node.ts in a try/finally so they normally balance.

reset() clears taskQueue, runningNodes, runningByLabel, and nestingBypassByLabel — but does not reset this.concurrency to its constructor-time value.

If reset() is called between a prepareForNesting and the corresponding restoreAfterNesting (e.g. user aborts a graph that has a labeled parent currently running a nested graph), the bookkeeping gets out of sync:

  1. prepareForNesting() -> concurrency = N+1
  2. reset() is called from elsewhere -> running/queue/bypass cleared, but concurrency stays at N+1
  3. The pending restoreAfterNesting() (still in flight via the finally block of the aborted node) eventually runs -> concurrency = N (correct)

Wait — actually that case ends up fine.

The bad case is when reset() is followed by a fresh prepare/restore that doesn't match the leftover bumps. Or when the in-flight finally never runs because the agent context is destroyed by the abort. Then we end up with a permanently bumped concurrency that lets one extra task run.

Pre-existing

This is not introduced by #1304 / #1314. The same latent bug exists for the global slot in pre-#1314 code (just with no label component).

Proposed fix

Capture the original concurrency in the constructor and have reset() restore it:

```ts
constructor(config: number | ConcurrencyConfig) {
const normalized = normalizeConcurrencyConfig(config);
this.originalConcurrency = normalized.global;
this.concurrency = normalized.global;
this.labelLimits = normalized.labels;
}

public reset() {
this.taskQueue.length = 0;
this.runningNodes.clear();
this.runningByLabel.clear();
this.nestingBypassByLabel.clear();
this.concurrency = this.originalConcurrency;
}
```

Out of scope (different issue)

If a node's finally block runs after reset(), the restoreAfterNesting will decrement below the original value. Fix candidate: gate restoreAfterNesting on a generation counter so post-reset restores are no-ops.

Acceptance

  • A reset() during a labeled nesting window leaves concurrency at the constructor value.
  • Add a regression test in test_task_manager.ts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions