Skip to content

feat: concurrency per label with head-of-line skip (#1304)#1314

Merged
isamu merged 7 commits into
mainfrom
feat/concurrency-per-label
May 3, 2026
Merged

feat: concurrency per label with head-of-line skip (#1304)#1314
isamu merged 7 commits into
mainfrom
feat/concurrency-per-label

Conversation

@isamu
Copy link
Copy Markdown
Contributor

@isamu isamu commented May 2, 2026

Closes #1304

Summary

  • GraphData.concurrency is extended to accept either a number (legacy) or { global: number, labels?: Record<string, number> }.
  • ComputedNodeData gains an optional label: string. When set on a node and the label has a configured limit, that node counts against the per-label cap as well as the global cap.
  • TaskManager walks the priority-sorted queue and dispatches the first task whose label still has capacity ("head-of-line skip"), so a saturated label cannot stall unrelated work.
  • 100% backward compatible: concurrency: number and unlabeled nodes behave exactly as before.

Items to Confirm / Review

  1. HoL skip vs strict priority order — when the highest-priority queued task is blocked by its label, lower-priority tasks of other labels are dispatched first. This is documented in the issue and intended (otherwise label limits would degenerate into global stalls), but it does mean priority is now "priority within a feasible-label set." Starvation of a hot label is possible by design; aging is split out to TaskManager: queue で待機中のタスクの優先度を時間経過で自動昇格 (aging) させたい #1311.
  2. onComplete drains aggressively — when a label slot opens, onComplete keeps dispatching while the global limit allows. This is what makes mixed labeled/free-label workloads behave like the user expects, but please sanity-check the loop condition in task_manager.ts:108-114.
  3. Nested graphsprepareForNesting()/restoreAfterNesting() still touch only the global limit. Per-label limits are not bumped when nested graphs run. If a nested graph's tasks share a label with the parent, they share the cap. Flag if a different policy is wanted.
  4. Validator messages — the new error messages (Concurrency.global must be ..., Concurrency.labels.<key> must be ...) follow the existing Concurrency must be ... style. Check the tone matches your preference.

User Prompt

現在のGraphAIでは concurrency は全体の同時実行数だけだが、特定LLM/外部APIごとに並列上限を設定したい(レート制限対策)。agent単位 / agent + model単位 / または node に label を付けて label単位、で制御したい。

例:

concurrency:
  global: 20
  labels:
    openai: 3
    vertex: 5

設計判断:

  • HoL skip 方式 (B案) — キュー先頭から走査して、global枠 AND label枠が空いている最初のタスクを dispatch
  • priority は既存の数値仕様をそのまま維持(数値が大きいほど優先)

aging / weight / RPM(token bucket) はそれぞれ別チケットに切り出した:

事前に変更箇所のユニットテストが足りているかチェックし、不足していれば現状動作を locked-in するテストを足してから実装すること。

Implementation Approach

Order of work (TDD-ish)

  1. Coverage gap auditTaskManager had zero direct unit tests. Validator only had 4 negative tests for concurrency.
  2. Characterization tests first — added 9 tests locking current TaskManager behavior (priority order, FIFO, prepareForNesting, isRunning, reset, getStatus, etc.) and 3 positive tests for the validator. Confirmed all 212 baseline tests pass on unchanged code.
  3. Implement — type.ts, task_manager.ts, node.ts, validator.
  4. Add label-feature tests — 10 new TaskManager tests + 7 new validator tests covering object-form concurrency.
  5. Verifyyarn build, yarn test (all 19 workspaces, fail 0), yarn format.

Files changed

File Change
packages/graphai/src/type.ts Add ComputedNodeData.label, ConcurrencyConfig, union concurrency
packages/graphai/src/task_manager.ts Constructor accepts union; HoL skip; per-label running counts
packages/graphai/src/node.ts Read data.label into ComputedNode.label
packages/graphai/src/validators/common.ts label added to allowed computed-node attribute keys
packages/graphai/src/validators/graph_data_validator.ts Validate object-form concurrency
packages/graphai/tests/units/test_task_manager.ts New — 19 tests
packages/graphai/tests/validators/test_validator_graph_data.ts +10 tests
plans/feat-concurrency-per-label.md New — design doc

Algorithm sketch (TaskManager.dequeueTaskIfPossible)

if global_running >= global_limit: return
for task in queue (priority desc):
    label = task.node.label
    if label is None or label not in labelLimits:
        dispatch(task); return
    if running_by_label[label] < labelLimits[label]:
        dispatch(task); return
return  # nothing eligible right now; wait for next onComplete

onComplete then loops dequeueTaskIfPossible while progress is being made, so a single completion can dispatch multiple newly-eligible queued tasks (bounded by the global limit).

Validation

  • yarn build — succeeds across all workspaces
  • yarn test — 19 workspaces, 0 failures (graphai package: 212 → 229 tests after this PR)
  • yarn format — clean
  • yarn pack --dry-run not applicable (no package.json change)
  • yarn eslint — preexisting errors only in untracked scratch files (tests/ensemble.ts, tests/runner.ts, tests/units/test_aaa.ts, tests/units/test_graph_aaa.ts); no errors introduced by this PR

Test plan

  • CI passes
  • Manual: write a graph with concurrency: { global: 10, labels: { openai: 2 } }, send 6 nodes labeled openai and confirm only 2 run concurrently while other labels run freely
  • Manual: confirm legacy concurrency: 5 graphs continue to work without changes

Out of scope (followups)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional node labels for grouping tasks and applying per-label concurrency limits.
    • Scheduler now respects per-label limits and skips blocked queue heads so eligible tasks can run.
  • Bug Fixes

    • Nesting/preparation now correctly preserves and restores label-aware concurrency state across early exits.
  • Tests

    • Expanded tests covering label validation, label-aware scheduling, nesting, and concurrency behaviors.
  • Documentation

    • Added design doc describing label-based concurrency controls.

Why: a single global concurrency limit cannot model per-API rate concerns
(e.g. limiting OpenAI calls to 3 in flight while keeping unrelated nodes
running freely). Adds optional per-label limits that compose with the
existing global limit.

- type.ts: add ComputedNodeData.label and ConcurrencyConfig union for
  GraphData.concurrency. Number form remains valid.
- task_manager.ts: constructor accepts number | ConcurrencyConfig.
  dequeueTaskIfPossible walks the priority-sorted queue and dispatches
  the first task whose label still has capacity (HoL skip), so a saturated
  label cannot block other work. onComplete drains as many newly-eligible
  tasks as the global limit allows.
- node.ts / validators: read and validate the new fields.

Tests:
- New tests/units/test_task_manager.ts (19 tests) - locks current
  TaskManager behavior (priority order, FIFO, nesting, reset) and covers
  the new label semantics (HoL skip, mixed labeled/unlabeled, slot release).
- tests/validators/test_validator_graph_data.ts gains 10 tests for the
  object-form concurrency validator.

Backward compatibility: concurrency: number and unlabeled nodes behave
identically to before. prepareForNesting still bumps the global limit only.

Follow-ups split out as separate issues:
- #1311 aging (queued-task priority bump)
- #1312 weight (weighted concurrency)
- #1313 token-bucket RPM/TPM (with rationale for why concurrency-only is
  insufficient for TPM and burst control)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@isamu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 35 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3d784c6-be0b-42be-8661-4b9a0407714b

📥 Commits

Reviewing files that changed from the base of the PR and between 5d7314c and 0e8e395.

📒 Files selected for processing (7)
  • packages/graphai/src/graphai.ts
  • packages/graphai/src/node.ts
  • packages/graphai/src/task_manager.ts
  • packages/graphai/src/utils/utils.ts
  • packages/graphai/src/validators/graph_data_validator.ts
  • packages/graphai/tests/units/test_task_manager.ts
  • plans/feat-concurrency-per-label.md
📝 Walkthrough

Walkthrough

Adds optional per-node label and a ConcurrencyConfig union for GraphData.concurrency; TaskManager enforces global and optional per-label concurrency with head-of-line skipping; validators validate the new config and label; nested-execution nesting uses label-aware prepare/restore; comprehensive unit and validator tests plus design doc added.

Changes

Per-Label Concurrency Control

Layer / File(s) Summary
Type & Schema Extensions
packages/graphai/src/type.ts
Adds label?: string to ComputedNodeData, introduces exported ConcurrencyConfig = { global: number; labels?: Record<string, number> }, and makes `GraphData.concurrency?: number
Node wiring
packages/graphai/src/node.ts
ComputedNode gains public readonly label?: string (set only when data.label is a string). execute() uses prepareForNesting(this.label) / restoreAfterNesting(this.label) with a try/finally guarded nestingPrepared flag.
Validator Schema Updates
packages/graphai/src/validators/common.ts, packages/graphai/src/validators/computed_node_validator.ts, packages/graphai/src/validators/graph_data_validator.ts
computedNodeAttributeKeys now includes "label". computedNodeValidator enforces label is a string when present. graph_data_validator adds isPlainObject, validateConcurrencyValue, and validateConcurrencyConfig to validate `number
Core TaskManager Label-Aware Logic
packages/graphai/src/task_manager.ts
Constructor now accepts `number
Tests & Documentation
packages/graphai/tests/units/test_task_manager.ts, packages/graphai/tests/validators/test_validator_graph_data.ts, packages/graphai/tests/validators/test_validator_computed_node.ts, plans/feat-concurrency-per-label.md
Extensive unit tests exercise global concurrency, FIFO/priority semantics, per-label limits, independent label scheduling, HoL skipping, unlabeled/unconfigured-label behavior, nested-label bumping behavior, prepareForNesting/restoreAfterNesting, reset, and verbose status. Validator tests cover numeric and object-form concurrency (valid and many malformed cases). Added design doc describing behavior and backward compatibility.

Sequence Diagram

sequenceDiagram
    participant Client as Task Producer
    participant Manager as TaskManager
    participant Queue as Priority Queue
    participant Running as Running Tracker
    participant Worker as Task Executor

    Client->>Manager: addTask(task { label? }, priority)
    Manager->>Manager: normalizeConcurrencyConfig / canRun(task)
    alt eligible (global && label OK)
        Manager->>Running: increment global & label counters
        Manager->>Worker: dispatch task callback
    else ineligible
        Manager->>Queue: enqueue task (priority-sorted)
    end

    Worker->>Manager: onComplete(task)
    Manager->>Running: decrement global & label counters
    Manager->>Queue: scan queue (HoL skip) for first eligible task
    loop dispatch while eligible tasks exist
        Manager->>Running: increment counters
        Manager->>Worker: dispatch next eligible task
        Manager->>Queue: remove dispatched entry
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hop along queues with a label so neat,
Global gates open, but some labels I must seat.
If one line is full, I glance and skip ahead,
Let others sprint past while the tight ones tread.
A tidy garden of tasks — concurrency well-fed!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main change: adding per-label concurrency control with head-of-line skip dispatching strategy.
Linked Issues check ✅ Passed The implementation fully addresses issue #1304 requirements: per-label concurrency limits alongside global limits [#1304], head-of-line skip to prevent label saturation from stalling unrelated work [#1304], label support on nodes [#1304], and backward compatibility with numeric concurrency [#1304].
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing per-label concurrency control and associated validation. The plan document and tests are supporting artifacts for the feature.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/concurrency-per-label

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 6 minutes and 35 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/graphai/src/validators/common.ts`:
- Line 16: computedNodeValidator currently accepts a `label` key without
checking its type, allowing non-string values (e.g., numbers/booleans) to pass
and bypass TaskManager's per-label limits; update the validator (the
computedNodeValidator in common.ts) to validate that node.label is a string
before accepting the key—reject or throw for non-string label values so only
string labels are permitted and TaskManager label-based caps work correctly.

In `@packages/graphai/src/validators/graph_data_validator.ts`:
- Around line 18-32: The concurrency object validation must reject unknown keys:
after verifying concurrency is an object and that "global" exists, ensure the
object only contains the allowed keys "global" and optionally "labels" by
inspecting Object.keys(concurrency) and throwing a ValidationError if any other
key is present; keep using validateConcurrencyValue(concurrency.global,
"Concurrency.global") and the existing labels branch (typeof concurrency.labels
checks and validateConcurrencyValue for each label) but add the extra-key check
before validating values so typos like "label" fail fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba501093-d1c6-4063-9234-68a8b93bcbc3

📥 Commits

Reviewing files that changed from the base of the PR and between cb15212 and 591f365.

📒 Files selected for processing (8)
  • packages/graphai/src/node.ts
  • packages/graphai/src/task_manager.ts
  • packages/graphai/src/type.ts
  • packages/graphai/src/validators/common.ts
  • packages/graphai/src/validators/graph_data_validator.ts
  • packages/graphai/tests/units/test_task_manager.ts
  • packages/graphai/tests/validators/test_validator_graph_data.ts
  • plans/feat-concurrency-per-label.md

Comment thread packages/graphai/src/validators/common.ts
Comment thread packages/graphai/src/validators/graph_data_validator.ts Outdated
isamu and others added 2 commits May 3, 2026 08:35
Findings from Codex (#1314 review):

1. prepareForNesting/restoreAfterNesting were label-unaware, allowing a
   deadlock when a parent node holding the only slot for label X invokes
   a nested graph whose children also use label X. Fix: both methods now
   accept an optional label argument; node.ts passes its own label so
   the parent's label-bucket gets a temporary +1 slot for the nested
   children, mirroring the existing global-slot escape hatch.

2. ComputedNode.label was copied through without runtime validation, so
   a malformed (non-string) label from YAML/JSON would silently bypass
   the per-label limit (Map<string, ...>.get(<non-string>) -> undefined,
   treated as untracked). Fix: defensive coercion in ComputedNode + a
   schema check in computed_node_validator.

Also addresses user feedback to lean on existing utility predicates:
- task_manager.normalizeConcurrencyConfig now uses isObject for the
  labels guard.
- graph_data_validator now uses isObject + isNull instead of inlined
  null/typeof/Array.isArray combinations.

Tests added:
- TaskManager: nested-graph deadlock prevention, untouched behavior when
  label is undefined or unconfigured.
- computed_node_validator: rejects non-string label, accepts string label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 2, 2026

Codex cross-review iteration 1 — addressed

Codex flagged two issues (network-isolated, so verdict was returned to local CLI rather than posted here). Both addressed in 96ba009:

1. Nested-graph deadlock with per-label limits

Codex finding: prepareForNesting() only bumps the global slot. If a parent task with label openai (limit 1) invokes a nested graph that also wants label openai, the children queue forever — parent waits for children, children wait for the label slot the parent owns.

Fix: prepareForNesting(label?) and restoreAfterNesting(label?) now also bump the parent's label limit when the label has a configured cap. node.ts passes this.label to both. Behavior when label is undefined or unconfigured is unchanged.

2. Runtime label validation

Codex finding: data.label was copied straight to ComputedNode.label. A non-string label from YAML/JSON would bypass the label bucket because Map<string, ...>.get(<non-string>) returns undefined → treated as untracked.

Fix:

  • Defensive coercion in ComputedNode constructor: keep label only if it is actually a string at runtime.
  • Schema check added to computed_node_validator: Computed node label must be a string.

Also (per user feedback)

Refactored task_manager.normalizeConcurrencyConfig and graph_data_validator to use the project's isObject / isNull utilities instead of inlined null/typeof/Array.isArray checks.

Tests added (graphai package now: 229 → 235)

  • test_task_manager.ts: 3 new tests for nested-graph deadlock prevention and the no-op cases (undefined / unconfigured label).
  • test_validator_computed_node.ts: 2 new tests for label type validation.

All 19 workspace test suites pass (yarn test fail count: 0). yarn build clean.

Findings from Codex iteration-2:

1. Exception safety of nested-graph slot bookkeeping. prepareForNesting
   was paired with a happy-path-only restoreAfterNesting, so a throw
   inside the agent body leaked both the global concurrency bump and
   (after iteration-1) the label bump. This was a pre-existing bug for
   the global slot but became more impactful with the label addition.
   Fix: wrap the nested execution block in try/finally so
   restoreAfterNesting always runs.

2. validateConcurrencyConfig was accepting concurrency.labels: null
   after the iteration-1 isNull refactor relaxed the check. The
   declared type is Record<string, number>; null is malformed. Fix:
   compare against undefined directly so only the absent sentinel
   skips validation.

3. Iteration-1 tests did not actually exercise the label decrement
   path -- the only post-restore assertion was the global concurrency
   value. Strengthened test_task_manager.ts so the nested-graph test
   schedules two openai tasks after restoreAfterNesting and confirms
   one queues (which would not happen if the label limit were left
   bumped).

Plus, per user feedback to cover negative patterns more thoroughly,
broadened the validator suites:

- test_validator_computed_node.ts: label rejected for null, array,
  object, boolean (in addition to number); valid for omitted and string.
- test_validator_graph_data.ts: top-level concurrency rejected for
  null, boolean, array. concurrency.global rejected for string, null,
  negative, float. concurrency.labels.<key> rejected for string, null,
  negative, boolean, object. Empty-key edge case still validates.

graphai package tests: 235 -> 254. All 19 workspace test suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 3, 2026

Codex cross-review iteration 2 — addressed (f804289)

1. Exception safety of nested-graph slot bookkeeping

Codex finding: prepareForNesting() was paired with a happy-path-only restoreAfterNesting(), so a throw inside agentFilterHandler leaked both the global slot and the new label bump. Pre-existing for the global slot; iteration-1 made it worse by adding label leakage.

Fix: node.ts now wraps the nested-execution block in try/finally so restoreAfterNesting(this.label) always runs, even on throw.

2. concurrency.labels: null snuck through after iteration-1 refactor

Codex finding: I had switched the labels presence check to !isNull(...), which made null pass when it shouldn't (the declared type is Record<string, number>).

Fix: replaced with concurrency.labels !== undefined. Only undefined skips validation; null and other malformed shapes are rejected.

3. Iteration-1 nesting test did not actually verify the label decrement

Codex finding: my nesting test only checked concurrency after restoreAfterNesting. A broken label decrement could have passed silently.

Fix: rebuilt the test with global=2, labels: { openai: 1 }. After the full nested cycle, two new openai tasks are scheduled — only one runs and one queues, which is impossible if the label decrement was missed (a leaked bump would let both run since the global limit is 2).

Plus, per user direction to cover negative patterns more thoroughly

Extended validator suites:

  • test_validator_computed_node.ts: label rejected for null, array, object, boolean (in addition to number); valid for omitted and string.
  • test_validator_graph_data.ts: top-level concurrency rejected for null/boolean/array; concurrency.global rejected for string/null/negative/float; concurrency.labels.<key> rejected for string/null/negative/boolean/object; empty-key edge case still validates.

Test counts

graphai package: 235 → 254 tests (all pass). 19 workspace suites: fail count 0.

Findings from Codex iteration-3:

1. The iteration-2 try/finally block did not protect the
   forNestedGraph construction. resultOf() can throw when nestedGraph
   is a DataSource, leaving prepareForNesting's bump unrestored.

   Fix: move both prepareForNesting and the forNestedGraph build
   inside the try block, and gate restoreAfterNesting on a
   `nestingPrepared` flag that is set only after prepareForNesting
   actually returned. This way:
   - throw inside the build -> finally runs, restore is called.
   - throw inside prepareForNesting itself -> flag stays false,
     restore is NOT called (no bump to undo).

2. validateConcurrencyConfig accepted any non-array object for
   labels, so Map/Date/class instances would pass typeof === "object"
   and silently disable enforcement (Object.entries yields nothing).
   It also ignored unexpected extra keys on the concurrency object,
   so a typo like `labls` would be silently dropped.

   Fix: introduce isPlainObject (proto === Object.prototype || null)
   and require it for both the concurrency object and labels. Reject
   any keys other than "global"/"labels". Tightened
   validateConcurrencyValue to also require typeof === "number" before
   the Number.isInteger check.

Tests:
- Reject extra keys on concurrency: { global, foo }.
- Reject Map / Date as concurrency value (call validateGraphData
  directly to bypass anonymization's JSON round-trip, which would
  otherwise mask the class identity).
- Reject Map / class instance as labels value.

graphai package tests: 254 -> 261. All 19 workspace suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 3, 2026

Codex cross-review iteration 3 — addressed (5d7314c)

1. try/finally still left a gap before agent invocation

Codex finding: prepareForNesting ran first, but context.forNestedGraph = { graphData: this.graph.resultOf(...) ... } could throw before the try block, leaking the bump.

Fix: pulled both prepareForNesting and the forNestedGraph construction into the try block. Added a nestingPrepared flag set only after prepareForNesting returned, so the finally restores only when there is a bump to undo (correct on three paths: build throws → restore; agent throws → restore; prepareForNesting itself throws → no restore).

2. Validator accepted non-plain-object shapes

Codex finding: Map, Date, class instances, etc. would pass isObject(...) but yield nothing from Object.entries(), silently disabling label enforcement. Also, extra/typo keys on the concurrency object were ignored.

Fix:

  • Added isPlainObject (proto === Object.prototype || null) and require it for both the concurrency object and labels.
  • Reject any keys on the concurrency object other than global / labels (catches typos like labls).
  • Tightened validateConcurrencyValue to require typeof === "number" before Number.isInteger.

Tests

  • Reject extra keys on concurrency: { global: 5, foo: 1 }Concurrency object does not allow foo.
  • Reject Map / Date as concurrency value, and Map / class instance as labels value. These tests bypass anonymization (which JSON-roundtrips and would mask class identity) by calling validateGraphData directly.

Test counts

graphai package: 254 → 261 tests (all pass). 19 workspace suites: fail count 0.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/graphai/src/task_manager.ts`:
- Around line 127-138: prepareForNesting currently increments the global
labelLimits (mutating labelLimits) which temporarily widens the shared cap and
lets unrelated tasks with the same label consume that slot; instead, stop
mutating labelLimits and implement a per-nesting reservation tracked on the
TaskManager instance (e.g., a new Map like nestingReservations keyed by the
nesting path or a per-run id) that increments on prepareForNesting(label) and
decrements on restoreAfterNesting(label), and change throttling/availability
checks to consult labelLimits minus any global usage plus the current run's
nestingReservations (so only the current nested path sees the reserved
capacity); update prepareForNesting and restoreAfterNesting to operate on this
per-nesting reservation map and add a regression test that runs a labeled parent
that nests without using the label while a queued sibling with the same label is
waiting and assert the sibling does not start during the parent’s nesting
window.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f890848-88fa-470f-a785-d9decc27ce65

📥 Commits

Reviewing files that changed from the base of the PR and between 591f365 and 5d7314c.

📒 Files selected for processing (7)
  • packages/graphai/src/node.ts
  • packages/graphai/src/task_manager.ts
  • packages/graphai/src/validators/computed_node_validator.ts
  • packages/graphai/src/validators/graph_data_validator.ts
  • packages/graphai/tests/units/test_task_manager.ts
  • packages/graphai/tests/validators/test_validator_computed_node.ts
  • packages/graphai/tests/validators/test_validator_graph_data.ts

Comment thread packages/graphai/src/task_manager.ts Outdated
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 3, 2026

Codex cross-review iteration 4 — LGTM

Codex returned LGTM with one nit: isPlainObject compares against the current realm's Object.prototype, so a plain object created in a different JS realm would be rejected.

Decision: defer. GraphAI's graph data is sourced from JSON.parse or in-process programmatic construction; cross-realm objects are not a realistic path for this codebase. The constraint matches lodash's isPlainObject and similar libraries. If a real cross-realm use case appears, it is a one-line swap to Object.prototype.toString.call(x) === "[object Object]".

Final status

  • Codex iterations 1-4: 5 substantive findings (all MUST-FIX), all addressed.
  • graphai package tests: 212 (pre-PR baseline) → 261 after this PR.
  • 19 workspace test suites: fail count 0 at every iteration.
  • yarn build clean across the monorepo.

Ready for human review and CI green-light.

CodeRabbit raised that prior iterations widened the shared per-label
cap during nesting (labelLimits++), which meant unrelated tasks
sharing that label could opportunistically take the bumped slot --
defeating the throttle even when the nested graph never used the label.

Fix: introduce a per-parent-graphId bypass map.

  nestingBypassByLabel: Map<label, Map<parentGraphId, count>>

prepareForNesting(label, parentGraphId) records a bypass for that
parent without touching labelLimits. canRun() consults the bypass
ONLY for tasks whose graphId differs from the parent's. So:

  - parent on graphId X with label openai (limit 1) is running
  - sibling on graphId X with label openai is queued
  - parent invokes nested graph -> bypass recorded for parent X
  - canRun(sibling): same graphId as parent -> bypass NOT applied
    -> sibling stays queued (correct: per-label throttle preserved)
  - canRun(child on nested graphId Y): different from parent -> +1 cap
    -> child runs (correct: deadlock avoided)

Also: prepareForNesting now drains the queue (its global slot bump
or label bypass can immediately enable a queued task).

The Map<Map<>> shape is necessary: a single graph can host two
labeled nodes simultaneously invoking nested graphs, so refcounting
per parentGraphId is required (a Set would deduplicate). Kept as
the minimum data shape that handles refcount correctly.

Documented residual edge case in test file: two completely independent
top-level graphs sharing a TaskManager would still let an "unrelated"
queued node consume the bypass, because graphId comparison alone cannot
distinguish "my actual nested child" from "an unrelated independent graph
sharing this manager." That scenario is not part of GraphAI's documented
hierarchical usage and would require threading parent-child graphId
binding through the GraphAI constructor (out of scope).

Tests:
- New: child on nested graphId runs, sibling on parent's graphId blocks.
- Updated existing nesting tests to use the (label, parentGraphId)
  signature and to model real nested-vs-sibling graphId topology.

graphai package tests: 261 -> 264. All 19 workspace suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 3, 2026

CodeRabbit finding addressed (77ec2cf)

CodeRabbit pointed out a subtle but real bug not caught by Codex: prior iterations widened the shared per-label cap during nesting (labelLimits.set(label, +1)). That defeats the per-label throttle for any unrelated task sharing the label, because the widened cap is global to the TaskManager — even if the nested graph never uses that label, an arbitrary sibling could sneak through during the nesting window.

Fix

prepareForNesting no longer mutates labelLimits. Instead it records a per-parent-graphId bypass:

nestingBypassByLabel: Map<label, Map<parentGraphId, count>>

canRun consults the bypass only for tasks whose graphId differs from the parent's. Concretely:

task graphId vs parent's running < limit? bypass applies? result
sibling same full no (same graphId) queued ✅ throttle preserved
nested child different full yes runs ✅ deadlock avoided

The Map-of-Maps shape is the minimum needed for correctness: a single outer graph can host two labeled nodes invoking nested graphs simultaneously, so refcount per parentGraphId is required (a Set would deduplicate the second prepareForNesting from the same graph and lead to a premature decrement on the first restore).

Other adjustments

  • prepareForNesting now drains the queue, since freeing global capacity or activating a label bypass can immediately unblock queued tasks (previously this was implicit because the next addTask would trigger dispatch).
  • Existing nested-graph test rewritten to model real topology (parent on outer graphId, child on nested graphId) and to assert the sibling-blocks-during-nesting property.

Documented residual edge case

Two completely independent top-level graphs sharing a single TaskManager would still let an "unrelated" queued node consume the bypass, because graphId comparison alone cannot distinguish "my actual nested child" from "an independent graph that happens to share this manager." That is not part of GraphAI's documented hierarchical usage; closing it would require threading parent→child graphId binding through the GraphAI constructor, which is out of scope here. Documented in test_task_manager.ts.

Test counts

graphai package: 261 → 264 (all pass). 19 workspace suites: fail count 0.

@snakajima
Copy link
Copy Markdown
Collaborator

Cross-review evaluation of CodeRabbit + Codex findings

I reviewed every reviewer comment on this PR and evaluated whether each finding is valid and whether the fix is correct. Summary first, details below.

Source Findings All addressed? Quality of fixes
CodeRabbit inline 3 yes Strong; the nestingBypassByLabel design is better than what was suggested
Codex iter 1 2 yes Deadlock fix + label runtime check
Codex iter 2 3 yes Exception-safety catch was the most valuable single finding
Codex iter 3 2 yes Tightened both try/finally guard and validator plain-object check
Codex iter 4 0 (LGTM, 1 nit) deferred Reasonable

All actionable findings are valid and have been correctly addressed. No reviewer comment was dismissed without a defensible reason.


CodeRabbit comment 1 — label runtime type validation

Valid and important. Fix is correct.

Without it, label: 123 from YAML/JSON would pass schema validation, then silently bypass per-label limits because Map<string, number>.get(123) is undefined (treated as untracked). Author went further than the suggestion — also added defensive runtime coercion in ComputedNode constructor (node.ts:125), so even if validation is bypassed via direct programmatic construction, label still defaults to undefined for non-strings. Six negative tests lock the contract.

CodeRabbit comment 2 — Reject unknown keys in concurrency config

Valid and important. Fix is correct.

Without it, a typo like concurrency: { global: 5, labls: { openai: 2 } } would validate fine and run with no per-label enforcement. Fix uses a concurrencyConfigKeys whitelist tied to keyof ConcurrencyConfig — type-safe and minimal.

CodeRabbit comment 3 — Don't widen shared labelLimits during nesting (the big one)

Valid and substantive. Fix exceeds what was suggested.

This was the most consequential finding. Earlier iterations did labelLimits.set(label, current+1) during prepareForNesting, which globally widened the cap — an unrelated openai sibling could run during the parent's nesting window even if the nested graph never used openai, silently breaking the rate-limit promise.

CodeRabbit suggested a "per-nesting reservation map keyed by nesting path or per-run id." The author chose a cleaner shape: Map<label, Map<parentGraphId, refcount>>, consulted in canRun only for tasks with graphId !== parentGraphId. The reasoning is solid:

  • Same-graphId siblings → bypass not applied → cap preserved
  • Different-graphId children (the actual nested ones) → +1 capacity → deadlock avoided
  • Refcount is needed (not a Set) because a single graph can host two labeled nodes nesting simultaneously

The new test TaskManager nesting allows child on different graphId, blocks sibling on same graphId directly exercises the regression scenario, plus tears down and re-checks bookkeeping. Above the bar.

The author honestly documented one residual edge case (two completely independent top-level GraphAI instances sharing a TaskManager). This is not the documented hierarchical pattern, and the workaround is documented in the test file. Reasonable to defer.


Codex iteration 1 — Nested-graph deadlock + label runtime check

Both valid and correctly addressed. The deadlock fix was later superseded by the per-parent-graphId bypass (CodeRabbit-driven), which is the more correct shape.

Codex iteration 2 — Exception safety + null labels + weak test

All three valid. The exception safety catch was the strongest single finding in the entire review cycle: an agent throw between prepareForNesting and restoreAfterNesting leaked the bump. This was a preexisting bug for the global slot, but the label addition made the leak more impactful. Try/finally fix is correct.

The concurrency.labels: null finding was a legitimate regression from the iteration-1 isNull refactor.

The test-strength critique was fair — iteration-1's nested test only checked concurrency after restore, not the label decrement. The strengthened test properly exercises both.

Codex iteration 3 — try/finally gap + non-plain-object validator

Both valid.

  • try/finally gap: subtle. prepareForNesting was called before the try block, but context.forNestedGraph = { graphData: this.graph.resultOf(...) ... } could throw between prepare and the try. The nestingPrepared flag pattern is the right resolution — restore only when there's a bump to undo. Verified at node.ts:356-407.
  • Non-plain-object accept: Map, Date, class instances all pass typeof === "object" but yield no entries from Object.entries, silently disabling enforcement. The isPlainObject (proto check) fix is right. The tests bypass anonymization (which would JSON-roundtrip these to plain objects) by calling validateGraphData directly — a thoughtful test design choice.

Codex iteration 4 — LGTM with cross-realm isPlainObject nit

Author's defer is correct. Object.getPrototypeOf(x) === Object.prototype doesn't cross JS realms (iframes, vm contexts, worker boundaries). Realistic vector for GraphAI? Effectively zero — graph data comes from JSON.parse (always current realm) or in-process construction. Lodash, type-fest, etc. have the same constraint. One-line swap available if a real case ever appears.


Minor items not raised by any reviewer

  1. this.concurrency / reset() interaction: prepareForNesting does this.concurrency++, restoreAfterNesting does this.concurrency--. If reset() is called between them, nestingBypassByLabel is cleared but this.concurrency retains its bumped value, and the eventual restoreAfterNesting decrements below the original. Pre-existing latent bug, not introduced here. Worth a future ticket if anyone hits it.

  2. Naming inconsistency with the plan doc: plans/feat-concurrency-per-label.md:55 says globalLimit: number but the field is named concurrency. Cosmetic.

  3. Direct TaskManager construction bypasses the validator: new TaskManager({ global: 5, labels: ["openai", 2] as any }) would let isObject through normalizeConcurrencyConfig. Only relevant for non-public construction paths; could harden with the same isPlainObject check inside task_manager.ts.

None of these are blockers.

Bottom line

Reviewer engagement was high-signal, author's responses are thorough, iteration trajectory is healthy. Ready for human merge approval from my standpoint.

🤖 Generated with Claude Code

Followups from snakajima's audit on #1314:

1. plan doc had a stale name `globalLimit` that did not match the
   actual field `concurrency` in TaskManager. Fixed.

2. TaskManager constructor previously silently coerced a malformed
   ConcurrencyConfig (e.g. labels as Array, Map, or class instance)
   into something benign, so direct callers bypassing validateGraphData
   could end up with broken label enforcement without any error.

   Fix:
   - Promoted `isPlainObject` to packages/graphai/src/utils/utils.ts
     (single source of truth, used by both task_manager and the
     graph_data_validator).
   - Hardened `normalizeConcurrencyConfig` to fail fast: rejects
     non-plain-object configs, non-integer / non-positive values,
     non-plain-object labels, and bad label values.
   - Reordered GraphAI constructor to run validateGraphData /
     validateAgent BEFORE constructing TaskManager, so user-facing
     ValidationError keeps wrapping malformed graph data instead of
     TaskManager's internal Error.

   New tests in test_task_manager.ts cover: zero/negative/non-integer
   concurrency, null/array/Date as config, Map/class-instance/Array
   as labels, and non-integer label values.

Out of scope for this PR (filed as #1315):
- reset() does not restore the original concurrency value, so
  prepare/restore left dangling across reset() can drift the global
  cap. Pre-existing bug, separate issue.

graphai package tests: 264 -> 270. All 19 workspace suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isamu
Copy link
Copy Markdown
Contributor Author

isamu commented May 3, 2026

Followups from @snakajima's audit (0e8e395)

1. Plan doc naming inconsistency — fixed

plans/feat-concurrency-per-label.md had globalLimit: number in the implementation sketch, but the actual TaskManager field is concurrency. Updated the plan to match the code.

2. Direct TaskManager construction not guarded — fixed

The previous normalizeConcurrencyConfig used isObject for the labels guard, which lets Array, Map, Date, class instances, etc. through. A consumer doing new TaskManager({ global: 5, labels: ["openai", 2] as any }) would silently end up with Object.entries-derived garbage in the limits map.

  • Promoted isPlainObject to packages/graphai/src/utils/utils.ts so it's the single source of truth shared by task_manager.ts and graph_data_validator.ts.
  • Hardened normalizeConcurrencyConfig to fail fast on malformed shapes (non-plain-object config, non-positive-integer values, non-plain-object labels).
  • Reordered GraphAI constructor to run validateGraphData / validateAgent before constructing the TaskManager, so user-facing ValidationError still wraps malformed graph data; TaskManager's own guard is a backstop for direct callers that bypass validation.

7 new tests in test_task_manager.ts covering: zero/negative/non-integer concurrency, null/array/Date as the whole config, Map/class-instance/Array as labels, and non-integer label values.

3. Out of scope — filed as separate issue

#1315reset() does not restore the original concurrency value, so a reset() between prepareForNesting and restoreAfterNesting can drift the global cap. Pre-existing bug; not blocking this PR.

Test counts

graphai package: 264 → 270 (all pass). 19 workspace suites: fail count 0.

CI will rerun automatically.

@isamu isamu merged commit 51248b9 into main May 3, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GraphAIでagent / label単位のconcurrency制御をサポートしたい

2 participants