feat(auto): implement external wait dispatch, polling, and tool by OfficialDelta · Pull Request #4792 · gsd-build/gsd-2

OfficialDelta · 2026-04-24T04:31:07Z

Summary

Split 2/3 of #4655 — probe execution and tool registration.

Depends on #4790 (schema). Merge that first; this PR's diff will auto-update to show only its incremental 9 files.

- Dispatch rule probes external processes via pollWhileCommand (exit 0 = still running, non-zero = done)
- Two-phase checking with optional successCheck for post-completion validation
- Failure counting: 3-strike escalation to manual-attention
- Per-probe timeout (30s–120s, separate from overall wait timeout)
- gsd_register_external_wait tool with pattern-based rejection of dangerous commands (curl|sh, rm -rf, etc.)
- Atomic registration: task status transition + DB insert + JSON probe spec in one operation
- Every registration and probe invocation logged with full command text for audit trail
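
The probe semantics listed above can be sketched as a pure decision function. This is only an illustration, not the code in `auto-dispatch.ts`: the names `ProbeResult`, `Decision`, `strike`, and `decide` are hypothetical, and two behaviors are assumptions rather than facts from this PR: that a failed successCheck counts as a strike, and that a clean probe resets the failure count.

```typescript
// Illustrative types and names; none are taken from auto-dispatch.ts.
interface ProbeResult {
  exitCode: number | null; // null when the probe hit its per-probe timeout
  timedOut: boolean;
}

type Decision =
  | { next: "sleep"; failureCount: number }
  | { next: "resume"; failureCount: number }
  | { next: "manual-attention"; failureCount: number };

// "3-strike" threshold from the summary above.
const MAX_PROBE_FAILURES = 3;

// Record one more failure and escalate once the threshold is reached.
function strike(priorFailures: number): Decision {
  const failureCount = priorFailures + 1;
  return failureCount >= MAX_PROBE_FAILURES
    ? { next: "manual-attention", failureCount }
    : { next: "sleep", failureCount };
}

function decide(
  poll: ProbeResult,
  successCheck: ProbeResult | undefined,
  priorFailures: number,
): Decision {
  // Phase 1: pollWhileCommand. Exit 0 means the external process is still running.
  if (poll.timedOut || poll.exitCode === null) return strike(priorFailures);
  if (poll.exitCode === 0) return { next: "sleep", failureCount: 0 }; // assumed reset

  // Phase 2: the process finished; run the optional successCheck.
  if (successCheck === undefined) return { next: "resume", failureCount: 0 };
  if (!successCheck.timedOut && successCheck.exitCode === 0) {
    return { next: "resume", failureCount: 0 };
  }
  // Assumption: a failed successCheck is treated like a probe failure.
  return strike(priorFailures);
}
```

Keeping the decision separate from process execution makes the exit-code convention easy to unit test without spawning a shell.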

Security surface

The pollWhileCommand is executed via /bin/sh (or cmd.exe on Windows). This is arbitrary command execution by design — the agent already has full shell access via bash tools. The probe runs unattended on a timer, so:

- Pattern-based rejection blocks obviously destructive commands
- Full command text is logged on registration and every probe invocation
- Poll interval has a 1s minimum at runtime
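
For reviewers who want a feel for what pattern-based rejection looks like, here is a minimal sketch. The pattern list, the `checkCommandSafety` name, and the return shape are all hypothetical and are not copied from `bootstrap/db-tools.ts`; the real list may be broader or differ in detail. A deny-list like this only catches obvious cases and is not a sandbox.

```typescript
// Hypothetical deny-list; the real patterns in db-tools.ts may differ.
const DANGEROUS_PATTERNS: { pattern: RegExp; reason: string }[] = [
  {
    // curl/wget output piped straight into a shell, e.g. "curl ... | sh"
    pattern: /\b(?:curl|wget)\b[^|]*\|\s*(?:ba|z)?sh\b/i,
    reason: "pipes a download into a shell",
  },
  {
    // rm with both the recursive and force flags in one option group (-rf, -fr, -Rf)
    pattern: /\brm\s+-(?=[a-z]*r)(?=[a-z]*f)[a-z]+/i,
    reason: "recursive forced delete",
  },
];

type SafetyResult = { ok: true } | { ok: false; reason: string };

// Return the first matching rejection reason, or ok when nothing matches.
function checkCommandSafety(command: string): SafetyResult {
  for (const { pattern, reason } of DANGEROUS_PATTERNS) {
    if (pattern.test(command)) return { ok: false, reason };
  }
  return { ok: true };
}
```

Note the limits: `rm -r -f build` or an obfuscated pipeline would pass this sketch, which is why the full command text is also logged for audit.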

Incremental files (9)

Group	Files
Dispatch rule	`auto-dispatch.ts` (+197)
Tool registration	`bootstrap/db-tools.ts` (+176), `dev-workflow-engine.ts`
Tests (new)	`external-wait-e2e.test.ts`, `external-wait-registration.test.ts`, `external-wait-resume.test.ts`
Tests (tool count bumps)	`complete-slice.test.ts`, `complete-task.test.ts`, `tool-naming.test.ts`

Test plan

node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-registration.test.ts
node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-resume.test.ts
E2e tests use real child_process.exec — no mocks

Merge order: #4790 → this → #4793. Supersedes #4655.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added external process registration and monitoring with automatic polling capabilities
- Configurable timeout handling and health checks for external operations
- Automatic escalation to manual attention after repeated probe failures
- Resume-on-failure option for graceful timeout recovery
Tests
- Added comprehensive integration tests for external process monitoring and timeout scenarios

Schema v23 migration adds `external_waits` table with indexed lookup by milestone + status. New `awaiting-external` phase type, `sleep` dispatch action, journal event types, and state derivation logic enable the auto-loop to recognize and handle tasks waiting on external processes. Includes session state additions, dashboard display labels, and phase carry-forward logic for resuming after external completion. Co-Authored-By: Claude <noreply@anthropic.com>

Dispatch rule probes external processes via pollWhileCommand (exit 0 = still running, non-zero = done). Implements two-phase checking with optional successCheck, failure counting with 3-strike escalation to manual-attention, per-probe timeout, and configurable poll intervals. Adds gsd_register_external_wait tool with pattern-based rejection of dangerous commands, atomic registration with task status transition, and JSON probe spec persistence. Every registration and probe invocation is logged with full command text for audit trail. Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2026-04-24T04:31:20Z

Walkthrough

This change introduces a new external-wait system allowing tasks to pause execution and poll external processes. It adds an awaiting-external phase, database schema v23 with external_waits table, a registration tool, dispatch logic with polling/timeout/failure handling, session state for result carry-forward, and extensive integration tests.

Changes

Cohort / File(s)	Summary
Type System & Core Definitions `types.ts`, `engine-types.ts`, `journal.ts`	Added `"awaiting-external"` phase, `sleep` dispatch action with `durationMs`, and `"dispatch-sleep"` journal event type.
Database Schema & CRUD APIs `gsd-db.ts`	Bumped schema version to 23, introduced `external_waits` table with polling/timeout/probe-failure tracking, and exposed 6 new CRUD methods (`insertExternalWait`, `getExternalWait`, `updateExternalWaitStatus`, `incrementProbeFailureCount`, `resetProbeFailureCount`, `getAllWaitingExternalWaits`).
External-Wait Registration Tool `bootstrap/db-tools.ts`	Registered `gsd_register_external_wait` tool that validates task state, enforces command safety, persists probe spec JSON, inserts DB record, and transitions task to `awaiting-external`.
State Detection `state.ts`	Added early return in `deriveStateFromDb` when task status is `awaiting-external`, bypassing normal execution checks.
Auto-Dispatch Polling Rule `auto-dispatch.ts`	Implemented rule for `awaiting-external` phase that polls external conditions, respects per-wait timeouts with optional resume-on-failure, increments probe failure counts, escalates to `manual-attention` after failures, transitions to `executing` on success, and carries result context via `pendingExternalResume`. Introduced `sleep` dispatch action.
Auto-Loop & Phase Execution `auto/loop.ts`, `auto/phases.ts`	Added support for `sleep` dispatch action with interruptible wait using 1-second chunks; extended prompt retry logic to inject `pendingExternalResume` with truncation and adjust escalation to avoid duplicate diagnostics.
Dispatch Bridging `dev-workflow-engine.ts`	Added `sleep` action forwarding from `DispatchAction` to `EngineDispatchAction`.
Session State `auto/session.ts`	Added `pendingExternalResume: string
UI Description `auto-dashboard.ts`	Added label/description for `awaiting-external` phase in `describeNextUnit`.
Schema Migration Tests `tests/complete-*.test.ts`, `tests/ensure-db-open.test.ts`, `tests/escalation.test.ts`, `tests/gsd-db.test.ts`, `tests/md-importer.test.ts`, `tests/memory-store.test.ts`, `tests/tool-naming.test.ts`	Updated expected schema version from 22 to 23; incremented registered tool count to 31.
External-Wait Integration Tests `tests/schema-v23-external-waits.test.ts`, `tests/external-wait-registration.test.ts`, `tests/external-wait-e2e.test.ts`, `tests/external-wait-resume.test.ts`, `tests/external-wait-state-dispatch.test.ts`	Added 5 comprehensive test suites covering schema structure, tool registration validation, full lifecycle polling/timeout/failure scenarios, probe execution with success-check gating, and state-derivation/dispatch integration.
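
The "interruptible wait using 1-second chunks" noted for `auto/loop.ts` can be illustrated roughly as follows. This is a sketch under assumptions: `interruptibleSleep` and its `isActive` callback are hypothetical names standing in for whatever the loop uses to check that the session is still active.

```typescript
// Sleep for durationMs in small chunks so a stop request is noticed quickly.
// Resolves to true if the full duration elapsed, false if interrupted.
async function interruptibleSleep(
  durationMs: number,
  isActive: () => boolean, // hypothetical stand-in for the session's active flag
  chunkMs = 1000,
): Promise<boolean> {
  let remaining = durationMs;
  while (remaining > 0) {
    if (!isActive()) return false; // interrupted before the next chunk
    const step = Math.min(chunkMs, remaining);
    await new Promise<void>((resolve) => setTimeout(resolve, step));
    remaining -= step;
  }
  return isActive();
}
```

Chunking bounds the worst-case stop latency to one chunk, at the cost of a few extra timer wakeups per sleep.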

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Tool as gsd_register_external_wait Tool
    participant DB as External Wait DB
    participant DispatchEngine as Auto-Dispatch Engine
    participant ExternalProc as External Process
    participant TaskState as Task Status & Session
    
    User->>Tool: Register external wait (pollWhileCommand, timeoutMs)
    Tool->>DB: Validate task status = executing
    Tool->>DB: Insert external_wait record
    Tool->>DB: Transition task → awaiting-external
    Tool-->>User: Success + JSON probe spec path
    
    Note over DispatchEngine: Polling Loop
    loop Poll until timeout or success
        DispatchEngine->>DB: Detect awaiting-external phase
        DispatchEngine->>ExternalProc: Execute poll_while_command
        alt Command exits non-zero (process finished)
            ExternalProc-->>DispatchEngine: non-zero exit
            DispatchEngine->>DB: Mark external_wait resolved
            DispatchEngine->>DB: Transition task → executing
            DispatchEngine->>TaskState: Set pendingExternalResume with result
        else Command exits 0 (still running)
            ExternalProc-->>DispatchEngine: exit 0
            DispatchEngine->>TaskState: Return sleep action
            TaskState->>TaskState: Wait durationMs (interruptible)
        else Probe fails or times out
            ExternalProc-->>DispatchEngine: probe failure or timeout
            DispatchEngine->>DB: Increment probe_failure_count
            alt Failure count ≥ 3
                DispatchEngine->>DB: Mark task → manual-attention
            else Failure count < 3
                DispatchEngine->>TaskState: Return sleep action
                TaskState->>TaskState: Wait durationMs (interruptible)
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

feat(gsd): ADR-011 Phase 2 mid-execution escalation #4410: Both PRs extend the Phase union and modify deriveStateFromDb to add early-return phase detection logic.
fix(auto): harden workflow state transitions #4758: Both PRs modify auto-mode control flow in phases.ts and the auto loop/dispatch paths, touching shared execution functions.

Suggested labels

enhancement, needs-review, test

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main feature: implementing external wait dispatch, polling, and a registration tool. It directly reflects the primary changes across multiple files in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-04-24T04:31:25Z

🔴 PR Risk Report — CRITICAL


Files changed	25
Systems affected	7
Overall risk	🔴 CRITICAL

Affected Systems

Risk	System
🔴 critical	Auto Engine
🔴 critical	State Machine
🔴 critical	Agent Core
🟠 high	GSD Workflow
🟠 high	AI Providers
🟡 medium	Web Mode
🟢 low	Loader/Bootstrap

File Breakdown

Risk	File	Systems
🔴	`src/resources/extensions/gsd/auto-dashboard.ts`	Auto Engine, Web Mode
🔴	`src/resources/extensions/gsd/auto-dispatch.ts`	Auto Engine
🔴	`src/resources/extensions/gsd/auto/loop.ts`	Auto Engine
🔴	`src/resources/extensions/gsd/auto/phases.ts`	Auto Engine
🔴	`src/resources/extensions/gsd/auto/session.ts`	Auto Engine
🟠	`src/resources/extensions/gsd/bootstrap/db-tools.ts`	GSD Workflow, Loader/Bootstrap
🔴	`src/resources/extensions/gsd/state.ts`	State Machine
🔴	`src/resources/extensions/gsd/types.ts`	Agent Core, AI Providers
⚪	`src/resources/extensions/gsd/dev-workflow-engine.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/engine-types.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/gsd-db.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/journal.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/complete-slice.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/complete-task.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/ensure-db-open.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/escalation.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/external-wait-e2e.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/external-wait-registration.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/external-wait-resume.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/gsd-db.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/md-importer.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/memory-store.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/schema-v23-external-waits.test.ts`	(unclassified)
⚪	`src/resources/extensions/gsd/tests/tool-naming.test.ts`	(unclassified)

⚠️ 🔴 Critical risk — the following systems require verification before merge:

🔴 Auto Engine: validate auto-mode trigger conditions and loop termination

🔴 State Machine: test state persistence across a session restart

🔴 Agent Core: check for race conditions and loop exit paths in the agent loop

🟠 GSD Workflow: verify GSD workflow state transitions end-to-end

🟠 AI Providers: confirm provider API contract and error handling are intact

⛔ This PR should not be merged without executing this follow-up prompt.

Ask your coding agent to verify before submitting:
Review this PR for risks in: Auto Engine, State Machine, Agent Core, GSD Workflow, AI Providers. Verify:

1. validate auto-mode trigger conditions and loop termination
2. test state persistence across a session restart
3. check for race conditions and loop exit paths in the agent loop
4. verify GSD workflow state transitions end-to-end
5. confirm provider API contract and error handling are intact

Before modifying any code, assess the scope of this fix:

- Identify the root cause, not just the reported symptom.
- Search the codebase for other call sites, similar patterns, or duplicated logic that may share the same bug.
- List affected tests, documentation, and any downstream consumers that depend on the current behavior.
- Flag any changes that extend beyond the immediate file or function.

Report findings first. Then propose a fix scoped to the actual root cause, and wait for confirmation before applying changes outside the originally reported location.

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (6)

src/resources/extensions/gsd/auto-dashboard.ts (1)

186-187: Add null-safe fallback for awaiting-external label text.

If activeTask is missing, this can render undefined: undefined in the dashboard.

Proposed guard for stable label rendering

     case "awaiting-external":
-      return { label: `Awaiting external process for ${tid}: ${tTitle}`, description: "Task is waiting on an external job; probe will check status." };
+      return {
+        label: tid
+          ? `Awaiting external process for ${tid}${tTitle ? `: ${tTitle}` : ""}`
+          : "Awaiting external process",
+        description: "Task is waiting on an external job; probe will check status.",
+      };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/auto-dashboard.ts` around lines 186 - 187, The
"awaiting-external" case builds a label using tid and tTitle which can be
undefined if activeTask is missing; update the label construction in the case
for "awaiting-external" to use null-safe fallbacks (e.g., const tidSafe =
activeTask?.id ?? "unknown"; const tTitleSafe = activeTask?.title ?? "untitled")
or inline optional chaining with nullish coalescing so the returned label never
reads "undefined: undefined" and instead shows a sensible default when
activeTask, tid, or tTitle are absent.

src/resources/extensions/gsd/engine-types.ts (1)

45-46: Update the union docs to mention sleep.

The type is correct, but the explanatory bullets above EngineDispatchAction still omit the new action.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/engine-types.ts` around lines 45 - 46, The
documentation bullets above the EngineDispatchAction union are missing the newly
added "sleep" action; update the explanatory comments describing
EngineDispatchAction to include a bullet describing { action: "sleep";
durationMs: number } (what it does and what durationMs represents) so the docs
match the union type defined by EngineDispatchAction.

src/resources/extensions/gsd/tests/tool-naming.test.ts (1)

48-48: Avoid hardcoded total tool count in this assertion.

This passes now, but it’s brittle as registration evolves. Prefer deriving expected count from RENAME_MAP.length plus explicit non-paired tools.

Refactor suggestion

-assert.deepStrictEqual(pi.tools.length, 31, 'Should register exactly 31 tools (14 canonical + 14 aliases + 1 gate tool + 1 gsd_skip_slice + 1 gsd_register_external_wait)');
+const expectedToolCount = (RENAME_MAP.length * 2) + 3; // gate + gsd_skip_slice + gsd_register_external_wait
+assert.deepStrictEqual(
+  pi.tools.length,
+  expectedToolCount,
+  `Should register exactly ${expectedToolCount} tools`,
+);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/tool-naming.test.ts` at line 48, The test
hardcodes the total tool count; instead compute the expected count dynamically
by using RENAME_MAP.length for the paired canonical+alias tools and then add the
explicit non-paired tools (the gate tool, 'gsd_skip_slice', and
'gsd_register_external_wait'), then assert pi.tools.length equals that computed
value; update the assertion in tool-naming.test.ts to reference RENAME_MAP and
the three named tools rather than the literal 31 to avoid brittleness.

src/resources/extensions/gsd/tests/external-wait-registration.test.ts (1)

313-379: Test the real registration implementation instead of a local clone.

simulateHandlerFlow() is already drifting from registerExternalWaitExecute in src/resources/extensions/gsd/bootstrap/db-tools.ts: it omits the dangerous-command checks, interval validation, resolveTasksDir() path resolution, and the real response shaping. That means this suite can stay green while the actual tool breaks. Please extract the handler into an exported helper or drive it through the registered tool surface.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-registration.test.ts` around
lines 313 - 379, simulateHandlerFlow is a diverging local clone of
registerExternalWaitExecute (in
src/resources/extensions/gsd/bootstrap/db-tools.ts) missing dangerous-command
checks, interval validation, resolveTasksDir path resolution and real response
shaping; replace the local duplicate by calling the real implementation (or
extract the shared logic into an exported helper and import it in the test) so
the test exercises registerExternalWaitExecute’s actual behavior, ensure the
helper/used function performs the dangerous-command validation, interval and
timeout validation, uses resolveTasksDir (instead of join(..., ".gsd")), and
returns the same shaped response as registerExternalWaitExecute so the test
fails when the real tool breaks.

src/resources/extensions/gsd/tests/external-wait-resume.test.ts (1)

430-437: Avoid a hard node:sqlite dependency in this failure-count setup.

This one-off mutation bypasses gsd-db.ts and only works when the built-in driver is available. Please move it behind the same provider-agnostic helper pattern as the rest of the fixture code so the suite still runs under the better-sqlite3 fallback.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-resume.test.ts` around lines
430 - 437, The test currently imports node:sqlite and calls DatabaseSync to
mutate external_waits.probe_failure_count (rawDb.prepare(...).run()), which
breaks when the built-in driver is swapped; replace that direct DatabaseSync
usage with the same provider-agnostic DB helper used by the other fixtures (the
helper around gsd-db.ts) to perform the UPDATE for milestone_id 'M001', slice_id
'S01', task_id 'T01' and set probe_failure_count = 2; locate the one-off
mutation (rawDb.prepare(...).run()) and call the shared helper method instead
(the helper that exposes a run/prepare-style API used elsewhere in the test
suite) so the change works with better-sqlite3 fallback.

src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts (1)

161-202: Build this fixture through gsd-db.ts instead of raw node:sqlite.

insertExternalWait() exists now, so this helper is duplicating production SQL and hard-wiring the suite to the built-in driver. That makes the test drift-prone and can fail anywhere the DB layer is using the better-sqlite3 fallback instead.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`
around lines 161 - 202, The test helper inserts rows using raw node:sqlite via
getRawDb and insertExternalWaitRow which duplicates SQL and ties tests to the
built-in driver; replace calls to insertExternalWaitRow with the public
insertExternalWait function from gsd-db.ts (import it in the test) and remove
raw DB access (getRawDb/DatabaseSync usage) so the fixture is built through the
production DB API and respects any driver/fallback behavior; ensure you pass the
same opts (milestoneId, sliceId, taskId, pollWhileCommand, pollIntervalMs,
timeoutMs, probeFailureCount) to insertExternalWait so the test data remains
identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/resources/extensions/gsd/auto-dispatch.ts`:
- Line 767: The probe audit currently writes raw command and output objects to
disk (the appendFileSync call that writes JSON containing registeredAt,
timeoutMs, onTimeout), which can persist secrets; replace this by
sanitizing/redacting any secret-bearing arguments and stdout before logging:
implement a small helper (e.g., redactSensitiveArgs or sanitizeForLog) that
masks tokens, passwords, signed URLs and long-looking secrets in
strings/arrays/objects, call it on onTimeout (and any command/output fields)
before building the JSON, and use the same helper for the analogous registration
logging in the db-tools registration code where command/output are persisted;
ensure the log still contains timestamps and non-sensitive metadata
(registeredAt, timeoutMs) but never raw command or output with secrets.
- Around line 808-815: The exec callback currently treats any err from exec as a
"job finished" result; detect shell/setup/infrastructure failures (e.g. err.code
=== 127, 126, or other non-probe semantics like command not found or permission
denied) and do NOT resolve as a successful probe completion. Instead return a
marker (or reject) that indicates an infrastructure probe failure so the
surrounding probe loop (the code handling pollWhileCommand, probeTimeoutMs,
probeShell and the logic between the exec result handling and the
failure-count/resume logic) increments the probe-failure path rather than
treating it as job-done; update the exec callback to inspect (err as any).code
and (err as any).killed and branch: legitimate probe exit -> resolve({ exitCode:
0,... }) or resolve with exitCode from process, but infrastructure errors
(127,126, command-not-found, shell syntax) should be surfaced as failure (e.g.
resolve({ infraFailure: true, exitCode: (err as any).code, stdout })) so the
downstream logic can increment failures instead of resuming the task.

In `@src/resources/extensions/gsd/auto/loop.ts`:
- Around line 435-443: The sleep branch exits the turn loop with continue and
never calls finishTurn(...), leaving the turn lifecycle open; after the chunked
sleep loop (inside the if (dispatch.action === "sleep") block) call and await
finishTurn(...) with the same context used elsewhere (e.g., await finishTurn(s,
dispatch) or the equivalent signature used in this file) before continuing,
ensuring you only continue if finishTurn completes and preserving use of
s.active for responsiveness.

In `@src/resources/extensions/gsd/bootstrap/db-tools.ts`:
- Around line 1181-1189: The runtime guards for pollIntervalMs and timeoutMs
enforce a 1ms minimum but the contract requires a 1s minimum; update the
validation in the function containing pollIntervalMs/timeoutMs to require >=
1000 (ms) instead of >= 1, update the error messages to say "must be at least
1000 ms (1s)" and keep the comparison between timeoutMs and pollIntervalMs but
ensure it compares using the same 1000ms floor; also apply the same change to
the other validation block that handles pollIntervalMs/timeoutMs (the second
occurrence referenced in the diff) so both runtime guards and messaging match
the documented 1-second minimum.
- Around line 1222-1236: The current registration isn't atomic: if
insertExternalWait(s) succeeds but updateTaskStatus(...) throws, the catch only
unlinks jsonPath and leaves the external_waits row orphaned; fix by wrapping the
two DB operations in a single transaction (begin/commit/rollback) so both
insertExternalWait and updateTaskStatus execute atomically, or if transactions
aren't available, delete the inserted external_wait row inside the catch before
unlinking the JSON file (use the same identifiers used by insertExternalWait to
find the row) and ensure rollback/cleanup logic runs for any thrown dbErr to
preserve the task status + DB row + JSON spec invariant.

In `@src/resources/extensions/gsd/gsd-db.ts`:
- Around line 560-579: The external_waits table rows are not being included when
snapshotting or merging task state, causing tasks with status
'awaiting-external' to lose their probe row after restore/merge; update the
snapshot/merge logic in the same-file flows that handle task state—specifically
restoreManifest(), reconcileWorktreeDb(), and any functions that
serialize/deserialize tasks—to persist and rehydrate external_waits rows
alongside the tasks table so that PRIMARY KEY (milestone_id, slice_id, task_id)
entries from external_waits are written to the DB during restore/merge and read
back on reconcile, ensuring resolveDispatch() will find the probe row and resume
polling.

In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts`:
- Around line 206-275: The POSIX-only stateful probe test ("stateful POSIX
probe: exit 0 twice...") creates and executes a shell script (probeScript, uses
chmodSync and shebang, writes counterFile) and sets pollWhileCommand to a POSIX
command, which will fail on Windows; either skip this test on Windows by
detecting process.platform === 'win32' and calling test.skip() at the top of the
test, or replace the shell probe with a platform-portable Node-based probe
(e.g., a small JS script or a cross-platform command) and update usages of
pollWhileCommand, probeScript, chmodSync, and counterFile accordingly so CI
Windows runners don't execute POSIX-only commands.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`:
- Around line 397-408: Replace platform-specific "sleep 35" probes with a
portable long-running command so the test runs on Windows and POSIX: update the
two occurrences (in the test "probe timeout increments failure count" where
insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the
probe) to use a Node-based one-liner launched via the test runtime (i.e., use
process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply
the same change to the other test block around lines 428-439 that uses "sleep
35".
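
The infrastructure-failure distinction raised for the `exec` callback in `auto-dispatch.ts` could look roughly like the sketch below. The `ProbeRun` type and `classifyProbe` function are hypothetical; the only facts relied on are the POSIX shell conventions that exit status 127 means "command not found" and 126 means "found but not executable", and that Node's `exec` sets `killed` on the error when it terminates the child (for example on timeout). On Windows with cmd.exe these exit codes do not carry the same meaning, so a real implementation would need a platform check.

```typescript
// Hypothetical result shape; not the one used in auto-dispatch.ts.
type ProbeRun =
  | { kind: "still-running"; stdout: string }
  | { kind: "done"; exitCode: number; stdout: string }
  | { kind: "infra-failure"; reason: string; stdout: string };

// Shape of the error object passed to the exec callback, reduced to what we inspect.
interface ExecLikeError {
  code?: number | string | null;
  killed?: boolean;
}

// Classify the (err, stdout) pair that an exec callback receives.
function classifyProbe(err: ExecLikeError | null, stdout: string): ProbeRun {
  // No error: the probe exited 0, which means "still running" in this PR's convention.
  if (err === null) return { kind: "still-running", stdout };
  // The child was killed by exec itself, typically because the per-probe timeout fired.
  if (err.killed) return { kind: "infra-failure", reason: "probe killed or timed out", stdout };
  // Spawn-level errors carry a string code (or none) rather than an exit status.
  if (typeof err.code !== "number") {
    return { kind: "infra-failure", reason: "probe could not be started", stdout };
  }
  // POSIX shell conventions: 126 = not executable, 127 = command not found.
  if (err.code === 126 || err.code === 127) {
    return { kind: "infra-failure", reason: `shell error ${err.code}`, stdout };
  }
  // Any other non-zero exit is a legitimate "process finished" signal.
  return { kind: "done", exitCode: err.code, stdout };
}
```

With this shape, the dispatch rule would increment the probe-failure count on `infra-failure` and only resume the task on `done`.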

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 058c9ded-76ee-4776-8392-f1fc5f178a8f

📥 Commits

Reviewing files that changed from the base of the PR and between 58d3d4d and 12ac93d.

📒 Files selected for processing (25)

src/resources/extensions/gsd/auto-dashboard.ts
src/resources/extensions/gsd/auto-dispatch.ts
src/resources/extensions/gsd/auto/loop.ts
src/resources/extensions/gsd/auto/phases.ts
src/resources/extensions/gsd/auto/session.ts
src/resources/extensions/gsd/bootstrap/db-tools.ts
src/resources/extensions/gsd/dev-workflow-engine.ts
src/resources/extensions/gsd/engine-types.ts
src/resources/extensions/gsd/gsd-db.ts
src/resources/extensions/gsd/journal.ts
src/resources/extensions/gsd/state.ts
src/resources/extensions/gsd/tests/complete-slice.test.ts
src/resources/extensions/gsd/tests/complete-task.test.ts
src/resources/extensions/gsd/tests/ensure-db-open.test.ts
src/resources/extensions/gsd/tests/escalation.test.ts
src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
src/resources/extensions/gsd/tests/external-wait-registration.test.ts
src/resources/extensions/gsd/tests/external-wait-resume.test.ts
src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts
src/resources/extensions/gsd/tests/gsd-db.test.ts
src/resources/extensions/gsd/tests/md-importer.test.ts
src/resources/extensions/gsd/tests/memory-store.test.ts
src/resources/extensions/gsd/tests/schema-v23-external-waits.test.ts
src/resources/extensions/gsd/tests/tool-naming.test.ts
src/resources/extensions/gsd/types.ts

coderabbitai · 2026-04-24T04:40:40Z

+      if (Date.now() > Date.parse(registeredAt) + timeoutMs) {
+        const onTimeout = (waitRow.on_timeout as string) || "manual-attention";
+        updateExternalWaitStatus(mid, sid, tid, "timed-out");
+        try { appendFileSync(logPath, JSON.stringify({ ts: new Date().toISOString(), event: "timeout", registeredAt, timeoutMs, onTimeout }) + "\n"); } catch (logErr) { logWarning("dispatch", `Failed to write external wait log: ${logErr instanceof Error ? logErr.message : String(logErr)}`); }


⚠️ Potential issue | 🟠 Major

Probe audit logs should redact secret-bearing arguments and output.

These log entries persist the raw shell command and captured output to disk on every probe cycle. Any embedded token, password, signed URL, or sensitive identifier in the command line or stdout becomes durable plaintext under .gsd, and the same pattern is also used at registration time in src/resources/extensions/gsd/bootstrap/db-tools.ts Lines 1178-1179.

Also applies to: 819-820, 857-857

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/auto-dispatch.ts` at line 767, The probe audit currently writes raw command and output objects to disk (the appendFileSync call that writes JSON containing registeredAt, timeoutMs, onTimeout), which can persist secrets; replace this by sanitizing/redacting any secret-bearing arguments and stdout before logging: implement a small helper (e.g., redactSensitiveArgs or sanitizeForLog) that masks tokens, passwords, signed URLs and long-looking secrets in strings/arrays/objects, call it on onTimeout (and any command/output fields) before building the JSON, and use the same helper for the analogous registration logging in the db-tools registration code where command/output are persisted; ensure the log still contains timestamps and non-sensitive metadata (registeredAt, timeoutMs) but never raw command or output with secrets.
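
One possible shape for the redaction helper, as a sketch only. The name `sanitizeForLog` comes from the prompt above; the patterns are illustrative assumptions and are not exhaustive (for example, space-separated flags such as `--password hunter2` are not covered).

```typescript
const REDACTED = "[REDACTED]";

// Illustrative patterns only; a real deployment would need a reviewed list.
const SECRET_PATTERNS: ReadonlyArray<[RegExp, string]> = [
  // key=value or key: value pairs whose key looks secret-bearing
  [/\b(token|password|passwd|secret|api[_-]?key)(\s*[=:]\s*)("[^"]*"|'[^']*'|\S+)/gi, `$1$2${REDACTED}`],
  // Bearer tokens in Authorization-style headers
  [/\b(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/g, `$1${REDACTED}`],
  // Credentials embedded in URLs: scheme://user:pass@host
  [/(\b[a-z][a-z0-9+.-]*:\/\/[^\s:@\/]+:)[^\s@\/]+(@)/gi, `$1${REDACTED}$2`],
];

// Mask secret-looking substrings before a command or its output is written to a log.
function sanitizeForLog(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}

const sample = sanitizeForLog("squeue --token=abc123 -u me");
```

The same function could wrap both the command text and the captured stdout at each `appendFileSync` call site, so the audit trail keeps its timestamps and metadata without the raw secret.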

coderabbitai · 2026-04-24T04:40:40Z

+        const { exitCode, killed, stdout } = await new Promise<{ exitCode: number | null; killed: boolean; stdout: string }>((resolve) => {
+          exec(pollWhileCommand, { timeout: probeTimeoutMs, shell: probeShell }, (err, stdout, _stderr) => {
+            if (err) {
+              resolve({ exitCode: (err as any).code ?? null, killed: !!(err as any).killed, stdout: stdout || "" });
+            } else {
+              resolve({ exitCode: 0, killed: false, stdout: stdout || "" });
+            }
+          });


⚠️ Potential issue | 🔴 Critical

Don't treat shell/setup failures as "job finished".

exec() reports non-zero exits through err, so command-not-found (127), permission denied (126), and shell syntax errors land in the same branch as the legitimate "done" signal. Right now those cases fall through to Line 843 and resume the task as if the external job completed. Count infrastructure/shell failures as probe failures instead of resolving the wait.

#!/bin/bash
set -euo pipefail
node - <<'NODE'
const { exec } = require('node:child_process');
const cases = [ 'does_not_exist_12345', 'exit 1' ];
let pending = cases.length;
for (const cmd of cases) {
  exec(cmd, (err, stdout, stderr) => {
    console.log(JSON.stringify({ cmd, hasErr: !!err, code: err && err.code, killed: !!(err && err.killed), stderr: String(stderr).trim() }));
    if (--pending === 0) process.exit(0);
  });
}
NODE

Expected result: both commands arrive via err, but only the semantic probe failure should mean "done". Shell/setup failures should stay on the failure-count path instead.

Also applies to: 843-883

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/auto-dispatch.ts` around lines 808 - 815, The exec callback currently treats any err from exec as a "job finished" result; detect shell/setup/infrastructure failures (e.g. err.code === 127, 126, or other non-probe semantics like command not found or permission denied) and do NOT resolve as a successful probe completion. Instead return a marker (or reject) that indicates an infrastructure probe failure so the surrounding probe loop (the code handling pollWhileCommand, probeTimeoutMs, probeShell and the logic between the exec result handling and the failure-count/resume logic) increments the probe-failure path rather than treating it as job-done; update the exec callback to inspect (err as any).code and (err as any).killed and branch: legitimate probe exit -> resolve({ exitCode: 0,... }) or resolve with exitCode from process, but infrastructure errors (127,126, command-not-found, shell syntax) should be surfaced as failure (e.g. resolve({ infraFailure: true, exitCode: (err as any).code, stdout })) so the downstream logic can increment failures instead of resuming the task.
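
A sketch of how the exec result could be classified before the resume logic runs. `classifyProbeResult` and `ProbeOutcome` are hypothetical names. The 126/127 codes follow the POSIX shell convention cited above; `cmd.exe` conventions differ and are not handled here. Note that a shell syntax error typically shares its exit code with ordinary failures, so exit codes alone cannot separate it from a legitimate "done" signal.

```typescript
type ProbeOutcome = "still-running" | "done" | "probe-failure";

// Subset of the fields Node attaches to the error passed to the exec() callback.
interface ExecErrorLike {
  code?: number | string | null;
  killed?: boolean;
}

// POSIX shell convention: 126 = found but not executable, 127 = command not found.
const INFRA_EXIT_CODES = new Set<number>([126, 127]);

function classifyProbeResult(err: ExecErrorLike | null): ProbeOutcome {
  if (err === null) return "still-running"; // exit 0: external job is still running
  if (err.killed) return "probe-failure"; // probe exceeded its timeout and was killed
  if (typeof err.code !== "number") return "probe-failure"; // spawn error string or signal exit
  if (INFRA_EXIT_CODES.has(err.code)) return "probe-failure"; // shell could not run the command
  return "done"; // any other non-zero exit is the semantic "job finished" signal
}

const outcomes = [
  classifyProbeResult(null),
  classifyProbeResult({ code: 1 }),
  classifyProbeResult({ code: 127 }),
  classifyProbeResult({ code: null, killed: true }),
];
```

Only the `"done"` outcome would fall through to the resume path; `"probe-failure"` would feed the existing 3-strike failure counter.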

coderabbitai · 2026-04-24T04:40:40Z

+        if (dispatch.action === "sleep") {
+          // Chunked sleep: poll s.active every 1s so pause/stop is responsive
+          const sleepMs = dispatch.durationMs;
+          const start = Date.now();
+          while (Date.now() - start < sleepMs && s.active) {
+            await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start))));
+          }
+          continue;
+        }


⚠️ Potential issue | 🟠 Major

Finalize turn before continuing from sleep dispatch.

This branch skips finishTurn(...) and immediately continues, which leaves the turn lifecycle unmatched for observer/audit accounting.

Proposed fix

  if (dispatch.action === "sleep") {
    // Chunked sleep: poll s.active every 1s so pause/stop is responsive
    const sleepMs = dispatch.durationMs;
    const start = Date.now();
    while (Date.now() - start < sleepMs && s.active) {
      await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start))));
    }
+   finishTurn("skipped");
    continue;
  }


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/auto/loop.ts` around lines 435 - 443, The sleep branch exits the turn loop with continue and never calls finishTurn(...), leaving the turn lifecycle open; after the chunked sleep loop (inside the if (dispatch.action === "sleep") block) call and await finishTurn(...) with the same context used elsewhere (e.g., await finishTurn(s, dispatch) or the equivalent signature used in this file) before continuing, ensuring you only continue if finishTurn completes and preserving use of s.active for responsiveness.

coderabbitai · 2026-04-24T04:40:40Z

+    if (pollIntervalMs !== undefined && (!Number.isInteger(pollIntervalMs) || pollIntervalMs < 1)) {
+      return { content: [{ type: "text" as const, text: "Error: pollIntervalMs must be a positive integer" }], isError: true, details: { error: "invalid pollIntervalMs" } as any };
+    }
+    if (timeoutMs !== undefined && (!Number.isInteger(timeoutMs) || timeoutMs < 1)) {
+      return { content: [{ type: "text" as const, text: "Error: timeoutMs must be a positive integer" }], isError: true, details: { error: "invalid timeoutMs" } as any };
+    }
+    if (pollIntervalMs !== undefined && timeoutMs !== undefined && timeoutMs < pollIntervalMs) {
+      return { content: [{ type: "text" as const, text: "Error: timeoutMs must be >= pollIntervalMs" }], isError: true, details: { error: "timeoutMs < pollIntervalMs" } as any };
+    }


⚠️ Potential issue | 🟡 Minor

Enforce the documented 1-second minimum here, not 1ms.

The new contract says pollIntervalMs has a 1s floor, but both the runtime guard and the TypeBox schema accept 1. Those values will be persisted and only later clamped by dispatch, which makes the tool contract misleading and leaves invalid JSON/DB state behind.

Also applies to: 1272-1273

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1181 - 1189, The runtime guards for pollIntervalMs and timeoutMs enforce a 1ms minimum but the contract requires a 1s minimum; update the validation in the function containing pollIntervalMs/timeoutMs to require >= 1000 (ms) instead of >= 1, update the error messages to say "must be at least 1000 ms (1s)" and keep the comparison between timeoutMs and pollIntervalMs but ensure it compares using the same 1000ms floor; also apply the same change to the other validation block that handles pollIntervalMs/timeoutMs (the second occurrence referenced in the diff) so both runtime guards and messaging match the documented 1-second minimum.
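
A sketch of the tightened guard, assuming the 1-second floor applies to `pollIntervalMs` as the finding states. Whether `timeoutMs` should share that floor is left open here. `validateWaitTiming` and `MIN_POLL_INTERVAL_MS` are hypothetical names.

```typescript
const MIN_POLL_INTERVAL_MS = 1000; // documented 1s floor

// Returns an error message, or null when the values are acceptable.
function validateWaitTiming(pollIntervalMs?: number, timeoutMs?: number): string | null {
  if (
    pollIntervalMs !== undefined &&
    (!Number.isInteger(pollIntervalMs) || pollIntervalMs < MIN_POLL_INTERVAL_MS)
  ) {
    return `pollIntervalMs must be an integer of at least ${MIN_POLL_INTERVAL_MS} ms (1s)`;
  }
  if (timeoutMs !== undefined && (!Number.isInteger(timeoutMs) || timeoutMs < 1)) {
    return "timeoutMs must be a positive integer";
  }
  if (pollIntervalMs !== undefined && timeoutMs !== undefined && timeoutMs < pollIntervalMs) {
    return "timeoutMs must be >= pollIntervalMs";
  }
  return null;
}

const tooSmall = validateWaitTiming(1, 60000);
const accepted = validateWaitTiming(1000, 60000);
```

Rejecting the value at registration keeps sub-second intervals out of the JSON spec and the DB row, rather than relying on dispatch to clamp them later.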

coderabbitai · 2026-04-24T04:40:40Z

+    // Insert external_waits DB row and update task status (R213, R214, R223)
+    try {
+      insertExternalWait(milestoneId, sliceId, taskId, trimmedCommand, {
+        successCheck,
+        pollIntervalMs,
+        timeoutMs,
+        contextHint,
+        onTimeout,
+      });
+      updateTaskStatus(milestoneId, sliceId, taskId, "awaiting-external");
+    } catch (dbErr) {
+      // Cleanup the JSON file to avoid partial state
+      try { unlinkSync(jsonPath); } catch (cleanupErr) { logError("db", `Failed to clean up probe spec ${jsonPath}: ${cleanupErr instanceof Error ? cleanupErr.message : String(cleanupErr)}`); }
+      return { content: [{ type: "text" as const, text: `Error: DB update failed — ${dbErr instanceof Error ? dbErr.message : String(dbErr)}` }], isError: true, details: { error: "db update failed" } as any };
+    }


⚠️ Potential issue | 🟠 Major

The registration path is not actually atomic.

If insertExternalWait() succeeds and updateTaskStatus() throws, the catch only deletes the JSON file. The external_waits row remains committed, so the task stays executing while a stray wait record says waiting. Wrap both DB mutations in one transaction, or explicitly delete the inserted wait row on rollback, so the task status + DB row + JSON spec invariant is preserved.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1222 - 1236, The current registration isn't atomic: if insertExternalWait(...) succeeds but updateTaskStatus(...) throws, the catch only unlinks jsonPath and leaves the external_waits row orphaned; fix by wrapping the two DB operations in a single transaction (begin/commit/rollback) so both insertExternalWait and updateTaskStatus execute atomically, or if transactions aren't available, delete the inserted external_wait row inside the catch before unlinking the JSON file (use the same identifiers used by insertExternalWait to find the row) and ensure rollback/cleanup logic runs for any thrown dbErr to preserve the task status + DB row + JSON spec invariant.
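
The transaction option could look roughly like the sketch below. `withTransaction` and the `SqlExecutor` interface are hypothetical; the only assumption about the driver is a synchronous `exec(sql)` method, and the recording object stands in for a real connection so the rollback path can be observed.

```typescript
// Minimal driver surface assumed by this sketch.
interface SqlExecutor {
  exec(sql: string): void;
}

// Run fn inside BEGIN/COMMIT; roll back and rethrow if fn throws.
function withTransaction<T>(db: SqlExecutor, fn: () => T): T {
  db.exec("BEGIN");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
}

// Demonstration with a recording stand-in for the database.
const statements: string[] = [];
const recordingDb: SqlExecutor = {
  exec: (sql) => {
    statements.push(sql);
  },
};

let failed = false;
try {
  withTransaction(recordingDb, () => {
    // insertExternalWait(...) would run here and succeed
    throw new Error("updateTaskStatus failed"); // simulate the second mutation throwing
  });
} catch {
  failed = true; // caller still sees the error and can unlink the JSON spec
}
```

With both mutations inside the callback, a failure in the status update rolls the inserted wait row back, and the existing catch block only has to clean up the JSON file.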

coderabbitai · 2026-04-24T04:40:40Z

+    db.exec(`
+      CREATE TABLE IF NOT EXISTS external_waits (
+        milestone_id TEXT NOT NULL,
+        slice_id TEXT NOT NULL,
+        task_id TEXT NOT NULL,
+        status TEXT NOT NULL DEFAULT 'waiting',
+        poll_while_command TEXT NOT NULL,
+        success_check TEXT,
+        poll_interval_ms INTEGER NOT NULL DEFAULT 30000,
+        timeout_ms INTEGER NOT NULL DEFAULT 86400000,
+        context_hint TEXT,
+        on_timeout TEXT NOT NULL DEFAULT 'manual-attention',
+        probe_failure_count INTEGER NOT NULL DEFAULT 0,
+        registered_at TEXT NOT NULL,
+        resolved_at TEXT,
+        PRIMARY KEY (milestone_id, slice_id, task_id),
+        FOREIGN KEY (milestone_id, slice_id, task_id) REFERENCES tasks(milestone_id, slice_id, id) ON DELETE CASCADE
+      )
+    `);
+    db.exec("CREATE INDEX IF NOT EXISTS idx_external_waits_milestone_status ON external_waits(milestone_id, status)");


⚠️ Potential issue | 🟠 Major

Persist external_waits anywhere task state is snapshotted or merged.

This table is now part of runnable workflow state, but same-file flows like restoreManifest() and reconcileWorktreeDb() still do not carry it. That leaves you with tasks.status = 'awaiting-external' but no probe row after a restore/merge, and resolveDispatch() turns that into a hard stop/manual-attention path instead of resuming polling.

Also applies to: 1256-1283, 2678-2762

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/gsd-db.ts` around lines 560 - 579, The external_waits table rows are not being included when snapshotting or merging task state, causing tasks with status 'awaiting-external' to lose their probe row after restore/merge; update the snapshot/merge logic in the same-file flows that handle task state—specifically restoreManifest(), reconcileWorktreeDb(), and any functions that serialize/deserialize tasks—to persist and rehydrate external_waits rows alongside the tasks table so that PRIMARY KEY (milestone_id, slice_id, task_id) entries from external_waits are written to the DB during restore/merge and read back on reconcile, ensuring resolveDispatch() will find the probe row and resume polling.

coderabbitai · 2026-04-24T04:40:40Z

+  test("stateful POSIX probe: exit 0 twice (still running), exit 1 third (done) → resume with contextHint", async () => {
+    const tmpBase = mkdtempSync(join(tmpdir(), "gsd-e2e-multi-"));
+    base = tmpBase;
+    const gsdDir = join(tmpBase, ".gsd");
+    const m001Dir = join(gsdDir, "milestones", "M001");
+    const s01Dir = join(m001Dir, "slices", "S01");
+    const tasksDir = join(s01Dir, "tasks");
+
+    mkdirSync(tasksDir, { recursive: true });
+
+    writeFileSync(
+      join(m001Dir, "M001-CONTEXT.md"),
+      "# M001: Multi-cycle\n\n## Purpose\nMulti-cycle probe test.\n",
+    );
+    writeFileSync(
+      join(m001Dir, "M001-ROADMAP.md"),
+      [
+        "# M001: Multi-cycle",
+        "",
+        "## Vision",
+        "Multi-cycle probe.",
+        "",
+        "## Success Criteria",
+        "- Multi-cycle works",
+        "",
+        "## Slices",
+        "",
+        "- [ ] **S01: Test** `risk:low` `depends:[]`",
+        "  - After this: done.",
+        "",
+        "## Boundary Map",
+        "",
+        "| From | To | Produces | Consumes |",
+        "|------|----|----------|----------|",
+        "| S01 | terminal | result | nothing |",
+      ].join("\n"),
+    );
+    writeFileSync(
+      join(s01Dir, "S01-PLAN.md"),
+      [
+        "# S01: Test",
+        "",
+        "**Goal:** multi-cycle probe",
+        "",
+        "## Tasks",
+        "",
+        "- [ ] **T01: Test** `est:30m`",
+        "  - Do: test",
+        "  - Verify: pass",
+      ].join("\n"),
+    );
+    writeFileSync(join(tasksDir, "T01-PLAN.md"), "# T01: Test\n\n## Steps\n1. test\n");
+
+    // Create POSIX-portable stateful probe script
+    const counterFile = join(tmpBase, "counter.txt");
+    const probeScript = join(tmpBase, "probe.sh");
+    writeFileSync(
+      probeScript,
+      [
+        '#!/bin/sh',
+        'COUNT=$(cat "$1" 2>/dev/null || echo 0)',
+        'COUNT=$(expr $COUNT + 1)',
+        'echo $COUNT > "$1"',
+        'if [ $COUNT -ge 3 ]; then exit 1; fi',
+        'exit 0',
+      ].join("\n"),
+    );
+    chmodSync(probeScript, 0o755);
+
+    const pollWhileCommand = `${probeScript} ${counterFile}`;


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
if [ -d .github/workflows ]; then
  fd -HI '.*\.ya?ml' .github/workflows -x rg -nC2 'windows-latest|windows-2022|runs-on:.*windows' {}
fi

Repository: gsd-build/gsd-2

Length of output: 445

🏁 Script executed:

# Get the test file size and then read the specific line ranges
wc -l src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

# Read the line ranges mentioned in the review (206-275, 343-504, 522-630)
echo "=== Lines 206-275 ==="
sed -n '206,275p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
echo "=== Lines 343-504 ==="
sed -n '343,504p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
echo "=== Lines 522-630 ==="
sed -n '522,630p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

Repository: gsd-build/gsd-2

Length of output: 12326

🏁 Script executed:

# Check for any platform detection or skip logic in the test file
rg -n "process\.platform|skip|windows|darwin|linux" src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

Repository: gsd-build/gsd-2

Length of output: 683

Skip these POSIX-specific test scenarios on Windows or make them platform-portable.

These test suites hardcode POSIX commands (sleep, exit N, shell scripts with #!/bin/sh) that will fail on Windows runners in your CI matrix (windows-latest, windows-2025).

Lines 206–275: Creates a shell script with POSIX syntax (#!/bin/sh, chmodSync) and invokes it directly.

Lines 343–504: Passes "sleep 35" and bare exit commands as pollWhileCommand.

Lines 522–630: Uses "exit 0" as pollWhileCommand.

While the feature advertises cmd.exe support, these tests will fail before exercising the dispatcher. Add process.platform === 'win32' checks with test.skip() or refactor the probes to be OS-portable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts` around lines 206 - 275, The POSIX-only stateful probe test ("stateful POSIX probe: exit 0 twice...") creates and executes a shell script (probeScript, uses chmodSync and shebang, writes counterFile) and sets pollWhileCommand to a POSIX command, which will fail on Windows; either skip this test on Windows by detecting process.platform === 'win32' and calling test.skip() at the top of the test, or replace the shell probe with a platform-portable Node-based probe (e.g., a small JS script or a cross-platform command) and update usages of pollWhileCommand, probeScript, chmodSync, and counterFile accordingly so CI Windows runners don't execute POSIX-only commands.
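
A sketch of the Node-based alternative to the `sh` probe, under the assumption that the dispatcher runs `pollWhileCommand` through the platform's default shell. The file names and the temp directory prefix are illustrative. The probe keeps the same contract as the original script: exit 0 on the first two calls, exit 1 on the third.

```typescript
import { spawnSync } from "node:child_process";
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const tmpBase = mkdtempSync(join(tmpdir(), "probe-sketch-"));
const counterFile = join(tmpBase, "counter.txt");
const probeScript = join(tmpBase, "probe.cjs"); // .cjs so require() works regardless of package type

// Same logic as the sh probe: bump a counter file, exit 0 until the third call, then exit 1.
writeFileSync(
  probeScript,
  [
    'const fs = require("node:fs");',
    "const file = process.argv[2];",
    "let count = 0;",
    'try { count = Number(fs.readFileSync(file, "utf8").trim()) || 0; } catch {}',
    "count += 1;",
    "fs.writeFileSync(file, String(count));",
    "process.exit(count >= 3 ? 1 : 0);",
  ].join("\n"),
);

// Quote each path so directories containing spaces survive the shell.
const pollWhileCommand = `"${process.execPath}" "${probeScript}" "${counterFile}"`;

// Run the probe three times through the default shell and collect the exit codes.
const exitCodes = [1, 2, 3].map(() => spawnSync(pollWhileCommand, { shell: true }).status);
```

Because the probe is launched with `process.execPath`, it needs neither a shebang nor `chmodSync`, which is what ties the current test to POSIX.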

coderabbitai · 2026-04-24T04:40:40Z

+  test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => {
+    const { basePath } = createFixture();
+    updateTaskStatus("M001", "S01", "T01", "awaiting-external");
+    insertExternalWaitRow(basePath, {
+      milestoneId: "M001",
+      sliceId: "S01",
+      taskId: "T01",
+      pollWhileCommand: "sleep 35",
+      pollIntervalMs: 10000,
+      probeFailureCount: 0,
+    });
+    writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35");


⚠️ Potential issue | 🟠 Major

Use a portable long-running probe for the timeout cases.

sleep 35 only works on POSIX shells. These timeout assertions will fail on the Windows cmd.exe path that the dispatcher explicitly supports, so the coverage is currently platform-specific.

Portable test pattern

+ const longRunningProbe = `"${process.execPath}" -e "setTimeout(() => process.exit(0), 35000)"`;
+
  test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => {
    const { basePath } = createFixture();
    updateTaskStatus("M001", "S01", "T01", "awaiting-external");
    insertExternalWaitRow(basePath, {
      milestoneId: "M001",
      sliceId: "S01",
      taskId: "T01",
-     pollWhileCommand: "sleep 35",
+     pollWhileCommand: longRunningProbe,
      pollIntervalMs: 10000,
      probeFailureCount: 0,
    });
-   writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35");
+   writeProbeSpec(basePath, "M001", "S01", "T01", longRunningProbe);

Also applies to: 428-439

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts` around lines 397 - 408, Replace platform-specific "sleep 35" probes with a portable long-running command so the test runs on Windows and POSIX: update the two occurrences (in the test "probe timeout increments failure count" where insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the probe) to use a Node-based one-liner launched via the test runtime (i.e., use process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply the same change to the other test block around lines 428-439 that uses "sleep 35".

OfficialDelta and others added 2 commits April 24, 2026 00:28

OfficialDelta mentioned this pull request Apr 24, 2026

docs: add external process waiting guide and doctor check #4793

Open

2 tasks

github-actions Bot added the enhancement and High Priority labels Apr 24, 2026

This was referenced Apr 24, 2026

feat(db): add external wait schema, types, and state tracking #4790

Open

feat(auto): add external process wait mechanism for SLURM/CI jobs #4655

Closed

coderabbitai Bot reviewed Apr 24, 2026


github-actions Bot commented Apr 24, 2026

🔴 PR Risk Report — CRITICAL
