
feat(auto): implement external wait dispatch, polling, and tool #4792

Open
OfficialDelta wants to merge 2 commits into gsd-build:main from OfficialDelta:feat/external-wait-2-dispatch

Conversation

Contributor

@OfficialDelta OfficialDelta commented Apr 24, 2026

Summary

Split 2/3 of #4655 — probe execution and tool registration.

Depends on #4790 (schema). Merge that first; this PR's diff will auto-update to show only its incremental 9 files.

  • Dispatch rule probes external processes via pollWhileCommand (exit 0 = still running, non-zero = done)
  • Two-phase checking with optional successCheck for post-completion validation
  • Failure counting: 3-strike escalation to manual-attention
  • Per-probe timeout (30s–120s, separate from overall wait timeout)
  • gsd_register_external_wait tool with pattern-based rejection of dangerous commands (curl|sh, rm -rf, etc.)
  • Atomic registration: task status transition + DB insert + JSON probe spec in one operation
  • Every registration and probe invocation logged with full command text for audit trail
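
The probe semantics in the list above can be sketched as a small decision function. This is a minimal illustration and not the code in `auto-dispatch.ts`: the type names, `decideNextStep`, and the choice to leave the failure count unchanged on a clean exit are assumptions made for the example.

```typescript
// Hypothetical sketch; names below are illustrative and not taken from the PR diff.
type ProbeOutcome =
  | { kind: "exited"; exitCode: number } // pollWhileCommand ran to completion
  | { kind: "probe-timeout" };           // probe exceeded its per-probe timeout

type NextStep =
  | { action: "sleep"; durationMs: number }
  | { action: "run-success-check" }
  | { action: "resume" }
  | { action: "manual-attention" };

const MAX_PROBE_FAILURES = 3; // the "3-strike" threshold from the summary

function decideNextStep(
  outcome: ProbeOutcome,
  failureCount: number,
  pollIntervalMs: number,
  hasSuccessCheck: boolean,
): { step: NextStep; failureCount: number } {
  if (outcome.kind === "probe-timeout") {
    // A probe that never returned counts as one strike.
    const failures = failureCount + 1;
    const step: NextStep =
      failures >= MAX_PROBE_FAILURES
        ? { action: "manual-attention" }
        : { action: "sleep", durationMs: pollIntervalMs };
    return { step, failureCount: failures };
  }
  if (outcome.exitCode === 0) {
    // Exit 0 means the external process is still running: poll again later.
    return { step: { action: "sleep", durationMs: pollIntervalMs }, failureCount };
  }
  // Non-zero exit means the process is done; validate first if a successCheck exists.
  return {
    step: hasSuccessCheck ? { action: "run-success-check" } : { action: "resume" },
    failureCount,
  };
}
```

Keeping the decision pure (no I/O) makes the inverted exit-code convention easy to unit test separately from the real `child_process.exec` call.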

Security surface

The pollWhileCommand is executed via /bin/sh (or cmd.exe on Windows). This is arbitrary command execution by design — the agent already has full shell access via bash tools. The probe runs unattended on a timer, so:

  • Pattern-based rejection blocks obviously destructive commands
  • Full command text is logged on registration and every probe invocation
  • Poll interval has a 1s minimum at runtime
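
As a rough illustration of the first and third safeguards, the sketch below rejects a command that matches a deny-list and enforces the 1s poll-interval floor. The two patterns, `validateProbe`, and the result shape are assumptions for the example; the PR's actual pattern list is not shown in this thread.

```typescript
// Hypothetical sketch; the patterns and names are illustrative, not the PR's actual list.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b/i, // download piped into a shell
  /\brm\s+-(?:[a-z]*r[a-z]*f|[a-z]*f[a-z]*r)/i,            // rm -rf, rm -fr, and variants
];

const MIN_POLL_INTERVAL_MS = 1000; // the 1s runtime minimum stated above

type Validation = { ok: true } | { ok: false; reason: string };

function validateProbe(pollWhileCommand: string, pollIntervalMs: number): Validation {
  // Reject the command if any deny-list pattern matches.
  const hit = DANGEROUS_PATTERNS.find((p) => p.test(pollWhileCommand));
  if (hit) return { ok: false, reason: `command matches blocked pattern ${hit}` };
  // Enforce the poll-interval floor.
  if (!Number.isFinite(pollIntervalMs) || pollIntervalMs < MIN_POLL_INTERVAL_MS) {
    return { ok: false, reason: `pollIntervalMs must be at least ${MIN_POLL_INTERVAL_MS} ms` };
  }
  return { ok: true };
}
```

A deny-list like this only catches obvious cases, which is why the description pairs it with full command logging rather than treating it as a sandbox.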

Incremental files (9)

| Group | Files |
| --- | --- |
| Dispatch rule | auto-dispatch.ts (+197) |
| Tool registration | bootstrap/db-tools.ts (+176), dev-workflow-engine.ts |
| Tests (new) | external-wait-e2e.test.ts, external-wait-registration.test.ts, external-wait-resume.test.ts |
| Tests (tool count bumps) | complete-slice.test.ts, complete-task.test.ts, tool-naming.test.ts |

Test plan

  • node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
  • node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-registration.test.ts
  • node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-resume.test.ts
  • E2e tests use real child_process.exec — no mocks

Merge order: #4790 → this → #4793. Supersedes #4655.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added external process registration and monitoring with automatic polling capabilities
    • Configurable timeout handling and health checks for external operations
    • Automatic escalation to manual attention after repeated probe failures
    • Resume-on-failure option for graceful timeout recovery
  • Tests

    • Added comprehensive integration tests for external process monitoring and timeout scenarios

OfficialDelta and others added 2 commits April 24, 2026 00:28
Schema v23 migration adds `external_waits` table with indexed lookup
by milestone + status. New `awaiting-external` phase type, `sleep`
dispatch action, journal event types, and state derivation logic
enable the auto-loop to recognize and handle tasks waiting on
external processes.

Includes session state additions, dashboard display labels, and
phase carry-forward logic for resuming after external completion.

Co-Authored-By: Claude <noreply@anthropic.com>
Dispatch rule probes external processes via pollWhileCommand (exit 0 =
still running, non-zero = done). Implements two-phase checking with
optional successCheck, failure counting with 3-strike escalation to
manual-attention, per-probe timeout, and configurable poll intervals.

Adds gsd_register_external_wait tool with pattern-based rejection of
dangerous commands, atomic registration with task status transition,
and JSON probe spec persistence. Every registration and probe
invocation is logged with full command text for audit trail.

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

This change introduces a new external-wait system allowing tasks to pause execution and poll external processes. It adds an awaiting-external phase, database schema v23 with external_waits table, a registration tool, dispatch logic with polling/timeout/failure handling, session state for result carry-forward, and extensive integration tests.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Type System & Core Definitions | types.ts, engine-types.ts, journal.ts | Added "awaiting-external" phase, sleep dispatch action with durationMs, and "dispatch-sleep" journal event type. |
| Database Schema & CRUD APIs | gsd-db.ts | Bumped schema version to 23, introduced external_waits table with polling/timeout/probe-failure tracking, and exposed 6 new CRUD methods (insertExternalWait, getExternalWait, updateExternalWaitStatus, incrementProbeFailureCount, resetProbeFailureCount, getAllWaitingExternalWaits). |
| External-Wait Registration Tool | bootstrap/db-tools.ts | Registered gsd_register_external_wait tool that validates task state, enforces command safety, persists probe spec JSON, inserts DB record, and transitions task to awaiting-external. |
| State Detection | state.ts | Added early return in deriveStateFromDb when task status is awaiting-external, bypassing normal execution checks. |
| Auto-Dispatch Polling Rule | auto-dispatch.ts | Implemented rule for awaiting-external phase that polls external conditions, respects per-wait timeouts with optional resume-on-failure, increments probe failure counts, escalates to manual-attention after failures, transitions to executing on success, and carries result context via pendingExternalResume. Introduced sleep dispatch action. |
| Auto-Loop & Phase Execution | auto/loop.ts, auto/phases.ts | Added support for sleep dispatch action with interruptible wait using 1-second chunks; extended prompt retry logic to inject pendingExternalResume with truncation and adjust escalation to avoid duplicate diagnostics. |
| Dispatch Bridging | dev-workflow-engine.ts | Added sleep action forwarding from DispatchAction to EngineDispatchAction. |
| Session State | auto/session.ts | Added `pendingExternalResume: string |
| UI Description | auto-dashboard.ts | Added label/description for awaiting-external phase in describeNextUnit. |
| Schema Migration Tests | tests/complete-*.test.ts, tests/ensure-db-open.test.ts, tests/escalation.test.ts, tests/gsd-db.test.ts, tests/md-importer.test.ts, tests/memory-store.test.ts, tests/tool-naming.test.ts | Updated expected schema version from 22 to 23; incremented registered tool count to 31. |
| External-Wait Integration Tests | tests/schema-v23-external-waits.test.ts, tests/external-wait-registration.test.ts, tests/external-wait-e2e.test.ts, tests/external-wait-resume.test.ts, tests/external-wait-state-dispatch.test.ts | Added 5 comprehensive test suites covering schema structure, tool registration validation, full lifecycle polling/timeout/failure scenarios, probe execution with success-check gating, and state-derivation/dispatch integration. |
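
The Auto-Loop & Phase Execution entry mentions an interruptible wait in 1-second chunks. A minimal sketch of that idea follows, assuming a hypothetical `isActive` callback in place of whatever session flag the loop actually checks; the function names are invented for the example and are not from `auto/loop.ts`.

```typescript
// Hypothetical sketch; function names and the isActive callback are illustrative.

// Split a sleep into chunks so the caller can re-check an "active" flag between them.
function planSleepChunks(durationMs: number, chunkMs: number = 1000): number[] {
  if (chunkMs <= 0) throw new RangeError("chunkMs must be positive");
  const chunks: number[] = [];
  let remaining = Math.max(0, Math.floor(durationMs));
  while (remaining > 0) {
    const chunk = Math.min(chunkMs, remaining);
    chunks.push(chunk);
    remaining -= chunk;
  }
  return chunks;
}

// Resolves to true if the full duration elapsed, false if the wait was interrupted.
async function interruptibleSleep(
  durationMs: number,
  isActive: () => boolean,
  chunkMs: number = 1000,
): Promise<boolean> {
  for (const chunk of planSleepChunks(durationMs, chunkMs)) {
    if (!isActive()) return false; // stop early, at most one chunk after deactivation
    await new Promise<void>((resolve) => setTimeout(resolve, chunk));
  }
  return isActive();
}
```

With a 1000 ms chunk, a stop request is noticed within about one second even when the poll interval is minutes long.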

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Tool as gsd_register_external_wait Tool
    participant DB as External Wait DB
    participant DispatchEngine as Auto-Dispatch Engine
    participant ExternalProc as External Process
    participant TaskState as Task Status & Session
    
    User->>Tool: Register external wait (pollWhileCommand, timeoutMs)
    Tool->>DB: Validate task status = executing
    Tool->>DB: Insert external_wait record
    Tool->>DB: Transition task → awaiting-external
    Tool-->>User: Success + JSON probe spec path
    
    Note over DispatchEngine: Polling Loop
    loop Poll until done, timeout, or escalation
        DispatchEngine->>DB: Detect awaiting-external phase
        DispatchEngine->>ExternalProc: Execute poll_while_command
        alt Exit 0 (still running)
            ExternalProc-->>DispatchEngine: exit 0
            DispatchEngine->>TaskState: Return sleep action
            TaskState->>TaskState: Wait durationMs (interruptible)
        else Non-zero exit (done)
            ExternalProc-->>DispatchEngine: non-zero exit
            DispatchEngine->>DB: Mark external_wait resolved
            DispatchEngine->>DB: Transition task → executing
            DispatchEngine->>TaskState: Set pendingExternalResume with result
        else Probe times out
            ExternalProc-->>DispatchEngine: no exit within probe timeout
            DispatchEngine->>DB: Increment probe_failure_count
            alt Failure count ≥ 3
                DispatchEngine->>DB: Mark task → manual-attention
            else Failure count < 3
                DispatchEngine->>TaskState: Return sleep action
                TaskState->>TaskState: Wait durationMs (interruptible)
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

enhancement, needs-review, test

Poem

🐰 A hop, a poll, a pause so wise,
External waits beneath the skies—
We sleep, we probe, we carry through,
Awaiting tasks reborn anew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature: implementing external wait dispatch, polling, and a registration tool. It directly reflects the primary changes across multiple files in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

🔴 PR Risk Report — CRITICAL

Files changed 25
Systems affected 7
Overall risk 🔴 CRITICAL

Affected Systems

| Risk | System |
| --- | --- |
| 🔴 critical | Auto Engine |
| 🔴 critical | State Machine |
| 🔴 critical | Agent Core |
| 🟠 high | GSD Workflow |
| 🟠 high | AI Providers |
| 🟡 medium | Web Mode |
| 🟢 low | Loader/Bootstrap |

File Breakdown

| Risk | File | Systems |
| --- | --- | --- |
| 🔴 | src/resources/extensions/gsd/auto-dashboard.ts | Auto Engine, Web Mode |
| 🔴 | src/resources/extensions/gsd/auto-dispatch.ts | Auto Engine |
| 🔴 | src/resources/extensions/gsd/auto/loop.ts | Auto Engine |
| 🔴 | src/resources/extensions/gsd/auto/phases.ts | Auto Engine |
| 🔴 | src/resources/extensions/gsd/auto/session.ts | Auto Engine |
| 🟠 | src/resources/extensions/gsd/bootstrap/db-tools.ts | GSD Workflow, Loader/Bootstrap |
| 🔴 | src/resources/extensions/gsd/state.ts | State Machine |
| 🔴 | src/resources/extensions/gsd/types.ts | Agent Core, AI Providers |
| | src/resources/extensions/gsd/dev-workflow-engine.ts | (unclassified) |
| | src/resources/extensions/gsd/engine-types.ts | (unclassified) |
| | src/resources/extensions/gsd/gsd-db.ts | (unclassified) |
| | src/resources/extensions/gsd/journal.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/complete-slice.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/complete-task.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/ensure-db-open.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/escalation.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/external-wait-e2e.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/external-wait-registration.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/external-wait-resume.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/gsd-db.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/md-importer.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/memory-store.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/schema-v23-external-waits.test.ts | (unclassified) |
| | src/resources/extensions/gsd/tests/tool-naming.test.ts | (unclassified) |

⚠️ 🔴 Critical risk — the following systems require verification before merge:

  • 🔴 Auto Engine: validate auto-mode trigger conditions and loop termination
  • 🔴 State Machine: test state persistence across a session restart
  • 🔴 Agent Core: check for race conditions and loop exit paths in the agent loop
  • 🟠 GSD Workflow: verify GSD workflow state transitions end-to-end
  • 🟠 AI Providers: confirm provider API contract and error handling are intact

⛔ This PR should not be merged without executing this follow-up prompt.

Ask your coding agent to verify before submitting:

Review this PR for risks in: Auto Engine, State Machine, Agent Core, GSD Workflow, AI Providers. Verify:

1. validate auto-mode trigger conditions and loop termination
2. test state persistence across a session restart
3. check for race conditions and loop exit paths in the agent loop
4. verify GSD workflow state transitions end-to-end
5. confirm provider API contract and error handling are intact

Before modifying any code, assess the scope of this fix:

- Identify the root cause, not just the reported symptom.
- Search the codebase for other call sites, similar patterns, or duplicated logic that may share the same bug.
- List affected tests, documentation, and any downstream consumers that depend on the current behavior.
- Flag any changes that extend beyond the immediate file or function.

Report findings first. Then propose a fix scoped to the actual root cause, and wait for confirmation before applying changes outside the originally reported location.

💡 Have a Codex subscription? Get an independent second opinion: codex review --adversarial


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (6)
src/resources/extensions/gsd/auto-dashboard.ts (1)

186-187: Add null-safe fallback for awaiting-external label text.

If activeTask is missing, this can render undefined: undefined in the dashboard.

Proposed guard for stable label rendering
     case "awaiting-external":
-      return { label: `Awaiting external process for ${tid}: ${tTitle}`, description: "Task is waiting on an external job; probe will check status." };
+      return {
+        label: tid
+          ? `Awaiting external process for ${tid}${tTitle ? `: ${tTitle}` : ""}`
+          : "Awaiting external process",
+        description: "Task is waiting on an external job; probe will check status.",
+      };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/auto-dashboard.ts` around lines 186 - 187, The
"awaiting-external" case builds a label using tid and tTitle which can be
undefined if activeTask is missing; update the label construction in the case
for "awaiting-external" to use null-safe fallbacks (e.g., const tidSafe =
activeTask?.id ?? "unknown"; const tTitleSafe = activeTask?.title ?? "untitled")
or inline optional chaining with nullish coalescing so the returned label never
reads "undefined: undefined" and instead shows a sensible default when
activeTask, tid, or tTitle are absent.
src/resources/extensions/gsd/engine-types.ts (1)

45-46: Update the union docs to mention sleep.

The type is correct, but the explanatory bullets above EngineDispatchAction still omit the new action.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/engine-types.ts` around lines 45 - 46, The
documentation bullets above the EngineDispatchAction union are missing the newly
added "sleep" action; update the explanatory comments describing
EngineDispatchAction to include a bullet describing { action: "sleep";
durationMs: number } (what it does and what durationMs represents) so the docs
match the union type defined by EngineDispatchAction.
src/resources/extensions/gsd/tests/tool-naming.test.ts (1)

48-48: Avoid hardcoded total tool count in this assertion.

This passes now, but it’s brittle as registration evolves. Prefer deriving expected count from RENAME_MAP.length plus explicit non-paired tools.

Refactor suggestion
-assert.deepStrictEqual(pi.tools.length, 31, 'Should register exactly 31 tools (14 canonical + 14 aliases + 1 gate tool + 1 gsd_skip_slice + 1 gsd_register_external_wait)');
+const expectedToolCount = (RENAME_MAP.length * 2) + 3; // gate + gsd_skip_slice + gsd_register_external_wait
+assert.deepStrictEqual(
+  pi.tools.length,
+  expectedToolCount,
+  `Should register exactly ${expectedToolCount} tools`,
+);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/tool-naming.test.ts` at line 48, The test
hardcodes the total tool count; instead compute the expected count dynamically
by using RENAME_MAP.length for the paired canonical+alias tools and then add the
explicit non-paired tools (the gate tool, 'gsd_skip_slice', and
'gsd_register_external_wait'), then assert pi.tools.length equals that computed
value; update the assertion in tool-naming.test.ts to reference RENAME_MAP and
the three named tools rather than the literal 31 to avoid brittleness.
src/resources/extensions/gsd/tests/external-wait-registration.test.ts (1)

313-379: Test the real registration implementation instead of a local clone.

simulateHandlerFlow() is already drifting from registerExternalWaitExecute in src/resources/extensions/gsd/bootstrap/db-tools.ts—it omits the dangerous-command checks, interval validation, resolveTasksDir() path resolution, and the real response shaping. That means this suite can stay green while the actual tool breaks. Please extract the handler into an exported helper or drive it through the registered tool surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-registration.test.ts` around
lines 313 - 379, simulateHandlerFlow is a diverging local clone of
registerExternalWaitExecute (in
src/resources/extensions/gsd/bootstrap/db-tools.ts) missing dangerous-command
checks, interval validation, resolveTasksDir path resolution and real response
shaping; replace the local duplicate by calling the real implementation (or
extract the shared logic into an exported helper and import it in the test) so
the test exercises registerExternalWaitExecute’s actual behavior, ensure the
helper/used function performs the dangerous-command validation, interval and
timeout validation, uses resolveTasksDir (instead of join(..., ".gsd")), and
returns the same shaped response as registerExternalWaitExecute so the test
fails when the real tool breaks.
src/resources/extensions/gsd/tests/external-wait-resume.test.ts (1)

430-437: Avoid a hard node:sqlite dependency in this failure-count setup.

This one-off mutation bypasses gsd-db.ts and only works when the built-in driver is available. Please move it behind the same provider-agnostic helper pattern as the rest of the fixture code so the suite still runs under the better-sqlite3 fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-resume.test.ts` around lines
430 - 437, The test currently imports node:sqlite and calls DatabaseSync to
mutate external_waits.probe_failure_count (rawDb.prepare(...).run()), which
breaks when the built-in driver is swapped; replace that direct DatabaseSync
usage with the same provider-agnostic DB helper used by the other fixtures (the
helper around gsd-db.ts) to perform the UPDATE for milestone_id 'M001', slice_id
'S01', task_id 'T01' and set probe_failure_count = 2; locate the one-off
mutation (rawDb.prepare(...).run()) and call the shared helper method instead
(the helper that exposes a run/prepare-style API used elsewhere in the test
suite) so the change works with better-sqlite3 fallback.
src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts (1)

161-202: Build this fixture through gsd-db.ts instead of raw node:sqlite.

insertExternalWait() exists now, so this helper is duplicating production SQL and hard-wiring the suite to the built-in driver. That makes the test drift-prone and can fail anywhere the DB layer is using the better-sqlite3 fallback instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`
around lines 161 - 202, The test helper inserts rows using raw node:sqlite via
getRawDb and insertExternalWaitRow which duplicates SQL and ties tests to the
built-in driver; replace calls to insertExternalWaitRow with the public
insertExternalWait function from gsd-db.ts (import it in the test) and remove
raw DB access (getRawDb/DatabaseSync usage) so the fixture is built through the
production DB API and respects any driver/fallback behavior; ensure you pass the
same opts (milestoneId, sliceId, taskId, pollWhileCommand, pollIntervalMs,
timeoutMs, probeFailureCount) to insertExternalWait so the test data remains
identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/resources/extensions/gsd/auto-dispatch.ts`:
- Line 767: The probe audit currently writes raw command and output objects to
disk (the appendFileSync call that writes JSON containing registeredAt,
timeoutMs, onTimeout), which can persist secrets; replace this by
sanitizing/redacting any secret-bearing arguments and stdout before logging:
implement a small helper (e.g., redactSensitiveArgs or sanitizeForLog) that
masks tokens, passwords, signed URLs and long-looking secrets in
strings/arrays/objects, call it on onTimeout (and any command/output fields)
before building the JSON, and use the same helper for the analogous registration
logging in the db-tools registration code where command/output are persisted;
ensure the log still contains timestamps and non-sensitive metadata
(registeredAt, timeoutMs) but never raw command or output with secrets.
- Around line 808-815: The exec callback currently treats any err from exec as a
"job finished" result; detect shell/setup/infrastructure failures (e.g. err.code
=== 127, 126, or other non-probe semantics like command not found or permission
denied) and do NOT resolve as a successful probe completion. Instead return a
marker (or reject) that indicates an infrastructure probe failure so the
surrounding probe loop (the code handling pollWhileCommand, probeTimeoutMs,
probeShell and the logic between the exec result handling and the
failure-count/resume logic) increments the probe-failure path rather than
treating it as job-done; update the exec callback to inspect (err as any).code
and (err as any).killed and branch: legitimate probe exit -> resolve({ exitCode:
0,... }) or resolve with exitCode from process, but infrastructure errors
(127,126, command-not-found, shell syntax) should be surfaced as failure (e.g.
resolve({ infraFailure: true, exitCode: (err as any).code, stdout })) so the
downstream logic can increment failures instead of resuming the task.

In `@src/resources/extensions/gsd/auto/loop.ts`:
- Around line 435-443: The sleep branch exits the turn loop with continue and
never calls finishTurn(...), leaving the turn lifecycle open; after the chunked
sleep loop (inside the if (dispatch.action === "sleep") block) call and await
finishTurn(...) with the same context used elsewhere (e.g., await finishTurn(s,
dispatch) or the equivalent signature used in this file) before continuing,
ensuring you only continue if finishTurn completes and preserving use of
s.active for responsiveness.

In `@src/resources/extensions/gsd/bootstrap/db-tools.ts`:
- Around line 1181-1189: The runtime guards for pollIntervalMs and timeoutMs
enforce a 1ms minimum but the contract requires a 1s minimum; update the
validation in the function containing pollIntervalMs/timeoutMs to require >=
1000 (ms) instead of >= 1, update the error messages to say "must be at least
1000 ms (1s)" and keep the comparison between timeoutMs and pollIntervalMs but
ensure it compares using the same 1000ms floor; also apply the same change to
the other validation block that handles pollIntervalMs/timeoutMs (the second
occurrence referenced in the diff) so both runtime guards and messaging match
the documented 1-second minimum.
- Around line 1222-1236: The current registration isn't atomic: if
insertExternalWait(s) succeeds but updateTaskStatus(...) throws, the catch only
unlinks jsonPath and leaves the external_waits row orphaned; fix by wrapping the
two DB operations in a single transaction (begin/commit/rollback) so both
insertExternalWait and updateTaskStatus execute atomically, or if transactions
aren't available, delete the inserted external_wait row inside the catch before
unlinking the JSON file (use the same identifiers used by insertExternalWait to
find the row) and ensure rollback/cleanup logic runs for any thrown dbErr to
preserve the task status + DB row + JSON spec invariant.

In `@src/resources/extensions/gsd/gsd-db.ts`:
- Around line 560-579: The external_waits table rows are not being included when
snapshotting or merging task state, causing tasks with status
'awaiting-external' to lose their probe row after restore/merge; update the
snapshot/merge logic in the same-file flows that handle task state—specifically
restoreManifest(), reconcileWorktreeDb(), and any functions that
serialize/deserialize tasks—to persist and rehydrate external_waits rows
alongside the tasks table so that PRIMARY KEY (milestone_id, slice_id, task_id)
entries from external_waits are written to the DB during restore/merge and read
back on reconcile, ensuring resolveDispatch() will find the probe row and resume
polling.

In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts`:
- Around line 206-275: The POSIX-only stateful probe test ("stateful POSIX
probe: exit 0 twice...") creates and executes a shell script (probeScript, uses
chmodSync and shebang, writes counterFile) and sets pollWhileCommand to a POSIX
command, which will fail on Windows; either skip this test on Windows by
detecting process.platform === 'win32' and calling test.skip() at the top of the
test, or replace the shell probe with a platform-portable Node-based probe
(e.g., a small JS script or a cross-platform command) and update usages of
pollWhileCommand, probeScript, chmodSync, and counterFile accordingly so CI
Windows runners don't execute POSIX-only commands.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`:
- Around line 397-408: Replace platform-specific "sleep 35" probes with a
portable long-running command so the test runs on Windows and POSIX: update the
two occurrences (in the test "probe timeout increments failure count" where
insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the
probe) to use a Node-based one-liner launched via the test runtime (i.e., use
process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply
the same change to the other test block around lines 428-439 that uses "sleep
35".

---

Nitpick comments:
In `@src/resources/extensions/gsd/auto-dashboard.ts`:
- Around line 186-187: The "awaiting-external" case builds a label using tid and
tTitle which can be undefined if activeTask is missing; update the label
construction in the case for "awaiting-external" to use null-safe fallbacks
(e.g., const tidSafe = activeTask?.id ?? "unknown"; const tTitleSafe =
activeTask?.title ?? "untitled") or inline optional chaining with nullish
coalescing so the returned label never reads "undefined: undefined" and instead
shows a sensible default when activeTask, tid, or tTitle are absent.

In `@src/resources/extensions/gsd/engine-types.ts`:
- Around line 45-46: The documentation bullets above the EngineDispatchAction
union are missing the newly added "sleep" action; update the explanatory
comments describing EngineDispatchAction to include a bullet describing {
action: "sleep"; durationMs: number } (what it does and what durationMs
represents) so the docs match the union type defined by EngineDispatchAction.

In `@src/resources/extensions/gsd/tests/external-wait-registration.test.ts`:
- Around line 313-379: simulateHandlerFlow is a diverging local clone of
registerExternalWaitExecute (in
src/resources/extensions/gsd/bootstrap/db-tools.ts) missing dangerous-command
checks, interval validation, resolveTasksDir path resolution and real response
shaping; replace the local duplicate by calling the real implementation (or
extract the shared logic into an exported helper and import it in the test) so
the test exercises registerExternalWaitExecute’s actual behavior, ensure the
helper/used function performs the dangerous-command validation, interval and
timeout validation, uses resolveTasksDir (instead of join(..., ".gsd")), and
returns the same shaped response as registerExternalWaitExecute so the test
fails when the real tool breaks.

In `@src/resources/extensions/gsd/tests/external-wait-resume.test.ts`:
- Around line 430-437: The test currently imports node:sqlite and calls
DatabaseSync to mutate external_waits.probe_failure_count
(rawDb.prepare(...).run()), which breaks when the built-in driver is swapped;
replace that direct DatabaseSync usage with the same provider-agnostic DB helper
used by the other fixtures (the helper around gsd-db.ts) to perform the UPDATE
for milestone_id 'M001', slice_id 'S01', task_id 'T01' and set
probe_failure_count = 2; locate the one-off mutation (rawDb.prepare(...).run())
and call the shared helper method instead (the helper that exposes a
run/prepare-style API used elsewhere in the test suite) so the change works with
better-sqlite3 fallback.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`:
- Around line 161-202: The test helper inserts rows using raw node:sqlite via
getRawDb and insertExternalWaitRow which duplicates SQL and ties tests to the
built-in driver; replace calls to insertExternalWaitRow with the public
insertExternalWait function from gsd-db.ts (import it in the test) and remove
raw DB access (getRawDb/DatabaseSync usage) so the fixture is built through the
production DB API and respects any driver/fallback behavior; ensure you pass the
same opts (milestoneId, sliceId, taskId, pollWhileCommand, pollIntervalMs,
timeoutMs, probeFailureCount) to insertExternalWait so the test data remains
identical.

In `@src/resources/extensions/gsd/tests/tool-naming.test.ts`:
- Line 48: The test hardcodes the total tool count; instead compute the expected
count dynamically by using RENAME_MAP.length for the paired canonical+alias
tools and then add the explicit non-paired tools (the gate tool,
'gsd_skip_slice', and 'gsd_register_external_wait'), then assert pi.tools.length
equals that computed value; update the assertion in tool-naming.test.ts to
reference RENAME_MAP and the three named tools rather than the literal 31 to
avoid brittleness.
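
The inline comment on `auto-dispatch.ts` lines 808-815 asks that shell and setup failures not be read as "job finished". One possible shape for that branch is sketched below. It relies on two things that are not specific to this PR: the error passed to Node's `child_process.exec` callback carries `code` and `killed`, and POSIX shells conventionally exit 126 for "found but not executable" and 127 for "command not found". The `classifyExecResult` name and the three-way result are assumptions for the example.

```typescript
// Hypothetical sketch; the function name and result labels are illustrative.
interface ExecErrorLike {
  code?: number | string | null; // exit code, or a string such as "ENOENT" for spawn errors
  killed?: boolean;              // true when exec's timeout option killed the child
}

type ProbeClassification = "still-running" | "done" | "probe-failure";

function classifyExecResult(err: ExecErrorLike | null): ProbeClassification {
  if (err === null) return "still-running";                 // exit 0: process still running
  if (err.killed) return "probe-failure";                   // probe itself timed out
  if (typeof err.code !== "number") return "probe-failure"; // spawn-level error, no exit code
  if (err.code === 126 || err.code === 127) return "probe-failure"; // shell could not run the command
  return "done";                                            // ordinary non-zero exit: job finished
}
```

A caveat with this approach: a probe command that legitimately exits 126 or 127 to signal completion would be misclassified, so the convention would need to be documented for tool users.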

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 058c9ded-76ee-4776-8392-f1fc5f178a8f

📥 Commits

Reviewing files that changed from the base of the PR and between 58d3d4d and 12ac93d.

📒 Files selected for processing (25)
  • src/resources/extensions/gsd/auto-dashboard.ts
  • src/resources/extensions/gsd/auto-dispatch.ts
  • src/resources/extensions/gsd/auto/loop.ts
  • src/resources/extensions/gsd/auto/phases.ts
  • src/resources/extensions/gsd/auto/session.ts
  • src/resources/extensions/gsd/bootstrap/db-tools.ts
  • src/resources/extensions/gsd/dev-workflow-engine.ts
  • src/resources/extensions/gsd/engine-types.ts
  • src/resources/extensions/gsd/gsd-db.ts
  • src/resources/extensions/gsd/journal.ts
  • src/resources/extensions/gsd/state.ts
  • src/resources/extensions/gsd/tests/complete-slice.test.ts
  • src/resources/extensions/gsd/tests/complete-task.test.ts
  • src/resources/extensions/gsd/tests/ensure-db-open.test.ts
  • src/resources/extensions/gsd/tests/escalation.test.ts
  • src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
  • src/resources/extensions/gsd/tests/external-wait-registration.test.ts
  • src/resources/extensions/gsd/tests/external-wait-resume.test.ts
  • src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts
  • src/resources/extensions/gsd/tests/gsd-db.test.ts
  • src/resources/extensions/gsd/tests/md-importer.test.ts
  • src/resources/extensions/gsd/tests/memory-store.test.ts
  • src/resources/extensions/gsd/tests/schema-v23-external-waits.test.ts
  • src/resources/extensions/gsd/tests/tool-naming.test.ts
  • src/resources/extensions/gsd/types.ts

if (Date.now() > Date.parse(registeredAt) + timeoutMs) {
  const onTimeout = (waitRow.on_timeout as string) || "manual-attention";
  updateExternalWaitStatus(mid, sid, tid, "timed-out");
  try { appendFileSync(logPath, JSON.stringify({ ts: new Date().toISOString(), event: "timeout", registeredAt, timeoutMs, onTimeout }) + "\n"); } catch (logErr) { logWarning("dispatch", `Failed to write external wait log: ${logErr instanceof Error ? logErr.message : String(logErr)}`); }

⚠️ Potential issue | 🟠 Major

Probe audit logs should redact secret-bearing arguments and output.

These log entries persist the raw shell command and captured output to disk on every probe cycle. Any embedded token, password, signed URL, or sensitive identifier in the command line or stdout becomes durable plaintext under .gsd, and the same pattern is also used at registration time in src/resources/extensions/gsd/bootstrap/db-tools.ts Lines 1178-1179.

Also applies to: 819-820, 857-857

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/auto-dispatch.ts` at line 767, The probe audit
currently writes raw command and output objects to disk (the appendFileSync call
that writes JSON containing registeredAt, timeoutMs, onTimeout), which can
persist secrets; replace this by sanitizing/redacting any secret-bearing
arguments and stdout before logging: implement a small helper (e.g.,
redactSensitiveArgs or sanitizeForLog) that masks tokens, passwords, signed URLs
and long-looking secrets in strings/arrays/objects, call it on onTimeout (and
any command/output fields) before building the JSON, and use the same helper for
the analogous registration logging in the db-tools registration code where
command/output are persisted; ensure the log still contains timestamps and
non-sensitive metadata (registeredAt, timeoutMs) but never raw command or output
with secrets.
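
One possible shape for the redaction helper suggested above, offered as a sketch rather than the project's implementation. The name `sanitizeForLog` comes from the comment; the pattern list is an illustrative assumption and is not exhaustive, so it should be treated as a starting point only.

```typescript
// Hypothetical redaction rules: each pair is [pattern, replacement].
// The patterns are illustrative assumptions, not a complete secret detector.
const SECRET_PATTERNS: Array<[RegExp, string]> = [
  // key=value or key: value arguments such as --token=abc or password=hunter2
  [/\b(token|password|passwd|secret|api[_-]?key)(\s*[=:]\s*)(\S+)/gi, "$1$2[REDACTED]"],
  // Bearer tokens in Authorization headers
  [/\b(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/g, "$1[REDACTED]"],
  // Signature query parameters on signed URLs
  [/([?&](?:sig|signature|X-Amz-Signature)=)[^&\s"']+/gi, "$1[REDACTED]"],
];

// Applies every rule in order and returns the masked string.
function sanitizeForLog(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}
```

The helper would be applied to the command text and captured stdout before they are passed to `JSON.stringify`, leaving timestamps and numeric metadata untouched.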

Comment on lines +808 to +815
const { exitCode, killed, stdout } = await new Promise<{ exitCode: number | null; killed: boolean; stdout: string }>((resolve) => {
  exec(pollWhileCommand, { timeout: probeTimeoutMs, shell: probeShell }, (err, stdout, _stderr) => {
    if (err) {
      resolve({ exitCode: (err as any).code ?? null, killed: !!(err as any).killed, stdout: stdout || "" });
    } else {
      resolve({ exitCode: 0, killed: false, stdout: stdout || "" });
    }
  });

⚠️ Potential issue | 🔴 Critical

Don't treat shell/setup failures as "job finished".

exec() reports non-zero exits through err, so command-not-found (127), permission denied (126), and shell syntax errors land in the same branch as the legitimate "done" signal. Right now those cases fall through to Line 843 and resume the task as if the external job completed. Count infrastructure/shell failures as probe failures instead of resolving the wait.

#!/bin/bash
set -euo pipefail

node - <<'NODE'
const { exec } = require('node:child_process');

const cases = [
  'does_not_exist_12345',
  'exit 1'
];

let pending = cases.length;
for (const cmd of cases) {
  exec(cmd, (err, stdout, stderr) => {
    console.log(JSON.stringify({
      cmd,
      hasErr: !!err,
      code: err && err.code,
      killed: !!(err && err.killed),
      stderr: String(stderr).trim()
    }));
    if (--pending === 0) process.exit(0);
  });
}
NODE

Expected result: both commands arrive via err, but only the probe's own non-zero exit (exit 1) should mean "done". Shell/setup failures such as command-not-found should stay on the failure-count path instead.

Also applies to: 843-883

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/auto-dispatch.ts` around lines 808 - 815, The
exec callback currently treats any err from exec as a "job finished" result.
Detect shell/setup/infrastructure failures (e.g. err.code === 126 or 127, i.e.
permission denied or command not found, plus shell syntax errors) and do NOT
resolve them as probe completion. Instead return a marker (or reject) that
indicates an infrastructure probe failure, so the surrounding probe loop (the
code handling pollWhileCommand, probeTimeoutMs, probeShell and the logic between
the exec result handling and the failure-count/resume logic) takes the
probe-failure path rather than treating the result as job-done. Update the exec
callback to inspect (err as any).code and (err as any).killed and branch: a
legitimate probe exit resolves with the exit code reported by the process (0
when err is absent), while infrastructure errors are surfaced as failures (e.g.
resolve({ infraFailure: true, exitCode: (err as any).code, stdout })) so the
downstream logic can increment the failure count instead of resuming the task.
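
The branching described above could be sketched as a small pure function. The outcome names and the function name are hypothetical; treating 126 and 127 as infrastructure failures follows the exit codes named in this comment.

```typescript
// Possible outcomes of a single probe run (illustrative names).
type ProbeOutcome = "still-running" | "done" | "probe-failure";

// Exit codes the comment above associates with shell/setup problems:
// 126 = permission denied, 127 = command not found.
const INFRA_EXIT_CODES = new Set<number>([126, 127]);

function classifyProbeExit(exitCode: number | null, killed: boolean): ProbeOutcome {
  // A killed process (e.g. per-probe timeout) or a missing exit code says
  // nothing about the external job, so it counts as a probe failure.
  if (killed || exitCode === null) return "probe-failure";
  if (exitCode === 0) return "still-running";
  if (INFRA_EXIT_CODES.has(exitCode)) return "probe-failure";
  return "done";
}
```

Shell syntax errors may not be distinguishable from a legitimate non-zero probe exit by exit code alone, so this sketch only covers the two codes listed; anything beyond that would need an additional signal such as stderr inspection.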

Comment on lines +435 to +443
if (dispatch.action === "sleep") {
  // Chunked sleep: poll s.active every 1s so pause/stop is responsive
  const sleepMs = dispatch.durationMs;
  const start = Date.now();
  while (Date.now() - start < sleepMs && s.active) {
    await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start))));
  }
  continue;
}

⚠️ Potential issue | 🟠 Major

Finalize turn before continuing from sleep dispatch.

This branch skips finishTurn(...) and immediately continues, which leaves the turn lifecycle unmatched for observer/audit accounting.

Proposed fix
         if (dispatch.action === "sleep") {
           // Chunked sleep: poll s.active every 1s so pause/stop is responsive
           const sleepMs = dispatch.durationMs;
           const start = Date.now();
           while (Date.now() - start < sleepMs && s.active) {
             await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start))));
           }
+          finishTurn("skipped");
           continue;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/auto/loop.ts` around lines 435 - 443, The sleep
branch jumps to the next loop iteration with continue and never calls
finishTurn(...), leaving the turn lifecycle open; after the chunked sleep loop
(inside the if (dispatch.action === "sleep") block) call finishTurn(...) with
the same arguments used elsewhere in this file (awaiting it if it is async)
before continuing, so the turn is always finalized, and keep the s.active check
so pause/stop stays responsive.

Comment on lines +1181 to +1189
if (pollIntervalMs !== undefined && (!Number.isInteger(pollIntervalMs) || pollIntervalMs < 1)) {
  return { content: [{ type: "text" as const, text: "Error: pollIntervalMs must be a positive integer" }], isError: true, details: { error: "invalid pollIntervalMs" } as any };
}
if (timeoutMs !== undefined && (!Number.isInteger(timeoutMs) || timeoutMs < 1)) {
  return { content: [{ type: "text" as const, text: "Error: timeoutMs must be a positive integer" }], isError: true, details: { error: "invalid timeoutMs" } as any };
}
if (pollIntervalMs !== undefined && timeoutMs !== undefined && timeoutMs < pollIntervalMs) {
  return { content: [{ type: "text" as const, text: "Error: timeoutMs must be >= pollIntervalMs" }], isError: true, details: { error: "timeoutMs < pollIntervalMs" } as any };
}

⚠️ Potential issue | 🟡 Minor

Enforce the documented 1-second minimum here, not 1ms.

The new contract says pollIntervalMs has a 1s floor, but both the runtime guard and the TypeBox schema accept 1. Those values will be persisted and only later clamped by dispatch, which makes the tool contract misleading and leaves invalid JSON/DB state behind.

Also applies to: 1272-1273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1181 - 1189,
The runtime guards for pollIntervalMs and timeoutMs enforce a 1ms minimum but
the contract requires a 1s minimum; update the validation in the function
containing pollIntervalMs/timeoutMs to require >= 1000 (ms) instead of >= 1,
update the error messages to say "must be at least 1000 ms (1s)" and keep the
comparison between timeoutMs and pollIntervalMs but ensure it compares using the
same 1000ms floor; also apply the same change to the other validation block that
handles pollIntervalMs/timeoutMs (the second occurrence referenced in the diff)
so both runtime guards and messaging match the documented 1-second minimum.
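
A sketch of what the tightened guard might look like once the floor is enforced at registration time. The constant and function names are hypothetical. This version applies the 1000 ms floor to pollIntervalMs only, since that is the value the PR description names; whether timeoutMs needs the same floor is left open here.

```typescript
// Assumed floor, taken from the PR description's "1s minimum" for the poll interval.
const MIN_POLL_INTERVAL_MS = 1000;

// Returns an error message, or null when the supplied values are acceptable.
function validateWaitTiming(pollIntervalMs?: number, timeoutMs?: number): string | null {
  if (
    pollIntervalMs !== undefined &&
    (!Number.isInteger(pollIntervalMs) || pollIntervalMs < MIN_POLL_INTERVAL_MS)
  ) {
    return `pollIntervalMs must be an integer of at least ${MIN_POLL_INTERVAL_MS} ms (1s)`;
  }
  if (timeoutMs !== undefined && (!Number.isInteger(timeoutMs) || timeoutMs < 1)) {
    return "timeoutMs must be a positive integer";
  }
  if (pollIntervalMs !== undefined && timeoutMs !== undefined && timeoutMs < pollIntervalMs) {
    return "timeoutMs must be >= pollIntervalMs";
  }
  return null;
}
```

Rejecting sub-second intervals up front means nothing below the floor is persisted to the DB row or the JSON probe spec, so dispatch no longer has to clamp stored values after the fact.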

Comment on lines +1222 to +1236
// Insert external_waits DB row and update task status (R213, R214, R223)
try {
  insertExternalWait(milestoneId, sliceId, taskId, trimmedCommand, {
    successCheck,
    pollIntervalMs,
    timeoutMs,
    contextHint,
    onTimeout,
  });
  updateTaskStatus(milestoneId, sliceId, taskId, "awaiting-external");
} catch (dbErr) {
  // Cleanup the JSON file to avoid partial state
  try { unlinkSync(jsonPath); } catch (cleanupErr) { logError("db", `Failed to clean up probe spec ${jsonPath}: ${cleanupErr instanceof Error ? cleanupErr.message : String(cleanupErr)}`); }
  return { content: [{ type: "text" as const, text: `Error: DB update failed — ${dbErr instanceof Error ? dbErr.message : String(dbErr)}` }], isError: true, details: { error: "db update failed" } as any };
}

⚠️ Potential issue | 🟠 Major

The registration path is not actually atomic.

If insertExternalWait() succeeds and updateTaskStatus() throws, the catch only deletes the JSON file. The external_waits row remains committed, so the task stays executing while a stray wait record says waiting. Wrap both DB mutations in one transaction, or explicitly delete the inserted wait row on rollback, so the task status + DB row + JSON spec invariant is preserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1222 - 1236,
The current registration isn't atomic: if insertExternalWait() succeeds but
updateTaskStatus(...) throws, the catch only unlinks jsonPath and leaves the
external_waits row orphaned; fix by wrapping the two DB operations in a single
transaction (begin/commit/rollback) so both insertExternalWait and
updateTaskStatus execute atomically, or if transactions aren't available, delete
the inserted external_wait row inside the catch before unlinking the JSON file
(use the same identifiers used by insertExternalWait to find the row) and ensure
rollback/cleanup logic runs for any thrown dbErr to preserve the task status +
DB row + JSON spec invariant.
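
The compensating-cleanup variant mentioned above (for the case where a single transaction is not available) might look roughly like this. The `RegistrationSteps` interface and `registerWithRollback` are hypothetical names that keep the sketch independent of the real gsd-db.ts API; a transaction around both mutations would be the simpler option where the driver supports it.

```typescript
// Hypothetical dependency bundle standing in for the real DB and file helpers.
interface RegistrationSteps {
  insertWait(): void;      // e.g. the external_waits insert
  setTaskAwaiting(): void; // e.g. the task status transition
  deleteWait(): void;      // compensating delete for the inserted row
  removeSpecFile(): void;  // removes the JSON probe spec
}

type RegistrationResult = { ok: true } | { ok: false; error: string };

// Runs both DB mutations and undoes whatever already happened if either throws.
function registerWithRollback(steps: RegistrationSteps): RegistrationResult {
  let inserted = false;
  try {
    steps.insertWait();
    inserted = true;
    steps.setTaskAwaiting();
    return { ok: true };
  } catch (err) {
    if (inserted) {
      // Best effort: a real implementation would log a failed cleanup.
      try { steps.deleteWait(); } catch { /* ignored in this sketch */ }
    }
    try { steps.removeSpecFile(); } catch { /* ignored in this sketch */ }
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

With this ordering, a failure in the status update removes the wait row before the JSON file, so no combination of the three artifacts survives a failed registration.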

Comment on lines +560 to +579
db.exec(`
  CREATE TABLE IF NOT EXISTS external_waits (
    milestone_id TEXT NOT NULL,
    slice_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'waiting',
    poll_while_command TEXT NOT NULL,
    success_check TEXT,
    poll_interval_ms INTEGER NOT NULL DEFAULT 30000,
    timeout_ms INTEGER NOT NULL DEFAULT 86400000,
    context_hint TEXT,
    on_timeout TEXT NOT NULL DEFAULT 'manual-attention',
    probe_failure_count INTEGER NOT NULL DEFAULT 0,
    registered_at TEXT NOT NULL,
    resolved_at TEXT,
    PRIMARY KEY (milestone_id, slice_id, task_id),
    FOREIGN KEY (milestone_id, slice_id, task_id) REFERENCES tasks(milestone_id, slice_id, id) ON DELETE CASCADE
  )
`);
db.exec("CREATE INDEX IF NOT EXISTS idx_external_waits_milestone_status ON external_waits(milestone_id, status)");

⚠️ Potential issue | 🟠 Major

Persist external_waits anywhere task state is snapshotted or merged.

This table is now part of runnable workflow state, but same-file flows like restoreManifest() and reconcileWorktreeDb() still do not carry it. That leaves you with tasks.status = 'awaiting-external' but no probe row after a restore/merge, and resolveDispatch() turns that into a hard stop/manual-attention path instead of resuming polling.

Also applies to: 1256-1283, 2678-2762

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/gsd-db.ts` around lines 560 - 579, The
external_waits table rows are not being included when snapshotting or merging
task state, causing tasks with status 'awaiting-external' to lose their probe
row after restore/merge; update the snapshot/merge logic in the same-file flows
that handle task state—specifically restoreManifest(), reconcileWorktreeDb(),
and any functions that serialize/deserialize tasks—to persist and rehydrate
external_waits rows alongside the tasks table so that PRIMARY KEY (milestone_id,
slice_id, task_id) entries from external_waits are written to the DB during
restore/merge and read back on reconcile, ensuring resolveDispatch() will find
the probe row and resume polling.

Comment on lines +206 to +275
test("stateful POSIX probe: exit 0 twice (still running), exit 1 third (done) → resume with contextHint", async () => {
  const tmpBase = mkdtempSync(join(tmpdir(), "gsd-e2e-multi-"));
  base = tmpBase;
  const gsdDir = join(tmpBase, ".gsd");
  const m001Dir = join(gsdDir, "milestones", "M001");
  const s01Dir = join(m001Dir, "slices", "S01");
  const tasksDir = join(s01Dir, "tasks");

  mkdirSync(tasksDir, { recursive: true });

  writeFileSync(
    join(m001Dir, "M001-CONTEXT.md"),
    "# M001: Multi-cycle\n\n## Purpose\nMulti-cycle probe test.\n",
  );
  writeFileSync(
    join(m001Dir, "M001-ROADMAP.md"),
    [
      "# M001: Multi-cycle",
      "",
      "## Vision",
      "Multi-cycle probe.",
      "",
      "## Success Criteria",
      "- Multi-cycle works",
      "",
      "## Slices",
      "",
      "- [ ] **S01: Test** `risk:low` `depends:[]`",
      " - After this: done.",
      "",
      "## Boundary Map",
      "",
      "| From | To | Produces | Consumes |",
      "|------|----|----------|----------|",
      "| S01 | terminal | result | nothing |",
    ].join("\n"),
  );
  writeFileSync(
    join(s01Dir, "S01-PLAN.md"),
    [
      "# S01: Test",
      "",
      "**Goal:** multi-cycle probe",
      "",
      "## Tasks",
      "",
      "- [ ] **T01: Test** `est:30m`",
      " - Do: test",
      " - Verify: pass",
    ].join("\n"),
  );
  writeFileSync(join(tasksDir, "T01-PLAN.md"), "# T01: Test\n\n## Steps\n1. test\n");

  // Create POSIX-portable stateful probe script
  const counterFile = join(tmpBase, "counter.txt");
  const probeScript = join(tmpBase, "probe.sh");
  writeFileSync(
    probeScript,
    [
      '#!/bin/sh',
      'COUNT=$(cat "$1" 2>/dev/null || echo 0)',
      'COUNT=$(expr $COUNT + 1)',
      'echo $COUNT > "$1"',
      'if [ $COUNT -ge 3 ]; then exit 1; fi',
      'exit 0',
    ].join("\n"),
  );
  chmodSync(probeScript, 0o755);

  const pollWhileCommand = `${probeScript} ${counterFile}`;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

if [ -d .github/workflows ]; then
  fd -HI '.*\.ya?ml' .github/workflows -x rg -nC2 'windows-latest|windows-2022|runs-on:.*windows' {}
fi

Repository: gsd-build/gsd-2

Length of output: 445


🏁 Script executed:

# Get the test file size and then read the specific line ranges
wc -l src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

# Read the line ranges mentioned in the review (206-275, 343-504, 522-630)
echo "=== Lines 206-275 ==="
sed -n '206,275p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

echo "=== Lines 343-504 ==="
sed -n '343,504p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

echo "=== Lines 522-630 ==="
sed -n '522,630p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

Repository: gsd-build/gsd-2

Length of output: 12326


🏁 Script executed:

# Check for any platform detection or skip logic in the test file
rg -n "process\.platform|skip|windows|darwin|linux" src/resources/extensions/gsd/tests/external-wait-e2e.test.ts

Repository: gsd-build/gsd-2

Length of output: 683


Skip these POSIX-specific test scenarios on Windows or make them platform-portable.

These test suites hardcode POSIX commands (sleep, exit N, shell scripts with #!/bin/sh) that will fail on Windows runners in your CI matrix (windows-latest, windows-2025).

  • Lines 206–275: Creates a shell script with POSIX syntax (#!/bin/sh, chmodSync) and invokes it directly.
  • Lines 343–504: Passes "sleep 35" and bare exit commands as pollWhileCommand.
  • Lines 522–630: Uses "exit 0" as pollWhileCommand.

While the feature advertises cmd.exe support, these tests will fail before exercising the dispatcher. Add process.platform === 'win32' checks with test.skip() or refactor the probes to be OS-portable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts` around lines
206 - 275, The POSIX-only stateful probe test ("stateful POSIX probe: exit 0
twice...") creates and executes a shell script (probeScript, uses chmodSync and
shebang, writes counterFile) and sets pollWhileCommand to a POSIX command, which
will fail on Windows; either skip this test on Windows by detecting
process.platform === 'win32' and calling test.skip() at the top of the test, or
replace the shell probe with a platform-portable Node-based probe (e.g., a small
JS script or a cross-platform command) and update usages of pollWhileCommand,
probeScript, chmodSync, and counterFile accordingly so CI Windows runners don't
execute POSIX-only commands.
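
The Node-based alternative mentioned above could be sketched as follows. It mirrors the counter logic of probe.sh (exit 0 on the first two runs, exit 1 from the third) but runs through `process.execPath`, so it does not depend on /bin/sh or chmod. The helper names `createPortableProbe` and `runProbeOnce` are hypothetical, and the quoting of the returned command is an assumption that has not been checked against cmd.exe.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Node source for the stateful probe: increments a counter file on each run,
// exits 0 ("still running") for runs 1 and 2, and 1 ("done") from run 3 on.
const PROBE_SOURCE = [
  'const fs = require("node:fs");',
  "const file = process.argv[2];",
  "let count = 0;",
  'try { count = Number(fs.readFileSync(file, "utf8")) || 0; } catch {}',
  "count += 1;",
  "fs.writeFileSync(file, String(count));",
  "process.exit(count >= 3 ? 1 : 0);",
].join("\n");

// Writes the probe into dir and returns a command line usable as pollWhileCommand.
function createPortableProbe(dir: string): { command: string; scriptPath: string; counterFile: string } {
  const scriptPath = join(dir, "probe.cjs"); // .cjs so require() works in any package type
  const counterFile = join(dir, "counter.txt");
  writeFileSync(scriptPath, PROBE_SOURCE);
  const command = `"${process.execPath}" "${scriptPath}" "${counterFile}"`;
  return { command, scriptPath, counterFile };
}

// Runs the probe once without a shell and returns its exit code.
function runProbeOnce(scriptPath: string, counterFile: string): number | null {
  return spawnSync(process.execPath, [scriptPath, counterFile]).status;
}

// Usage: three consecutive runs against a fresh temp directory.
const probeDir = mkdtempSync(join(tmpdir(), "gsd-probe-"));
const probe = createPortableProbe(probeDir);
const exitCodes = [1, 2, 3].map(() => runProbeOnce(probe.scriptPath, probe.counterFile));
console.log(probe.command, exitCodes);
```

In the test, `probe.command` would take the place of the `${probeScript} ${counterFile}` string, and the chmodSync call would no longer be needed.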

Comment on lines +397 to +408
test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => {
  const { basePath } = createFixture();
  updateTaskStatus("M001", "S01", "T01", "awaiting-external");
  insertExternalWaitRow(basePath, {
    milestoneId: "M001",
    sliceId: "S01",
    taskId: "T01",
    pollWhileCommand: "sleep 35",
    pollIntervalMs: 10000,
    probeFailureCount: 0,
  });
  writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35");

⚠️ Potential issue | 🟠 Major

Use a portable long-running probe for the timeout cases.

sleep 35 only works on POSIX shells. These timeout assertions will fail on the Windows cmd.exe path that the dispatcher explicitly supports, so the coverage is currently platform-specific.

Portable test pattern
+  const longRunningProbe = `"${process.execPath}" -e "setTimeout(() => process.exit(0), 35000)"`;
+
   test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => {
     const { basePath } = createFixture();
     updateTaskStatus("M001", "S01", "T01", "awaiting-external");
     insertExternalWaitRow(basePath, {
       milestoneId: "M001",
       sliceId: "S01",
       taskId: "T01",
-      pollWhileCommand: "sleep 35",
+      pollWhileCommand: longRunningProbe,
       pollIntervalMs: 10000,
       probeFailureCount: 0,
     });
-    writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35");
+    writeProbeSpec(basePath, "M001", "S01", "T01", longRunningProbe);

Also applies to: 428-439

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`
around lines 397 - 408, Replace platform-specific "sleep 35" probes with a
portable long-running command so the test runs on Windows and POSIX: update the
two occurrences (in the test "probe timeout increments failure count" where
insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the
probe) to use a Node-based one-liner launched via the test runtime (i.e., use
process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply
the same change to the other test block around lines 428-439 that uses "sleep
35".


Labels

enhancement New feature or request High Priority

1 participant