feat(auto): implement external wait dispatch, polling, and tool#4792
feat(auto): implement external wait dispatch, polling, and tool#4792OfficialDelta wants to merge 2 commits intogsd-build:mainfrom
Conversation
Schema v23 migration adds `external_waits` table with indexed lookup by milestone + status. New `awaiting-external` phase type, `sleep` dispatch action, journal event types, and state derivation logic enable the auto-loop to recognize and handle tasks waiting on external processes. Includes session state additions, dashboard display labels, and phase carry-forward logic for resuming after external completion. Co-Authored-By: Claude <noreply@anthropic.com>
Dispatch rule probes external processes via pollWhileCommand (exit 0 = still running, non-zero = done). Implements two-phase checking with optional successCheck, failure counting with 3-strike escalation to manual-attention, per-probe timeout, and configurable poll intervals. Adds gsd_register_external_wait tool with pattern-based rejection of dangerous commands, atomic registration with task status transition, and JSON probe spec persistence. Every registration and probe invocation is logged with full command text for audit trail. Co-Authored-By: Claude <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis change introduces a new external-wait system allowing tasks to pause execution and poll external processes. It adds an Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Tool as gsd_register_external_wait Tool
participant DB as External Wait DB
participant DispatchEngine as Auto-Dispatch Engine
participant ExternalProc as External Process
participant TaskState as Task Status & Session
User->>Tool: Register external wait (pollWhileCommand, timeoutMs)
Tool->>DB: Validate task status = executing
Tool->>DB: Insert external_wait record
Tool->>DB: Transition task → awaiting-external
Tool-->>User: Success + JSON probe spec path
Note over DispatchEngine: Polling Loop
loop Poll until timeout or success
DispatchEngine->>DB: Detect awaiting-external phase
DispatchEngine->>ExternalProc: Execute poll_while_command
alt Command succeeds (exit 0)
ExternalProc-->>DispatchEngine: exit 0
DispatchEngine->>DB: Mark external_wait resolved
DispatchEngine->>DB: Transition task → executing
DispatchEngine->>TaskState: Set pendingExternalResume with result
else Command fails or times out
ExternalProc-->>DispatchEngine: non-zero exit or timeout
DispatchEngine->>DB: Increment probe_failure_count
alt Failure count ≥ 3
DispatchEngine->>DB: Mark task → manual-attention
else Failure count < 3
DispatchEngine->>TaskState: Return sleep action
TaskState->>TaskState: Wait durationMs (interruptible)
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔴 PR Risk Report — CRITICAL
Affected Systems
File Breakdown
|
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (6)
src/resources/extensions/gsd/auto-dashboard.ts (1)
186-187: Add null-safe fallback for awaiting-external label text.If
activeTaskis missing, this can renderundefined: undefinedin the dashboard.Proposed guard for stable label rendering
case "awaiting-external": - return { label: `Awaiting external process for ${tid}: ${tTitle}`, description: "Task is waiting on an external job; probe will check status." }; + return { + label: tid + ? `Awaiting external process for ${tid}${tTitle ? `: ${tTitle}` : ""}` + : "Awaiting external process", + description: "Task is waiting on an external job; probe will check status.", + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/auto-dashboard.ts` around lines 186 - 187, The "awaiting-external" case builds a label using tid and tTitle which can be undefined if activeTask is missing; update the label construction in the case for "awaiting-external" to use null-safe fallbacks (e.g., const tidSafe = activeTask?.id ?? "unknown"; const tTitleSafe = activeTask?.title ?? "untitled") or inline optional chaining with nullish coalescing so the returned label never reads "undefined: undefined" and instead shows a sensible default when activeTask, tid, or tTitle are absent.src/resources/extensions/gsd/engine-types.ts (1)
45-46: Update the union docs to mentionsleep.The type is correct, but the explanatory bullets above
EngineDispatchActionstill omit the new action.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/engine-types.ts` around lines 45 - 46, The documentation bullets above the EngineDispatchAction union are missing the newly added "sleep" action; update the explanatory comments describing EngineDispatchAction to include a bullet describing { action: "sleep"; durationMs: number } (what it does and what durationMs represents) so the docs match the union type defined by EngineDispatchAction.src/resources/extensions/gsd/tests/tool-naming.test.ts (1)
48-48: Avoid hardcoded total tool count in this assertion.This passes now, but it’s brittle as registration evolves. Prefer deriving expected count from
RENAME_MAP.lengthplus explicit non-paired tools.Refactor suggestion
-assert.deepStrictEqual(pi.tools.length, 31, 'Should register exactly 31 tools (14 canonical + 14 aliases + 1 gate tool + 1 gsd_skip_slice + 1 gsd_register_external_wait)'); +const expectedToolCount = (RENAME_MAP.length * 2) + 3; // gate + gsd_skip_slice + gsd_register_external_wait +assert.deepStrictEqual( + pi.tools.length, + expectedToolCount, + `Should register exactly ${expectedToolCount} tools`, +);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/tool-naming.test.ts` at line 48, The test hardcodes the total tool count; instead compute the expected count dynamically by using RENAME_MAP.length for the paired canonical+alias tools and then add the explicit non-paired tools (the gate tool, 'gsd_skip_slice', and 'gsd_register_external_wait'), then assert pi.tools.length equals that computed value; update the assertion in tool-naming.test.ts to reference RENAME_MAP and the three named tools rather than the literal 31 to avoid brittleness.src/resources/extensions/gsd/tests/external-wait-registration.test.ts (1)
313-379: Test the real registration implementation instead of a local clone.
simulateHandlerFlow()is already drifting fromregisterExternalWaitExecuteinsrc/resources/extensions/gsd/bootstrap/db-tools.ts—it omits the dangerous-command checks, interval validation,resolveTasksDir()path resolution, and the real response shaping. That means this suite can stay green while the actual tool breaks. Please extract the handler into an exported helper or drive it through the registered tool surface.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/external-wait-registration.test.ts` around lines 313 - 379, simulateHandlerFlow is a diverging local clone of registerExternalWaitExecute (in src/resources/extensions/gsd/bootstrap/db-tools.ts) missing dangerous-command checks, interval validation, resolveTasksDir path resolution and real response shaping; replace the local duplicate by calling the real implementation (or extract the shared logic into an exported helper and import it in the test) so the test exercises registerExternalWaitExecute’s actual behavior, ensure the helper/used function performs the dangerous-command validation, interval and timeout validation, uses resolveTasksDir (instead of join(..., ".gsd")), and returns the same shaped response as registerExternalWaitExecute so the test fails when the real tool breaks.src/resources/extensions/gsd/tests/external-wait-resume.test.ts (1)
430-437: Avoid a hardnode:sqlitedependency in this failure-count setup.This one-off mutation bypasses
gsd-db.tsand only works when the built-in driver is available. Please move it behind the same provider-agnostic helper pattern as the rest of the fixture code so the suite still runs under thebetter-sqlite3fallback.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/external-wait-resume.test.ts` around lines 430 - 437, The test currently imports node:sqlite and calls DatabaseSync to mutate external_waits.probe_failure_count (rawDb.prepare(...).run()), which breaks when the built-in driver is swapped; replace that direct DatabaseSync usage with the same provider-agnostic DB helper used by the other fixtures (the helper around gsd-db.ts) to perform the UPDATE for milestone_id 'M001', slice_id 'S01', task_id 'T01' and set probe_failure_count = 2; locate the one-off mutation (rawDb.prepare(...).run()) and call the shared helper method instead (the helper that exposes a run/prepare-style API used elsewhere in the test suite) so the change works with better-sqlite3 fallback.src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts (1)
161-202: Build this fixture throughgsd-db.tsinstead of rawnode:sqlite.
insertExternalWait()exists now, so this helper is duplicating production SQL and hard-wiring the suite to the built-in driver. That makes the test drift-prone and can fail anywhere the DB layer is using thebetter-sqlite3fallback instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts` around lines 161 - 202, The test helper inserts rows using raw node:sqlite via getRawDb and insertExternalWaitRow which duplicates SQL and ties tests to the built-in driver; replace calls to insertExternalWaitRow with the public insertExternalWait function from gsd-db.ts (import it in the test) and remove raw DB access (getRawDb/DatabaseSync usage) so the fixture is built through the production DB API and respects any driver/fallback behavior; ensure you pass the same opts (milestoneId, sliceId, taskId, pollWhileCommand, pollIntervalMs, timeoutMs, probeFailureCount) to insertExternalWait so the test data remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/resources/extensions/gsd/auto-dispatch.ts`:
- Line 767: The probe audit currently writes raw command and output objects to
disk (the appendFileSync call that writes JSON containing registeredAt,
timeoutMs, onTimeout), which can persist secrets; replace this by
sanitizing/redacting any secret-bearing arguments and stdout before logging:
implement a small helper (e.g., redactSensitiveArgs or sanitizeForLog) that
masks tokens, passwords, signed URLs and long-looking secrets in
strings/arrays/objects, call it on onTimeout (and any command/output fields)
before building the JSON, and use the same helper for the analogous registration
logging in the db-tools registration code where command/output are persisted;
ensure the log still contains timestamps and non-sensitive metadata
(registeredAt, timeoutMs) but never raw command or output with secrets.
- Around line 808-815: The exec callback currently treats any err from exec as a
"job finished" result; detect shell/setup/infrastructure failures (e.g. err.code
=== 127, 126, or other non-probe semantics like command not found or permission
denied) and do NOT resolve as a successful probe completion. Instead return a
marker (or reject) that indicates an infrastructure probe failure so the
surrounding probe loop (the code handling pollWhileCommand, probeTimeoutMs,
probeShell and the logic between the exec result handling and the
failure-count/resume logic) increments the probe-failure path rather than
treating it as job-done; update the exec callback to inspect (err as any).code
and (err as any).killed and branch: legitimate probe exit -> resolve({ exitCode:
0,... }) or resolve with exitCode from process, but infrastructure errors
(127,126, command-not-found, shell syntax) should be surfaced as failure (e.g.
resolve({ infraFailure: true, exitCode: (err as any).code, stdout })) so the
downstream logic can increment failures instead of resuming the task.
In `@src/resources/extensions/gsd/auto/loop.ts`:
- Around line 435-443: The sleep branch exits the turn loop with continue and
never calls finishTurn(...), leaving the turn lifecycle open; after the chunked
sleep loop (inside the if (dispatch.action === "sleep") block) call and await
finishTurn(...) with the same context used elsewhere (e.g., await finishTurn(s,
dispatch) or the equivalent signature used in this file) before continuing,
ensuring you only continue if finishTurn completes and preserving use of
s.active for responsiveness.
In `@src/resources/extensions/gsd/bootstrap/db-tools.ts`:
- Around line 1181-1189: The runtime guards for pollIntervalMs and timeoutMs
enforce a 1ms minimum but the contract requires a 1s minimum; update the
validation in the function containing pollIntervalMs/timeoutMs to require >=
1000 (ms) instead of >= 1, update the error messages to say "must be at least
1000 ms (1s)" and keep the comparison between timeoutMs and pollIntervalMs but
ensure it compares using the same 1000ms floor; also apply the same change to
the other validation block that handles pollIntervalMs/timeoutMs (the second
occurrence referenced in the diff) so both runtime guards and messaging match
the documented 1-second minimum.
- Around line 1222-1236: The current registration isn't atomic: if
insertExternalWait(s) succeeds but updateTaskStatus(...) throws, the catch only
unlinks jsonPath and leaves the external_waits row orphaned; fix by wrapping the
two DB operations in a single transaction (begin/commit/rollback) so both
insertExternalWait and updateTaskStatus execute atomically, or if transactions
aren't available, delete the inserted external_wait row inside the catch before
unlinking the JSON file (use the same identifiers used by insertExternalWait to
find the row) and ensure rollback/cleanup logic runs for any thrown dbErr to
preserve the task status + DB row + JSON spec invariant.
In `@src/resources/extensions/gsd/gsd-db.ts`:
- Around line 560-579: The external_waits table rows are not being included when
snapshotting or merging task state, causing tasks with status
'awaiting-external' to lose their probe row after restore/merge; update the
snapshot/merge logic in the same-file flows that handle task state—specifically
restoreManifest(), reconcileWorktreeDb(), and any functions that
serialize/deserialize tasks—to persist and rehydrate external_waits rows
alongside the tasks table so that PRIMARY KEY (milestone_id, slice_id, task_id)
entries from external_waits are written to the DB during restore/merge and read
back on reconcile, ensuring resolveDispatch() will find the probe row and resume
polling.
In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts`:
- Around line 206-275: The POSIX-only stateful probe test ("stateful POSIX
probe: exit 0 twice...") creates and executes a shell script (probeScript, uses
chmodSync and shebang, writes counterFile) and sets pollWhileCommand to a POSIX
command, which will fail on Windows; either skip this test on Windows by
detecting process.platform === 'win32' and calling test.skip() at the top of the
test, or replace the shell probe with a platform-portable Node-based probe
(e.g., a small JS script or a cross-platform command) and update usages of
pollWhileCommand, probeScript, chmodSync, and counterFile accordingly so CI
Windows runners don't execute POSIX-only commands.
In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`:
- Around line 397-408: Replace platform-specific "sleep 35" probes with a
portable long-running command so the test runs on Windows and POSIX: update the
two occurrences (in the test "probe timeout increments failure count" where
insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the
probe) to use a Node-based one-liner launched via the test runtime (i.e., use
process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply
the same change to the other test block around lines 428-439 that uses "sleep
35".
---
Nitpick comments:
In `@src/resources/extensions/gsd/auto-dashboard.ts`:
- Around line 186-187: The "awaiting-external" case builds a label using tid and
tTitle which can be undefined if activeTask is missing; update the label
construction in the case for "awaiting-external" to use null-safe fallbacks
(e.g., const tidSafe = activeTask?.id ?? "unknown"; const tTitleSafe =
activeTask?.title ?? "untitled") or inline optional chaining with nullish
coalescing so the returned label never reads "undefined: undefined" and instead
shows a sensible default when activeTask, tid, or tTitle are absent.
In `@src/resources/extensions/gsd/engine-types.ts`:
- Around line 45-46: The documentation bullets above the EngineDispatchAction
union are missing the newly added "sleep" action; update the explanatory
comments describing EngineDispatchAction to include a bullet describing {
action: "sleep"; durationMs: number } (what it does and what durationMs
represents) so the docs match the union type defined by EngineDispatchAction.
In `@src/resources/extensions/gsd/tests/external-wait-registration.test.ts`:
- Around line 313-379: simulateHandlerFlow is a diverging local clone of
registerExternalWaitExecute (in
src/resources/extensions/gsd/bootstrap/db-tools.ts) missing dangerous-command
checks, interval validation, resolveTasksDir path resolution and real response
shaping; replace the local duplicate by calling the real implementation (or
extract the shared logic into an exported helper and import it in the test) so
the test exercises registerExternalWaitExecute’s actual behavior, ensure the
helper/used function performs the dangerous-command validation, interval and
timeout validation, uses resolveTasksDir (instead of join(..., ".gsd")), and
returns the same shaped response as registerExternalWaitExecute so the test
fails when the real tool breaks.
In `@src/resources/extensions/gsd/tests/external-wait-resume.test.ts`:
- Around line 430-437: The test currently imports node:sqlite and calls
DatabaseSync to mutate external_waits.probe_failure_count
(rawDb.prepare(...).run()), which breaks when the built-in driver is swapped;
replace that direct DatabaseSync usage with the same provider-agnostic DB helper
used by the other fixtures (the helper around gsd-db.ts) to perform the UPDATE
for milestone_id 'M001', slice_id 'S01', task_id 'T01' and set
probe_failure_count = 2; locate the one-off mutation (rawDb.prepare(...).run())
and call the shared helper method instead (the helper that exposes a
run/prepare-style API used elsewhere in the test suite) so the change works with
better-sqlite3 fallback.
In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`:
- Around line 161-202: The test helper inserts rows using raw node:sqlite via
getRawDb and insertExternalWaitRow which duplicates SQL and ties tests to the
built-in driver; replace calls to insertExternalWaitRow with the public
insertExternalWait function from gsd-db.ts (import it in the test) and remove
raw DB access (getRawDb/DatabaseSync usage) so the fixture is built through the
production DB API and respects any driver/fallback behavior; ensure you pass the
same opts (milestoneId, sliceId, taskId, pollWhileCommand, pollIntervalMs,
timeoutMs, probeFailureCount) to insertExternalWait so the test data remains
identical.
In `@src/resources/extensions/gsd/tests/tool-naming.test.ts`:
- Line 48: The test hardcodes the total tool count; instead compute the expected
count dynamically by using RENAME_MAP.length for the paired canonical+alias
tools and then add the explicit non-paired tools (the gate tool,
'gsd_skip_slice', and 'gsd_register_external_wait'), then assert pi.tools.length
equals that computed value; update the assertion in tool-naming.test.ts to
reference RENAME_MAP and the three named tools rather than the literal 31 to
avoid brittleness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 058c9ded-76ee-4776-8392-f1fc5f178a8f
📒 Files selected for processing (25)
src/resources/extensions/gsd/auto-dashboard.tssrc/resources/extensions/gsd/auto-dispatch.tssrc/resources/extensions/gsd/auto/loop.tssrc/resources/extensions/gsd/auto/phases.tssrc/resources/extensions/gsd/auto/session.tssrc/resources/extensions/gsd/bootstrap/db-tools.tssrc/resources/extensions/gsd/dev-workflow-engine.tssrc/resources/extensions/gsd/engine-types.tssrc/resources/extensions/gsd/gsd-db.tssrc/resources/extensions/gsd/journal.tssrc/resources/extensions/gsd/state.tssrc/resources/extensions/gsd/tests/complete-slice.test.tssrc/resources/extensions/gsd/tests/complete-task.test.tssrc/resources/extensions/gsd/tests/ensure-db-open.test.tssrc/resources/extensions/gsd/tests/escalation.test.tssrc/resources/extensions/gsd/tests/external-wait-e2e.test.tssrc/resources/extensions/gsd/tests/external-wait-registration.test.tssrc/resources/extensions/gsd/tests/external-wait-resume.test.tssrc/resources/extensions/gsd/tests/external-wait-state-dispatch.test.tssrc/resources/extensions/gsd/tests/gsd-db.test.tssrc/resources/extensions/gsd/tests/md-importer.test.tssrc/resources/extensions/gsd/tests/memory-store.test.tssrc/resources/extensions/gsd/tests/schema-v23-external-waits.test.tssrc/resources/extensions/gsd/tests/tool-naming.test.tssrc/resources/extensions/gsd/types.ts
| if (Date.now() > Date.parse(registeredAt) + timeoutMs) { | ||
| const onTimeout = (waitRow.on_timeout as string) || "manual-attention"; | ||
| updateExternalWaitStatus(mid, sid, tid, "timed-out"); | ||
| try { appendFileSync(logPath, JSON.stringify({ ts: new Date().toISOString(), event: "timeout", registeredAt, timeoutMs, onTimeout }) + "\n"); } catch (logErr) { logWarning("dispatch", `Failed to write external wait log: ${logErr instanceof Error ? logErr.message : String(logErr)}`); } |
There was a problem hiding this comment.
Probe audit logs should redact secret-bearing arguments and output.
These log entries persist the raw shell command and captured output to disk on every probe cycle. Any embedded token, password, signed URL, or sensitive identifier in the command line or stdout becomes durable plaintext under .gsd, and the same pattern is also used at registration time in src/resources/extensions/gsd/bootstrap/db-tools.ts Lines 1178-1179.
Also applies to: 819-820, 857-857
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/auto-dispatch.ts` at line 767, The probe audit
currently writes raw command and output objects to disk (the appendFileSync call
that writes JSON containing registeredAt, timeoutMs, onTimeout), which can
persist secrets; replace this by sanitizing/redacting any secret-bearing
arguments and stdout before logging: implement a small helper (e.g.,
redactSensitiveArgs or sanitizeForLog) that masks tokens, passwords, signed URLs
and long-looking secrets in strings/arrays/objects, call it on onTimeout (and
any command/output fields) before building the JSON, and use the same helper for
the analogous registration logging in the db-tools registration code where
command/output are persisted; ensure the log still contains timestamps and
non-sensitive metadata (registeredAt, timeoutMs) but never raw command or output
with secrets.
| const { exitCode, killed, stdout } = await new Promise<{ exitCode: number | null; killed: boolean; stdout: string }>((resolve) => { | ||
| exec(pollWhileCommand, { timeout: probeTimeoutMs, shell: probeShell }, (err, stdout, _stderr) => { | ||
| if (err) { | ||
| resolve({ exitCode: (err as any).code ?? null, killed: !!(err as any).killed, stdout: stdout || "" }); | ||
| } else { | ||
| resolve({ exitCode: 0, killed: false, stdout: stdout || "" }); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Don't treat shell/setup failures as "job finished".
exec() reports non-zero exits through err, so command-not-found (127), permission denied (126), and shell syntax errors land in the same branch as the legitimate "done" signal. Right now those cases fall through to Line 843 and resume the task as if the external job completed. Count infrastructure/shell failures as probe failures instead of resolving the wait.
#!/bin/bash
set -euo pipefail
node - <<'NODE'
const { exec } = require('node:child_process');
const cases = [
'does_not_exist_12345',
'exit 1'
];
let pending = cases.length;
for (const cmd of cases) {
exec(cmd, (err, stdout, stderr) => {
console.log(JSON.stringify({
cmd,
hasErr: !!err,
code: err && err.code,
killed: !!(err && err.killed),
stderr: String(stderr).trim()
}));
if (--pending === 0) process.exit(0);
});
}
NODEExpected result: both commands arrive via err, but only the semantic probe failure should mean "done". Shell/setup failures should stay on the failure-count path instead.
Also applies to: 843-883
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/auto-dispatch.ts` around lines 808 - 815, The
exec callback currently treats any err from exec as a "job finished" result;
detect shell/setup/infrastructure failures (e.g. err.code === 127, 126, or other
non-probe semantics like command not found or permission denied) and do NOT
resolve as a successful probe completion. Instead return a marker (or reject)
that indicates an infrastructure probe failure so the surrounding probe loop
(the code handling pollWhileCommand, probeTimeoutMs, probeShell and the logic
between the exec result handling and the failure-count/resume logic) increments
the probe-failure path rather than treating it as job-done; update the exec
callback to inspect (err as any).code and (err as any).killed and branch:
legitimate probe exit -> resolve({ exitCode: 0,... }) or resolve with exitCode
from process, but infrastructure errors (127,126, command-not-found, shell
syntax) should be surfaced as failure (e.g. resolve({ infraFailure: true,
exitCode: (err as any).code, stdout })) so the downstream logic can increment
failures instead of resuming the task.
| if (dispatch.action === "sleep") { | ||
| // Chunked sleep: poll s.active every 1s so pause/stop is responsive | ||
| const sleepMs = dispatch.durationMs; | ||
| const start = Date.now(); | ||
| while (Date.now() - start < sleepMs && s.active) { | ||
| await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start)))); | ||
| } | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Finalize turn before continuing from sleep dispatch.
This branch skips finishTurn(...) and immediately continues, which leaves the turn lifecycle unmatched for observer/audit accounting.
Proposed fix
if (dispatch.action === "sleep") {
// Chunked sleep: poll s.active every 1s so pause/stop is responsive
const sleepMs = dispatch.durationMs;
const start = Date.now();
while (Date.now() - start < sleepMs && s.active) {
await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start))));
}
+ finishTurn("skipped");
continue;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (dispatch.action === "sleep") { | |
| // Chunked sleep: poll s.active every 1s so pause/stop is responsive | |
| const sleepMs = dispatch.durationMs; | |
| const start = Date.now(); | |
| while (Date.now() - start < sleepMs && s.active) { | |
| await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start)))); | |
| } | |
| continue; | |
| } | |
| if (dispatch.action === "sleep") { | |
| // Chunked sleep: poll s.active every 1s so pause/stop is responsive | |
| const sleepMs = dispatch.durationMs; | |
| const start = Date.now(); | |
| while (Date.now() - start < sleepMs && s.active) { | |
| await new Promise(r => setTimeout(r, Math.min(1000, sleepMs - (Date.now() - start)))); | |
| } | |
| finishTurn("skipped"); | |
| continue; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/auto/loop.ts` around lines 435 - 443, The sleep
branch exits the turn loop with continue and never calls finishTurn(...),
leaving the turn lifecycle open; after the chunked sleep loop (inside the if
(dispatch.action === "sleep") block) call and await finishTurn(...) with the
same context used elsewhere (e.g., await finishTurn(s, dispatch) or the
equivalent signature used in this file) before continuing, ensuring you only
continue if finishTurn completes and preserving use of s.active for
responsiveness.
| if (pollIntervalMs !== undefined && (!Number.isInteger(pollIntervalMs) || pollIntervalMs < 1)) { | ||
| return { content: [{ type: "text" as const, text: "Error: pollIntervalMs must be a positive integer" }], isError: true, details: { error: "invalid pollIntervalMs" } as any }; | ||
| } | ||
| if (timeoutMs !== undefined && (!Number.isInteger(timeoutMs) || timeoutMs < 1)) { | ||
| return { content: [{ type: "text" as const, text: "Error: timeoutMs must be a positive integer" }], isError: true, details: { error: "invalid timeoutMs" } as any }; | ||
| } | ||
| if (pollIntervalMs !== undefined && timeoutMs !== undefined && timeoutMs < pollIntervalMs) { | ||
| return { content: [{ type: "text" as const, text: "Error: timeoutMs must be >= pollIntervalMs" }], isError: true, details: { error: "timeoutMs < pollIntervalMs" } as any }; | ||
| } |
There was a problem hiding this comment.
Enforce the documented 1-second minimum here, not 1ms.
The new contract says pollIntervalMs has a 1s floor, but both the runtime guard and the TypeBox schema accept 1. Those values will be persisted and only later clamped by dispatch, which makes the tool contract misleading and leaves invalid JSON/DB state behind.
Also applies to: 1272-1273
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1181 - 1189,
The runtime guards for pollIntervalMs and timeoutMs enforce a 1ms minimum but
the contract requires a 1s minimum; update the validation in the function
containing pollIntervalMs/timeoutMs to require >= 1000 (ms) instead of >= 1,
update the error messages to say "must be at least 1000 ms (1s)" and keep the
comparison between timeoutMs and pollIntervalMs but ensure it compares using the
same 1000ms floor; also apply the same change to the other validation block that
handles pollIntervalMs/timeoutMs (the second occurrence referenced in the diff)
so both runtime guards and messaging match the documented 1-second minimum.
| // Insert external_waits DB row and update task status (R213, R214, R223) | ||
| try { | ||
| insertExternalWait(milestoneId, sliceId, taskId, trimmedCommand, { | ||
| successCheck, | ||
| pollIntervalMs, | ||
| timeoutMs, | ||
| contextHint, | ||
| onTimeout, | ||
| }); | ||
| updateTaskStatus(milestoneId, sliceId, taskId, "awaiting-external"); | ||
| } catch (dbErr) { | ||
| // Cleanup the JSON file to avoid partial state | ||
| try { unlinkSync(jsonPath); } catch (cleanupErr) { logError("db", `Failed to clean up probe spec ${jsonPath}: ${cleanupErr instanceof Error ? cleanupErr.message : String(cleanupErr)}`); } | ||
| return { content: [{ type: "text" as const, text: `Error: DB update failed — ${dbErr instanceof Error ? dbErr.message : String(dbErr)}` }], isError: true, details: { error: "db update failed" } as any }; | ||
| } |
There was a problem hiding this comment.
The registration path is not actually atomic.
If insertExternalWait() succeeds and updateTaskStatus() throws, the catch only deletes the JSON file. The external_waits row remains committed, so the task stays executing while a stray wait record says waiting. Wrap both DB mutations in one transaction, or explicitly delete the inserted wait row on rollback, so the task status + DB row + JSON spec invariant is preserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/bootstrap/db-tools.ts` around lines 1222 - 1236,
The current registration isn't atomic: if insertExternalWait(s) succeeds but
updateTaskStatus(...) throws, the catch only unlinks jsonPath and leaves the
external_waits row orphaned; fix by wrapping the two DB operations in a single
transaction (begin/commit/rollback) so both insertExternalWait and
updateTaskStatus execute atomically, or if transactions aren't available, delete
the inserted external_wait row inside the catch before unlinking the JSON file
(use the same identifiers used by insertExternalWait to find the row) and ensure
rollback/cleanup logic runs for any thrown dbErr to preserve the task status +
DB row + JSON spec invariant.
| db.exec(` | ||
| CREATE TABLE IF NOT EXISTS external_waits ( | ||
| milestone_id TEXT NOT NULL, | ||
| slice_id TEXT NOT NULL, | ||
| task_id TEXT NOT NULL, | ||
| status TEXT NOT NULL DEFAULT 'waiting', | ||
| poll_while_command TEXT NOT NULL, | ||
| success_check TEXT, | ||
| poll_interval_ms INTEGER NOT NULL DEFAULT 30000, | ||
| timeout_ms INTEGER NOT NULL DEFAULT 86400000, | ||
| context_hint TEXT, | ||
| on_timeout TEXT NOT NULL DEFAULT 'manual-attention', | ||
| probe_failure_count INTEGER NOT NULL DEFAULT 0, | ||
| registered_at TEXT NOT NULL, | ||
| resolved_at TEXT, | ||
| PRIMARY KEY (milestone_id, slice_id, task_id), | ||
| FOREIGN KEY (milestone_id, slice_id, task_id) REFERENCES tasks(milestone_id, slice_id, id) ON DELETE CASCADE | ||
| ) | ||
| `); | ||
| db.exec("CREATE INDEX IF NOT EXISTS idx_external_waits_milestone_status ON external_waits(milestone_id, status)"); |
There was a problem hiding this comment.
Persist external_waits anywhere task state is snapshotted or merged.
This table is now part of runnable workflow state, but same-file flows like restoreManifest() and reconcileWorktreeDb() still do not carry it. That leaves you with tasks.status = 'awaiting-external' but no probe row after a restore/merge, and resolveDispatch() turns that into a hard stop/manual-attention path instead of resuming polling.
Also applies to: 1256-1283, 2678-2762
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/gsd-db.ts` around lines 560 - 579, The
external_waits table rows are not being included when snapshotting or merging
task state, causing tasks with status 'awaiting-external' to lose their probe
row after restore/merge; update the snapshot/merge logic in the same-file flows
that handle task state—specifically restoreManifest(), reconcileWorktreeDb(),
and any functions that serialize/deserialize tasks—to persist and rehydrate
external_waits rows alongside the tasks table so that PRIMARY KEY (milestone_id,
slice_id, task_id) entries from external_waits are written to the DB during
restore/merge and read back on reconcile, ensuring resolveDispatch() will find
the probe row and resume polling.
| test("stateful POSIX probe: exit 0 twice (still running), exit 1 third (done) → resume with contextHint", async () => { | ||
| const tmpBase = mkdtempSync(join(tmpdir(), "gsd-e2e-multi-")); | ||
| base = tmpBase; | ||
| const gsdDir = join(tmpBase, ".gsd"); | ||
| const m001Dir = join(gsdDir, "milestones", "M001"); | ||
| const s01Dir = join(m001Dir, "slices", "S01"); | ||
| const tasksDir = join(s01Dir, "tasks"); | ||
|
|
||
| mkdirSync(tasksDir, { recursive: true }); | ||
|
|
||
| writeFileSync( | ||
| join(m001Dir, "M001-CONTEXT.md"), | ||
| "# M001: Multi-cycle\n\n## Purpose\nMulti-cycle probe test.\n", | ||
| ); | ||
| writeFileSync( | ||
| join(m001Dir, "M001-ROADMAP.md"), | ||
| [ | ||
| "# M001: Multi-cycle", | ||
| "", | ||
| "## Vision", | ||
| "Multi-cycle probe.", | ||
| "", | ||
| "## Success Criteria", | ||
| "- Multi-cycle works", | ||
| "", | ||
| "## Slices", | ||
| "", | ||
| "- [ ] **S01: Test** `risk:low` `depends:[]`", | ||
| " - After this: done.", | ||
| "", | ||
| "## Boundary Map", | ||
| "", | ||
| "| From | To | Produces | Consumes |", | ||
| "|------|----|----------|----------|", | ||
| "| S01 | terminal | result | nothing |", | ||
| ].join("\n"), | ||
| ); | ||
| writeFileSync( | ||
| join(s01Dir, "S01-PLAN.md"), | ||
| [ | ||
| "# S01: Test", | ||
| "", | ||
| "**Goal:** multi-cycle probe", | ||
| "", | ||
| "## Tasks", | ||
| "", | ||
| "- [ ] **T01: Test** `est:30m`", | ||
| " - Do: test", | ||
| " - Verify: pass", | ||
| ].join("\n"), | ||
| ); | ||
| writeFileSync(join(tasksDir, "T01-PLAN.md"), "# T01: Test\n\n## Steps\n1. test\n"); | ||
|
|
||
| // Create POSIX-portable stateful probe script | ||
| const counterFile = join(tmpBase, "counter.txt"); | ||
| const probeScript = join(tmpBase, "probe.sh"); | ||
| writeFileSync( | ||
| probeScript, | ||
| [ | ||
| '#!/bin/sh', | ||
| 'COUNT=$(cat "$1" 2>/dev/null || echo 0)', | ||
| 'COUNT=$(expr $COUNT + 1)', | ||
| 'echo $COUNT > "$1"', | ||
| 'if [ $COUNT -ge 3 ]; then exit 1; fi', | ||
| 'exit 0', | ||
| ].join("\n"), | ||
| ); | ||
| chmodSync(probeScript, 0o755); | ||
|
|
||
| const pollWhileCommand = `${probeScript} ${counterFile}`; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
if [ -d .github/workflows ]; then
fd -HI '.*\.ya?ml' .github/workflows -x rg -nC2 'windows-latest|windows-2022|runs-on:.*windows' {}
fiRepository: gsd-build/gsd-2
Length of output: 445
🏁 Script executed:
# Get the test file size and then read the specific line ranges
wc -l src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
# Read the line ranges mentioned in the review (206-275, 343-504, 522-630)
echo "=== Lines 206-275 ==="
sed -n '206,275p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
echo "=== Lines 343-504 ==="
sed -n '343,504p' src/resources/extensions/gsd/tests/external-wait-e2e.test.ts
echo "=== Lines 522-630 ==="
sed -n '522,630p' src/resources/extensions/gsd/tests/external-wait-e2e.test.tsRepository: gsd-build/gsd-2
Length of output: 12326
🏁 Script executed:
# Check for any platform detection or skip logic in the test file
rg -n "process\.platform|skip|windows|darwin|linux" src/resources/extensions/gsd/tests/external-wait-e2e.test.tsRepository: gsd-build/gsd-2
Length of output: 683
Skip these POSIX-specific test scenarios on Windows or make them platform-portable.
These test suites hardcode POSIX commands (sleep, exit N, shell scripts with #!/bin/sh) that will fail on Windows runners in your CI matrix (windows-latest, windows-2025).
- Lines 206–275: Creates a shell script with POSIX syntax (
#!/bin/sh,chmodSync) and invokes it directly. - Lines 343–504: Passes
"sleep 35"and bareexitcommands aspollWhileCommand. - Lines 522–630: Uses
"exit 0"aspollWhileCommand.
While the feature advertises cmd.exe support, these tests will fail before exercising the dispatcher. Add process.platform === 'win32' checks with test.skip() or refactor the probes to be OS-portable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/tests/external-wait-e2e.test.ts` around lines
206 - 275, The POSIX-only stateful probe test ("stateful POSIX probe: exit 0
twice...") creates and executes a shell script (probeScript, uses chmodSync and
shebang, writes counterFile) and sets pollWhileCommand to a POSIX command, which
will fail on Windows; either skip this test on Windows by detecting
process.platform === 'win32' and calling test.skip() at the top of the test, or
replace the shell probe with a platform-portable Node-based probe (e.g., a small
JS script or a cross-platform command) and update usages of pollWhileCommand,
probeScript, chmodSync, and counterFile accordingly so CI Windows runners don't
execute POSIX-only commands.
| test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => { | ||
| const { basePath } = createFixture(); | ||
| updateTaskStatus("M001", "S01", "T01", "awaiting-external"); | ||
| insertExternalWaitRow(basePath, { | ||
| milestoneId: "M001", | ||
| sliceId: "S01", | ||
| taskId: "T01", | ||
| pollWhileCommand: "sleep 35", | ||
| pollIntervalMs: 10000, | ||
| probeFailureCount: 0, | ||
| }); | ||
| writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35"); |
There was a problem hiding this comment.
Use a portable long-running probe for the timeout cases.
sleep 35 only works on POSIX shells. These timeout assertions will fail on the Windows cmd.exe path that the dispatcher explicitly supports, so the coverage is currently platform-specific.
Portable test pattern
+ const longRunningProbe = `"${process.execPath}" -e "setTimeout(() => process.exit(0), 35000)"`;
+
test("probe timeout increments failure count", { timeout: 40000, skip: skipSlow }, async () => {
const { basePath } = createFixture();
updateTaskStatus("M001", "S01", "T01", "awaiting-external");
insertExternalWaitRow(basePath, {
milestoneId: "M001",
sliceId: "S01",
taskId: "T01",
- pollWhileCommand: "sleep 35",
+ pollWhileCommand: longRunningProbe,
pollIntervalMs: 10000,
probeFailureCount: 0,
});
- writeProbeSpec(basePath, "M001", "S01", "T01", "sleep 35");
+ writeProbeSpec(basePath, "M001", "S01", "T01", longRunningProbe);Also applies to: 428-439
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/resources/extensions/gsd/tests/external-wait-state-dispatch.test.ts`
around lines 397 - 408, Replace platform-specific "sleep 35" probes with a
portable long-running command so the test runs on Windows and POSIX: update the
two occurrences (in the test "probe timeout increments failure count" where
insertExternalWaitRow sets pollWhileCommand and where writeProbeSpec writes the
probe) to use a Node-based one-liner launched via the test runtime (i.e., use
process.execPath to run a short JS that waits ~35s) instead of "sleep 35"; apply
the same change to the other test block around lines 428-439 that uses "sleep
35".
Summary
Split 2/3 of #4655 — probe execution and tool registration.
pollWhileCommand(exit 0 = still running, non-zero = done)successCheckfor post-completion validationmanual-attentiongsd_register_external_waittool with pattern-based rejection of dangerous commands (curl|sh,rm -rf, etc.)Security surface
The
pollWhileCommandis executed via/bin/sh(orcmd.exeon Windows). This is arbitrary command execution by design — the agent already has full shell access via bash tools. The probe runs unattended on a timer, so:Incremental files (9)
auto-dispatch.ts(+197)bootstrap/db-tools.ts(+176),dev-workflow-engine.tsexternal-wait-e2e.test.ts,external-wait-registration.test.ts,external-wait-resume.test.tscomplete-slice.test.ts,complete-task.test.ts,tool-naming.test.tsTest plan
node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-e2e.test.tsnode --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-registration.test.tsnode --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/external-wait-resume.test.tschild_process.exec— no mocksMerge order: #4790 → this → #4793. Supersedes #4655.
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Tests