Summary
clauck work <text> can hang forever after the work has actually completed. The spawned claude -p subprocess emits its terminal result JSON envelope but never exits, and the wrapper's subprocess.run(..., capture_output=True) (no timeout=) blocks on communicate() waiting for pipe EOF that never comes.
User-visible symptom: the spinner keeps incrementing (executing (sonnet, medium effort)... (1455s) and counting) long after the work it described is done and durably committed to disk. There is no log file for clauck work itself, so a casual observer can't tell whether work is in flight or whether the wrapper is zombie-waiting.
Repro (observed)
- clauck work "<long natural-language install prompt>" invoked at 11:35 local.
- Stage 1 interpreter classified, stage 2 dispatched claude -p ... --output-format json (lib/clauck:4337–4350).
- Inner claude -p ran the requested work cleanly: job file written at 11:36, clauck fire dream-pass ran 11:37–11:41 to exit_code=0, dream summary posted to Slack. Fire log ~/.clauck/dream-pass-<ts>-<pid>.log ends with the full success envelope: terminal_reason: completed, duration_ms: 274989, total_cost_usd: 0.77.
- 75 minutes later, the wrapper is still spinning.
- ps: inner claude -p (PID 27870) idle at 0% CPU, STAT S+, no clauck-mcp children, no live TCP connections (lsof). Process is alive but not doing anything.
- The spinner is the only signal the user has, and it is positively misleading — it implies ongoing execution.
Diagnosis
lib/clauck:4337:

    result = subprocess.run(
        [claude_bin, "-p", enhanced_prompt,
         ...
         "--output-format", "json",
         "--setting-sources", ""],
        capture_output=True, text=True,
    )

- capture_output=True → Popen.communicate() under the hood, which drains stdout and stderr and waits for the child to exit, not just for terminal output.
- No timeout= argument.
- If claude -p emits its result JSON to stdout but doesn't close its file descriptors / exit (a known failure mode where a spawned MCP server or worker keeps fds alive after the agent loop ends), the wrapper waits indefinitely. The terminal envelope is sitting in the captured-stdout buffer, never delivered to the user, never parsed.
The same subprocess.run(..., capture_output=True) pattern without timeout exists in cmd_semantic's stage-1 interpreter call (lib/clauck:4195) and is the pattern used by _parse_interpreter_result. Stage 1 hangs would be much rarer (no MCP, 3-turn cap, $0.30 budget) but the structural risk is the same.
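The blocking behavior can be reproduced without claude at all. The following is a minimal sketch, not the clauck code: CHILD is a hypothetical stand-in that prints a result envelope and then lingers, and run_with_timeout is an illustrative helper name. It shows that the envelope is already in the capture buffer when the wait is cut short by timeout=.

```python
import subprocess
import sys
from typing import Tuple

# Hypothetical stand-in for `claude -p`: emits a result envelope, then never exits promptly.
CHILD = (
    "import time\n"
    "print('{\"type\": \"result\", \"terminal_reason\": \"completed\"}', flush=True)\n"
    "time.sleep(30)\n"
)


def run_with_timeout(timeout_s: float = 3.0) -> Tuple[bool, str]:
    """Return (timed_out, captured_stdout) for the lingering child."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return False, done.stdout
    except subprocess.TimeoutExpired as exc:
        # run() kills the child before re-raising; the partial output rides on the exception.
        out = exc.stdout or ""
        if isinstance(out, bytes):  # may be bytes even with text=True
            out = out.decode()
        return True, out
```

Without timeout=, the same call would block for the full lifetime of the child even though its output arrived almost immediately.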
Why it's worse than a bare hang
There is no log file for clauck work. Fired/scheduled jobs write to ~/.clauck/<name>-<ts>-<pid>.log via run-job.sh, but the work-alias path streams to the terminal and captures via subprocess.run. So when the wrapper hangs:
- The spinner says executing (sonnet, medium effort)... (Ns), which implies in-flight work.
- No log to tail. The fire log of any job the work triggered exists (and shows clean completion), but that's two layers down and not obviously connected.
- The user has no way to discover that work is done short of running ps / lsof on the spawned claude -p.
Proposed fix
Two changes, both small:
- Stream + watchdog instead of blocking communicate. Switch the stage-2 (and stage-1) call to subprocess.Popen reading stdout line-by-line. As soon as a JSON line with "type":"result" and "terminal_reason":"completed" (or any of the other terminal reasons) lands, render it, then terminate() the child if it doesn't exit on its own within e.g. 5s, escalating to kill() after another 5s. Same approach run-job.sh already takes for tombstoning. Don't trust the child to close pipes voluntarily.
- Write a clauck work log alongside fired-job logs. Today fired/scheduled jobs are observable via ~/.clauck/<name>-<ts>-<pid>.log; clauck work is not. Mirror the pattern: ~/.clauck/_work-<ts>-<pid>.log containing the routing decision, the enhanced prompt, the spawned argv, and the streamed JSON. Then clauck logs --last 1 _work works for diagnosis. (Alternatively, the work alias could route through run-job.sh for free, but that's a bigger refactor.)
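A rough sketch of the stream + watchdog idea, under stated assumptions: the names run_until_result and reap are hypothetical, the child is assumed to emit its result envelope as a single newline-terminated JSON line, and stderr is discarded here only to avoid an undrained second pipe. It keys on "type": "result" alone; a real change might also check terminal_reason.

```python
import json
import subprocess
from typing import Optional, Sequence


def reap(proc: subprocess.Popen, grace_s: float) -> int:
    """Wait for exit; escalate to terminate(), then kill(), after grace_s each."""
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.terminate()
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


def run_until_result(argv: Sequence[str], grace_s: float = 5.0) -> Optional[dict]:
    """Return the first JSON object with type == "result", then reap the child."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True,
    )
    result = None
    try:
        for line in proc.stdout:
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue  # not a JSON line; skip it
            if isinstance(msg, dict) and msg.get("type") == "result":
                result = msg
                break  # stop reading; do not wait for pipe EOF
    finally:
        reap(proc, grace_s)
        proc.stdout.close()
    return result
```

This sketch does not cover a child that never emits a result line at all; the read loop would still block in that case, so an overall deadline would be a separate addition.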
The watchdog change is the load-bearing one — it prevents hangs entirely and surfaces results immediately. The log change is an observability backstop: even if a future failure mode escapes the watchdog, the user has somewhere to look.
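The log side could look roughly like the sketch below. Everything here is illustrative: open_work_log and append_line are hypothetical helper names, the header layout is invented, and the timestamp format is an assumption, since the issue does not say how <ts> is formatted in existing fire logs.

```python
import json
import os
import time
from pathlib import Path
from typing import Sequence


def open_work_log(log_dir: Path, routing: dict, prompt: str, argv: Sequence[str]) -> Path:
    """Create <log_dir>/_work-<ts>-<pid>.log with a one-line JSON header; return its path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    ts = time.strftime("%Y%m%d-%H%M%S")  # assumed format, not taken from clauck
    path = log_dir / f"_work-{ts}-{os.getpid()}.log"
    header = {"routing": routing, "enhanced_prompt": prompt, "argv": list(argv)}
    with path.open("w", encoding="utf-8") as fh:
        fh.write(json.dumps(header) + "\n")
    return path


def append_line(path: Path, line: str) -> None:
    """Append one streamed line, normalizing the trailing newline."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line.rstrip("\n") + "\n")
```

Appending each streamed line as it arrives means the log shows the result envelope even if the wrapper later hangs or is killed.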
Workaround until fixed
If you see clauck work spinning past ~5× the expected duration, in another terminal:

    ps -ef | grep "claude -p" | grep -v grep   # find the inner claude PID
    ls -lt ~/.clauck/*.log | head              # check whether a fire it triggered finished
    kill <inner-pid>                           # collapse the wrapper

The work the inner claude -p did is durable; killing the process doesn't roll it back.
Environment
- clauck v1.5.x (main, recent)
- Claude CLI 2.1.129
- macOS Darwin 25.4.0, Python 3.14.4 (homebrew), /usr/bin/python3 for scheduler