clauck work hangs indefinitely after inner `claude -p` emits terminal JSON #136

## Summary

`clauck work <text>` can hang forever after the work has actually completed. The spawned `claude -p` subprocess emits its terminal `result` JSON envelope but never exits, and the wrapper's `subprocess.run(..., capture_output=True)` (no `timeout=`) blocks on `communicate()` waiting for pipe EOF that never comes.

User-visible symptom: the spinner keeps incrementing (`executing (sonnet, medium effort)... (1455s)` and counting) long after the work it described is done and durably committed to disk. There is no log file for `clauck work` itself, so a casual observer can't tell whether work is in flight or whether the wrapper is zombie-waiting.

## Repro (observed)

1. `clauck work "<long natural-language install prompt>"` invoked at 11:35 local.
2. Stage 1 interpreter classified, stage 2 dispatched `claude -p ... --output-format json` (lib/clauck:4337–4350).
3. Inner `claude -p` ran the requested work cleanly: job file written at 11:36, `clauck fire dream-pass` ran 11:37–11:41 to `exit_code=0`, dream summary posted to Slack. Fire log `~/.clauck/dream-pass-<ts>-<pid>.log` ends with the full success envelope: `terminal_reason: completed`, `duration_ms: 274989`, `total_cost_usd: 0.77`.
4. **75 minutes later**, the wrapper is still spinning. `ps`: inner `claude -p` (PID 27870) idle at 0% CPU, `STAT S+`, no `clauck-mcp` children, no live TCP connections (lsof). Process is alive but not doing anything.
5. The spinner is the only signal the user has, and it is positively misleading — it implies ongoing execution.

## Diagnosis

`lib/clauck:4337`:

```python
result = subprocess.run(
    [claude_bin, "-p", enhanced_prompt,
     ...
     "--output-format", "json",
     "--setting-sources", ""],
    capture_output=True, text=True,
)
```

- `capture_output=True` → `subprocess.run` calls `Popen.communicate()` under the hood, which reads stdout and stderr until EOF on both pipes and then waits for the child to **exit**. It does not return just because the terminal output has arrived.
- No `timeout=` argument.
- If `claude -p` emits its `result` JSON to stdout but doesn't close its file descriptors / exit (a known failure mode where a spawned MCP server or worker keeps fds alive after the agent loop ends), the wrapper waits indefinitely. The terminal envelope is sitting in the captured-stdout buffer, never delivered to the user, never parsed.
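The mechanism can be reproduced without `claude` at all. A minimal sketch, assuming a stand-in child (hypothetical, not the real CLI) that prints an envelope and then lingers. The `timeout=` is only there so the demo terminates; the call at `lib/clauck:4337` has none:

```python
import subprocess
import sys

# Hypothetical stand-in for `claude -p`: emits a terminal envelope, flushes,
# then stays alive with its stdout pipe still open.
CHILD = (
    "import time\n"
    "print('{\"type\": \"result\", \"terminal_reason\": \"completed\"}', flush=True)\n"
    "time.sleep(30)\n"
)


def run_blocking(timeout_s: float) -> str:
    """Return 'exited' if the child finished, 'hung' if communicate() was still waiting."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True, text=True,
            timeout=timeout_s,  # demo only: without this the call blocks for the full sleep
        )
        return "exited"
    except subprocess.TimeoutExpired:
        # The envelope was already written to the pipe, but run() never returned it.
        return "hung"
```

Calling `run_blocking(1.0)` reports a hang even though the envelope was written almost immediately, which matches the observed behavior.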

The same `subprocess.run(..., capture_output=True)` pattern without `timeout=` exists in `cmd_semantic`'s stage-1 interpreter call (`lib/clauck:4195`) and is the pattern used by `_parse_interpreter_result`. Stage-1 hangs would be much rarer (no MCP, 3-turn cap, $0.30 budget), but the structural risk is the same.
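Since both call sites share the pattern, one bounded helper could cover them as a stopgap. A rough sketch, not the actual clauck code: `run_bounded` and `find_result_envelope` are hypothetical names, and it assumes the envelope arrives as a single JSON line with `"type": "result"`. It relies on documented `subprocess` behavior: on timeout, `run()` kills the child, and `TimeoutExpired.stdout` carries whatever was captured (as bytes, even with `text=True`).

```python
import json
import subprocess
from typing import Optional


def find_result_envelope(text: str) -> Optional[dict]:
    """Return the first JSON line whose "type" is "result", or None."""
    for line in text.splitlines():
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (blank line, log noise)
        if isinstance(msg, dict) and msg.get("type") == "result":
            return msg
    return None


def run_bounded(argv, timeout_s: float) -> Optional[dict]:
    """Run argv like the current call, but stop waiting after timeout_s seconds."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        captured = done.stdout
    except subprocess.TimeoutExpired as exc:
        # run() has already killed the child here. Salvage what it printed:
        # exc.stdout may be None, and is bytes regardless of text=True.
        captured = exc.stdout or ""
        if isinstance(captured, bytes):
            captured = captured.decode("utf-8", errors="replace")
    return find_result_envelope(captured)
```

This still makes the user wait out the full timeout before seeing a result, so it bounds the hang rather than removing it.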

## Why it's worse than a bare hang

There is no log file for `clauck work`. Fired/scheduled jobs write to `~/.clauck/<name>-<ts>-<pid>.log` via `run-job.sh`, but the work-alias path streams to the terminal and captures via `subprocess.run`. So when the wrapper hangs:

- The spinner says `executing (sonnet, medium effort)... (Ns)` — implies in-flight work.
- No log to tail. The fire log of any job the work *triggered* exists (and shows clean completion), but that's two layers down and not obviously connected.
- The user has no way to discover that work is done short of running `ps` / `lsof` on the spawned `claude -p`.

## Proposed fix

Two changes, both small:

1. **Stream + watchdog instead of blocking communicate.** Switch the stage-2 (and stage-1) call to `subprocess.Popen`, reading stdout line by line. As soon as a JSON line with `"type":"result"` and `"terminal_reason":"completed"` (or any of the other terminal reasons) lands, render it, then `terminate()` the child if it doesn't exit on its own within e.g. 5s, escalating to `kill()` after another 5s. This is the same approach `run-job.sh` already takes for tombstoning. Don't trust the child to close pipes voluntarily.

2. **Write a `clauck work` log alongside fired-job logs.** Today fired/scheduled jobs are observable via `~/.clauck/<name>-<ts>-<pid>.log`; `clauck work` is not. Mirror the pattern: `~/.clauck/_work-<ts>-<pid>.log` containing the routing decision, the enhanced prompt, the spawned argv, and the streamed JSON. Then `clauck logs --last 1 _work` works for diagnosis. (Alternatively, the work alias could route through `run-job.sh` for free, but that's a bigger refactor.)
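Change 1 could look roughly like the following. This is a sketch under assumptions, not a drop-in patch: `run_with_watchdog` is a hypothetical name, the envelope is assumed to be one JSON object per line, and stderr is left attached to the terminal so that an undrained stderr pipe cannot cause a second deadlock.

```python
import json
import subprocess


def run_with_watchdog(argv, grace_s: float = 5.0):
    """Stream stdout and return the terminal envelope without waiting for pipe EOF."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    envelope = None
    try:
        for line in proc.stdout:
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore non-JSON output
            if isinstance(msg, dict) and msg.get("type") == "result":
                envelope = msg
                break  # stop reading; do not wait for EOF
    finally:
        # Give the child a chance to exit on its own, then escalate.
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.terminate()
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        proc.stdout.close()
    return envelope
```

One limitation of this sketch: if the child never prints a `result` line and never closes stdout, the read loop still blocks, so a real implementation would probably also want an overall deadline.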

The watchdog change is the load-bearing one — it prevents hangs entirely and surfaces results immediately. The log change is an observability backstop: even if a future failure mode escapes the watchdog, the user has somewhere to look.
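For the log change, a small append-only writer may be enough. A sketch with hypothetical helper names (`work_log_path`, `append_event`); the timestamp format and the one-JSON-record-per-line layout are assumptions, since the format `run-job.sh` uses is not shown here:

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def work_log_path(log_dir: Path, ts: Optional[str] = None, pid: Optional[int] = None) -> Path:
    """Build a `_work-<ts>-<pid>.log` path inside log_dir (e.g. ~/.clauck)."""
    ts = ts or time.strftime("%Y%m%dT%H%M%S")  # assumed timestamp format
    pid = os.getpid() if pid is None else pid
    return log_dir / f"_work-{ts}-{pid}.log"


def append_event(path: Path, kind: str, payload) -> None:
    """Append one record and flush, so the log is useful even if the wrapper hangs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"kind": kind, "payload": payload}) + "\n")
        fh.flush()
```

The wrapper would call `append_event` for the routing decision, the enhanced prompt, the spawned argv, and each streamed JSON line as it arrives, so the log reflects progress at the moment of any hang.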

## Workaround until fixed

If you see `clauck work` spinning past ~5× the expected duration, in another terminal:

```bash
ps -ef | grep "claude -p" | grep -v grep        # find the inner claude PID
ls -lt ~/.clauck/*.log | head                   # check whether a fire it triggered finished
kill <inner-pid>                                # collapse the wrapper
```

The work the inner `claude -p` did is durable; killing the inner process doesn't roll it back.

## Environment

- clauck v1.5.x (main, recent)
- Claude CLI 2.1.129
- macOS Darwin 25.4.0, Python 3.14.4 (Homebrew), `/usr/bin/python3` for scheduler
