Supervisor heartbeat timer crashes pi with stale extension ctx (uncaughtException in startHeartbeat sendMessage) #597

## Summary

When a batch starts and the supervisor activates, pi can crash with an uncaught exception if a background supervisor timer calls `pi.sendMessage()` after the extension context has been replaced or reloaded.

This kills the entire pi session. The orchestrator engine may continue in a worker thread, but the operator loses the supervisor UI and monitoring.

## Environment

- **taskplane:** 0.30.1 (npm latest)
- **pi:** `@earendil-works/pi-coding-agent@0.77.0`
- **OS:** macOS (darwin 24.6.0)
- **Mode:** repo mode, supervised autonomy

## Observed behavior

1. `/orch all` starts the batch and wave 1 successfully
2. Operator sees normal supervisor/orchestrator output, e.g.:
   - `🌊 Wave 1 starting with 8 task(s) across 3 lanes.`
   - `🔀 Orchestrator · repo mode · 1...`
3. Pi exits immediately after with:

```
pi exiting due to uncaughtException:
Error: This extension ctx is stale after session replacement or reload. Do not use a captured pi or command ctx after ctx.newSession(), ctx.fork(), ctx.switchSession(), or ctx.reload(). For newSession, fork, and switchSession, move post-replacement work into withSession and use the ctx passed to withSession. For reload, do not use the old ctx after await ctx.reload().
    at Object.assertActive (file:///usr/local/lib/node_modules/@earendil-works/pi-coding-agent/dist/core/extensions/loader.js:105:19)
    at Object.sendMessage (file:///usr/local/lib/node_modules/@earendil-works/pi-coding-agent/dist/core/extensions/loader.js:197:21)
    at Timeout.<anonymous> (/Users/<user>/.pi/agent/npm/node_modules/taskplane/extensions/taskplane/supervisor.ts:3736:12)
```

## Root cause (analysis)

In `extensions/taskplane/supervisor.ts`, `activateSupervisor()` starts background timers that **capture `pi: ExtensionAPI` in closures**:

- `startHeartbeat()` — 30s interval, line ~3736 calls `pi.sendMessage()` in the takeover/yield branch
- `startEventTailer()` — 10s interval, `notify()` callback also calls `pi.sendMessage()`

Pi forbids using a captured extension API after session replacement/reload. When a timer fires with a stale handle, `assertActive` throws. The exception is **not caught**, so it becomes a process-fatal `uncaughtException`.

Additional lifecycle gap: `activateSupervisor()` assigns `state.heartbeatTimer = startHeartbeat(...)` without clearing any existing heartbeat/event tailer timers first (cleanup only happens in `deactivateSupervisor()`). Re-activation or session churn can leave orphaned timers holding stale `pi` references.

Relevant code on `main` (still present in 0.30.1):

```typescript
// activateSupervisor — no timer teardown before starting new ones
state.heartbeatTimer = startHeartbeat(stateRoot, state, pi);
startEventTailer(pi, state.eventTailer, state, ...);

// startHeartbeat — stale pi.sendMessage on takeover detection
if (currentLock && currentLock.sessionId !== sessionId) {
  clearInterval(timer);
  pi.sendMessage({ customType: "supervisor-yield", ... }, { triggerTurn: false });
  deactivateSupervisor(pi, state);
}
```
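The failure mode described in this section can be modelled without pi at all. The sketch below is only an illustration: `FakeExtensionAPI`, `createFakeApi`, and `makeHeartbeatTick` are hypothetical names, and the guard merely imitates the role `assertActive` plays in the stack trace. It shows that a closure created while the handle was valid starts throwing once the handle is invalidated.

```typescript
// Hypothetical stand-in for pi's ExtensionAPI; it models only the stale-handle guard.
interface FakeExtensionAPI {
  sendMessage(message: { customType: string }): void;
}

function createFakeApi(): {
  api: FakeExtensionAPI;
  invalidate: () => void;
  sent: string[];
} {
  let active = true;
  const sent: string[] = [];
  const api: FakeExtensionAPI = {
    sendMessage(message) {
      // Imitates assertActive: throw once the session has been replaced.
      if (!active) {
        throw new Error("This extension ctx is stale after session replacement or reload.");
      }
      sent.push(message.customType);
    },
  };
  return { api, invalidate: () => { active = false; }, sent };
}

// The tick closes over `pi`, as the heartbeat callback does in the report.
function makeHeartbeatTick(pi: FakeExtensionAPI): () => void {
  return () => pi.sendMessage({ customType: "supervisor-yield" });
}

const fake = createFakeApi();
const tick = makeHeartbeatTick(fake.api);

tick();            // handle still active: the message is recorded
fake.invalidate(); // simulates session replacement or reload

let thrownMessage = "";
try {
  tick();          // same closure, now holding a stale handle
} catch (err) {
  // Inside a real setInterval callback nothing would catch this.
  thrownMessage = err instanceof Error ? err.message : String(err);
}
```

In the real extension there is no surrounding `try`, so the equivalent throw surfaces as an `uncaughtException`.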

## Expected behavior

- Supervisor timers should either resolve a **fresh** extension context (e.g. via `withSession`) or treat stale ctx as a shutdown signal
- Timer callbacks should **never** crash pi — wrap `pi.sendMessage()` in try/catch and silently stop timers on stale ctx
- `activateSupervisor()` should tear down existing heartbeat/event tailer timers before starting new ones
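One way to read the second expectation is as a small guard around every send made from a timer. The following is a minimal sketch under stated assumptions: `guardedSend`, `TickResult`, and the injected `isStale` and `stopTimers` callbacks are hypothetical and do not exist in taskplane or pi. The point is only that a stale handle ends the timers instead of ending the process.

```typescript
type TickResult = "sent" | "stopped-stale" | "failed";

// Hypothetical guard for timer callbacks: never lets an exception escape.
function guardedSend(
  send: () => void,                    // e.g. () => pi.sendMessage(...)
  isStale: (err: unknown) => boolean,  // how staleness is detected is left to the caller
  stopTimers: () => void,              // e.g. clear heartbeat and event tailer intervals
): TickResult {
  try {
    send();
    return "sent";
  } catch (err) {
    if (isStale(err)) {
      // Stale ctx is treated as a shutdown signal, not as a crash.
      stopTimers();
      return "stopped-stale";
    }
    // Any other failure is logged; rethrowing from a timer would also be fatal.
    console.error("supervisor timer: sendMessage failed", err);
    return "failed";
  }
}

// Simulated usage with a stand-in error object.
const staleError = new Error("stale ctx (simulated)");
let stopCalls = 0;
const isSimulatedStale = (err: unknown): boolean => err === staleError;
const stop = (): void => { stopCalls += 1; };

const okResult = guardedSend(() => {}, isSimulatedStale, stop);
const staleResult = guardedSend(() => { throw staleError; }, isSimulatedStale, stop);
```

Injecting `isStale` keeps the guard independent of how pi reports staleness, which the report does not specify beyond the error message.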

## Suggested fix

1. At top of `activateSupervisor()` (before starting timers):
   ```typescript
   stopEventTailer(state.eventTailer);
   if (state.heartbeatTimer) {
     clearInterval(state.heartbeatTimer);
     state.heartbeatTimer = null;
   }
   ```

2. In `startHeartbeat()` and event tailer `notify()`:
   ```typescript
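   try {
     pi.sendMessage(...);
   } catch (err) {
     // isStaleExtensionCtx is a proposed helper, not an existing pi export
     if (isStaleExtensionCtx(err)) {
       clearInterval(timer);
       // deactivate without rethrowing; do not crash pi
       return;
     }
     // any other error: log it instead of swallowing it silently
     // (rethrowing from a timer callback would also be fatal)
     console.error("supervisor timer: sendMessage failed", err);
   }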
   ```

3. Consider deferring timer start until after the activation `triggerTurn` completes, or using pi's recommended `withSession` pattern for any post-session-replacement work.
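The `isStaleExtensionCtx` check in step 2 does not exist yet. A minimal sketch of it, assuming the only available signal is the message text quoted in the stack trace above, could look like this. The marker string and function name are assumptions; if pi exposes an error class or code for this condition, that would be a sturdier test than string matching.

```typescript
// Assumption: substring taken from the error message quoted in this report.
const STALE_CTX_MARKER = "extension ctx is stale";

// Hypothetical helper: true only for Error objects carrying the stale-ctx message.
function isStaleExtensionCtx(err: unknown): boolean {
  return err instanceof Error && err.message.includes(STALE_CTX_MARKER);
}

const staleSample = new Error(
  "This extension ctx is stale after session replacement or reload.",
);
const unrelatedSample = new Error("ENOENT: no such file or directory");

const staleDetected = isStaleExtensionCtx(staleSample);
const unrelatedDetected = isStaleExtensionCtx(unrelatedSample);
const nonErrorDetected = isStaleExtensionCtx("extension ctx is stale");
```

Matching on message text is brittle across pi versions, so this is best treated as a stopgap.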

## Impact

- **Severity:** high — process crash during normal batch startup
- **Workaround:** restart pi and run `/orch-resume`; batch state may survive but supervisor monitoring is unreliable until fixed
- **Not a task/worker failure** — crash is in supervisor infrastructure, not lane worker code

## Repro notes

Observed during batch startup with 3 lanes / 8 tasks in wave 1. The exact session-replacement trigger is not confirmed, but the crash site matches pi's stale ctx guard firing on a captured `pi` in a supervisor timer callback.
