Resumed sessions can crash with orphaned tool_result when no recent user message exists #799

## Summary

When a session is resumed (or trimmed mid-session) after a long autonomous run with no operator input in the recent tail, `getResumeMessages` can return a slice whose first message has role `tool` (a tool result). The Anthropic API then rejects every subsequent submit on that session with:

```
messages.0.content.0: unexpected `tool_use_id` found in `tool_result` blocks: <id>.
Each `tool_result` block must have a corresponding `tool_use` block in the previous message.
```

The error is sticky: each new submit fails identically, and the rollback path re-persists the broken slice to `messages.json`, so the session is effectively bricked until the file is manually repaired.

## Reproduction (deterministic)

1. Start a session in operator/auto mode and let the agent run autonomously past ~200 model messages without typing any operator input.
2. Close and reopen the session (or trigger any code path that re-runs `getResumeMessages` on the persisted history — e.g. the post-step sync at `src/tui/components/operator-dashboard/index.tsx:1314`).
3. Type any prompt.
4. The request to the model fails with the orphaned `tool_result` error above. Every retry fails identically with the same `tool_use_id`.

## Root cause

`src/core/session/index.ts` — `getResumeMessages`:

```ts
if (messages.length <= limit) return messages;

let cutIndex = messages.length - limit;
while (cutIndex < messages.length) {
  if (messages[cutIndex].role === "user") break;
  cutIndex++;
}
if (cutIndex >= messages.length) {
  cutIndex = messages.length - limit; // raw fallback — can land on a `tool` message
}
return messages.slice(cutIndex);
```

- The walk only searches forward for a `user` boundary.
- When the recent `limit` messages contain no `user` role (common in long autonomous runs), the fallback does a raw cut at `messages.length - limit`.
- That index can land on a `tool` message, putting an orphaned `tool-result` at `result[0]`. The matching assistant `tool-call` has been trimmed off the front.
- The AI SDK converts a leading `tool` role to an Anthropic `user` message containing a `tool_result` block with no preceding `tool_use` — exactly the error condition.

`normalizeMessages` does not repair this; it only merges consecutive user messages and upgrades raw-string `output` fields to `{ type: "text", value: ... }`. It does not enforce the tool_use/tool_result pairing invariant.
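The fallback can be reproduced in isolation. The sketch below wraps the cut logic quoted above in a standalone function (`buggyResumeSlice` is a hypothetical name) and uses a simplified message shape with string content, which is an assumption; the real session types are richer.

```ts
// Simplified message shape (assumption): the real project types carry content parts.
type Role = "user" | "assistant" | "tool";
interface Msg {
  role: Role;
  content: string;
}

// The cut logic from the issue, wrapped in a standalone function.
function buggyResumeSlice(messages: Msg[], limit: number): Msg[] {
  if (messages.length <= limit) return messages;

  let cutIndex = messages.length - limit;
  while (cutIndex < messages.length) {
    if (messages[cutIndex].role === "user") break;
    cutIndex++;
  }
  if (cutIndex >= messages.length) {
    cutIndex = messages.length - limit; // raw fallback
  }
  return messages.slice(cutIndex);
}

// One operator prompt, then an autonomous run of
// assistant (tool-call) / tool (tool-result) pairs with no further user input.
const history: Msg[] = [{ role: "user", content: "start" }];
for (let i = 0; i < 5; i++) {
  history.push({ role: "assistant", content: `tool-call ${i}` });
  history.push({ role: "tool", content: `tool-result ${i}` });
}

const resumed = buggyResumeSlice(history, 5);
console.log(resumed[0].role, resumed[0].content); // prints "tool tool-result 2"
```

With 11 messages and a limit of 5, the raw cut lands on index 6, which is the tool result for call 2; the assistant message that issued call 2 sits at index 5 and is trimmed off.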

## Why the broken state is sticky

Two paths re-persist the broken slice on disk:

1. **User submit** — `src/tui/components/operator-dashboard/index.tsx:817-826` writes `[...conversationRef.current, { role: "user", content: prompt }]` to `messages.json`. If `conversationRef.current[0]` is a `tool`, the orphan is at the head of the persisted file.
2. **Error rollback**: `src/tui/components/operator-dashboard/index.tsx:1342-1354`. When the API rejects the request, the catch block rolls `conversationRef.current` back to `prevMessages` and writes that to disk, i.e. the orphan-headed state without the new user message. Subsequent submits append a user message at the end, but the orphan at the head persists.

## Existing test gap

`src/core/session/persistence.test.ts` ("handles conversations with no user messages after cut point") only asserts `result.length === 5`. It never asserts that `result[0]` is a safe role, so the regression slipped past existing coverage.

## Suggested fix shape

In `getResumeMessages`, after picking `cutIndex`, advance past any leading `tool` messages and any leading `assistant` message that begins with a `tool-call` whose paired `tool-result` was trimmed. The chosen slice must start with either a `user` message or an `assistant` whose content has no orphan `tool-call` parts.
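One way the fix shape above might look, as a minimal sketch rather than a patch against the real code. The `Msg`/`Part` types, `isSafeHead`, and `safeResumeSlice` are hypothetical names, and the part shape (`type` plus `toolCallId`) is an assumption about how messages are stored.

```ts
// Assumed, simplified content-part shape.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string }
  | { type: "tool-result"; toolCallId: string };

interface Msg {
  role: "user" | "assistant" | "tool";
  content: Part[];
}

// A head is safe if it is a user message, or an assistant message whose
// tool-calls all have a matching tool-result later in the kept slice.
function isSafeHead(messages: Msg[], index: number): boolean {
  const head = messages[index];
  if (head.role === "user") return true;
  if (head.role === "tool") return false;

  const resultIds = new Set<string>();
  for (const m of messages.slice(index + 1)) {
    for (const p of m.content) {
      if (p.type === "tool-result") resultIds.add(p.toolCallId);
    }
  }
  return head.content.every(
    (p) => p.type !== "tool-call" || resultIds.has(p.toolCallId),
  );
}

function safeResumeSlice(messages: Msg[], limit: number): Msg[] {
  if (messages.length <= limit) return messages;
  const rawCut = messages.length - limit;

  // Same preference as the current code: start at the first user message in the tail.
  for (let i = rawCut; i < messages.length; i++) {
    if (messages[i].role === "user") return messages.slice(i);
  }

  // No user boundary: advance from the raw cut until the head is safe.
  let cutIndex = rawCut;
  while (cutIndex < messages.length && !isSafeHead(messages, cutIndex)) {
    cutIndex++;
  }
  return messages.slice(cutIndex);
}

// Autonomous run with a single user message at the very start.
const history: Msg[] = [{ role: "user", content: [{ type: "text", text: "start" }] }];
for (let i = 0; i < 5; i++) {
  history.push({ role: "assistant", content: [{ type: "tool-call", toolCallId: `call-${i}` }] });
  history.push({ role: "tool", content: [{ type: "tool-result", toolCallId: `call-${i}` }] });
}

const resumed = safeResumeSlice(history, 5);
console.log(resumed.length, resumed[0].role); // prints "4 assistant"
```

Note that this sketch can return fewer than `limit` messages (here 4 instead of 5), so a test that asserts an exact length for the no-user tail case would need to be revisited if a fix takes this shape.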

Hardening (defense in depth): have `normalizeMessages` strip leading orphaned `tool` messages and leading `tool-call`-only assistant messages, so any caller that constructs a conversation prefix (not just resume) gets the same invariant for free.

Also worth adding: an explicit test that asserts the slice is API-valid (no orphaned tool_use/tool_result at head) for the all-tool-and-assistant tail case.
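A test along these lines needs a predicate for "API-valid". The helper below is a sketch of one such check, mirroring the rule in the error message (each `tool_result` must have a `tool_use` in the previous message). `orphanedToolResultIds` and the part shape are hypothetical, and the check assumes each `tool` message directly follows the assistant message that issued its calls.

```ts
// Assumed, simplified content-part shape.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string }
  | { type: "tool-result"; toolCallId: string };

interface Msg {
  role: "user" | "assistant" | "tool";
  content: Part[];
}

// Returns the ids of tool-results whose tool-call is not present in the
// immediately preceding assistant message. An empty array means no orphans.
function orphanedToolResultIds(messages: Msg[]): string[] {
  const orphans: string[] = [];
  messages.forEach((msg, i) => {
    if (msg.role !== "tool") return;

    const prev = i > 0 ? messages[i - 1] : undefined;
    const callIds = new Set<string>();
    if (prev !== undefined && prev.role === "assistant") {
      for (const p of prev.content) {
        if (p.type === "tool-call") callIds.add(p.toolCallId);
      }
    }
    for (const p of msg.content) {
      if (p.type === "tool-result" && !callIds.has(p.toolCallId)) {
        orphans.push(p.toolCallId);
      }
    }
  });
  return orphans;
}

// A slice that starts with a tool message, as the raw fallback can produce.
const orphanHead: Msg[] = [
  { role: "tool", content: [{ type: "tool-result", toolCallId: "call-2" }] },
  { role: "assistant", content: [{ type: "tool-call", toolCallId: "call-3" }] },
  { role: "tool", content: [{ type: "tool-result", toolCallId: "call-3" }] },
];

console.log(orphanedToolResultIds(orphanHead)); // prints [ 'call-2' ]
console.log(orphanedToolResultIds(orphanHead.slice(1))); // prints []
```

In a regression test for the all-tool-and-assistant tail case, the assertion would be that this list is empty for whatever `getResumeMessages` returns.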

## Workaround for affected sessions

The session is recoverable: edit `~/.pensar/sessions/<sessionId>/messages.json` so the array starts at the first `assistant` message with no `tool-call` content parts (or the first `user`/safe-`assistant` further in). Deleting only `messages[0]` is not enough — the next message is typically also an orphan.
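The manual edit can be scripted. The sketch below implements only the trimming rule described above on the JSON text; `repairMessagesJson` is a hypothetical helper, and it assumes `messages.json` is a JSON array of `{ role, content }` objects whose `content` is either a string or an array of parts with a `type` field.

```ts
// Assumed on-disk shape: only `role` and `content` are inspected.
interface StoredMsg {
  role: string;
  content: unknown;
}

// True if the content is a parts array containing at least one tool-call part.
function hasToolCall(content: unknown): boolean {
  return (
    Array.isArray(content) &&
    content.some(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        (p as { type?: unknown }).type === "tool-call",
    )
  );
}

// Drops everything before the first user message or the first assistant
// message without tool-call parts, and returns the repaired JSON text.
function repairMessagesJson(json: string): string {
  const messages = JSON.parse(json) as StoredMsg[];
  const start = messages.findIndex(
    (m) =>
      m.role === "user" ||
      (m.role === "assistant" && !hasToolCall(m.content)),
  );
  if (start === -1) {
    // Refuse to guess: there is no safe starting point in this history.
    throw new Error("no safe starting message found");
  }
  return JSON.stringify(messages.slice(start), null, 2);
}

// Example: an orphan-headed history like the one described in this issue.
const broken = JSON.stringify([
  { role: "tool", content: [{ type: "tool-result", toolCallId: "a" }] },
  { role: "assistant", content: [{ type: "tool-call", toolCallId: "b" }] },
  { role: "tool", content: [{ type: "tool-result", toolCallId: "b" }] },
  { role: "assistant", content: [{ type: "text", text: "done" }] },
  { role: "user", content: "next" },
]);

const repaired = JSON.parse(repairMessagesJson(broken)) as StoredMsg[];
console.log(repaired.length, repaired[0].role); // prints "2 assistant"
```

To apply it, read the session's `messages.json`, pass the text through the helper, and write the result back. Keeping a backup copy of the original file first is advisable, since the trim discards history.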

## Impact

- Affects any long-running session resumed in auto/autopilot mode without recent operator input.
- Once triggered, the session cannot be used until `messages.json` is repaired by hand.
- Misleading for the operator: the failure looks like an unrelated 400 from the model provider.
