feat: surface interrupted subagent sessions for recovery #392

Open
SweetSophia wants to merge 1 commit into master from feat/session-resume-recovery

@SweetSophia
Collaborator

@SweetSophia SweetSophia commented Apr 23, 2026

Summary

  • Reworks the original session recovery idea on top of the current task-session-manager instead of adding a separate tracker/hook.
  • Adds a detector for recoverable interruptions in task outputs that carry a parseable task_id. It recognizes empty <task_result> blocks, provider errors, 429/rate-limit responses, timeouts, and quota or resource-exhaustion signals.
  • Appends an informational [task partial state available] note to interrupted task tool output so the orchestrator can decide whether to resume the existing subagent session.
  • Keeps the existing <resumable_sessions> prompt injection and alias/session tracking as the single source of truth.
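
As an illustration of the detector described above, here is a minimal sketch. The pattern list and the names `INTERRUPTION_PATTERNS`, `hasEmptyTaskResult`, and `looksInterrupted` are assumptions made for this example; the patterns actually used in the PR may differ.

```typescript
// Hypothetical signal list; the PR's real patterns may be narrower or broader.
const INTERRUPTION_PATTERNS: readonly RegExp[] = [
  /\b429\b/,
  /rate[\s_-]?limit/i,
  /timed?[\s_-]?out/i,
  /quota/i,
  /resource[\s_-]?exhaust/i,
  /provider error/i,
];

// True when a <task_result> block is present but holds only whitespace.
function hasEmptyTaskResult(output: string): boolean {
  const match = output.match(/<task_result>([\s\S]*?)<\/task_result>/);
  return match !== null && (match[1] ?? "").trim() === "";
}

// True when the output is empty or carries one of the interruption signals.
function looksInterrupted(output: string): boolean {
  return (
    hasEmptyTaskResult(output) ||
    INTERRUPTION_PATTERNS.some((pattern) => pattern.test(output))
  );
}
```

This sketch scans the whole output, so a successful result that merely mentions a word such as "quota" would also be flagged. Because the note is informational, the cost of such a false positive is a superfluous note rather than lost work.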

What changed from the old PR shape

  • Removed the duplicate TaskSessionTracker approach.
  • Removed the separate delegate-task-resume hook.
  • Reused the existing task-session-manager and parseTaskIdFromTaskOutput() path.
  • Left AGENTS.md and unrelated formatting/test changes out of this PR.

Safety / compatibility

  • Recovery notes are only added when the task output includes a parseable task_id.
  • Argument/validation errors and missing-session errors do not get recovery notes.
  • Existing recovery markers are not duplicated.
  • The note is informational only; it does not force auto-resume.
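
The first and third guarantees above can be sketched as a small pure function. The function name `appendRecoveryNote` and the exact note layout are assumptions for illustration; only the `[task partial state available]` marker comes from the PR description.

```typescript
const RECOVERY_MARKER = "[task partial state available]";

// Appends the note at most once, and only when a task_id was parsed.
// It never triggers a resume itself; the orchestrator decides what to do.
function appendRecoveryNote(output: string, taskId: string | undefined): string {
  if (taskId === undefined) return output; // no parseable task_id: nothing to resume
  if (output.includes(RECOVERY_MARKER)) return output; // already annotated
  return `${output}\n\n${RECOVERY_MARKER}\n  task_id: ${taskId}`;
}
```

Applying the function twice yields the same text as applying it once, which is what keeps existing markers from being duplicated.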

Validation

  • bun test
  • bun run typecheck
  • bun run lint
  • git diff --check

Copilot AI review requested due to automatic review settings April 23, 2026 14:16
@greptile-apps
Contributor

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

This PR adds a session resumption recovery mechanism for delegated subagent tasks: when a task tool result is empty or interrupted, a [task partial state available] note containing the task_id is appended so the orchestrator can choose to resume rather than spawn a fresh session. The implementation is clean — TaskSessionTracker manages the lifecycle, detectInterruptedTask distinguishes provider errors from parameter validation errors, and the orchestrator prompt is updated with explicit resume/no-resume guidance.

Confidence Score: 5/5

Safe to merge; the single finding is a defensive ordering concern in detectInterruptedTask that doesn't affect any currently-tested real-world output format.

All remaining findings are P2. The prior P1 concerns (agent name never populated, sweepStale never called) have been properly addressed with the updateAgent() two-step pattern and the lazy sweep on session.created. The core logic is sound, tests are thorough, and the mechanism is informational so incorrect detection at worst produces a superfluous recovery note rather than data loss.

src/hooks/delegate-task-resume/detector.ts — parameter-error guard ordering

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/hooks/delegate-task-resume/detector.ts | New detection logic for interrupted vs. parameter-error task results; logic ordering places the parameter-error guard after the empty-<task_result> short-circuit, which could cause misclassification. |
| src/hooks/delegate-task-resume/hook.ts | Clean tool.execute.after hook wiring; correctly marks sessions completed or interrupted and builds recovery guidance only when task_id is parseable. |
| src/hooks/delegate-task-resume/guidance.ts | Builds informational recovery note; guard on status is correct and the output format is clear. |
| src/utils/task-session-tracker.ts | Session tracking map with register/update/mark/cleanup/sweep lifecycle; sweepStale() is now called on each child session.created as a lazy safety net, addressing the prior concern. |
| src/index.ts | Correctly wires TaskSessionTracker and delegateTaskResumeHook; register() + updateAgent() two-step pattern properly handles the timing gap between session.created and first chat.message. |
| src/agents/orchestrator.ts | Adds ## 7. Session Resumption guidance to the orchestrator prompt with clear resume/no-resume heuristics and an inline example. |
| src/hooks/delegate-task-resume/index.test.ts | 20 thorough tests covering empty results, provider errors, parameter errors, untracked sessions, and non-task tools; one gap is a missing test for the <task_result></task_result> + [ERROR] combination. |
| src/utils/task-session-tracker.test.ts | 10 unit tests covering all public methods including the sweepStale backdating test; comprehensive coverage. |
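
To make the ordering concern for detector.ts concrete, the sketch below checks the parameter-error guard before the empty-result check. The `PARAMETER_ERROR` pattern and the `classify` function are hypothetical; the real detector's patterns are not shown in this review.

```typescript
type Classification = "parameter-error" | "interrupted" | "ok";

// Hypothetical marker for validation failures; the real hook may match differently.
const PARAMETER_ERROR = /\[ERROR\].*(invalid|missing|argument|parameter)/i;
const EMPTY_RESULT = /<task_result>\s*<\/task_result>/;

function classify(output: string): Classification {
  // Guard first: a validation error wins even when the result block is empty.
  if (PARAMETER_ERROR.test(output)) return "parameter-error";
  if (EMPTY_RESULT.test(output)) return "interrupted";
  return "ok";
}
```

With the two checks in the opposite order, an output that combines `[ERROR]` with an empty `<task_result></task_result>` would be classified as interrupted, which is the untested combination noted in the table.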

Sequence Diagram

sequenceDiagram
    participant O as Orchestrator
    participant TT as task tool
    participant Hook as DelegateTaskResumeHook
    participant Tracker as TaskSessionTracker

    O->>TT: invoke task(description, prompt, ...)
    Note over TT: session.created fires
    TT->>Tracker: register(childSessionId, parentSessionId)
    Note over TT: chat.message fires (first message)
    TT->>Tracker: updateAgent(sessionId, "fixer")

    alt Task interrupted / provider error
        TT-->>Hook: tool.execute.after (empty output)
        Hook->>Hook: detectInterruptedTask() → true
        Hook->>Hook: parseTaskId() → "ses_abc"
        Hook->>Tracker: markInterrupted("ses_abc")
        Hook->>Tracker: get("ses_abc") → {agent:"fixer", status:"interrupted"}
        Hook->>Hook: buildResumeGuidance()
        Hook-->>O: output += "[task partial state available]\n  task_id: ses_abc\n  agent: @fixer"
        O->>TT: (optionally) task(..., task_id="ses_abc")
    else Task succeeds
        TT-->>Hook: tool.execute.after (output with content)
        Hook->>Hook: detectInterruptedTask() → false
        Hook->>Tracker: markCompleted("ses_abc")
        Hook-->>O: output unchanged
    end

    Note over TT: session.deleted fires
    TT->>Tracker: cleanup(sessionId)
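
The tracker lifecycle in the diagram (register, updateAgent, mark, cleanup, sweep) can be sketched as a small Map-backed class. This illustrates the earlier revision reviewed here, which the final PR shape removed; the class name `SessionTrackerSketch`, the field names, and the explicit `now` parameters are assumptions for the example.

```typescript
type Status = "active" | "interrupted" | "completed";

interface TrackedSession {
  parentSessionId: string;
  agent?: string; // filled in later, once the first chat.message arrives
  status: Status;
  createdAt: number;
}

class SessionTrackerSketch {
  private readonly sessions = new Map<string, TrackedSession>();

  // Called on session.created for child sessions.
  register(id: string, parentSessionId: string, now: number = Date.now()): void {
    this.sessions.set(id, { parentSessionId, status: "active", createdAt: now });
  }

  // Second step of the two-step pattern: the agent name is known only later.
  updateAgent(id: string, agent: string): void {
    const entry = this.sessions.get(id);
    if (entry) entry.agent = agent;
  }

  markInterrupted(id: string): void {
    this.setStatus(id, "interrupted");
  }

  markCompleted(id: string): void {
    this.setStatus(id, "completed");
  }

  get(id: string): TrackedSession | undefined {
    return this.sessions.get(id);
  }

  // Called on session.deleted.
  cleanup(id: string): void {
    this.sessions.delete(id);
  }

  // Drops entries older than maxAgeMs and returns how many were removed.
  sweepStale(maxAgeMs: number, now: number = Date.now()): number {
    let removed = 0;
    this.sessions.forEach((entry, id) => {
      if (now - entry.createdAt > maxAgeMs) {
        this.sessions.delete(id);
        removed += 1;
      }
    });
    return removed;
  }

  private setStatus(id: string, status: Status): void {
    const entry = this.sessions.get(id);
    if (entry) entry.status = status;
  }
}
```

Passing `now` explicitly keeps the sweep deterministic in tests, which is one way to write the kind of backdating test mentioned in the file table.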

Reviews (2): Last reviewed commit: "fix: address PR review — agent name, swe..." | Re-trigger Greptile

Comment thread src/index.ts Outdated
Comment thread src/utils/task-session-tracker.ts Outdated

Copilot AI left a comment

Pull request overview

Adds an informational “recovery note” mechanism so the orchestrator can resume interrupted delegated subagent sessions (via task_id) instead of losing partial state when a task tool result is empty/incomplete.

Changes:

  • Introduces a TaskSessionTracker utility to track subagent sessions by task_id and lifecycle status.
  • Adds a new delegate-task-resume hook that detects interrupted/empty task results, extracts task_id, and appends a [task partial state available] note.
  • Updates the orchestrator prompt with a new “Session Resumption” section and wires the new hook + tracker into plugin bootstrap.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/utils/task-session-tracker.ts | New tracker for subagent task sessions (status + metadata). |
| src/utils/task-session-tracker.test.ts | Unit tests for tracker behavior (register/status/cleanup/sweep). |
| src/hooks/delegate-task-resume/detector.ts | Detects interrupted outputs and parses task_id from tool output. |
| src/hooks/delegate-task-resume/guidance.ts | Builds the appended recovery note text. |
| src/hooks/delegate-task-resume/hook.ts | Hook wiring to append guidance and update tracker status. |
| src/hooks/delegate-task-resume/index.ts | Barrel exports for the new hook module. |
| src/hooks/delegate-task-resume/index.test.ts | Tests for detector/guidance and hook integration behavior. |
| src/hooks/index.ts | Exports createDelegateTaskResumeHook for plugin composition. |
| src/index.ts | Instantiates tracker + hook; wires into session.created, session.deleted, and tool.execute.after. |
| src/agents/orchestrator.ts | Adds “Session Resumption” section to orchestrator prompt. |
| src/interview/interview.test.ts | Formatting-only change (Biome). |
| src/utils/system-collapse.test.ts | Import ordering/formatting-only change (Biome). |
| AGENTS.md | Updates contributor/agent guidelines and build/test notes. |

Comment thread src/utils/task-session-tracker.ts Outdated
Comment on lines +8 to +12
* Lifecycle:
* register() — called when the `task` tool fires via tool.execute.after
* markInterrupted() / markCompleted() — update session status
* cleanup() — called on session.deleted event
*/
Copilot AI Apr 23, 2026

The header comment and register() docstring say register() is called from tool.execute.after, but in the current wiring it's called from the session.created event handler (src/index.ts). Please update the documentation to reflect the actual lifecycle to avoid future confusion/misuse.

Comment thread src/hooks/delegate-task-resume/guidance.ts Outdated
Comment thread src/index.ts Outdated
Comment on lines +557 to +562
// Track subagent task sessions for potential resumption.
// Only register child sessions (parentID set) since those are
// subagent sessions created by the task tool.
if (taskSessionTracker && childSessionId && parentSessionId) {
  taskSessionTracker.register(childSessionId, parentSessionId);
}
Copilot AI Apr 23, 2026

TaskSessionTracker.register supports storing an agent name for later inclusion in the recovery note, but the plugin currently registers child sessions without passing any agent info. As a result, the resume guidance will almost always omit the intended "Agent: @..." line even when the session is tracked. Consider deriving the agent from session.created properties (e.g. info.title if it contains the agent) or capturing it from the task tool arguments / a subagent.session.created event and passing it into register().

@dhaern
Collaborator

dhaern commented Apr 23, 2026

Summary

When a delegated subagent task (e.g. @fixer) hits a provider error or is interrupted mid-run, it may return an empty <task_result> even though the underlying session contains useful partial state. Previously, the orchestrator would treat this as a dead task and spawn a fresh session, losing all context.

This PR adds the recovery note mechanism requested in issue #387:

  • When a task tool result is empty/interrupted, a [task partial state available] note is appended containing the task_id
  • The note is informational — it tells the orchestrator a session exists that can be resumed, without auto-resuming
  • The orchestrator's prompt now includes a Session Resumption section explaining task_id usage

What was built

New files

| File | Purpose |
| --- | --- |
| src/utils/task-session-tracker.ts | Tracks active subagent sessions by task_id → metadata |
| src/hooks/delegate-task-resume/detector.ts | Detects empty/interrupted task results; parses task_id from XML output |
| src/hooks/delegate-task-resume/guidance.ts | Builds the [task partial state available] recovery note |
| src/hooks/delegate-task-resume/hook.ts | tool.execute.after hook wiring |
| src/hooks/delegate-task-resume/index.test.ts | 20 tests (detector, guidance, hook integration) |
| src/utils/task-session-tracker.test.ts | 10 tests |

Modified files

| File | Change |
| --- | --- |
| src/agents/orchestrator.ts | Added ## 7. Session Resumption section to prompt; explains task_id and when to resume vs. start fresh |
| src/hooks/index.ts | Export createDelegateTaskResumeHook |
| src/index.ts | Instantiate TaskSessionTracker + hook; wire into tool.execute.after, session.created, session.deleted |
| src/interview/interview.test.ts | Biome format fix (existing) |
| src/utils/system-collapse.test.ts | Biome import sort fix (existing) |

Key design decisions

  1. Informational, not automatic — the recovery note is injected into the task output so the LLM decides whether to resume; no auto-resume
  2. task_id parsed from XML output — the task tool returns <task_id>ses_...</task_id> in its output; we parse this rather than using sessionID which refers to the orchestrator's session (not the subagent's)
  3. Distinguishes from delegate-task-retry — that hook handles parameter validation errors; this hook handles provider/interrupt errors
  4. Agent-from-tracker — when the tracker has an entry, the note includes @fixer etc. so the orchestrator knows which specialist session it was
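
Design decisions 1 and 4 can be pictured with a small note builder. This is a sketch only: the name `buildResumeGuidanceSketch` and the `InterruptedSession` shape are assumptions, and the note layout follows the example string in the sequence diagram earlier in the thread.

```typescript
interface InterruptedSession {
  taskId: string;
  agent?: string; // e.g. "fixer"; absent when the tracker has no entry
}

// Builds an informational note; it describes the session and nothing more.
function buildResumeGuidanceSketch(session: InterruptedSession): string {
  const lines = ["[task partial state available]", `  task_id: ${session.taskId}`];
  if (session.agent) lines.push(`  agent: @${session.agent}`);
  return lines.join("\n");
}
```

Because the builder returns only text, whether to resume stays with the orchestrator that reads the task output.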

Verification

  • 951 tests pass (32 new tests added)
  • Typecheck: tsc --noEmit clean
  • Lint/format: check:ci zero errors, zero warnings
  • Build: bun run build completes successfully

Related

Fixes the core mechanism for #387 (session resumption).

I noticed a couple of things that may be worth checking before merging:

1. task_id parser may not match OpenCode’s real task output

In this PR, parseTaskId() expects XML-style output:

const match = output.match(/<task_id>([^<]+)<\/task_id>/);

But the OpenCode task tool output I see in practice is line-based:

task_id: ses_xxx (for resuming to continue this task if needed)

<task_result>
...
</task_result>

So for the main interrupted-task case:

task_id: ses_xxx (for resuming to continue this task if needed)

<task_result>
</task_result>

parseTaskId() would return undefined, and the recovery note would not be appended.

PR #390 added parseTaskIdFromTaskOutput() in src/utils/task.ts, which parses this real format:

/^task_id:\s*([^\s()]+)(?:\s*\(.*\))?$/

Maybe this PR should reuse that helper, or support both formats:

task_id: ses_xxx
<task_id>ses_xxx</task_id>
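
A parser that accepts both formats could look like the sketch below. It is not the helper from #390: the name `parseTaskIdEitherFormat`, the simplified line pattern, and the `m` flag (so the line may appear anywhere in a multi-line output) are assumptions for this example.

```typescript
// Line-based form: "task_id: ses_xxx (optional trailing note)".
const LINE_FORM = /^task_id:\s*([^\s()]+)/m;
// XML form, kept only as a fallback.
const XML_FORM = /<task_id>([^<]+)<\/task_id>/;

// Returns the task id from either format, or undefined when none is present.
function parseTaskIdEitherFormat(output: string): string | undefined {
  const match = output.match(LINE_FORM) ?? output.match(XML_FORM);
  const id = match?.[1]?.trim();
  return id ? id : undefined;
}
```

Trying the line-based form first treats the format seen in practice as canonical and leaves the XML form as a compatibility path.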

2. This overlaps with the new session manager from #390

#390 already introduced:

  • src/hooks/task-session-manager/
  • src/utils/session-manager.ts
  • src/utils/task.ts

This PR adds a separate:

  • src/hooks/delegate-task-resume/
  • src/utils/task-session-tracker.ts

The concepts are related but currently split across two systems.

That can work, but it may become harder to reason about over time. It might be cleaner to either:

  • integrate this recovery note into the task-session-manager system, or
  • at least reuse the same task-id parser/helper and make the boundary between the two systems explicit.

3. The recovery note says the session “may contain partial state”, but does not verify it

The current hook infers recoverability from the task output being empty/interrupted. It does not inspect:

ctx.client.session.messages({ path: { id: taskId } })

That is probably fine for a first implementation, but it means the note is heuristic. In my local test, the interrupted subagent session really did contain partial state and resuming the same task_id worked, but in other cases the session may be empty.

Maybe the wording should stay conservative (may contain partial state) unless the hook actually checks session messages.
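
One way to keep the wording honest is to derive it from what was actually observed. In the sketch below, `SessionMessage`, `FetchMessages`, `describePartialState`, and `describeSession` are hypothetical names; in the plugin the fetcher would presumably wrap the `ctx.client.session.messages` call quoted above, whose return shape is not shown in this thread.

```typescript
// Minimal stand-in for a session message; the real SDK type is richer.
interface SessionMessage {
  role: string;
}

// Picks the wording from what was observed; undefined means "not checked".
function describePartialState(messages: readonly SessionMessage[] | undefined): string {
  if (messages === undefined) return "may contain partial state";
  return messages.length > 0
    ? `contains ${messages.length} message(s) of partial state`
    : "appears to be empty";
}

// Hypothetical fetcher signature supplied by the caller.
type FetchMessages = (taskId: string) => Promise<readonly SessionMessage[]>;

async function describeSession(taskId: string, fetchMessages: FetchMessages): Promise<string> {
  try {
    return describePartialState(await fetchMessages(taskId));
  } catch {
    // Lookup failed: fall back to the conservative wording.
    return describePartialState(undefined);
  }
}
```

Without the lookup, the conservative "may contain partial state" wording is the only claim the hook can support.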

4. Suggested minimal changes

I think the core idea is good. The minimum I would change before merge is:

  1. Reuse or align with parseTaskIdFromTaskOutput() from #390 (feat: add resumable specialist session manager).
  2. Add a test for the actual OpenCode task output format:

task_id: ses_child1 (for resuming to continue this task if needed)

<task_result>
</task_result>

  3. Keep the note informational, as it already is.
  4. Consider whether this should live under task-session-manager or remain separate with clearly shared helpers.

Overall: good direction, but the parser mismatch may prevent the PR from covering the real interrupted-task case.

@greptile-apps

SweetSophia added a commit that referenced this pull request Apr 23, 2026
- import parseTaskIdFromTaskOutput from src/utils/task.ts
- use shared line-based parser as canonical behavior
- keep XML parsing only as fallback for compatibility
- adds shared helper file to branch so PR #392 aligns with merged #390

Tests: bun test src/hooks/delegate-task-resume/index.test.ts src/utils/task-session-tracker.test.ts
Typecheck: bun run tsc --noEmit
Lint: bun run check:ci
@SweetSophia
Collaborator Author

#392 (comment)
It has been addressed with the last 2 commits

dhaern force-pushed the feat/session-resume-recovery branch from d79add6 to c2da790 on May 3, 2026 06:08
dhaern changed the title from "feat: session resumption recovery for delegated subagent tasks (#387)" to "feat: surface interrupted subagent sessions for recovery" on May 3, 2026
@dhaern
Collaborator

dhaern commented May 3, 2026

I rebased/reworked this PR on top of the current master and removed the duplicated architecture from the original version.

The PR now integrates the useful recovery behavior directly into the existing task-session-manager:

  • no separate TaskSessionTracker
  • no separate delegate-task-resume hook
  • no unrelated AGENTS.md or formatting changes
  • recovery notes are only appended for task outputs with a parseable task_id and recoverable interruption signals
  • validation/argument errors and missing-session errors are intentionally ignored

Validated with:

  • bun test
  • bun run typecheck
  • bun run lint
  • git diff --check

3 participants