From 7143515d6e004f6e4d626cb344692ec1668b770b Mon Sep 17 00:00:00 2001 From: mariuszs Date: Thu, 26 Mar 2026 11:54:42 +0100 Subject: [PATCH] =?UTF-8?q?feat:=20add=20model=20escalation=20(sonnet?= =?UTF-8?q?=E2=86=92opus)=20for=20task-group-implementer?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- plugins/maister/CLAUDE.md | 3 + .../maister/agents/task-group-implementer.md | 36 ++++++-- .../implementation-plan-executor/SKILL.md | 88 +++++++++++++++---- .../references/orchestrator-patterns.md | 57 ++++++++++++ 4 files changed, 159 insertions(+), 25 deletions(-) diff --git a/plugins/maister/CLAUDE.md b/plugins/maister/CLAUDE.md index dce7391..405a8c1 100644 --- a/plugins/maister/CLAUDE.md +++ b/plugins/maister/CLAUDE.md @@ -455,6 +455,7 @@ Skills are automatically invoked by Claude when appropriate. Details live in eac |-------|---------|---------| | `codebase-analyzer` | Thin dispatcher: selects agent roles adaptively, launches parallel Explore subagents, delegates report synthesis to `codebase-analysis-reporter` subagent | `skills/codebase-analyzer/SKILL.md` | | `implementer` | Executes plans with **mandatory** standards reading (INDEX.md + implementation-plan.md Standards Compliance section + keyword-triggered) and **test step enforcement** (requires user approval to skip N.1 tests) | `skills/implementer/SKILL.md` | +| `implementation-plan-executor` | Executes implementation plans with two-mode adaptive execution. Mode A (≤5 steps): direct. 
Mode B (6+ steps): delegates to `task-group-implementer` subagent with **model escalation** (sonnet → opus on BLOCKED) | `skills/implementation-plan-executor/SKILL.md` | | `implementation-verifier` | Read-only QA orchestrator: delegates completeness checks, test execution, code review, and production readiness to specialized subagents; compiles results into verification report | `skills/implementation-verifier/SKILL.md` | | `standards-discover` | Parallel multi-source standards discovery (config, code, docs, PRs/CI) with confidence scoring | `skills/standards-discover/SKILL.md` | | `docs-manager` | Internal engine for doc file operations, INDEX.md generation, CLAUDE.md integration. Not user-invocable — accessed via `docs-operator` agent (Task tool) by init, standards-update, standards-discover | `skills/docs-manager/skill.md` | @@ -601,6 +602,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age | `spec-auditor` | Independent spec audit with senior auditor perspective | orchestrators | `agents/spec-auditor.md` | | `reality-assessor` | Validates work actually solves the problem | implementation-verifier | `agents/reality-assessor.md` | | `implementation-changes-planner` | Creates detailed change plans (no file modifications) | implementer | `agents/implementation-changes-planner.md` | +| `task-group-implementer` | Executes a single task group: writes code, runs tests, reports status. Supports model escalation (sonnet → opus on BLOCKED). | implementation-plan-executor | `agents/task-group-implementer.md` | **See**: Individual `agents/*.md` files for detailed workflows and philosophies. @@ -614,6 +616,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age 6. **Incremental Verification**: Run only new tests after each group, not entire suite 7. **Comprehensive Verification Before Commit**: Run full test suite and create verification report before code review 8. 
**Task Directory Artifact Anchoring**: ALL workflow artifacts (reports, documentation, screenshots) MUST be saved under the task directory (`.maister/tasks/[type]/[task-name]/`). NEVER save task artifacts to project directories like `docs/`, `src/`, or project root. +9. **Model Escalation**: Subagents start on sonnet; if BLOCKED, automatically retry with opus before asking the user **For detailed workflow documentation, see**: individual skill `SKILL.md` files diff --git a/plugins/maister/agents/task-group-implementer.md b/plugins/maister/agents/task-group-implementer.md index 1578f40..31f1ed1 100644 --- a/plugins/maister/agents/task-group-implementer.md +++ b/plugins/maister/agents/task-group-implementer.md @@ -1,7 +1,7 @@ --- name: task-group-implementer description: Execute a single task group from an implementation plan with continuous standards discovery. Writes code, runs tests, returns structured execution report. Does NOT mark checkboxes - main agent handles progress tracking. -model: inherit +model: sonnet color: green --- @@ -25,6 +25,24 @@ Execute one task group from an implementation plan: write tests, implement code, 4. **Structured reporting**: Return results in expected format for main agent 5. **No progress tracking**: Do NOT mark checkboxes - main agent owns that responsibility +## When You're Stuck + +It is always OK to stop and report that you can't complete the task. Bad work is worse than no work. You will not be penalized for escalating. 
+ +**Report BLOCKED when:** +- The task requires architectural decisions with multiple valid approaches +- You need to understand code beyond what was provided and can't find clarity +- You feel uncertain about whether your approach is correct +- The task involves restructuring existing code in ways the plan didn't anticipate +- You've been reading file after file trying to understand the system without progress + +**Report NEEDS_CONTEXT when:** +- You need information about a specific file, function, or pattern not provided +- The spec is ambiguous about a specific requirement +- You need to know which of two approaches the project prefers + +**How to report:** Set your status to BLOCKED or NEEDS_CONTEXT. Describe specifically what you're stuck on, what you've tried, and what kind of help you need. The coordinator can provide more context, re-dispatch with a more capable model, or break the task into smaller pieces. + ## Decision-Making Framework When facing implementation choices: @@ -139,7 +157,7 @@ Output structured report in expected format (see Output Format section). ```markdown ## Group [N] Execution Report -### Status: [SUCCESS/PARTIAL/FAILED] +### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED] ### Steps Completed - [x] N.1 - [brief description] @@ -216,15 +234,21 @@ If you encounter errors during implementation: 1. **Syntax/compile errors**: Fix before proceeding 2. **Missing dependencies**: Note in report, attempt reasonable fix 3. **Unclear requirements**: Make reasonable choice, document in notes -4. **Blocking issues**: Report FAILED status with details +4. 
**Blocking issues**: Report BLOCKED status with details ### What Triggers Each Status | Status | When to Use | |--------|-------------| | **SUCCESS** | All steps complete, all tests pass | -| **PARTIAL** | Some steps complete, tests failing, or minor issues | -| **FAILED** | Blocking issue prevents completion, needs main agent intervention | +| **SUCCESS_WITH_CONCERNS** | All steps complete, but flagging doubts (e.g., file growing too large, uncertain edge case) | +| **PARTIAL** | Some steps complete, tests failing, or minor issues — you made progress but couldn't finish | +| **NEEDS_CONTEXT** | Missing information that wasn't provided. You know what you need — specify it precisely | +| **BLOCKED** | Cannot complete due to complexity, unclear architecture, or conflicting requirements. Describe what you're stuck on and what you've tried | + +**BLOCKED vs PARTIAL:** Use BLOCKED when the problem is reasoning/understanding (you don't know HOW), not execution (you know how but hit errors). BLOCKED triggers model escalation; PARTIAL triggers main agent investigation. + +**NEEDS_CONTEXT vs BLOCKED:** Use NEEDS_CONTEXT when you can name the specific missing information. Use BLOCKED when you can't articulate a specific ask — you're stuck. 
## Integration @@ -279,4 +303,4 @@ During step N.3, realize auth pattern needed → Check INDEX.md → Find and rea ### Scenario 4: Blocking Issue -Can't proceed due to missing dependency or unclear spec → Report FAILED with clear explanation → Main agent will use AskUserQuestion to decide path forward +Can't proceed due to missing dependency or unclear spec → Report BLOCKED with clear explanation → Main agent will escalate model or use AskUserQuestion to decide path forward diff --git a/plugins/maister/skills/implementation-plan-executor/SKILL.md b/plugins/maister/skills/implementation-plan-executor/SKILL.md index 0443bba..25af67e 100644 --- a/plugins/maister/skills/implementation-plan-executor/SKILL.md +++ b/plugins/maister/skills/implementation-plan-executor/SKILL.md @@ -131,12 +131,42 @@ For each task group: 5. Use `TaskUpdate` to set the group task to `status: "completed"` with `metadata: {completed_at, tests_passed, files_modified, standards_applied}` -6. **If subagent reports failure**: - - Do NOT auto-rollback (see Critical Principle in CLAUDE.md) - - Assess: config issue? test setup? logic error? - - Use AskUserQuestion for recovery path +6. **Process subagent status**: + + **SUCCESS / SUCCESS_WITH_CONCERNS**: Proceed normally. If concerns flagged, log them in work-log. + + **PARTIAL**: Subagent made progress but couldn't finish. Assess root cause: + - Test failures → analyze, apply fix if obvious, re-run + - If unclear → AskUserQuestion with recovery options - Keep group task as `in_progress` with `metadata: {failed_at, failure_reason}` + **NEEDS_CONTEXT**: Subagent needs specific information. Read what they're asking for, provide it, and re-dispatch with the **same model** (sonnet): + - Extract the specific ask from subagent output + - Gather the requested context (read files, check standards, etc.) 
+ - Re-dispatch task-group-implementer with original prompt + additional context section + - No model change — the problem is missing data, not reasoning + + **BLOCKED**: Subagent is stuck on complexity/reasoning. **Escalate model**: + - Re-dispatch task-group-implementer with `model: opus` parameter + - Include the original prompt + subagent's BLOCKED explanation as additional context + - If opus also returns BLOCKED → stop and use AskUserQuestion: + ``` + Question: "Task group [N] blocked even with escalated model. [Brief reason from subagent]. How to proceed?" + Header: "Model Escalation Failed" + Options: + - "Break into smaller pieces" - Split this group and retry + - "Provide more context" - I'll give additional information + - "Skip this group" - Mark as skipped, continue + - "Stop implementation" - Pause for investigation + ``` + - Log escalation in work-log: "Group N: escalated sonnet → opus. Reason: [from BLOCKED status]" + + **Key rules:** + - Never retry the same model without changes + - NEEDS_CONTEXT → same model (missing data) + - BLOCKED → opus (reasoning/complexity) + - Opus BLOCKED → always ask user + ## Continuous Standards Discovery **Philosophy**: Standards are discovered when relevant, not memorized upfront. @@ -237,6 +267,34 @@ You have access to `.maister/docs/INDEX.md` for continuous standards discovery. 
[See Subagent Output Format section] ``` +### Re-dispatch on BLOCKED (Model Escalation) + +When re-dispatching with opus after BLOCKED: + +````markdown +## Task: Execute Task Group [N] (Escalated) + +**Previous attempt status**: BLOCKED +**Previous attempt explanation**: [paste BLOCKED explanation from subagent] +**Model**: opus (escalated from sonnet) + +### Task Group Content +[Same as original dispatch] + +### Specification Excerpt +[Same as original dispatch] + +### Standards +[Same as original dispatch] + +### Additional Context +[Any context gathered based on the BLOCKED explanation] + +### Requirements +[Same as original dispatch, plus:] +5. You are running on a more capable model because the previous attempt was blocked. Use your additional reasoning capability to work through the complexity described above. +```` + ## Subagent Output Format The task-group-implementer returns structured output: @@ -244,7 +302,7 @@ The task-group-implementer returns structured output: ```markdown ## Group [N] Execution Report -### Status: [SUCCESS/PARTIAL/FAILED] +### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED] ### Steps Completed - [x] N.1 - [description] @@ -355,22 +413,14 @@ After each task group: ### Subagent Failure (Mode B) -If task-group-implementer reports failure: +Subagent status handling is defined in Mode B step 6 above. Additional rules: 1. **Do NOT auto-rollback** - User-confirmed rollback only -2. **Analyze root cause** from subagent output -3. **Check for easy fixes**: config issues, missing dependencies, test setup -4. **Use AskUserQuestion**: - ``` - Question: "Group [N] implementation failed: [brief reason]. How to proceed?" - Header: "Failure" - Options: - - "Try suggested fix" - [if easy fix identified] - - "Retry group" - Re-invoke subagent - - "Complete manually" - Switch to direct execution for this group - - "Rollback changes" - Revert this group's changes - - "Stop" - Pause for investigation - ``` +2. 
**Model escalation is automatic** - BLOCKED → opus happens without asking user +3. **User involvement triggers**: + - Opus returns BLOCKED (end of escalation chain) + - PARTIAL status with unclear root cause + - Max 1 NEEDS_CONTEXT re-dispatch per group (if still NEEDS_CONTEXT after providing context → AskUserQuestion) ### Test Failure diff --git a/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md b/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md index 36ab08c..7b231cc 100644 --- a/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md +++ b/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md @@ -324,3 +324,60 @@ If prerequisites missing, use AskUserQuestion: "Start from Phase 1", "Specify di | User chooses "Proceed with known issues" | Proceed with warning logged | | Max iterations (3) reached | Ask user how to proceed | | Critical issues remain unresolved | **MUST NOT proceed** — require user approval first | + +--- + +## 7. Model Escalation Pattern + +When a subagent reports BLOCKED status, the coordinator can re-dispatch with a more capable model. This is an automatic escalation — no user confirmation needed for the first tier. + +### Escalation Chain + +```` +sonnet (default) → BLOCKED → opus → BLOCKED → AskUserQuestion +```` + +### Status-to-Action Mapping + +| Subagent Status | Action | Model Change | +|----------------|--------|--------------| +| SUCCESS / SUCCESS_WITH_CONCERNS | Proceed | None | +| PARTIAL | Investigate, fix if obvious, ask user if unclear | None | +| NEEDS_CONTEXT | Provide requested context, re-dispatch | Same model | +| BLOCKED | Re-dispatch with more capable model | sonnet → opus | + +### Key Rules + +1. **Never retry same model without changes** — if BLOCKED, something must change (model, context, or task scope) +2. **NEEDS_CONTEXT ≠ BLOCKED** — missing data → same model; reasoning limit → higher model +3. 
**End of chain → user** — when the most capable model is BLOCKED, always AskUserQuestion +4. **Log escalations** — record in work-log for visibility and cost tracking +5. **No automatic rollback** — BLOCKED does not mean "undo what was done" + +### When to Apply + +This pattern applies to any agent that: +- Has `model: sonnet` in frontmatter (not `inherit` or `opus`) +- Implements the enriched status protocol (SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED) +- Is dispatched by a coordinator skill that processes the output + +Currently applies to: +- `task-group-implementer` (dispatched by `implementation-plan-executor`) + +### Re-dispatch Prompt Structure + +When escalating, the coordinator includes: +- Original task prompt (unchanged) +- Previous attempt's BLOCKED explanation +- Any additional context gathered +- Note that this is an escalated dispatch with a more capable model + +### Anti-Patterns + +| Anti-Pattern | Why It's Wrong | +|--------------|----------------| +| Retrying same model on BLOCKED | Wastes tokens, same result | +| Escalating on NEEDS_CONTEXT | Problem is data, not reasoning — provide context first | +| Escalating on PARTIAL | Subagent made progress — investigate the specific failure | +| Skipping user when opus is BLOCKED | End of chain, user must decide next step | +| Auto-rollback on BLOCKED | BLOCKED means "stuck", not "failed" — work may be partially valid |
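Reviewer sketch (not part of the patch): the status-to-action mapping and escalation chain described in section 7 could be expressed roughly as below. This is a minimal illustration under stated assumptions, not the plugin's implementation. `Report`, `run_group`, `dispatch`, and `gather_context` are hypothetical names, and the returned strings `"proceed"`, `"investigate"`, and `"ask_user"` stand in for whatever the coordinator does next. Counting the single NEEDS_CONTEXT re-dispatch across both models is also an assumption, since the patch only says "max 1 per group".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical constants mirroring the rules stated in the patch.
ESCALATION_CHAIN = ["sonnet", "opus"]   # sonnet (default) -> opus -> user
MAX_CONTEXT_REDISPATCH = 1              # "Max 1 NEEDS_CONTEXT re-dispatch per group"


@dataclass
class Report:
    """Hypothetical stand-in for a parsed subagent execution report."""
    status: str
    explanation: str = ""


def run_group(
    dispatch: Callable[[str, str], Report],
    prompt: str,
    gather_context: Callable[[str], str] = lambda ask: "",
) -> Tuple[str, List[str]]:
    """Drive one task group; return (next_action, work_log_entries)."""
    tier = 0
    context_retries = 0
    work_log: List[str] = []
    while True:
        model = ESCALATION_CHAIN[tier]
        report = dispatch(model, prompt)

        if report.status in ("SUCCESS", "SUCCESS_WITH_CONCERNS"):
            return "proceed", work_log
        if report.status == "PARTIAL":
            # Progress was made: investigate the failure, no model change.
            return "investigate", work_log
        if report.status == "NEEDS_CONTEXT":
            if context_retries >= MAX_CONTEXT_REDISPATCH:
                return "ask_user", work_log
            context_retries += 1
            # Same model, but the prompt changes (never retry unchanged).
            prompt += "\n\n### Additional Context\n" + gather_context(report.explanation)
            continue
        if report.status == "BLOCKED":
            if tier + 1 >= len(ESCALATION_CHAIN):
                # End of chain: the user must decide the next step.
                return "ask_user", work_log
            next_model = ESCALATION_CHAIN[tier + 1]
            work_log.append(
                f"escalated {model} -> {next_model}. Reason: {report.explanation}"
            )
            prompt += (
                "\n\n**Previous attempt status**: BLOCKED"
                "\n**Previous attempt explanation**: " + report.explanation
            )
            tier += 1
            continue
        raise ValueError(f"unknown status: {report.status}")
```

With a `dispatch` stub that returns BLOCKED for `sonnet` and SUCCESS for `opus`, `run_group` performs exactly one escalation and returns `"proceed"` with one work-log entry. Every loop iteration either returns or changes the model or the prompt, so the sketch never retries an identical dispatch.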