Context
TP-076/077/078 added supervisor recovery tools (orch_retry_task, orch_skip_task, orch_force_merge) with source-based tests that only verify that certain code patterns exist. An external code review found that these tests do not validate end-to-end behavior: they check that strings are present in the source, not that running the tool handlers produces the expected state mutations.
This is why several bugs (#1-5 in the review) passed the test suite.
What's needed
Add behavioral tests that:
- Call the actual tool handler functions (or close equivalents) with mock batch state
- Assert the resulting state mutations (counter changes, status transitions, blocked set updates)
- For orch_force_merge: verify that resuming after a force-merge actually triggers a git merge (integration-test level)
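A minimal sketch of the first two bullets, assuming a simplified in-memory batch state. `BatchState`, `recomputeBlocked`, `handleSkipTask`, `handleRetryTask`, and `makeState` are hypothetical stand-ins, not the extension's real API; in supervisor-recovery-tools.test.ts the same assertions would run against the actual orch_skip_task/orch_retry_task handlers (or close equivalents), inside whatever runner the suite already uses. The point is the pattern: build mock state, call the handler, then assert on counters, statuses, and the blocked set.

```typescript
import assert from "node:assert/strict";

// HYPOTHETICAL state shape for illustration; the real batch state differs.
type TaskStatus = "pending" | "running" | "succeeded" | "failed" | "skipped";

interface Task {
  id: string;
  status: TaskStatus;
  dependsOn: string[];
}

interface BatchState {
  tasks: Map<string, Task>;
  failedCount: number;
  skippedCount: number;
  blocked: Set<string>;
}

// Pending tasks are blocked while any dependency (direct or transitive)
// has failed. Skipped dependencies count as resolved in this sketch.
function recomputeBlocked(state: BatchState): Set<string> {
  const blocked = new Set<string>();
  let changed = true;
  while (changed) {
    changed = false;
    for (const task of state.tasks.values()) {
      if (task.status !== "pending" || blocked.has(task.id)) continue;
      const stuck = task.dependsOn.some(
        (dep) => state.tasks.get(dep)?.status === "failed" || blocked.has(dep),
      );
      if (stuck) {
        blocked.add(task.id);
        changed = true;
      }
    }
  }
  return blocked;
}

// HYPOTHETICAL stand-ins for the orch_skip_task / orch_retry_task handlers.
function takeFailedTask(state: BatchState, taskId: string): Task {
  const task = state.tasks.get(taskId);
  if (!task) throw new Error(`unknown task ${taskId}`);
  if (task.status !== "failed") throw new Error(`task ${taskId} is ${task.status}, not failed`);
  return task;
}

function handleSkipTask(state: BatchState, taskId: string): void {
  const task = takeFailedTask(state, taskId);
  task.status = "skipped";
  state.failedCount -= 1;
  state.skippedCount += 1;
  state.blocked = recomputeBlocked(state); // omitting this is a "missing recomputation" bug
}

function handleRetryTask(state: BatchState, taskId: string): void {
  const task = takeFailedTask(state, taskId);
  task.status = "pending";
  state.failedCount -= 1;
  state.blocked = recomputeBlocked(state);
}

// Fixture: A failed, B depends on A, C depends on B.
function makeState(): BatchState {
  const tasks = new Map<string, Task>([
    ["A", { id: "A", status: "failed", dependsOn: [] }],
    ["B", { id: "B", status: "pending", dependsOn: ["A"] }],
    ["C", { id: "C", status: "pending", dependsOn: ["B"] }],
  ]);
  const state: BatchState = { tasks, failedCount: 1, skippedCount: 0, blocked: new Set() };
  state.blocked = recomputeBlocked(state);
  return state;
}

// Behavioral assertions: run the handler, then inspect the mutated state.
const skipped = makeState();
assert.deepEqual(Array.from(skipped.blocked).sort(), ["B", "C"]);
handleSkipTask(skipped, "A");
assert.equal(skipped.tasks.get("A")?.status, "skipped");
assert.equal(skipped.failedCount, 0);
assert.equal(skipped.skippedCount, 1);
assert.equal(skipped.blocked.size, 0);

const retried = makeState();
handleRetryTask(retried, "A");
assert.equal(retried.tasks.get("A")?.status, "pending");
assert.equal(retried.failedCount, 0);
assert.equal(retried.skippedCount, 0);
assert.equal(retried.blocked.size, 0);

// Retrying a task that is no longer failed must be rejected.
assert.throws(() => handleRetryTask(retried, "A"));
console.log("behavioral recovery-tool checks passed");
```

Because the assertions read the blocked set back from the state object after the call, a handler that updated a stale copy, or forgot to recompute, would fail here even though a source-string check for the recomputation call would still pass.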
Files affected
extensions/tests/supervisor-recovery-tools.test.ts — add behavioral tests alongside existing source-based tests
extensions/tests/supervisor-force-merge.test.ts — add behavioral merge test
extensions/tests/supervisor-alerts.test.ts — add behavioral alert delivery test
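For the behavioral merge test, one option is to inject the merge step so the test can observe whether resume actually calls it. The sketch below assumes that design; `Lane`, `MergeFn`, `handleForceMerge`, `resumeBatch`, the branch names, and the rule that force-merge only flags a lane while resume performs the merge are all hypothetical, chosen to mirror the bullet about resume after force-merge, not taken from the extension.

```typescript
import assert from "node:assert/strict";

// HYPOTHETICAL lane shape and handlers; the extension's real types differ.
type LaneStatus = "running" | "failed" | "force-merge-pending" | "merged";

interface Lane {
  id: string;
  branch: string;
  status: LaneStatus;
}

// Merging is injected so the test can observe it. An integration-level
// variant would pass a function that runs a real `git merge`.
type MergeFn = (branch: string, into: string) => void;

// Stand-in for orch_force_merge: in this sketch it only flags the lane.
function handleForceMerge(lanes: Map<string, Lane>, laneId: string): void {
  const lane = lanes.get(laneId);
  if (!lane) throw new Error(`unknown lane ${laneId}`);
  if (lane.status !== "failed") throw new Error(`lane ${laneId} is ${lane.status}, not failed`);
  lane.status = "force-merge-pending";
}

// Stand-in for batch resume: merges every lane flagged for force-merge.
function resumeBatch(lanes: Map<string, Lane>, target: string, merge: MergeFn): void {
  for (const lane of lanes.values()) {
    if (lane.status !== "force-merge-pending") continue;
    merge(lane.branch, target);
    lane.status = "merged";
  }
}

function makeLanes(): Map<string, Lane> {
  return new Map<string, Lane>([
    ["lane-a", { id: "lane-a", branch: "orch/lane-a", status: "failed" }],
    ["lane-b", { id: "lane-b", branch: "orch/lane-b", status: "running" }],
  ]);
}

// Spy: record every merge request instead of touching git.
const calls: Array<{ branch: string; into: string }> = [];
const spyMerge: MergeFn = (branch, into) => {
  calls.push({ branch, into });
};

const lanes = makeLanes();
handleForceMerge(lanes, "lane-a");
assert.equal(calls.length, 0); // in this sketch, flagging alone does not merge
resumeBatch(lanes, "main", spyMerge);
assert.deepEqual(calls, [{ branch: "orch/lane-a", into: "main" }]);
assert.equal(lanes.get("lane-a")?.status, "merged");
assert.equal(lanes.get("lane-b")?.status, "running");
console.log("force-merge resume check passed");
```

At the integration level the issue asks for, `spyMerge` could be swapped for a function that runs `execFileSync("git", ["merge", "--no-ff", branch], { cwd: repoDir })` from `node:child_process` in a throwaway repository with the target branch checked out; the test could then confirm the merge with `git merge-base --is-ancestor <branch> <target>`, which exits 0 when the branch tip is reachable from the target.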
Priority
Medium: the source-based tests confirm that the code exists, but behavioral tests are needed to catch logic bugs such as stale references and missing recomputations.