chore(prompts): drop vestigial [MUTATION COMPLETE] / [SUMMARY] protocol by KE7 · Pull Request #38 · KE7/helix

KE7 · 2026-05-20T19:07:43Z

Summary

The agent prompt templates instruct each coding agent (Claude Code / Codex / Cursor / Gemini / opencode) to emit a sentinel — [MUTATION COMPLETE], [MERGE COMPLETE], or [SEED GENERATION COMPLETE] — when it's done. HELIX never parses any of these sentinels. Subprocess exit is the actual termination signal; every backend handles its own stop logic internally.

Separately, mutator.parse_mutation_summary scans for [SUMMARY]…[END SUMMARY] key/value blocks that no prompt instructs the agent to emit and no production code path invokes.

So the apparatus is vestigial on both ends:

| Marker | Asks the agent for it? | Parses it? | Wired to anything? |
| --- | --- | --- | --- |
| `[MUTATION COMPLETE]` | yes | no | no |
| `[MERGE COMPLETE]` | yes | no | no |
| `[SEED GENERATION COMPLETE]` | yes | no | no |
| `[SUMMARY]…[END SUMMARY]` | no | yes (`parse_mutation_summary`) | no (dead code) |

What changed

src/helix/mutator.py: drop the trailing "print [MUTATION COMPLETE]" / "print [SEED GENERATION COMPLETE]" lines from AUTONOMOUS_SYSTEM_PROMPT, MUTATION_PROMPT_TEMPLATE, SEEDLESS_INIT_PROMPT_TEMPLATE. Delete parse_mutation_summary (zero production callers).
src/helix/merger.py: drop the trailing "print [MERGE COMPLETE]" line from MERGE_PROMPT_TEMPLATE.
tests/unit/test_semlog.py: delete — sole consumer of parse_mutation_summary.
tests/unit/test_mutator.py, tests/unit/test_mutator_seedless.py, tests/unit/test_merger.py: drop three prompt-substring assertions that pinned the presence of the removed sentinels.

Net: -221 lines / +1 line. Every editing instruction in every prompt is preserved verbatim — the only text removed from prompts is the "print sentinel X when done" instruction and the sentinel itself.

Why this is safe

Termination model is unchanged because nothing in HELIX ever depended on the sentinel:

Each agent backend has its own internal stop logic (model finish_reason, opencode's step_finish, Codex CLI exit, etc.). HELIX waits on subprocess exit and treats whatever was written to stdout/stderr as the captured transcript.
parse_mutation_summary was already returning {} in practice because the prompt never asked the agent for [SUMMARY]…[END SUMMARY] blocks. Removing the parser changes no observable behaviour.
Mutator/merger output processing reads agent stdout for tool-call counting (_count_*_tool_events), session-id capture, and rate-limit detection — none of which look for the sentinel.
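The termination model described above can be illustrated with a minimal sketch. It is not HELIX's actual runner: `run_agent` is a hypothetical helper, and the "agent" is a stand-in child Python process. It only shows that waiting on subprocess exit and capturing the transcript needs no sentinel in the output.

```python
import subprocess
import sys


def run_agent(cmd: list[str], timeout_s: float = 60.0) -> tuple[int, str]:
    """Run an agent CLI to completion and return (exit_code, transcript).

    Hypothetical helper: process exit is the only stop signal, and the
    transcript is whatever the child wrote to stdout/stderr.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same transcript
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout


# Stand-in "agent": a child process that prints one line and exits.
fake_agent = [sys.executable, "-c", "print('edited foo.py')"]
exit_code, transcript = run_agent(fake_agent)
```

Nothing in the sketch scans the transcript for a completion marker; the caller learns the agent is done when `subprocess.run` returns.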

Test plan

uv run pytest tests/unit/ -q — 851 passed (873 → 851 after removing 22 sentinel-protocol assertions across test_semlog.py and three substring checks).
uv run mypy --strict src/helix/ — clean (29 source files).
grep -rn "MUTATION COMPLETE\|MERGE COMPLETE\|SEED GENERATION COMPLETE\|parse_mutation_summary" src/ tests/ — zero matches.
CI runs the same two commands on Python 3.11 and 3.12.

Independent of PR #37

This PR is off origin/main directly, not stacked on the cache PR. Either can land in either order.

🤖 Generated with Claude Code

The four prompt templates (``AUTONOMOUS_SYSTEM_PROMPT``, ``MUTATION_PROMPT_TEMPLATE``, ``SEEDLESS_INIT_PROMPT_TEMPLATE``, ``MERGE_PROMPT_TEMPLATE``) instructed the agent to emit a ``[MUTATION COMPLETE]`` / ``[MERGE COMPLETE]`` / ``[SEED GENERATION COMPLETE]`` sentinel when finished. HELIX never parsed any of those sentinels: subprocess exit is the actual stop signal and every backend handles termination internally.

Separately, ``mutator.parse_mutation_summary`` scanned for ``[SUMMARY]...[END SUMMARY]`` key/value blocks that no prompt asked the agent to emit and no production code path called. Dead code, dead tests, dead protocol on both ends.

Removed:
- the trailing "print this completion marker" sentence + sentinel line from all four prompt templates (editing instructions preserved).
- ``mutator.parse_mutation_summary`` (zero production callers).
- ``tests/unit/test_semlog.py`` (sole consumer of the parser).
- three prompt-substring assertions in ``test_mutator.py``, ``test_mutator_seedless.py``, and ``test_merger.py`` that pinned the presence of the removed sentinel strings.

851 unit tests pass (873 → 851 after dropping 22 sentinel-protocol assertions); ``mypy --strict src/helix/`` clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mirror GEPA O.A.'s ``_build_reflection_prompt_template`` accumulator pattern (``gepa/optimize_anything.py:501-596``) in ``build_mutation_prompt`` and ``build_merge_prompt``: each section is appended only when its content is non-empty, instead of rendering a placeholder string like ``"(no additional background provided)"`` / ``"(no scores recorded)"`` / ``"(no diff — candidates are identical)"`` / ``"(no evaluation data)"`` that taught the agent nothing.

Sections now optional in ``build_mutation_prompt`` (each is omitted under the stated condition):
- ``## Objective``: when ``objective`` is empty.
- ``## Current Evaluation Scores``: when ``eval_result.scores`` is empty.
- ``## Diagnostics``: when neither ``per_example_side_info`` nor ``side_info`` is populated (already conditional pre-PR).
- ``## Evaluator Notes``: when ``asi.log`` is empty (already conditional pre-PR).
- ``## Evaluator Output``: when the evaluator succeeded and both stdout and stderr are empty. A failed evaluator (non-zero ``_returncode``) still emits the section with ``(no stdout)`` / ``(no stderr)`` placeholders, because the agent needs to know the failure produced no output to inspect (a meaningful diagnostic on its own). Partial coverage now renders only the stream that has content instead of padding the empty one.
- ``### Extra Evaluator Info``: when there are no free-form ASI keys (already conditional pre-PR).
- ``## Background / Context``: when ``background`` is None/empty.

Sections now optional in ``build_merge_prompt`` (each is omitted under the stated condition):
- ``## Objective``: when ``objective`` is empty.
- ``## Candidate A Strengths``: when ``eval_result_a`` is None.
- ``## Candidate B Strengths``: when ``eval_result_b`` is None.
- ``## Diff (B relative to A)``: when the diff is empty after stripping.
- ``## Background / Context``: when ``background`` is None/empty.

Always emitted:
- ``AUTONOMOUS_SYSTEM_PROMPT`` (the four "Task instructions" bullets).
- ``## Your Task`` (the editing-instruction block).

Unchanged: ``## Turn Budget`` is emitted only when ``max_turns`` is provided (already conditional pre-PR).

Removed ``MUTATION_PROMPT_TEMPLATE`` and ``MERGE_PROMPT_TEMPLATE`` constants since the prompt is now assembled dynamically. Extracted new helpers ``_render_scores_section``, ``_render_extra_asi``, ``_render_diagnostics`` for consistency with the existing ``_render_evaluator_notes`` / ``_render_evaluator_output_fallback``.

Tests updated:
- ``test_default_background_when_none`` → ``test_background_section_omitted_when_none`` in both ``test_mutator.py`` and ``test_merger.py``.
- ``test_no_scores_fallback`` → ``test_scores_section_omitted_when_empty``.
- ``test_handles_none_eval_results`` → ``test_strengths_sections_omitted_when_eval_results_none``.
- ``test_empty_diff_shows_fallback`` → ``test_diff_section_omitted_when_empty``.

Each updated test now asserts the section is *absent* from the prompt (positive verification of the new behaviour) AND that the previous placeholder string is also absent (so a future regression that reintroduces the placeholder can't pass).

851 unit tests pass; ``mypy --strict src/helix/`` clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
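The accumulator pattern described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in ``build_mutation_prompt``: ``build_prompt_sketch`` and its parameters are hypothetical, and only the section headings are taken from the commit message.

```python
from __future__ import annotations


def build_prompt_sketch(
    task: str,
    objective: str = "",
    scores: dict[str, float] | None = None,
    background: str | None = None,
) -> str:
    """Assemble a prompt, appending a section only when it has content."""
    sections: list[str] = []

    if objective.strip():
        sections.append(f"## Objective\n{objective.strip()}")

    if scores:  # None and {} both skip the section entirely
        lines = "\n".join(f"- {name}: {value}" for name, value in sorted(scores.items()))
        sections.append(f"## Current Evaluation Scores\n{lines}")

    if background and background.strip():
        sections.append(f"## Background / Context\n{background.strip()}")

    # The task block is always emitted.
    sections.append(f"## Your Task\n{task}")
    return "\n\n".join(sections)


minimal = build_prompt_sketch("Edit the code.")
full = build_prompt_sketch(
    "Edit the code.", objective="Maximize accuracy", scores={"acc": 0.5}
)
```

With no optional inputs, ``minimal`` contains only the task block; no "(no ...)" placeholder strings are ever produced.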

…-backend doc

Two related changes to ``_turn_budget_section``:

1. **Article-agreement fix.** Pre-fix the section always rendered ``"You have a {N}-turn limit"``, which is ungrammatical for the 8 / 11 / 18 / 80s cases ("a 8-turn", "a 11-turn", "a 18-turn", "a 80-turn"...). New ``_indefinite_article(n)`` helper picks ``"a"`` vs ``"an"`` based on the spoken pronunciation of the leading digit group within HELIX's realistic max-turns range (1 ≤ n ≤ ~1000).

2. **Cross-backend enforcement docs.** ``--max-turns N`` is passed to the Claude Code CLI by ``_build_cli_args`` (``mutator.py:731-732``) and triggers hard subprocess-level enforcement via Claude's runtime (the ``subtype="error_max_turns"`` response handled at ``mutator.py:1667-1669``). None of the other installed backends (``codex``, ``cursor``, ``gemini``, ``opencode``) expose an equivalent CLI flag: verified against their ``--help`` output, none has ``--max-turns`` / ``--max-iterations`` / ``--turn-limit`` / ``--limit``. For those backends the in-prompt ``## Turn Budget`` section is a soft hint only; whether the agent self-honors it is entirely up to its own behaviour. The section is still emitted for every backend (soft hints have some value), but the docstring now states the enforcement asymmetry explicitly so callers depending on hard caps know to use the ``claude`` backend or add subprocess-level mechanisms (wall-clock timeout, sandbox limits) themselves.

Tests: new ``TestTurnBudgetArticleAgreement`` covers (a) consonant-leading numbers using ``"a"``, (b) vowel-leading numbers (8, 11, 18, 80s, 800s) using ``"an"``, and (c) ``max_turns=None`` returning empty.

854 unit tests pass (851 → 854); ``mypy --strict src/helix/`` clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
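One way to implement the article choice described in item 1 is sketched below. It is not the ``_indefinite_article`` helper from the PR, whose body is not shown here; ``indefinite_article_sketch`` is a hypothetical name, and the rule (leading digit group starts with 8, or is exactly 11 or 18) is an assumption about how English numerals are spoken.

```python
def indefinite_article_sketch(n: int) -> str:
    """Return "a" or "an" for a positive integer as it is read aloud.

    Assumed rule: the article depends on the leading spoken group.
    "eight...", "eleven" and "eighteen" start with a vowel sound;
    every other leading group (including "one...") does not.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    group = n
    while group >= 1000:  # 8000 is read "eight thousand": keep the leading group
        group //= 1000

    if str(group).startswith("8") or group in (11, 18):
        return "an"
    return "a"


def turn_limit_sentence(n: int) -> str:
    """Hypothetical rendering of the sentence quoted in the commit message."""
    return f"You have {indefinite_article_sketch(n)} {n}-turn limit"
```

For example, ``turn_limit_sentence(8)`` yields "You have an 8-turn limit", while 180 ("one hundred eighty") keeps "a".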

…bution

When a common ancestor is available, the merge prompt now renders TWO labelled diff sections, ``git diff ancestor..candidate_a`` and ``git diff ancestor..candidate_b``, instead of the single ``git diff candidate_a..candidate_b``. The agent can read off each parent's contribution directly rather than inferring three-way info from a two-way comparison.

This is the file-hunk-level analogue of GEPA's component-wise attribution at ``gepa/proposer/merge.py:163-191``:

    if pred_anc == pred_id1 or pred_anc == pred_id2:
        # one parent didn't change this predictor → take the other one's
        ...
    elif pred_anc != pred_id1 and pred_anc != pred_id2:
        # both diverged → tiebreak by score
        ...

GEPA's algorithm has named components. HELIX has a worktree, so we can't pick "component X from parent Y" deterministically, but feeding the agent the three-way diff structure GEPA's algorithm uses gives it the same shape of attribution information for free-form file edits.

Behavioural changes:
- ``merge()`` gains an optional ``ancestor: Candidate | None = None``. When provided, computes both ancestor-relative diffs and passes them to the prompt builder. When ``None``, falls back to the legacy single A↔B diff.
- ``build_merge_prompt`` gains three optional keyword-only parameters: ``ancestor_id``, ``diff_a_from_ancestor``, ``diff_b_from_ancestor``. Two-diff form requires all three; any half-configured combination defensively falls back to the single A↔B path.
- A dedicated ``MERGE_TASK_INSTRUCTIONS_TWO_DIFF`` task block accompanies the two-diff form. It explicitly tells the agent that Candidate A's contribution is already in the working tree (so it doesn't re-apply it) and that B's contribution is what needs to be brought in. Single-diff form retains the legacy task framing unchanged.
- ``evolution._run_evolution_impl`` resolves the ancestor candidate from the frontier's append-only candidate map (using the public ``frontier.candidates`` view) and passes it to ``merge()``. When the ancestor isn't resolvable (defensive: lineage / frontier drift), logs a warning that names the merge_id and falls back to single-diff.

Tests (6 new in ``test_merger.py``):
- ``test_emits_two_ancestor_relative_sections``: happy path renders both ancestor-relative sections and omits the legacy A↔B header.
- ``test_two_diff_form_uses_two_diff_task_block`` / ``test_single_diff_form_uses_single_diff_task_block``: regression pins on which task-instruction block accompanies which diff form.
- ``test_single_diff_fallback_when_ancestor_missing`` / ``test_single_diff_fallback_when_ancestor_id_only``: backward compat plus the half-configured-caller defensive fallback.
- ``test_two_diff_form_omits_empty_side``: one ancestor diff empty → only the populated side renders.
- ``test_ancestor_triggers_two_diff_form`` / ``test_no_ancestor_uses_single_diff_form``: ``merge()``-level assertions on the exact ``get_diff`` call sequence (two ancestor-anchored calls vs one A↔B call) and the resulting prompt content.

862 unit tests pass (860 → 862); ``mypy --strict src/helix/`` clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
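The choice between the two-diff form and the single-diff fallback can be sketched as below. This is an assumption-laden illustration, not the body of ``build_merge_prompt``: ``select_diff_sections`` is a hypothetical helper, the ancestor-relative headings are invented for the example, and only ``## Diff (B relative to A)`` and the three parameter names come from the commit messages.

```python
from __future__ import annotations


def select_diff_sections(
    diff_ab: str,
    *,
    ancestor_id: str | None = None,
    diff_a_from_ancestor: str | None = None,
    diff_b_from_ancestor: str | None = None,
) -> list[tuple[str, str]]:
    """Return (heading, diff) pairs to render in a merge prompt."""
    # Two-diff form only when all three ancestor inputs were supplied;
    # any half-configured combination falls back to the single A<->B diff.
    if (
        ancestor_id is not None
        and diff_a_from_ancestor is not None
        and diff_b_from_ancestor is not None
    ):
        candidates = [
            (f"## Diff (A relative to ancestor {ancestor_id})", diff_a_from_ancestor),
            (f"## Diff (B relative to ancestor {ancestor_id})", diff_b_from_ancestor),
        ]
    else:
        candidates = [("## Diff (B relative to A)", diff_ab)]

    # Sections whose diff is empty after stripping are omitted.
    return [(heading, body) for heading, body in candidates if body.strip()]


two_diff = select_diff_sections(
    "ab", ancestor_id="anc", diff_a_from_ancestor="da", diff_b_from_ancestor="db"
)
half_configured = select_diff_sections("ab", ancestor_id="anc")
one_side_empty = select_diff_sections(
    "ab", ancestor_id="anc", diff_a_from_ancestor="", diff_b_from_ancestor="db"
)
```

Keeping the selection in one pure function makes each case (happy path, half-configured caller, empty side) checkable without invoking git.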

KE7 and others added 4 commits May 20, 2026 12:07

KE7 merged commit 394a1b7 into main May 30, 2026
2 checks passed

KE7 deleted the chore/drop-vestigial-mutation-summary branch May 30, 2026 23:54

KE7 merged 4 commits into main from chore/drop-vestigial-mutation-summary
