Message history state machine: invariant by construction #170
Merged
seamus-brady merged 1 commit into main on Apr 26, 2026
The cog has been dying mid-cycle with API 400s like
"messages.40.content.0: unexpected `tool_use_id`", caused by orphan
tool_result blocks whose matching tool_use was lost. Each fix so far
patched one new shape; the next code path introduced another.
Root cause: state.messages was a public List(Message) anyone could
list.append to. Provider-API invariants (alternation, leading-user,
tool_use ↔ tool_result pairing) lived only in a reactive sweep at
the LLM boundary that covered one direction of one rule. New
mutations kept introducing new violations.
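To make that failure concrete, here is a minimal Gleam sketch of a check that walks a raw `List(Message)` and reports the first orphan tool_result with a path shaped like the 400 above. The `Role`/`Block`/`Message` types and `find_orphan` are hypothetical stand-ins rather than the cog's real code, and the rule checked (each tool_result must answer a tool_use from the immediately preceding assistant turn) is a simplification, not the provider's full validation.

```gleam
import gleam/int
import gleam/list

// Hypothetical, simplified types; the cog's real Message type is not shown.
pub type Role {
  User
  Assistant
}

pub type Block {
  Text(String)
  ToolUse(id: String)
  ToolResult(tool_use_id: String)
}

pub type Message {
  Message(role: Role, content: List(Block))
}

/// Report the first tool_result whose id was not issued by the immediately
/// preceding assistant turn, as a path shaped like the provider's 400.
pub fn find_orphan(messages: List(Message)) -> Result(Nil, String) {
  check(messages, 0, [])
}

fn check(
  messages: List(Message),
  index: Int,
  issued: List(String),
) -> Result(Nil, String) {
  case messages {
    [] -> Ok(Nil)
    // Assistant turn: remember which tool_use ids it issued.
    [Message(Assistant, blocks), ..rest] ->
      check(rest, index + 1, tool_use_ids(blocks))
    // User turn: every tool_result must answer one of those ids.
    [Message(User, blocks), ..rest] ->
      case first_orphan(blocks, 0, issued) {
        Ok(block_index) ->
          Error(
            "messages."
            <> int.to_string(index)
            <> ".content."
            <> int.to_string(block_index)
            <> ": unexpected tool_use_id",
          )
        Error(Nil) -> check(rest, index + 1, [])
      }
  }
}

// Index of the first tool_result block not answering an issued id.
fn first_orphan(
  blocks: List(Block),
  i: Int,
  issued: List(String),
) -> Result(Int, Nil) {
  case blocks {
    [] -> Error(Nil)
    [ToolResult(tool_use_id: id), ..rest] ->
      case list.contains(issued, id) {
        True -> first_orphan(rest, i + 1, issued)
        False -> Ok(i)
      }
    [_, ..rest] -> first_orphan(rest, i + 1, issued)
  }
}

fn tool_use_ids(blocks: List(Block)) -> List(String) {
  list.filter_map(blocks, fn(block) {
    case block {
      ToolUse(id: id) -> Ok(id)
      _ -> Error(Nil)
    }
  })
}
```

For example, `find_orphan(list.append([Message(User, [Text("go")])], [Message(User, [ToolResult("t1")])]))` returns `Error("messages.1.content.0: unexpected tool_use_id")`: nothing in a bare `List(Message)` stops a handler from appending that.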
Structural fix: opaque MessageHistory with one chokepoint (`add`)
that maintains every invariant by construction.
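As a rough illustration of why the opaque type closes off direct appends, the sketch below assumes text-only messages and a hypothetical `CognitiveState` record (its `cycle` field is a placeholder). It enforces only leading-user and alternation; orphan stripping is left out. `add_user_text` is named in the PR, but its signature here is a guess. Everything sits in one file so it runs; the opacity only takes effect once `MessageHistory` lives in its own module, where callers can no longer reach the underlying list.

```gleam
import gleam/list

// Hypothetical, text-only message type for this sketch.
pub type Role {
  User
  Assistant
}

pub type Message {
  Message(role: Role, text: String)
}

/// Opaque: from another module, callers can neither build nor destructure
/// the underlying list, so `list.append` on it will not type-check.
pub opaque type MessageHistory {
  MessageHistory(newest_first: List(Message))
}

pub fn new() -> MessageHistory {
  MessageHistory([])
}

/// Trimmed chokepoint: enforces leading-user and alternation only.
pub fn add(history: MessageHistory, message: Message) -> MessageHistory {
  case history.newest_first, message.role {
    // Leading assistant: silently dropped.
    [], Assistant -> history
    [], User -> MessageHistory([message])
    [last, ..older], _ ->
      case last.role == message.role {
        // Same role twice in a row: coalesce into one turn.
        True ->
          MessageHistory([
            Message(last.role, last.text <> "\n" <> message.text),
            ..older
          ])
        False -> MessageHistory([message, ..history.newest_first])
      }
  }
}

/// Convenience wrapper; name from the PR, signature assumed.
pub fn add_user_text(history: MessageHistory, text: String) -> MessageHistory {
  add(history, Message(User, text))
}

/// Export in wire (oldest-first) order.
pub fn for_send(history: MessageHistory) -> List(Message) {
  list.reverse(history.newest_first)
}

// Hypothetical state record; `cycle` stands in for the other fields.
pub type CognitiveState {
  CognitiveState(cycle: Int, messages: MessageHistory)
}

/// Before: CognitiveState(..state, messages: list.append(state.messages, [msg]))
/// After: the typed API is the only way to change `messages`.
pub fn on_user_input(state: CognitiveState, text: String) -> CognitiveState {
  CognitiveState(..state, messages: add_user_text(state.messages, text))
}
```

Two consecutive `on_user_input` calls leave a single coalesced user turn in `for_send(state.messages)`, and an assistant message added to an empty history is dropped.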
* `add` enforces:
- leading assistant → silently dropped
- consecutive same-role → coalesced (alternation invariant)
- user message containing tool_result → orphan tool_results (no matching
tool_use in the prior assistant message) dropped; the message itself
is dropped if that empties it
* `from_list` (used at ingest from disk / tests) runs the full
sanitisation pipeline including the opposite direction
(synthesise stub tool_results for orphan tool_uses)
* `for_send` returns wire-ready List(Message) — already valid by
construction, no boundary repair needed
* All ~30 mutation sites across cognitive.gleam, cognitive/agents.gleam,
cognitive/safety.gleam, cognitive/llm.gleam now go through the typed
API. Direct list.append on state.messages is impossible.
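The three `add` rules can be sketched in Gleam as follows. This is a minimal sketch under stated assumptions, not the PR's 442-line module: block types are simplified, the history is stored newest-first so the previous message is the list head (a choice made here, not taken from the PR), and "the prior assistant message" is read as the most recent assistant turn.

```gleam
import gleam/list

// Hypothetical, simplified types; the PR's real Message type is not shown.
pub type Role {
  User
  Assistant
}

pub type Block {
  Text(String)
  ToolUse(id: String, name: String)
  ToolResult(tool_use_id: String, content: String)
}

pub type Message {
  Message(role: Role, content: List(Block))
}

pub opaque type MessageHistory {
  // Newest first, so the previous message is the head of the list.
  MessageHistory(newest_first: List(Message))
}

pub fn new() -> MessageHistory {
  MessageHistory([])
}

/// The single chokepoint: every mutation passes through here.
pub fn add(history: MessageHistory, message: Message) -> MessageHistory {
  let MessageHistory(rev) = history
  case rev, message.role {
    // Rule 1: a leading assistant message is silently dropped.
    [], Assistant -> history
    _, User -> add_user(rev, message)
    _, Assistant -> push_or_coalesce(rev, message)
  }
}

// Rule 3: strip tool_results whose id the most recent assistant turn never
// issued; drop the message entirely if nothing is left.
fn add_user(rev: List(Message), message: Message) -> MessageHistory {
  let issued = case list.find(rev, fn(m: Message) { m.role == Assistant }) {
    Ok(Message(_, blocks)) -> tool_use_ids(blocks)
    Error(Nil) -> []
  }
  let kept =
    list.filter(message.content, fn(block) {
      case block {
        ToolResult(tool_use_id: id, ..) -> list.contains(issued, id)
        _ -> True
      }
    })
  case kept {
    [] -> MessageHistory(rev)
    _ -> push_or_coalesce(rev, Message(User, kept))
  }
}

// Rule 2: consecutive same-role messages are merged into one turn.
fn push_or_coalesce(rev: List(Message), message: Message) -> MessageHistory {
  case rev {
    [] -> MessageHistory([message])
    [last, ..older] ->
      case last.role == message.role {
        True ->
          MessageHistory([
            Message(last.role, list.append(last.content, message.content)),
            ..older
          ])
        False -> MessageHistory([message, ..rev])
      }
  }
}

fn tool_use_ids(blocks: List(Block)) -> List(String) {
  list.filter_map(blocks, fn(block) {
    case block {
      ToolUse(id: id, ..) -> Ok(id)
      _ -> Error(Nil)
    }
  })
}

/// Wire order; already valid by construction, so no boundary repair.
pub fn for_send(history: MessageHistory) -> List(Message) {
  let MessageHistory(rev) = history
  list.reverse(rev)
}
```

Adding an assistant turn to an empty history is a no-op, two user turns in a row merge into one, and a user turn whose only block is a tool_result for an unknown id disappears entirely.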
Removed:
- llm/message_repair.gleam — its repair pipeline is intrinsic to
MessageHistory.from_list. The reactive sweep at the LLM boundary
(repair_orphans_and_warn) is gone.
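The opposite-direction repair that `from_list` runs at ingest (stub tool_results for orphan tool_uses) might look roughly like this. `synthesise_stubs`, the stub text, and placing stubs ahead of the existing reply blocks are all assumptions; the sketch also assumes roles already alternate (for instance, after coalescing).

```gleam
import gleam/list

// Hypothetical, simplified types for this sketch only.
pub type Role {
  User
  Assistant
}

pub type Block {
  Text(String)
  ToolUse(id: String)
  ToolResult(tool_use_id: String, content: String)
}

pub type Message {
  Message(role: Role, content: List(Block))
}

// Assumed placeholder text; the PR does not say what its stubs contain.
const stub_content = "[no result recorded: cycle was interrupted]"

/// Ingest-time repair: every tool_use left unanswered by the following user
/// turn gets a synthetic stub tool_result. Assumes roles already alternate.
pub fn synthesise_stubs(messages: List(Message)) -> List(Message) {
  case messages {
    [] -> []
    // An assistant turn followed by its user reply: fill in missing results.
    [Message(Assistant, blocks) as assistant, Message(User, reply), ..rest] -> [
      assistant,
      Message(User, with_stubs(blocks, reply)),
      ..synthesise_stubs(rest)
    ]
    // History ends on an assistant turn (a half-completed cycle on disk).
    [Message(Assistant, blocks) as assistant] ->
      case with_stubs(blocks, []) {
        [] -> [assistant]
        stubs -> [assistant, Message(User, stubs)]
      }
    [first, ..rest] -> [first, ..synthesise_stubs(rest)]
  }
}

// Stubs for unanswered tool_use ids, placed ahead of the existing reply.
fn with_stubs(assistant_blocks: List(Block), reply: List(Block)) -> List(Block) {
  let answered =
    list.filter_map(reply, fn(block) {
      case block {
        ToolResult(tool_use_id: id, ..) -> Ok(id)
        _ -> Error(Nil)
      }
    })
  let stubs =
    list.filter_map(assistant_blocks, fn(block) {
      case block {
        ToolUse(id: id) ->
          case list.contains(answered, id) {
            True -> Error(Nil)
            False -> Ok(ToolResult(tool_use_id: id, content: stub_content))
          }
        _ -> Error(Nil)
      }
    })
  list.append(stubs, reply)
}
```

A history persisted mid-cycle, ending on an assistant turn whose tool_use `t1` was never answered, comes back with an extra user turn holding a stub tool_result for `t1`; a partially answered turn gains stubs only for the missing ids.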
Tests:
- 16 new tests in test/llm/message_history_test.gleam covering each
invariant with the exact malformations that caused production bugs.
- 2155 passing (gained 16, lost the message_repair_test cases).
Build clean, format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The cog has been dying mid-cycle with API 400s of the form `messages.40.content.0: unexpected tool_use_id`, caused by orphan `tool_result` blocks whose matching `tool_use` was lost. Each historical fix patched one new shape; the next code path introduced another. This PR ends the family at the root.
Root cause: `state.messages` was a public `List(Message)`, and ~30 handlers across the cog `list.append`-ed to it directly. Provider-API invariants (alternation, leading-user, `tool_use` ↔ `tool_result` pairing) lived only in a reactive sweep at the LLM boundary that covered one direction of one rule. New mutations kept introducing new violations.
Structural fix: opaque `MessageHistory` with one chokepoint (`add`) that maintains every invariant by construction. Direct `list.append` on `state.messages` is now impossible; the type prevents it.
What's enforced where
- `add` silently drops a leading Assistant; `from_list` strips one at ingest.
- `add` coalesces consecutive same-role messages.
- `add` strips any tool_result whose `tool_use_id` has no matching tool_use in the prior assistant message; if that empties the message, the message is dropped.
- `from_list` injects synthetic stub tool_results at ingest (the opposite direction; useful when loading persisted history with a half-completed cycle).
- `for_send` exports a wire-ready `List(Message)`.
The reactive boundary sweep (`repair_orphans_and_warn` + `llm/message_repair.gleam`) is deleted; there's nothing left for it to repair.
What changed
- `src/llm/message_history.gleam` (442 lines): opaque type, `add` chokepoint, `from_list` ingest sanitisation, `to_list`/`for_send` exports
- `test/llm/message_history_test.gleam` (260 lines): 16 tests, one per invariant, each constructing the exact malformation that caused a production bug
- `src/agent/cognitive_state.gleam`: `state.messages: MessageHistory` (was `List(Message)`)
- `src/agent/cognitive.gleam`, `src/agent/cognitive/agents.gleam`, `src/agent/cognitive/llm.gleam`, `src/agent/cognitive/safety.gleam`: every direct `list.append(state.messages, ...)` rewritten through `message_history.add`/`add_user`/`add_assistant`/`add_user_text`
- `src/llm/message_repair.gleam`, `test/llm/message_repair_test.gleam`: removed; the module is redundant because its pipeline is intrinsic to `MessageHistory.from_list`
Test plan
- `gleam build` clean
- `gleam format` clean
- `gleam test`: 2155 passing (gained 16 from the new tests, lost the deleted `message_repair_test` cases since the module is gone)
- `add_user_with_orphan_tool_result_strips_it_test` reproduces the operator-reported cog-killer
- `gleam run` against the live agent; confirm no API 400s after a turn that previously dropped the cog
🤖 Generated with Claude Code