docs(skill): list_recent_decisions hardening [skip-runtime-e2e] #61
Merged
Two prescriptive additions to prevent agent-runtime hallucinations observed in 2026-05-08 Cursor IDE GUI evidence (#1982):
1. "Invoke the tool directly — do not pre-flight-check." The Cursor agent invented an "auth required" error path before the MCP tool ever returned. The MCP server's response is authoritative; agents shouldn't speculate about auth or descriptor-file state before making the call.
2. "Reporting integrity." If the user prompt asks for a SMOKE_RESULT marker, derive it from the actual tool response, never from a guess at the response shape. The Cursor agent in the captured run wrote a SMOKE_RESULT for an upgrade envelope when the tool actually returned decisions, and vice versa. Substituting one shape for the other corrupts both UX integrity and any release-prep test harness reading the marker.
Signed-off-by: Saurabh Jain <saurabhjain1592@gmail.com>
…runtime-e2e] Trims the v1.4.0 telemetry section to the project's standard 2-bullet / 5-line compact format at end of release entry. Same wire-shape contract documented; just shorter prose. Sibling to axonflow-openclaw-plugin#120 + axonflow-claude-plugin#74. Signed-off-by: Saurabh Jain <saurabhjain1592@gmail.com>
Adds prescriptive language to the V1.1 list_recent_decisions surface that prevents the agent-runtime hallucinations observed in the 2026-05-08 Cursor IDE GUI evidence (axonflow-cursor-plugin#62).
What changed
Two prescriptive additions to the SKILL.md / tool description:
1. "Invoke the tool directly — do not pre-flight-check." During the Cursor IDE GUI evidence run, the LLM agent invented an "Authentication required (-32001)" error path before the MCP tool ever returned. A direct curl reproduction with the same headers Cursor sends returns the correct V1.1 upgrade envelope. The MCP server is authoritative; agents shouldn't speculate about auth or descriptor-file state before making the call.
2. "Reporting integrity." The Cursor agent wrote a SMOKE_RESULT marker for an upgrade envelope when the tool actually returned a decisions array, and vice versa. If the user prompt asks for a structured output marker, derive it from the actual tool response; never substitute one shape for the other.
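The "Reporting integrity" rule can be illustrated with a minimal TypeScript sketch. It is not the plugin's code: the field names `decisions` and `upgrade`, the helper names, and the `SMOKE_RESULT: <kind>` marker format are all assumptions made for illustration, since the PR does not spell out the wire shape. The point is only that the marker is computed from the response that actually came back.

```typescript
// Hypothetical response shape: the field names "decisions" and "upgrade"
// are assumptions for illustration, not the real wire contract.
type ToolResponse = Record<string, unknown>;

type SmokeKind = "decisions" | "upgrade_envelope" | "unknown";

// Classify the response by inspecting what the tool actually returned.
function classifyResponse(response: ToolResponse): SmokeKind {
  if (Array.isArray(response["decisions"])) {
    return "decisions";
  }
  const upgrade = response["upgrade"];
  if (typeof upgrade === "object" && upgrade !== null) {
    return "upgrade_envelope";
  }
  // Neither shape matched: report that honestly instead of guessing.
  return "unknown";
}

// Build the marker (hypothetical format) from the classified response only.
function smokeResultMarker(response: ToolResponse): string {
  return `SMOKE_RESULT: ${classifyResponse(response)}`;
}
```

Returning an explicit `unknown` kind when neither shape matches keeps an unexpected response visible to a test harness instead of silently mapping it onto one of the two expected shapes.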
Three-rule DOD
Reference
axonflow-cursor-plugin/runtime-e2e/list-recent-decisions/EVIDENCE.md: captured Cursor IDE LLM-narration gap that motivated this hardening.
Skip-runtime-e2e justification
This PR only modifies the agent-side prose / TS-string description that the LLM agent reads when deciding how to interact with the list_recent_decisions MCP tool. No platform code, no SDK code, and no plugin runtime path is changed; the existing runtime-e2e/list-recent-decisions/test.sh continues to pass against the live stack (verified yesterday during the V1.1 DoD sweep; see axonflow-enterprise/runtime-e2e/v1_1_full_dod_sweep/plugin_host_runtime/).
The hardening prevents agent-runtime hallucinations (the Cursor agent inventing an "auth required" path before invoking the tool); that is an agent-prompt concern, not a runtime-wire concern.