docs(skill): list_recent_decisions hardening [skip-runtime-e2e] #61

Merged: saurabhjain1592 merged 3 commits into main from docs/skill-list-decisions-llm-hardening on May 8, 2026
@saurabhjain1592 saurabhjain1592 commented May 8, 2026

Adds prescriptive language to the V1.1 list_recent_decisions surface that prevents the agent-runtime hallucinations observed in the 2026-05-08 Cursor IDE GUI evidence (axonflow-cursor-plugin#62).

What changed

Two prescriptive additions to the SKILL.md / tool description:

  1. "Invoke the tool directly — do not pre-flight-check." During the Cursor IDE GUI evidence run, the LLM agent invented an "Authentication required (-32001)" error path before the MCP tool ever returned. A direct curl reproduction with the same headers Cursor sends returns the correct V1.1 upgrade envelope. The MCP server's response is authoritative; agents shouldn't speculate about auth or descriptor-file state before making the call.

  2. "Reporting integrity." The Cursor agent wrote a SMOKE_RESULT marker for an upgrade envelope when the tool actually returned a decisions array, and vice versa. If the user prompt asks for a structured output marker, derive it from the actual tool response — never substitute one shape for the other.
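
As an illustration of the reporting-integrity rule, here is a minimal TypeScript sketch of deriving the marker from the response that actually came back. The field names (`decisions`, `upgrade`), the helper name `smokeResultMarker`, and the marker strings are assumptions made for this example; they are not the documented V1.1 wire shape or the SMOKE_RESULT format used by the test harness.

```typescript
// Hypothetical sketch: `decisions` and `upgrade` are assumed field names,
// and the marker text is illustrative, not the real harness format.
function smokeResultMarker(response: unknown): string {
  if (typeof response === "object" && response !== null) {
    const body = response as Record<string, unknown>;

    // The tool returned a decisions array: report exactly that.
    if (Array.isArray(body.decisions)) {
      return `SMOKE_RESULT: decisions count=${body.decisions.length}`;
    }

    // The tool returned an upgrade envelope: report exactly that.
    if ("upgrade" in body) {
      return "SMOKE_RESULT: upgrade_envelope";
    }
  }

  // Unknown shape: fail loudly instead of guessing one of the two markers.
  throw new Error("Unrecognized list_recent_decisions response shape");
}
```

The point of the sketch is that the marker is a pure function of the tool response, and an unrecognized shape raises an error rather than falling back to either marker.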

Three-rule DOD

  • Rule 0 — Runtime proof: ride-along on the existing list_recent_decisions runtime-e2e — no behavior change in the platform or the plugin runtime, only agent-side framing.
  • Rule 1 — Self-review: small textual change, both sentences read cleanly. No code path affected.
  • Rule 2 — No deferred bugs: none in path.

Reference

  • axonflow-cursor-plugin/runtime-e2e/list-recent-decisions/EVIDENCE.md — captured Cursor IDE LLM-narration gap that motivated this hardening.

Skip-runtime-e2e justification

This PR only modifies the agent-side prose / TS-string description that the LLM agent reads when deciding how to interact with the list_recent_decisions MCP tool. No platform code, no SDK code, no plugin runtime path is changed — the existing runtime-e2e/list-recent-decisions/test.sh continues to pass against the live stack (verified yesterday during the V1.1 DoD sweep — see axonflow-enterprise/runtime-e2e/v1_1_full_dod_sweep/plugin_host_runtime/).

The hardening is to prevent agent-runtime hallucinations (Cursor agent inventing an "auth required" path before invoking the tool) — that is an agent-prompt concern, not a runtime-wire concern.

Two prescriptive additions to prevent agent-runtime hallucinations
observed in 2026-05-08 Cursor IDE GUI evidence (#1982):

1. "Invoke the tool directly — do not pre-flight-check." The Cursor
   agent invented an "auth required" error path before the MCP tool
   ever returned. The MCP server's response is authoritative; agents
   shouldn't speculate about auth or descriptor-file state before
   making the call.

2. "Reporting integrity." If the user prompt asks for a SMOKE_RESULT
   marker, derive it from the actual tool response, never from a
   guess at the response shape. The Cursor agent in the captured run
   wrote a SMOKE_RESULT for an upgrade envelope when the tool actually
   returned decisions, and vice versa — substituting one shape for
   the other corrupts both UX integrity and any release-prep test
   harness reading the marker.

Signed-off-by: Saurabh Jain <saurabhjain1592@gmail.com>
…runtime-e2e]

Trims the v1.4.0 telemetry section to the project's standard 2-bullet / 5-line compact format at end of release entry. Same wire-shape contract documented; just shorter prose. Sibling to axonflow-openclaw-plugin#120 + axonflow-claude-plugin#74.

Signed-off-by: Saurabh Jain <saurabhjain1592@gmail.com>
@saurabhjain1592 saurabhjain1592 changed the title from "docs(skill): tighten list_recent_decisions agent guidance" to "docs(skill): list_recent_decisions hardening [skip-runtime-e2e]" on May 8, 2026
Signed-off-by: Saurabh Jain <saurabhjain1592@gmail.com>
@saurabhjain1592 saurabhjain1592 merged commit 3c990f6 into main May 8, 2026
8 checks passed
@saurabhjain1592 saurabhjain1592 deleted the docs/skill-list-decisions-llm-hardening branch May 8, 2026 14:21