
docs(verify): add Step 1 Orient — think out loud on claim + evidence#22

Merged
bensonwong merged 6 commits into main from
claude/verify-thinking-step-Dg8iX
Apr 13, 2026
Conversation

@bensonwong
Contributor

Every /verify call now opens with a short CoT preamble naming the
claim(s) and the evidence (or primary sources to look up if none was
provided). Gives users scope and progress clarity before the pipeline
runs, and surfaces reasoning rather than jumping straight into prepare.

Existing Prepare/Respond/Verify steps renumbered to 2/3/4 and inline
cross-references updated.

https://claude.ai/code/session_01LqnxD512JufmQd1VJn5F1P

claude added 2 commits April 13, 2026 12:34
Prepare latency is asymmetric: URLs and office files take ~20–30s,
while images and PDFs finish in ~0.5s after upload. Previously, Step 1
told the model to write the preamble "before running any command,"
which serialized the CoT gate ahead of slow network fetches and made
the user wait on silence during the ~30s prepare window.

Now the Orient step instructs the model to fire prepare in the SAME
assistant turn as the preamble — text streams to the user first, so
they read the orientation while the slow prepare is already running.
Added a latency table and three examples (slow URL, fast PDF, mixed)
to make the parallelism rule concrete.

https://claude.ai/code/session_01LqnxD512JufmQd1VJn5F1P
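The same-turn parallelism the commit describes can be sketched outside the skill file. A minimal asyncio illustration (all names and the URL are hypothetical; the sleep stands in for the real ~20–30s prepare) of firing the slow prepare first and then streaming the orientation text while it runs:

```python
import asyncio

async def slow_prepare(url: str) -> str:
    # Stand-in for the slow URL / office-file prepare step.
    await asyncio.sleep(0.1)  # shortened from ~20-30s for the example
    return f"prepared:{url}"

async def orient_and_prepare(url: str) -> str:
    # Fire prepare first, in the "same turn" ...
    prepare_task = asyncio.create_task(slow_prepare(url))
    # ... then stream the orientation preamble while it runs,
    # so the user reads text instead of waiting on silence.
    print("Claim: ...; Evidence: one URL (slow, ~20-30s to prepare).")
    return await prepare_task

result = asyncio.run(orient_and_prepare("https://example.com/report"))
```

The point is ordering: the task is created before any text is emitted, so the network fetch and the preamble overlap rather than serialize.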
@claude

claude bot commented Apr 13, 2026

PR Review: docs(verify): add Step 1 Orient

Overall: This is a well-scoped, focused change that meaningfully improves the UX of /verify by surfacing reasoning upfront and making parallelism intent explicit. The single-file modification is clean and the renumbering is consistently applied throughout.


What works well

  • Parallelism guidance is concrete and actionable. The latency table + the rule to fire prepare in the same turn as the preamble directly addresses a real UX problem (users staring at a blank screen for 20–30s on URL evidence). This is the right call.
  • Three-scenario examples (slow, fast, mixed, no evidence) cover the real cases users will encounter and give the model clear behavioral templates to pattern-match against.
  • Cross-reference audit is complete. All four internal step references were correctly updated:
    • "Skip to Step 3" → "Skip to Step 4" ✓
    • "Step 1 triage table" → "Step 2 triage table" ✓
    • "same as Step 1" → "same as Step 2" (×2) ✓
  • "Not a confirmation checkpoint" is an important guardrail — explicitly forbidding the model from pausing for approval after the preamble prevents a common failure mode where a CoT step becomes a blocking round-trip.

Minor issues / suggestions

1. Hardcoded latency estimates may go stale

| URLs (web pages) | ~20–30s per URL |
| Office files (`.docx`, `.xlsx`, `.pptx`) | ~20–30s per file |

These numbers are baked in as constants. If prepare performance changes (faster infra, caching, etc.), the guidance could mislead the model into treating fast evidence as slow. Consider framing these as "can take up to ~30s" or adding a note that they reflect current typical performance, so a future editor knows to revisit.
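One possible hedged phrasing for those table rows (suggested wording, not taken from the PR):

```
| URLs (web pages) | can take up to ~30s per URL (typical as of this writing) |
| Office files (`.docx`, `.xlsx`, `.pptx`) | can take up to ~30s per file (typical as of this writing) |
```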

2. "CoT gate" jargon in a model-facing doc

"This is a CoT gate for user clarity and progress"

"CoT" is meaningful to AI researchers but could be opaque in a skill file that the model uses as instructions. Since the sentence already explains the purpose ("user clarity and progress"), consider dropping "CoT gate" or spelling it out: "Chain-of-Thought step."

3. --html skip phrasing could be tightened

In Step 4, the comment:

If you skipped Step 2–3 because the HTML already had citation markers (Step 2 triage table: "Existing verified HTML"), use --html instead

Reading "skipped Step 2–3" while in "Step 4" is slightly confusing — the reader has to mentally map back. Spelling it out as "skipped the Prepare and Respond steps" would be unambiguous.

4. Example claim may become stale

"Claude 4.6 Sonnet tops SWE-bench at 77.2%"

This is a good illustrative example, but benchmark numbers get superseded quickly. A more time-stable placeholder (e.g., a legislative or scientific claim) would age better. Low priority — it's just an example.


No bugs found

All step numbers, table entries, and verify --html references are correctly renumbered. The new step doesn't introduce any behavioral contradictions with the existing pipeline logic.


Verdict: Ready to merge with the latency-estimate and jargon points treated as optional follow-up. None are blockers.

bensonwong and others added 4 commits April 13, 2026 07:53
Establishes agent guidance for this repo following the same progressive-
disclosure pattern as packages/deepcitation. Includes always-applicable
invariants (product name, generic-examples rule, auth command) and a
guidance router pointing to SKILL.md and the rules/ files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consistent with auth.md living under rules/. SKILL.md reference updated
to match (rules/cloud-sandbox-constraints.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CLI renamed the login subcommand to auth. Updates auth.md recovery
steps and the --key flag example accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ange split

- Replace specific model/benchmark claim ("Claude 4.6 Sonnet...SWE-bench")
  with a generic placeholder per the AGENTS.md invariant: use domain-
  neutral examples that don't carry unintended product implications
- Clarify Step 4 --html skip note: "skipped the Prepare and Respond steps"
  instead of "skipped Step 2–3" (unambiguous without mental mapping)
- Add stale Cowork proxy clearing snippet before prepare invocations
- Add page-range split algorithm for parallel agent dispatch (eliminates
  f-fabrication by restricting each agent to its own page window)
- Add unique citation ID, Format 2, and bold=k hard rules to agent prompts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
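The page-range split the commit mentions is not shown in this diff view; a minimal sketch of one way to partition pages into contiguous per-agent windows (function name and signature are hypothetical) so that each agent can only cite pages inside its own range:

```python
def split_page_ranges(total_pages: int, num_agents: int) -> list[tuple[int, int]]:
    """Partition pages 1..total_pages into contiguous, non-overlapping
    windows, at most one per agent, distributing the remainder to the
    earliest agents."""
    base, extra = divmod(total_pages, num_agents)
    ranges, start = [], 1
    for i in range(num_agents):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more agents than pages: later agents get nothing
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_page_ranges(10, 3))  # → [(1, 4), (5, 7), (8, 10)]
```

Because the windows are disjoint and cover every page exactly once, a citation outside an agent's window is detectable as fabrication by construction.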
@bensonwong bensonwong merged commit 827eab7 into main Apr 13, 2026
@bensonwong bensonwong deleted the claude/verify-thinking-step-Dg8iX branch April 13, 2026 14:07