
docs(verify): add Step 1 Orient — think out loud on claim + evidence#22

Merged
bensonwong merged 6 commits into main from
claude/verify-thinking-step-Dg8iX
Apr 13, 2026
Conversation

@bensonwong
Contributor

Every /verify call now opens with a short CoT preamble naming the
claim(s) and the evidence (or primary sources to look up if none was
provided). Gives users scope and progress clarity before the pipeline
runs, and surfaces reasoning rather than jumping straight into prepare.

Existing Prepare/Respond/Verify steps renumbered to 2/3/4 and inline
cross-references updated.

https://claude.ai/code/session_01LqnxD512JufmQd1VJn5F1P

claude added 2 commits April 13, 2026 12:34
Prepare latency is asymmetric: URLs and office files take ~20–30s,
while images and PDFs finish in ~0.5s after upload. Previously, Step 1
told the model to write the preamble "before running any command,"
which serialized the CoT gate ahead of slow network fetches and made
the user wait on silence during the ~30s prepare window.

Now the Orient step instructs the model to fire prepare in the SAME
assistant turn as the preamble — text streams to the user first, so
they read the orientation while the slow prepare is already running.
Added a latency table and three examples (slow URL, fast PDF, mixed)
to make the parallelism rule concrete.

https://claude.ai/code/session_01LqnxD512JufmQd1VJn5F1P
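The same-turn parallelism the commit describes can be sketched outside the skill file. A minimal asyncio illustration (all names and the URL are hypothetical; the sleep stands in for the real ~20–30s prepare) of firing the slow prepare first and then streaming the orientation text while it runs:

```python
import asyncio

async def slow_prepare(url: str) -> str:
    # Stand-in for the slow URL / office-file prepare step.
    await asyncio.sleep(0.1)  # shortened from ~20-30s for the example
    return f"prepared:{url}"

async def orient_and_prepare(url: str) -> str:
    # Fire prepare first, in the "same turn" ...
    prepare_task = asyncio.create_task(slow_prepare(url))
    # ... then stream the orientation preamble while it runs,
    # so the user reads text instead of waiting on silence.
    print("Claim: ...; Evidence: one URL (slow, ~20-30s to prepare).")
    return await prepare_task

result = asyncio.run(orient_and_prepare("https://example.com/report"))
```

The point is ordering: the task is created before any text is emitted, so the network fetch and the preamble overlap rather than serialize.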
@claude

claude bot commented Apr 13, 2026

PR Review: docs(verify): add Step 1 Orient

Overall: This is a well-scoped, focused change that meaningfully improves the UX of /verify by surfacing reasoning upfront and making parallelism intent explicit. The single-file modification is clean and the renumbering is consistently applied throughout.


What works well

  • Parallelism guidance is concrete and actionable. The latency table + the rule to fire prepare in the same turn as the preamble directly addresses a real UX problem (users staring at a blank screen for 20–30s on URL evidence). This is the right call.
  • Three-scenario examples (slow, fast, mixed, no evidence) cover the real cases users will encounter and give the model clear behavioral templates to pattern-match against.
  • Cross-reference audit is complete. All four internal step references were correctly updated:
    • "Skip to Step 3" → "Skip to Step 4" ✓
    • "Step 1 triage table" → "Step 2 triage table" ✓
    • "same as Step 1" → "same as Step 2" (×2) ✓
  • "Not a confirmation checkpoint" is an important guardrail — explicitly forbidding the model from pausing for approval after the preamble prevents a common failure mode where a CoT step becomes a blocking round-trip.

Minor issues / suggestions

1. Hardcoded latency estimates may go stale

| URLs (web pages) | ~20–30s per URL |
| Office files (`.docx`, `.xlsx`, `.pptx`) | ~20–30s per file |

These numbers are baked in as constants. If prepare performance changes (faster infra, caching, etc.), the guidance could mislead the model into treating fast evidence as slow. Consider framing these as "can take up to ~30s" or adding a note that they reflect current typical performance, so a future editor knows to revisit.
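One possible hedged phrasing for those table rows (suggested wording, not taken from the PR):

```
| URLs (web pages) | can take up to ~30s per URL (typical as of this writing) |
| Office files (`.docx`, `.xlsx`, `.pptx`) | can take up to ~30s per file (typical as of this writing) |
```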

2. "CoT gate" jargon in a model-facing doc

"This is a CoT gate for user clarity and progress"

"CoT" is meaningful to AI researchers but could be opaque in a skill file that the model uses as instructions. Since the sentence already explains the purpose ("user clarity and progress"), consider dropping "CoT gate" or spelling it out: "Chain-of-Thought step."

3. --html skip phrasing could be tightened

In Step 4, the comment:

If you skipped Step 2–3 because the HTML already had citation markers (Step 2 triage table: "Existing verified HTML"), use --html instead

Reading "skipped Step 2–3" while in "Step 4" is slightly confusing — the reader has to mentally map back. Spelling it out as "skipped the Prepare and Respond steps" would be unambiguous.

4. Example claim may become stale

"Claude 4.6 Sonnet tops SWE-bench at 77.2%"

This is a good illustrative example, but benchmark numbers get superseded quickly. A more time-stable placeholder (e.g., a legislative or scientific claim) would age better. Low priority — it's just an example.


No bugs found

All step numbers, table entries, and verify --html references are correctly renumbered. The new step doesn't introduce any behavioral contradictions with the existing pipeline logic.


Verdict: Ready to merge with the latency-estimate and jargon points treated as optional follow-up. None are blockers.

bensonwong and others added 4 commits April 13, 2026 07:53
Establishes agent guidance for this repo following the same progressive-
disclosure pattern as packages/deepcitation. Includes always-applicable
invariants (product name, generic-examples rule, auth command) and a
guidance router pointing to SKILL.md and the rules/ files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consistent with auth.md living under rules/. SKILL.md reference updated
to match (rules/cloud-sandbox-constraints.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CLI renamed the login subcommand to auth. Updates auth.md recovery
steps and the --key flag example accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ange split

- Replace specific model/benchmark claim ("Claude 4.6 Sonnet...SWE-bench")
  with a generic placeholder per the AGENTS.md invariant: use domain-
  neutral examples that don't carry unintended product implications
- Clarify Step 4 --html skip note: "skipped the Prepare and Respond steps"
  instead of "skipped Step 2–3" (unambiguous without mental mapping)
- Add stale Cowork proxy clearing snippet before prepare invocations
- Add page-range split algorithm for parallel agent dispatch (eliminates
  f-fabrication by restricting each agent to its own page window)
- Add unique citation ID, Format 2, and bold=k hard rules to agent prompts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
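The page-range split the commit mentions is not shown in this diff view; a minimal sketch of one way to partition pages into contiguous per-agent windows (function name and signature are hypothetical) so that each agent can only cite pages inside its own range:

```python
def split_page_ranges(total_pages: int, num_agents: int) -> list[tuple[int, int]]:
    """Partition pages 1..total_pages into contiguous, non-overlapping
    windows, at most one per agent, distributing the remainder to the
    earliest agents."""
    base, extra = divmod(total_pages, num_agents)
    ranges, start = [], 1
    for i in range(num_agents):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more agents than pages: later agents get nothing
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_page_ranges(10, 3))  # → [(1, 4), (5, 7), (8, 10)]
```

Because the windows are disjoint and cover every page exactly once, a citation outside an agent's window is detectable as fabrication by construction.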
@bensonwong bensonwong merged commit 827eab7 into main Apr 13, 2026
@bensonwong bensonwong deleted the claude/verify-thinking-step-Dg8iX branch April 13, 2026 14:07