feat(verify): CoT gate, tax/regulatory anchors, and citation data overhaul by bensonwong · Pull Request #19 · DeepCitation/skills

bensonwong · 2026-04-12T00:13:45Z

Summary

- CoT gate: forces agents to locate the source sentence (f) before picking sourceMatch (k). This eliminates paraphrase failures by making f→k a visible substring check, not a post-hoc rationalization.
- Citation data field table: replaced bullet-list field docs with an ordered table that encodes the CoT reasoning sequence (n, r, f, k, p, l); field order now communicates the required reasoning order.
- Hard f→k substring rule: added as a standalone callout in both the data-block section and the sub-agent instructions; if the planned k doesn't appear word-for-word in f, the model must fix f first.
- Tax/regulatory anchor examples: added brevity examples and a failure-mode table for dollar amounts, percentages, and named legal tests.
- Index/appendix/TOC rule: citations must follow refs to actual evidence pages, not structural pages.
- Standards doc housekeeping: moved deep-citation-standards.md reference to the deepcitation repo; updated concepts doc reference.
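
The hard f→k rule amounts to a plain substring test. A minimal sketch in Python, assuming `f` and `k` are plain strings; the helper name `k_is_verbatim_in_f` is hypothetical and not part of the skill or the CLI, and the example sentence is made up. Whether the real verifier normalizes whitespace or case is not stated in this PR, so the sketch uses an exact, case-sensitive match.

```python
def k_is_verbatim_in_f(f: str, k: str) -> bool:
    """Return True only if the anchor k appears word-for-word in f.

    Hypothetical helper: exact, case-sensitive match, no normalization.
    An empty k is rejected because it would match any f trivially.
    """
    return bool(k) and k in f


# Made-up source sentence for illustration only.
f = "The credit is limited to 30% of qualified expenses."

print(k_is_verbatim_in_f(f, "30%"))             # True: verbatim anchor
print(k_is_verbatim_in_f(f, "thirty percent"))  # False: paraphrase of f
```

If the check fails, the rule in this PR says to go back and fix `f` first rather than adjust `k` to fit.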

Test plan

- Run /verify on a document with numeric facts (dollar amounts, dates) and confirm citations use terse anchors (≤4 words) that are verbatim substrings of f
- Run /verify on a tax/regulatory document and confirm percentages/thresholds are cited with the exact figure as k
- Confirm sub-agent instructions include the CoT gate and f→k substring check
- Confirm citation data fields appear in n, r, f, k, p, l order in generated output
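
The mechanical parts of this test plan could be checked with a small script. The sketch below is an assumption-heavy illustration: it assumes each citation is available as a Python dict whose key order mirrors the generated output, and `check_citation` is a hypothetical name. The PR does not show the actual output shape, and the `n`, `r`, and `l` values here are placeholders.

```python
EXPECTED_ORDER = ["n", "r", "f", "k", "p", "l"]
MAX_ANCHOR_WORDS = 4  # "terse anchors (≤4 words)" from the test plan


def check_citation(citation: dict) -> list[str]:
    """Return a list of problems found in one citation record (empty if none)."""
    problems = []
    # dicts preserve insertion order, so key order reflects output order.
    if list(citation) != EXPECTED_ORDER:
        problems.append(f"fields out of order: {list(citation)}")
    f, k = citation.get("f", ""), citation.get("k", "")
    if len(k.split()) > MAX_ANCHOR_WORDS:
        problems.append(f"anchor longer than {MAX_ANCHOR_WORDS} words: {k!r}")
    if not k or k not in f:
        problems.append(f"k is not a verbatim substring of f: {k!r}")
    return problems


# Made-up record; the n, r, and l values are placeholders only.
sample = {
    "n": 1,
    "r": "placeholder",
    "f": "The credit is limited to 30% of qualified expenses.",
    "k": "30%",
    "p": "page_number_1_index_0",
    "l": [3],
}
print(check_citation(sample))  # []
```

Checking that the sub-agent instructions contain the CoT gate remains a manual step; this sketch only covers the per-citation checks.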

…table rows
Tax and regulatory documents have a distinct anchoring pattern where dollar amounts, percentages, and named legal tests are the correct anchors, not the full qualifying clause. Adds worked examples, BAD/GOOD patterns, three failure table rows, and word-count gate examples for this document class.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Index and table-of-contents pages contain page-number references, not operative evidence. Citing them produces garbage sourceContext and fails verification. Adds a rule with BAD/GOOD examples to steer citations toward body text pages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… symmetry
Addresses two review suggestions:
1. Explain the trailing-dash artifact on "$450-" so the anchor stripping is intentional, not accidental.
2. Add prose-placement examples to all three tax/regulatory failure table rows (previously only the "30%" row showed how to rewrite the surrounding sentence).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move citation anchor rules, display label spec, and QA grading rubric from private mono-repo into the skills repo so all SKILL.md references resolve for standalone skills-repo users. Update verify/SKILL.md to reference docs/deep-citation-standards.md (was docs/agents/deep-citation-standards.md in private mono-repo).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The canonical citation format spec now lives in DeepCitation/deepcitation at docs/agents/deep-citation-standards.md. Update SKILL.md references from docs/deep-citation-standards.md to packages/deepcitation/docs/agents/deep-citation-standards.md (the submodule path when used from the mono-repo).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ts.md
Also upgrade path to full submodule path for mono-repo resolution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…urceMatch
Forces locate-then-extract reasoning: write `f` (sourceContext) first, then derive `k` (sourceMatch) as a substring. Rewrites citation field docs as a CoT-ordered table; collapses Format 1/2 into a single bold-anchor format; adds f→k hard substring rule to eliminate paraphrase at the source.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-04-12T00:17:05Z

Overall this is a well-reasoned set of improvements. The CoT gate is a genuinely clever fix for paraphrase failures, and the Domain A/B framing + Format 1/Format 2 split adds expressiveness without much cognitive overhead. A few things worth flagging:

p field format is a silent breaking change

The old example used "1_0"; the new examples use "page_number_1_index_0". The sub-agent instructions are updated consistently, but any parser or downstream consumer that splits on _ and expects exactly two numeric segments will break silently. Worth confirming the CLI already accepts the new format before this merges — if not, a migration note is needed.
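
To make the concern concrete, here is a minimal sketch of a strict parser for the `page_number_N_index_I` form. The name `parse_p` and the regex are hypothetical; this PR does not show how the CLI actually parses `p`. The point is only that a parser should reject an unexpected shape loudly instead of splitting on `_` and hoping for two numeric segments.

```python
import re

# Hypothetical pattern for the p field, based on the example
# "page_number_1_index_0" quoted in this review.
P_PATTERN = re.compile(r"page_number_(\d+)_index_(\d+)")


def parse_p(p: str) -> tuple[int, int]:
    """Parse a p value into (page_number, index), or raise ValueError."""
    match = P_PATTERN.fullmatch(p)
    if match is None:
        raise ValueError(f"unrecognized p value: {p!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_p("page_number_1_index_0"))  # (1, 0)

# A naive split yields five segments for this form, not two.
print("page_number_1_index_0".split("_"))
# ['page', 'number', '1', 'index', '0']
```

With this approach a short form such as "1_0" raises `ValueError` rather than being misread.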

scratch/verify-flow.html — two concerns

CDN dependencies with unpinned versions: the file loads mermaid@10 which resolves to the latest 10.x, so a minor Mermaid release can silently change diagram rendering. Pin to an exact version (e.g. mermaid@10.9.3) or inline the script if stability matters. It also imports Google Fonts via CSS, which requires a live network connection and sends request metadata externally — system font fallbacks are fine for an internal reference doc.

Should this be committed at all? scratch/ implies throwaway work. A 2,438-line generated HTML file in version control shows up in every git log, git diff, and blame. If it is a durable artifact, move it to docs/. If ephemeral, add scratch/ to .gitignore.

Minor: sourceContext terminology collision in SELF-CHECK step 0

"find the sentence in the evidence that proves the claim and hold it in mind as sourceContext" — sourceContext is the UI component name (rendered from l line IDs), while the field that holds the verbatim sentence is f (source_context). The step does say "write it as f" a few lines later, but the opening sentence creates a brief confusion. Changing it to "hold it in mind as f (source_context)" makes it unambiguous.

What is working well

- CoT gate: writing f before k and enforcing a substring check is a principled fix that closes the paraphrase loop at the source.
- Field-order-as-reasoning-order: the n, r, f, k, p, l table cleanly communicates sequencing constraints without a separate prose explanation.
- Tax/regulatory examples: the dollar-amount, percentage, and named-test examples fill a real gap. The failure-mode rows (e.g. "first 5 years of employment" to "5 years") are well-chosen.
- Index/appendix/TOC rule: the bad/good example with "A Club dues 47 F Carrying charge 25..." makes the failure mode immediately recognisable.
- Format 2 syntax: decouples Domain A voice from Domain B verbatim, which the old single-format model could not handle gracefully.

Summary: Resolve the p format backward-compat question, decide intentionally about scratch/verify-flow.html (pin CDN if it stays, or drop it), and clarify the sourceContext vs f naming in SELF-CHECK step 0. Everything else looks solid.

- Fix (SELF-CHECK step 0): rename `sourceContext` → `f` (`source_context`) to match the actual data field name; avoids collision with the UI component of the same name
- Fix: add `scratch/` to .gitignore and untrack verify-flow.html; scratch files are ephemeral and don't belong in version control
Note: `p` field format (`page_number_N_index_I`) is unchanged from the previous SKILL.md version; the diff only moved the docs from a bullet list to a table. No CLI compatibility concern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bensonwong · 2026-04-12T00:18:43Z

Thanks for the detailed review. Addressing each point:

p field format — no breaking change here. The page_number_N_index_I format was already the established format in the previous SKILL.md (the diff only converted the bullet-list docs to a table). The CLI has always used this format; the old "1_0" form was never in this file. No migration needed.

scratch/verify-flow.html — agreed. Added scratch/ to .gitignore and untracked the file in the follow-up commit (a3368d1).

sourceContext terminology in step 0 — fixed. Changed to f (source_context) to make the field reference unambiguous. Also updated the substring check reference from sourceContext to f for consistency.

bensonwong · 2026-04-12T00:19:36Z

@claude review the latest changes

bensonwong and others added 7 commits April 10, 2026 14:54

refactor(verify): update concepts.md reference → deep-citation-concep…

6957eab

…ts.md
Also upgrade path to full submodule path for mono-repo resolution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/verify-tax-regulatory-anchors

2f5780a

bensonwong merged commit 22c95ab into main Apr 12, 2026

bensonwong deleted the feat/verify-tax-regulatory-anchors branch April 12, 2026 00:41

bensonwong merged 9 commits into main from feat/verify-tax-regulatory-anchors
