feat(verify): CoT gate, tax/regulatory anchors, and citation data overhaul #19

Merged
bensonwong merged 9 commits into main from feat/verify-tax-regulatory-anchors
Apr 12, 2026

@bensonwong
Contributor

Summary

  • CoT gate: forces agents to locate the source sentence (f) before picking sourceMatch (k). This eliminates paraphrase failures by making f→k a visible substring check, not a post-hoc rationalization
  • Citation data field table: replaced bullet-list field docs with an ordered table that encodes the CoT reasoning sequence (n, r, f, k, p, l) — field order now communicates the required reasoning order
  • Hard f→k substring rule: added as a standalone callout in both the data-block section and the sub-agent instructions; if the planned k doesn't appear word-for-word in f, the model must fix f first
  • Tax/regulatory anchor examples: added brevity examples and a failure-mode table for dollar amounts, percentages, and named legal tests
  • Index/appendix/TOC rule: citations must follow refs to actual evidence pages, not structural pages
  • Standards doc housekeeping: moved deep-citation-standards.md reference to the deepcitation repo; updated concepts doc reference
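
The f→k rule above amounts to a plain substring test. A minimal sketch, assuming citations are handled as Python strings; `check_anchor` and the sample sentence are hypothetical illustrations, not code or data from this PR:

```python
def check_anchor(f: str, k: str) -> bool:
    """Return True when k is non-empty and appears word-for-word inside f."""
    return bool(k) and k in f


# Hypothetical citation: f is the verbatim source sentence, k the anchor.
f = "Employees vest after the first 5 years of employment."
good_k = "5 years"     # verbatim substring of f
bad_k = "five years"   # paraphrase, so the check must fail

print(check_anchor(f, good_k))  # True
print(check_anchor(f, bad_k))   # False
```

When the check fails, the rule says to revisit f first (find the sentence that actually contains the evidence) rather than loosen k.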

Test plan

  • Run /verify on a document with numeric facts (dollar amounts, dates) and confirm citations use terse anchors (≤4 words) that are verbatim substrings of f
  • Run /verify on a tax/regulatory document and confirm percentages/thresholds are cited with the exact figure as k
  • Confirm sub-agent instructions include the CoT gate and f→k substring check
  • Confirm citation data fields appear in n, r, f, k, p, l order in generated output
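
The field-order and terse-anchor checks in this test plan could be automated roughly as follows. This is a sketch under assumptions: each citation is taken to be a dict with exactly the six keys n, r, f, k, p, l, and the sample values (including their types) are placeholders rather than real /verify output.

```python
EXPECTED_ORDER = ["n", "r", "f", "k", "p", "l"]
MAX_ANCHOR_WORDS = 4  # the "≤4 words" limit from the test plan


def fields_in_order(citation: dict) -> bool:
    """True when the keys appear exactly in n, r, f, k, p, l order."""
    # dicts preserve insertion order (Python 3.7+), and so does json.loads.
    return list(citation.keys()) == EXPECTED_ORDER


def anchor_is_terse(k: str) -> bool:
    """True when the anchor has between 1 and MAX_ANCHOR_WORDS words."""
    return 0 < len(k.split()) <= MAX_ANCHOR_WORDS


# Placeholder citation; every value here is illustrative only.
sample = {
    "n": 1,
    "r": "threshold stated in the body text",
    "f": "The credit phases out at 30% of adjusted income.",
    "k": "30%",
    "p": "page_number_1_index_0",
    "l": [3],
}

print(fields_in_order(sample))       # True
print(anchor_is_terse(sample["k"]))  # True
```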

bensonwong and others added 7 commits April 10, 2026 14:54
…table rows

Tax and regulatory documents have a distinct anchoring pattern where
dollar amounts, percentages, and named legal tests are the correct
anchors — not the full qualifying clause. Adds worked examples,
BAD/GOOD patterns, three failure table rows, and word-count gate
examples for this document class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Index and table-of-contents pages contain page-number references, not
operative evidence. Citing them produces garbage sourceContext and
fails verification. Adds a rule with BAD/GOOD examples to steer
citations toward body text pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… symmetry

Addresses two review suggestions:
1. Explain the trailing-dash artifact on "$450-" so the anchor
   stripping is intentional, not accidental.
2. Add prose-placement examples to all three tax/regulatory failure
   table rows (previously only the "30%" row showed how to rewrite
   the surrounding sentence).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move citation anchor rules, display label spec, and QA grading rubric
from private mono-repo into the skills repo so all SKILL.md references
resolve for standalone skills-repo users.

Update verify/SKILL.md to reference docs/deep-citation-standards.md
(was docs/agents/deep-citation-standards.md in private mono-repo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The canonical citation format spec now lives in DeepCitation/deepcitation
at docs/agents/deep-citation-standards.md.

Update SKILL.md references from docs/deep-citation-standards.md to
packages/deepcitation/docs/agents/deep-citation-standards.md (the
submodule path when used from the mono-repo).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts.md

Also upgrade path to full submodule path for mono-repo resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…urceMatch

Forces locate-then-extract reasoning: write `f` (sourceContext) first,
then derive `k` (sourceMatch) as a substring. Rewrites citation field docs
as a CoT-ordered table; collapses Format 1/2 into a single bold-anchor
format; adds f→k hard substring rule to eliminate paraphrase at the source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 12, 2026

Overall this is a well-reasoned set of improvements. The CoT gate is a genuinely clever fix for paraphrase failures, and the Domain A/B framing + Format 1/Format 2 split adds expressiveness without much cognitive overhead. A few things worth flagging:


p field format is a silent breaking change

The old example used "1_0"; the new examples use "page_number_1_index_0". The sub-agent instructions are updated consistently, but any parser or downstream consumer that splits on _ and expects exactly two numeric segments will break silently. Worth confirming the CLI already accepts the new format before this merges — if not, a migration note is needed.
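
To make the concern concrete, here is a sketch of a consumer that parses the long form explicitly instead of splitting on `_`. The function name `parse_p` is hypothetical and this is not the CLI's actual parser; it only assumes the `page_number_N_index_I` shape quoted above.

```python
import re

# Matches the long form only, e.g. "page_number_1_index_0".
P_PATTERN = re.compile(r"page_number_(\d+)_index_(\d+)")


def parse_p(p: str) -> tuple[int, int]:
    """Return (page_number, index), or raise ValueError on any other shape."""
    match = P_PATTERN.fullmatch(p)
    if match is None:
        raise ValueError(f"unrecognised p value: {p!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_p("page_number_1_index_0"))          # (1, 0)
print(len("page_number_1_index_0".split("_")))   # 5, not the 2 a naive split expects
```

Failing loudly on an unexpected shape turns the "silent" part of the breakage into an explicit error.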


scratch/verify-flow.html — two concerns

CDN dependencies with unpinned versions: the file loads mermaid@10 which resolves to the latest 10.x, so a minor Mermaid release can silently change diagram rendering. Pin to an exact version (e.g. mermaid@10.9.3) or inline the script if stability matters. It also imports Google Fonts via CSS, which requires a live network connection and sends request metadata externally — system font fallbacks are fine for an internal reference doc.

Should this be committed at all? scratch/ implies throwaway work. A 2,438-line generated HTML file in version control shows up in every git log, git diff, and blame. If it is a durable artifact, move it to docs/. If ephemeral, add scratch/ to .gitignore.
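
If the file stays, the pinning concern could be checked mechanically. A minimal sketch, assuming npm-style `package@version` CDN URLs; the jsDelivr URLs below are illustrative guesses, since the actual script URL in verify-flow.html is not quoted here, and `is_pinned` is a hypothetical helper.

```python
import re

# An exact pin has three numeric parts, e.g. "@10.9.3"; "@10" is a floating range.
EXACT_VERSION = re.compile(r"@\d+\.\d+\.\d+(?=/|$)")


def is_pinned(url: str) -> bool:
    """True when the URL carries an exact major.minor.patch version."""
    return EXACT_VERSION.search(url) is not None


# Illustrative URLs only (assumed CDN layout).
floating = "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"
pinned = "https://cdn.jsdelivr.net/npm/mermaid@10.9.3/dist/mermaid.min.js"

print(is_pinned(floating))  # False
print(is_pinned(pinned))    # True
```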


Minor: sourceContext terminology collision in SELF-CHECK step 0

"find the sentence in the evidence that proves the claim and hold it in mind as sourceContext" — sourceContext is the UI component name (rendered from l line IDs), while the field that holds the verbatim sentence is f (source_context). The step does say "write it as f" a few lines later, but the opening sentence creates a brief confusion. Changing it to "hold it in mind as f (source_context)" makes it unambiguous.


What is working well

  • CoT gate — writing f before k and enforcing a substring check is a principled fix that closes the paraphrase loop at the source.
  • Field-order-as-reasoning-order — the n, r, f, k, p, l table cleanly communicates sequencing constraints without a separate prose explanation.
  • Tax/regulatory examples — the dollar-amount, percentage, and named-test examples fill a real gap. The failure-mode rows (e.g. "first 5 years of employment" to "5 years") are well-chosen.
  • Index/appendix/TOC rule — the bad/good example with "A Club dues 47 F Carrying charge 25..." makes the failure mode immediately recognisable.
  • Format 2 syntax — decouples Domain A voice from Domain B verbatim, which the old single-format model could not handle gracefully.

Summary: Resolve the p format backward-compat question, decide intentionally about scratch/verify-flow.html (pin CDN if it stays, or drop it), and clarify the sourceContext vs f naming in SELF-CHECK step 0. Everything else looks solid.

- Fix: SELF-CHECK step 0 — rename `sourceContext` → `f` (`source_context`)
  to match the actual data field name; avoids collision with the UI
  component of the same name
- Fix: add `scratch/` to .gitignore and untrack verify-flow.html;
  scratch files are ephemeral and don't belong in version control

Note: `p` field format (`page_number_N_index_I`) is unchanged from the
previous SKILL.md version — the diff only moved the docs from a bullet
list to a table. No CLI compatibility concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bensonwong
Contributor Author

Thanks for the detailed review. Addressing each point:

p field format — no breaking change here. The page_number_N_index_I format was already the established format in the previous SKILL.md (the diff only converted the bullet-list docs to a table). The CLI has always used this format; the old "1_0" form was never in this file. No migration needed.

scratch/verify-flow.html — agreed. Added scratch/ to .gitignore and untracked the file in the follow-up commit (a3368d1).

sourceContext terminology in step 0 — fixed. Changed to f (source_context) to make the field reference unambiguous. Also updated the substring check reference from sourceContext to f for consistency.

@bensonwong
Contributor Author

@claude review the latest changes

@bensonwong bensonwong merged commit 22c95ab into main Apr 12, 2026
@bensonwong bensonwong deleted the feat/verify-tax-regulatory-anchors branch April 12, 2026 00:41