
refactor(verify): auto-generated citations, prose-flow format, and parallel agent merge #14

Merged: bensonwong merged 7 commits into main from refactor/verify-auto-citation-data on Apr 3, 2026


@bensonwong
Contributor

Summary

  • Auto-generated citation data: Replaces manual <<<CITATION_DATA>>> JSON blocks with [label](cite:N) markers; the CLI's verify command now extracts and generates citation data automatically from the prepared summary
  • Two-format citation syntax: Introduces Format 1 (short verbatim term, e.g. [1.80 metres](cite:N)) and Format 2 (short display label + longer verbatim anchor, e.g. [concrete slab surface](cite:N 'upper unfinished surface of the concrete floor slab')) with concrete examples for each
  • Parallel agent merge pipeline: Updates the multi-section workflow to use deepcitation merge + verify --markdown instead of manual JSON renumbering and deduplication
  • Comprehensiveness section: Adds explicit guidance to cover all parts of multi-part questions with specific details, structured sections, and full evidence coverage
  • Simplified authentication failure handling: Condenses verbose stop instructions into a single clear rule
  • Reduced tool-call overhead: Updates invariants to reflect the new single-command merge+verify pipeline
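The two marker formats described above can be sketched as a small parser. This is only an illustration of the syntax, not the CLI's actual extraction code: the regular expression, the `CiteMarker` type, and the `search_anchor` helper are all assumptions made for this example.

```python
import re
from typing import List, NamedTuple, Optional


class CiteMarker(NamedTuple):
    label: str             # text shown to the reader
    number: int            # the N in cite:N
    anchor: Optional[str]  # Format 2 verbatim anchor, None for Format 1


# Hypothetical pattern: [label](cite:N) or [label](cite:N 'verbatim anchor')
CITE_RE = re.compile(r"\[([^\[\]]+)\]\(cite:(\d+)(?:\s+'([^']*)')?\)")


def parse_markers(text: str) -> List[CiteMarker]:
    """Return every citation marker found in text, in document order."""
    return [
        CiteMarker(m.group(1), int(m.group(2)), m.group(3))
        for m in CITE_RE.finditer(text)
    ]


def search_anchor(marker: CiteMarker) -> str:
    """Format 1 searches on the label itself; Format 2 searches on the quoted anchor."""
    return marker.anchor if marker.anchor is not None else marker.label
```

For the PR's own examples, a Format 1 marker such as `[1.80 metres](cite:1)` yields the label as its search anchor, while the Format 2 marker yields the quoted phrase.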

Test plan

  • Run /verify on a single-topic document and confirm citation markers render correctly without a manual JSON block
  • Run /verify on a multi-section document and confirm parallel agents produce section files that merge cleanly
  • Verify Format 2 citations (long anchor phrases) highlight correctly in the output HTML
  • Confirm auth failure path still stops cleanly without generating a partial report
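For the multi-section item in the test plan, "merge cleanly" roughly means that citation numbers stay unique once section files are concatenated. The sketch below illustrates that idea only; it is not how `deepcitation merge` is implemented, it does no deduplication, and the `merge_sections` function is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches the numeric part of a marker, e.g. "(cite:3" in "[label](cite:3)"
CITE_NUM_RE = re.compile(r"\(cite:(\d+)")


def merge_sections(sections: List[str]) -> str:
    """Concatenate section texts, renumbering cite:N so numbers never collide.

    Each section is assumed to number its own citations independently.
    Within a section, repeated uses of the same number stay linked.
    """
    merged: List[str] = []
    next_id = 1
    for text in sections:
        mapping: Dict[str, int] = {}  # local number -> global number

        def renumber(match: "re.Match[str]") -> str:
            nonlocal next_id
            local = match.group(1)
            if local not in mapping:
                mapping[local] = next_id
                next_id += 1
            return f"(cite:{mapping[local]}"

        merged.append(CITE_NUM_RE.sub(renumber, text))
    return "\n\n".join(merged)
```

A quick check after a real merge would be that no two distinct claims from different sections share a number.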

bensonwong and others added 6 commits April 3, 2026 15:41
…tation data

The verify CLI now auto-generates citation data from `[display label](cite:N)`
markers by searching the prepared summary. This removes the need for agents to
manually construct `<<<CITATION_DATA>>>` JSON blocks with fullPhrase, anchorText,
pageId, and lineIds fields.

Key changes:
- Replace `deepTextPromptPortion` with `deepTextPages`
- Remove `<<<CITATION_DATA>>>` block and all JSON citation instructions
- Simplify anchor text rules to "1–5 word display labels" (tool does matching)
- Simplify parallel merge: agents write to files, CLI handles merge+renumber
- Remove manual dedup/renumber logic from agent instructions
- Streamline invariants to reflect body-only output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduces Format 2 citation syntax: `[readable label](cite:N "verbatim anchor")`
for cases where the evidence term needs surrounding prose to read naturally.

- Add prose-flow principle: display labels must read as natural sentences
- Add Format 2 with quoted anchor for evidence matching
- Add before/after table showing clause-fragment vs prose-flow rewrites
- Update sub-agent prompt to include both formats and prose-flow rule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…or semantics

Fixes all blocking and should-fix issues from code review:

- Switch Format 2 title attribute from double to single quotes to avoid
  nested-quote ambiguity in inline code spans
- Unify word-count limit to 1–4 words across all sections (format spec,
  prose-flow principle, and sub-agent prompt)
- Clarify that Format 1 display label doubles as the CLI search anchor
- Add explicit contiguous-substring requirement for Format 2 anchors
- Fix reuse syntax from ambiguous `(cite:N)` to `[label](cite:N)`
- Align invariants wording with Step 2's phrasing
- Fix table example: shorten over-long display label
- Trim redundant GOOD/BAD examples (9 → 8, removed duplicates with table)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Table row 2 used Format 2 with anchor identical to display label, which
contradicts the rule that Format 2 is for different link text. Simplified
to Format 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude bot commented Apr 3, 2026

PR Review

Overall this is a solid simplification. Offloading citation data generation to the CLI removes a large class of model-side errors. The Format 2 syntax and comprehensiveness section are both clear improvements. A few things worth addressing before merging:


Potential bugs / correctness issues

1. deepTextPromptPortion -> deepTextPages is a breaking rename with no migration note

The summary field name changes in the prose but there is no guidance on what happens if a user runs an older CLI version that still emits deepTextPromptPortion. An agent following the new instructions would look for deepTextPages, find nothing, and silently produce uncited output. A sentence noting the minimum CLI version required would prevent this.
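A guard along the following lines would turn the silent failure into a loud one. It is a minimal sketch that assumes the prepared summary is available as a parsed JSON object with the field at the top level; that shape and the `get_deep_text_pages` helper are assumptions, not documented CLI behaviour.

```python
from typing import Any, Dict


def get_deep_text_pages(summary: Dict[str, Any]) -> Any:
    """Return the new field, or fail with an upgrade hint if only the old one exists."""
    if "deepTextPages" in summary:
        return summary["deepTextPages"]
    if "deepTextPromptPortion" in summary:
        # Legacy field name: the summary came from a CLI that predates the rename.
        raise RuntimeError(
            "summary uses deepTextPromptPortion; update the CLI so it emits deepTextPages"
        )
    raise KeyError("summary contains neither deepTextPages nor deepTextPromptPortion")
```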

2. No guidance on CLI behavior when a cite:N marker has no matching anchor

The old spec enforced bidirectional consistency (every id in CITATION_DATA has a cite:N in the body, and vice versa). That invariant is gone because the CLI now owns JSON generation -- which is fine. But the instructions do not say what the CLI does when it cannot locate an anchor in the summary text. Does it fail loudly? Return a partial result? Knowing this would let the agent handle errors gracefully.
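For reference, the old bidirectional invariant amounts to a set comparison between the numbers used in the body and the ids present in the citation data. The sketch below shows that check in isolation; `consistency_gaps` is a hypothetical helper, and nothing here reflects what the CLI does with an unmatched anchor.

```python
import re
from typing import Iterable, List, Tuple


def consistency_gaps(body: str, data_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Compare cite:N numbers in the body against citation-data ids.

    Returns (markers_without_data, data_without_markers), both sorted.
    """
    body_ids = {int(n) for n in re.findall(r"\(cite:(\d+)", body)}
    ids = set(data_ids)
    return sorted(body_ids - ids), sorted(ids - body_ids)
```

Both lists being empty corresponds to the invariant holding.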

3. Format 2 verbatim requirement has an implicit failure mode

Format 2 requires the anchor to exist verbatim in the evidence but does not say what to do if the agent slightly misquotes. A rule like 'if Format 2 anchor is not an exact substring of the evidence, fall back to Format 1' would close this gap.
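The suggested fallback could look roughly like this. It is a sketch under stated assumptions: matching is treated as exact and case-sensitive, anchors containing a single quote are rejected because they would collide with the quoting, and `build_marker` is a hypothetical helper rather than part of the tool.

```python
from typing import Optional


def build_marker(label: str, n: int, evidence: str,
                 anchor: Optional[str] = None) -> Optional[str]:
    """Emit Format 2 only when the anchor is an exact, contiguous substring.

    Otherwise fall back to Format 1, which requires the label itself to appear
    verbatim in the evidence. Returns None when neither form can be verified.
    """
    if anchor and anchor != label and "'" not in anchor and anchor in evidence:
        return f"[{label}](cite:{n} '{anchor}')"
    if label in evidence:
        return f"[{label}](cite:{n})"
    return None
```

Returning `None` leaves the decision to the caller, who can then pick a different verbatim term instead of emitting an unverifiable marker.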

4. Placeholder variables in the merge command are unexplained

The bash command block uses {draft} and {topic} as template slots without saying so. First-time readers may treat them as literal strings. The old spec used concrete example filenames (e.g. yc-safe-analysis.md) which helped ground this.
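One way to catch a slot that was left as a literal string is to scan for unfilled `{name}` placeholders before running anything. The sketch below assumes simple word-character slot names; the path template in the usage note is purely illustrative and is not the command from the spec.

```python
import re
from typing import List

SLOT_RE = re.compile(r"\{(\w+)\}")


def unfilled_slots(text: str) -> List[str]:
    """Return the names of any {slot} placeholders still present in text."""
    return SLOT_RE.findall(text)


def fill(template: str, **values: str) -> str:
    """Substitute every slot, refusing to proceed if any value is missing."""
    missing = [name for name in unfilled_slots(template) if name not in values]
    if missing:
        raise ValueError(f"no value supplied for: {', '.join(missing)}")
    return template.format(**values)
```

For example, with a hypothetical template `"{topic}/{draft}.md"`, calling `fill(template, topic="yc-safe-analysis", draft="draft-1")` gives `"yc-safe-analysis/draft-1.md"`, and `unfilled_slots` on that result is empty.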


Clarity regressions

5. Loss of the BAD-example catalogue

The removed block contained roughly 12 concrete counter-examples (double-bracket, marker before term, anchor too long, casing mismatch, etc.) covering edge cases the new positive rules do not address. Consider moving them into a collapsed details block rather than deleting them entirely.

6. Security wording is slightly weaker

Old: 'Never use DEEPCITATION_API_KEY=... prefixing. Never print key values in chat.'
New: 'Never expose API keys in commands or output.'

The new phrasing loses the specific anti-pattern (env-var prefixing in the shell command), which is the most likely accidental leak vector.
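The specific anti-pattern is easy to detect mechanically, which is part of why it is worth naming. As a rough sketch, a pre-flight check on a command string might look like the following; the `leaks_api_key` helper is hypothetical and only covers the inline-assignment form.

```python
import re

# An inline assignment such as "DEEPCITATION_API_KEY=abc123 some-command"
# puts the key value into the command text, where it can end up in chat or logs.
KEY_ASSIGNMENT_RE = re.compile(r"\bDEEPCITATION_API_KEY=")


def leaks_api_key(command: str) -> bool:
    """True if the command text assigns DEEPCITATION_API_KEY inline."""
    return bool(KEY_ASSIGNMENT_RE.search(command))
```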


Minor nits

  • The --a / --b flags on deepcitation merge look non-idiomatic. Worth verifying the CLI accepts them as written.
  • The instruction says each sub-agent 'returns a one-line confirmation' but agents write to files. If an agent fails silently (empty or partial file), the merge succeeds on bad input. Consider asking agents to also print the section heading so the orchestrator can detect a bad write.
  • The removed note about verify --citations being low-level is worth keeping if that flag still exists in the CLI.
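The bad-write detection suggested in the second nit can be sketched as a comparison between what a sub-agent reports and what is actually in its file. This assumes the confirmation carries a section heading and a line count, and that the file's first non-blank line is a markdown heading; `section_write_ok` is a hypothetical helper.

```python
def section_write_ok(file_text: str, reported_heading: str, reported_lines: int) -> bool:
    """Check a section file against the heading and line count the agent reported."""
    lines = file_text.splitlines()
    if not lines:
        return False  # empty file: the agent wrote nothing
    # First non-blank line, with leading '#' characters and spaces removed
    first = next((line for line in lines if line.strip()), "")
    heading = first.lstrip("# ").strip()
    return heading == reported_heading.strip() and len(lines) == reported_lines
```

An orchestrator could run this on every section file and skip the merge if any check fails.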

What is working well

  • Eliminating the manual CITATION_DATA block is the right call -- it was the largest source of structural errors.
  • Format 2 cleanly solves the long-anchor problem without breaking the existing syntax.
  • The auth failure simplification is appropriate; the old bullet list was over-specified.
  • The comprehensiveness section fills a real gap -- agents were not previously told to mirror evidence structure with subsections or cover all parts of multi-part questions equally.
  • The updated invariants accurately reflect the new two-command pipeline and are easier to scan.

Summary: The core refactor is sound. The main asks before merging are (1) a CLI minimum-version note for the deepTextPages field rename, (2) guidance on what the CLI returns when an anchor cannot be located, and (3) restoring the BAD-example list in some form. Everything else is minor.

…mat 2 fallback

- Add CLI version note for deepTextPages field rename (update prompt)
- Restore compact BAD-example list for common citation anti-patterns
- Add Format 2 verbatim fallback rule (if anchor not exact, use Format 1)
- Explain {draft}/{topic} placeholder syntax with concrete examples
- Restore specific env-var prefixing security wording
- Strengthen sub-agent failure detection (require section heading + line count in confirmation)
- Restore note that verify --citations is low-level and skips format normalization
- Add guidance that CLI flags unmatched anchors in output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bensonwong bensonwong merged commit 0193a29 into main Apr 3, 2026
@bensonwong bensonwong deleted the refactor/verify-auto-citation-data branch April 3, 2026 14:48
1 participant