refactor(verify): auto-generated citations, prose-flow format, and parallel agent merge#14
Conversation
…tation data The verify CLI now auto-generates citation data from `[display label](cite:N)` markers by searching the prepared summary. This removes the need for agents to manually construct `<<<CITATION_DATA>>>` JSON blocks with fullPhrase, anchorText, pageId, and lineIds fields. Key changes: - Replace `deepTextPromptPortion` with `deepTextPages` - Remove `<<<CITATION_DATA>>>` block and all JSON citation instructions - Simplify anchor text rules to "1–5 word display labels" (tool does matching) - Simplify parallel merge: agents write to files, CLI handles merge+renumber - Remove manual dedup/renumber logic from agent instructions - Streamline invariants to reflect body-only output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces Format 2 citation syntax: `[readable label](cite:N "verbatim anchor")` for cases where the evidence term needs surrounding prose to read naturally. - Add prose-flow principle: display labels must read as natural sentences - Add Format 2 with quoted anchor for evidence matching - Add before/after table showing clause-fragment vs prose-flow rewrites - Update sub-agent prompt to include both formats and prose-flow rule Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…or semantics Fixes all blocking and should-fix issues from code review: - Switch Format 2 title attribute from double to single quotes to avoid nested-quote ambiguity in inline code spans - Unify word-count limit to 1–4 words across all sections (format spec, prose-flow principle, and sub-agent prompt) - Clarify that Format 1 display label doubles as the CLI search anchor - Add explicit contiguous-substring requirement for Format 2 anchors - Fix reuse syntax from ambiguous `(cite:N)` to `[label](cite:N)` - Align invariants wording with Step 2's phrasing - Fix table example: shorten over-long display label - Trim redundant GOOD/BAD examples (9 → 8, removed duplicates with table) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Table row 2 used Format 2 with anchor identical to display label, which contradicts the rule that Format 2 is for different link text. Simplified to Format 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
PR Review Overall this is a solid simplification. Offloading citation data generation to the CLI removes a large class of model-side errors. The Format 2 syntax and comprehensiveness section are both clear improvements. A few things worth addressing before merging: Potential bugs / correctness issues 1. deepTextPromptPortion -> deepTextPages is a breaking rename with no migration note The summary field name changes in the prose but there is no guidance on what happens if a user runs an older CLI version that still emits deepTextPromptPortion. An agent following the new instructions would look for deepTextPages, find nothing, and silently produce uncited output. A sentence noting the minimum CLI version required would prevent this. 2. No guidance on CLI behavior when a cite:N marker has no matching anchor The old spec enforced bidirectional consistency (every id in CITATION_DATA has a cite:N in the body, and vice versa). That invariant is gone because the CLI now owns JSON generation -- which is fine. But the instructions do not say what the CLI does when it cannot locate an anchor in the summary text. Does it fail loudly? Return a partial result? Knowing this would let the agent handle errors gracefully. 3. Format 2 verbatim requirement has an implicit failure mode Format 2 requires the anchor to exist verbatim in the evidence but does not say what to do if the agent slightly misquotes. A rule like 'if Format 2 anchor is not an exact substring of the evidence, fall back to Format 1' would close this gap. 4. Placeholder variables in the merge command are unexplained The bash command block uses {draft} and {topic} as template slots without saying so. First-time readers may treat them as literal strings. The old spec used concrete example filenames (e.g. yc-safe-analysis.md) which helped ground this. Clarity regressions 5. Loss of the BAD-example catalogue The removed block contained roughly 12 concrete counter-examples (double-bracket, marker before term, anchor too long, casing mismatch, etc.) covering edge cases the new positive rules do not address. Consider moving them into a collapsed details block rather than deleting them entirely. 6. Security wording is slightly weaker Old: 'Never use DEEPCITATION_API_KEY=... prefixing. Never print key values in chat.' The new phrasing loses the specific anti-pattern (env-var prefixing in the shell command), which is the most likely accidental leak vector. Minor nits
What is working well
Summary: The core refactor is sound. The main asks before merging are (1) a CLI minimum-version note for the deepTextPages field rename, (2) guidance on what the CLI returns when an anchor cannot be located, and (3) restoring the BAD-example list in some form. Everything else is minor. |
…mat 2 fallback
- Add CLI version note for deepTextPages field rename (update prompt)
- Restore compact BAD-example list for common citation anti-patterns
- Add Format 2 verbatim fallback rule (if anchor not exact, use Format 1)
- Explain {draft}/{topic} placeholder syntax with concrete examples
- Restore specific env-var prefixing security wording
- Strengthen sub-agent failure detection (require section heading + line count in confirmation)
- Restore note that verify --citations is low-level and skips format normalization
- Add guidance that CLI flags unmatched anchors in output
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
<<<CITATION_DATA>>>JSON blocks with[label](cite:N)markers; the CLI'sverifycommand now extracts and generates citation data automatically from the prepared summary[1.80 metres](cite:N)) and Format 2 (short display label + longer verbatim anchor, e.g.[concrete slab surface](cite:N 'upper unfinished surface of the concrete floor slab')) with concrete examples for eachdeepcitation merge+verify --markdowninstead of manual JSON renumbering and deduplicationTest plan
/verifyon a single-topic document and confirm citation markers render correctly without a manual JSON block/verifyon a multi-section document and confirm parallel agents produce section files that merge cleanly