scunning1975 · nsmiller2501 · May 8, 2026 · May 23, 2026
diff --git a/.claude/skills/wiki-update/README.md b/.claude/skills/wiki-update/README.md
@@ -0,0 +1,55 @@
+# Wiki-Update (`/wiki-update`)
+
+`/wiki-update` ingests new PDFs from a project's `references/raw/` folder into the project's literature wiki. It summarizes each paper through the lens of the project's research focus, writes or updates wiki pages, records completed ingests, and refreshes BibTeX metadata.
+
+The executable protocol lives in [`SKILL.md`](SKILL.md). This README is the human overview.
+
+## When To Use It
+
+Use `/wiki-update` after adding one or more PDFs to `references/raw/`.
+
+Natural-language triggers include:
+
+- "ingest new references"
+- "update the wiki"
+- "process the new papers I added"
+
+The skill is designed to be safe to re-run. Completed papers are identified from the wiki log; unfinished papers are rediscovered and retried.
+
+## What It Expects
+
+- `references/raw/` for source PDFs.
+- `references/wiki/` for concept pages and the ingest log.
+- `references/CLAUDE.md` for wiki conventions.
+- A project root `CLAUDE.md` with the research question, data sources, and identification strategy filled in.
+
+On first run, the skill can scaffold the references wiki structure. It will not invent missing project context.
+
+## What It Does
+
+- Finds new PDFs and proposes filename normalization.
+- Reads each paper in isolated subagents to avoid PDF image bloat and whole-file markdown reads in the main session.
+- Reuses existing `_text.md` extracts or PDF splits when available.
+- For marker-converted PDFs, writes a neutral `_text.md` first, then runs a separate project-wiki synthesis pass.
+- Applies a project-context relevance filter so important material receives full treatment and less relevant material gets concise page-referenced notes.
+- Writes wiki pages atomically per paper, then logs completion only after edits succeed.
+- Runs the BibTeX update cascade after ingestion.
+
+## Boundaries
+
+`/wiki-update` owns the project-wiki lifecycle. `/read-pdf` owns standalone paper reading, including the `/read-pdf --split` fallback. The two skills share the same batching idea, but `/wiki-update` uses a non-interactive subagent flow because a per-batch confirmation gate would deadlock inside an ingest subagent.
+
+For exact tier rules, destructive-edit handling, filename checks, log format, and BibTeX behavior, read [`SKILL.md`](SKILL.md).
+
+## Related Skills
+
+- `/newproject` — creates the project structure this skill expects.
+- `/read-pdf` — standalone paper reading and reusable `_text.md` extraction.
+- `/read-pdf --split` — standalone batched vision reading for individual papers.
+- `/bib-update` — refreshes `references/references.bib` from extracted metadata.
+
+---
+
+This skill builds on [Andrej Karpathy's LLMwiki concept](https://x.com/karpathy). `/wiki-update` operationalizes that idea for empirical-economics workflows.
+
+This skill originated in [Scott Cunningham](https://github.com/scunning1975/MixtapeTools)'s MixtapeTools repository.
diff --git a/.claude/skills/wiki-update/SKILL.md b/.claude/skills/wiki-update/SKILL.md
diff --git a/.claude/skills/wiki-update/common.md b/.claude/skills/wiki-update/common.md
@@ -0,0 +1,149 @@
+# Common protocol fragments — wiki-update subagent
+
+These sections are shared across Protocols M, E, and S. The main session passes this file by path into every per-paper subagent prompt, alongside exactly one of `protocol_m.md`, `protocol_e.md`, or `protocol_s.md`.
+
+---
+
+## `_text.md` structure
+
+Protocols that synthesize `_text.md` (Protocol S and the read-pdf fanout synthesis used by Protocol M) use this layout:
+
+```markdown
+## Bibliographic metadata
+doi: <10.xxxx/yyyy if found, else null>
+authors: [LastName1, LastName2, ...]
+title: <verbatim title>
+year: <year>
+venue: <journal/WP series/etc., verbatim>
+venue_type: journal | working_paper | book_chapter | other
+
+## Plain-English synthesis
+[~200 words, see below]
+
+## 1. Research question
+...
+## 2. Audience
+...
+[continue through dimension 12]
+```
+
+## Plain-English synthesis block
+
+Hard cap: ~200 words. No jargon. Cover:
+
+- Research question (1 sentence)
+- Motivation / why it matters (1–2 sentences)
+- What they estimate and how, in plain terms (2–3 sentences)
+- What they found (1–2 sentences)
+- The take-away — what someone should walk away believing or doing differently (1 sentence)
+
+This block is the answer to "what's this paper about?" for someone who will not read the rest. Anyone with a college degree should be able to read it without a glossary. If you find yourself writing "endogeneity" or "LATE" or "first-stage F-stat," rewrite in plainer terms.
+
+## Structured-extraction dimensions
+
+1. **Research question** — what the paper asks and why it matters
+2. **Audience** — sub-community of researchers who care
+3. **Method / identification strategy** — how they answer the question
+4. **Target parameter** — the estimand in plain terms (e.g., "ATE of schooling on log wages, conditional on age and state-by-year FE"). Distinct from method and identification assumptions.
+5. **Data** — sources, unit of observation, sample size, time period
+6. **Statistical methods / specifications** — econometric techniques, key specifications, key equations (extract verbatim in LaTeX math mode where available — Protocol M gets these from the converter; Protocol S extracts them from split text)
+7. **Findings** — key coefficients and standard errors
+8. **Contributions** — what is learned that we didn't know before
+9. **Replication feasibility** — data availability, replication archive
+10. **Tables (project-relevance gated)** — see Tables protocol below
+11. **Figures (project-relevance gated)** — see Figures protocol below
+12. **Equations / formal objects** — labeled equations, model primitives, algorithms, propositions, and other formal objects needed to understand or replicate the paper
+
+## Tables protocol (project-relevance gated)
+
+Apply the project-relevance filter. For tables *directly relevant* to the project's research focus, extract the table as machine-readable markdown. For non-relevant tables, give a one-line description with a page reference.
+
+For relevant tables:
+
+```
+**Table N:** <verbatim caption> (p. 12)
+
+| Variable | (1) | (2) | (3) |
+|---|---|---|---|
+| Schooling | 0.087*** | 0.091*** | 0.085*** |
+|           | (0.012)  | (0.013)  | (0.011)  |
+| N         | 12,450   | 12,450   | 12,450   |
+| R²        | 0.34     | 0.36     | 0.38     |
+
+Notes: <verbatim table notes — SE clustering, FE structure, etc.>
+```
+
+Preserve column headers verbatim, numerical values verbatim (including SEs in parentheses and significance stars), and table notes verbatim. Pipe-syntax markdown only; no HTML tables. Table notes are part of the table's content — capture them.
+
+*Protocol M advantage:* the converter already produces pipe-syntax tables from the PDF. Extract them with light cleanup rather than re-reading the tables from the PDF.
+
+## Figures protocol (project-relevance gated, two-tier)
+
+Apply the project-relevance filter. Non-relevant figures: one-line description with page reference only.
+
+For relevant figures, classify as Tier A or Tier B using caption text:
+
+- **Tier A — Data figures**: scatter, line, bar, coefplot, histogram, density, time series, RD/event-study plot. The data IS the content.
+- **Tier B — Schematic figures**: DAGs, conceptual diagrams, maps, flowcharts, theoretical model schematics. Do NOT attempt optical decomposition. Default to Tier B when uncertain — a structured Tier A block written for a schematic is misleading; a Tier B for a data figure just makes the reader look at the image.
+
+**In `_text.md`:**
+
+*Protocol M* — figures are copied to `references/wiki/figures/`. Record:
+
+```
+**Figure N:** <verbatim caption> (p. 12)
+![<short description>](figures/<basename>_figN.<ext>)
+- Type: <for Tier A: scatter / line / bar / etc.>
+- X-axis: <variable, units, range>    [Tier A only]
+- Y-axis: <variable, units, range>    [Tier A only]
+- Series / panels: <brief list>       [Tier A only]
+- Key visual finding: <one sentence>
+- Annotations: <labels, reference lines, shaded regions>  [Tier A only]
+- **Figure notes:** <verbatim notes below the figure, if any>
+[Tier B: replace the structured block with just: One-liner: <what the figure depicts at a glance>]
+```
+
+All wiki source pages and concept pages are written directly under `references/wiki/`, so embedded figure links must be relative to that directory. For Protocol M, use the path printed by `copy_marker_figure.py`, usually `figures/<basename>_figN.jpg` or `figures/<basename>_figN.png`. Do not use `../figures/...` or `../wiki/figures/...` in wiki pages.
+
+*Protocols E and S* — use CLIP placeholders (described in their respective protocol sections).
+
+## Substantive-change rule
+
+The subagent applies non-destructive edits directly. Destructive edits to existing pages must be returned as proposed unified diffs — not applied.
+
+| Edit | Apply directly? |
+|---|---|
+| Create new wiki page | Yes |
+| Append new section / bullet / paragraph to existing page | Yes |
+| Add `[[backlink]]` (inline or under "Related pages") | Yes |
+| Update `**Last updated**` date | Yes |
+| Append a new source to `**Sources**` | Yes |
+| Note a contradiction between sources (additive note) | Yes |
+| Reorganize section order (no content lost) | Yes |
+| Update `wiki/index.md` (append new entries, edit existing one-liners) | Yes |
+| Copy an extracted figure into `references/wiki/figures/` | Yes |
+| Edit the `**Summary**` field on an existing page | **Return as diff** |
+| Delete any existing line | **Return as diff** |
+| Modify the wording of an existing claim | **Return as diff** |
+
+## Concept page disambiguation
+
+Before creating a new concept page, check `wiki/index.md` for existing pages covering the same concept — including obvious synonyms (e.g., "RDD" vs "regression discontinuity"). If a near-match exists but you aren't confident, do **not** create a new page; return the ambiguity to the main session as a question for the user.
+
+## Relevance filtering
+
+Apply "compress, don't omit": sections directly relevant to the project's research focus get full treatment. Less-relevant sections get a one-line description plus page reference. Nothing is fully omitted.
+
+## Subagent return value
+
+```
+Pages created: [list]
+Pages modified non-destructively: [list with brief description]
+Proposed destructive edits: [list of {page, unified diff, rationale}]
+Disambiguation questions: [list of {concept, candidate existing pages}]
+Proposed log entry: [single line for wiki/log.md]
+Pending CLIPs: [list of {target_path, source_paper, page_number, one_liner}]
+[Protocol M only] Figures copied: [list of {source_cache_path, dest_wiki_path, paper_figure_label}]
+[Protocol M only] Equation fallback used: <true/false>
+Errors: [any issues encountered]
+```
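
Illustration (not part of the patch): the substantive-change rule table in `common.md` can be read as a lookup from edit kind to handling. The Python below is a minimal sketch under that reading. The `EditKind` names and the `must_return_as_diff` helper are hypothetical labels for the table rows, not identifiers from the skill.

```python
from enum import Enum


class EditKind(Enum):
    # Hypothetical names, one per row of the substantive-change table.
    CREATE_PAGE = "create_page"
    APPEND_CONTENT = "append_content"
    ADD_BACKLINK = "add_backlink"
    UPDATE_LAST_UPDATED = "update_last_updated"
    APPEND_SOURCE = "append_source"
    NOTE_CONTRADICTION = "note_contradiction"
    REORDER_SECTIONS = "reorder_sections"
    UPDATE_INDEX = "update_index"
    COPY_FIGURE = "copy_figure"
    EDIT_SUMMARY = "edit_summary"
    DELETE_LINE = "delete_line"
    REWORD_CLAIM = "reword_claim"


# The three rows the table marks "Return as diff".
DESTRUCTIVE_KINDS = frozenset(
    {EditKind.EDIT_SUMMARY, EditKind.DELETE_LINE, EditKind.REWORD_CLAIM}
)


def must_return_as_diff(kind: EditKind) -> bool:
    """True when the edit should be proposed as a unified diff, not applied."""
    return kind in DESTRUCTIVE_KINDS
```

Under this sketch, edits for which `must_return_as_diff` is true would populate the `Proposed destructive edits` field of the return value, and the rest would be listed under pages created or modified non-destructively.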
diff --git a/.claude/skills/wiki-update/protocol_e.md b/.claude/skills/wiki-update/protocol_e.md
@@ -0,0 +1,23 @@
+# Protocol E — Cached Extract
+
+*Input:* path to `references/raw/<basename>_text.md`.
+
+## Step 1: Read the extract
+
+Read `_text.md` in full. Extract the `## Bibliographic metadata` block for the return value. Note any CLIP placeholders in the figures sections.
+
+Protocol E reads only the cached `_text.md` and any figure files it references. Do not re-read the PDF with `pdftotext` to expand or validate the extract.
+
+## Step 2: Write wiki pages
+
+Use the substantive-change rule and relevance filtering in `common.md`.
+
+For figures: if `_text.md` references wiki figure paths that already exist on disk, embed them in wiki pages using the same lightweight format as Protocol M. If `_text.md` contains CLIP placeholders, pass them through to the wiki and aggregate them into the Pending CLIPs return field.
+
+Do **not** re-synthesize or overwrite `_text.md` — it is the canonical extract for this paper.
+
+## Return value additions for Protocol E
+
+```
+Pending CLIPs: [list of {target_path, source_paper, page_number, one_liner} — forwarded from _text.md]
+```
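
Illustration (not part of the patch): forwarding CLIP placeholders from `_text.md` into the `Pending CLIPs` field could be sketched as below. It assumes the three-line placeholder format given in `protocol_s.md` and assumes the page reference is a single integer. The function name `collect_pending_clips` and the regex are hypothetical, not part of the skill.

```python
import re
from typing import Dict, List

# Matches the three-line CLIP placeholder block described in protocol_s.md.
CLIP_BLOCK = re.compile(
    r"^> \*\*Figure (?P<label>\S+) \(CLIP\):\*\* (?P<caption>.*) "
    r"\(p\. (?P<page>\d+)\)[ \t]*\n"
    r"^> One-liner: (?P<one_liner>.*)\n"
    r"^> ACTION: clip from PDF, save to (?P<target>\S+)",
    re.MULTILINE,
)


def collect_pending_clips(text_md: str, source_paper: str) -> List[Dict[str, object]]:
    """Build Pending CLIPs entries from CLIP placeholders found in _text.md."""
    clips: List[Dict[str, object]] = []
    for match in CLIP_BLOCK.finditer(text_md):
        clips.append(
            {
                "target_path": match.group("target"),
                "source_paper": source_paper,
                "page_number": int(match.group("page")),
                "one_liner": match.group("one_liner").strip(),
            }
        )
    return clips
```

Placeholders that deviate from the documented layout (for example a page range instead of a single page) would be skipped by this regex, so a real implementation would probably want to report unmatched `(CLIP)` lines under `Errors`.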
diff --git a/.claude/skills/wiki-update/protocol_m.md b/.claude/skills/wiki-update/protocol_m.md
@@ -0,0 +1,81 @@
+# Protocol M — Fanout Extract Then Wiki Synthesis
+
+*Input:* path to `manifest.json` produced by `read-pdf/scripts/prepare_substrate.py`, path to the converter cache directory (for figures and `text.md`), canonical paper basename.
+
+Protocol M reads only `manifest.json`, its chunk files, worker notes, cache-local figure files, the neutral `_text.md`, and wiki context files. Do not read the whole converted `markdown.md`. Do not inspect the source PDF with `pdftotext` or any other text extractor for substantive synthesis, even if conversion is slow. If conversion or substrate preparation is still running, wait.
+
+## Step 1: Extract bounded worker notes
+
+The main session spawns one worker agent per `manifest.worker_bundles` entry, sequentially. Each worker receives its bundle excerpt, reads the assigned chunk paths only, follows `~/.claude/skills/read-pdf/fanout_worker.md`, and writes one durable note file under `references/raw/raw_build/<basename>_fanout/worker_notes/`.
+
+If interrupted, completed worker notes are salvageable and should not be deleted.
+
+## Step 2: Synthesize `_text.md`
+
+After all worker notes exist, the main session spawns one read-pdf synthesis agent. The synthesis agent reads `manifest.json` and all worker note files. It uses `~/.claude/skills/read-pdf/fanout_synthesis.md` plus `~/.claude/skills/read-pdf/extraction_schema.md` to produce `references/raw/<basename>_text.md` following the project-neutral `_text.md` structure (bib block, plain-English synthesis, structured dimensions, and formal-object inventories). Gap-reread specific chunk files only when worker notes omit a needed table, figure, equation, result, or ambiguous claim. Write the file, overwriting any prior partial file.
+
+After the synthesis agent returns, cache the neutral extract with:
+
+```bash
+python3 ~/.claude/skills/read-pdf/scripts/cache_text.py push "<cache-dir>/markdown.md" "references/raw/<basename>_text.md"
+```
+
+This cache-level neutral extract is project-neutral and reusable by future projects that ingest the same PDF hash.
+
+For the bib metadata block, use DOI candidates from `manifest.json` and front-matter worker notes. Extract authors, title, year, and venue from the front-matter chunks and worker notes. Record null for any field not found. Do not read the whole `markdown.md` for metadata.
+
+The read-pdf synthesis agent must not read project wiki pages, project context files, citation-overlap JSON, or downstream wiki prompts. It writes only `_text.md`.
+
+## Step 3: Write project wiki pages
+
+After `_text.md` exists, the main session spawns one wiki synthesis agent. It reads:
+
+- `references/raw/<basename>_text.md`
+- `references/CLAUDE.md`
+- project root `CLAUDE.md`
+- current `references/wiki/index.md`
+- relevant existing wiki pages
+- `references/raw/raw_build/<basename>_citation_overlap.json`, if produced
+- `~/.claude/skills/wiki-update/wiki_synthesis.md`
+- `~/.claude/skills/wiki-update/common.md`
+
+The wiki synthesis agent must not read worker notes or chunk files unless `_text.md` explicitly marks a gap and the main session approves a targeted recovery read.
+
+## Step 4: Copy and classify relevant figures
+
+For each relevant figure listed in `_text.md`:
+
+1. Identify the paper figure number from surrounding caption text.
+2. Apply the project-relevance filter. Non-relevant: one-line description + page ref only; do not copy.
+3. For relevant figures:
+   - Copy with the deterministic helper, not by hand:
+     `python3 ~/.claude/skills/wiki-update/scripts/copy_marker_figure.py <cache-dir>/markdown.md <absolute-project-root>/references/wiki/figures --basename <basename> --figure <M>`
+   - Use the helper's printed wiki-relative path in markdown. The helper preserves the source image format and picks the extension that matches the file's bytes, so destinations may be `.jpg` or `.png`.
+   - Verify copied files exist with `ls references/wiki/figures/<basename>_fig<M>.*`.
+   - Classify as Tier A (data figure: scatter, line, bar, coefplot, histogram, density, time series, RD/event-study plot) or Tier B (schematic: DAG, conceptual diagram, map, flowchart, theoretical model). Use the `_text.md` figure description and caption; read the image file only if genuinely needed for wiki writing.
+
+## Step 5: Wiki figure embeds
+
+Use the substantive-change rule and relevance filtering in `common.md`.
+
+For relevant figures embedded in wiki concept pages, use this format regardless of Tier A/B:
+
+```markdown
+**Figure N:** <verbatim caption> (p. 12)
+
+![<short description>](<helper-printed-path>)
+
+- Key visual finding: <one sentence — what the eye sees / the point of the figure>
+- **Figure notes:** <verbatim notes printed below the figure in the paper, if any>
+```
+
+All wiki pages live directly under `references/wiki/`. Figure links must use the helper-printed path, e.g. `figures/<basename>_figN.jpg` or `figures/<basename>_figN.png`, never `../figures/...`.
+
+The Tier A/B distinction lives in `_text.md` only (full optical decomposition for Tier A; schematic one-liner for Tier B). Wiki pages use the same lightweight embed format for all figures.
+
+## Return value additions for Protocol M
+
+```
+Figures copied: [list of {source_cache_path, dest_wiki_path, paper_figure_label}]
+Equation fallback used: <true/false> (with count and any "[unreadable equation]" instances if true)
+```
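
Illustration (not part of the patch): the figure-link rule in Protocol M (links relative to `references/wiki/`, never `../figures/...`) could be checked with a small validator. This is a sketch only. It assumes the figure index `N` is numeric and that the only extensions are `.jpg` and `.png`, which the protocol describes as the usual cases. The function name is hypothetical.

```python
import re


def is_valid_wiki_figure_link(link: str, basename: str) -> bool:
    """Return True if link has the form figures/<basename>_figN.jpg or .png.

    Assumption: N is one or more digits; other figure labels are not handled.
    """
    pattern = rf"figures/{re.escape(basename)}_fig\d+\.(?:jpg|png)"
    return re.fullmatch(pattern, link) is not None
```

Because the match is anchored to the whole string, any link starting with `../` fails the check, which is the case the protocol explicitly forbids.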
diff --git a/.claude/skills/wiki-update/protocol_s.md b/.claude/skills/wiki-update/protocol_s.md
@@ -0,0 +1,52 @@
+# Protocol S — Split-PDF Pipeline
+
+*Input:* absolute path to the PDF, absolute path to the splits directory (`references/raw/raw_build/split_<basename>/`). The main session has already run the splitter — the splits directory is populated with `<basename>_pp<X>-<Y>.pdf` chunks before this subagent is spawned. Do not attempt to split the PDF yourself.
+
+## Step 1: Read splits in batches of 3
+
+Read each split sequentially in batches of 3, without pausing or asking for confirmation. After each batch, append findings to `<splits-dir>/notes.md` under the structured-extraction dimensions in `common.md`, preceded by a batch boundary comment:
+
+```
+<!-- batch N: pp X-Y -->
+```
+
+If `notes.md` already exists (prior interrupted run), read it first and resume from where it left off — do not overwrite earlier content. `notes.md` is append-mostly and permanent; never delete it.
+
+## Step 2: Synthesize `_text.md`
+
+After all splits are read, write `references/raw/<basename>_text.md` from the accumulated `notes.md` content. Follow the `_text.md` structure in `common.md` (bib block, plain-English synthesis, 12 dimensions).
+
+For the bib metadata block: scan the first split for the DOI regex `10\.\d{4,}/\S+`. Extract authors, title, year, and venue from the first-split text. Record null for any field not found.
+
+`notes.md` is permanent — do not delete it after writing `_text.md`.
+
+## Step 3: Write wiki pages
+
+Use the substantive-change rule and relevance filtering in `common.md`.
+
+For figures: Protocol S does not have extracted figure images. Use CLIP placeholders for all Tier B figures and for any Tier A data figures that cannot be adequately described in text. A structured Tier A block suffices when the data description is complete; use a CLIP placeholder when it isn't.
+
+CLIP placeholder format in `_text.md`:
+
+```
+> **Figure N (CLIP):** <verbatim caption> (p. 12)
+> One-liner: <what the figure depicts at a glance>
+> ACTION: clip from PDF, save to references/wiki/figures/<basename>_fig<N>.png
+```
+
+When a wiki page references a CLIP figure, use a broken image link (it renders as a visible TODO):
+
+```markdown
+![<short description>](figures/<basename>_figN.png)
+*<verbatim caption> ([<basename>](log.md), p. 12)*
+```
+
+All wiki pages live directly under `references/wiki/`. For Protocol S CLIP placeholders, use `figures/<basename>_figN.png` in wiki markdown, never `../figures/...`.
+
+Before writing any CLIP placeholder that references the figures directory, ensure it exists: `mkdir -p references/wiki/figures`.
+
+## Return value additions for Protocol S
+
+```
+Pending CLIPs: [list of {target_path, source_paper, page_number, one_liner}]
+```
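
Illustration (not part of the patch): the DOI scan in Step 2 of Protocol S uses the regex `10\.\d{4,}/\S+`. A minimal Python sketch of that scan follows. Stripping trailing punctuation is an added assumption, since `\S+` will also capture a sentence-ending period. The name `find_doi` is hypothetical.

```python
import re
from typing import Optional

# The DOI pattern quoted in Protocol S, Step 2.
DOI_PATTERN = re.compile(r"10\.\d{4,}/\S+")


def find_doi(first_split_text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None if absent."""
    match = DOI_PATTERN.search(first_split_text)
    if match is None:
        return None  # Protocol S records null for any field not found.
    # Assumption: trailing '.', ',' or ';' is sentence punctuation, not DOI.
    return match.group(0).rstrip(".,;")
```

Returning `None` maps onto the protocol's "record null" rule for the `doi` field of the bib metadata block.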