MixtapeTools

Tools for empirical economics research with Claude Code — audit protocols, presentation systems, PDF workflows, and a research GTD harness.

About This Repo

This is a collection of tools, templates, and philosophies I've developed while using Claude Code for:

Coding (data analysis scripts, replication code, automation)
Teaching (course materials, lecture decks, pedagogical tools)
Presentations (Beamer decks, slides for talks and seminars)
Research management (hypothesis tracking, evidence courtroom, live dashboards)

As I develop new approaches, I'll add them here. Anyone is free to use them.

Take everything with a grain of salt. These are workflows that work for me. Your mileage may vary.

A Note on Collaboration Style

The tools here reflect a specific way of working with Claude Code: Claude as a thinking partner, not Claude as an autonomous agent. I steer iteratively; Claude executes when the next move is clear and surfaces ambiguity when it isn't. This approach has distinctly less automation than many people prefer, and that is deliberate — I think more clearly when I am thinking through Claude rather than at Claude. The dialogue is the work, not just the delivery mechanism for the work.

If your taste runs toward highly automated pipelines, or toward AI that never takes initiative without permission, a lot of what's here may feel too loose on one axis and too presumptuous on the other. Both styles are legitimate; they're not what these tools are optimized for.

YMMV. Take what's useful; leave the rest.

Who I Am

Scott Cunningham — Professor of Economics at Baylor University

Website: www.scunning.com
Substack: causalinf.substack.com — I write regularly about causal inference, Claude Code, and random things
Free book: Causal Inference: The Mixtape — available online

Start Here: My Workflow

Location: workflow.md | Deck: presentations/examples/workflow_deck/

Before diving into specific tools, read my workflow document. It explains how I think about using Claude Code for empirical research — not just the tools, but the philosophy behind them.

Key concepts:

Concept	What It Means
Thinking partner, not code monkey	Claude is a collaborator who reasons about problems, not just a code generator
External memory via markdown	Claude has amnesia between sessions; markdown files provide institutional memory
Cross-software replication	R = Stata = Python to 6 decimal places, or something is wrong
Adversarial review (Referee 2)	Fresh Claude instance audits your work; you can't grade your own homework
Verification through visualization	Trust pictures over numbers; errors become visible
Documentation as first-class output	If it's not documented, it didn't happen

Everything else in this repo implements these principles.

The Tools

1. Referee 2 (Systematic Audit & Replication Protocol)

Location: skills/referee2/ | personas/referee2.md (full protocol)

Referee 2 is a health inspector for empirical research — a systematic five-audit protocol with cross-language replication, formal referee reports, and a revise & resubmit process. It runs after a project is complete, in a fresh terminal, by a Claude instance that has never seen the work. The separation is what makes it independent: the Claude that built the pipeline cannot objectively audit it.

The Five Audits:

Audit	What It Does
Code Audit	Coding errors, missing value handling, merge diagnostics, variable construction
Cross-Language Replication	Replication scripts in 2 other languages (R/Stata/Python), results compared to 6 decimal places
Directory Audit	Folder structure, relative paths, naming conventions — replication-package ready?
Output Automation Audit	Are tables and figures programmatically generated or manually created?
Econometrics Audit	Identification strategy, standard errors, fixed effects, parallel trends, first stage
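To make "compared to 6 decimal places" concrete, here is a minimal sketch of the final comparison step. It assumes each language's estimates have already been exported into a Python dict mapping coefficient name to value. The function name, the agreement rule (the spread across languages must be below 1e-6), and the numbers are illustrative assumptions, not part of the Referee 2 protocol itself.

```python
def replication_report(estimates: dict[str, dict[str, float]], decimals: int = 6) -> dict[str, bool]:
    """Return {coefficient: agrees?} across all languages.

    Agreement here means the spread (max - min) of a coefficient's estimates
    across languages is below 10**-decimals; this tolerance rule is an assumption.
    """
    tol = 10 ** -decimals
    languages = list(estimates)
    report = {}
    for coef in estimates[languages[0]]:
        # A coefficient missing in any language counts as a failed replication.
        if any(coef not in estimates[lang] for lang in languages):
            report[coef] = False
            continue
        values = [estimates[lang][coef] for lang in languages]
        report[coef] = (max(values) - min(values)) < tol
    return report


# Hypothetical estimates exported from each implementation.
estimates = {
    "R":      {"treat": 0.1234567, "post": -0.0456789},
    "Stata":  {"treat": 0.1234567, "post": -0.0456789},
    "Python": {"treat": 0.1234567, "post": -0.0456712},
}
print(replication_report(estimates))  # {'treat': True, 'post': False}
```

A coefficient missing from any language is flagged as a failure rather than skipped, since a silently dropped term is itself a replication discrepancy.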

Critical Rule: Referee 2 NEVER modifies author code. It only creates its own replication scripts. Only the author modifies the author's code.

Referee 2 is one of two complementary audit tools. See below.

2. Blindspot (Make the Stone Stony Again)

Location: skills/blindspot/ | .claude/skills/blindspot/SKILL.md (actual skill)

Blindspot is a peripheral vision audit for empirical output: a structured protocol for finding what the author cannot see, both problems hiding in plain sight (vices) and opportunities being overlooked (virtues).

The Shklovsky principle: Viktor Shklovsky, the Soviet literary theorist, argued that art exists to restore perception. A man who walks barefoot up a mountain eventually cannot feel his feet. Art exists to make the stone stony again. Research has the same problem: by the time you've spent months on a paper, the main finding has collapsed your attention, and everything else — the spike at t=1, the missing subgroup, the heterogeneity richer than the average — has become invisible.

Blindspot makes the stone stony again.

The Blindspot Grid — four quadrants, two vices and two virtues:

	What's there but unseen	What's absent but unnoticed
Problems	Vice 1: The Unexplained Feature — a spike, a sign flip, a sample-size drop nobody asked about	Vice 2: The Convenient Absence — a robustness check never run, a subgroup never examined, a dog that didn't bark
Opportunities	Virtue 1: The Unasked Question — heterogeneity richer than the average, a mechanism visible in the data but absent from the hypothesis	Virtue 2: The Unexploited Strength — an identification argument stronger than the paper claims, a falsification test that would crush the main objection

Ruling: CLEAR / CONDITIONAL / HOLD

Usage: /blindspot path/to/figure-or-table "what I think the main finding is"

Read the full documentation: skills/blindspot/README.md

Referee 2 and Blindspot: Complements, Not Substitutes

These two tools address different failure modes at different stages of the research process. Both should be run. Neither replaces the other.

	Referee 2	Blindspot
Core question	Is this implemented correctly?	Can you see what's in front of you?
Failure mode it catches	Coding errors, bad merges, wrong SEs, non-replicating results	Overlooked problems (vices) and overlooked opportunities (virtues)
When it runs	After the project is complete	When output first appears, before writing begins
Session	Fresh terminal — independence is structural	Same session — you need the person closest to the work
Persona	Health inspector with a checklist	Shklovsky — restoring perception
Would have caught a merge error?	Yes	Maybe
Would have caught the t=1 spike?	No	Yes

Why separate sessions for Referee 2 but not Blindspot?

Referee 2 needs a fresh session because it's auditing implementation — the Claude that built the code will rationalize its own choices. True independence requires structural separation.

Blindspot doesn't need separation because it's auditing perception — your own understanding of output you produced. You're the right person to do that, with a structured forcing function to look past what you expect to see.

The workflow:

Produce output → /blindspot → interpret and write
Complete project → open fresh terminal → /referee2

3. Bibcheck (Many-Agent Bibliography Audit)

Location: skills/bibcheck/ | .claude/skills/bibcheck/SKILL.md (actual skill)

A third verification skill, narrower than Referee 2 and bounded to a single artifact: the .bib file. /bibcheck audits a bibliography by spawning many narrow-focus agents — one per citation, or one per field across all citations — to verify each entry against canonical sources (DOI, journal landing page, author working paper).

Why narrow agents: A single agent asked to audit 80 citations in one pass tends to drift. Early entries get careful treatment; later entries get pattern-matched. Splitting the work — one agent per entry, full attention budget on a small task, parallel siblings for the next — moves the bottleneck from agent attention to orchestration. Orchestration is what cheap parallel agents are for.

Two modes:

Mode	Best for	How it works
Per-citation (default)	Catching mixed-up entries (title of paper A with authors of paper B)	One subagent per `@article{}` entry; each fully audits its one entry; a reviewer agent consolidates
Per-field (`--by-field`)	Catching systematic transcription errors (a journal name consistently wrong, swapped volume/issue, leaked working-paper years)	One specialist per field (title, year, journal, authors, volume/issue, pages, DOI); each launched as an isolated `claude --dangerously-skip-permissions -p` subprocess so they cannot peek at each other
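The isolation in per-field mode comes from running each specialist as its own process. The sketch below only builds the argument lists and does not launch anything. The `claude --dangerously-skip-permissions -p` invocation is the one named in the table; the field list mirrors the table, while the function name and prompt wording are hypothetical placeholders, not the skill's actual prompts.

```python
import shlex

# Fields taken from the per-field row above.
FIELDS = ["title", "year", "journal", "authors", "volume/issue", "pages", "DOI"]

def field_commands(bib_path: str) -> list[list[str]]:
    """Build one isolated `claude -p` invocation per field (not executed here)."""
    commands = []
    for field in FIELDS:
        # Hypothetical prompt text; the real prompts are defined by the skill.
        prompt = (f"Audit only the '{field}' field of every entry in {bib_path} "
                  f"against canonical sources. Report discrepancies.")
        commands.append(["claude", "--dangerously-skip-permissions", "-p", prompt])
    return commands

for cmd in field_commands("refs.bib")[:2]:
    print(shlex.join(cmd))
```

Each list could be handed to `subprocess.Popen` so the seven specialists run as separate processes that share no conversation context.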

Outputs: a timestamped bibcheck_<ts>/ folder containing bibcheck_report.md (per-entry findings, Clean / Corrected / Unverifiable summary) and corrected.bib (drop-in replacement). The skill never auto-overwrites the source — you review and merge.

Critical Rule: /bibcheck is bounded. It checks whether .bib entries are accurate descriptions of papers that exist. It does NOT check whether each citation supports the claim it's attached to in the manuscript — that is a /referee2 literature audit.

Usage: /bibcheck path/to/refs.bib (per-citation) or /bibcheck --by-field path/to/refs.bib (per-field). Optional --max-parallel N (default 8).

Read the full documentation: skills/bibcheck/README.md

4. The Rhetoric of Decks

Location: presentations/

My philosophy of slide design, plus a tested prompt for generating Beamer presentations. The key insight: aim for MB/MC (marginal benefit / marginal cost) equivalence across slides (smoothness), not maximum density.

Core principles:

Beauty earns attention; attention enables communication
Titles are assertions, not labels
One idea per slide
Bullets are defeat — find the structure hiding in your list

5. Split-PDF Skill (Download, Split, and Deep-Read Papers)

Location: skills/split-pdf/ (human-readable guide) | .claude/skills/split-pdf/SKILL.md (actual skill)

A Claude Code skill — an invocable /split-pdf command that automates the full pipeline for reading academic papers:

Acquire the PDF (web search + download, or use a local file in place)
Check for an existing _text.md extract or existing splits — offer to reuse
Split into 4-page chunks via PyPDF2, stored in a _build/ directory
Read 3 chunks at a time (~12 pages), pausing between batches
Extract structured reading notes across 8 dimensions into notes.md
Persist the final extraction as <basename>_text.md alongside the source PDF
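The chunking and batching steps above come down to simple page arithmetic. This sketch assumes 1-indexed, inclusive page ranges. The helper names are hypothetical, and the actual page extraction (done with PyPDF2 in the skill) is left out so the example runs with the standard library alone.

```python
from pathlib import Path

def chunk_ranges(n_pages: int, size: int = 4) -> list[tuple[int, int]]:
    """1-indexed, inclusive page ranges, `size` pages per chunk."""
    return [(start, min(start + size - 1, n_pages))
            for start in range(1, n_pages + 1, size)]

def batches(chunks: list, per_batch: int = 3) -> list[list]:
    """Group chunks so roughly 12 pages are read before each pause."""
    return [chunks[i:i + per_batch] for i in range(0, len(chunks), per_batch)]

def text_extract_path(pdf: str) -> Path:
    """Location of the persistent <basename>_text.md next to the source PDF."""
    p = Path(pdf)
    return p.with_name(f"{p.stem}_text.md")

chunks = chunk_ranges(30)
print(len(chunks))                        # 8
print([len(b) for b in batches(chunks)])  # [3, 3, 2]
```

With PyPDF2, each range would be written out by adding the corresponding `reader.pages[i]` objects (0-indexed) to a `PdfWriter`.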

Why not just read the full PDF? Long PDFs either crash the session ("prompt too long" — unrecoverable) or produce shallow, hallucinated output. Splitting forces Claude to attend carefully to every section and externalizes understanding into markdown notes incrementally.

Key features: In-place PDF handling (no centralized articles/ folder), persistent _text.md extracts (skip re-reading on future invocations), split reuse, and an agent isolation protocol that prevents context bloat when other skills call /split-pdf.

Usage: Type /split-pdf path/to/paper.pdf or /split-pdf "search query for paper"

Read the full documentation: skills/split-pdf/README.md

5b. Beautiful Deck (End-to-End Deck Creation)

Location: skills/beautiful_deck/ | .claude/skills/beautiful_deck/SKILL.md (actual skill)

A Claude Code skill — invoke with /beautiful_deck — that runs the full deck-generation pipeline. This is the operational version of the prompt that used to live at presentations/deck_generation_prompt.md.

What the skill enforces:

Audience triage before any slide is written — commits to a rhetorical balance (ethos / pathos / logos) that fits the audience (academic seminar, teaching lecture, conference talk, working deck, external non-academic)
Original theme, never boilerplate — a custom .sty tuned to the audience. May build on metropolis, moloch, focus, etc. as a foundation, but a reader should not be able to tell what theme package is underneath
Pedagogical movement: Narrative → Application → Picture → Codeblock → Technical — intuition first, technical last. The anti-pattern is the lecture that opens with definitions and ends with an example "for intuition"
Format flexibility — Beamer by default. Accepts Quarto, Typst, reveal.js, Marp on explicit user request
Code-first figure generation — standalone scripts run before \includegraphics{} is written
Zero-warning compile loop — Overfull / Underfull / font / reference warnings all must return zero at every checkpoint, not just the final compile
/tikz cleanup — invoked automatically to catch label collisions and coordinate drift
Rhetoric audit (sub-agent) — checks titles-as-assertions, one-idea-per-slide, MB/MC balance, narrative arc, Devil's Advocate presence
Graphics audit (sub-agent) — checks numerical accuracy, label positioning, axis coherence, color consistency, font sizing

Usage: /beautiful_deck [optional content path or description]

6. Compile Deck (Beamer Presentations with the Rhetoric of Decks)

Location: .claude/commands/compiledeck.md

A Claude Code command — invoke with /compiledeck — that embeds the full Rhetoric of Decks philosophy so you don't have to explain it each time.

The command asks two questions:

Who is the audience?
- External (seminar, conference, teaching) — sparse, performative, one idea per slide
- Working (coauthors, yourself) — can be more detailed, documents reasoning
What's the tone?
- Professional/Academic — your consistent "house style" for outward-facing work
- Colorful/Expressive — unique, creative design each time

What's embedded:

The Three Laws (Beauty is Function, Cognitive Load is Enemy, Slide Serves Spoken Word)
Titles as assertions, not labels
MB/MC equivalence across slides
The compile loop (compile → fix errors → fix warnings → visual check → repeat)
TikZ coordinate checking and figure label verification

Usage: Type /compiledeck when creating or editing a Beamer deck.

7. TikZ Collision Audit

Location: skills/tikz/ | .claude/skills/tikz/SKILL.md (actual skill)

A Claude Code skill — invoke with /tikz path/to/file.tex — that systematically audits and fixes every visual collision in every TikZ figure in a LaTeX file. Labels sitting on arrows, text inside boxes, arrows crossing each other — found and fixed using measurement, not intuition.

The problem it solves: TikZ compiles silently even when labels overlap arrows or text bleeds into box edges. The compiler catches nothing. This skill catches everything.

How it works: Six ordered passes, each targeting a specific class of collision:

Pass	What it checks
Pass 0	Cross-slide consistency — same diagram on multiple slides must be identical except for deliberate changes
Pass 1	Bézier curves first — computes max curve depth using `(chord/2) × tan(bend/2)`, checks every label against the danger zone
Pass 2	Gap calculations — estimates label width in cm, compares against usable space between nodes
Pass 3	Arrow label keywords — every label must have `above`, `below`, `left`, or `right`
Pass 4	Boundary rule — labels within 0.4cm of any circle, rectangle, or filled shape are a collision
Pass 5	Margin check — minimum clearances between all object pairs
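Here is the Pass 1 formula as a worked calculation, alongside the Pass 4 threshold. Lengths are in centimeters, `bend_deg` is the angle given to TikZ's `bend left`/`bend right`, and the function names are hypothetical.

```python
import math

BOUNDARY_CM = 0.4  # Pass 4: closer than this to a shape counts as a collision

def bend_depth(chord_cm: float, bend_deg: float) -> float:
    """Pass 1: maximum depth of a bent edge, (chord/2) * tan(bend/2)."""
    return (chord_cm / 2) * math.tan(math.radians(bend_deg) / 2)

def boundary_collision(gap_cm: float) -> bool:
    """Pass 4: flag a label whose gap to a circle, rectangle, or filled shape is under 0.4 cm."""
    return gap_cm < BOUNDARY_CM

print(f"{bend_depth(4, 30):.3f} cm")  # 0.536 cm
print(boundary_collision(0.25))       # True
```

For a 4 cm edge with `bend left=30`, the curve bulges about 0.54 cm away from the straight line between the nodes, so a label positioned on the bulge side within that distance of the chord can end up sitting on the curve.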

Most common pattern it catches: Step labels on flow diagrams that are wider than the arrow between boxes — they look right in code but overlap box text when rendered.
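Pass 2 and the step-label pattern come down to comparing an estimated label width with the gap it has to fit in. The per-character width and the padding below are assumptions chosen for illustration (real widths depend on font and size), and the helper names are hypothetical.

```python
CHAR_WIDTH_CM = 0.18  # assumed average character width at a typical Beamer font size
PADDING_CM = 0.2      # assumed clearance to keep on each side of the label

def label_width_cm(text: str, char_width: float = CHAR_WIDTH_CM) -> float:
    """Crude width estimate: character count times an average character width."""
    return len(text) * char_width

def fits_between(text: str, left_edge_cm: float, right_edge_cm: float,
                 padding_cm: float = PADDING_CM) -> bool:
    """True if the label fits in the gap between two node edges (x in cm)."""
    usable = (right_edge_cm - left_edge_cm) - 2 * padding_cm
    return label_width_cm(text) <= usable

# Boxes whose facing edges sit at x = 2 cm and x = 4 cm leave 1.6 cm usable.
print(fits_between("Step 1", 2.0, 4.0))                # True (about 1.08 cm)
print(fits_between("Estimate event study", 2.0, 4.0))  # False (about 3.6 cm)
```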

Full formulas and reference tables: compiledeck/tikz_rules.md

Usage: /tikz path/to/deck.tex

8. Additional Commands

Location: .claude/commands/

Command	Description
`/compiletex [file.tex]`	Compile any LaTeX file and report errors/warnings. Aims for zero warnings.
`/newproject [name]`	Scaffold a new research project with standard folder structure and CLAUDE.md. Also available as a skill.
`/newbook [slug]`	Scaffold a book-shaped project: `memoir`-based LaTeX skeleton, Palatino body, Gov 2001 palette, voiced-sidebar callouts, one chapter per file, bibliography stub, CLAUDE.md with voice cast. Parallel to `/newproject`. See documentation.

9. CLAUDE.md Template

Location: claude/CLAUDE.md

A template for giving Claude persistent memory within a project. Copy it to your project root and fill in the specifics. Claude Code will automatically read it every session.

10. GTD Research Harness (work in progress)

Location: gtd/

A warrant-first GTD system for conducting empirical research with an AI thinking partner. Makes research dialogue recoverable, claims falsifiable, and the current state of knowledge always visible.

The system has three components:

The protocol — how to file hypotheses, insights, and decisions so they accumulate into a coherent knowledge base rather than a pile of chat history
The dashboard — a live Python server (gtd/templates/dashboard_server.py) that reads the filesystem on every request and renders: a re-entry page, narrative with drift detection, a five-stage evidence courtroom, pipeline freshness tracking, hypothesis DAG, and referee report viewer
The interrogation — a structured prompt (gtd/INTERROGATION.md) for periodic reviews that stress-test whether the narrative is actually earned
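The dashboard's key design choice is that it keeps no cache: every request re-reads the filesystem, so the page cannot show stale state. A minimal sketch of that pattern using only the standard library follows. It is not the contents of `gtd/templates/dashboard_server.py`; the `ROOT` folder, the `serve` helper, and the rendered page are hypothetical.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ROOT = Path("gtd_project")  # hypothetical project folder

def render_index(root: Path) -> str:
    """Re-scan the folder and build a page; called on every request, so nothing is cached."""
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
    items = "".join(f"<li>{html.escape(f)}</li>" for f in files)
    return f"<h1>Markdown files found: {len(files)}</h1><ul>{items}</ul>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_index(ROOT).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    """Blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

After calling `serve()`, editing or adding a markdown file and reloading the page shows the change immediately, because nothing is read at startup.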

Core idea: Every claim must have a warrant. Every warrant must have a falsification test. Every falsification test must be filed as an insight. The courtroom tab enforces this by requiring evidence at all five stages before a claim can appear in the narrative or manuscript.
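The chain in the core idea can be written as an explicit admissibility check. This is a sketch under stated assumptions: the class and field names are hypothetical, and because the five courtroom stages are not named here, they are simply numbered 1 to 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

N_STAGES = 5  # courtroom stages, numbered because their names are not given here

@dataclass
class Claim:
    text: str
    warrant: Optional[str] = None
    falsification_test: Optional[str] = None
    insight_filed: bool = False
    evidence: Dict[int, str] = field(default_factory=dict)  # stage number -> evidence note

def missing_requirements(claim: Claim) -> List[str]:
    """List everything still blocking the claim from the narrative."""
    missing = []
    if not claim.warrant:
        missing.append("warrant")
    if not claim.falsification_test:
        missing.append("falsification test")
    if not claim.insight_filed:
        missing.append("insight filing")
    for stage in range(1, N_STAGES + 1):
        if not claim.evidence.get(stage):
            missing.append(f"evidence at stage {stage}")
    return missing

def admissible(claim: Claim) -> bool:
    """A claim may enter the narrative or manuscript only when nothing is missing."""
    return not missing_requirements(claim)
```

Returning the list of gaps, rather than a bare yes/no, tells you what to file next.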

Status: Under active development through use on a live R&R project. Stable enough to copy and adapt; not stable enough to treat as finished.

Read: gtd/README.md | Dashboard template: gtd/templates/dashboard_server.py

Repository Structure

MixtapeTools/
├── README.md                 # You are here
├── workflow.md               # How I use Claude Code for research (START HERE)
├── gtd/                      # GTD research harness (work in progress)
│   ├── README.md            # Full protocol documentation
│   ├── INTERROGATION.md     # Periodic review prompt
│   ├── SKILL.md             # Claude Code skill definition
│   ├── workflow.md          # GTD-specific workflow
│   ├── docs/                # Design rationale and tab-by-tab guide
│   ├── examples/            # Example hypothesis/insight/decision files
│   └── templates/           # Dashboard server + static dashboard
├── skills/                   # Human-readable guides to Claude Code skills
│   ├── README.md            # What skills are, how to use them, how to install
│   ├── blindspot/           # Blindspot: peripheral vision audit for output
│   ├── split-pdf/           # Split-PDF: deep-read academic papers
│   ├── newproject/          # Newproject: scaffold new research projects
│   └── tikz/                # TikZ: collision audit for LaTeX figures
├── .claude/
│   ├── commands/             # Slash commands (invoke with /command-name)
│   │   ├── compiledeck.md   # /compiledeck — Beamer with Rhetoric of Decks
│   │   ├── compiletex.md    # /compiletex — Compile LaTeX, report warnings
│   │   └── newproject.md    # /newproject — Scaffold new research project
│   └── skills/
│       ├── blindspot/        # Skill: make the stone stony again
│       ├── tikz/             # Skill: audit and fix TikZ visual collisions
│       ├── split-pdf/        # Skill: download, split, and deep-read PDFs
│       └── newproject/       # Skill: scaffold new research projects
├── claude/                   # Templates for working with Claude
│   └── CLAUDE.md            # Project context template (copy to your projects)
├── personas/                 # Systematic audit & replication protocols
│   └── referee2.md          # The 5-audit protocol for empirical research
└── presentations/            # Everything about slide decks
    ├── rhetoric_of_decks.md           # Practical principles (condensed)
    ├── rhetoric_of_decks_full_essay.md # Full intellectual framework (600+ lines)
    ├── deck_generation_prompt.md      # The prompt + iterative workflow
    └── examples/
        ├── workflow_deck/             # Visual presentation of the workflow
        ├── rhetoric_of_decks/         # The philosophy deck (45 slides)
        └── gov2001_probability/       # A lecture deck

The Philosophy

Design Before Results

During estimation and analysis, focus entirely on whether the specification is correct. Results are meaningless until the "experiment" is designed on purpose. Don't get excited or worried about point estimates until the design is intentional.

Trust But Verify (Heavily on Verify)

AI makes confident mistakes. Cross-software replication (R = Stata = Python) catches bugs that single-language analysis misses. If results aren't identical to 6+ decimal places across implementations, something is wrong.

Adversarial Review Requires Separation

If you ask the same Claude that wrote code to review it, you're asking a student to grade their own exam. True adversarial review requires a new terminal with fresh context and no prior commitments.

Referee 2 Never Modifies Author Code

The audit must be independent. Referee 2 creates its own replication scripts but never touches the author's code. Only the author modifies the author's code. This separation ensures the audit is truly external.

Formal Process > Informal Vibes

Checklists beat intuition. The Referee 2 protocol works because it specifies exactly what to check, requires concrete deliverables (replication scripts, comparison tables, referee reports), and creates a paper trail.

Documentation Is First-Class Output

If it's not documented, it didn't happen. Every audit produces a dated referee report filed in correspondence/. Every response is documented. Replication scripts are permanent artifacts. Future you (or your collaborators) can reconstruct exactly what happened.

Quick Start

1. Read the Workflow

Start with workflow.md to understand the philosophy.

2. Set Up a Project

Copy claude/CLAUDE.md to your project root. Fill in your project specifics.

3. Do Your Analysis

Work with Claude as a thinking partner, not a code generator. Ask it to explain its understanding. Verify outputs visually. Document as you go.

4. Invoke Referee 2

When you have results worth checking:

Open a new terminal (fresh context is essential)
Paste the contents of personas/referee2.md
Say: "Please audit and replicate the project at [path]. Primary language is [R/Stata/Python]."
Respond to the referee report (fix or justify each concern)
Iterate until verdict is Accept

Project Directory Structure

For the Referee 2 workflow to function properly, your research projects should include:

your_project/
├── CLAUDE.md                 # Project context for Claude
├── correspondence/
│   └── referee2/
│       ├── 2026-02-01_round1_report.md      # Detailed written report
│       ├── 2026-02-01_round1_deck.pdf       # Visual presentation of findings
│       ├── 2026-02-02_round1_response.md    # Author response
│       └── ...
├── code/
│   ├── R/                    # Author's code (ONLY author modifies)
│   ├── stata/
│   ├── python/
│   └── replication/          # Referee 2's replication scripts
├── data/
│   ├── raw/
│   └── clean/
└── output/
    ├── tables/
    └── figures/
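If you want to create this layout by hand rather than through `/newproject`, a minimal sketch with `pathlib` looks like this. The folder list copies the tree above. The placeholder `CLAUDE.md` contents and the `report_name` helper, which reproduces the dated file names shown under `correspondence/referee2/`, are illustrative assumptions; in practice you would copy `claude/CLAUDE.md` instead of writing a stub.

```python
from datetime import date
from pathlib import Path

DIRS = [
    "correspondence/referee2",
    "code/R", "code/stata", "code/python", "code/replication",
    "data/raw", "data/clean",
    "output/tables", "output/figures",
]

def scaffold(root: str) -> Path:
    """Create the folder tree; safe to re-run on an existing project."""
    base = Path(root)
    for d in DIRS:
        (base / d).mkdir(parents=True, exist_ok=True)
    claude_md = base / "CLAUDE.md"
    if not claude_md.exists():
        # Placeholder only; copy claude/CLAUDE.md here in real use.
        claude_md.write_text("# Project context for Claude\n")
    return base

def report_name(day: date, round_no: int, kind: str, ext: str = "md") -> str:
    """Dated correspondence file name, e.g. 2026-02-01_round1_report.md."""
    return f"{day.isoformat()}_round{round_no}_{kind}.{ext}"

print(report_name(date(2026, 2, 1), 1, "report"))  # 2026-02-01_round1_report.md
```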

Contributing

Have improvements or additions? PRs welcome. I'm particularly interested in:

Additional audit protocols (security reviewer, pedagogy reviewer, etc.)
Examples showing the Referee 2 workflow catching real bugs
Tools for other aspects of coding and teaching

Acknowledgments

Inspired by Boris Cherny's ChernyCode template for AI coding best practices.

License

Use freely. Attribution appreciated but not required.

Last updated: May 2026
