Add /read-pdf skill with cached marker-based PDF-to-markdown conversion #6

Open
nsmiller2501 wants to merge 1 commit into scunning1975:main from nsmiller2501:pr/read-pdf-marker-pipeline

@nsmiller2501

Summary

This PR adds /read-pdf, a local-conversion alternative to /split-pdf for academic-paper ingestion.

What changed

  • Adds a new /read-pdf skill that converts PDFs to markdown with marker-pdf.
  • Adds helper scripts for one-time converter install and cached conversion.
  • Caches conversions by SHA-256 under ~/.cache/claude-pdf-converter.
  • Writes _text.md output that follows the same structured contract as /split-pdf: bibliographic metadata followed by research notes.
  • Documents the marker-only backend, cache layout, GPU behavior, and failure mode.
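
As an illustration of the SHA-256 caching described above, here is a minimal sketch, not the code in convert.py. The function names, the per-hash directory layout, and the paper.md file name are hypothetical; only the cache root ~/.cache/claude-pdf-converter comes from this PR. The converter is passed in as a callable so the sketch does not assume any particular marker-pdf API.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Cache root taken from the PR description; everything beneath it is assumed.
CACHE_ROOT = Path.home() / ".cache" / "claude-pdf-converter"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large PDFs are never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_convert(
    pdf_path: Path,
    convert: Callable[[Path], str],
    cache_root: Path = CACHE_ROOT,
) -> str:
    """Return markdown for pdf_path, calling convert only on a cache miss.

    `convert` is a stand-in for whatever actually runs the converter.
    """
    key = sha256_of_file(pdf_path)
    cached = cache_root / key / "paper.md"  # hypothetical layout: one directory per hash
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    markdown = convert(pdf_path)
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(markdown, encoding="utf-8")
    return markdown
```

Keying on the content hash rather than the file path means a renamed or moved PDF still hits the cache, while an edited PDF with the same name is reconverted.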

Why

/split-pdf is useful for vision-based paper reading, but local markdown conversion can preserve equations, table structure, and figure references with much lower conversation-context cost after setup.

Testing

  • Reviewed the branch diff against the PR notes.
  • Re-audited the helper scripts and documentation after fixing portability issues and documentation drift.
  • Ran /usr/local/bin/python3 -m py_compile .claude/skills/read-pdf/convert.py .claude/skills/read-pdf/install.py.
  • Did not run a live marker-pdf install/conversion smoke test in this PR branch.
