A Codex skill for turning PDFs and images into user-facing Markdown sidecars with internal JSON and quality reports.
The skill preserves source files. It does not add an OCR layer to PDFs, rewrite PDFs, or create searchable replacement PDFs. Instead, it extracts visible or embedded document content into <file>_ocr.md next to the source, while JSON and quality artifacts stay under the hidden .ocr_work/ directory.
Copy and paste this into Codex:
Install the Codex skill from https://github.com/jseook11/codex-pdf-ocr-to-markdown-skill. Copy the repository's `pdf-ocr-to-markdown` skill folder into my user Codex skills directory, preferably `~/.agents/skills/pdf-ocr-to-markdown`. If my Codex setup uses `~/.codex/skills`, use that instead. After copying, run `python3 scripts/setup_dependencies.py --install` from the installed skill folder once, then verify it with `python3 scripts/setup_dependencies.py --check`. If package installation needs approval, ask me during installation. Verify that the installed `SKILL.md` has `name: pdf-ocr-to-markdown`, reload skills if needed, and tell me when it is ready to use.
- Handles PDFs and common image files.
- Extracts embedded PDF text when available.
- Renders pages for visual inspection when layout or OCR quality requires it.
- Routes pages as `text_only`, `visual_focus`, `verify_text_and_visual`, or `full_vision_ocr`.
- Produces a Markdown sidecar, internal JSON, page artifacts, route reports, and quality reports.
- Keeps internal run artifacts under `.ocr_work/` and leaves only the final Markdown sidecar next to the source.
- Uses bundled scripts instead of ad hoc OCR code.
- Codex CLI with user skill support.
- Python 3.
- Local filesystem access to the PDFs or images being processed.
- Python package installation permission if using `scripts/setup_dependencies.py --install`.
The exact Python dependencies are managed by scripts/setup_dependencies.py.
The easiest installation path is to paste the prompt above into Codex.
Manual installation:
git clone https://github.com/jseook11/codex-pdf-ocr-to-markdown-skill.git
cd codex-pdf-ocr-to-markdown-skill
INSTALL_DIR="$HOME/.agents/skills/pdf-ocr-to-markdown"
if [ -e "$INSTALL_DIR" ]; then
mv "$INSTALL_DIR" "${INSTALL_DIR}.backup.$(date +%Y%m%d_%H%M%S)"
fi
mkdir -p "$(dirname "$INSTALL_DIR")"
cp -R pdf-ocr-to-markdown "$INSTALL_DIR"
python3 "$INSTALL_DIR/scripts/setup_dependencies.py" --install
python3 "$INSTALL_DIR/scripts/setup_dependencies.py" --check
If your Codex installation still uses the legacy ~/.codex/skills location, copy the folder there instead:
INSTALL_DIR="$HOME/.codex/skills/pdf-ocr-to-markdown"
if [ -e "$INSTALL_DIR" ]; then
mv "$INSTALL_DIR" "${INSTALL_DIR}.backup.$(date +%Y%m%d_%H%M%S)"
fi
mkdir -p "$(dirname "$INSTALL_DIR")"
cp -R pdf-ocr-to-markdown "$INSTALL_DIR"
python3 "$INSTALL_DIR/scripts/setup_dependencies.py" --install
python3 "$INSTALL_DIR/scripts/setup_dependencies.py" --check
After installation, ask Codex to use $pdf-ocr-to-markdown on PDFs or images.
When a user asks an agent to install this skill from https://github.com/jseook11/codex-pdf-ocr-to-markdown-skill, the agent should:
- Clone or download the repository into a temporary location.
- Copy the repository's `pdf-ocr-to-markdown/` directory to `~/.agents/skills/pdf-ocr-to-markdown/`, unless the user's Codex setup uses `~/.codex/skills`.
- If that destination already exists, preserve a timestamped backup or ask before replacing it.
- From the installed skill directory, run `python3 scripts/setup_dependencies.py --install`, then `python3 scripts/setup_dependencies.py --check`.
- If package installation requires network or pip approval, ask during installation rather than deferring dependency setup to every skill invocation.
- Verify that the installed `SKILL.md` exists and its frontmatter contains `name: pdf-ocr-to-markdown`.
- Do not copy generated `.ocr_work/`, `*_ocr.md`, legacy `*_ocr.json`, legacy `*_ocr_quality.md`, `__pycache__/`, or `.DS_Store` files.
- Report the installed path and the dependency-check result, and tell the user they can invoke `$pdf-ocr-to-markdown`.
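The copy and verification steps above can be sketched in Python. This is a minimal illustration, not part of the skill: `copy_skill` and `frontmatter_name` are hypothetical helper names, and the frontmatter parser assumes a simple `key: value` block delimited by `---` lines at the top of `SKILL.md`.

```python
import shutil
from pathlib import Path
from typing import Optional

# Generated files that the install steps say should not be copied.
GENERATED_PATTERNS = (
    ".ocr_work",
    "*_ocr.md",
    "*_ocr.json",
    "*_ocr_quality.md",
    "__pycache__",
    ".DS_Store",
)


def copy_skill(src: Path, dest: Path) -> None:
    """Copy the skill folder to dest, skipping generated artifacts.

    copytree raises FileExistsError if dest already exists, so back up
    or remove an existing installation first.
    """
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(*GENERATED_PATTERNS))


def frontmatter_name(skill_md: Path) -> Optional[str]:
    """Return the `name:` value from a leading '---' frontmatter block.

    Assumption: the frontmatter is plain `key: value` lines, so no YAML
    library is used here.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip().strip("'\"")
    return None
```

An agent could then compare `frontmatter_name(dest / "SKILL.md")` against `"pdf-ocr-to-markdown"` before reporting the installation as ready.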
From Codex:
Use $pdf-ocr-to-markdown to OCR these files into Markdown.
Examples:
Use $pdf-ocr-to-markdown to convert ./docs/sample.pdf into a Markdown sidecar file.
Use $pdf-ocr-to-markdown on these scanned Korean lecture notes. Preserve headings, reading order, and tables when possible.
Direct script invocation from the installed skill folder:
python3 scripts/batch_ocr.py path/to/file.pdf
For Korean and English documents, the default language hint is kor+eng.
For an input named sample.pdf, final user-facing files are written next to the source:
sample.pdf
sample_ocr.md
Dots inside the source filename are normalized to underscores in generated sidecar names. For example, sample.v1.pdf writes sample_v1_ocr.md, not sample.v1.ocr.md.
If multiple files in the same folder normalize to the same sidecar stem, later files are disambiguated with _2, _3, and so on. For example, a.b.png and a_b.png produce a_b_ocr.md and a_b_2_ocr.md.
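The naming rules above can be sketched as a small function. This is an illustration of the described behavior, not the code in `batch_ocr.py`: `sidecar_names` is a hypothetical name, and it assumes "later files" means later in the order the filenames are passed in.

```python
from pathlib import Path


def sidecar_names(filenames: list[str]) -> dict[str, str]:
    """Map each source filename to its sidecar name, in the order given."""
    used: set[str] = set()
    result: dict[str, str] = {}
    for name in filenames:
        # Drop the extension, then normalize remaining dots to underscores.
        stem = Path(name).stem.replace(".", "_")
        candidate = stem
        counter = 2
        # Disambiguate colliding stems with _2, _3, and so on.
        while candidate in used:
            candidate = f"{stem}_{counter}"
            counter += 1
        used.add(candidate)
        result[name] = f"{candidate}_ocr.md"
    return result


print(sidecar_names(["a.b.png", "a_b.png"]))
# {'a.b.png': 'a_b_ocr.md', 'a_b.png': 'a_b_2_ocr.md'}
```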
Internal artifacts are written under hidden work directories:
.ocr_work/
run_YYYYMMDD_HHMMSS/
ocr_output/run_YYYYMMDD_HHMMSS/
Internal final.json and quality_report.md files are kept under .ocr_work/ocr_output/run_YYYYMMDD_HHMMSS/files/.../. Codex should ask before deleting hidden work files after the final Markdown sidecar exists.
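To inspect the internal artifacts of the most recent run, something like the following may help. It is a sketch under assumptions: `latest_run_dir` and `internal_artifacts` are hypothetical helpers, the `.ocr_work/` directory is assumed to sit in the same folder as the source file, and the search below `files/` is recursive because the exact subfolder layout is not spelled out here.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(source_dir: Path) -> Optional[Path]:
    """Return the newest run_YYYYMMDD_HHMMSS folder under .ocr_work/ocr_output."""
    output_root = source_dir / ".ocr_work" / "ocr_output"
    if not output_root.is_dir():
        return None
    # The timestamp format sorts chronologically when sorted as text.
    runs = sorted(p for p in output_root.glob("run_*") if p.is_dir())
    return runs[-1] if runs else None


def internal_artifacts(run_dir: Path) -> list[Path]:
    """Collect final.json and quality_report.md files below run_dir/files."""
    files_root = run_dir / "files"
    found = list(files_root.rglob("final.json"))
    found += list(files_root.rglob("quality_report.md"))
    return sorted(found)
```

These helpers only read the work directory. Deleting it is left to the user, in line with the rule that Codex should ask first.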
The Python scripts cannot attach page images to a vision-capable model by themselves. For visual routes, Codex must inspect the rendered image through an image-capable path before writing raw visual OCR output. The skill must report text_extraction_complete_visual_pending when visual pages remain unanalyzed.
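The reporting rule can be pictured as a check over per-page routes. The sketch below is illustrative only: it assumes every route other than `text_only` counts as a visual route, and the `"complete"` status label and the `run_status` helper are hypothetical names, not taken from the skill.

```python
# Assumption: these three routes require visual inspection by Codex.
VISUAL_ROUTES = {"visual_focus", "verify_text_and_visual", "full_vision_ocr"}

PENDING_STATUS = "text_extraction_complete_visual_pending"
DONE_STATUS = "complete"  # hypothetical label for a fully processed run


def pending_visual_pages(page_routes: dict[int, str], analyzed: set[int]) -> list[int]:
    """Return visual-route pages that have not been inspected yet."""
    return sorted(
        page
        for page, route in page_routes.items()
        if route in VISUAL_ROUTES and page not in analyzed
    )


def run_status(page_routes: dict[int, str], analyzed: set[int]) -> str:
    """Report the pending status while any visual page remains unanalyzed."""
    return PENDING_STATUS if pending_visual_pages(page_routes, analyzed) else DONE_STATUS
```

For example, with pages routed as `{1: "text_only", 2: "visual_focus"}` and no pages inspected, `run_status` returns `text_extraction_complete_visual_pending` until page 2 is added to the analyzed set.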
- This is a workflow-oriented Codex skill, not a general-purpose OCR engine.
- The skill does not modify source PDFs or create searchable replacement PDFs.
- Vision-based OCR requires Codex to inspect rendered page images through an image-capable path.
- OCR quality depends on scan resolution, page rotation, handwriting quality, and document layout.
- Complex tables, diagrams, formulas, and handwritten notes may require manual correction.
- Password-protected or encrypted PDFs must be made readable before processing.
- Add a faster mode for frequent use that reduces vision-pass latency.
- Generate prompts only for pages that actually need visual inspection.
- Make diagnostic artifacts such as detailed JSON, quality reports, and generated prompts opt-in debug outputs.
- Support a Markdown-first visual pass so routine OCR tasks do not spend extra time producing structured internal review files.
Run all tests:
python3 -m unittest discover -s tests -v
Run individual tests:
python3 tests/test_batch_ocr.py -v
python3 tests/test_merge_outputs.py -v
Compile-check scripts:
PYTHONPYCACHEPREFIX=/private/tmp/pdf-ocr-to-markdown-pycache python3 -m py_compile \
pdf-ocr-to-markdown/scripts/batch_ocr.py \
pdf-ocr-to-markdown/scripts/classify_pages.py \
pdf-ocr-to-markdown/scripts/cleanup_workdir.py \
pdf-ocr-to-markdown/scripts/extract_pdf_text.py \
pdf-ocr-to-markdown/scripts/merge_outputs.py \
pdf-ocr-to-markdown/scripts/render_pages.py \
pdf-ocr-to-markdown/scripts/setup_dependencies.py
pdf-ocr-to-markdown/
SKILL.md
agents/openai.yaml
prompts/
scripts/
examples/
tests/
docs/
prompts/ contains route-specific transcription instructions used by the skill, such as text_only, visual_focus, verify_text_and_visual, and full_vision_ocr.
This project was originally created by jseook11.
If you copy, modify, or redistribute substantial portions of this project, please retain the original copyright notice, the LICENSE file, and the NOTICE file if present.
MIT. See LICENSE.