Document parser orchestrator — auto-routes to the optimal OSS parser for each document.
Try the demo | API Docs | GitHub
- Auto-routing — detects document type and picks the best parser automatically
- 5 parsers, 1 interface — PyMuPDF, Kreuzberg, Docling, MinerU, Marker
- Every interface — CLI, REST API, MCP server, Web UI
- Image extraction + VLM description — BYOK, auto-detects provider from key prefix
- Compare mode — run all parsers on the same doc, pick the best output
- Cost comparison — shows savings vs AWS Textract, Google Document AI, etc.
- Zero config — `pip install parsemux[pymupdf,cli]` and go
Install with the one-line script:

```bash
curl -fsSL https://raw.githubusercontent.com/vericontext/parsemux/main/install.sh | sh
```

Or with pip:

```bash
pip install parsemux[pymupdf,kreuzberg,cli]
```

```bash
parsemux parse document.pdf                     # auto-route, markdown output
parsemux parse document.pdf --parser kreuzberg  # explicit parser
parsemux parse document.pdf --format json       # JSON output
parsemux parse document.pdf --extract-images    # extract images as base64
parsemux parse document.pdf --dry-run           # preview routing without parsing
parsemux parse ./docs/ --batch                  # batch a directory
```

```bash
parsemux parse doc.pdf --extract-images --describe-images --vlm-key sk-...
```

The provider is auto-detected from the key prefix (sk- → OpenAI, sk-ant- → Anthropic, AI → Google).
Default models: gpt-5.4-nano, claude-haiku-4.5, gemini-2.5-flash, qwen2.5vl:7b (local).
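The prefix rule above can be sketched in a few lines. This is an illustrative sketch only, not parsemux's actual implementation; the function name and the `"local"` fallback are assumptions.

```python
def detect_provider(key: str) -> str:
    """Map a VLM API key to a provider by its prefix (illustrative sketch,
    not parsemux's real code)."""
    # Check the longest prefix first: "sk-ant-..." keys also start with "sk-",
    # so testing "sk-" first would misroute Anthropic keys to OpenAI.
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    if key.startswith("AI"):
        return "google"
    return "local"  # e.g. a local model such as qwen2.5vl:7b

print(detect_provider("sk-ant-abc123"))  # anthropic
```

The ordering is the whole trick: with overlapping prefixes, always match the most specific one first.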
```bash
parsemux serve   # REST API at :8000 + MCP at /mcp
```

```bash
# Local (Claude Desktop, Cursor — stdio transport)
parsemux mcp

# Remote (Streamable HTTP)
parsemux mcp --remote --port 8000
```

Claude Desktop config (local):
```json
{
  "mcpServers": {
    "parsemux": {
      "command": "parsemux",
      "args": ["mcp"]
    }
  }
}
```

```bash
parsemux schema                  # machine-readable command schemas
parsemux parse doc.pdf --dry-run # preview routing
parsemux list-parsers --json     # available parsers as JSON
parsemux detect doc.pdf --json   # MIME type + recommended parser
```

| Parser | Best For | Speed |
|---|---|---|
| PyMuPDF | Digital PDFs | 1,000+ pages/sec |
| Kreuzberg | 91+ formats, OCR | Rust core |
| Docling | Tables (97.9%) | CPU |
| MinerU | Scanned docs | GPU |
| Marker | Batch + LLM-enhanced | GPU |
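The trade-offs in the table can be read as a routing heuristic. The sketch below is illustrative only; parsemux's real auto-router and its inputs (`scanned`, `has_tables`, `gpu`) are assumptions for this example.

```python
def route(mime: str, scanned: bool = False, has_tables: bool = False,
          gpu: bool = False) -> str:
    """Pick a parser from the trade-offs in the table above
    (illustrative heuristic, not parsemux's actual router)."""
    if mime != "application/pdf":
        return "kreuzberg"   # broadest format coverage (91+ formats, OCR)
    if scanned:
        # OCR-heavy work: MinerU wants a GPU; fall back to Kreuzberg on CPU.
        return "mineru" if gpu else "kreuzberg"
    if has_tables:
        return "docling"     # strongest table extraction
    return "pymupdf"         # fastest path for digital PDFs

print(route("application/pdf"))  # pymupdf
```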
|  | Self-hosted | Demo |
|---|---|---|
| File limit | 100 MB | 10 MB |
| Rate limit | None | 10 req/min |
| MCP remote | /mcp | Disabled |
| VLM key | .env | BYOK (enter in UI) |
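Against a self-hosted instance (started with `parsemux serve` or `docker compose up`), a client request might look like the sketch below. The endpoint path (`/parse`) and the raw-body upload are assumptions for illustration; check the API docs for the real contract.

```python
import urllib.request

API = "http://localhost:8000"  # default self-hosted address

def build_parse_request(pdf_bytes: bytes,
                        url: str = API + "/parse") -> urllib.request.Request:
    """Build an HTTP request uploading a document for parsing.
    The path and upload format are hypothetical, not documented API."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )

req = build_parse_request(b"%PDF-1.7 ...")
# urllib.request.urlopen(req)  # send once the server is running
```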
```bash
docker compose up
# API: http://localhost:8000
# MCP: http://localhost:8000/mcp
```

Contributions welcome!
- Fork the repo
- Create a feature branch (`git checkout -b feat/my-feature`)
- Run tests (`pytest -q`)
- Open a PR
See issues for ideas.
Built on the shoulders of these excellent open-source parsers: