Configuration
Rana Faraz edited this page Jun 23, 2026 · 1 revision
| Env var | Default | Options | Effect |
|---|---|---|---|
| `DOCUMIND_DOC_BACKEND` | `synthetic` | `synthetic`, `pdf` | Document source: synthetic generator (offline) or pdfplumber (requires `[pdf]`) |
| `DOCUMIND_EXTRACTOR_BACKEND` | `layout` | `layout`, `text`, `ollama`, `openai` | Extraction strategy |
| `DOCUMIND_VERIFY` | `1` | `0`, `1` | Enable/disable schema verifier |
| `DOCUMIND_DOCTYPE` | `invoice` | `invoice`, `form`, `receipt` | Document type for single-document commands |
| `DOCUMIND_SEED` | `0` | integer | Random seed for synthetic generator |
| `DOCUMIND_SCRAMBLE` | `0` | `0`, `1` | Scramble bounding boxes (null test) |
| `DOCUMIND_OCR_NOISE` | `0.15` | float 0–1 | Fraction of character positions to corrupt |
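
The variables above can be read and validated in one place. The following is a minimal sketch of that idea, not DocuMind's actual loader: the `Settings` class, `load_settings` function, and the choice to raise `ValueError` on bad values are assumptions made for illustration. Only the variable names, defaults, and allowed options come from the table.

```python
import os
from dataclasses import dataclass

# Allowed values copied from the table above. Everything else in this
# block (names, error handling) is hypothetical.
DOC_BACKENDS = ("synthetic", "pdf")
EXTRACTOR_BACKENDS = ("layout", "text", "ollama", "openai")
DOCTYPES = ("invoice", "form", "receipt")


@dataclass(frozen=True)
class Settings:
    doc_backend: str = "synthetic"
    extractor_backend: str = "layout"
    verify: bool = True
    doctype: str = "invoice"
    seed: int = 0
    scramble: bool = False
    ocr_noise: float = 0.15


def _choice(env, name, default, allowed):
    """Read a string option and reject values outside the allowed set."""
    value = env.get(name, default)
    if value not in allowed:
        raise ValueError(f"{name}={value!r}; expected one of {allowed}")
    return value


def _flag(env, name, default):
    """Read a 0/1 switch and return it as a bool."""
    value = env.get(name, default)
    if value not in ("0", "1"):
        raise ValueError(f"{name}={value!r}; expected 0 or 1")
    return value == "1"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    noise = float(env.get("DOCUMIND_OCR_NOISE", "0.15"))
    if not 0.0 <= noise <= 1.0:
        raise ValueError(f"DOCUMIND_OCR_NOISE={noise}; expected 0.0 to 1.0")
    return Settings(
        doc_backend=_choice(env, "DOCUMIND_DOC_BACKEND", "synthetic", DOC_BACKENDS),
        extractor_backend=_choice(
            env, "DOCUMIND_EXTRACTOR_BACKEND", "layout", EXTRACTOR_BACKENDS
        ),
        verify=_flag(env, "DOCUMIND_VERIFY", "1"),
        doctype=_choice(env, "DOCUMIND_DOCTYPE", "invoice", DOCTYPES),
        seed=int(env.get("DOCUMIND_SEED", "0")),
        scramble=_flag(env, "DOCUMIND_SCRAMBLE", "0"),
        ocr_noise=noise,
    )
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test, for example `load_settings({"DOCUMIND_DOCTYPE": "form"})`.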
| Component | Offline default | Optional real backend | Install extra |
|---|---|---|---|
| Source | `synthetic` (deterministic generator, boxes + ground truth) | `pdf` (pdfplumber) | `pip install -e ".[pdf]"` |
| Extractor | `layout` (geometry-based) | `ollama`, `openai` | `pip install -e ".[ollama]"` / `".[openai]"` |
| Verifier | on (`DOCUMIND_VERIFY=1`) | n/a | n/a |

Optional backends are imported lazily and fall back to the offline path if a dependency, server, or key is missing. Selecting an optional backend can never crash the pipeline.
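
The lazy-import-with-fallback behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `resolve_extractor`, `OPTIONAL_BACKENDS`, and the injectable `import_module` parameter are hypothetical names. The module each extra installs (`openai`, `httpx`) and the `OPENAI_API_KEY` variable are taken from this page. A reachability check for the Ollama server is left out for brevity.

```python
import importlib
import os

# Hypothetical mapping: backend name -> (module it needs, env var it needs).
# None means no key is required.
OPTIONAL_BACKENDS = {
    "openai": ("openai", "OPENAI_API_KEY"),
    "ollama": ("httpx", None),
}
OFFLINE_DEFAULT = "layout"


def resolve_extractor(requested, env=None, import_module=importlib.import_module):
    """Return the backend name to use, falling back to the offline default."""
    env = os.environ if env is None else env
    if requested not in OPTIONAL_BACKENDS:
        return requested  # offline backends need no checks

    module_name, key_name = OPTIONAL_BACKENDS[requested]
    try:
        # The import is deferred until the optional backend is selected,
        # so the offline core never needs the dependency installed.
        import_module(module_name)
    except ImportError:
        return OFFLINE_DEFAULT

    if key_name is not None and not env.get(key_name):
        return OFFLINE_DEFAULT  # dependency present but key missing
    return requested
```

With this shape, `resolve_extractor("openai", env={})` returns `"layout"` whether or not the `openai` package is installed, because the key is absent.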

```bash
# Document source backend: synthetic (offline) | pdf (requires [pdf] extra)
DOCUMIND_DOC_BACKEND=synthetic
# Extractor backend: layout | text | ollama | openai
DOCUMIND_EXTRACTOR_BACKEND=layout
# Enable schema verifier (1 = on, 0 = off)
DOCUMIND_VERIFY=1
# Document type for single-document CLI commands
DOCUMIND_DOCTYPE=invoice
# Random seed for synthetic generator
DOCUMIND_SEED=0
# Scramble bounding boxes for null test (1 = scramble)
DOCUMIND_SCRAMBLE=0
# OCR noise fraction (0.0–1.0)
DOCUMIND_OCR_NOISE=0.15
# Required only if DOCUMIND_EXTRACTOR_BACKEND=openai
# OPENAI_API_KEY=sk-...
# Required only if DOCUMIND_EXTRACTOR_BACKEND=ollama (default port)
# OLLAMA_BASE_URL=http://localhost:11434
```

```bash
pip install -e "."          # offline core only (zero runtime deps)
pip install -e ".[dev]"     # + pytest, ruff, mypy
pip install -e ".[pdf]"     # + pdfplumber (real PDF source)
pip install -e ".[ollama]"  # + httpx (Ollama extractor)
pip install -e ".[openai]"  # + openai (OpenAI extractor)
pip install -e ".[all]"     # all extras
```

```bash
docker build -t documind .
docker run --rm documind    # offline benchmark
docker run --rm -e DOCUMIND_DOCTYPE=form documind documind compare --doctype form --seed 0
```