Configuration

Rana Faraz edited this page Jun 23, 2026 · 1 revision

Environment variables

| Env var | Default | Options | Effect |
| --- | --- | --- | --- |
| DOCUMIND_DOC_BACKEND | synthetic | synthetic, pdf | Document source: synthetic generator (offline) or pdfplumber (requires [pdf]) |
| DOCUMIND_EXTRACTOR_BACKEND | layout | layout, text, ollama, openai | Extraction strategy |
| DOCUMIND_VERIFY | 1 | 0, 1 | Enable/disable schema verifier |
| DOCUMIND_DOCTYPE | invoice | invoice, form, receipt | Document type for single-document commands |
| DOCUMIND_SEED | 0 | integer | Random seed for synthetic generator |
| DOCUMIND_SCRAMBLE | 0 | 0, 1 | Scramble bounding boxes (null test) |
| DOCUMIND_OCR_NOISE | 0.15 | float 0–1 | Fraction of character positions to corrupt |
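
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a single settings object. The `Settings` class, `load_settings` function, and the choice to raise `ValueError` on an unlisted value are assumptions made for this example; the page does not say how DocuMind itself parses or validates the environment.

```python
import os
from dataclasses import dataclass

# Allowed values copied from the table above.
_DOC_BACKENDS = ("synthetic", "pdf")
_EXTRACTORS = ("layout", "text", "ollama", "openai")
_DOCTYPES = ("invoice", "form", "receipt")


@dataclass(frozen=True)
class Settings:  # hypothetical name, not part of DocuMind
    doc_backend: str = "synthetic"
    extractor_backend: str = "layout"
    verify: bool = True
    doctype: str = "invoice"
    seed: int = 0
    scramble: bool = False
    ocr_noise: float = 0.15


def _choice(env, name, default, allowed):
    """Return env[name] (or the default) if it is one of the allowed values."""
    value = env.get(name, default)
    if value not in allowed:
        # Assumption: reject unknown values instead of silently ignoring them.
        raise ValueError(f"{name}={value!r}; expected one of {allowed}")
    return value


def _flag(env, name, default):
    """Interpret a 0/1 variable as a boolean."""
    return _choice(env, name, default, ("0", "1")) == "1"


def load_settings(env=None):
    """Read the DOCUMIND_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    noise = float(env.get("DOCUMIND_OCR_NOISE", "0.15"))
    if not 0.0 <= noise <= 1.0:
        raise ValueError(f"DOCUMIND_OCR_NOISE={noise}; expected a value from 0 to 1")
    return Settings(
        doc_backend=_choice(env, "DOCUMIND_DOC_BACKEND", "synthetic", _DOC_BACKENDS),
        extractor_backend=_choice(env, "DOCUMIND_EXTRACTOR_BACKEND", "layout", _EXTRACTORS),
        verify=_flag(env, "DOCUMIND_VERIFY", "1"),
        doctype=_choice(env, "DOCUMIND_DOCTYPE", "invoice", _DOCTYPES),
        seed=int(env.get("DOCUMIND_SEED", "0")),
        scramble=_flag(env, "DOCUMIND_SCRAMBLE", "0"),
        ocr_noise=noise,
    )
```

Passing a plain dict instead of `os.environ` (for example `load_settings({"DOCUMIND_DOCTYPE": "form"})`) makes the defaults easy to check without touching the real environment.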

Backend matrix

| Component | Offline default | Optional real backend | Install extra |
| --- | --- | --- | --- |
| Source | synthetic (deterministic generator, boxes + ground truth) | pdf (pdfplumber) | pip install -e ".[pdf]" |
| Extractor | layout (geometry-based) | ollama, openai | pip install -e ".[ollama]" / ".[openai]" |
| Verifier | on (DOCUMIND_VERIFY=1) | | |

Optional backends are imported lazily. If a dependency, server, or API key is missing, the pipeline falls back to the offline path, so selecting an optional backend never crashes it.
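
The lazy-import-with-fallback behaviour described above could look roughly like the following. This is a minimal sketch, not DocuMind's actual code: `resolve_extractor` and its `import_module` parameter are hypothetical, and the mapping of backends to modules is inferred from the pip extras (httpx for ollama, openai for openai). It covers only the missing-dependency and missing-key cases; an unreachable server would have to be detected when the first request is made.

```python
import importlib
import os

# Module each optional extractor needs, inferred from the pip extras.
_REQUIRED_MODULE = {"ollama": "httpx", "openai": "openai"}
OFFLINE_DEFAULT = "layout"


def resolve_extractor(requested, env=None, import_module=importlib.import_module):
    """Return the extractor backend to use, falling back to the offline default.

    Hypothetical helper: the import happens only when an optional backend
    is actually requested, and any failure degrades to `layout`.
    """
    env = os.environ if env is None else env
    module = _REQUIRED_MODULE.get(requested)
    if module is None:
        # layout and text need no optional dependency.
        return requested
    try:
        import_module(module)  # lazy: only attempted for optional backends
    except ImportError:
        return OFFLINE_DEFAULT
    if requested == "openai" and not env.get("OPENAI_API_KEY"):
        # Dependency is present but no key is configured.
        return OFFLINE_DEFAULT
    return requested
```

Injecting `import_module` keeps the fallback testable on a machine where the extras happen to be installed.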

.env.example

# Document source backend: synthetic (offline) | pdf (requires [pdf] extra)
DOCUMIND_DOC_BACKEND=synthetic

# Extractor backend: layout | text | ollama | openai
DOCUMIND_EXTRACTOR_BACKEND=layout

# Enable schema verifier (1 = on, 0 = off)
DOCUMIND_VERIFY=1

# Document type for single-document CLI commands
DOCUMIND_DOCTYPE=invoice

# Random seed for synthetic generator
DOCUMIND_SEED=0

# Scramble bounding boxes for null test (1 = scramble)
DOCUMIND_SCRAMBLE=0

# OCR noise fraction (0.0–1.0)
DOCUMIND_OCR_NOISE=0.15

# Required only if DOCUMIND_EXTRACTOR_BACKEND=openai
# OPENAI_API_KEY=sk-...

# Required only if DOCUMIND_EXTRACTOR_BACKEND=ollama (default port)
# OLLAMA_BASE_URL=http://localhost:11434

pip extras

pip install -e "."           # offline core only (zero runtime deps)
pip install -e ".[dev]"      # + pytest, ruff, mypy
pip install -e ".[pdf]"      # + pdfplumber (real PDF source)
pip install -e ".[ollama]"   # + httpx (Ollama extractor)
pip install -e ".[openai]"   # + openai (OpenAI extractor)
pip install -e ".[all]"      # all extras

Docker

docker build -t documind .
docker run --rm documind                    # offline benchmark
docker run --rm -e DOCUMIND_DOCTYPE=form documind documind compare --doctype form --seed 0
