Home

DocuMind

Layout-aware document key-information extraction — with a measured proof that layout is doing the work.

DocuMind extracts structured records (key fields and line-item tables) from invoices, forms, and receipts using the page geometry an OCR engine emits, not just the text. A schema verifier reconciles arithmetic relationships and repairs OCR-corrupted values. Every extraction is scored against a ground-truth record, so the claims are measured, not asserted.
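
To make "scored against a ground-truth record" concrete, here is a minimal sketch of a field-level accuracy score. The function name `field_accuracy`, the exact-match comparison, and the sample field names are illustrative assumptions, not DocuMind's actual API or metric definition; the Evaluation page describes the real benchmark setup.

```python
def field_accuracy(predicted, truth):
    """Fraction of ground-truth fields whose predicted value matches exactly.

    Hypothetical helper: exact string equality is an assumption made for
    this sketch, not necessarily how DocuMind compares values.
    """
    if not truth:
        return 0.0
    correct = sum(1 for key, value in truth.items() if predicted.get(key) == value)
    return correct / len(truth)


# Made-up records for illustration only.
truth = {"invoice_no": "INV-001", "date": "2024-01-05", "total": "108.00"}
predicted = {"invoice_no": "INV-001", "date": "2024-01-05", "total": "103.00"}

score = field_accuracy(predicted, truth)  # two of the three fields match
```

Scoring per field rather than per document keeps a single corrupted value from hiding the fields that were extracted correctly.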

The interesting question is not "can a model read a document?" but "what is the layout actually buying you?" DocuMind separates two effects:

Geometry buys field-association accuracy. Holding the verifier fixed, the layout-aware extractor lifts field accuracy from 59% to 100%. On two-column forms, text-only extraction scores 0%; layout scores 100%.
The schema verifier buys arithmetic validity. Holding the extractor fixed, the verifier lifts record validity from 91% to 100% by recomputing an OCR-corrupted total from subtotal + tax.
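
The verifier's repair step can be sketched as a simple arithmetic check. This is a minimal illustration under stated assumptions: the name `reconcile_total`, the dict-based record, the `repaired` flag, and the one-cent tolerance are all hypothetical, and the sketch trusts `subtotal` and `tax` while treating `total` as the value that may be corrupted.

```python
from decimal import Decimal


def reconcile_total(record, tolerance=Decimal("0.01")):
    """Return a copy of `record` whose total agrees with subtotal + tax.

    Hypothetical helper: assumes subtotal and tax were read correctly and
    only the total may have been corrupted by OCR.
    """
    expected = record["subtotal"] + record["tax"]
    repaired = dict(record)
    if abs(record["total"] - expected) > tolerance:
        repaired["total"] = expected   # overwrite the inconsistent value
        repaired["repaired"] = True
    else:
        repaired["repaired"] = False
    return repaired


# Made-up record: OCR misread the total 108.00 as 103.00.
ocr_record = {
    "subtotal": Decimal("100.00"),
    "tax": Decimal("8.00"),
    "total": Decimal("103.00"),
}
fixed = reconcile_total(ocr_record)
```

`Decimal` is used instead of `float` so that currency sums compare exactly.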

Architecture overview

flowchart LR
    subgraph Source["Document source (env-selectable)"]
        SYN[synthetic<br/>boxes + ground truth · offline]
        PDF[pdf<br/>pdfplumber · optional]
    end
    Source --> TOK[Tokens with bounding boxes]
    subgraph Extract["Extractor (env-selectable)"]
        LAY[layout<br/>geometry: right-of / below / columns]
        TXT[text<br/>reading-order · ablation]
        LLM[ollama / openai<br/>optional extras]
    end
    TOK --> Extract
    Extract --> REC[Record: fields + line items]
    REC --> VER[Schema verifier<br/>amounts → subtotal → +tax → total]
    VER --> FINAL[Reconciled record]
    SYN -.ground truth.-> SCORE[Score vs. ground-truth record]
    FINAL --> SCORE
    SCORE --> M[field acc · cell F1 · doc exact · validity]
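
The "right-of" relation in the layout extractor node can be sketched as a nearest-neighbour search over bounding boxes. The `Token` dataclass, the `value_right_of` function, and the `max_dy` row tolerance are assumptions made for this illustration; they are not taken from DocuMind's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One OCR token with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def value_right_of(label, tokens, max_dy=5.0):
    """Return the nearest token to the right of `label` on roughly the same row."""
    label_mid = (label.y0 + label.y1) / 2
    candidates = [
        t for t in tokens
        if t is not label
        and t.x0 >= label.x1                                  # starts right of the label
        and abs((t.y0 + t.y1) / 2 - label_mid) <= max_dy      # same row, within tolerance
    ]
    # Smallest horizontal gap wins; None when nothing qualifies.
    return min(candidates, key=lambda t: t.x0 - label.x1, default=None)


# Made-up page: two labelled rows and an unrelated footer token.
tokens = [
    Token("Name:", 10, 10, 50, 20),
    Token("Alice", 60, 10, 100, 20),
    Token("Total:", 10, 40, 50, 50),
    Token("108.00", 200, 40, 250, 50),
    Token("Page 1", 55, 70, 90, 80),
]
name_value = value_right_of(tokens[0], tokens)
total_value = value_right_of(tokens[2], tokens)
```

Because the match depends on coordinates rather than token order, the value for `Total:` is found even though it sits far from its label horizontally and other tokens lie closer in x on different rows.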

Quick start

python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
pytest -q                                            # 38 tests, all offline
documind compare --doctype invoice --seed 1          # four configs head-to-head

Wiki pages

Architecture — document types, extractor design, verifier, null control, synthetic data
Evaluation — benchmark setup, results table, dissociation, reproduce commands
Configuration — env vars, backend matrix, .env.example
Development — setup, code structure, how to add a new document type or extractor
