Define your fields. Upload care documents. Run any vision model. Compare extraction accuracy side by side.
- Project Overview
- Architecture
- Get Started
- Project Structure
- Usage Guide
- Environment Variables
- Provider Configuration
- Inference Metrics
- Model Capabilities
- Technology Stack
- Troubleshooting
- License
CareXtract is a full-stack platform for extracting structured fields from medical and care documents — patient intake forms, clinical notes, prescriptions, referrals, and more — using vision-capable language models.
Developed as an open-source reference implementation under the Cloud2 Labs Innovation Hub, CareXtract demonstrates how user-defined field schemas, OpenAI-compatible provider routing, and inline ground truth editing can be packaged into a production-grade microservices architecture. Upload typed or handwritten care documents, define the fields you need to extract, connect any vision model via a single endpoint config, and measure extraction accuracy across providers — against ground truth you review and correct yourself.
- Define Fields: Users define extraction fields directly in the UI — key, display name, data type (string, date, phone, address, number), description, and optional example. Fields are stored in `config_data/fields.json` and injected into prompts at runtime.
- Upload Documents: Upload care documents (JPG, PNG, PDF). Supports typed and handwritten documents.
- Connect Providers: Configure one or more OpenAI-compatible providers — GPT-4o, Claude, Gemini, local Ollama, vLLM, OpenRouter, or custom inference servers. Live connectivity tests confirm reachability before a run.
- Extract & Review: Run on-demand extraction per document. Extracted values display inline as editable inputs — review and correct before saving as verified ground truth.
- Analyse Accuracy: Trigger a multi-provider analysis run. Type-aware comparison (date normalisation, phone digit stripping, fuzzy address scoring, numeric parsing) produces per-field and per-model accuracy metrics.
- Export Results: Full audit CSV exports extracted values, ground truth values, and field-level status for every field, document, and provider combination.
The application follows a two-service containerised architecture with a FastAPI backend handling all field schema management, provider routing, extraction, and evaluation logic — paired with a React + TypeScript frontend for document management, live analysis runs, result visualisation, and CSV export.
graph LR
%% ====== FRONTEND ======
subgraph FE[Frontend]
A[React + TypeScript<br/>Port 3000]
end
%% ====== BACKEND ======
subgraph BE[Backend - FastAPI<br/>Port 8000]
B[API Router]
FS[Fields Store<br/>fields.json]
PS[Providers Store<br/>providers.json]
PB[Prompt Builder]
EX[Extraction Engine<br/>Async + Concurrency]
GT[Ground Truth Store<br/>ground_truth.json]
EV[Evaluator<br/>Type-Aware Scoring]
AR[Analysis Runner<br/>Multi-Provider]
RR[Results Store<br/>results/]
end
%% ====== EXTERNAL ======
subgraph EXT[External Providers]
P1[OpenAI / GPT-4o]
P2[Claude / Gemini]
P3[Ollama / vLLM<br/>Local]
P4[OpenRouter /<br/>Custom Endpoint]
end
%% ====== CONNECTIONS ======
A -->|HTTP /api| B
B --> FS
B --> PS
B --> PB
B --> EX
B --> GT
B --> AR
B --> RR
FS -->|Field Schema| PB
PS -->|Provider Configs| EX
PB -->|Dynamic Prompt| EX
EX -->|Extracted Fields| GT
GT -->|Ground Truth| EV
EX -->|Extraction Results| EV
EV -->|Scored Results| AR
AR -->|Run Results| RR
EX -->|API Call| P1
EX -->|API Call| P2
EX -->|API Call| P3
EX -->|API Call| P4
B -->|JSON| A
%% ====== STYLES ======
style A fill:#e1f5ff
style B fill:#fff4e1
style FS fill:#e8f5e9
style PS fill:#e8f5e9
style PB fill:#ffe1f5
style EX fill:#ffe1f5
style GT fill:#e8f5e9
style EV fill:#ffe1f5
style AR fill:#ffe1f5
style RR fill:#e8f5e9
style P1 fill:#fff3cd
style P2 fill:#fff3cd
style P3 fill:#fff3cd
style P4 fill:#fff3cd
Frontend (React + TypeScript)
- Document management — upload, list, delete care documents
- Fields management — define, reorder, edit extraction field schemas
- Providers management — configure, test, and manage LLM provider endpoints
- Document Extract page — per-document on-demand extraction with inline ground truth editing
- Analysis page — trigger multi-provider runs with live progress polling
- Results dashboard — per-model accuracy cards, radar charts, latency percentiles, cost telemetry
- CSV export for offline analysis and clinical audit trails
Backend Services
- Fields Store: Persists field schema definitions to `config_data/fields.json`; serves as the source of truth for all prompt construction
- Providers Store: Manages OpenAI-compatible provider configurations (base URL, model ID, API key, temperature, max tokens); stored in `config_data/providers.json`
- Prompt Builder: Constructs system and user prompts dynamically at runtime from the current field schema and per-run extraction instructions
- Extraction Engine: Async multi-provider extraction with bounded concurrency; encodes document images, calls vision LLMs, and parses structured JSON responses
- Ground Truth Store: Persists human-reviewed and corrected extraction values per document; supports inline editing and partial patch updates
- Evaluator: Applies type-aware field comparison — exact string matching, date normalisation, phone digit stripping, fuzzy address scoring (RapidFuzz), numeric parsing (see the sketch after this list)
- Analysis Runner: Orchestrates multi-provider runs, tracks run status and progress, aggregates per-field and per-model accuracy metrics
- Results Store: Persists completed analysis results per run ID; supports retrieval and CSV export
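A minimal sketch of what the Evaluator's type-aware comparison could look like. The helper names, accepted date formats, and fuzzy-match threshold below are illustrative assumptions, not code from `evaluator.py`:

```python
# Illustrative sketch only -- names, formats, and thresholds are assumptions,
# not taken from CareXtract's evaluator.py.
from datetime import datetime
from rapidfuzz import fuzz

DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]

def normalise_date(value: str) -> str | None:
    """Try a few common date layouts and return an ISO date, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def compare_values(field_type: str, extracted: str, truth: str) -> bool:
    """Type-aware equality check between an extracted value and ground truth."""
    if field_type == "date":
        return normalise_date(extracted) == normalise_date(truth)
    if field_type == "phone":
        digits = lambda s: "".join(ch for ch in s if ch.isdigit())
        return digits(extracted) == digits(truth)
    if field_type == "address":
        # Fuzzy token match; 90 is an arbitrary illustrative threshold.
        return fuzz.token_sort_ratio(extracted.lower(), truth.lower()) >= 90
    if field_type == "number":
        try:
            return float(extracted.replace(",", "")) == float(truth.replace(",", ""))
        except ValueError:
            return False
    return extracted.strip().lower() == truth.strip().lower()
```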
External Integration
- Any OpenAI-compatible API — GPT-4o, Claude (via proxy), Gemini (via proxy), Ollama (local), vLLM (local/on-prem), OpenRouter, or custom inference servers
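Because every provider speaks the same OpenAI-compatible chat API, a single extraction call reduces to one `openai` client pointed at the provider's base URL. The sketch below shows the general shape, assuming a local Ollama endpoint; the file name, field list, and prompt wording are placeholders rather than CareXtract's actual prompts:

```python
# Minimal sketch of one vision extraction call via an OpenAI-compatible
# endpoint. Provider values and prompts are placeholders.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g. local Ollama; omit for OpenAI
    api_key="ollama",                       # any non-empty string for Ollama
)

with open("intake_form.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llama3.2-vision",
    temperature=0.1,
    messages=[
        {"role": "system",
         "content": "Extract the requested fields and respond only with a JSON object."},
        {"role": "user", "content": [
            {"type": "text", "text": "Fields: patient_name (string), date_of_birth (date)"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]},
    ],
)
print(response.choices[0].message.content)
```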
Before you begin, ensure you have the following installed:
- Docker and Docker Compose (v20.10+)
- At least one vision-capable LLM provider (any of the following):
- OpenAI API key (GPT-4o or GPT-4o-mini with vision)
- Anthropic API key (Claude with vision, via OpenAI-compatible proxy)
- Local Ollama with a vision model (e.g. `llava`, `llama3.2-vision`)
- vLLM endpoint with a vision model
- Any OpenRouter or custom OpenAI-compatible endpoint
# Check Docker
docker --version
docker compose version
# Verify Docker is running
docker ps

git clone https://github.com/cld2labs/CareXtract.git
cd CareXtract

Copy the example environment file:

cp backend/.env.example backend/.env

Edit backend/.env — only the optional Langfuse observability settings live here. Provider API keys and endpoints are configured in the UI after startup.
# Langfuse Observability (optional — set LANGFUSE_ENABLED=false to skip)
LANGFUSE_ENABLED=false

# Build and start all services
docker compose up --build
# Or run in detached mode (background)
docker compose up -d --build

Once containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- API Redoc: http://localhost:8000/redoc
- Open http://localhost:3000
- Navigate to the Providers page
- Click Add Provider
- Enter your provider details:
- Name: e.g. OpenAI GPT-4o
- Base URL: e.g. `https://api.openai.com/v1` (leave blank for OpenAI default)
- API Key: your API key
- Model: e.g. `gpt-4o`
- Click Test Connection to verify
- Save the provider
- Navigate to Documents and upload care documents (JPG, PNG, or PDF)
- Navigate to Fields and define the fields you want to extract
- Navigate to Documents → select a document → click Extract to run on-demand extraction
- Review and correct extracted values inline — click Save as Ground Truth when verified
- Navigate to Analysis → click Run Analysis to compare all providers
- View results in the Results dashboard
docker compose down

CareXtract/
├── backend/
│ ├── api/
│ │ └── routes.py # All API endpoints (fields, providers, docs, analysis, results)
│ ├── extractors/
│ │ ├── base.py # Base extractor interface
│ │ └── dynamic_extractor.py # Vision LLM extraction with dynamic prompts
│ ├── analysis/
│ │ ├── evaluator.py # Type-aware field comparison logic
│ │ ├── metrics.py # Per-model accuracy metric aggregation
│ │ └── runner.py # Async multi-provider analysis runner
│ ├── models/
│ │ └── schemas.py # Pydantic models (FieldDefinition, ProviderConfig, etc.)
│ ├── main.py # FastAPI application entry point
│ ├── config.py # Configuration and settings
│ ├── fields_store.py # Field schema persistence
│ ├── providers_store.py # Provider config persistence
│ ├── prompt_builder.py # Dynamic prompt construction
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile # Backend container
├── frontend/
│ ├── src/
│ │ ├── pages/
│ │ │ ├── LandingPage.tsx # Home / overview
│ │ │ ├── DocumentsPage.tsx # Document management and extraction
│ │ │ ├── FieldsPage.tsx # Field schema management
│ │ │ ├── ProvidersPage.tsx # Provider configuration and testing
│ │ │ ├── AnalysisPage.tsx # Analysis run management
│ │ │ └── ResultsPage.tsx # Results visualisation and export
│ │ ├── api/
│ │ │ └── client.ts # API request utilities
│ │ ├── types/
│ │ │ └── index.ts # TypeScript interfaces
│ │ └── App.tsx # Application root and routing
│ ├── package.json # npm dependencies
│ ├── vite.config.ts # Vite bundler config
│ ├── tsconfig.json # TypeScript configuration
│ ├── tailwind.config.js # Tailwind CSS theme
│ └── Dockerfile # Frontend container (multi-stage nginx)
├── docs/
│ └── assets/ # Documentation images
├── .github/
│ └── workflows/
│ └── code-scans.yaml # Security scanning (Trivy + Bandit)
├── docker-compose.yml # Service orchestration
├── .gitignore # Git exclusions
├── LICENSE.md # MIT License
├── README.md # Project documentation
├── CONTRIBUTING.md # Contributing guidelines
├── DISCLAIMER.md # Legal disclaimer
├── SECURITY.md # Security policy
└── TERMS_AND_CONDITIONS.md # Terms of use
Navigate to Fields to define what to extract from your documents.
Each field has:
- Key: Unique identifier used in prompts and exports (e.g. `patient_name`)
- Display Name: Human-readable label (e.g. Patient Name)
- Type: `string`, `date`, `phone`, `address`, or `number`
- Description: Instructions for the model (e.g. Full legal name as it appears on the form)
- Example (optional): A sample value to guide the model
Fields are injected into system prompts at runtime. Reorder fields to control prompt priority.
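To make the schema-to-prompt step concrete, here is a hedged sketch of how field definitions like those above might be rendered into a system prompt. The field dictionaries and prompt wording are illustrative, not copied from `prompt_builder.py` or `config_data/fields.json`:

```python
# Illustrative only -- field entries and prompt wording are assumptions,
# not the project's actual schema or prompts.
fields = [
    {"key": "patient_name", "display_name": "Patient Name", "type": "string",
     "description": "Full legal name as it appears on the form", "example": "Jane Doe"},
    {"key": "date_of_birth", "display_name": "Date of Birth", "type": "date",
     "description": "Patient date of birth", "example": "1984-03-19"},
]

def build_system_prompt(fields: list[dict]) -> str:
    """Render the current field schema into extraction instructions."""
    lines = [
        "Extract the following fields from the document image.",
        "Respond only with a JSON object keyed by field key.",
    ]
    for f in fields:
        example = f" Example: {f['example']}." if f.get("example") else ""
        lines.append(f"- {f['key']} ({f['type']}): {f['description']}.{example}")
    return "\n".join(lines)

print(build_system_prompt(fields))
```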
Navigate to Providers to add and manage LLM endpoints.
Each provider has:
- Name: Display label (e.g. OpenAI GPT-4o)
- Base URL: OpenAI-compatible endpoint (`https://api.openai.com/v1`, Ollama: `http://localhost:11434/v1`, etc.)
- API Key: Authentication token
- Model: Vision model ID (e.g. `gpt-4o`, `llava`, `llama3.2-vision`)
- Temperature / Max Tokens: Generation parameters
Use Test Connection to confirm provider reachability before running analysis.
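One plausible way to reproduce such a check outside the UI is to ask the endpoint to list its models, which most OpenAI-compatible servers support. This is a hypothetical helper, not CareXtract's implementation, and the endpoint values are placeholders:

```python
# Hypothetical connectivity check -- not the actual CareXtract code path.
from openai import OpenAI

def test_connection(base_url: str, api_key: str) -> bool:
    """Return True if the endpoint answers a simple model-listing request."""
    try:
        client = OpenAI(base_url=base_url, api_key=api_key)
        client.models.list()  # most OpenAI-compatible servers expose /v1/models
        return True
    except Exception:
        return False

print(test_connection("http://localhost:11434/v1", "ollama"))
```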
- Navigate to Documents and upload care documents
- Select a document and choose a provider → click Extract
- Review extracted field values inline — edit any incorrect or missing values
- Click Save as Ground Truth to persist corrected values as the reference for accuracy evaluation
Ground truth can be updated at any time. Re-run analysis after corrections to refresh accuracy scores.
- Navigate to Analysis
- (Optional) Add extraction instructions to guide the model on document-specific context — date format conventions, abbreviation standards, handwriting characteristics
- Select providers to include in the run
- Click Run Analysis
- Monitor live progress — current model and document count update in real time
- Results redirect automatically to the visualisation dashboard on completion
The Results dashboard shows:
- Per-model accuracy cards — overall accuracy, TRUE_POSITIVE rate, FALSE_POSITIVE, FALSE_NEGATIVE, INCORRECT, PARSE_ERROR counts
- Per-field radar charts — compare field-level accuracy across providers
- Latency metrics — avg, P50, P95 per provider
- Cost telemetry — estimated cost per extraction
- Per-document detail — drill down into individual document extraction results
Click Export CSV to download a full audit file with extracted values, ground truth, and field-level status for every combination.
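As a rough illustration of how the per-model accuracy on these cards could be derived from field-level statuses, the snippet below counts statuses and treats accuracy as the TRUE_POSITIVE share. The status labels come from the dashboard above, but the formula itself is an assumption and may differ from `metrics.py`:

```python
# Illustrative aggregation -- the exact formula in CareXtract's metrics.py
# may differ. The statuses below are made-up sample data.
from collections import Counter

field_statuses = [
    "TRUE_POSITIVE", "TRUE_POSITIVE", "INCORRECT",
    "FALSE_NEGATIVE", "TRUE_POSITIVE", "PARSE_ERROR",
]

counts = Counter(field_statuses)
accuracy = counts["TRUE_POSITIVE"] / len(field_statuses)
print(f"accuracy={accuracy:.1%}", dict(counts))  # accuracy=50.0% ...
```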
Configure application behaviour using environment variables in backend/.env:
| Variable | Description | Default | Type |
|---|---|---|---|
| `LANGFUSE_ENABLED` | Enable Langfuse observability tracing | false | boolean |
| `LANGFUSE_PUBLIC_KEY` | Langfuse project public key | - | string |
| `LANGFUSE_SECRET_KEY` | Langfuse project secret key | - | string |
| `LANGFUSE_HOST` | Langfuse server host (use http://host.docker.internal:3001 for local Docker) | https://cloud.langfuse.com | string |
Note: Provider API keys and endpoints are not stored in .env. They are configured in the UI and persisted to config_data/providers.json, which is excluded from version control.
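These variables map naturally onto a pydantic-settings model (see the Technology Stack section). The class below is a hedged sketch of how they might be loaded; it is not the project's actual config.py:

```python
# Sketch of loading backend/.env with pydantic-settings. Field names mirror
# the table above; the real config.py may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    langfuse_enabled: bool = False
    langfuse_public_key: str | None = None
    langfuse_secret_key: str | None = None
    langfuse_host: str = "https://cloud.langfuse.com"

settings = Settings()
print(settings.langfuse_enabled)
```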
CareXtract supports any OpenAI-compatible vision API. Common provider setups:
| Field | Value |
|---|---|
| Base URL | (leave blank — uses OpenAI default) |
| Model | gpt-4o or gpt-4o-mini |
| API Key | sk-... |
# Pull a vision model
ollama pull llava
# or
ollama pull llama3.2-vision

| Field | Value |
|---|---|
| Base URL | http://host.docker.internal:11434/v1 |
| Model | llava or llama3.2-vision |
| API Key | ollama (any non-empty string) |
| Field | Value |
|---|---|
| Base URL | https://openrouter.ai/api/v1 |
| Model | e.g. anthropic/claude-3.5-sonnet |
| API Key | your OpenRouter key |
| Field | Value |
|---|---|
| Base URL | your EI gateway endpoint |
| Model | model identifier from EI catalog |
| API Key | Keycloak bearer token |
Operational performance across multiple providers — zero-shot, without optimisations.
Workload: Extracting 12 fields from 50 patient intake forms
Use Case: CareXtract
| Provider | Model | Deployment | Context Window | Avg Tokens / Request | P50 Latency | P95 Latency | Concurrency | Throughput | Docs / hr |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o-mini | Cloud API | 128K | 26,187 | 4.5s | 5.8s | c=1 | 10.6 docs/min | 636 docs/hr |
| | | | | | | | c=2 | 8.5 docs/min | 510 docs/hr |
| | | | | | | | c=3 | 8.0 docs/min | 480 docs/hr |
| Intel OPEA EI | Qwen 2.5 VL 3B Instruct | On-Prem (Xeon) | 32,768 | 2,249 | 14.3s | 15.2s | c=1 | 4.8 docs/min | 288 docs/hr |
| | | | | | | | c=2 | 8.5 docs/min | 510 docs/hr |
| | | | | | | | c=3 | 12.5 docs/min | 750 docs/hr |
| | | | | | | | c=5 | 17.1 docs/min | 1,026 docs/hr |
Notes:
- All figures use the same 12-field extraction schema against a consistent set of patient intake forms. Token counts vary per document type and handwriting quality.
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
OpenAI's cost-efficient multimodal vision model, used for structured field extraction from care document images via the cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Vision Input | JPEG, PNG, GIF, WEBP — native image understanding |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
A 3-billion-parameter open-weight vision-language model from Alibaba, optimised for document understanding and structured extraction. Deployed on-prem via Intel OPEA Enterprise Inference on Xeon CPUs.
| Attribute | Details |
|---|---|
| Parameters | 3B |
| Architecture | Vision-Language Transformer — Qwen2.5 language backbone with visual encoder |
| Context Window | 32,768 tokens |
| Vision Input | Images and video; strong document and chart understanding |
| Structured Output | JSON mode supported |
| Multilingual | Strong multilingual support (English, Chinese, and others) |
| Quantization Formats | GGUF, AWQ, GPTQ |
| Inference Runtimes | vLLM, Ollama, llama.cpp, Intel OPEA Enterprise Inference |
| License | Apache 2.0 |
| Deployment | Local, on-prem, air-gapped — full data sovereignty |
| Capability | GPT-4o-mini | Qwen 2.5 VL 3B Instruct |
|---|---|---|
| Medical document field extraction | Yes | Yes |
| Native vision / image input | Yes | Yes |
| On-prem / air-gapped deployment | No | Yes |
| Data sovereignty | No (data sent to cloud API) | Full (weights run locally) |
| Open weights | No (proprietary) | Yes (Apache 2.0) |
| Custom fine-tuning | Supervised fine-tuning (API only) | Full fine-tuning + LoRA adapters |
| Structured JSON output | Yes | Yes |
| Context window | 128K | 32K |
| Inference on CPU (no GPU required) | No | Yes (via Intel OPEA on Xeon) |
GPT-4o-mini offers higher context, lower latency, and simpler deployment via cloud API — well-suited for teams without on-prem infrastructure. Qwen 2.5 VL 3B Instruct offers open weights, data sovereignty, and CPU-based local deployment — making it suitable for air-gapped, regulated, or cost-sensitive healthcare environments where patient data cannot leave the network.
- Framework: FastAPI (Python 3.11)
- Server: Uvicorn (ASGI)
- Document Processing: pypdfium2 (PDF-to-image), Pillow (image encoding)
- Extraction: OpenAI SDK (`openai` Python library) — works with any OpenAI-compatible provider
- Accuracy Evaluation: RapidFuzz (fuzzy address matching), custom type-aware comparison
- State Management: JSON file persistence (`config_data/`, `ground_truth/`, `results/`)
- Config Management: pydantic-settings + python-dotenv
- Observability: Langfuse (optional tracing)
- Framework: React 18 + TypeScript
- Build Tool: Vite
- Styling: Tailwind CSS + PostCSS
- State & Data Fetching: TanStack Query (React Query)
- Routing: React Router v6
- Charts: Recharts
- Icons: Lucide React
- Server: nginx (multi-stage Docker build)
Issue: Backend API not responding
# Check service health
curl http://localhost:8000/docs
# View backend logs
docker compose logs backend

Issue: Provider test connection fails
- Verify the Base URL includes `/v1` if required by the provider
- Confirm the API key is valid and has vision model access
- For Ollama inside Docker: use `http://host.docker.internal:11434/v1`, not `localhost`
- For self-hosted endpoints: confirm the container network can reach the endpoint
Issue: Extraction returns empty or malformed JSON
- Ensure the model supports vision input (image-capable model ID)
- Try a lower temperature (0.1–0.2) for more deterministic structured output
- Add explicit extraction instructions:
Respond only with a JSON object. Do not include markdown.
Issue: PDF documents not extracting
- PDFs are converted to images page by page — only the first page is used by default (see the sketch after this list)
- Ensure the PDF is not password-protected
- Max upload size: 20MB per file, 50 documents total
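To check how a PDF's first page will render before uploading it, you can run a quick standalone conversion with the same libraries the backend uses (pypdfium2 + Pillow). The paths and scale factor are placeholders:

```python
# Standalone sanity check: render only the first page of a PDF to an image,
# mirroring the backend's first-page-only behaviour. Paths are placeholders.
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("intake_form.pdf")
page = pdf[0]                      # first page only
bitmap = page.render(scale=2.0)    # 2x scale for readable text
image = bitmap.to_pil()            # returns a Pillow image
image.save("intake_form_page1.png")
```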
Issue: Frontend can't connect to backend
- Verify both services are running: `docker compose ps`
- The frontend proxies `/api` to `http://backend:8000` inside Docker — do not change the nginx config base path
Enable debug logging:
# Update backend/.env
LOG_LEVEL=DEBUG
# Restart services
docker compose restart backend
docker compose logs -f backend

This project is licensed under the MIT License.
CareXtract is provided for demonstration and informational purposes only. It is not a certified medical device and does not constitute clinical decision support. Always validate extracted data before use in any patient-facing or regulated workflow. AI-extracted field values must be reviewed by a qualified clinician or administrator before use in official records.
For full disclaimer details, see DISCLAIMER.md
