Define your fields. Upload care documents. Run any vision model. Compare extraction accuracy side by side.
- Project Overview
- Architecture
- Get Started
- Project Structure
- Usage Guide
- Environment Variables
- Provider Configuration
- Inference Metrics
- Model Capabilities
- Technology Stack
- Troubleshooting
- License
CareXtract is a full-stack platform for extracting structured fields from medical and care documents — patient intake forms, clinical notes, prescriptions, referrals, and more — using vision-capable language models.
Developed as an open-source reference implementation under the Cloud2 Labs Innovation Hub, CareXtract demonstrates how user-defined field schemas, OpenAI-compatible provider routing, and inline ground truth editing can be packaged into a production-grade microservices architecture. Upload typed or handwritten care documents, define the fields you need to extract, connect any vision model via a single endpoint config, and measure extraction accuracy across providers — against ground truth you review and correct yourself.
- Define Fields: Users define extraction fields directly in the UI — key, display name, data type (string, date, phone, address, number), description, and optional example. Fields are stored in `config_data/fields.json` and injected into prompts at runtime.
- Upload Documents: Upload care documents (JPG, PNG, PDF). Supports typed and handwritten documents.
- Connect Providers: Configure one or more OpenAI-compatible providers — GPT-4o, Claude, Gemini, local Ollama, vLLM, OpenRouter, or custom inference servers. Live connectivity tests confirm reachability before a run.
- Extract & Review: Run on-demand extraction per document. Extracted values display inline as editable inputs — review and correct before saving as verified ground truth.
- Analyse Accuracy: Trigger a multi-provider analysis run. Type-aware comparison (date normalisation, phone digit stripping, fuzzy address scoring, numeric parsing) produces per-field and per-model accuracy metrics.
- Export Results: Full audit CSV exports extracted values, ground truth values, and field-level status for every field, document, and provider combination.
The application follows a two-service containerised architecture with a FastAPI backend handling all field schema management, provider routing, extraction, and evaluation logic — paired with a React + TypeScript frontend for document management, live analysis runs, result visualisation, and CSV export.
graph LR
%% ====== FRONTEND ======
subgraph FE[Frontend]
A[React + TypeScript<br/>Port 3000]
end
%% ====== BACKEND ======
subgraph BE[Backend - FastAPI<br/>Port 8000]
B[API Router]
FS[Fields Store<br/>fields.json]
PS[Providers Store<br/>providers.json]
PB[Prompt Builder]
EX[Extraction Engine<br/>Async + Concurrency]
GT[Ground Truth Store<br/>ground_truth.json]
EV[Evaluator<br/>Type-Aware Scoring]
AR[Analysis Runner<br/>Multi-Provider]
RR[Results Store<br/>results/]
end
%% ====== EXTERNAL ======
subgraph EXT[External Providers]
P1[OpenAI / GPT-4o]
P2[Claude / Gemini]
P3[Ollama / vLLM<br/>Local]
P4[OpenRouter /<br/>Custom Endpoint]
end
%% ====== CONNECTIONS ======
A -->|HTTP /api| B
B --> FS
B --> PS
B --> PB
B --> EX
B --> GT
B --> AR
B --> RR
FS -->|Field Schema| PB
PS -->|Provider Configs| EX
PB -->|Dynamic Prompt| EX
EX -->|Extracted Fields| GT
GT -->|Ground Truth| EV
EX -->|Extraction Results| EV
EV -->|Scored Results| AR
AR -->|Run Results| RR
EX -->|API Call| P1
EX -->|API Call| P2
EX -->|API Call| P3
EX -->|API Call| P4
B -->|JSON| A
%% ====== STYLES ======
style A fill:#e1f5ff
style B fill:#fff4e1
style FS fill:#e8f5e9
style PS fill:#e8f5e9
style PB fill:#ffe1f5
style EX fill:#ffe1f5
style GT fill:#e8f5e9
style EV fill:#ffe1f5
style AR fill:#ffe1f5
style RR fill:#e8f5e9
style P1 fill:#fff3cd
style P2 fill:#fff3cd
style P3 fill:#fff3cd
style P4 fill:#fff3cd
Frontend (React + TypeScript)
- Document management — upload, list, delete care documents
- Fields management — define, reorder, edit extraction field schemas
- Providers management — configure, test, and manage LLM provider endpoints
- Document Extract page — per-document on-demand extraction with inline ground truth editing
- Analysis page — trigger multi-provider runs with live progress polling
- Results dashboard — per-model accuracy cards, radar charts, latency percentiles, cost telemetry
- CSV export for offline analysis and clinical audit trails
Backend Services
- Fields Store: Persists field schema definitions to `config_data/fields.json`; serves as the source of truth for all prompt construction
- Providers Store: Manages OpenAI-compatible provider configurations (base URL, model ID, API key, temperature, max tokens); stored in `config_data/providers.json`
- Prompt Builder: Constructs system and user prompts dynamically at runtime from the current field schema and per-run extraction instructions
- Extraction Engine: Async multi-provider extraction with bounded concurrency; encodes document images, calls vision LLMs, and parses structured JSON responses
- Ground Truth Store: Persists human-reviewed and corrected extraction values per document; supports inline editing and partial patch updates
- Evaluator: Applies type-aware field comparison — exact string matching, date normalisation, phone digit stripping, fuzzy address scoring (RapidFuzz), numeric parsing (see the sketch after this list)
- Analysis Runner: Orchestrates multi-provider runs, tracks run status and progress, aggregates per-field and per-model accuracy metrics
- Results Store: Persists completed analysis results per run ID; supports retrieval and CSV export
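A minimal sketch of what the Evaluator's type-aware comparison could look like. The helper names, accepted date formats, and fuzzy-match threshold below are illustrative assumptions, not code from `evaluator.py`:

```python
# Illustrative sketch only -- names, formats, and thresholds are assumptions,
# not taken from CareXtract's evaluator.py.
from datetime import datetime
from rapidfuzz import fuzz

DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]

def normalise_date(value: str) -> str | None:
    """Try a few common date layouts and return an ISO date, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def compare_values(field_type: str, extracted: str, truth: str) -> bool:
    """Type-aware equality check between an extracted value and ground truth."""
    if field_type == "date":
        return normalise_date(extracted) == normalise_date(truth)
    if field_type == "phone":
        digits = lambda s: "".join(ch for ch in s if ch.isdigit())
        return digits(extracted) == digits(truth)
    if field_type == "address":
        # Fuzzy token match; 90 is an arbitrary illustrative threshold.
        return fuzz.token_sort_ratio(extracted.lower(), truth.lower()) >= 90
    if field_type == "number":
        try:
            return float(extracted.replace(",", "")) == float(truth.replace(",", ""))
        except ValueError:
            return False
    return extracted.strip().lower() == truth.strip().lower()
```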
External Integration
- Any OpenAI-compatible API — GPT-4o, Claude (via proxy), Gemini (via proxy), Ollama (local), vLLM (local/on-prem), OpenRouter, or custom inference servers
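Because every provider speaks the same OpenAI-compatible chat API, a single extraction call reduces to one `openai` client pointed at the provider's base URL. The sketch below shows the general shape, assuming a local Ollama endpoint; the file name, field list, and prompt wording are placeholders rather than CareXtract's actual prompts:

```python
# Minimal sketch of one vision extraction call via an OpenAI-compatible
# endpoint. Provider values and prompts are placeholders.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g. local Ollama; omit for OpenAI
    api_key="ollama",                       # any non-empty string for Ollama
)

with open("intake_form.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llama3.2-vision",
    temperature=0.1,
    messages=[
        {"role": "system",
         "content": "Extract the requested fields and respond only with a JSON object."},
        {"role": "user", "content": [
            {"type": "text", "text": "Fields: patient_name (string), date_of_birth (date)"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]},
    ],
)
print(response.choices[0].message.content)
```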
Before you begin, ensure you have the following installed:
- Docker and Docker Compose (v20.10+)
- At least one vision-capable LLM provider (any of the following):
- OpenAI API key (GPT-4o or GPT-4o-mini with vision)
- Anthropic API key (Claude with vision, via OpenAI-compatible proxy)
- Local Ollama with a vision model (e.g. `llava`, `llama3.2-vision`)
- vLLM endpoint with a vision model
- Any OpenRouter or custom OpenAI-compatible endpoint
# Check Docker
docker --version
docker compose version
# Verify Docker is running
docker ps

git clone https://github.com/cld2labs/CareXtract.git
cd CareXtract

Copy the example environment file:

cp backend/.env.example backend/.env

Edit backend/.env — only the optional Langfuse observability settings live here. Provider API keys and endpoints are configured in the UI after startup.
# Langfuse Observability (optional — set LANGFUSE_ENABLED=false to skip)
LANGFUSE_ENABLED=false

# Build and start all services
docker compose up --build
# Or run in detached mode (background)
docker compose up -d --build

Once containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- API Redoc: http://localhost:8000/redoc
- Open http://localhost:3000
- Navigate to the Providers page
- Click Add Provider
- Enter your provider details:
- Name: e.g. OpenAI GPT-4o
- Base URL: e.g. `https://api.openai.com/v1` (leave blank for OpenAI default)
- API Key: your API key
- Model: e.g. `gpt-4o`
- Click Test Connection to verify
- Save the provider
- Navigate to Documents and upload care documents (JPG, PNG, or PDF)
- Navigate to Fields and define the fields you want to extract
- Navigate to Documents → select a document → click Extract to run on-demand extraction
- Review and correct extracted values inline — click Save as Ground Truth when verified
- Navigate to Analysis → click Run Analysis to compare all providers
- View results in the Results dashboard
docker compose down

CareXtract/
├── backend/
│ ├── api/
│ │ └── routes.py # All API endpoints (fields, providers, docs, analysis, results)
│ ├── extractors/
│ │ ├── base.py # Base extractor interface
│ │ └── dynamic_extractor.py # Vision LLM extraction with dynamic prompts
│ ├── analysis/
│ │ ├── evaluator.py # Type-aware field comparison logic
│ │ ├── metrics.py # Per-model accuracy metric aggregation
│ │ └── runner.py # Async multi-provider analysis runner
│ ├── models/
│ │ └── schemas.py # Pydantic models (FieldDefinition, ProviderConfig, etc.)
│ ├── main.py # FastAPI application entry point
│ ├── config.py # Configuration and settings
│ ├── fields_store.py # Field schema persistence
│ ├── providers_store.py # Provider config persistence
│ ├── prompt_builder.py # Dynamic prompt construction
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile # Backend container
├── frontend/
│ ├── src/
│ │ ├── pages/
│ │ │ ├── LandingPage.tsx # Home / overview
│ │ │ ├── DocumentsPage.tsx # Document management and extraction
│ │ │ ├── FieldsPage.tsx # Field schema management
│ │ │ ├── ProvidersPage.tsx # Provider configuration and testing
│ │ │ ├── AnalysisPage.tsx # Analysis run management
│ │ │ └── ResultsPage.tsx # Results visualisation and export
│ │ ├── api/
│ │ │ └── client.ts # API request utilities
│ │ ├── types/
│ │ │ └── index.ts # TypeScript interfaces
│ │ └── App.tsx # Application root and routing
│ ├── package.json # npm dependencies
│ ├── vite.config.ts # Vite bundler config
│ ├── tsconfig.json # TypeScript configuration
│ ├── tailwind.config.js # Tailwind CSS theme
│ └── Dockerfile # Frontend container (multi-stage nginx)
├── docs/
│ └── assets/ # Documentation images
├── .github/
│ └── workflows/
│ └── code-scans.yaml # Security scanning (Trivy + Bandit)
├── docker-compose.yml # Service orchestration
├── .gitignore # Git exclusions
├── LICENSE.md # MIT License
├── README.md # Project documentation
├── CONTRIBUTING.md # Contributing guidelines
├── DISCLAIMER.md # Legal disclaimer
├── SECURITY.md # Security policy
└── TERMS_AND_CONDITIONS.md # Terms of use
Navigate to Fields to define what to extract from your documents.
Each field has:
- Key: Unique identifier used in prompts and exports (e.g. `patient_name`)
- Display Name: Human-readable label (e.g. Patient Name)
- Type: `string`, `date`, `phone`, `address`, or `number`
- Description: Instructions for the model (e.g. Full legal name as it appears on the form)
- Example (optional): A sample value to guide the model
Fields are injected into system prompts at runtime. Reorder fields to control prompt priority.
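To make the schema-to-prompt step concrete, here is a hedged sketch of how field definitions like those above might be rendered into a system prompt. The field dictionaries and prompt wording are illustrative, not copied from `prompt_builder.py` or `config_data/fields.json`:

```python
# Illustrative only -- field entries and prompt wording are assumptions,
# not the project's actual schema or prompts.
fields = [
    {"key": "patient_name", "display_name": "Patient Name", "type": "string",
     "description": "Full legal name as it appears on the form", "example": "Jane Doe"},
    {"key": "date_of_birth", "display_name": "Date of Birth", "type": "date",
     "description": "Patient date of birth", "example": "1984-03-19"},
]

def build_system_prompt(fields: list[dict]) -> str:
    """Render the current field schema into extraction instructions."""
    lines = [
        "Extract the following fields from the document image.",
        "Respond only with a JSON object keyed by field key.",
    ]
    for f in fields:
        example = f" Example: {f['example']}." if f.get("example") else ""
        lines.append(f"- {f['key']} ({f['type']}): {f['description']}.{example}")
    return "\n".join(lines)

print(build_system_prompt(fields))
```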
Navigate to Providers to add and manage LLM endpoints.
Each provider has:
- Name: Display label (e.g. OpenAI GPT-4o)
- Base URL: OpenAI-compatible endpoint (`https://api.openai.com/v1`, Ollama: `http://localhost:11434/v1`, etc.)
- API Key: Authentication token
- Model: Vision model ID (e.g. `gpt-4o`, `llava`, `llama3.2-vision`)
- Temperature / Max Tokens: Generation parameters
Use Test Connection to confirm provider reachability before running analysis.
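One plausible way to reproduce such a check outside the UI is to ask the endpoint to list its models, which most OpenAI-compatible servers support. This is a hypothetical helper, not CareXtract's implementation, and the endpoint values are placeholders:

```python
# Hypothetical connectivity check -- not the actual CareXtract code path.
from openai import OpenAI

def test_connection(base_url: str, api_key: str) -> bool:
    """Return True if the endpoint answers a simple model-listing request."""
    try:
        client = OpenAI(base_url=base_url, api_key=api_key)
        client.models.list()  # most OpenAI-compatible servers expose /v1/models
        return True
    except Exception:
        return False

print(test_connection("http://localhost:11434/v1", "ollama"))
```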
- Navigate to Documents and upload care documents
- Select a document and choose a provider → click Extract
- Review extracted field values inline — edit any incorrect or missing values
- Click Save as Ground Truth to persist corrected values as the reference for accuracy evaluation
Ground truth can be updated at any time. Re-run analysis after corrections to refresh accuracy scores.
- Navigate to Analysis
- (Optional) Add extraction instructions to guide the model on document-specific context — date format conventions, abbreviation standards, handwriting characteristics
- Select providers to include in the run
- Click Run Analysis
- Monitor live progress — current model and document count update in real time
- Results redirect automatically to the visualisation dashboard on completion
The Results dashboard shows:
- Per-model accuracy cards — overall accuracy, TRUE_POSITIVE rate, FALSE_POSITIVE, FALSE_NEGATIVE, INCORRECT, PARSE_ERROR counts
- Per-field radar charts — compare field-level accuracy across providers
- Latency metrics — avg, P50, P95 per provider
- Cost telemetry — estimated cost per extraction
- Per-document detail — drill down into individual document extraction results
Click Export CSV to download a full audit file with extracted values, ground truth, and field-level status for every combination.
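As a rough illustration of how the per-model accuracy on these cards could be derived from field-level statuses, the snippet below counts statuses and treats accuracy as the TRUE_POSITIVE share. The status labels come from the dashboard above, but the formula itself is an assumption and may differ from `metrics.py`:

```python
# Illustrative aggregation -- the exact formula in CareXtract's metrics.py
# may differ. The statuses below are made-up sample data.
from collections import Counter

field_statuses = [
    "TRUE_POSITIVE", "TRUE_POSITIVE", "INCORRECT",
    "FALSE_NEGATIVE", "TRUE_POSITIVE", "PARSE_ERROR",
]

counts = Counter(field_statuses)
accuracy = counts["TRUE_POSITIVE"] / len(field_statuses)
print(f"accuracy={accuracy:.1%}", dict(counts))  # accuracy=50.0% ...
```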
Configure application behaviour using environment variables in backend/.env:
| Variable | Description | Default | Type |
|---|---|---|---|
| `LANGFUSE_ENABLED` | Enable Langfuse observability tracing | false | boolean |
| `LANGFUSE_PUBLIC_KEY` | Langfuse project public key | - | string |
| `LANGFUSE_SECRET_KEY` | Langfuse project secret key | - | string |
| `LANGFUSE_HOST` | Langfuse server host (use http://host.docker.internal:3001 for local Docker) | https://cloud.langfuse.com | string |
Note: Provider API keys and endpoints are not stored in .env. They are configured in the UI and persisted to config_data/providers.json, which is excluded from version control.
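These variables map naturally onto a pydantic-settings model (see the Technology Stack section). The class below is a hedged sketch of how they might be loaded; it is not the project's actual config.py:

```python
# Sketch of loading backend/.env with pydantic-settings. Field names mirror
# the table above; the real config.py may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    langfuse_enabled: bool = False
    langfuse_public_key: str | None = None
    langfuse_secret_key: str | None = None
    langfuse_host: str = "https://cloud.langfuse.com"

settings = Settings()
print(settings.langfuse_enabled)
```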
CareXtract supports any OpenAI-compatible vision API. Common provider setups:
| Field | Value |
|---|---|
| Base URL | (leave blank — uses OpenAI default) |
| Model | gpt-4o or gpt-4o-mini |
| API Key | sk-... |
# Pull a vision model
ollama pull llava
# or
ollama pull llama3.2-vision

| Field | Value |
|---|---|
| Base URL | http://host.docker.internal:11434/v1 |
| Model | llava or llama3.2-vision |
| API Key | ollama (any non-empty string) |
| Field | Value |
|---|---|
| Base URL | https://openrouter.ai/api/v1 |
| Model | e.g. anthropic/claude-3.5-sonnet |
| API Key | your OpenRouter key |
| Field | Value |
|---|---|
| Base URL | your EI gateway endpoint |
| Model | model identifier from EI catalog |
| API Key | Keycloak bearer token |
Operational performance across multiple providers — zero-shot, without optimisations.
Workload: Extracting 12 fields from 50 patient intake forms
Use Case: CareXtract
| Provider | Model | Deployment | Context Window | Avg Tokens / Request | P50 Latency | P95 Latency | Concurrency | Throughput | Docs / hr |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o-mini | Cloud API | 128K | 26,187 | 4.5s | 5.8s | c=1 | 10.6 docs/min | 636 docs/hr |
| | | | | | | | c=2 | 8.5 docs/min | 510 docs/hr |
| | | | | | | | c=3 | 8.0 docs/min | 480 docs/hr |
| Intel OPEA EI | Qwen 2.5 VL 3B Instruct | On-Prem (Xeon) | 32,768 | 2,249 | 14.3s | 15.2s | c=1 | 4.8 docs/min | 288 docs/hr |
| | | | | | | | c=2 | 8.5 docs/min | 510 docs/hr |
| | | | | | | | c=3 | 12.5 docs/min | 750 docs/hr |
| | | | | | | | c=5 | 17.1 docs/min | 1,026 docs/hr |
Notes:
- All figures use the same 12-field extraction schema against a consistent set of patient intake forms. Token counts vary per document type and handwriting quality.
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
OpenAI's cost-efficient multimodal vision model, used for structured field extraction from care document images via the cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Vision Input | JPEG, PNG, GIF, WEBP — native image understanding |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
A 3-billion-parameter open-weight vision-language model from Alibaba, optimised for document understanding and structured extraction. Deployed on-prem via Intel OPEA Enterprise Inference on Xeon CPUs.
| Attribute | Details |
|---|---|
| Parameters | 3B |
| Architecture | Vision-Language Transformer — Qwen2.5 language backbone with visual encoder |
| Context Window | 32,768 tokens |
| Vision Input | Images and video; strong document and chart understanding |
| Structured Output | JSON mode supported |
| Multilingual | Strong multilingual support (English, Chinese, and others) |
| Quantization Formats | GGUF, AWQ, GPTQ |
| Inference Runtimes | vLLM, Ollama, llama.cpp, Intel OPEA Enterprise Inference |
| License | Apache 2.0 |
| Deployment | Local, on-prem, air-gapped — full data sovereignty |
| Capability | GPT-4o-mini | Qwen 2.5 VL 3B Instruct |
|---|---|---|
| Medical document field extraction | Yes | Yes |
| Native vision / image input | Yes | Yes |
| On-prem / air-gapped deployment | No | Yes |
| Data sovereignty | No (data sent to cloud API) | Full (weights run locally) |
| Open weights | No (proprietary) | Yes (Apache 2.0) |
| Custom fine-tuning | Supervised fine-tuning (API only) | Full fine-tuning + LoRA adapters |
| Structured JSON output | Yes | Yes |
| Context window | 128K | 32K |
| Inference on CPU (no GPU required) | No | Yes (via Intel OPEA on Xeon) |
GPT-4o-mini offers higher context, lower latency, and simpler deployment via cloud API — well-suited for teams without on-prem infrastructure. Qwen 2.5 VL 3B Instruct offers open weights, data sovereignty, and CPU-based local deployment — making it suitable for air-gapped, regulated, or cost-sensitive healthcare environments where patient data cannot leave the network.
- Framework: FastAPI (Python 3.11)
- Server: Uvicorn (ASGI)
- Document Processing: pypdfium2 (PDF-to-image), Pillow (image encoding)
- Extraction: OpenAI SDK (`openai` Python library) — works with any OpenAI-compatible provider
- Accuracy Evaluation: RapidFuzz (fuzzy address matching), custom type-aware comparison
- State Management: JSON file persistence (`config_data/`, `ground_truth/`, `results/`)
- Config Management: pydantic-settings + python-dotenv
- Observability: Langfuse (optional tracing)
- Framework: React 18 + TypeScript
- Build Tool: Vite
- Styling: Tailwind CSS + PostCSS
- State & Data Fetching: TanStack Query (React Query)
- Routing: React Router v6
- Charts: Recharts
- Icons: Lucide React
- Server: nginx (multi-stage Docker build)
Issue: Backend API not responding
# Check service health
curl http://localhost:8000/docs
# View backend logs
docker compose logs backend

Issue: Provider test connection fails
- Verify the Base URL includes `/v1` if required by the provider
- Confirm the API key is valid and has vision model access
- For Ollama inside Docker: use `http://host.docker.internal:11434/v1`, not `localhost`
- For self-hosted endpoints: confirm the container network can reach the endpoint
Issue: Extraction returns empty or malformed JSON
- Ensure the model supports vision input (image-capable model ID)
- Try a lower temperature (0.1–0.2) for more deterministic structured output
- Add explicit extraction instructions:
Respond only with a JSON object. Do not include markdown.
Issue: PDF documents not extracting
- PDFs are converted to images page by page — only the first page is used by default (see the sketch after this list)
- Ensure the PDF is not password-protected
- Max upload size: 20MB per file, 50 documents total
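To check how a PDF's first page will render before uploading it, you can run a quick standalone conversion with the same libraries the backend uses (pypdfium2 + Pillow). The paths and scale factor are placeholders:

```python
# Standalone sanity check: render only the first page of a PDF to an image,
# mirroring the backend's first-page-only behaviour. Paths are placeholders.
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("intake_form.pdf")
page = pdf[0]                      # first page only
bitmap = page.render(scale=2.0)    # 2x scale for readable text
image = bitmap.to_pil()            # returns a Pillow image
image.save("intake_form_page1.png")
```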
Issue: Frontend can't connect to backend
- Verify both services are running: `docker compose ps`
- The frontend proxies `/api` to `http://backend:8000` inside Docker — do not change the nginx config base path
Enable debug logging:
# Update backend/.env
LOG_LEVEL=DEBUG
# Restart services
docker compose restart backend
docker compose logs -f backend

This project is licensed under the MIT License.
CareXtract is provided for demonstration and informational purposes only. It is not a certified medical device and does not constitute clinical decision support. Always validate extracted data before use in any patient-facing or regulated workflow. AI-extracted field values must be reviewed by a qualified clinician or administrator before use in official records.
For full disclaimer details, see DISCLAIMER.md
