Cloud2 Labs Innovation Hub

CareXtract - Multi-Provider Medical Document Field Extraction

Define your fields. Upload care documents. Run any vision model. Compare extraction accuracy side by side.


Project Overview

CareXtract is a full-stack platform for extracting structured fields from medical and care documents — patient intake forms, clinical notes, prescriptions, referrals, and more — using vision-capable language models.

Developed as an open-source reference implementation under the Cloud2 Labs Innovation Hub, CareXtract demonstrates how user-defined field schemas, OpenAI-compatible provider routing, and inline ground truth editing can be packaged into a production-grade microservices architecture. Upload typed or handwritten care documents, define the fields you need to extract, connect any vision model via a single endpoint config, and measure extraction accuracy across providers — against ground truth you review and correct yourself.

How It Works

  1. Define Fields: Users define extraction fields directly in the UI — key, display name, data type (string, date, phone, address, number), description, and optional example. Fields are stored in config_data/fields.json and injected into prompts at runtime.
  2. Upload Documents: Upload care documents (JPG, PNG, PDF). Supports typed and handwritten documents.
  3. Connect Providers: Configure one or more OpenAI-compatible providers — GPT-4o, Claude, Gemini, local Ollama, vLLM, OpenRouter, or custom inference servers. Live connectivity tests confirm reachability before a run.
  4. Extract & Review: Run on-demand extraction per document. Extracted values display inline as editable inputs — review and correct before saving as verified ground truth.
  5. Analyse Accuracy: Trigger a multi-provider analysis run. Type-aware comparison (date normalisation, phone digit stripping, fuzzy address scoring, numeric parsing) produces per-field and per-model accuracy metrics.
  6. Export Results: Full audit CSV exports extracted values, ground truth values, and field-level status for every field, document, and provider combination.
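Step 1 can be pictured as a single field-definition entry. The exact key names used in config_data/fields.json are an assumption for illustration; the attributes mirror those described above. A minimal sketch in Python:

```python
import json

# Hypothetical shape of one entry in config_data/fields.json — the exact
# key names are assumptions; the attributes mirror the README's list.
field = {
    "key": "patient_name",
    "display_name": "Patient Name",
    "type": "string",
    "description": "Full legal name as it appears on the form",
    "example": "Jane Doe",
}

VALID_TYPES = {"string", "date", "phone", "address", "number"}

def validate_field(f: dict) -> dict:
    """Basic sanity checks before persisting a field definition."""
    if not f.get("key"):
        raise ValueError("field key is required")
    if f.get("type") not in VALID_TYPES:
        raise ValueError(f"unsupported type: {f.get('type')!r}")
    return f

print(json.dumps(validate_field(field), indent=2))
```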

Architecture

The application follows a two-service containerised architecture with a FastAPI backend handling all field schema management, provider routing, extraction, and evaluation logic — paired with a React + TypeScript frontend for document management, live analysis runs, result visualisation, and CSV export.

graph LR

  %% ====== FRONTEND ======
  subgraph FE[Frontend]
    A[React + TypeScript<br/>Port 3000]
  end

  %% ====== BACKEND ======
  subgraph BE[Backend - FastAPI<br/>Port 8000]
    B[API Router]
    FS[Fields Store<br/>fields.json]
    PS[Providers Store<br/>providers.json]
    PB[Prompt Builder]
    EX[Extraction Engine<br/>Async + Concurrency]
    GT[Ground Truth Store<br/>ground_truth.json]
    EV[Evaluator<br/>Type-Aware Scoring]
    AR[Analysis Runner<br/>Multi-Provider]
    RR[Results Store<br/>results/]
  end

  %% ====== EXTERNAL ======
  subgraph EXT[External Providers]
    P1[OpenAI / GPT-4o]
    P2[Claude / Gemini]
    P3[Ollama / vLLM<br/>Local]
    P4[OpenRouter /<br/>Custom Endpoint]
  end

  %% ====== CONNECTIONS ======
  A -->|HTTP /api| B

  B --> FS
  B --> PS
  B --> PB
  B --> EX
  B --> GT
  B --> AR
  B --> RR

  FS -->|Field Schema| PB
  PS -->|Provider Configs| EX
  PB -->|Dynamic Prompt| EX
  EX -->|Extracted Fields| GT
  GT -->|Ground Truth| EV
  EX -->|Extraction Results| EV
  EV -->|Scored Results| AR
  AR -->|Run Results| RR

  EX -->|API Call| P1
  EX -->|API Call| P2
  EX -->|API Call| P3
  EX -->|API Call| P4

  B -->|JSON| A

  %% ====== STYLES ======
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style FS fill:#e8f5e9
  style PS fill:#e8f5e9
  style PB fill:#ffe1f5
  style EX fill:#ffe1f5
  style GT fill:#e8f5e9
  style EV fill:#ffe1f5
  style AR fill:#ffe1f5
  style RR fill:#e8f5e9
  style P1 fill:#fff3cd
  style P2 fill:#fff3cd
  style P3 fill:#fff3cd
  style P4 fill:#fff3cd

Architecture Components

Frontend (React + TypeScript)

  • Document management — upload, list, delete care documents
  • Fields management — define, reorder, edit extraction field schemas
  • Providers management — configure, test, and manage LLM provider endpoints
  • Document Extract page — per-document on-demand extraction with inline ground truth editing
  • Analysis page — trigger multi-provider runs with live progress polling
  • Results dashboard — per-model accuracy cards, radar charts, latency percentiles, cost telemetry
  • CSV export for offline analysis and clinical audit trails

Backend Services

  • Fields Store: Persists field schema definitions to config_data/fields.json; serves as the source of truth for all prompt construction
  • Providers Store: Manages OpenAI-compatible provider configurations (base URL, model ID, API key, temperature, max tokens); stored in config_data/providers.json
  • Prompt Builder: Constructs system and user prompts dynamically at runtime from the current field schema and per-run extraction instructions
  • Extraction Engine: Async multi-provider extraction with bounded concurrency; encodes document images, calls vision LLMs, and parses structured JSON responses
  • Ground Truth Store: Persists human-reviewed and corrected extraction values per document; supports inline editing and partial patch updates
  • Evaluator: Applies type-aware field comparison — exact string matching, date normalisation, phone digit stripping, fuzzy address scoring (RapidFuzz), numeric parsing
  • Analysis Runner: Orchestrates multi-provider runs, tracks run status and progress, aggregates per-field and per-model accuracy metrics
  • Results Store: Persists completed analysis results per run ID; supports retrieval and CSV export
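The Evaluator's type-aware comparison can be sketched with the standard library (the real implementation also uses RapidFuzz for fuzzy address scoring, which is omitted here; function names are illustrative):

```python
import re
from datetime import datetime

def normalise_phone(value: str) -> str:
    """Phone comparison: keep digits only, so '07911 123 456' == '07911-123-456'."""
    return re.sub(r"\D", "", value)

def normalise_date(value: str) -> str:
    """Try a few common date layouts and canonicalise to ISO (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return value.strip()  # fall back to raw comparison

def compare(field_type: str, extracted: str, truth: str) -> bool:
    """Type-aware equality for one field."""
    if field_type == "phone":
        return normalise_phone(extracted) == normalise_phone(truth)
    if field_type == "date":
        return normalise_date(extracted) == normalise_date(truth)
    return extracted.strip().lower() == truth.strip().lower()

compare("phone", "07911 123 456", "07911-123-456")  # True
compare("date", "01/02/2024", "2024-02-01")         # True
```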

External Integration

  • Any OpenAI-compatible API — GPT-4o, Claude (via proxy), Gemini (via proxy), Ollama (local), vLLM (local/on-prem), OpenRouter, or custom inference servers

Get Started

Prerequisites

Before you begin, ensure you have the following installed:

  • Docker and Docker Compose (v20.10+)
  • At least one vision-capable LLM provider (any of the following):
    • OpenAI API key (GPT-4o or GPT-4o-mini with vision)
    • Anthropic API key (Claude with vision, via OpenAI-compatible proxy)
    • Local Ollama with a vision model (e.g. llava, llama3.2-vision)
    • vLLM endpoint with a vision model
    • Any OpenRouter or custom OpenAI-compatible endpoint

Verify Installation

# Check Docker
docker --version
docker compose version

# Verify Docker is running
docker ps

Quick Start

1. Clone the Repository

git clone https://github.com/cld2labs/CareXtract.git
cd CareXtract

2. Configure Environment Variables

Copy the example environment file:

cp backend/.env.example backend/.env

Edit backend/.env if needed; the only settings here are the optional Langfuse observability variables. Provider API keys and endpoints are configured in the UI after startup.

# Langfuse Observability (optional — set LANGFUSE_ENABLED=false to skip)
LANGFUSE_ENABLED=false

3. Launch the Application

# Build and start all services
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build

4. Access the Application

Once containers are running:

  • Frontend: http://localhost:3000
  • Backend API (interactive docs): http://localhost:8000/docs

5. Add Your First Provider

  1. Open http://localhost:3000
  2. Navigate to the Providers page
  3. Click Add Provider
  4. Enter your provider details:
  • Name: e.g. OpenAI GPT-4o
  • Base URL: e.g. https://api.openai.com/v1 (leave blank for OpenAI default)
  • API Key: your API key
  • Model: e.g. gpt-4o
  5. Click Test Connection to verify
  6. Save the provider

6. Upload Documents and Extract

  1. Navigate to Documents and upload care documents (JPG, PNG, or PDF)
  2. Navigate to Fields and define the fields you want to extract
  3. Navigate to Documents → select a document → click Extract to run on-demand extraction
  4. Review and correct extracted values inline — click Save as Ground Truth when verified
  5. Navigate to Analysis → click Run Analysis to compare all providers
  6. View results in the Results dashboard

7. Stop the Application

docker compose down

Project Structure

CareXtract/
├── backend/
│   ├── api/
│   │   └── routes.py             # All API endpoints (fields, providers, docs, analysis, results)
│   ├── extractors/
│   │   ├── base.py               # Base extractor interface
│   │   └── dynamic_extractor.py  # Vision LLM extraction with dynamic prompts
│   ├── analysis/
│   │   ├── evaluator.py          # Type-aware field comparison logic
│   │   ├── metrics.py            # Per-model accuracy metric aggregation
│   │   └── runner.py             # Async multi-provider analysis runner
│   ├── models/
│   │   └── schemas.py            # Pydantic models (FieldDefinition, ProviderConfig, etc.)
│   ├── main.py                   # FastAPI application entry point
│   ├── config.py                 # Configuration and settings
│   ├── fields_store.py           # Field schema persistence
│   ├── providers_store.py        # Provider config persistence
│   ├── prompt_builder.py         # Dynamic prompt construction
│   ├── requirements.txt          # Python dependencies
│   └── Dockerfile                # Backend container
├── frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   ├── LandingPage.tsx   # Home / overview
│   │   │   ├── DocumentsPage.tsx # Document management and extraction
│   │   │   ├── FieldsPage.tsx    # Field schema management
│   │   │   ├── ProvidersPage.tsx # Provider configuration and testing
│   │   │   ├── AnalysisPage.tsx  # Analysis run management
│   │   │   └── ResultsPage.tsx   # Results visualisation and export
│   │   ├── api/
│   │   │   └── client.ts         # API request utilities
│   │   ├── types/
│   │   │   └── index.ts          # TypeScript interfaces
│   │   └── App.tsx               # Application root and routing
│   ├── package.json              # npm dependencies
│   ├── vite.config.ts            # Vite bundler config
│   ├── tsconfig.json             # TypeScript configuration
│   ├── tailwind.config.js        # Tailwind CSS theme
│   └── Dockerfile                # Frontend container (multi-stage nginx)
├── docs/
│   └── assets/                   # Documentation images
├── .github/
│   └── workflows/
│       └── code-scans.yaml       # Security scanning (Trivy + Bandit)
├── docker-compose.yml            # Service orchestration
├── .gitignore                    # Git exclusions
├── LICENSE.md                    # MIT License
├── README.md                     # Project documentation
├── CONTRIBUTING.md               # Contributing guidelines
├── DISCLAIMER.md                 # Legal disclaimer
├── SECURITY.md                   # Security policy
└── TERMS_AND_CONDITIONS.md       # Terms of use

Usage Guide

Defining Fields

Navigate to Fields to define what to extract from your documents.

Each field has:

  • Key: Unique identifier used in prompts and exports (e.g. patient_name)
  • Display Name: Human-readable label (e.g. Patient Name)
  • Type: string, date, phone, address, or number
  • Description: Instructions for the model (e.g. Full legal name as it appears on the form)
  • Example (optional): A sample value to guide the model

Fields are injected into system prompts at runtime. Reorder fields to control prompt priority.
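A prompt builder over this schema might look like the following sketch; the function name and prompt wording are illustrative, not the project's actual prompt:

```python
def build_system_prompt(fields: list[dict], instructions: str = "") -> str:
    """Assemble a system prompt from the current field schema.
    Field order controls prompt priority, as described above."""
    lines = [
        "Extract the following fields from the document image.",
        "Respond only with a JSON object keyed by field key.",
    ]
    if instructions:
        lines.append(f"Additional instructions: {instructions}")
    for f in fields:
        desc = f" - {f['description']}" if f.get("description") else ""
        example = f" (e.g. {f['example']})" if f.get("example") else ""
        lines.append(f"- {f['key']} ({f['type']}): {f['display_name']}{desc}{example}")
    return "\n".join(lines)

fields = [
    {"key": "patient_name", "display_name": "Patient Name", "type": "string",
     "description": "Full legal name", "example": "Jane Doe"},
    {"key": "dob", "display_name": "Date of Birth", "type": "date"},
]
print(build_system_prompt(fields))
```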

Configuring Providers

Navigate to Providers to add and manage LLM endpoints.

Each provider has:

  • Name: Display label (e.g. OpenAI GPT-4o)
  • Base URL: OpenAI-compatible endpoint (https://api.openai.com/v1, Ollama: http://localhost:11434/v1, etc.)
  • API Key: Authentication token
  • Model: Vision model ID (e.g. gpt-4o, llava, llama3.2-vision)
  • Temperature / Max Tokens: Generation parameters

Use Test Connection to confirm provider reachability before running analysis.
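One common way to probe an OpenAI-compatible endpoint is a GET on /models with the configured key; whether the Test Connection button does exactly this is an assumption. A sketch that only builds the request (issuing it is left to the HTTP client):

```python
def connection_probe(base_url: str, api_key: str) -> tuple[str, dict]:
    """Build the GET /models request for an OpenAI-compatible endpoint.
    An empty base_url falls back to the OpenAI default, matching the
    'leave blank' behaviour described above."""
    url = (base_url or "https://api.openai.com/v1").rstrip("/") + "/models"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

# "sk-demo" is a placeholder key for illustration only
url, headers = connection_probe("https://api.openai.com/v1", "sk-demo")
# url == "https://api.openai.com/v1/models"
```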

Extracting and Labelling Ground Truth

  1. Navigate to Documents and upload care documents
  2. Select a document and choose a provider → click Extract
  3. Review extracted field values inline — edit any incorrect or missing values
  4. Click Save as Ground Truth to persist corrected values as the reference for accuracy evaluation

Ground truth can be updated at any time. Re-run analysis after corrections to refresh accuracy scores.

Running Multi-Provider Analysis

  1. Navigate to Analysis
  2. (Optional) Add extraction instructions to guide the model on document-specific context — date format conventions, abbreviation standards, handwriting characteristics
  3. Select providers to include in the run
  4. Click Run Analysis
  5. Monitor live progress — current model and document count update in real time
  6. Results redirect automatically to the visualisation dashboard on completion
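The bounded-concurrency fan-out behind a run can be sketched with asyncio.Semaphore; the names and the progress counter are illustrative stand-ins for the real runner:

```python
import asyncio

async def extract(provider: str, doc: str) -> dict:
    """Stand-in for one vision-LLM extraction call."""
    await asyncio.sleep(0.01)
    return {"provider": provider, "doc": doc}

async def run_analysis(providers: list[str], docs: list[str], limit: int = 3):
    """Fan out provider x document extractions with bounded concurrency."""
    sem = asyncio.Semaphore(limit)
    done = 0

    async def task(p: str, d: str) -> dict:
        nonlocal done
        async with sem:  # at most `limit` calls in flight
            result = await extract(p, d)
        done += 1  # progress counter the UI could poll
        return result

    return await asyncio.gather(*(task(p, d) for p in providers for d in docs))

results = asyncio.run(run_analysis(["gpt-4o", "llava"], ["intake_01.png"]))
```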

Reading Results

The Results dashboard shows:

  • Per-model accuracy cards — overall accuracy, TRUE_POSITIVE rate, FALSE_POSITIVE, FALSE_NEGATIVE, INCORRECT, PARSE_ERROR counts
  • Per-field radar charts — compare field-level accuracy across providers
  • Latency metrics — avg, P50, P95 per provider
  • Cost telemetry — estimated cost per extraction
  • Per-document detail — drill down into individual document extraction results

Click Export CSV to download a full audit file with extracted values, ground truth, and field-level status for every combination.
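The audit file's shape (one row per document, provider, and field with extracted value, ground truth, and status) can be sketched with the stdlib csv module; the column names are an assumption:

```python
import csv
import io

def export_audit(rows: list[dict]) -> str:
    """Write one audit row per (document, provider, field) combination."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["document", "provider", "field",
                    "extracted", "ground_truth", "status"],
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = export_audit([
    {"document": "intake_01.png", "provider": "gpt-4o",
     "field": "patient_name", "extracted": "Jane Doe",
     "ground_truth": "Jane Doe", "status": "TRUE_POSITIVE"},
])
```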


Environment Variables

Configure application behaviour using environment variables in backend/.env:

| Variable | Description | Default | Type |
|---|---|---|---|
| LANGFUSE_ENABLED | Enable Langfuse observability tracing | false | boolean |
| LANGFUSE_PUBLIC_KEY | Langfuse project public key | - | string |
| LANGFUSE_SECRET_KEY | Langfuse project secret key | - | string |
| LANGFUSE_HOST | Langfuse server host (use http://host.docker.internal:3001 for local Docker) | https://cloud.langfuse.com | string |

Note: Provider API keys and endpoints are not stored in .env. They are configured in the UI and persisted to config_data/providers.json, which is excluded from version control.
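Reading these variables with the defaults from the table above can be sketched with os.getenv; the backend actually uses pydantic-settings, so this is only a stdlib illustration of the shape:

```python
import os

def langfuse_settings() -> dict:
    """Read Langfuse variables with defaults matching the table above."""
    return {
        "enabled": os.getenv("LANGFUSE_ENABLED", "false").lower() == "true",
        "public_key": os.getenv("LANGFUSE_PUBLIC_KEY", ""),
        "secret_key": os.getenv("LANGFUSE_SECRET_KEY", ""),
        "host": os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    }

settings = langfuse_settings()
```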


Provider Configuration

CareXtract supports any OpenAI-compatible vision API. Common provider setups:

OpenAI

| Field | Value |
|---|---|
| Base URL | (leave blank for the OpenAI default) |
| Model | gpt-4o or gpt-4o-mini |
| API Key | sk-... |

Ollama (Local)

# Pull a vision model
ollama pull llava
# or
ollama pull llama3.2-vision

| Field | Value |
|---|---|
| Base URL | http://host.docker.internal:11434/v1 |
| Model | llava or llama3.2-vision |
| API Key | ollama (any non-empty string) |

OpenRouter

| Field | Value |
|---|---|
| Base URL | https://openrouter.ai/api/v1 |
| Model | e.g. anthropic/claude-3.5-sonnet |
| API Key | your OpenRouter key |

Intel OPEA Enterprise Inference

| Field | Value |
|---|---|
| Base URL | your EI gateway endpoint |
| Model | model identifier from EI catalog |
| API Key | Keycloak bearer token |

Inference Metrics

Operational performance across multiple providers — zero shot, without optimisations.

Workload: Extracting 12 fields from 50 patient intake forms
Use Case: CareXtract

| Provider | Model | Deployment | Context Window | Avg Tokens / Request | P50 Latency | P95 Latency | Concurrency | Throughput | Docs / hr |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o-mini | Cloud API | 128K | 26,187 | 4.5s | 5.8s | c=1 | 10.6 docs/min | 636 |
| | | | | | | | c=2 | 8.5 docs/min | 510 |
| | | | | | | | c=3 | 8.0 docs/min | 480 |
| Intel OPEA EI | Qwen 2.5 VL 3B Instruct | On-Prem (Xeon) | 32,768 | 2,249 | 14.3s | 15.2s | c=1 | 4.8 docs/min | 288 |
| | | | | | | | c=2 | 8.5 docs/min | 510 |
| | | | | | | | c=3 | 12.5 docs/min | 750 |
| | | | | | | | c=5 | 17.1 docs/min | 1,026 |

Notes:

  • All figures use the same 12-field extraction schema against a consistent set of patient intake forms. Token counts vary per document type and handwriting quality.
  • Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.

Model Capabilities

GPT-4o-mini

OpenAI's cost-efficient multimodal vision model, used for structured field extraction from care document images via the cloud API.

| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Vision Input | JPEG, PNG, GIF, WEBP (native image understanding) |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only (OpenAI API or Azure OpenAI Service); no self-hosted or on-prem option |

Qwen 2.5 VL 3B Instruct

A 3-billion-parameter open-weight vision-language model from Alibaba, optimised for document understanding and structured extraction. Deployed on-prem via Intel OPEA Enterprise Inference on Xeon CPUs.

| Attribute | Details |
|---|---|
| Parameters | 3B |
| Architecture | Vision-Language Transformer (Qwen2.5 language backbone with visual encoder) |
| Context Window | 32,768 tokens |
| Vision Input | Images and video; strong document and chart understanding |
| Structured Output | JSON mode supported |
| Multilingual | Strong multilingual support (English, Chinese, and others) |
| Quantization Formats | GGUF, AWQ, GPTQ |
| Inference Runtimes | vLLM, Ollama, llama.cpp, Intel OPEA Enterprise Inference |
| License | Apache 2.0 |
| Deployment | Local, on-prem, air-gapped; full data sovereignty |

Comparison Summary

| Capability | GPT-4o-mini | Qwen 2.5 VL 3B Instruct |
|---|---|---|
| Medical document field extraction | Yes | Yes |
| Native vision / image input | Yes | Yes |
| On-prem / air-gapped deployment | No | Yes |
| Data sovereignty | No (data sent to cloud API) | Full (weights run locally) |
| Open weights | No (proprietary) | Yes (Apache 2.0) |
| Custom fine-tuning | Supervised fine-tuning (API only) | Full fine-tuning + LoRA adapters |
| Structured JSON output | Yes | Yes |
| Context window | 128K | 32K |
| Inference on CPU (no GPU required) | No | Yes (via Intel OPEA on Xeon) |

GPT-4o-mini offers higher context, lower latency, and simpler deployment via cloud API — well-suited for teams without on-prem infrastructure. Qwen 2.5 VL 3B Instruct offers open weights, data sovereignty, and CPU-based local deployment — making it suitable for air-gapped, regulated, or cost-sensitive healthcare environments where patient data cannot leave the network.


Technology Stack

Backend

  • Framework: FastAPI (Python 3.11)
  • Server: Uvicorn (ASGI)
  • Document Processing: pypdfium2 (PDF-to-image), Pillow (image encoding)
  • Extraction: OpenAI SDK (openai Python library) — works with any OpenAI-compatible provider
  • Accuracy Evaluation: RapidFuzz (fuzzy address matching), custom type-aware comparison
  • State Management: JSON file persistence (config_data/, ground_truth/, results/)
  • Config Management: pydantic-settings + python-dotenv
  • Observability: Langfuse (optional tracing)

Frontend

  • Framework: React 18 + TypeScript
  • Build Tool: Vite
  • Styling: Tailwind CSS + PostCSS
  • State & Data Fetching: TanStack Query (React Query)
  • Routing: React Router v6
  • Charts: Recharts
  • Icons: Lucide React
  • Server: nginx (multi-stage Docker build)

Troubleshooting

Common Issues

Issue: Backend API not responding

# Check service health
curl http://localhost:8000/docs

# View backend logs
docker compose logs backend

Issue: Provider test connection fails

  • Verify the Base URL includes /v1 if required by the provider
  • Confirm the API key is valid and has vision model access
  • For Ollama inside Docker: use http://host.docker.internal:11434/v1 not localhost
  • For self-hosted endpoints: confirm the container network can reach the endpoint

Issue: Extraction returns empty or malformed JSON

  • Ensure the model supports vision input (image-capable model ID)
  • Try a lower temperature (0.1–0.2) for more deterministic structured output
  • Add explicit extraction instructions: Respond only with a JSON object. Do not include markdown.
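A defensive parser for the most common failure mode, where the model wraps its JSON in a markdown fence despite instructions, might look like this sketch (illustrative, not the project's actual parser):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Strip an optional markdown code fence before parsing the model's JSON."""
    text = raw.strip()
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

fence = "`" * 3  # literal triple backtick, spelled out to keep this snippet fence-safe
raw = fence + 'json\n{"patient_name": "Jane Doe"}\n' + fence
parse_model_json(raw)  # {'patient_name': 'Jane Doe'}
```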

Issue: PDF documents not extracting

  • PDFs are converted to images page by page — only the first page is used by default
  • Ensure the PDF is not password-protected
  • Max upload size: 20MB per file, 50 documents total

Issue: Frontend can't connect to backend

  • Verify both services are running: docker compose ps
  • The frontend proxies /api to http://backend:8000 inside Docker — do not change the nginx config base path

Debug Mode

Enable debug logging:

# Update backend/.env
LOG_LEVEL=DEBUG

# Restart services
docker compose restart backend
docker compose logs -f backend

License

This project is licensed under the MIT License.


Disclaimer

CareXtract is provided for demonstration and informational purposes only. It is not a certified medical device and does not constitute clinical decision support. Always validate extracted data before use in any patient-facing or regulated workflow. AI-extracted field values must be reviewed by a qualified clinician or administrator before use in official records.

For full disclaimer details, see DISCLAIMER.md

