Self-Hosted OCR API for Documents and Images

A self-hosted OCR API powered by DeepSeek-OCR-2. Runs on GPU via Docker, processes PDFs and images, and returns clean markdown content in JSON responses. Supports multiple languages with high accuracy.

Model


Model	DeepSeek-OCR-2
Architecture	DeepSeek-VL-v2 based
Model size	6.4GB
GPU VRAM	~8GB
Input formats	PDF, PNG, JPG, JPEG, BMP, TIFF, WEBP

Requirements

Docker with NVIDIA Container Toolkit
NVIDIA GPU with ~8GB VRAM

Quick start

Using the Docker Hub image:

services:
  deepseek-ocr2:
    image: edgaras0x4e/deepseek-ocr-2-api:latest
    container_name: deepseek-ocr2
    ports:
      - "9713:7860"
    volumes:
      - ocr-data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ocr-data:

docker compose up -d

Or build from source, from the repository root:

docker compose up --build -d

The API is available at http://localhost:9713. On first startup the model (~6.4GB) is downloaded and loaded into GPU memory (~8GB VRAM). The API accepts requests immediately, but jobs stay queued until the model is ready.

Usage

Submit a PDF or image

# PDF
curl -X POST http://localhost:9713/ocr -F "file=@document.pdf"

# Image
curl -X POST http://localhost:9713/ocr -F "file=@scan.jpg"

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "queued"
}
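
The same upload can be done from Python. The sketch below uses only the standard library and builds the multipart body by hand, mirroring what `curl -F "file=@..."` sends. The helper names (`build_multipart`, `submit`) are illustrative and not part of the project; the part's content type is guessed from the file extension, and whether the server inspects it is not stated here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:9713"  # host port from the compose file above


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data body."""
    # Guess the part's content type from the extension, as curl does.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def submit(path: str, base_url: str = BASE_URL) -> dict:
    """POST a PDF or image to /ocr and return the parsed JSON response."""
    boundary = uuid.uuid4().hex
    file_path = Path(path)
    body = build_multipart("file", file_path.name, file_path.read_bytes(), boundary)
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit("document.pdf")["job_id"]` against a running instance should give the ID used by the endpoints that follow.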

Check progress

curl http://localhost:9713/ocr/{job_id}

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "processing",
  "total_pages": 185,
  "processed_pages": 42,
  "error": null
}
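
For long documents it can be convenient to poll this endpoint until the job finishes. The following is a minimal sketch, assuming that `queued`, `processing`, and `cancelling` are the only in-progress statuses (the only other status shown in this README is `completed`; any failure status name is not documented here). `wait_for_job` and `fetch_status` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9713"

# Assumption: any status outside this set means the job has stopped changing.
IN_PROGRESS = {"queued", "processing", "cancelling"}


def fetch_status(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /ocr/{job_id} and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/ocr/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id: str, fetch=fetch_status, interval: float = 2.0, sleep=time.sleep) -> dict:
    """Poll the job until it leaves the in-progress statuses, printing progress."""
    while True:
        status = fetch(job_id)
        if status["status"] not in IN_PROGRESS:
            return status
        # Page counts may be missing or null before processing starts.
        done = status.get("processed_pages") or 0
        total = status.get("total_pages") or 0
        print(f"{status['status']}: {done}/{total} pages")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server; in normal use the defaults are enough.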

Get a single page

curl http://localhost:9713/ocr/{job_id}/pages/1

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "page_num": 1,
  "markdown": "## Chapter 1\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit..."
}

Get all pages

curl http://localhost:9713/ocr/{job_id}/result

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "completed",
  "total_pages": 185,
  "processed_pages": 185,
  "pages": [
    {"page_num": 1, "markdown": "## Chapter 1\n\nLorem ipsum dolor sit amet..."},
    {"page_num": 2, "markdown": "..."}
  ]
}
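
A common next step is to merge the per-page output into one markdown document. The sketch below works on a response shaped like the example above; `pages_to_markdown` and `is_complete` are illustrative names, and joining pages with a blank line is a choice made here, not something the API prescribes.

```python
def is_complete(result: dict) -> bool:
    """True when every page of the job has been processed."""
    return (
        result.get("status") == "completed"
        and result.get("processed_pages") == result.get("total_pages")
    )


def pages_to_markdown(result: dict) -> str:
    """Join the pages of a /result response into one markdown string, in page order."""
    pages = sorted(result.get("pages", []), key=lambda page: page["page_num"])
    return "\n\n".join(page["markdown"].strip() for page in pages)


if __name__ == "__main__":
    sample = {
        "status": "completed",
        "total_pages": 2,
        "processed_pages": 2,
        "pages": [
            {"page_num": 2, "markdown": "Second page."},
            {"page_num": 1, "markdown": "## Chapter 1\n\nFirst page."},
        ],
    }
    print(pages_to_markdown(sample))
```

Because results are available per page as they complete, checking `is_complete` before merging avoids writing out a partial document by accident.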

List all jobs

curl http://localhost:9713/jobs

{
  "jobs": [
    {
      "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
      "filename": "document.pdf",
      "status": "completed",
      "total_pages": 185,
      "processed_pages": 185
    }
  ]
}

Cancel a job

curl -X POST http://localhost:9713/ocr/{job_id}/cancel

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "status": "cancelling"
}

Delete a job

curl -X DELETE http://localhost:9713/ocr/{job_id}

{
  "status": "deleted"
}

API reference

Method	Endpoint	Description
`POST`	`/ocr`	Upload a PDF or image for processing
`GET`	`/ocr/{job_id}`	Get job status and progress
`GET`	`/ocr/{job_id}/pages/{page_num}`	Get markdown for a specific page
`GET`	`/ocr/{job_id}/result`	Get all completed pages
`POST`	`/ocr/{job_id}/cancel`	Cancel a queued or running job
`DELETE`	`/ocr/{job_id}`	Delete a job and its data
`GET`	`/jobs`	List all jobs
`GET`	`/health`	Check API and model status

Configuration

Environment variables in docker-compose.yml:

Variable	Default	Description
`API_KEY`	(empty)	Optional API key. When set, all requests must include an `X-API-Key` header
`OCR_DPI`	`300`	DPI for PDF page rendering
`DB_PATH`	`/data/ocr.db`	SQLite database path
`UPLOAD_DIR`	`/data/uploads`	Upload storage path

Enabling API key authentication

Uncomment the environment section in docker-compose.yml:

environment:
  - API_KEY=your-secret-key

Then restart:

docker compose down && docker compose up -d

All requests must then include the header:

curl -H "X-API-Key: your-secret-key" http://localhost:9713/jobs
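
In a Python client, the header can be attached in one place so that the same code works with and without authentication. This is a small sketch using the standard library; `build_request` is a hypothetical helper, not part of the project.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9713"


def build_request(
    path: str,
    api_key: Optional[str] = None,
    base_url: str = BASE_URL,
    method: str = "GET",
) -> urllib.request.Request:
    """Build a request for the OCR API, adding X-API-Key only when a key is given."""
    headers = {}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(f"{base_url}{path}", headers=headers, method=method)


if __name__ == "__main__":
    request = build_request("/jobs", api_key="your-secret-key")
    # Sending it requires a running instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
    print(request.full_url, request.header_items())
```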

docker-compose.yml

services:
  deepseek-ocr2:
    build: .
    container_name: deepseek-ocr2
    ports:
      - "9713:7860"
    # environment:
    #   - API_KEY=your-secret-key
    volumes:
      - ocr-data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ocr-data:

How it works

A PDF or image is uploaded and saved to disk
A background worker picks up queued jobs in order
PDFs: Each page is converted to an image using PyMuPDF
Images: Used directly as a single-page job
DeepSeek-OCR-2 extracts text and converts it to markdown with grounding
Special markup tags and bounding box data are stripped from the output
Results are stored in SQLite and available per-page as they complete
Jobs interrupted by a restart are automatically re-queued
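
The queueing and re-queue behaviour described above can be illustrated with a few lines of SQLite. This is only a sketch of the idea, not the code in api.py: the `jobs` table, its columns, and both function names are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: the real database layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- submission order
    job_id TEXT UNIQUE NOT NULL,
    status TEXT NOT NULL
)
"""


def requeue_interrupted(conn: sqlite3.Connection) -> int:
    """On startup, put jobs that were mid-processing back into the queue."""
    cursor = conn.execute("UPDATE jobs SET status = 'queued' WHERE status = 'processing'")
    conn.commit()
    return cursor.rowcount


def next_queued(conn: sqlite3.Connection) -> Optional[str]:
    """Return the oldest queued job_id, or None when the queue is empty."""
    row = conn.execute(
        "SELECT job_id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO jobs (job_id, status) VALUES (?, ?)",
        [("a", "completed"), ("b", "processing"), ("c", "queued")],
    )
    print(requeue_interrupted(conn), next_queued(conn))
```

Ordering by an auto-incrementing key keeps a re-queued job ahead of anything submitted after it, which matches processing in order of submission.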

Supported formats

PDFs: Multi-page processing, each page processed independently

Images: Single-page processing

PNG
JPG/JPEG
BMP
TIFF
WEBP

Data persistence

The /data volume stores the SQLite database and uploaded files. This is a named Docker volume (ocr-data) that persists across container restarts and rebuilds.

Performance notes

Model loading takes ~2-5 minutes on first startup
Processing speed depends on GPU and image complexity
PDF pages are rendered at 300 DPI by default (configurable)
Jobs are processed sequentially in order of submission
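
To get a feel for what the DPI setting means for image size (and therefore for processing time and memory), the arithmetic is simple: PDF page dimensions are measured in points, with 72 points per inch, so the rendered pixel size scales by `dpi / 72`. The helper below is an illustrative calculation only; the exact rounding used by the renderer may differ by a pixel.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def rendered_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple:
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(rendered_size(612, 792, dpi=300))
    print(rendered_size(612, 792, dpi=150))
```

Halving `OCR_DPI` from 300 to 150 cuts the pixel count per page to a quarter, which is the trade-off to weigh against legibility of small text.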

License

MIT
