Edgaras0x4E/deepseek-ocr-2-api

Self-Hosted OCR API for Documents and Images

A self-hosted OCR API powered by DeepSeek-OCR-2. Runs on GPU via Docker, processes PDFs and images, and returns clean markdown content in JSON responses. Supports multiple languages with high accuracy.

Model

| Property | Value |
| --- | --- |
| Model | DeepSeek-OCR-2 |
| Architecture | DeepSeek-VL-v2 based |
| Model size | 6.4GB |
| GPU VRAM | ~8GB |
| Input formats | PDF, PNG, JPG, JPEG, BMP, TIFF, WEBP |

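Requirements

  • An NVIDIA GPU with about 8GB of VRAM
  • Docker with Docker Compose, configured so containers can access the GPU (the compose file reserves one nvidia device, which normally requires the NVIDIA Container Toolkit)
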
Quick start

Using the Docker Hub image, create a docker-compose.yml with:

services:
  deepseek-ocr2:
    image: edgaras0x4e/deepseek-ocr-2-api:latest
    container_name: deepseek-ocr2
    ports:
      - "9713:7860"
    volumes:
      - ocr-data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ocr-data:

Then start the service:

docker compose up -d

Or build from source:

docker compose up --build -d

The API will be available at http://localhost:9713. On first startup the model (~6.4GB) is downloaded and loaded into GPU memory (~8GB VRAM). The API accepts requests immediately, but queued jobs do not start processing until the model has finished loading.

Usage

Submit a PDF or image

# PDF
curl -X POST http://localhost:9713/ocr -F "file=@document.pdf"

# Image
curl -X POST http://localhost:9713/ocr -F "file=@scan.jpg"

Response:

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "queued"
}
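
The same upload can be sketched in Python using only the standard library. This is a minimal illustration, not part of the project: the helper names `build_multipart` and `submit` are hypothetical, the form field name `file` is taken from the curl example above, and filenames containing quotes or non-ASCII characters are not handled.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type); the content type carries the boundary.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def submit(path: str, base_url: str = "http://localhost:9713") -> dict:
    """POST a file to /ocr and return the decoded JSON response."""
    p = Path(path)
    body, content_type = build_multipart("file", p.name, p.read_bytes())
    req = request.Request(
        f"{base_url}/ocr",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `submit("document.pdf")` against a running instance should return the same JSON object shown above, from which the `job_id` can be read.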

Check progress

curl http://localhost:9713/ocr/{job_id}

Response:

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "processing",
  "total_pages": 185,
  "processed_pages": 42,
  "error": null
}
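
Because processing is asynchronous, a client usually polls this endpoint until the job leaves its in-progress states. The sketch below assumes that `queued`, `processing`, and `cancelling` (the statuses shown in this README) are the only non-final states; any other status is treated as final. The function names and the polling interval are illustrative choices.

```python
import json
import time
from typing import Callable
from urllib import request

# Statuses that appear in this README while a job is still in flight.
IN_PROGRESS = {"queued", "processing", "cancelling"}


def fetch_status(job_id: str, base_url: str = "http://localhost:9713") -> dict:
    """GET /ocr/{job_id} and return the decoded JSON body."""
    with request.urlopen(f"{base_url}/ocr/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job status is no longer in IN_PROGRESS, then return it."""
    while True:
        status = fetch(job_id)
        if status["status"] not in IN_PROGRESS:
            return status
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server. Callers should still check the returned `status` and `error` fields, since a final state is not necessarily a successful one.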

Get a single page

curl http://localhost:9713/ocr/{job_id}/pages/1

Response:

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "page_num": 1,
  "markdown": "## Chapter 1\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit..."
}

Get all pages

curl http://localhost:9713/ocr/{job_id}/result

Response:

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "filename": "document.pdf",
  "status": "completed",
  "total_pages": 185,
  "processed_pages": 185,
  "pages": [
    {"page_num": 1, "markdown": "## Chapter 1\n\nLorem ipsum dolor sit amet..."},
    {"page_num": 2, "markdown": "..."}
  ]
}
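
A common next step is to join the per-page markdown into one document. The helper below is a small sketch that relies only on the `pages`, `page_num`, and `markdown` fields shown in the response above; the function name and the horizontal-rule separator are arbitrary choices.

```python
def merge_pages(result: dict, separator: str = "\n\n---\n\n") -> str:
    """Join page markdown in page order, separated by a horizontal rule."""
    pages = sorted(result["pages"], key=lambda page: page["page_num"])
    return separator.join(page["markdown"].strip() for page in pages)
```

Sorting by `page_num` keeps the output stable even if the server returns pages in completion order rather than document order.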

List all jobs

curl http://localhost:9713/jobs

Response:

{
  "jobs": [
    {
      "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
      "filename": "document.pdf",
      "status": "completed",
      "total_pages": 185,
      "processed_pages": 185
    }
  ]
}

Cancel a job

curl -X POST http://localhost:9713/ocr/{job_id}/cancel

Response:

{
  "job_id": "994e7b398bb44d8ab5eade4d2ef57a15",
  "status": "cancelling"
}

Delete a job

curl -X DELETE http://localhost:9713/ocr/{job_id}

Response:

{
  "status": "deleted"
}

API reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /ocr | Upload a PDF or image for processing |
| GET | /ocr/{job_id} | Get job status and progress |
| GET | /ocr/{job_id}/pages/{page_num} | Get markdown for a specific page |
| GET | /ocr/{job_id}/result | Get all completed pages |
| POST | /ocr/{job_id}/cancel | Cancel a queued or running job |
| DELETE | /ocr/{job_id} | Delete a job and its data |
| GET | /jobs | List all jobs |
| GET | /health | Check API and model status |

Configuration

Environment variables in docker-compose.yml:

| Variable | Default | Description |
| --- | --- | --- |
| API_KEY | (empty) | Optional API key. When set, all requests must include an X-API-Key header |
| OCR_DPI | 300 | DPI for PDF page rendering |
| DB_PATH | /data/ocr.db | SQLite database path |
| UPLOAD_DIR | /data/uploads | Upload storage path |
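
To make the defaults concrete, here is one way a Python service could read these variables. This is a hypothetical sketch, not the project's actual code: the `Settings` class and `load_settings` function are invented for illustration, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str      # empty string means authentication is disabled
    ocr_dpi: int      # DPI used when rendering PDF pages
    db_path: str      # SQLite database file
    upload_dir: str   # directory for uploaded files


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, falling back to the documented defaults."""
    return Settings(
        api_key=env.get("API_KEY", ""),
        ocr_dpi=int(env.get("OCR_DPI", "300")),
        db_path=env.get("DB_PATH", "/data/ocr.db"),
        upload_dir=env.get("UPLOAD_DIR", "/data/uploads"),
    )
```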

Enabling API key authentication

Uncomment the environment section in docker-compose.yml:

environment:
  - API_KEY=your-secret-key

Then restart:

docker compose down && docker compose up -d

All requests must then include the header:

curl -H "X-API-Key: your-secret-key" http://localhost:9713/jobs
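
From Python, the header can be attached the same way. The sketch below uses the standard library's `urllib.request`; the helper name `authed_request` is hypothetical.

```python
from urllib import request


def authed_request(url: str, api_key: str, method: str = "GET") -> request.Request:
    """Build a request carrying the X-API-Key header.

    urllib stores the name as "X-api-key"; HTTP header names are
    case-insensitive, so the server sees the same header.
    """
    return request.Request(url, method=method, headers={"X-API-Key": api_key})
```

Pass the result to `request.urlopen(...)` to send it, for example `request.urlopen(authed_request("http://localhost:9713/jobs", "your-secret-key"))`.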

docker-compose.yml

services:
  deepseek-ocr2:
    build: .
    container_name: deepseek-ocr2
    ports:
      - "9713:7860"
    # environment:
    #   - API_KEY=your-secret-key
    volumes:
      - ocr-data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ocr-data:

How it works

  1. A PDF or image is uploaded and saved to disk
  2. A background worker picks up queued jobs in order
  3. PDFs: Each page is converted to an image using PyMuPDF
  4. Images: Used directly as a single-page job
  5. DeepSeek-OCR-2 extracts text and converts it to markdown with grounding
  6. Special markup tags and bounding box data are stripped from the output
  7. Results are stored in SQLite and available per-page as they complete
  8. Jobs interrupted by a restart are automatically re-queued
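
Steps 2 and 8 can be illustrated with a small SQLite-backed queue. The schema, column names, and function names below are assumptions made for this sketch; the project's real tables may look different. The idea is that any job still marked `processing` at startup must have been interrupted, so it is reset to `queued`, and the worker always takes the oldest queued job.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical jobs table (not the project's actual schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id     TEXT PRIMARY KEY,
               filename   TEXT,
               status     TEXT,
               created_at INTEGER
           )"""
    )


def requeue_interrupted(conn: sqlite3.Connection) -> int:
    """At startup, move jobs left in 'processing' back to 'queued'."""
    cur = conn.execute("UPDATE jobs SET status = 'queued' WHERE status = 'processing'")
    conn.commit()
    return cur.rowcount  # number of jobs that were re-queued


def next_job(conn: sqlite3.Connection):
    """Return the job_id of the oldest queued job, or None if the queue is empty."""
    row = conn.execute(
        "SELECT job_id FROM jobs WHERE status = 'queued' "
        "ORDER BY created_at, rowid LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

Because per-page results are stored as they complete, a real implementation could also resume a re-queued job from its last processed page instead of starting over; whether this project does so is not stated here.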

Supported formats

PDFs: Multi-page processing, each page processed independently

Images: Single-page processing

  • PNG
  • JPG/JPEG
  • BMP
  • TIFF
  • WEBP

Data persistence

The /data volume stores the SQLite database and uploaded files. This is a named Docker volume (ocr-data) that persists across container restarts and rebuilds.

Performance notes

  • Model loading takes ~2-5 minutes on first startup
  • Processing speed depends on GPU and image complexity
  • PDF pages are rendered at 300 DPI by default (configurable)
  • Jobs are processed sequentially in order of submission
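
To see why the DPI setting affects speed and memory, it helps to work out the rendered image size. PDF page dimensions are measured in points (72 per inch), so the pixel size scales with DPI / 72. The helper below is plain arithmetic for illustration; the function name is hypothetical and the exact rounding used by the renderer may differ by a pixel.

```python
def render_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 points.
print(render_size(612, 792))           # (2550, 3300)
print(render_size(612, 792, dpi=150))  # (1275, 1650)
```

Halving OCR_DPI from 300 to 150 cuts the pixel count per page to a quarter, which generally speeds up rendering at the cost of detail on small text.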

License

MIT
