High-performance visual analysis API with NVIDIA Triton Inference Server.
Object detection, face recognition, visual search, OCR, and embeddings - all through a unified REST API with TensorRT acceleration.
git clone https://github.com/davidamacey/OpenProcessor.git && cd OpenProcessor && ./scripts/setup.sh
That's it! The setup script automatically:
- Detects your GPU and selects the optimal profile
- Downloads required models (~500MB)
- Exports models to TensorRT (45-60 min first time)
- Starts all services
# Clone and setup with defaults (no prompts)
git clone https://github.com/davidamacey/OpenProcessor.git && cd OpenProcessor && ./scripts/setup.sh --yes
# Or specify a profile explicitly
./scripts/setup.sh --profile=standard --gpu=0 --yes
curl http://localhost:4603/health
# {"status":"healthy","version":"0.1.0",...}
# Quick test with an image
curl -X POST http://localhost:4603/detect -F "image=@your-image.jpg"
./scripts/openprocessor.sh status # Check service health
./scripts/openprocessor.sh logs -f # View live logs
./scripts/openprocessor.sh restart # Restart all services
./scripts/openprocessor.sh help # See all commands
See INSTALLATION.md for manual installation, troubleshooting, and advanced options.
| Profile | VRAM | Example GPUs | Throughput |
|---|---|---|---|
| minimal | 6-8GB | RTX 3060, RTX 4060 | ~5 RPS |
| standard | 12-24GB | RTX 3080, RTX 4090 | ~15 RPS |
| full | 48GB+ | A6000, A100 | ~50 RPS |
Switch profiles: ./scripts/openprocessor.sh profile <name>
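The setup script selects a profile automatically, but if you want to choose one yourself, a minimal sketch along these lines maps the VRAM reported by nvidia-smi onto the table above. The thresholds are simply read off the table and are not part of the setup script:

```python
import subprocess

def suggest_profile() -> str:
    """Suggest a profile name from the total VRAM reported by nvidia-smi."""
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.total', '--format=csv,noheader,nounits'],
        text=True,
    )
    vram_mb = max(int(line) for line in out.splitlines() if line.strip())
    if vram_mb >= 48_000:
        return 'full'
    if vram_mb >= 12_000:
        return 'standard'
    return 'minimal'

print(suggest_profile())  # e.g. 'standard' on an RTX 3080/4090-class card
```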
Latest:
- One-command setup with automatic GPU detection
- Multi-GPU profile support (minimal/standard/full)
- Automated TensorRT export with progress feedback
- Comprehensive test suite (32 tests, 100% passing)
- Management CLI for service control
All endpoints available on port 4603.
| Endpoint | Method | Description |
|---|---|---|
| /detect | POST | YOLO object detection (single image) |
| /detect/batch | POST | Batch detection (up to 64 images) |
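For the batch endpoint, a minimal `requests` sketch might look like the following. The repeated `images` multipart field name is an assumption, so check the API docs for the exact contract:

```python
import requests

# Hypothetical sketch for /detect/batch: several images in one request.
# The 'images' field name is an assumption, not confirmed by the API docs.
paths = ['cat.jpg', 'dog.jpg', 'street.jpg']
files = [('images', (p, open(p, 'rb'), 'image/jpeg')) for p in paths]
try:
    resp = requests.post('http://localhost:4603/detect/batch', files=files)
    resp.raise_for_status()
    print(resp.json())
finally:
    for _, (_, fh, _) in files:
        fh.close()
```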
| Endpoint | Method | Description |
|---|---|---|
| /faces/detect | POST | Face detection with landmarks (SCRFD) |
| /faces/recognize | POST | Detection + ArcFace 512-dim embeddings |
| /faces/verify | POST | 1:1 face comparison (two images) |
| /faces/search | POST | Find similar faces in index |
| /faces/identify | POST | 1:N face identification |
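A 1:1 verification call takes two images. The sketch below assumes the endpoint accepts them as two multipart fields; the `image1`/`image2` field names are illustrative, not confirmed:

```python
import requests

# Hypothetical sketch for /faces/verify: 1:1 comparison of two photos.
# The 'image1'/'image2' field names are assumptions.
with open('person_a.jpg', 'rb') as a, open('person_b.jpg', 'rb') as b:
    resp = requests.post('http://localhost:4603/faces/verify',
                         files={'image1': a, 'image2': b})
print(resp.json())  # expect a similarity score and/or match decision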
| Endpoint | Method | Description |
|---|---|---|
| /embed/image | POST | MobileCLIP image embedding (512-dim) |
| /embed/text | POST | MobileCLIP text embedding (512-dim) |
| /embed/batch | POST | Batch image embeddings |
| /embed/boxes | POST | Per-box crop embeddings |
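Because image and text embeddings share the same 512-dim MobileCLIP space, they can be compared directly. A sketch; the `{'text': ...}` body for /embed/text is an assumption, so verify the field name against the API docs:

```python
import numpy as np
import requests

# Image embedding (same call as the quick-start example further down).
with open('image.jpg', 'rb') as f:
    img_emb = requests.post('http://localhost:4603/embed/image',
                            files={'image': f}).json()['embedding']

# Text embedding -- the {'text': ...} payload shape is an assumption.
txt_emb = requests.post('http://localhost:4603/embed/text',
                        json={'text': 'a red sports car'}).json()['embedding']

# Cosine similarity; embeddings are reported as L2-normalized,
# so the dot product alone is normally enough.
a, b = np.asarray(img_emb), np.asarray(txt_emb)
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```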
| Endpoint | Method | Description |
|---|---|---|
| /search/image | POST | Image-to-image similarity search |
| /search/text | POST | Text-to-image search |
| /search/face | POST | Face similarity search |
| /search/ocr | POST | Search images by text content |
| /search/object | POST | Object-level search (vehicles, people) |
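Image-to-image search uploads a query image and returns the nearest indexed images. A sketch; passing `top_k` as form data is an assumption and it may instead belong in a JSON body:

```python
import requests

# Hypothetical sketch for /search/image: find visually similar indexed images.
# Passing top_k as form data is an assumption.
with open('query.jpg', 'rb') as f:
    resp = requests.post('http://localhost:4603/search/image',
                         files={'image': f}, data={'top_k': 10})
for hit in resp.json().get('results', []):
    print(hit['image_id'], hit['score'])
```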
| Endpoint | Method | Description |
|---|---|---|
| /ingest | POST | Ingest image (auto-indexes faces, OCR, objects) |
| /ingest/batch | POST | Batch ingest (up to 64 images) |
| /ingest/directory | POST | Bulk ingest from server directory |
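Batch ingest follows the same pattern as single ingest but with multiple files per request. A sketch; the repeated `images` field name is an assumption:

```python
import requests

# Hypothetical sketch for /ingest/batch (up to 64 images per request).
# The repeated 'images' field name is an assumption.
paths = ['photo_001.jpg', 'photo_002.jpg']
files = [('images', (p, open(p, 'rb'), 'image/jpeg')) for p in paths]
try:
    resp = requests.post('http://localhost:4603/ingest/batch', files=files)
    print(resp.json())
finally:
    for _, (_, fh, _) in files:
        fh.close()
```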
| Endpoint | Method | Description |
|---|---|---|
| /ocr/predict | POST | Extract text from image (PP-OCRv5) |
| /ocr/batch | POST | Batch OCR processing |
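Building on the /ocr/predict response shown in the Python examples below, a small sketch that joins the recognized strings into one searchable string:

```python
import requests

# Run OCR on a scan and flatten the recognized strings.
with open('document.jpg', 'rb') as f:
    result = requests.post('http://localhost:4603/ocr/predict',
                           files={'image': f}).json()

full_text = ' '.join(result.get('texts', []))
print(result.get('num_texts', 0), 'text regions:', full_text)
```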
| Endpoint | Method | Description |
|---|---|---|
| /analyze | POST | All models on single image (YOLO + faces + CLIP + OCR) |
| /analyze/batch | POST | Batch combined analysis |
| Endpoint | Method | Description |
|---|---|---|
| /clusters/train/{index} | POST | Train FAISS clustering for an index |
| /clusters/stats/{index} | GET | Get cluster statistics |
| /clusters/{index}/{id} | GET | Get cluster members |
| /clusters/albums | GET | List auto-generated albums |
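A sketch of the clustering workflow; the index name `faces` is a placeholder, so substitute one of the indexes your deployment actually creates:

```python
import requests

BASE = 'http://localhost:4603'
index = 'faces'  # placeholder -- substitute an index that exists in your deployment

# Train FAISS clustering for the index, then inspect the results.
print(requests.post(f'{BASE}/clusters/train/{index}').json())
print(requests.get(f'{BASE}/clusters/stats/{index}').json())
print(requests.get(f'{BASE}/clusters/albums').json())
```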
| Endpoint | Method | Description |
|---|---|---|
| /query/image/{id} | GET | Get stored image data/metadata |
| /query/stats | GET | Index statistics for all indexes |
| /query/duplicates | GET | List duplicate groups |
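Both stats and duplicates are plain GETs, so checking index growth or duplicate groups is a one-liner each:

```python
import requests

BASE = 'http://localhost:4603'

# Index statistics across all indexes, then any duplicate groups found so far.
print(requests.get(f'{BASE}/query/stats').json())
print(requests.get(f'{BASE}/query/duplicates').json())
```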
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Service health check |
| /health/models | GET | Triton model status |
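These endpoints work well as readiness gates in scripts, for example waiting for the API (and Triton models) to come up after ./scripts/setup.sh:

```python
import time
import requests

# Poll /health until the API reports healthy, then dump Triton model status.
for _ in range(60):
    try:
        if requests.get('http://localhost:4603/health', timeout=2).json().get('status') == 'healthy':
            break
    except requests.RequestException:
        pass
    time.sleep(5)

print(requests.get('http://localhost:4603/health/models').json())
```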
import requests
# Object Detection
with open('image.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/detect', files={'image': f})
result = resp.json()
# {"detections": [{"x1": 0.1, "y1": 0.2, "x2": 0.3, "y2": 0.4, "confidence": 0.95, "class_id": 0, "class_name": "person"}], ...}
# Face Recognition
with open('photo.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/faces/recognize', files={'image': f})
print(resp.json())
# {"num_faces": 2, "faces": [...], "embeddings": [[...512 floats...], ...]}
# Image Embedding
with open('image.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/embed/image', files={'image': f})
embedding = resp.json()['embedding'] # 512-dim vector
# Text-to-Image Search
resp = requests.post('http://localhost:4603/search/text',
json={'query': 'a red sports car', 'top_k': 10})
results = resp.json()['results']
# Image Ingestion (auto-indexes everything)
with open('photo.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/ingest',
files={'image': f},
data={'image_id': 'photo_001'})
print(resp.json())
# {"status": "indexed", "image_id": "photo_001", "indexed": {"global": true, "faces": 2, "vehicles": 1}}
# OCR
with open('document.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/ocr/predict', files={'image': f})
print(resp.json())
# {"num_texts": 5, "texts": ["Invoice", "Total: $100"], ...}
# Combined Analysis (everything in one call)
with open('scene.jpg', 'rb') as f:
resp = requests.post('http://localhost:4603/analyze', files={'image': f})
result = resp.json()
# {"detections": [...], "faces": [...], "global_embedding": [...], "ocr": {...}}# Detection
curl -X POST http://localhost:4603/detect -F "image=@photo.jpg"
# Face Recognition
curl -X POST http://localhost:4603/faces/recognize -F "image=@face.jpg"
# Text Search
curl -X POST http://localhost:4603/search/text \
-H "Content-Type: application/json" \
-d '{"query": "sunset beach", "top_k": 10}'
# Ingestion
curl -X POST http://localhost:4603/ingest \
-F "image=@photo.jpg" \
-F "image_id=my_photo_001"{
"detections": [
{
"x1": 0.094, "y1": 0.278, "x2": 0.870, "y2": 0.989,
"confidence": 0.918,
"class_id": 0,
"class_name": "person"
}
],
"image": {"width": 1920, "height": 1080},
"inference_time_ms": 12.5
}
Note: Coordinates are normalized (0.0-1.0). Multiply by image width/height for pixels.
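For example, converting the detection above back to pixel coordinates:

```python
# Convert the normalized box above to pixels using the reported image size.
det = {'x1': 0.094, 'y1': 0.278, 'x2': 0.870, 'y2': 0.989}
width, height = 1920, 1080

x1, x2 = det['x1'] * width, det['x2'] * width    # 180.5, 1670.4
y1, y2 = det['y1'] * height, det['y2'] * height  # 300.2, 1068.1
print(int(x1), int(y1), int(x2), int(y2))        # 180 300 1670 1068
```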
{
"num_faces": 2,
"faces": [
{
"box": {"x1": 0.30, "y1": 0.10, "x2": 0.50, "y2": 0.40},
"confidence": 0.98,
"landmarks": [[0.35, 0.20], [0.45, 0.20], [0.40, 0.28], [0.36, 0.35], [0.44, 0.35]]
}
],
"embeddings": [[...512 floats...]],
"inference_time_ms": 25.3
}
{
"status": "success",
"results": [
{"image_id": "img_001", "score": 0.95, "image_path": "/path/to/image.jpg"}
],
"total_results": 10,
"search_time_ms": 15.2
}
{
"status": "success",
"image_id": "photo_001",
"num_detections": 5,
"num_faces": 2,
"embedding_norm": 1.0,
"indexed": {
"global": true,
"vehicles": 1,
"people": 2,
"faces": 2,
"ocr": true
},
"ocr": {
"num_texts": 3,
"full_text": "Invoice Total: $100",
"indexed": true
},
"total_time_ms": 850.4
}
Client (Port 4603)
|
v
+----------+
| yolo-api | FastAPI service (all endpoints)
+----------+
|
v
+--------------+ +------------+
| triton-server| | opensearch |
| (GPU) | | (k-NN) |
+--------------+ +------------+
Services:
- yolo-api (port 4603): FastAPI service handling all requests
- triton-server (ports 4600-4602): NVIDIA Triton Inference Server with TensorRT models
- opensearch (port 4607): Vector database for similarity search
- prometheus/grafana (ports 4604/4605): Monitoring stack
| Model | Purpose | Backend |
|---|---|---|
| YOLO11 | Object detection | TensorRT End2End |
| SCRFD-10G | Face detection + landmarks | TensorRT |
| ArcFace | Face embeddings (512-dim) | TensorRT |
| MobileCLIP | Image/text embeddings (512-dim) | TensorRT |
| PP-OCRv5 | Text detection + recognition | TensorRT |
All models use FP16 precision with dynamic batching for optimal throughput.
Measured Latency (single request):
| Operation | Time | Throughput |
|---|---|---|
| Object Detection | 140-170ms | ~6-7 RPS |
| Face Detection | 100-150ms | ~7-10 RPS |
| Face Recognition | 105-130ms | ~8-9 RPS |
| Image Embedding (CLIP) | 6-8ms | ~120 RPS |
| Text Embedding (CLIP) | 5-17ms | ~60-200 RPS |
| OCR Prediction | 170-350ms | ~3-6 RPS |
| Full Analyze | 280-430ms | ~2-3 RPS |
| Single Image Ingest | 750-950ms | ~1-1.3 RPS |
| Batch Ingest (50 images) | 7.3s total | ~6.8 images/sec |
Batch processing provides ~2-3x throughput improvement over sequential single-image processing.
Minimum:
- NVIDIA GPU with 8GB+ VRAM (Ampere or newer)
- 16GB RAM, 16 CPU cores
- Docker with NVIDIA Container Toolkit
Recommended:
- NVIDIA A100/A6000/RTX 4090 (16GB+)
- 64GB RAM, 48+ CPU cores
- NVMe SSD for image storage
# docker-compose.yml
command: --workers=64 # Production
command: --workers=2 # Development
# docker-compose.yml
device_ids: ['0', '2'] # Use GPUs 0 and 2
Run the comprehensive test suite to verify all functionality:
# Full system test (32 tests covering all endpoints)
source .venv/bin/activate
python tests/test_full_system.py 2>&1 | tee test_results/test_results.txt
# Visual validation (draws bounding boxes on test images)
python tests/validate_visual_results.py 2>&1 | tee test_results/visual_validation.txt
# View annotated test images
ls test_results/*.jpg
Test Coverage:
- ✅ All ML model endpoints (detection, faces, CLIP, OCR)
- ✅ Single and batch processing
- ✅ Directory ingest pipeline (50+ images)
- ✅ OpenSearch indexing and search
- ✅ Visual validation with bounding boxes
cd benchmarks
./build.sh
./triton_bench --mode quick # 30-second test
./triton_bench --mode full # Full benchmark
See benchmarks/README.md for a detailed benchmarking guide.
- CLAUDE.md: AI assistant instructions and detailed architecture
- docs/: Technical documentation
- docs/OCR_SETUP_GUIDE.md: OCR model setup
- docs/FACE_RECOGNITION_IMPLEMENTATION.md: Face recognition details
- docs/opensearch_schema_design.md: Vector search schema
- export/README.md: Model export documentation
- benchmarks/README.md: Benchmark tool guide
This project uses:
- NVIDIA Triton Inference Server
- Ultralytics YOLO
- levipereira/ultralytics fork for End2End TensorRT export
- Apple MobileCLIP
- InsightFace ArcFace
- InsightFace SCRFD
- PaddleOCR
See ATTRIBUTION.md for complete licensing information.
Built for maximum throughput: process 100K+ images in minutes and run visual search in milliseconds.