HDED — Hierarchical Dense Ensemble Detector

Brute-force blueprint labeling pipeline: 3 YOLO models run over 820 tiles per sheet (spread across 4 zoom levels), with Label Studio integration for human-in-the-loop annotation correction.

Self-hosted, open-source, Docker-first.

What HDED does

HDED is an experiment in building a (mostly) automated data-labeling engine for object detection and classification.
It proposes a strategy for sorting out false positives and finding false negatives with minimal human-in-the-loop (HITL) effort.

The core design thesis: brute force converts search into classification. Rather than trying to detect objects cleanly the first time, HDED over-detects everything (maximum recall), then lets human annotators (or downstream classifiers) filter out the false positives. Filtering is a classification problem (easy), while finding missed objects is a search problem (hard).

Pipeline:

Rasterize PDF at 300 DPI → PNG per sheet
Tile each sheet at every zoom level L into a 3^L × 3^L grid (default 4 levels, L = 0 to 3: 1 + 9 + 81 + 729 = 820 tiles)
Run 3 YOLO models per tile (precise, medium, primitive)
Ensemble + fuse detections with weighted box fusion
Auto-push pre-annotated projects to Label Studio for human review
Export corrected annotations → retrain loop
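
The tile count in the tiling step above can be checked with a few lines of Python. This is a minimal sketch, assuming that zoom level L (numbered from 0) splits the sheet into a 3^L by 3^L grid of non-overlapping tiles, which is the reading that yields 820 tiles for four levels. The names `count_tiles` and `tile_boxes` are hypothetical and not taken from the HDED codebase; the real pipeline may add overlap or padding between tiles.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def count_tiles(levels: int = 4, base: int = 3) -> int:
    """Total tiles over all zoom levels, assuming a base^L x base^L grid at level L."""
    return sum((base ** level) ** 2 for level in range(levels))


def tile_boxes(width: int, height: int, level: int, base: int = 3) -> Iterator[Box]:
    """Yield pixel boxes for one zoom level, covering the sheet without gaps."""
    n = base ** level
    for row in range(n):
        for col in range(n):
            # Edges are computed from the full size, so rounding never leaves a gap.
            left, right = col * width // n, (col + 1) * width // n
            top, bottom = row * height // n, (row + 1) * height // n
            yield (left, top, right, bottom)


if __name__ == "__main__":
    print(count_tiles())  # 820
    # A 36 x 24 inch sheet at 300 DPI is 10800 x 7200 pixels (illustrative size).
    print(len(list(tile_boxes(10800, 7200, level=3))))  # 729
```

With 3 models per tile, this reading implies 820 × 3 = 2460 inference calls per sheet, which is the cost of the brute-force approach.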

Quick start — local Docker

git clone <this-repo> hded
cd hded
./setup.sh            # interactive .env generation (creates secrets, directories)
./start.sh            # starts docker compose stack (auto-detects GPU)
./set_root_admin.sh   # creates first admin user (choose option 1)

Then visit http://localhost:8000 and log in with the admin credentials you just created.

Upload a PDF → pipeline runs → Label Studio (LS) project auto-created with pre-annotations.

Architecture

┌────────────────────────────────────────────────────────┐
│ HDED app (FastAPI, port 8000)                          │
│  ├─ routes/    Thin HTTP wrappers                      │
│  ├─ services/  Business logic (pipeline, LS, SageMaker)│
│  ├─ pipeline/  Detection + classifier core             │
│  └─ master/    SageMaker orchestration (optional)      │
└────────────────────────────────────────────────────────┘
              │                    │
              ▼                    ▼
     ┌────────────────┐   ┌──────────────────────┐
     │ Label Studio   │   │ AWS SageMaker        │
     │ (4 containers, │   │ (optional, GPU tiers │
     │  1 per user)   │   │  for large batches)  │
     └────────────────┘   └──────────────────────┘
              │
              ▼
     ┌────────────────┐
     │ Postgres       │
     │ (shared, 4 DBs)│
     └────────────────┘

Max scale: ~10 concurrent users on a single EC2 instance. For larger deployments see docs/LIMITS.md.

Label Studio workflow

User uploads blueprint (PDF/PNG/JPG/TIFF)
Pipeline detects symbols → saves per-sheet LS JSON with pre-annotations
App auto-creates LS project in user's assigned container (ports 8081-8084)
User clicks "Open in Label Studio" → SSO redirect to pre-populated project
User corrects boxes, adds missed symbols, removes false positives
Export from LS as YOLO format → feeds back into classifier_bootstrap retraining
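
For orientation, the pre-annotations saved by the pipeline correspond to Label Studio's task JSON, in which each predicted rectangle is stored as percentages of the image size rather than pixels. The sketch below is illustrative only: the helper name `to_ls_task`, the control names `label` and `image`, the model version string, and the example URL are assumptions that would have to match the labeling config of the actual project, and HDED's own exporter may differ.

```python
import json
from typing import Dict, List, Tuple

# (left, top, right, bottom, class_name, score) in sheet pixel coordinates
Detection = Tuple[float, float, float, float, str, float]


def to_ls_task(image_url: str, width: int, height: int,
               detections: List[Detection]) -> Dict:
    """Build one Label Studio task carrying a single set of predictions."""
    results = []
    for i, (left, top, right, bottom, name, _score) in enumerate(detections):
        results.append({
            "id": f"det{i}",
            "from_name": "label",   # assumed <RectangleLabels name="label">
            "to_name": "image",     # assumed <Image name="image">
            "type": "rectanglelabels",
            "original_width": width,
            "original_height": height,
            "value": {
                # Label Studio expects percentages (0-100), not pixels.
                "x": 100.0 * left / width,
                "y": 100.0 * top / height,
                "width": 100.0 * (right - left) / width,
                "height": 100.0 * (bottom - top) / height,
                "rotation": 0,
                "rectanglelabels": [name],
            },
        })
    scores = [d[5] for d in detections]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return {
        "data": {"image": image_url},
        "predictions": [{
            "model_version": "hded-sketch",  # placeholder version tag
            "score": mean_score,
            "result": results,
        }],
    }


if __name__ == "__main__":
    # The URL path is a placeholder, not a documented HDED route.
    task = to_ls_task("http://host.docker.internal:8000/sheet_001.png",
                      1000, 500, [(100, 50, 300, 150, "outlet", 0.9)])
    print(json.dumps(task, indent=2))
```

A list of such tasks can be imported into a project whose labeling config defines matching control names; reviewers then see the boxes as editable predictions.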

Manual re-push: on the results page, the "Push to Label Studio" button re-pushes predictions (useful for retries or for pushing older jobs).

AWS deployment (optional)

For large-batch processing, HDED supports AWS SageMaker with 3 GPU tiers:

Tier	Fleet	Use case	Cost (full-fleet/hr)
Scrooge McDuck	40× g4dn.xlarge (T4)	Budget, cheapest	~$30
Even Steven	5× g5.12xl + 20× g4dn	Balanced production	~$51
IDGAF	Full multi-fleet	Rush jobs	~$220

Distribution is proportional to GPU power: 5 pages → 7 instances, 20 pages → 12 instances, 400 pages → 37 instances across 4 fleets.
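
As a rough illustration of proportional distribution, the sketch below splits a page count across fleets in proportion to a per-fleet weight, using largest-remainder rounding so the shares add up exactly. The fleet names and weights are made-up placeholders, and this is not the scheduler HDED uses. In particular, it does not reproduce the instance counts quoted above, which depend on orchestration logic that is not described here.

```python
from typing import Dict


def split_pages(pages: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split `pages` across fleets proportionally to `weights` (largest remainder)."""
    total = sum(weights.values())
    if pages < 0 or total <= 0:
        raise ValueError("pages must be >= 0 and weights must sum to a positive value")
    exact = {name: pages * w / total for name, w in weights.items()}
    shares = {name: int(x) for name, x in exact.items()}  # floor of each exact share
    leftover = pages - sum(shares.values())
    # Hand the remaining pages to the fleets with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical relative GPU power per fleet, not measured values.
    weights = {"g5": 4.0, "g4dn": 1.0}
    print(split_pages(400, weights))  # {'g5': 320, 'g4dn': 80}
    print(split_pages(7, weights))    # {'g5': 6, 'g4dn': 1}
```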

Setup:

./setup_aws.sh        # prompts for BUCKET_PREFIX, creates S3/ECR/IAM
./deploy.sh --all     # builds SageMaker image + deploys app to EC2

Requires: AWS CLI configured, Docker, SageMaker service quotas for g4dn/g5 instance families.

Environment variables

Copy .env.example to .env and fill in the values. Key variables:

Variable	Purpose
`HDED_HOST`	Public URL of HDED app (used for LS image serving)
`HDED_SESSION_SECRET`	Random 64-char hex for cookie signing
`HDED_LS_IMAGE_HOST`	URL LS containers use to reach HDED (e.g. `host.docker.internal:8000` on Mac)
`POSTGRES_PASSWORD`	Postgres password for LS databases
`HDED_RAW_BUCKET` / `HDED_RESULTS_BUCKET` / `HDED_MODELS_BUCKET`	S3 buckets (if using SageMaker)
`HDED_ECR_IMAGE`	ECR URI of SageMaker pipeline image
`HDED_SAGEMAKER_ROLE`	IAM role ARN with S3+ECR permissions

See .env.example for all options.
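
If a secret has to be produced by hand, Python's standard library is enough: `secrets.token_hex(32)` returns 64 hexadecimal characters. The sketch below also shows one way to check that the core variables are set before starting the stack. The `missing_vars` helper and the choice of which variables count as required are assumptions for illustration; `setup.sh` already generates secrets interactively and remains the intended path.

```python
import os
import secrets
from typing import Iterable, List, Mapping

# Assumed minimal set for a local (non-SageMaker) run; adjust to .env.example.
REQUIRED_LOCAL = (
    "HDED_HOST",
    "HDED_SESSION_SECRET",
    "HDED_LS_IMAGE_HOST",
    "POSTGRES_PASSWORD",
)


def new_session_secret() -> str:
    """Return a random 64-character hex string (32 random bytes)."""
    return secrets.token_hex(32)


def missing_vars(env: Mapping[str, str],
                 required: Iterable[str] = REQUIRED_LOCAL) -> List[str]:
    """List required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    print("HDED_SESSION_SECRET=" + new_session_secret())
    print("missing:", missing_vars(os.environ))
```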

Directory layout

hded_classifier_2/
├── app/
│   ├── auth.py              # Session cookie auth (SQLite-backed)
│   ├── main.py              # FastAPI factory, route registration
│   ├── routes/              # HTTP handlers (thin)
│   ├── services/            # Business logic
│   ├── templates/           # Jinja2 HTML (vim-themed)
│   └── static/css/vim.css   # Single-file theme
├── pipeline/
│   ├── config.py            # Unified dataclass config from config.yaml
│   ├── hded/                # Detection pipeline (tile, detect, fuse)
│   └── classifier/          # YOLO model inference
├── master/                  # SageMaker orchestration
├── admin_cli/               # TUI for user/config management
├── config.yaml              # Model weights + ensemble config
├── gpu_limits.yaml          # Rate limits + allowed GPU tiers
├── docker-compose.yml       # App + Postgres + LS
├── docker-compose.ls.yml    # 4× LS containers for multi-user
├── docker-compose.prod.yml  # Caddy HTTPS overlay
├── setup.sh                 # Interactive .env generator
├── setup_aws.sh             # S3/ECR/IAM provisioning
├── deploy.sh                # Rsync to EC2 + docker compose up
└── sync_ls_users.py         # Seed LS accounts from HDED users

Related projects

classifier_bootstrap: CLI tool for bootstrapping YOLO classifier training data from HDED results (closes the active-learning loop)
Label Studio: the annotation UI this integrates with (HumanSignal/label-studio, formerly heartexlabs/label-studio)

License

MIT — see LICENSE file.

Contributing

This is a small-team tool and is not designed to scale past ~10 concurrent users. PRs welcome for:

Bug fixes
Documentation
Pipeline improvements
Additional GPU/cloud backends

Not accepting PRs for: horizontal-scaling refactors (Kubernetes, multi-region), Label Studio alternatives.

Status

Current: v2.0.0 — docker-compose + FastAPI + vim-themed UI, LS multi-container integration, optional SageMaker GPU tiers, agentic JSON API.
