feat(visdrone): live browser demo (path 1) + preview ONNX weights #3

Open
aalvsz wants to merge 1 commit into LibreYOLO:main from aalvsz:visdrone-demo-path1
aalvsz commented Apr 25, 2026

Summary

Closes the "Path 1: pending trained weights" stub from the original VisDrone PR. Paths 1 (browser) and 2 (Python) now both work against a real, MIT-licensed preview model.

What's live

🤗 Preview weights: https://huggingface.co/ander2221/visdrone-yolo9-preview
🌐 Browser demo: `visdrone-finetune/demo/index.html` (zero-install, fetches ONNX from HF on first visit, caches)

The demo:

  • Self-contained single HTML file matching `blur-faces/demo/index.html` conventions
  • Inference via onnxruntime-web — WebGPU first, WASM fallback
  • Image upload + webcam input
  • Boxes + class labels drawn on a canvas
  • 10 VisDrone classes color-coded
  • Override source repo via `?repo=org/name` URL param
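The `?repo=` override can be illustrated with a short Python sketch of the same lookup. This is not the demo's JavaScript: `ONNX_FILENAME` and `resolve_model_url` are hypothetical names, and the file name `model.onnx` is an assumption, since the PR does not say what the ONNX file is called. The `resolve/main` URL pattern is the standard Hugging Face Hub download path.

```python
import re
from urllib.parse import parse_qs, urlparse

DEFAULT_REPO = "ander2221/visdrone-yolo9-preview"  # preview repo named in this PR
ONNX_FILENAME = "model.onnx"  # assumption: the actual file name is not stated

# Accept only "org/name" style identifiers.
_REPO_RE = re.compile(r"[\w.-]+/[\w.-]+")


def resolve_model_url(page_url: str) -> str:
    """Build the Hub download URL, honouring an optional ?repo=org/name."""
    query = parse_qs(urlparse(page_url).query)
    repo = query.get("repo", [DEFAULT_REPO])[0]
    if not _REPO_RE.fullmatch(repo):
        repo = DEFAULT_REPO  # fall back rather than fetch from a malformed id
    return f"https://huggingface.co/{repo}/resolve/main/{ONNX_FILENAME}"
```

Falling back to the default on a malformed value keeps the page usable when the parameter is mistyped; whether the real demo does this is not stated in the PR.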

What's in the PR

| File | Role |
| --- | --- |
| `demo/index.html` | Browser demo (~300 LOC, vanilla JS module) |
| `src/export_and_push.py` | torch → ONNX (dynamic batch) + HF Hub upload + auto model card |
| `src/load_finetuned.py` | Helper reproducing libreyolo's `_rebuild_for_new_classes` path so the trained hybrid checkpoint loads cleanly |
| `src/train.py` | Drop hardcoded `nb_classes=10`; let the factory auto-detect from COCO weights and the trainer rebuild from `data.yaml` |
| `src/infer.py` | Use the new load helper |
| `README.md` | Path 1 banner flipped to "Live (preview)" with honest status |
| `.gitignore` | Ignore `logs/` and `export/` (transient) |
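To make the shape-mismatch problem behind `src/load_finetuned.py` and `src/train.py` concrete, here is a minimal sketch that compares parameter shapes between a checkpoint and a model. It is not libreyolo's `_rebuild_for_new_classes`; the function and the layer names are hypothetical, and shapes are plain tuples so the example runs without torch. With torch, such a mapping could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for shared keys that differ."""
    return {
        key: (ckpt_shape, model_shapes[key])
        for key, ckpt_shape in checkpoint_shapes.items()
        if key in model_shapes and model_shapes[key] != ckpt_shape
    }


# Hypothetical layer names and channel widths, for illustration only.
coco_model = {
    "head.cls.mid.weight": (80, 256, 3, 3),  # 80-channel cls intermediate
    "head.cls.out.weight": (80, 80, 1, 1),   # 80-class COCO output
}
finetuned_ckpt = {
    "head.cls.mid.weight": (80, 256, 3, 3),  # intermediate stays at 80 channels
    "head.cls.out.weight": (10, 80, 1, 1),   # 10-class VisDrone output
}

mismatches = find_shape_mismatches(finetuned_ckpt, coco_model)
```

Under these assumptions only the final classification layer differs, which is why the model's head has to be rebuilt for 10 classes before the fine-tuned weights can be loaded.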

Training honesty

These are preview weights, not production:

  • Trained on a Mac M-series GPU (Apple Metal Performance Shaders) — not a real datacenter GPU
  • 5 epochs, imgsz=384, batch=8, ~12 min wall clock
  • Full Voxel51/VisDrone2019-DET train split (7766 images)
  • Loss dropped 14.9 → 5.4
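For a rough sense of scale, the figures above imply the following step counts and throughput. This is back-of-the-envelope arithmetic, assuming the last partial batch of each epoch is kept and that the ~12 minutes is spent entirely on training steps; neither is confirmed by the PR.

```python
import math

TRAIN_IMAGES = 7766   # Voxel51/VisDrone2019-DET train split
BATCH_SIZE = 8
EPOCHS = 5
WALL_CLOCK_S = 12 * 60  # "~12 min", treated as exact for the estimate

# Assumption: the final partial batch is not dropped.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
seconds_per_step = WALL_CLOCK_S / total_steps
images_per_second = TRAIN_IMAGES * EPOCHS / WALL_CLOCK_S

print(f"{steps_per_epoch} steps/epoch, {total_steps} steps total")
print(f"~{seconds_per_step:.3f} s/step, ~{images_per_second:.0f} images/s")
```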

Detections are real and look right on held-out val images — e.g. 34 cars + 1 bus correctly identified on `9999938_00000_d_0000496.jpg` at conf 0.15+. Confidences are modest (0.2-0.6 typical) because the model is undertrained. A full ~50-epoch run on a real GPU would replace these.
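Because the exported model emits logits (see the test plan), a confidence threshold such as 0.15 only makes sense after a sigmoid. The sketch below shows that step in isolation; `keep_confident` is a hypothetical helper, the input layout (one list of class logits per anchor) is an assumption, and box decoding and non-maximum suppression are deliberately left out.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def keep_confident(class_logits, conf_threshold=0.15):
    """Return (anchor_idx, class_idx, score) for anchors whose best class passes the threshold.

    class_logits: one list of raw class logits per anchor (assumed layout).
    """
    kept = []
    for anchor_idx, logits in enumerate(class_logits):
        class_idx = max(range(len(logits)), key=logits.__getitem__)
        score = sigmoid(logits[class_idx])
        if score >= conf_threshold:
            kept.append((anchor_idx, class_idx, score))
    return kept
```

With an undertrained model, a low threshold like 0.15 keeps more true positives at the cost of more false ones, which matches the modest confidences reported above.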

The model card on HF Hub explicitly tags this as v0.1-preview with the same caveats.

Upgrade path

When real weights are trained:

  1. Train on whatever GPU is available (script unchanged).
  2. Run `python -m src.export_and_push --weights weights/visdrone.pt --repo-id LibreYOLO/visdrone-yolo9 ...`
  3. Update the demo's default `HF_REPO` constant in `demo/index.html` (or leave it and have users pass the `?repo=` URL param).
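Step 2 boils down to an ONNX export with a dynamic batch axis followed by a Hub upload. The sketch below is not the contents of `src/export_and_push.py`; the tensor names `images` and `output`, the file name `model.onnx`, and opset 17 are assumptions. It uses `torch.onnx.export` and `huggingface_hub.HfApi`, imported lazily so the helper that builds the export arguments can be inspected without either package installed.

```python
from pathlib import Path


def build_export_kwargs(opset: int = 17) -> dict:
    """Keyword arguments for torch.onnx.export with only the batch axis dynamic."""
    return {
        "input_names": ["images"],    # assumed tensor name
        "output_names": ["output"],   # assumed tensor name
        "dynamic_axes": {"images": {0: "batch"}, "output": {0: "batch"}},
        "opset_version": opset,       # assumed opset
    }


def export_and_push(model, repo_id: str, imgsz: int = 384, out_dir: str = "export") -> Path:
    """Export `model` to ONNX and upload the file to a Hugging Face Hub repo."""
    import torch
    from huggingface_hub import HfApi

    out_path = Path(out_dir) / "model.onnx"  # assumed file name
    out_path.parent.mkdir(parents=True, exist_ok=True)

    model.eval()
    dummy = torch.zeros(1, 3, imgsz, imgsz)  # NCHW dummy input for tracing
    torch.onnx.export(model, (dummy,), str(out_path), **build_export_kwargs())

    api = HfApi()  # uses the locally stored Hub token
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_file(
        path_or_fileobj=str(out_path),
        path_in_repo=out_path.name,
        repo_id=repo_id,
    )
    return out_path
```

Only the batch dimension is marked dynamic here, so height and width stay fixed at the export `imgsz`; a consumer would need to resize inputs to that size.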

Test plan

  • `python -m src.train` runs end-to-end on MPS without errors
  • `python -m src.infer` produces sensible detections on val images
  • `python -m src.export_and_push` produces working ONNX and successfully uploads to HF Hub
  • HF repo is publicly accessible (302 → CDN, content-type ok)
  • ONNX loads in onnxruntime on CPU; output shape is `(1, 14, 3024)` for imgsz=384, and the value range is consistent with logits rather than probabilities
  • Browser demo end-to-end test on a fresh Chrome — pending; see screenshots in next iteration if needed
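The `(1, 14, 3024)` output shape in the checklist can be sanity-checked by hand. The sketch below assumes the usual YOLO layout, which the PR does not spell out: three detection scales at strides 8, 16 and 32, one prediction per grid cell, and 4 box channels followed by one channel per class.

```python
def expected_output_shape(imgsz, num_classes, strides=(8, 16, 32), batch=1):
    """Expected (batch, channels, anchors) for a square input of side imgsz."""
    anchors = sum((imgsz // stride) ** 2 for stride in strides)
    return (batch, 4 + num_classes, anchors)


# imgsz=384 gives 48x48 + 24x24 + 12x12 = 2304 + 576 + 144 grid cells.
shape = expected_output_shape(imgsz=384, num_classes=10)
```

Under those assumptions the result is 14 channels (4 + 10 VisDrone classes) over 3024 anchors, matching the shape reported in the checklist.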

Related

🤖 Generated with Claude Code

Resolves the "Path 1: pending trained weights" stub: paths 1 (browser)
and 2 (Python) now work against a real, MIT-licensed VisDrone preview
model trained locally on Apple Metal.

What changed:

  demo/index.html             Self-contained browser demo. Pulls the ONNX
                              from ander2221/visdrone-yolo9-preview on first
                              visit, runs inference via onnxruntime-web
                              (WebGPU → WASM fallback), draws annotated
                              boxes. Webcam + image input. Override the
                              source repo via ?repo=org/name in the URL.

  src/load_finetuned.py       Helper that reproduces libreyolo's
                              _rebuild_for_new_classes path so the trained
                              hybrid checkpoint (80-channel cls intermediate
                              + 10-class final) loads cleanly.

  src/export_and_push.py      End-to-end CLI: torch -> ONNX (dynamic batch),
                              create HF Hub repo, upload .pt + .onnx + auto-
                              generated model card. Used to publish
                              ander2221/visdrone-yolo9-preview.

  src/train.py                Drop the hardcoded nb_classes=10 — let the
                              factory auto-detect from the COCO-pretrained
                              checkpoint, then trainer rebuilds for VisDrone
                              when it reads data.yaml's nc=10. Fixes the
                              previous shape-mismatch on weight load.

  src/infer.py                Use the new load_finetuned helper.

  README.md                   Path 1 banner flipped to "Live (preview)" with
                              an honest "5 epochs on Apple Metal" status
                              note and an upgrade path for fully-trained
                              upstream weights.

  .gitignore                  Ignore logs/ and export/ which only contain
                              transient training/export artifacts.

The preview weights:

  https://huggingface.co/ander2221/visdrone-yolo9-preview

Trained for 5 epochs on the full Voxel51/VisDrone2019-DET train split
(7766 images), imgsz=384, batch=8, lr0=0.005, on a Mac M-series MPS
GPU (~12 minutes wall clock). Loss dropped 14.9 → 5.4. Real detections
on val images (e.g. 34 cars + 1 bus on a held-out frame at conf 0.15+).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>