Add A100 PyTorch/CUDA YOLO-NAS benchmark submissions (3 variants) by EHxuban11 · Pull Request #5 · LibreYOLO/vision-analysis

EHxuban11 · 2026-04-25T13:05:49Z

3 new submissions for YOLO-NAS on NVIDIA A100-PCIE-40GB (MIG 7g.40gb slice) with torch 2.6.0+cu124. Variants: s, m, l.

Provenance / libreyolo_commit

These runs were executed against libreyolo at commit 3383a8f142a5decc735f362258f6851d3f026fa3 (the 106-add-d-fine-model-family feature branch, which carries the YOLO-NAS port) plus a single small local patch to libreyolo/models/yolonas/utils.py::postprocess. To stay honest about provenance, libreyolo_commit on the 3 submission JSONs is "unknown".

The local patch replaces a per-class Python NMS loop (one torchvision.ops.nms call per surviving class) with a single torchvision.ops.batched_nms call, plus a top-k=1000 pre-NMS filter that mirrors super_gradients' YoloNASPostPredictionCallback default (num_pre_nms_predictions=1000). Without this patch, YOLO-NAS at conf=0.001 (COCO eval default) keeps ~all 8400 anchors past the conf gate and dispatches one small NMS kernel per surviving class, which measured ~700 ms/image on the A100 MIG slice. With the patch: ~60 ms/image, more than 10x faster.

Verified mAP-neutral: same image, same seeds, mAP_50_95 differs by 0.0004 between patched and unpatched (numerical reordering only). batched_nms with the per-class idxs argument is mathematically identical to the per-class loop, and top-k=1000 matches super_gradients' COCO eval default.
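The equivalence claimed above can be illustrated with a minimal pure-Python sketch. It is not the patched libreyolo code and does not call torchvision; the helper names (iou, nms, per_class_nms, batched_nms_offset) are hypothetical. It uses the class-offset trick, one common way to run class-aware NMS in a single pass, and assumes non-negative box coordinates and non-negative integer class labels. The IoU threshold of 0.5 is arbitrary.

```python
from collections import defaultdict

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr):
    """Greedy NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

def per_class_nms(boxes, scores, labels, iou_thr):
    """One NMS call per surviving class (the slow pattern)."""
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    keep = []
    for idxs in by_class.values():
        kept = nms([boxes[i] for i in idxs], [scores[i] for i in idxs], iou_thr)
        keep.extend(idxs[k] for k in kept)
    return sorted(keep, key=lambda i: -scores[i])

def batched_nms_offset(boxes, scores, labels, iou_thr, top_k=1000):
    """Top-k score filter, then a single NMS pass over class-offset boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])[:top_k]
    # Shift each class into its own disjoint region so boxes from
    # different classes can never overlap (assumes coordinates >= 0).
    span = max(max(boxes[i]) for i in order) + 1.0
    shifted = [[v + labels[i] * span for v in boxes[i]] for i in order]
    kept = nms(shifted, [scores[i] for i in order], iou_thr)
    return [order[k] for k in kept]

boxes = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0],
         [0.0, 0.0, 10.0, 10.0], [50.0, 50.0, 60.0, 60.0]]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 0, 1, 1]

print(per_class_nms(boxes, scores, labels, 0.5))       # [0, 2, 3]
print(batched_nms_offset(boxes, scores, labels, 0.5))  # [0, 2, 3]
```

Box 1 is suppressed by box 0 (same class, IoU about 0.68), while box 2 survives despite having identical coordinates to box 0 because it belongs to a different class. The top-k filter only changes the result when more than top_k candidates pass the confidence gate.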

A LibreYOLO upstream issue is open to track moving the per-class loop in libreyolo/utils/general.py::postprocess_detections to batched_nms across the board (helps YOLOX / YOLOv9 / RT-DETR / D-FINE too, just less catastrophically). When that lands, these submissions can be backfilled with the merged libreyolo_commit.

Metadata changes

support-matrix.json: append the 3 yolonas-{s,m,l} model ids. No SHA bump (libreyolo_commit is "unknown" so the matrix doesn't gate this).
website/src/data/metadata/families.json: add yolonas family (Deci, since acquired by NVIDIA; released 2023).
website/src/data/metadata/models.json: add 3 YOLO-NAS variant entries.

Measured COCO val2017 mAP@50-95 (paper reference in parens):
yolonas-s 0.4645 (~0.475)
yolonas-m 0.5053 (~0.516)
yolonas-l 0.5119 (~0.522)
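As a quick check of how far these runs sit below the reference, the gap per variant can be computed directly from the figures above. This is only a sketch; the reference values are the approximate ("~") numbers quoted in parentheses, so the gaps are approximate too.

```python
# Measured mAP@50-95 from this PR and the approximate paper references.
measured = {"yolonas-s": 0.4645, "yolonas-m": 0.5053, "yolonas-l": 0.5119}
reference = {"yolonas-s": 0.475, "yolonas-m": 0.516, "yolonas-l": 0.522}

# Gap in absolute mAP, rounded to the precision of the measured values.
gaps = {name: round(reference[name] - measured[name], 4) for name in measured}

for name, gap in gaps.items():
    print(f"{name}: {gap:.4f} mAP below reference ({100 * gap:.2f} points)")
```

Each variant lands roughly one mAP point below its reference.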

Local: scripts/validate_submission.py and scripts/build_verified_results.py both exit 0. generated/verified-results.v1.json regenerated and committed.

The first YOLO-NAS submission set (merged in #5) used LibreYOLO's existing shared letterbox preprocessing, which differs from super_gradients' YOLO-NAS COCO validation pipeline in two ways that the file's own docstring already flagged ("A later parity pass can tighten this toward the exact SG processing pipeline"):

- longest-side resize to 640 instead of 636
- top-left padding instead of center padding to 640x640

Closing those two gaps (plus moving the harness's NMS IoU from 0.6 to 0.7 to match super_gradients' default) recovers ~60% of the 1-point gap to Deci's published numbers.

The LibreYOLO upstream issue and working branch (113-yolo-nas-validation-preprocessing-diverges-from-super_gradients-map-loss) cover the actual code change. libreyolo_commit on these submissions stays "unknown" until the fix is merged into LibreYOLO main and a pinned commit is added to support-matrix.json.

Numbers (COCO val2017, A100 PyTorch FP32):
yolonas-s 0.4711 (prev 0.4645, paper 0.475)
yolonas-m 0.5111 (prev 0.5053, paper 0.516)
yolonas-l 0.5184 (prev 0.5119, paper 0.522)

The remaining gap of roughly 0.004 to 0.005 per variant is consistent across s/m/l and likely reflects FP16 vs FP32 plus minor cv2 vs PIL interpolation differences.

Local: scripts/validate_submission.py and scripts/build_verified_results.py both exit 0. generated/verified-results.v1.json regenerated.
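To make the two preprocessing differences concrete, here is a small geometry-only sketch. The function names are hypothetical, and the rounding rule and the way an odd padding amount is split between the two sides are assumptions; neither LibreYOLO's nor super_gradients' actual code is reproduced here. It computes the resized shape and the padding for one 480x640 (height x width) image under both schemes.

```python
def longest_side_resize(height, width, target):
    """Scale (height, width) so the longer side equals `target`,
    keeping the aspect ratio. Rounding to nearest is an assumption."""
    longest = max(height, width)
    return round(height * target / longest), round(width * target / longest)

def padding(height, width, canvas=640, center=True):
    """Return (top, bottom, left, right) padding needed to reach a
    canvas x canvas image. With center=False all padding goes to the
    bottom and right, i.e. the image stays in the top-left corner."""
    pad_h, pad_w = canvas - height, canvas - width
    if center:
        # Assumption: any odd leftover pixel goes to the bottom/right.
        top, left = pad_h // 2, pad_w // 2
    else:
        top, left = 0, 0
    return top, pad_h - top, left, pad_w - left

# Shared letterbox as described: longest side to 640, top-left placement.
h1, w1 = longest_side_resize(480, 640, 640)
print((h1, w1), padding(h1, w1, center=False))  # (480, 640) (0, 160, 0, 0)

# Parity variant as described: longest side to 636, center padding to 640.
h2, w2 = longest_side_resize(480, 640, 636)
print((h2, w2), padding(h2, w2, center=True))   # (477, 636) (81, 82, 2, 2)
```

The two schemes place the same image content at a slightly different scale and at a different offset inside the 640x640 canvas, which is the kind of shift a model trained with one scheme can be sensitive to at evaluation time.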

EHxuban11 merged commit f878a77 into main Apr 25, 2026
1 check passed

EHxuban11 mentioned this pull request Apr 25, 2026

Update YOLO-NAS submissions with super_gradients-parity preprocessing #6

Open

EHxuban11 merged 1 commit into main from benchmark/a100-pytorch-cuda-yolonas
