AI-Based AprilTag Pipeline Acceleration#2410
DoctorFogarty wants to merge 27 commits into PhotonVision:main
    public final FrameStaticProperties frameStaticProperties;

    /** Optional ML detection ROI bounding boxes for visualization. Set by ML-assisted pipelines. */
    public List<RotatedRect> mlDetectionRois = List.of();
Frame isn't the right place to maintain this state. Can it move to the pipeline result?
    }

    /** Result container for ML hybrid detection */
    private static class MLDetectionResult {
Let's refactor this to not live as an inner class
     * Performs ML-assisted hybrid AprilTag detection. Stage 1: ML model detects ROIs Stage 2:
     * Traditional detector decodes tags within ROIs
     */
    private MLDetectionResult processMLHybrid(Frame frame) {
This logic feels like it wants to be a Pipe
    <pv-slider
      v-if="
        (currentPipelineSettings.pipelineType === PipelineType.AprilTag ||
          currentPipelineSettings.pipelineType === PipelineType.Aruco) &&
        useCameraSettingsStore().isCurrentVideoFormatCalibrated &&
        useCameraSettingsStore().currentPipelineSettings.solvePNPEnabled &&
        currentPipelineSettings.doMultiTarget
      "
      v-model="currentPipelineSettings.multiTagAmbiguityThreshold"
      label="Max Allowed Ambiguity"
      tooltip="Tags with pose ambiguity above this value are excluded from multi-tag estimation. Lower = stricter. 0 = only unambiguous tags. 1 = include all (disabled)."
      :min="0"
      :max="1"
      :step="0.05"
      :switch-cols="interactiveCols"
      @update:modelValue="
        (value) => useCameraSettingsStore().changeCurrentPipelineSetting({ multiTagAmbiguityThreshold: value }, false)
      "
    />
This feature should be split to a separate PR
Setup Basic Tests Included Roboflow model tflite yolov8n trained
…ix type is cited from
This reverts commit e40f174.
…ackaged V8 model as I will replace it soon.
Removed old V8 model for AprilTags. Added entries for current V11 AprilTag Models for Rubik and OPi5
What are the performance benefits of this like?

Ditto, I'm curious to see performance benefits from doing this. Quad fitting versus ROI cropping which still requires either a DMA transfer or a mem-copy to the NPU. I'd also want to see the performance benefits of being able to reduce decimate in just those areas given less pixels are being searched to begin with (increased range for the same baseline latency addition from using ML).

@me-it-is @srimanachanta see above.

Insanely cool. Good work.

I think this is worth a design doc in the developer section of our website + some extra words added to our normal user docs as well before we merge. There's a lotta brains and thinking going on here and I want to support both future devs and users confused about why the tags have a bounding box now








Description
Adds a two-stage hybrid ML/traditional AprilTag detection pipeline that uses NPU hardware to accelerate tag detection. A YOLO v11 model identifies AprilTag regions of interest (ROIs) on the NPU, then the traditional WPILib AprilTag detector decodes only the cropped sub-images for accurate tag ID and pose. This reduces the per-frame computational load on the CPU by narrowing the search space. If the fallback setting is enabled, the pipeline falls back to full-frame traditional detection when ML finds no tags.
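For reviewers who want the control flow at a glance, here is a minimal sketch of the two-stage idea with the optional fallback. It is not the code in this PR: the class `HybridDetectSketch`, the `Roi` type, and the `padAndClamp` and `detect` helpers are hypothetical names, and it assumes "ML finds no tags" means the ML stage returned no ROIs. The padding argument plays the role of `mlRoiPaddingPixels`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative sketch only; none of these names come from the PR. */
public final class HybridDetectSketch {
    /** Axis-aligned ROI in full-frame pixels (hypothetical type). */
    public static final class Roi {
        public final int x, y, width, height;

        public Roi(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Grows an ROI by pad pixels on every side, clamped to the frame bounds. */
    public static Roi padAndClamp(Roi r, int pad, int frameW, int frameH) {
        int x0 = Math.max(0, r.x - pad);
        int y0 = Math.max(0, r.y - pad);
        int x1 = Math.min(frameW, r.x + r.width + pad);
        int y1 = Math.min(frameH, r.y + r.height + pad);
        return new Roi(x0, y0, Math.max(0, x1 - x0), Math.max(0, y1 - y0));
    }

    /**
     * Stage 2 driver: decode each padded ROI, or run one full-frame pass when
     * stage 1 produced no ROIs and the fallback flag is set (assumed semantics).
     */
    public static <T> List<T> detect(
            List<Roi> mlRois, int pad, int frameW, int frameH, boolean fallback,
            Function<Roi, List<T>> decodeRoi, Supplier<List<T>> decodeFullFrame) {
        if (mlRois.isEmpty()) {
            return fallback ? decodeFullFrame.get() : new ArrayList<T>();
        }
        List<T> out = new ArrayList<T>();
        for (Roi r : mlRois) {
            out.addAll(decodeRoi.apply(padAndClamp(r, pad, frameW, frameH)));
        }
        return out;
    }
}
```

Passing the decoders in as functions keeps the sketch independent of any detector API; in the PR the equivalent work is split across the pipes listed below.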
Two-Stage Hybrid Pipeline
- AprilTagROIDetectionPipe runs a YOLO v11 model on the NPU to produce bounding boxes around candidate tags
- AprilTagROIDecodePipe extracts each ROI sub-image, runs the WPILib AprilTag detector on it, and maps corners + homography back to full-frame coordinates
- When ML finds no tags and mlFallbackToTraditional is enabled (default), the pipeline falls back to full-frame traditional detection
- DrawMLROIPipe renders cyan bounding boxes around ML-detected ROIs on the output stream for tuning

Homography Coordinate Transformation
- transformHomography() applies translation-only mapping (ROI offset to full frame)
- transformHomographyWithScale() applies combined inverse-scaling and translation for ATR-resized ROIs

Adaptive Tag Resizing (ATR)
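As a rough illustration of how a homography computed on an ATR-resized ROI could be mapped back to full-frame coordinates, the sketch below left-multiplies the ROI homography by an inverse-scale-plus-translation matrix. It assumes the homography maps tag coordinates to ROI pixel coordinates and that a resized ROI pixel relates to a full-frame pixel by `roi = scale * (full - offset)`. The class `RoiHomographySketch` and its methods are hypothetical, and `chooseScale` is only a guess at an ATR-style rule (shrink toward a target dimension, bounded below by a minimum scale factor and never upscaling), not the PR's actual logic.

```java
/** Illustrative sketch only; names and the ATR scale rule are assumptions. */
public final class RoiHomographySketch {
    /** Hypothetical ATR rule: scale the longest side toward targetDim, within [minScale, 1]. */
    public static double chooseScale(int roiW, int roiH, double targetDim, double minScale) {
        double s = targetDim / Math.max(roiW, roiH);
        return Math.max(minScale, Math.min(1.0, s));
    }

    /** Plain 3x3 matrix product a * b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] out = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    out[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return out;
    }

    /**
     * Maps an ROI-space homography into full-frame space: undo the resize
     * (multiply by 1/scale), then add the ROI's top-left offset.
     * With scale == 1 this reduces to a translation-only mapping.
     */
    public static double[][] toFullFrame(double[][] hRoi, double offsetX, double offsetY, double scale) {
        double inv = 1.0 / scale;
        double[][] m = {
            {inv, 0.0, offsetX},
            {0.0, inv, offsetY},
            {0.0, 0.0, 1.0}
        };
        return multiply(m, hRoi);
    }

    /** Applies a homography to a 2D point, including the perspective divide. */
    public static double[] apply(double[][] h, double x, double y) {
        double w = h[2][0] * x + h[2][1] * y + h[2][2];
        return new double[] {
            (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w
        };
    }
}
```

For example, with an ROI offset of (100, 200) and a scale of 0.5, a point the ROI homography places at (12, 22) in the resized ROI lands at (124, 244) in the full frame.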
- atrEnabled (default: true), atrTargetDimension (default: 200px), atrMinScaleFactor (default: 0.25, caps at 4× downscale)

New AprilTag Pipeline Settings
- useMLDetection, mlConfidenceThreshold (0.5), mlNmsThreshold (0.45), mlRoiPaddingPixels (40), mlFallbackToTraditional (true), mlModelName, showDetectionBoxes (true)
- atrEnabled (true), atrTargetDimension (200), atrMinScaleFactor (0.25)
- multiTagAmbiguityThreshold (0.2): filters high-ambiguity single-tag poses before multi-tag PNP estimation
- outputShowMultipleTargets with numeric outputMaximumTargets (default: 20, max: 127). Backward-compatible deserialization via @JsonAnySetter migration

Model Management
- apriltagV4-yolo11.rknn (RK3588) and apriltagV4-yolo11.tflite (QCS6490/Rubik Pi 3)
- NeuralNetworkModelManager handles platform-aware model discovery and loading

Frontend / UI
- supportedBackends is non-empty
- PipelineTypes.ts and SettingTypes.ts

Bug Fixes & Improvements
- Frame.java: Added mlDetectionRois field to carry ROI bounding boxes through the pipeline for visualization

Meta
Merge checklist: