This script (normal_angle.py) detects packages in an image, estimates a depth map of the scene, fits planes to identify flat surfaces, and computes picking points with orientation angles for robotic picking.
- Depth Estimation: Uses MiDaS (DPT_Large model) to generate a depth map from the input image, resized to 384 pixels for processing.
- Package Detection: Employs Grounding DINO for zero-shot object detection to identify package-like regions (e.g., boxes, bags) with a confidence threshold of 0.3 and a minimum area of 10,000 pixels. (YOLO and Fast-SAM were also tried for this purpose, but Grounding DINO proved to work well for zero-shot detection, even when packages overlap.)
- Plane Fitting: Applies RANSAC to fit planes to depth points in detected regions. Only flat areas are used, selected as the pixels whose depth-gradient magnitude falls in the bottom 20%, to ensure reliable surfaces.
- Normal and Angle Computation: Calculates the plane’s normal vector and derives the picking angle in the XY plane for each valid surface.
- Visualization: Saves intermediate outputs (depth_output.jpg, package_regions.jpg) and the final image (output.jpg) with green surface masks and red arrows indicating picking angles. The saved outputs can be viewed in the "output" directory.
Assumptions:
- Input images contain package-like objects (boxes, bags, etc.) with flat surfaces suitable for picking.
- The depth map from MiDaS provides sufficient accuracy for plane fitting.
- Detected regions have enough flat points (≥1,000 inliers, ≥10,000 total mask pixels) for reliable plane fitting.
- The camera’s perspective aligns with typical robotic picking scenarios.
Dependencies:
- Python 3.10
- Packages: numpy==2.1.1, opencv-contrib-python==4.10.0.84, torch, torchvision, transformers==4.44.2, scikit-learn==1.5.2, Pillow==10.4.0
- Models: MiDaS (intel-isl/MiDaS, DPT_Large), Grounding DINO (IDEA-Research/grounding-dino-base)
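
The plane-fitting and angle steps described earlier can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `fit_plane_and_angle`, its parameters, the residual threshold, and the use of scikit-learn's `RANSACRegressor` over a `z = a*x + b*y + c` model are all assumptions. The sign of the normal is also a convention chosen here, and MiDaS depth is relative, so the units are arbitrary.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def fit_plane_and_angle(depth, mask, flat_fraction=0.2,
                        residual_threshold=0.01, seed=0):
    """Fit z = a*x + b*y + c to the flattest masked pixels.

    Returns (unit normal, XY angle in degrees, RANSAC inlier count).
    All names and defaults are hypothetical.
    """
    # Depth-gradient magnitude; np.gradient returns d/drow (y), then d/dcol (x).
    gy, gx = np.gradient(depth.astype(np.float64))
    grad_mag = np.hypot(gx, gy)

    # Keep masked pixels whose gradient magnitude is in the bottom 20%.
    cutoff = np.quantile(grad_mag[mask], flat_fraction)
    ys, xs = np.nonzero(mask & (grad_mag <= cutoff))
    zs = depth[ys, xs]

    # RANSAC over a linear model: depth as a function of pixel coordinates.
    ransac = RANSACRegressor(residual_threshold=residual_threshold,
                             random_state=seed)
    ransac.fit(np.column_stack([xs, ys]), zs)
    a, b = ransac.estimator_.coef_

    # The plane a*x + b*y - z + c = 0 has normal (a, b, -1); the opposite
    # sign is equally valid and would rotate the angle by 180 degrees.
    normal = np.array([a, b, -1.0])
    normal /= np.linalg.norm(normal)

    # Picking angle: direction of the normal's projection onto the XY plane.
    angle_deg = float(np.degrees(np.arctan2(normal[1], normal[0])))
    return normal, angle_deg, int(ransac.inlier_mask_.sum())
```

On a synthetic depth map that rises equally along x and y, this sketch reports an angle of about 45 degrees. The returned inlier count can be compared against the 1,000-inlier minimum listed under the assumptions.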
Limitations:
- Depth Accuracy: MiDaS depth maps may be noisy for low-texture surfaces, affecting plane fitting accuracy.
- Object Detection: Grounding DINO may miss packages with uncommon appearances or low confidence scores.
- Plane Fitting: RANSAC may fail on small or non-flat regions; the 10,000-pixel minimum area may exclude valid smaller packages.
- Computational Cost: Depth estimation and object detection are resource-intensive, which limits real-time use, although it remains possible.
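
Several of these limitations come down to fixed thresholds. A small sketch of gathering the values quoted in this document (0.3 confidence, 10,000-pixel minimum area, 1,000 inliers) into one place, so they can be tuned together, might look like this. The class and function names are hypothetical, and measuring area on the bounding box rather than on the mask is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickingThresholds:
    """Values quoted in this document; the field names are hypothetical."""
    min_score: float = 0.3      # detection confidence threshold
    min_area_px: int = 10_000   # minimum region area in pixels
    min_inliers: int = 1_000    # minimum RANSAC inliers for a trusted plane


def keep_detection(box, score, t=PickingThresholds()):
    """Return True if an (x0, y0, x1, y1) box passes the score and area checks."""
    x0, y0, x1, y1 = box
    # Area is taken from the bounding box here (an assumption).
    area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    return score >= t.min_score and area >= t.min_area_px


def keep_plane(n_inliers, t=PickingThresholds()):
    """Return True if a fitted plane has enough inliers to be trusted."""
    return n_inliers >= t.min_inliers
```

For example, `keep_detection(box, score, PickingThresholds(min_area_px=2_500))` would admit smaller packages, at the cost of fitting planes to fewer points.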
Potential Improvements:
- Use stereo vision or LiDAR for more accurate depth maps.
- Fine-tune Grounding DINO for specific package types or integrate traditional detection (e.g., YOLO).
- Adjust RANSAC parameters (e.g., lower inlier threshold) or use alternative plane-fitting methods for smaller regions.
- Optimize for speed using lighter models (e.g., MiDaS Small) or parallel processing.
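
As one possible alternative plane-fitting method for smaller regions, a direct least-squares fit via SVD needs only three or more points. This is a swapped-in technique for illustration, not something the script is stated to use, and the function name `fit_plane_svd` is hypothetical. Unlike RANSAC it does no outlier rejection, so it is only a reasonable choice when the points have already been filtered for flatness.

```python
import numpy as np


def fit_plane_svd(points):
    """Total-least-squares plane through an (N, 3) array of (x, y, z) points.

    Returns (centroid, unit normal).
    """
    points = np.asarray(points, dtype=np.float64)
    if points.ndim != 2 or points.shape[1] != 3 or points.shape[0] < 3:
        raise ValueError("need an (N, 3) array with at least 3 points")

    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, which is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]

    # Fix the sign so results are comparable between runs
    # (negative z component is a convention assumed here).
    if normal[2] > 0:
        normal = -normal
    return centroid, normal
```

The picking angle can then be derived from the normal's X and Y components in the same way as for a RANSAC-fitted plane.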
