Package Detection and Normal Vector Estimation

This script (normal_angle.py) detects packages in an image, estimates a depth map of the scene, fits planes to identify flat surfaces, and computes picking points with orientation angles for robotic picking.

Approach

Depth Estimation: Uses MiDaS (DPT_Large model) to generate a depth map from the input image, resized to 384 pixels for processing.
Package Detection: Uses Grounding DINO for zero-shot object detection to identify package-like regions (e.g., boxes, bags), keeping detections with a confidence of at least 0.3 and an area of at least 10,000 pixels. YOLO and Fast-SAM were also tried for this step; Grounding DINO was chosen because it handles zero-shot detection well, even when packages overlap.
Plane Fitting: Applies RANSAC to fit planes to the depth points in each detected region. Points are restricted to flat areas, taken as the pixels whose depth-gradient magnitude falls in the bottom 20%, so that planes are fitted only to reliable surfaces.
Normal and Angle Computation: Calculates the plane’s normal vector and derives the picking angle in the XY plane for each valid surface.
Visualization: Saves intermediate outputs (depth_output.jpg, package_regions.jpg) and the final image (output.jpg) with green surface masks and red arrows indicating picking angles. The saved output can be viewed in the "output" directory.
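
The plane-fitting and angle steps above can be sketched as follows. This is a minimal NumPy-only illustration, not the code in normal_angle.py. The function names (flat_mask, fit_plane_ransac, picking_angle_deg), the tolerance, the iteration count, and the choice to treat each pixel as a 3D point (x, y, depth) are all assumptions, and the script itself may use scikit-learn's RANSAC instead.

```python
import numpy as np


def flat_mask(depth, region_mask, keep_fraction=0.2):
    """Keep region pixels whose depth-gradient magnitude is in the lowest keep_fraction.

    Assumption: "bottom 20% magnitude" is read as a quantile threshold per region.
    """
    gy, gx = np.gradient(depth.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    threshold = np.quantile(magnitude[region_mask], keep_fraction)
    return region_mask & (magnitude <= threshold)


def fit_plane_ransac(points, n_iters=200, tol=0.01, seed=0):
    """Fit a plane normal . p + d = 0 to an (N, 3) array of points.

    Returns (unit normal, d, boolean inlier mask). The normal is None if
    every sampled triple was degenerate.
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_normal, best_d = None, None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # collinear sample, no unique plane
            continue
        normal = normal / norm
        if normal[2] < 0:  # assumed convention: normal has non-negative z
            normal = -normal
        d = -float(normal @ p0)
        inliers = np.abs(points @ normal + d) < tol
        if inliers.sum() > best_inliers.sum():
            best_normal, best_d, best_inliers = normal, d, inliers
    return best_normal, best_d, best_inliers


def picking_angle_deg(normal):
    """Angle of the normal's projection onto the XY plane, in degrees."""
    return float(np.degrees(np.arctan2(normal[1], normal[0])))
```

In this sketch the points passed to fit_plane_ransac would be the (x, y, depth) triples of the pixels selected by flat_mask. Because MiDaS depth is relative rather than metric, a tolerance such as tol would need tuning to the depth scale actually produced.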

Assumptions

Input images contain package-like objects (boxes, bags, etc.) with flat surfaces suitable for picking.
The depth map from MiDaS provides sufficient accuracy for plane fitting.
Detected regions have enough flat points (≥1,000 inliers, ≥10,000 total mask pixels) for reliable plane fitting.
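
The two thresholds in the last-but-one assumption can be expressed as a simple acceptance check. This is a hypothetical helper written for illustration; the constant and function names are not taken from normal_angle.py, only the threshold values are.

```python
MIN_MASK_PIXELS = 10_000  # minimum total pixels in a detected region's mask
MIN_INLIERS = 1_000       # minimum RANSAC inliers for a plane to be trusted


def region_is_usable(mask_pixel_count: int, inlier_count: int) -> bool:
    """Return True when a region meets both size thresholds stated above."""
    return mask_pixel_count >= MIN_MASK_PIXELS and inlier_count >= MIN_INLIERS
```

A region that fails either check would simply be skipped rather than given a picking point.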
The camera’s perspective aligns with typical robotic picking scenarios.

Libraries Used

Python 3.10
numpy==2.1.1, opencv-contrib-python==4.10.0.84, torch, torchvision, transformers==4.44.2, scikit-learn==1.5.2, Pillow==10.4.0
Models: MiDaS (intel-isl/MiDaS, DPT_Large), Grounding DINO (IDEA-Research/grounding-dino-base)

Limitations and Potential Improvements

Limitations:
- Depth Accuracy: MiDaS depth maps may be noisy for low-texture surfaces, affecting plane fitting accuracy.
- Object Detection: Grounding DINO may miss packages with uncommon appearances or low confidence scores.
- Plane Fitting: RANSAC may fail on small or non-flat regions; the 10,000-pixel minimum area may exclude valid smaller packages.
- Computational Cost: Depth estimation and object detection are resource-intensive, which makes real-time use difficult, though not impossible.
Potential Improvements:
- Use stereo vision or LiDAR for more accurate depth maps.
- Fine-tune Grounding DINO for specific package types or integrate traditional detection (e.g., YOLO).
- Adjust RANSAC parameters (e.g., lower inlier threshold) or use alternative plane-fitting methods for smaller regions.
- Optimize for speed using lighter models (e.g., MiDaS Small) or parallel processing.
