bratjay01/RobotPicking

Package Detection and Normal Vector Estimation

This script (normal_angle.py) detects packages in an image, estimates their depth map, fits planes to identify flat surfaces, and computes picking points with orientation angles for robotic picking.

Approach

  • Depth Estimation: Uses MiDaS (DPT_Large model) to generate a depth map from the input image, resized to 384 pixels for processing.
  • Package Detection: Employs Grounding DINO for zero-shot object detection to identify package-like regions (e.g., boxes, bags), keeping detections with a confidence of at least 0.3 and an area of at least 10,000 pixels. YOLO and Fast-SAM were also tried for this step, but Grounding DINO performed better for zero-shot detection, including on overlapping packages.
  • Plane Fitting: Applies RANSAC to fit planes to depth points in detected regions. To ensure reliable surfaces, only flat areas are used: pixels whose depth-gradient magnitude falls in the lowest 20%.
  • Normal and Angle Computation: Calculates the plane’s normal vector and derives the picking angle in the XY plane for each valid surface.
  • Visualization: Saves intermediate outputs (depth_output.jpg, package_regions.jpg) and the final image (output.jpg), which overlays green surface masks and red arrows indicating picking angles. The saved files are written to the "output" directory.
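
The plane-fitting and angle steps above can be illustrated with a minimal sketch. It is not the code in normal_angle.py: it uses a hand-written NumPy RANSAC rather than any particular library routine, and the function names, the iteration count, the inlier tolerance, and the definition of the picking angle as atan2(n_y, n_x) of the plane normal are all assumptions made for illustration.

```python
import numpy as np


def flat_pixel_mask(depth, region_mask, keep_fraction=0.2):
    """Keep region pixels whose depth-gradient magnitude is in the lowest keep_fraction."""
    gy, gx = np.gradient(depth.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    cutoff = np.quantile(magnitude[region_mask], keep_fraction)
    return region_mask & (magnitude <= cutoff)


def fit_plane_ransac(points, n_iters=200, tol=0.01, seed=0):
    """Fit a plane n.p + d = 0 to (N, 3) points.

    Returns (unit normal, d, boolean inlier mask). n_iters and tol are
    illustrative defaults, not values taken from the script.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # the three samples are collinear, try again
            continue
        normal = normal / norm
        inliers = np.abs((points - p0) @ normal) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found no usable plane")
    # Refine the normal with a least-squares fit (SVD) over all inliers.
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:  # assumed convention: normal points toward +z
        normal = -normal
    return normal, float(-normal @ centroid), best_inliers


def picking_angle_deg(normal):
    """Angle of the normal's projection onto the XY plane, in degrees (assumed definition)."""
    return float(np.degrees(np.arctan2(normal[1], normal[0])))
```

In this sketch the points passed to fit_plane_ransac would be (x, y, depth) triples taken from the pixels that flat_pixel_mask keeps inside one detected region. The tolerance has to be chosen in the units of the depth map, which for MiDaS is relative rather than metric.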

Assumptions

  • Input images contain package-like objects (boxes, bags, etc.) with flat surfaces suitable for picking.
  • The depth map from MiDaS provides sufficient accuracy for plane fitting.
  • Detected regions have enough flat points (≥1,000 inliers, ≥10,000 total mask pixels) for reliable plane fitting.
  • The camera’s perspective aligns with typical robotic picking scenarios.
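
The size thresholds in the third assumption amount to a simple check per region. The sketch below shows one way to express it; the constant and function names are hypothetical and not taken from normal_angle.py.

```python
import numpy as np

MIN_MASK_PIXELS = 10_000  # minimum number of pixels in the region mask
MIN_INLIERS = 1_000       # minimum number of RANSAC inliers on the fitted plane


def region_is_usable(region_mask, inlier_mask):
    """Return True if a region is large enough and flat enough to fit a plane reliably."""
    mask_pixels = int(np.count_nonzero(region_mask))
    inlier_count = int(np.count_nonzero(inlier_mask))
    return mask_pixels >= MIN_MASK_PIXELS and inlier_count >= MIN_INLIERS
```

A region that fails either test would be skipped rather than given a picking point.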

Libraries Used

  • Python 3.10
  • numpy==2.1.1, opencv-contrib-python==4.10.0.84, torch, torchvision, transformers==4.44.2, scikit-learn==1.5.2, Pillow==10.4.0
  • Models: MiDaS (intel-isl/MiDaS, DPT_Large), Grounding DINO (IDEA-Research/grounding-dino-base)
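
The pinned versions above could be captured in a requirements file along these lines. The file name requirements.txt is an assumption; torch and torchvision are left unpinned because no versions are listed for them.

```text
numpy==2.1.1
opencv-contrib-python==4.10.0.84
torch
torchvision
transformers==4.44.2
scikit-learn==1.5.2
Pillow==10.4.0
```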

Limitations and Potential Improvements

  • Limitations:

    • Depth Accuracy: MiDaS depth maps may be noisy for low-texture surfaces, affecting plane fitting accuracy.
    • Object Detection: Grounding DINO may miss packages with uncommon appearances or low confidence scores.
    • Plane Fitting: RANSAC may fail on small or non-flat regions; the 10,000-pixel minimum area may exclude valid smaller packages.
    • Computational Cost: Depth estimation and object detection are resource-intensive, which makes real-time use difficult, though not impossible.
  • Potential Improvements:

    • Use stereo vision or LiDAR for more accurate depth maps.
    • Fine-tune Grounding DINO for specific package types or integrate traditional detection (e.g., YOLO).
    • Adjust RANSAC parameters (e.g., lower inlier threshold) or use alternative plane-fitting methods for smaller regions.
    • Optimize for speed using lighter models (e.g., MiDaS Small) or parallel processing.

Sample Output
