This repository contains the code for the working paper Collision avoidance from monocular vision trained with novel view synthesis, available on HAL https://hal.science/hal-05005146.
First, clone the repository recursively to ensure all submodules are included:
git clone https://github.com/Tordjx/Collision-avoidance-2DGS.git --recursive

Add the following line to third_party/2d-gaussian-splatting/submodules/simple-knn/simple_knn.cu (source: PyTorch Audio PR #3811):
#include <float.h>

Apply a similar patch to `third_party/diff-gaussian-rasterization/cuda_rasterizer/rasterizer_impl.h`:
#include <cstdint>

Now, create the vision_agent conda environment:
conda env create -f environment.yaml
conda activate vision_agent

To ensure high-quality image captures for accurate 3D reconstruction, follow these camera settings:
- Shutter Speed: Set the shutter speed to at least 1/125s to avoid motion blur. This is crucial for maintaining clear images, particularly when capturing in dynamic environments.
- ISO & Aperture: Adjust the ISO and aperture to avoid underexposure, especially in indoor settings. A larger aperture (lower f-number) allows more light, while a higher ISO setting compensates for the darker environment.
- Camera Type: We used a GoPro camera with a wide 16 mm lens, set to auto-focus.
- Video Settings: We record in 4K resolution at 60 fps to maximize the number of keyframes extracted from the footage.
- Camera Motion: Induce maximum parallax by moving around the objects you want to capture. This enhances depth information in the scene, which is critical for accurate 3D reconstruction.
- Capture Duration: Capture a video for at least 5 minutes for each scene, ensuring extensive coverage of the environment.
- File Compression: Because of video compression, most frames are inter-frames (encoded relative to neighboring frames) and are lower quality than keyframes. Extract only the keyframes with FFmpeg (see the example after this list) to ensure the highest-quality images are used for reconstruction.
- Manual Check: After extracting the keyframes, manually review and remove any images with excessive motion blur. This typically results in about 500 usable images per scene.
- Image Coverage: Ensure that the images cover the scene extensively from different angles and viewpoints. The more diverse the captures, the better the resulting mesh and point cloud will be.
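For the keyframe extraction step above, a minimal FFmpeg invocation could look like the following; the output pattern and image format are placeholders, not settings from the paper:

# decode only intra-coded frames (keyframes) and write each one as an image
mkdir -p keyframes
ffmpeg -skip_frame nokey -i <path to your video> -vsync vfr keyframes/%06d.jpg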
Make sure to also capture at least 3 images at known relative positions. They will allow you to perform geo-registration in COLMAP afterwards, aligning the z-axis with gravity and scaling your frame of reference correctly. You can use a room corner or a table to do so.
With your images in hand, we can use COLMAP to infer the camera poses and intrinsics.
cd third_party/2d_gaussian_splatting
python convert.py -s <path to your images folder>

Now you can perform geo-registration on your dataset.
First, create a text file geo-registration.txt like so:
image_name1.jpg X1 Y1 Z1
image_name2.jpg X2 Y2 Z2
image_name3.jpg X3 Y3 Z3
...
Then:
colmap model_aligner \
--input_path ./sparse/0 \
--output_path ./name_of_output_directory \
--ref_images_path ./geo-registration.txt \
--ref_is_gps 0 \
--alignment_type custom \
--alignment_max_error 3.0

where sparse/0 is the path to your model directory. In the ideal case, there will be a single distorted/sparse/0 directory and a single sparse/0 output directory, in which case your model path is the latter. If there are several directories in distorted/sparse, pick the largest one and rename it to 0, then re-run python convert.py -s your/scene/path --skip_matching to produce a new sparse/0 output directory.
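For reference, that multi-model case could be handled as follows, assuming the largest reconstruction ended up in distorted/sparse/1 (directory names are illustrative):

# keep the largest COLMAP model and make it the one convert.py picks up
mv <path to your scene>/distorted/sparse/0 <path to your scene>/distorted/sparse/0_partial
mv <path to your scene>/distorted/sparse/1 <path to your scene>/distorted/sparse/0
python convert.py -s <path to your scene> --skip_matching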
Train the Gaussian Splatting model and render the mesh:
cd third_party/2d_gaussian_splatting
python train.py -s <path to COLMAP or NeRF Synthetic dataset>
python render.py -m <path to trained model> --skip_train --skip_test

If you encounter an incomplete mesh, you may need to adjust the --sdf_trunc or --depth_trunc parameters.
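As an illustration, an adjusted mesh extraction might look like the command below; the truncation values are only a starting point, not settings from the paper:

python render.py -m <path to trained model> --skip_train --skip_test --sdf_trunc 0.05 --depth_trunc 6.0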
Next, decompose the mesh into convex subparts:
python coacd.py --mesh <path to your mesh>

You may need to tweak the --preprocess_resolution and --threshold parameters.
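For example, a finer decomposition could be attempted with something like the following; the values are illustrative only:

# a higher resolution preserves more surface detail, a lower threshold yields more convex parts
python coacd.py --mesh <path to your mesh> --preprocess_resolution 100 --threshold 0.03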
Afterwards, open the mesh in Blender, remove the ground and any artifacts, and save the processed mesh as manual_postprocess.obj.
Finally, generate a URDF to load the mesh in Pinocchio:
cd third_party/obj2urdf
python obj2urdf.py <path to the file.obj>

Copy manual_postprocess.obj, manual_postprocess.urdf, and point_cloud.ply to the data folder.
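Assuming the data folder sits at the repository root and you are in the directory containing the processed files, the copy step could be:

cp manual_postprocess.obj manual_postprocess.urdf point_cloud.ply <path to repository>/data/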
Collect some RGB and depth images to train the visual encoder:
python make_dataset.py

By default, the dataset will have 60,000 samples, but you can adjust this with the --len_dataset parameter.
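For a quick sanity check, a smaller dataset can be generated; the sample count below is arbitrary:

python make_dataset.py --len_dataset 5000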
Train the vision encoder:
python autoencoder.py

This script will also visualize the image, depth reconstruction, and depth ground truth at the end of training. Use the --skip_train argument to skip training and only view the visualizations. You can also adjust the batch size and number of epochs with the --batch_size and --epochs parameters.
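For example, an explicit training run followed by a visualization-only pass could look like this; the hyperparameter values are illustrative, not the settings used in the paper:

python autoencoder.py --batch_size 64 --epochs 20
python autoencoder.py --skip_train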
Now, you’re ready to train your navigation policy:
python train_policy.py

You can adjust the number of training steps with the --training_steps parameter.
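For instance, a short debugging run could be launched with the command below (the step count is arbitrary):

python train_policy.py --training_steps 100000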
To test the navigation policy:
python test_nav_policy.py

This will open a window showing the agent’s behavior when instructed to go full throttle forward.

To run the agent in simulation:
- In one terminal, start the Upkie simulation:
git clone https://github.com/upkie/upkie.git
cd upkie
git checkout 541b8ed686508c159a643f8c22316627a96f71ef
./start_simulation.sh

- In another terminal, run the agent:
python run.py

This will open a window to visualize the FPV of the robot. With a joystick connected, your policy will correct your joystick inputs to avoid collisions.

To run on the real robot:
- In one terminal, reset the Upkie system:
upkie_tool rezero
make run_pi3hat_spine

- In another terminal, run the agent:
python run.py