🐃Niagara
Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View TCSVT 2025
Xianzu Wu · Zhenxin Ai · Harry Yang · Sernam Lim · Jun Liu · Huan Wang
- 📢 18.03.2025: release code and paper.
- Release Complete Checkpoint.
| Method | PSNR (5f) | SSIM (5f) | LPIPS (5f) | PSNR (10f) | SSIM (10f) | LPIPS (10f) | PSNR (u[-30,30]f) | SSIM (u[-30,30]f) | LPIPS (u[-30,30]f) |
|---|---|---|---|---|---|---|---|---|---|
| Syn-Sin | - | - | - | - | - | - | 22.30 | 0.740 | - |
| SV-MPI | 27.10 | 0.870 | - | 24.40 | 0.812 | - | 23.52 | 0.785 | - |
| BTS | - | - | - | - | - | - | 24.00 | 0.755 | 0.194 |
| Splatter Image | 28.15 | 0.894 | 0.110 | 25.34 | 0.842 | 0.144 | 24.15 | 0.810 | 0.177 |
| MINE | 28.45 | 0.897 | 0.111 | 25.89 | 0.850 | 0.150 | 24.75 | 0.820 | 0.179 |
| Flash3D | 28.46 | 0.899 | 0.100 | 25.94 | 0.857 | 0.133 | 24.93 | 0.833 | 0.160 |
| Ours | 29.00 | 0.904 | 0.099 | 26.30 | 0.862 | 0.131 | 25.28 | 0.836 | 0.156 |
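To make the margins in the table concrete, the short sketch below recomputes the PSNR gain of the "Ours" row over Flash3D for each evaluation setting. The numbers are copied from the table above; the dictionary layout and the function name `psnr_gain` are illustrative only and are not part of this repository.

```python
# PSNR values (dB, higher is better) copied from the table above.
PSNR = {
    "Flash3D": {"5f": 28.46, "10f": 25.94, "u[-30,30]f": 24.93},
    "Ours": {"5f": 29.00, "10f": 26.30, "u[-30,30]f": 25.28},
}


def psnr_gain(ours: dict, baseline: dict) -> dict:
    """Per-setting PSNR difference in dB, rounded to two decimals."""
    return {setting: round(ours[setting] - baseline[setting], 2) for setting in ours}


gains = psnr_gain(PSNR["Ours"], PSNR["Flash3D"])
print(gains)
```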
Niagara has been trained and tested with the following software versions:
- Python 3.10
- PyTorch 2.2.2
- CUDA 11.8
- GCC 11.2 (or more recent)
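A quick way to compare an existing environment against the versions listed above is sketched below. It is a minimal check, not part of this repository: the names `EXPECTED`, `mismatches`, and `detect` are hypothetical, GCC is not checked, and prefix matching is an assumption so that a local build tag such as `2.2.2+cu118` still counts as PyTorch 2.2.2.

```python
import sys

# Versions listed above (GCC is left out of this sketch).
EXPECTED = {"python": "3.10", "torch": "2.2.2", "cuda": "11.8"}


def mismatches(found: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (found, expected)} for every entry that does not match."""
    bad = {}
    for name, want in expected.items():
        have = found.get(name)
        ok = have is not None and (
            have == want
            or have.startswith(want + ".")
            or have.startswith(want + "+")
        )
        if not ok:
            bad[name] = (have, want)
    return bad


def detect() -> dict:
    """Collect versions from the running interpreter; torch is optional."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    return found
```

Calling `mismatches(detect())` returns an empty dictionary when the interpreter, PyTorch, and the CUDA version PyTorch was built against all match the list.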
Begin by installing CUDA 11.8 and adding the directory containing the nvcc compiler to the PATH environment variable.
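To confirm that the right compiler is visible, the sketch below looks up `nvcc` on PATH and extracts the release number from its `--version` output. The helper names are hypothetical, and the parser assumes the usual `release X.Y` wording that nvcc prints.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_output: str):
    """Extract the 'release X.Y' number from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def nvcc_release():
    """Return the CUDA release reported by the nvcc found on PATH, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)
```

If `nvcc_release()` returns `None` or something other than `"11.8"`, the PATH entry is likely missing or points at a different CUDA installation.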
Then the Python environment can be created either via conda:
conda create -y python=3.10 -n niagara
conda activate niagara
or using Python's venv module (assuming you already have access to Python 3.10 on your system):
python3.10 -m venv .venv
. .venv/bin/activate
Finally, install the required packages as follows:
pip install -r requirements-torch.txt --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
git clone -b main https://github.com/eldar/diff-gaussian-rasterization-w-pose
git submodule add -b main https://github.com/eldar/diff-gaussian-rasterization-w-pose third_party/diff-gaussian-rasterization-w-pose
git submodule update --init --recursive
To download the RealEstate10K dataset, we base our instructions on the Behind The Scenes scripts.
First you need to download the video sequence metadata including camera poses from https://google.github.io/realestate10k/download.html and unpack it into data/ such that the folder layout is as follows:
data/RealEstate10K/train
data/RealEstate10K/test
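Before starting the long download, it can help to verify that the metadata was unpacked into the layout shown above. The sketch below is a minimal check under that assumption; `missing_splits` is a hypothetical helper, not a script from this repository.

```python
from pathlib import Path


def missing_splits(root: str = "data/RealEstate10K", splits=("train", "test")) -> list:
    """Return the split folders that are absent or empty under root."""
    missing = []
    for split in splits:
        folder = Path(root) / split
        # A split counts as missing if the folder does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(str(folder))
    return missing
```

An empty list from `missing_splits()` means both `train` and `test` exist and contain at least one entry.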
Next, download the training and test sets of the dataset with the following commands:
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m train
python datasets/download_realestate10k.py -d data/RealEstate10K -o data/RealEstate10K -m test
This step will take several days to complete. Finally, download additional data for the RealEstate10K dataset. In particular, we provide a pre-processed COLMAP cache containing sparse point clouds, which are used to estimate the scaling factor for depth predictions. The last two commands remove any missing video sequences from the training and test sets.
sh datasets/dowload_realestate10k_colmap.sh
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s train
python -m datasets.preprocess_realestate10k -d data/RealEstate10K -s test
To download the KITTI dataset, we base our instructions on the versatran01 scripts.
cd kitti_raw
wget -nc -i kitti_archives.txt
This step will take some time to complete. Finally, extract the downloaded KITTI archives:
unzip "*drive*.zip" "*/*/image*"
unzip "*drive*.zip" "*/*/oxts*"
unzip "*calib*.zip"
We provide model weights that can be downloaded and evaluated on the RealEstate10K test set:
python -m misc.download_pretrained_models -o exp/re10k_v2
sh evaluate.sh exp/re10k_v2
Hugging Face login (users in mainland China should first run export HF_ENDPOINT=https://hf-mirror.com):
huggingface-cli login
# Then input your huggingface token for authentication
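When the login or a download is driven from a Python script, the mirror endpoint mentioned above can be set through the environment passed to the child process. This is a minimal sketch: `hf_env` is a hypothetical helper, and the only facts taken from the instructions above are the `HF_ENDPOINT` variable and the mirror URL.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror URL from the note above


def hf_env(use_mirror: bool, base_env=None) -> dict:
    """Return a copy of the environment, optionally pointing HF_ENDPOINT at the mirror."""
    env = dict(os.environ if base_env is None else base_env)
    if use_mirror:
        env["HF_ENDPOINT"] = HF_MIRROR
    return env
```

The result can be passed to a child process, for example `subprocess.run(["huggingface-cli", "login"], env=hf_env(True), check=True)`, which leaves the parent shell's environment untouched.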
To train the model on the RealEstate10K dataset, run this command:
python train.py \
+experiment=layered_re10k \
model.depth.version=v1 \
train.logging=false
For multiple GPUs, run this command:
bash train.sh
bash evaluate.sh
@article{wu2025niagara,
author={Wu, Xianzu and Ai, Zhenxin and Yang, Harry and Lim, Sernam and Liu, Jun and Wang, Huan},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View},
year={2025},
doi={10.1109/TCSVT.2025.3643728}
}
A large portion of the code in this repo is based on Flash3D; some of the code is borrowed from:
