Hi,
I tried to run the training pipeline on different Mip-NeRF 360 scenes (kitchen, stump), and I ran into the following error:
```
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
No Nerfstudio checkpoint to load, so training from scratch.
Disabled comet/tensorboard/wandb event writers
(viser) Connection opened (0, 1 total), 378 persistent messages
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [15,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [15,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [13,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [13,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [5,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [5,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0152
VanillaPipeline.get_train_loss_dict: 0.0147
Traceback (most recent call last):
File "/home/agenuinedream/lib/anaconda3/envs/pixie/bin/ns-train", line 7, in <module>
sys.exit(entrypoint())
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
main(
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 247, in main
launch(
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 189, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
trainer.train()
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/engine/trainer.py", line 266, in train
loss, loss_dict, metrics_dict = self.train_iteration(step)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
out = func(*args, **kwargs)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/engine/trainer.py", line 502, in train_iteration
_, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
out = func(*args, **kwargs)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 299, in get_train_loss_dict
ray_bundle, batch = self.datamanager.next_train(step)
File "/home/agenuinedream/repo/pixie/third_party/f3rm/f3rm/feature_datamanager.py", line 108, in next_train
ray_bundle, batch = super().next_train(step)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line 538, in next_train
ray_bundle = self.train_ray_generator(ray_indices)
File "/home/agenuinedream/lib/anaconda3/envs/pixie/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/agenuinedream/lib/anaconda3/envs/pixie/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/model_components/ray_generators.py", line 52, in forward
ray_bundle = self.cameras.generate_rays(
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/cameras/cameras.py", line 423, in generate_rays
if cameras.is_jagged and coords is None and (keep_shape is None or keep_shape is False):
File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/cameras/cameras.py", line 289, in is_jagged
h_jagged = not torch.all(self.height == self.height.view(-1)[0])
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR [RUN] Stopping pipeline at step: TRAIN_F3RM_RERUN
```
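Since the CUDA assert is reported asynchronously, the traceback above may not point at the true failure site; rerunning with `CUDA_LAUNCH_BLOCKING=1` should surface the exact call. Judging from the `IndexKernel.cu` assertion, my guess is that some sampled camera or pixel indices fall outside the camera tensors, possibly because the Mip-NeRF 360 images come at mixed resolutions. Below is a minimal sanity check I'd run before ray generation (a sketch only: the `(camera, row, col)` layout follows nerfstudio's pixel sampler, and `check_ray_indices` is my own helper, not part of the pipeline):

```python
import torch
from nerfstudio.cameras.cameras import Cameras


def check_ray_indices(ray_indices: torch.Tensor, cameras: Cameras) -> None:
    """Verify (camera, row, col) ray indices are in bounds for every camera."""
    cam_idx, rows, cols = ray_indices.long().unbind(-1)
    num_cams = cameras.height.view(-1).shape[0]
    # The camera index must address an existing camera.
    assert bool((cam_idx >= 0).all() and (cam_idx < num_cams).all()), (
        f"camera index out of range: max {cam_idx.max().item()} vs {num_cams} cameras"
    )
    # Pixel coordinates must lie inside each sampled camera's image plane.
    heights = cameras.height.view(-1)[cam_idx]
    widths = cameras.width.view(-1)[cam_idx]
    assert bool(((rows >= 0) & (rows < heights)).all()), "row index out of bounds"
    assert bool(((cols >= 0) & (cols < widths)).all()), "col index out of bounds"
```

Calling this (on CPU copies of the tensors, since a triggered device-side assert poisons the CUDA context) right before `self.train_ray_generator(ray_indices)` in `next_train` should tell whether the offending index is the camera id or a pixel coordinate.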
May I ask:
- Does this pipeline generalize to other multi-view image datasets with COLMAP cameras?
- If so, would you mind sharing how to resolve this error?
Thank you!
Leo