
Training on other MipNeRF datasets (e.g., kitchen) #8

@LeoHsuProgrammingLab

Hi,

I tried to run the training pipeline on different MipNeRF datasets (kitchen, stump) and ran into the following error:

[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
No Nerfstudio checkpoint to load, so training from scratch.
Disabled comet/tensorboard/wandb event writers
(viser) Connection opened (0, 1 total), 378 persistent messages
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [15,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [15,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [13,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [13,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [5,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [5,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0152              
VanillaPipeline.get_train_loss_dict: 0.0147              
Traceback (most recent call last):
  File "/home/agenuinedream/lib/anaconda3/envs/pixie/bin/ns-train", line 7, in <module>
    sys.exit(entrypoint())
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/engine/trainer.py", line 266, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/engine/trainer.py", line 502, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 299, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/home/agenuinedream/repo/pixie/third_party/f3rm/f3rm/feature_datamanager.py", line 108, in next_train
    ray_bundle, batch = super().next_train(step)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line 538, in next_train
    ray_bundle = self.train_ray_generator(ray_indices)
  File "/home/agenuinedream/lib/anaconda3/envs/pixie/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/agenuinedream/lib/anaconda3/envs/pixie/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/model_components/ray_generators.py", line 52, in forward
    ray_bundle = self.cameras.generate_rays(
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/cameras/cameras.py", line 423, in generate_rays
    if cameras.is_jagged and coords is None and (keep_shape is None or keep_shape is False):
  File "/home/agenuinedream/repo/pixie/third_party/nerfstudio/nerfstudio/cameras/cameras.py", line 289, in is_jagged
    h_jagged = not torch.all(self.height == self.height.view(-1)[0])
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ERROR    [RUN] Stopping pipeline at step: TRAIN_F3RM_RERUN
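
Since CUDA reports device-side asserts asynchronously, the line flagged in the traceback (the is_jagged height comparison) may not be where the bad index actually originates; re-running with CUDA_LAUNCH_BLOCKING=1 ns-train ... gives a synchronous stack trace. My guess (unverified) is that some sampled ray indices fall outside their camera's image bounds, so I used a sketch like the following to inspect them on the CPU:

```python
import torch


def check_ray_indices(ray_indices: torch.Tensor, cameras) -> None:
    """Hypothetical diagnostic: validate sampled (camera, row, col) indices on CPU.

    ray_indices is the (N, 3) tensor the nerfstudio pixel sampler hands to the
    ray generator; cameras is the nerfstudio Cameras object, whose height/width
    tensors have shape (num_cameras, 1).
    """
    ray_indices = ray_indices.cpu()
    cam, row, col = ray_indices.unbind(-1)
    n_cams = cameras.height.shape[0]
    # A camera index past the end would trip the IndexKernel assert above.
    if cam.min() < 0 or cam.max() >= n_cams:
        print(f"camera index out of range: max={int(cam.max())}, n_cams={n_cams}")
    heights = cameras.height.cpu().view(-1)[cam]
    widths = cameras.width.cpu().view(-1)[cam]
    bad = (row >= heights) | (col >= widths)
    if bad.any():
        print("out-of-bounds pixel coords for cameras:", cam[bad].unique().tolist())
```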

May I ask:

  1. Does this pipeline generalize to other multi-view datasets with COLMAP cameras?
  2. If so, would you mind sharing how to fix this bug? (A sanity check is sketched below.)
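
For context: MipNeRF-360 scenes ship images at several downsample factors (images, images_2, images_4, ...), so one guess is that the dataparser's camera heights/widths disagree with the images actually loaded, which would push sampled pixel coordinates out of bounds. A quick check (hypothetical paths and file-ordering assumptions) that on-disk image sizes match the camera tensors:

```python
from pathlib import Path

from PIL import Image


def check_image_sizes(image_dir: str, cameras) -> None:
    """Compare on-disk image sizes against the dataparser's camera tensors.

    image_dir and the sorted-filename ordering are assumptions; cameras is the
    nerfstudio Cameras object for the train split, in the same order as the files.
    """
    paths = sorted(p for p in Path(image_dir).iterdir() if p.is_file())
    for i, path in enumerate(paths):
        w, h = Image.open(path).size  # PIL reports (width, height)
        ch, cw = int(cameras.height[i]), int(cameras.width[i])
        if (h, w) != (ch, cw):
            print(f"{path.name}: on disk {h}x{w}, camera says {ch}x{cw}")
```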

Thank you!
Leo
