Hi, I have tried to run the IDU code on the JAX dataset, but I am running into this error in `idu_refine.py` in the FlowEdit code:
```
python train.py -s ./data/datasets_JAX/JAX_068/ -m ./outputs/JAX_idu/JAX_068 --start_checkpoint ./outputs/JAX/JAX_068/chkpnt30000.pth --iterative_datasets_update --eval --port 6209 --kernel_size 0.1 --resolution 1 --sh_degree 1 --appearance_enabled --lambda_depth 0 --lambda_opacity 0 --idu_opacity_reset_interval 5000 --idu_refine --idu_num_samples_per_view 2 --densify_grad_threshold 0.0002 --idu_num_cams 6 --idu_use_flow_edit --idu_render_size 1024 --idu_flow_edit_n_min 4 --idu_flow_edit_n_max 10 --idu_grid_size 3 --idu_grid_width 512 --idu_grid_height 512 --idu_episode_iterations 10000 --idu_iter_full_train 0 --idu_opacity_cooling_iterations 500 --lambda_pseudo_depth 0.5 --idu_densify_until_iter 9000 --idu_train_ratio 0.75
Optimizing ./outputs/JAX_idu/JAX_068
===== IDU Params ===== [14/11 19:39:19]
Datasets Type: jax_v1 [14/11 19:39:19]
Radius List: [300.0, 275.0, 275.0, 250.0, 250.0] [14/11 19:39:19]
Elevation List: [85.0, 75.0, 65.0, 55.0, 45.0] [14/11 19:39:19]
FOV: 60.0 [14/11 19:39:19]
====================== [14/11 19:39:19]
Training IDU episode with elevation 85.0 and radius 300.0 [14/11 19:39:19]
of IDU targets: 9 [14/11 19:39:19]
Loading model from checkpoint ./outputs/JAX/JAX_068/chkpnt30000.pth [14/11 19:39:19]
self._features_dc.shape torch.Size([1188016, 1, 3]) [14/11 19:39:19]
self._features_rest.shape torch.Size([1188016, 3, 3]) [14/11 19:39:19]
self._xyz.shape torch.Size([1188016, 3]) [14/11 19:39:19]
self._embeddings.shape torch.Size([1188016, 24]) [14/11 19:39:19]
./outputs/JAX/JAX_068 [14/11 19:39:19]
Loading trained model at iteration 30000 [14/11 19:39:19]
Found transforms_train.json and points3D.txt files, assuming multi-scale Satellite data set! [14/11 19:39:19]
Reading Training Transforms [14/11 19:39:19]
Reading Test Transforms [14/11 19:39:19]
Number of training images: 17 [14/11 19:39:19]
Number of testing images: 2 [14/11 19:39:19]
Converting point3D.txt to .ply, will happen only the first time you open the scene. [14/11 19:39:19]
No rotation matrix found, skipping normalization [14/11 19:39:19]
Nerf Normalization: {'translate': array([0., 0., 0.]), 'radius': 128.0} [14/11 19:39:19]
Loading point cloud from /home/jovyan/Skyfall-GS/data/datasets_JAX/JAX_068/points3D.ply [14/11 19:39:19]
Number of points in the point cloud: 69027 [14/11 19:39:19]
Loading Training Cameras [14/11 19:39:19]
optimizing: False [14/11 19:39:19]
Loading Test Cameras [14/11 19:39:24]
optimizing: False [14/11 19:39:24]
Loading point cloud from iteration 30000 [14/11 19:39:25]
Generated 108 IDU cameras [14/11 19:39:25]
optimizing: False [14/11 19:39:25]
IDU Rendering progress: 100%|██████████████████████████████████████████████████████████████████| 108/108 [00:02<00:00, 37.80it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.88s/it]
Loading pipeline components...: 29%|█████████████████▋ | 2/7 [00:44<01:59, 23.82s/it]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████| 7/7 [00:47<00:00, 6.72s/it]
Initialized FlowEdit with FLUX model. [14/11 19:41:46]
Refining images using FlowEdit with (min, max, avg) = (4, 10, 1): 0%| | 0/108 [00:00<?, ?it/s]Processing image 1/108 with n_max = 10 [14/11 19:41:46]
Refining images using FlowEdit with (min, max, avg) = (4, 10, 1): 0%| | 0/108 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/jovyan/Skyfall-GS/train.py", line 1139, in
training_idu(lp.extract(args), op.extract(args), pp.extract(args), args.start_checkpoint)
File "/home/jovyan/Skyfall-GS/train.py", line 952, in training_idu
start_checkpoint_path = training_idu_episode(
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/Skyfall-GS/train.py", line 610, in training_idu_episode
idu_cam_list = generate_idu_training_set(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/Skyfall-GS/train.py", line 458, in generate_idu_training_set
final_imgs = refine_pipe.run(
^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/Skyfall-GS/submodules/FlowEdit/idu_refine.py", line 123, in run
refine_img = self.run_single_image(img, src_prompt, tar_prompt, T_steps, n_avg, src_guidance_scale, tar_guidance_scale, n_min, current_n_max)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovyan/Skyfall-GS/submodules/FlowEdit/idu_refine.py", line 83, in run_single_image
x0_src_denorm = self.pipe.vae.encode(image_src).latent_dist.mode()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 271, in encode
h = self.encoder(x)
^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/diffusers/models/autoencoders/vae.py", line 172, in forward
sample = down_block(sample)
^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1474, in forward
hidden_states = resnet(hidden_states, temb=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/diffusers/models/resnet.py", line 327, in forward
hidden_states = self.norm1(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 287, in forward
return F.group_norm(
^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/functional.py", line 2561, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
```
I have tried forcing every layer of the pipe onto the GPU with `.to("cuda")`, and I also tried fp32 (ran out of RAM). I also tried adding `dtype=torch.float16` to the `with autocast` block, but nothing seems to work.
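To check whether moving the pipe to CUDA actually took effect, a minimal sketch of a device/dtype check is below. `find_misplaced_components` and `cuda_memory_report` are hypothetical helpers (not part of Skyfall-GS or FlowEdit), and the sketch assumes `self.pipe` is a diffusers pipeline whose `components` property returns a name → object dict; only standard PyTorch calls (`torch.cuda.mem_get_info`, `torch.cuda.memory_allocated`) are used.

```python
import torch
from torch import nn


def find_misplaced_components(components, device="cuda", dtype=torch.float16):
    """Return the names of modules whose parameters are not all on `device` with `dtype`.

    `components` is a name -> object mapping (assumed here to be something like a
    diffusers pipeline's `components` dict). Objects that are not nn.Modules, such
    as tokenizers and schedulers, hold no weights and are skipped.
    """
    target_type = torch.device(device).type  # compare "cuda"/"cpu", ignore device index
    misplaced = []
    for name, module in components.items():
        if not isinstance(module, nn.Module):
            continue
        for param in module.parameters():
            if param.device.type != target_type or param.dtype != dtype:
                misplaced.append(name)
                break  # one bad parameter is enough to flag this component
    return misplaced


def cuda_memory_report():
    """Free/total/allocated memory (GiB) of the current CUDA device, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info()  # returns (free_bytes, total_bytes)
    return {
        "free_gib": free / 2**30,
        "total_gib": total / 2**30,
        "allocated_gib": torch.cuda.memory_allocated() / 2**30,
    }
```

For example, printing `find_misplaced_components(self.pipe.components)` and `cuda_memory_report()` right before the `self.pipe.vae.encode(image_src)` call (line 83 of `idu_refine.py` in the traceback) would show whether any component is still on `cpu`, which would match the float16-on-cpu warnings, and how much GPU memory is left when the assert fires.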