FlowEdit: Fp16 error #10

Hi, I have tried to run the IDU code on the JAX dataset, but I am running into this error in `idu_refine.py` in the FlowEdit code:

```shell
python train.py \
    -s ./data/datasets_JAX/JAX_068/ \
    -m ./outputs/JAX_idu/JAX_068 \
    --start_checkpoint ./outputs/JAX/JAX_068/chkpnt30000.pth \
    --iterative_datasets_update \
    --eval \
    --port 6209 \
    --kernel_size 0.1 \
    --resolution 1 \
    --sh_degree 1 \
    --appearance_enabled \
    --lambda_depth 0 \
    --lambda_opacity 0 \
    --idu_opacity_reset_interval 5000 \
    --idu_refine \
    --idu_num_samples_per_view 2 \
    --densify_grad_threshold 0.0002 \
    --idu_num_cams 6 \
    --idu_use_flow_edit \
    --idu_render_size 1024 \
    --idu_flow_edit_n_min 4 \
    --idu_flow_edit_n_max 10 \
    --idu_grid_size 3 \
    --idu_grid_width 512 \
    --idu_grid_height 512 \
    --idu_episode_iterations 10000 \
    --idu_iter_full_train 0 \
    --idu_opacity_cooling_iterations 500 \
    --lambda_pseudo_depth 0.5 \
    --idu_densify_until_iter 9000 \
    --idu_train_ratio 0.75
```

```text
Optimizing ./outputs/JAX_idu/JAX_068
===== IDU Params ===== [14/11 19:39:19]
Datasets Type: jax_v1 [14/11 19:39:19]
Radius List: [300.0, 275.0, 275.0, 250.0, 250.0] [14/11 19:39:19]
Elevation List: [85.0, 75.0, 65.0, 55.0, 45.0] [14/11 19:39:19]
FOV: 60.0 [14/11 19:39:19]
====================== [14/11 19:39:19]
Training IDU episode with elevation 85.0 and radius 300.0 [14/11 19:39:19]
# of IDU targets: 9 [14/11 19:39:19]
Loading model from checkpoint ./outputs/JAX/JAX_068/chkpnt30000.pth [14/11 19:39:19]
self._features_dc.shape torch.Size([1188016, 1, 3]) [14/11 19:39:19]
self._features_rest.shape torch.Size([1188016, 3, 3]) [14/11 19:39:19]
self._xyz.shape torch.Size([1188016, 3]) [14/11 19:39:19]
self._embeddings.shape torch.Size([1188016, 24]) [14/11 19:39:19]
./outputs/JAX/JAX_068 [14/11 19:39:19]
Loading trained model at iteration 30000 [14/11 19:39:19]
Found transforms_train.json and points3D.txt files, assuming multi-scale Satellite data set! [14/11 19:39:19]
Reading Training Transforms [14/11 19:39:19]
Reading Test Transforms [14/11 19:39:19]
Number of training images: 17 [14/11 19:39:19]
Number of testing images: 2 [14/11 19:39:19]
Converting point3D.txt to .ply, will happen only the first time you open the scene. [14/11 19:39:19]
No rotation matrix found, skipping normalization [14/11 19:39:19]
Nerf Normalization: {'translate': array([0., 0., 0.]), 'radius': 128.0} [14/11 19:39:19]
Loading point cloud from /home/jovyan/Skyfall-GS/data/datasets_JAX/JAX_068/points3D.ply [14/11 19:39:19]
Number of points in the point cloud: 69027 [14/11 19:39:19]
Loading Training Cameras [14/11 19:39:19]
optimizing: False [14/11 19:39:19]
Loading Test Cameras [14/11 19:39:24]
optimizing: False [14/11 19:39:24]
Loading point cloud from iteration 30000 [14/11 19:39:25]
Generated 108 IDU cameras [14/11 19:39:25]
optimizing: False [14/11 19:39:25]
IDU Rendering progress: 100%|██████████████████████████████████████████████████████████████████| 108/108 [00:02<00:00, 37.80it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.88s/it]
Loading pipeline components...:  29%|█████████████████▋                                            | 2/7 [00:44<01:59, 23.82s/it]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████| 7/7 [00:47<00:00,  6.72s/it]
Initialized FlowEdit with FLUX model. [14/11 19:41:46]
Refining images using FlowEdit with (min, max, avg) = (4, 10, 1):   0%|                                  | 0/108 [00:00<?, ?it/s]Processing image 1/108 with n_max = 10 [14/11 19:41:46]
Refining images using FlowEdit with (min, max, avg) = (4, 10, 1):   0%|                                  | 0/108 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/jovyan/Skyfall-GS/train.py", line 1139, in <module>
    training_idu(lp.extract(args), op.extract(args), pp.extract(args), args.start_checkpoint)
  File "/home/jovyan/Skyfall-GS/train.py", line 952, in training_idu
    start_checkpoint_path = training_idu_episode(
                            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/Skyfall-GS/train.py", line 610, in training_idu_episode
    idu_cam_list = generate_idu_training_set(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/Skyfall-GS/train.py", line 458, in generate_idu_training_set
    final_imgs = refine_pipe.run(
                 ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/Skyfall-GS/submodules/FlowEdit/idu_refine.py", line 123, in run
    refine_img = self.run_single_image(img, src_prompt, tar_prompt, T_steps, n_avg, src_guidance_scale, tar_guidance_scale, n_min, current_n_max)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/Skyfall-GS/submodules/FlowEdit/idu_refine.py", line 83, in run_single_image
    x0_src_denorm = self.pipe.vae.encode(image_src).latent_dist.mode()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 271, in encode
    h = self.encoder(x)
        ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/autoencoders/vae.py", line 172, in forward
    sample = down_block(sample)
             ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1474, in forward
    hidden_states = resnet(hidden_states, temb=None)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/resnet.py", line 327, in forward
    hidden_states = self.norm1(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 287, in forward
    return F.group_norm(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/functional.py", line 2561, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
```
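The traceback shows the failure inside `self.pipe.vae.encode(image_src)`, in a `group_norm` call of the VAE encoder. To get a feel for how much GPU memory that call needs at `--idu_render_size 1024`, the sketch below estimates the size of a single activation tensor at each encoder stage. It assumes FLUX VAE channel widths of `(128, 256, 512, 512)` and a batch of one image; these are assumptions, not values read from the FlowEdit code. A real forward pass holds several such tensors at once, so the figures are per-tensor lower bounds, not a total.

```python
# Back-of-envelope size of one NCHW activation tensor in the VAE encoder.
# ASSUMPTIONS (not taken from the FlowEdit code): FLUX VAE encoder widths
# (128, 256, 512, 512), each stage except the last halves the spatial size,
# and a single 1024x1024 input image (--idu_render_size 1024).

BYTES_PER_ELEMENT = {"fp16": 2, "bf16": 2, "fp32": 4}


def activation_mib(batch, channels, height, width, dtype="fp16"):
    """Size of one NCHW activation tensor in MiB."""
    n_elements = batch * channels * height * width
    return n_elements * BYTES_PER_ELEMENT[dtype] / 2**20


def encoder_stage_sizes(size=1024, channels=(128, 256, 512, 512), batch=1, dtype="fp16"):
    """Return (resolution, channels, MiB per tensor) for each encoder stage."""
    stages = []
    res = size
    for i, ch in enumerate(channels):
        stages.append((res, ch, activation_mib(batch, ch, res, res, dtype)))
        if i < len(channels) - 1:  # the final stage does not downsample
            res //= 2
    return stages


if __name__ == "__main__":
    for res, ch, mib in encoder_stage_sizes():
        print(f"{res:>4}x{res:<4} {ch:>3} ch: {mib:7.1f} MiB per tensor")
```

Under these assumptions the full-resolution stage alone needs 256 MiB per tensor in fp16, and 512 MiB in fp32.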

I have tried forcing every layer of the pipe onto the GPU with `.to("cuda")`, and I also tried running in fp32 (ran out of RAM). I also tried adding `dtype=torch.float16` to the `with autocast` block, but nothing seems to work.
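Moving from fp16 to fp32 doubles the bytes per parameter, which is consistent with the fp32 attempt running out of RAM. The sketch below estimates weight memory alone, assuming ballpark parameter counts for the FLUX components (approximations, not read from the checkpoint used in this run) and ignoring activations.

```python
# Rough weight-memory estimate for the FLUX pipeline components per dtype.
# ASSUMPTION: ballpark parameter counts (FLUX.1 transformer ~12B, T5-XXL
# encoder ~4.7B, CLIP-L text encoder ~0.12B, VAE ~0.08B); these are not
# values read from the checkpoint used here. Activations are ignored.

PARAMS = {
    "transformer": 12.0e9,
    "text_encoder_2 (T5-XXL)": 4.7e9,
    "text_encoder (CLIP-L)": 0.12e9,
    "vae": 0.08e9,
}
BYTES_PER_ELEMENT = {"fp16": 2, "bf16": 2, "fp32": 4}


def weights_gib(params, dtype):
    """Memory for `params` weights stored in `dtype`, in GiB."""
    return params * BYTES_PER_ELEMENT[dtype] / 2**30


def total_gib(dtype):
    """Summed weight memory of all components listed in PARAMS."""
    return sum(weights_gib(p, dtype) for p in PARAMS.values())


if __name__ == "__main__":
    for dtype in ("fp16", "fp32"):
        print(f"{dtype}: ~{total_gib(dtype):.1f} GiB of weights")
```

With these counts the weights come to roughly 31 GiB in fp16 and 63 GiB in fp32.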
