RuntimeError in pipeline_flux_controlnet._pack_latents for some image sizes (e.g. 640×480)

Hi,
first of all, thanks for releasing this repo.
I’ve run into an issue where some input images crash the script during upscaling, while others work fine. A concrete example is a 640×480 image, which I would expect to be supported, but it triggers a shape/view error in _pack_latents inside diffusers’ pipeline_flux_controlnet.py.
________________________________________
Command

```
accelerate launch test_flux_controlnet.py \
  --pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
  --controlnet_model_name_or_path "preset/models/dp2o-flux/model.safetensors" \
  --image_path "preset/test_inp" \
  --output_dir "preset/test_oup_dp2o_flux" \
  --align_method "adain" \
  --ram_path "preset/models/ram_swin_large_14m.pth" \
  --dape_path "preset/models/DAPE.pth" \
  --num_double_layers 4 \
  --num_single_layers 0 \
  --guidance_scale 2.5 \
  --num_inference_steps 25 \
  --mixed_precision "fp16"
```

The folder preset/test_inp contains several images. Some of them process correctly, but at least one of them fails:
•	Failing example: preset/test_inp/Fotografija-0095.png (resolution: 640×480)
________________________________________
Error
This is the full traceback I get:

`C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\helpers.py:7: FutureWarning: Importing from timm.models.helpers is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
===> preset/test_oup_dp2o_flux.
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.10s/it]
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
Loading default thretholds from .txt....
--------------
preset/models/ram_swin_large_14m.pth
--------------
load checkpoint from preset/models/ram_swin_large_14m.pth
vit: swin_l
load lora from preset/models/DAPE.pth
Found 6 images to process
Process 0, preset\test_inp\Fotografija-0095.png, tag: boy, catch, child, dress, hand, jumpsuit, pavement, red, stair, walk, wear, woman
Traceback (most recent call last):
  File "C:\Users\miki\DP2O-SR\test_flux_controlnet.py", line 390, in <module>
    main(args)
  File "C:\Users\miki\DP2O-SR\test_flux_controlnet.py", line 346, in main
    image_result = pipeline(
                   ^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_controlnet.py", line 795, in __call__
    control_image = self._pack_latents(
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_controlnet.py", line 486, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 32, 2, 42, 2]' is invalid for input of size 87040
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\miki\miniconda3\envs\dp2osr\Scripts\accelerate.exe\__main__.py", line 6, in <module>
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 50, in main
    args.func(args)
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\launch.py", line 1281, in launch_command
    simple_launcher(args)
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\launch.py", line 869, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\miki\\miniconda3\\envs\\dp2osr\\python.exe', 'test_flux_controlnet.py', '--pretrained_model_name_or_path', 'black-forest-labs/FLUX.1-dev', '--controlnet_model_name_or_path', 'preset/models/dp2o-flux/model.safetensors', '--image_path', 'preset/test_inp', '--output_dir', 'preset/test_oup_dp2o_flux', '--align_method', 'adain', '--ram_path', 'preset/models/ram_swin_large_14m.pth', '--dape_path', 'preset/models/DAPE.pth', '--num_double_layers', '4', '--num_single_layers', '0', '--guidance_scale', '2.5', '--num_inference_steps', '25', '--mixed_precision', 'fp16']' returned non-zero exit status 1.`

What I expected
•	The script to upscale all images in the folder, including non-square ones such as 640×480.
•	At minimum, for unsupported resolutions, a clear error message or automatic padding/resize, rather than a low-level view shape mismatch.
________________________________________
Actual behavior
•	The run crashes on some images with a RuntimeError in _pack_latents (latents.view(...) shape is invalid for input size).
•	As a result, batch processing stops as soon as it hits such an image.
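
To narrow this down, here is a minimal sketch of the arithmetic behind the error, using only the numbers in the RuntimeError message. It suggests the control latents are 64×85, so one side is odd and cannot be split into 2×2 patches by the view call. If the VAE downsamples by 8 (my assumption, not something I verified in this setup), that would correspond to a control image of roughly 512×680 pixels, meaning the 640×480 input was probably resized somewhere before encoding.

```python
from math import prod

# Values copied from the RuntimeError message in the traceback.
requested_shape = [1, 16, 32, 2, 42, 2]  # (batch, channels, h // 2, 2, w // 2, 2)
actual_numel = 87040                     # number of elements in the real tensor

batch, channels, half_h, _, half_w, _ = requested_shape
requested_numel = prod(requested_shape)
latent_area = actual_numel // (batch * channels)  # h * w of the real latent

# h // 2 == 32 and w // 2 == 42 leave two candidates per axis;
# keep only the pairs whose product matches the real tensor.
candidates = [
    (h, w)
    for h in (2 * half_h, 2 * half_h + 1)
    for w in (2 * half_w, 2 * half_w + 1)
    if h * w == latent_area
]

print(requested_numel)  # 86016, which does not match 87040
print(latent_area)      # 5440
print(candidates)       # [(64, 85)]
```

The only consistent latent size is 64×85, so the odd width of 85 looks like the direct cause of the mismatch.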
________________________________________
Questions
1.	Are there any strict requirements or constraints on input image dimensions for the FLUX + ControlNet pipeline in this repo (e.g. must be multiples of a certain factor)?
2.	If so, could you clarify the expected valid resolutions in the README?
3.	If this is not expected, is there a recommended workaround (e.g. internal resizing, padding, or changing some pipeline parameter)?
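
In case it helps the discussion, this is the kind of workaround I have in mind: a minimal sketch that rounds the image size down to a multiple of 16 and fails early with a readable message otherwise. The factor 16 is my assumption (8 for the VAE times 2 for the 2×2 packing seen in the traceback), and `snap_size` and `check_size` are hypothetical helpers, not functions from this repo or from diffusers.

```python
def snap_size(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round each side down to the nearest multiple (never below one multiple)."""
    if width <= 0 or height <= 0 or multiple <= 0:
        raise ValueError("width, height and multiple must be positive")
    new_width = max(multiple, (width // multiple) * multiple)
    new_height = max(multiple, (height // multiple) * multiple)
    return new_width, new_height


def check_size(width: int, height: int, multiple: int = 16) -> None:
    """Fail early with a readable message instead of a low-level view() error."""
    if width % multiple or height % multiple:
        new_width, new_height = snap_size(width, height, multiple)
        raise ValueError(
            f"Image size {width}x{height} is not a multiple of {multiple}; "
            f"try resizing to {new_width}x{new_height}."
        )


print(snap_size(640, 480))  # (640, 480): the original size is already aligned
print(snap_size(682, 512))  # (672, 512): a hypothetical resized size that is not
```

The returned size could be applied with PIL's `image.resize((new_width, new_height))` right before the image is handed to the pipeline. I don't know where test_flux_controlnet.py does its own resizing, so the right place for such a check is a guess on my side.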
________________________________________
Thanks in advance for taking a look!
