RuntimeError in pipeline_flux_controlnet._pack_latents for some image sizes (e.g. 640×480) #4

@zelenooki87

Description

Hi,
first of all, thanks for releasing this repo.
I’ve run into an issue where some input images fail to upscale and crash the script, while others work fine. A concrete example is a 640×480 image, which I would expect to be supported, but it triggers a shape/view error inside the pipeline_flux_controlnet.py code.


Command

accelerate launch test_flux_controlnet.py \
  --pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
  --controlnet_model_name_or_path "preset/models/dp2o-flux/model.safetensors" \
  --image_path "preset/test_inp" \
  --output_dir "preset/test_oup_dp2o_flux" \
  --align_method "adain" \
  --ram_path "preset/models/ram_swin_large_14m.pth" \
  --dape_path "preset/models/DAPE.pth" \
  --num_double_layers 4 \
  --num_single_layers 0 \
  --guidance_scale 2.5 \
  --num_inference_steps 25 \
  --mixed_precision "fp16"

The folder preset/test_inp contains several images. Some of them process correctly, but at least one of them fails:
• Failing example: preset/test_inp/Fotografija-0095.png (resolution: 640×480)


Error
This is the full traceback I get:

C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\hub.py:4: FutureWarning: Importing from timm.models.hub is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\timm\models\helpers.py:7: FutureWarning: Importing from timm.models.helpers is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
===> preset/test_oup_dp2o_flux.
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.10s/it]
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.

  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use mean_resizing=False
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
Loading default thretholds from .txt....

preset/models/ram_swin_large_14m.pth

load checkpoint from preset/models/ram_swin_large_14m.pth
vit: swin_l
load lora from preset/models/DAPE.pth
Found 6 images to process
Process 0, preset\test_inp\Fotografija-0095.png, tag: boy, catch, child, dress, hand, jumpsuit, pavement, red, stair, walk, wear, woman
Traceback (most recent call last):
  File "C:\Users\miki\DP2O-SR\test_flux_controlnet.py", line 390, in <module>
    main(args)
  File "C:\Users\miki\DP2O-SR\test_flux_controlnet.py", line 346, in main
    image_result = pipeline(
                   ^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_controlnet.py", line 795, in __call__
    control_image = self._pack_latents(
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_controlnet.py", line 486, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 32, 2, 42, 2]' is invalid for input of size 87040
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\miki\miniconda3\envs\dp2osr\Scripts\accelerate.exe\__main__.py", line 6, in <module>
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 50, in main
    args.func(args)
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\launch.py", line 1281, in launch_command
    simple_launcher(args)
  File "C:\Users\miki\miniconda3\envs\dp2osr\Lib\site-packages\accelerate\commands\launch.py", line 869, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\miki\miniconda3\envs\dp2osr\python.exe', 'test_flux_controlnet.py', '--pretrained_model_name_or_path', 'black-forest-labs/FLUX.1-dev', '--controlnet_model_name_or_path', 'preset/models/dp2o-flux/model.safetensors', '--image_path', 'preset/test_inp', '--output_dir', 'preset/test_oup_dp2o_flux', '--align_method', 'adain', '--ram_path', 'preset/models/ram_swin_large_14m.pth', '--dape_path', 'preset/models/DAPE.pth', '--num_double_layers', '4', '--num_single_layers', '0', '--guidance_scale', '2.5', '--num_inference_steps', '25', '--mixed_precision', 'fp16']' returned non-zero exit status 1.
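
The numbers in the RuntimeError already narrow down the cause. The sketch below takes the element count and the target shape from the message and works out which latent height and width are consistent with them. `infer_latent_hw` is a hypothetical helper written for this report, not part of diffusers, and it assumes (as the target shape suggests) a batch size of 1 and 16 latent channels.

```python
def infer_latent_hw(numel, batch, channels, half_h, half_w):
    """Return every (height, width) consistent with the failing view call.

    half_h and half_w are the `height // 2` and `width // 2` values printed
    in the target shape, so each dimension is either 2*half or 2*half + 1.
    """
    per_channel, remainder = divmod(numel, batch * channels)
    if remainder:
        return []
    return [
        (h, w)
        for h in (2 * half_h, 2 * half_h + 1)
        for w in (2 * half_w, 2 * half_w + 1)
        if h * w == per_channel
    ]


# Values copied from: shape '[1, 16, 32, 2, 42, 2]' is invalid for input of size 87040
print(infer_latent_hw(87040, 1, 16, 32, 42))  # [(64, 85)]
```

The only consistent size is a 64×85 latent. The width is odd, so `width // 2` drops one column and the requested view covers 16 × 64 × 84 = 86016 elements instead of 87040.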

What I expected
• The script to upscale all images in the folder, including non-square ones such as 640×480.
• At minimum, for unsupported resolutions, a clear error message or automatic padding/resize, rather than a low-level view shape mismatch.
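
One way to get the clearer error message would be to check the dimensions before the pipeline is called. This is only a sketch: `check_dims` is a hypothetical helper, and the required multiple of 16 is an assumption (8× VAE downsampling combined with the 2×2 packing visible in `_pack_latents`), not something the repo confirms.

```python
def check_dims(width: int, height: int, multiple: int = 16) -> None:
    """Raise a readable error if either side is not a multiple of `multiple`.

    The default of 16 is an assumption, see the note above.
    """
    bad = [
        f"{name}={value}"
        for name, value in (("width", width), ("height", height))
        if value % multiple != 0
    ]
    if bad:
        raise ValueError(
            f"Image size {width}x{height} is not supported: "
            f"{', '.join(bad)} must be a multiple of {multiple}. "
            "Resize or pad the image first."
        )
```

The check would have to run on the size that actually reaches the pipeline, which may differ from the size of the file on disk if the script resizes its inputs.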


Actual behavior
• The run crashes on some images with a RuntimeError in _pack_latents (latents.view(...) shape is invalid for input size).
• As a result, batch processing stops as soon as it hits such an image.
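
Independently of the size issue, a batch run could skip a failing image instead of aborting. A minimal sketch, assuming a hypothetical `process_one(path)` callable that stands in for the per-image work in test_flux_controlnet.py (the real script may be structured differently):

```python
def process_all(paths, process_one):
    """Run process_one on every path and collect failures instead of stopping.

    Returns a list of (path, error message) pairs for the images that failed.
    """
    failed = []
    for path in paths:
        try:
            process_one(path)
        except RuntimeError as exc:
            # Covers the view shape mismatch reported above.
            failed.append((path, str(exc)))
            print(f"Skipping {path}: {exc}")
    return failed
```

Only `RuntimeError` is caught here, so unrelated problems such as a missing file would still stop the run.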


Questions

  1. Are there any strict requirements or constraints on input image dimensions for the FLUX + ControlNet pipeline in this repo (e.g. must be multiples of a certain factor)?
  2. If so, could you clarify the expected valid resolutions in the README?
  3. If this is not expected, is there a recommended workaround (e.g. internal resizing, padding, or changing some pipeline parameter)?
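
As a possible workaround for question 3, the image handed to the pipeline could be padded (or resized) so that both sides are multiples of 16. The sketch below only computes the target size and the amount of padding. `padded_size` is a hypothetical helper, the factor of 16 is an assumption (8× VAE downsampling combined with the 2×2 packing in `_pack_latents`), and 680×512 is just an illustrative size whose width is not a multiple of 16.

```python
def padded_size(width, height, multiple=16):
    """Return (new_width, new_height, pad_right, pad_bottom).

    Each side is rounded up to the next multiple of `multiple`.
    """
    new_w = -(-width // multiple) * multiple  # ceiling division
    new_h = -(-height // multiple) * multiple
    return new_w, new_h, new_w - width, new_h - height


print(padded_size(640, 480))  # (640, 480, 0, 0)
print(padded_size(680, 512))  # (688, 512, 8, 0)
```

Note that 640×480 is already a multiple of 16 on both sides, and with an 8× VAE it would give an 80×60 latent rather than the roughly 84×64 one in the error's target shape. The script therefore presumably resizes the input before it reaches `_pack_latents`, and the rounding would need to be applied after that step.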

Thanks in advance for taking a look!
