Logs for Windows #2

@tin2tin

https://github.com/asomoza/diffusers-recipes/blob/main/models/ltx2_3/scripts/ltx23_two_stages_sdnq_distilled.py
(Offloads to disk from the very beginning, despite plenty of unused VRAM.)

Python: 3.13.9 (main, Jan 16 2026, 12:29:45) [MSC v.1944 64 bit (AMD64)]

Blender PIP user site: C:\Users\peter\AppData\Roaming\Python\Python313\site-packages
Adding site to path
00:58.156  blend            | Read blend: "C:\Users\peter\Downloads\ltx_test_.blend"
01:40.625  operator         | Saved "ltx_test_.blend"
02:39.312  operator         | Saved "ltx_test_.blend"
SDNQ: Triton is not available. Falling back to PyTorch Eager mode.
Using random seed: 157222063
VRAM baseline: 1.67 GB

──────────────────────────────────────────────────────────────────────
  Step 0: Encode prompts
──────────────────────────────────────────────────────────────────────
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 41.96it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.14it/s]
It seems like some layers were not executed during the forward pass. This may lead to problems when applying lazy prefetching with automatic tracing and lead to device-mismatch related errors. Please make sure that all layers are executed during the forward pass. The following layers were not executed:
unexecuted_layers=['model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc2', 'model.multi_modal_projector.mm_soft_emb_norm', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm2', 'model.vision_tower.vision_model.embeddings.patch_embedding', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj', 'model.vision_tower.vision_model.embeddings.position_embedding', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj', 
'model.vision_tower.vision_model.post_layernorm', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.0.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc1', 
'model.vision_tower.vision_model.encoder.layers.19.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm1', 'model.multi_modal_projector', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm2', 
'model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.11.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc2', 'model.vision_tower.vision_model.embeddings', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj']
  prompt_embeds: torch.Size([1, 1024, 188160])
  [Step 0: Encode prompts] 18.3s | Peak VRAM: 4.41 GB | Peak RAM: 10.13 GB

──────────────────────────────────────────────────────────────────────
  Stage 1: Generate at 768x512
──────────────────────────────────────────────────────────────────────
Fetching 2 files: 100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4478.70it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.21s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.98it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.02it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [04:08<00:00, 31.06s/it]
  Video latent: torch.Size([1, 128, 31, 16, 24])
  Audio latent: torch.Size([1, 8, 251, 16])
  [Stage 1: Generate at 768x512] 311.3s | Peak VRAM: 5.48 GB | Peak RAM: 15.16 GB

──────────────────────────────────────────────────────────────────────
  Spatial Upscale: 2x
──────────────────────────────────────────────────────────────────────
  Upscaled video latent: torch.Size([1, 128, 31, 32, 48])
  [Spatial Upscale: 2x] 3.6s | Peak VRAM: 4.19 GB | Peak RAM: 3.13 GB

──────────────────────────────────────────────────────────────────────
  Stage 2: Refine at 1536x1024
──────────────────────────────────────────────────────────────────────
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 36314.32it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:17<00:00,  8.94s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 53.82it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:01<00:00,  4.71it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:01<00:00, 80.46s/it]
  Stage 2 video latent: torch.Size([1, 128, 31, 32, 48])
  [Stage 2: Refine at 1536x1024] 284.9s | Peak VRAM: 12.24 GB | Peak RAM: 13.54 GB

──────────────────────────────────────────────────────────────────────
  Decode: Video (streaming) + Audio
──────────────────────────────────────────────────────────────────────
  [Decode: Video (streaming) + Audio] 124.6s | Peak VRAM: 20.83 GB | Peak RAM: 19.15 GB

──────────────────────────────────────────────────────────────────────
  Save output
──────────────────────────────────────────────────────────────────────
Encoding video chunks: 100%|█████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.16s/it]
  [Save output] 19.9s | Peak VRAM: 0.09 GB | Peak RAM: 14.00 GB

══════════════════════════════════════════════════════════════════════
  TOTAL: 764.2s | Peak VRAM: 20.83 GB | Peak RAM: 19.15 GB
  Output: C:/Users/peter/Downloads/outputs/ltx23/ltx23_two_stage_sdnq_distilled_s1_4bit_s2_4bit_1536x1024_10s_seed_157222063.mp4
══════════════════════════════════════════════════════════════════════
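
To compare stages at a glance, the per-stage summary lines above can be tabulated with a few lines of Python. This is only a sketch: `parse_stages` is a hypothetical helper, not part of the recipe script, and the regex assumes the exact `[name] Ns | Peak VRAM: X GB | Peak RAM: Y GB` layout seen in this log.

```python
import re

# Summary lines copied verbatim from the log above.
LOG = """\
  [Step 0: Encode prompts] 18.3s | Peak VRAM: 4.41 GB | Peak RAM: 10.13 GB
  [Stage 1: Generate at 768x512] 311.3s | Peak VRAM: 5.48 GB | Peak RAM: 15.16 GB
  [Spatial Upscale: 2x] 3.6s | Peak VRAM: 4.19 GB | Peak RAM: 3.13 GB
  [Stage 2: Refine at 1536x1024] 284.9s | Peak VRAM: 12.24 GB | Peak RAM: 13.54 GB
  [Decode: Video (streaming) + Audio] 124.6s | Peak VRAM: 20.83 GB | Peak RAM: 19.15 GB
  [Save output] 19.9s | Peak VRAM: 0.09 GB | Peak RAM: 14.00 GB
"""

# Assumed line format; adjust if the script's output changes.
STAGE_RE = re.compile(
    r"\[(?P<name>[^\]]+)\]\s+(?P<secs>[\d.]+)s"
    r" \| Peak VRAM: (?P<vram>[\d.]+) GB"
    r" \| Peak RAM: (?P<ram>[\d.]+) GB"
)


def parse_stages(text):
    """Hypothetical helper: return one dict per stage summary line."""
    stages = []
    for match in STAGE_RE.finditer(text):
        stages.append(
            {
                "name": match["name"],
                "secs": float(match["secs"]),
                "vram_gb": float(match["vram"]),
                "ram_gb": float(match["ram"]),
            }
        )
    return stages


stages = parse_stages(LOG)
total = sum(s["secs"] for s in stages)
peak = max(stages, key=lambda s: s["vram_gb"])

for s in stages:
    share = 100.0 * s["secs"] / total
    print(f'{s["name"]:<36} {s["secs"]:>7.1f}s {share:>5.1f}%  VRAM {s["vram_gb"]:>5.2f} GB')
print(f"Sum of stages: {total:.1f}s, highest VRAM peak in: {peak['name']}")
```

On these numbers the two denoising stages (Stage 1 and Stage 2) take roughly 78% of the summed stage time while peaking at 5.48 GB and 12.24 GB of VRAM, well below the 20.83 GB peak reached during decode.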


https://github.com/asomoza/diffusers-recipes/blob/main/models/ltx2_3/scripts/ltx23_ic_lora_sdnq.py


Using random seed: 793627925
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 51781.53it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:13<00:00,  6.87s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 8/8 [00:19<00:00,  2.40s/it]
It seems like some layers were not executed during the forward pass. This may lead to problems when applying lazy prefetching with automatic tracing and lead to device-mismatch related errors. Please make sure that all layers are executed during the forward pass. The following layers were not executed:
unexecuted_layers=['model.vision_tower.vision_model.encoder.layers.12.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj', 'model.vision_tower.vision_model.post_layernorm', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc2', 'model.multi_modal_projector', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm2', 
'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc1', 
'model.vision_tower.vision_model.encoder.layers.22.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm1', 
'model.vision_tower.vision_model.embeddings.patch_embedding', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm2', 'model.vision_tower.vision_model.embeddings.position_embedding', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj', 
'model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm1', 
'model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj', 'model.multi_modal_projector.mm_soft_emb_norm', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj', 
'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj', 'model.vision_tower.vision_model.embeddings', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj']
02:19.500  operator         | ERROR Python: Traceback (most recent call last):
                            |   File "C:\Users\peter\Downloads\ltx_test_.blend\Text.003", line 75, in <module>
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
                            |     return func(*args, **kwargs)
                            |   File "C:\Users\peter\Documents\blender-5.1.0-RC\5.1\python\lib\site-packages\pipeline_ltx2_multimodal.py", line 837, in __call__
                            |     control_tokens, control_coords = self.prepare_control_latents(
                            |                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
                            |         control_video=control_video,
                            |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            |     ...<9 lines>...
                            |         frame_rate=frame_rate,
                            |         ^^^^^^^^^^^^^^^^^^^^^^
                            |     )
                            |     ^
                            |   File "C:\Users\peter\Documents\blender-5.1.0-RC\5.1\python\lib\site-packages\pipeline_ltx2_multimodal.py", line 329, in prepare_control_latents
                            |     ref_latent = retrieve_latents(self.vae.encode(control_pixels), generator=generator, sample_mode="argmax")
                            |                                   ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
                            |     return method(self, *args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 1259, in encode
                            |     h = self._encode(x, causal=causal)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 1235, in _encode
                            |     enc = self.encoder(x, causal=causal)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
                            |     return self._call_impl(*args, **kwargs)
                            |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
                            |     return forward_call(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 827, in forward
                            |     hidden_states = down_block(hidden_states, causal=causal)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
                            |     return self._call_impl(*args, **kwargs)
                            |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
                            |     return forward_call(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 454, in forward
                            |     hidden_states = resnet(hidden_states, temb, generator, causal=causal)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
                            |     return self._call_impl(*args, **kwargs)
                            |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
                            |     return forward_call(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 204, in forward
                            |     hidden_states = self.conv1(hidden_states, causal=causal)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
                            |     return self._call_impl(*args, **kwargs)
                            |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
                            |     return forward_call(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 108, in forward
                            |     hidden_states = self.conv(hidden_states)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
                            |     return self._call_impl(*args, **kwargs)
                            |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
                            |     return forward_call(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\hooks\hooks.py", line 189, in new_forward
                            |     output = function_reference.forward(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\hooks\hooks.py", line 189, in new_forward
                            |     output = function_reference.forward(*args, **kwargs)
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\conv.py", line 717, in forward
                            |     return self._conv_forward(input, self.weight, self.bias)
                            |            ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            |   File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\conv.py", line 712, in _conv_forward
                            |     return F.conv3d(
                            |            ~~~~~~~~^
                            |         input, weight, bias, self.stride, self.padding, self.dilation, self.groups
                            |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            |     )
                            |     ^
                            | torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.42 GiB. GPU 0 has a total capacity of 23.99 GiB of which 10.98 GiB is free. Of the allocated memory 4.54 GiB is allocated by PyTorch, and 6.73 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
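
A note on reading the numbers in the error above: the single `conv3d` allocation requested during `self.vae.encode(control_pixels)` is larger than the card's total capacity, not just larger than what is currently free. The `expandable_segments` hint in the message targets fragmentation, so it is unlikely to be enough on its own here. Below is a minimal sketch that pulls the figures out of the message and compares them. The helper name `parse_oom` and the regular expressions are illustrative only and are not part of PyTorch or diffusers.

```python
import re

# Text copied from the torch.OutOfMemoryError line in the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 25.42 GiB. "
    "GPU 0 has a total capacity of 23.99 GiB of which 10.98 GiB is free. "
    "Of the allocated memory 4.54 GiB is allocated by PyTorch, "
    "and 6.73 GiB is reserved by PyTorch but unallocated."
)


def parse_oom(message):
    """Extract the GiB figures from a CUDA OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"total capacity of ([\d.]+) GiB",
        "free": r"of which ([\d.]+) GiB is free",
        "allocated": r"([\d.]+) GiB is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch",
    }
    return {
        name: float(re.search(pattern, message).group(1))
        for name, pattern in patterns.items()
    }


stats = parse_oom(OOM_MESSAGE)

# True means no allocator setting can satisfy this request on this GPU.
exceeds_total = stats["requested"] > stats["total"]

# How far the request is from fitting into the memory that was free.
shortfall_vs_free = stats["requested"] - stats["free"]

print(f"requested {stats['requested']:.2f} GiB, total {stats['total']:.2f} GiB")
print(f"request exceeds total capacity: {exceeds_total}")
print(f"shortfall against free memory: {shortfall_vs_free:.2f} GiB")
```

Since the request exceeds total capacity, the input to the VAE encoder would have to get smaller (lower control video resolution or fewer frames), or the encode would have to be split into pieces. Whether the VAE class used here supports tiled encoding is an assumption that should be checked against the installed diffusers version.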
