Python: 3.13.9 (main, Jan 16 2026, 12:29:45) [MSC v.1944 64 bit (AMD64)]
Blender PIP user site: C:\Users\peter\AppData\Roaming\Python\Python313\site-packages
Adding site to path
00:58.156 blend | Read blend: "C:\Users\peter\Downloads\ltx_test_.blend"
01:40.625 operator | Saved "ltx_test_.blend"
02:39.312 operator | Saved "ltx_test_.blend"
SDNQ: Triton is not available. Falling back to PyTorch Eager mode.
Using random seed: 157222063
VRAM baseline: 1.67 GB
──────────────────────────────────────────────────────────────────────
Step 0: Encode prompts
──────────────────────────────────────────────────────────────────────
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 41.96it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.14it/s]
It seems like some layers were not executed during the forward pass. This may lead to problems when applying lazy prefetching with automatic tracing and lead to device-mismatch related errors. Please make sure that all layers are executed during the forward pass. The following layers were not executed:
unexecuted_layers=['model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc2', 'model.multi_modal_projector.mm_soft_emb_norm', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm2', 'model.vision_tower.vision_model.embeddings.patch_embedding', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj', 'model.vision_tower.vision_model.embeddings.position_embedding', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj', 
'model.vision_tower.vision_model.post_layernorm', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.0.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc1', 
'model.vision_tower.vision_model.encoder.layers.19.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm1', 'model.multi_modal_projector', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm2', 
'model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.11.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc2', 'model.vision_tower.vision_model.embeddings', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj']
prompt_embeds: torch.Size([1, 1024, 188160])
[Step 0: Encode prompts] 18.3s | Peak VRAM: 4.41 GB | Peak RAM: 10.13 GB
──────────────────────────────────────────────────────────────────────
Stage 1: Generate at 768x512
──────────────────────────────────────────────────────────────────────
Fetching 2 files: 100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4478.70it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.21s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.98it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.02it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [04:08<00:00, 31.06s/it]
Video latent: torch.Size([1, 128, 31, 16, 24])
Audio latent: torch.Size([1, 8, 251, 16])
[Stage 1: Generate at 768x512] 311.3s | Peak VRAM: 5.48 GB | Peak RAM: 15.16 GB
──────────────────────────────────────────────────────────────────────
Spatial Upscale: 2x
──────────────────────────────────────────────────────────────────────
Upscaled video latent: torch.Size([1, 128, 31, 32, 48])
[Spatial Upscale: 2x] 3.6s | Peak VRAM: 4.19 GB | Peak RAM: 3.13 GB
──────────────────────────────────────────────────────────────────────
Stage 2: Refine at 1536x1024
──────────────────────────────────────────────────────────────────────
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 36314.32it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:17<00:00, 8.94s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 53.82it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:01<00:00, 4.71it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [04:01<00:00, 80.46s/it]
Stage 2 video latent: torch.Size([1, 128, 31, 32, 48])
[Stage 2: Refine at 1536x1024] 284.9s | Peak VRAM: 12.24 GB | Peak RAM: 13.54 GB
──────────────────────────────────────────────────────────────────────
Decode: Video (streaming) + Audio
──────────────────────────────────────────────────────────────────────
[Decode: Video (streaming) + Audio] 124.6s | Peak VRAM: 20.83 GB | Peak RAM: 19.15 GB
──────────────────────────────────────────────────────────────────────
Save output
──────────────────────────────────────────────────────────────────────
Encoding video chunks: 100%|█████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.16s/it]
[Save output] 19.9s | Peak VRAM: 0.09 GB | Peak RAM: 14.00 GB
══════════════════════════════════════════════════════════════════════
TOTAL: 764.2s | Peak VRAM: 20.83 GB | Peak RAM: 19.15 GB
Output: C:/Users/peter/Downloads/outputs/ltx23/ltx23_two_stage_sdnq_distilled_s1_4bit_s2_4bit_1536x1024_10s_seed_157222063.mp4
══════════════════════════════════════════════════════════════════════
Using random seed: 793627925
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 51781.53it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:13<00:00, 6.87s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 8/8 [00:19<00:00, 2.40s/it]
It seems like some layers were not executed during the forward pass. This may lead to problems when applying lazy prefetching with automatic tracing and lead to device-mismatch related errors. Please make sure that all layers are executed during the forward pass. The following layers were not executed:
unexecuted_layers=['model.vision_tower.vision_model.encoder.layers.12.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj', 'model.vision_tower.vision_model.post_layernorm', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.5.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc2', 'model.multi_modal_projector', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm2', 
'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.13.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.mlp.fc1', 
'model.vision_tower.vision_model.encoder.layers.22.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.20.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.9.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.13.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm1', 
'model.vision_tower.vision_model.embeddings.patch_embedding', 'model.vision_tower.vision_model.encoder.layers.19.layer_norm2', 'model.vision_tower.vision_model.embeddings.position_embedding', 'model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.7.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.12.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.22.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.26.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.1.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.10.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.17.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.6.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj', 
'model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.22.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.1.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.8.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.14.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj', 
'model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.19.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.15.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.0.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.8.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.14.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.11.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.23.layer_norm1', 
'model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj', 'model.multi_modal_projector.mm_soft_emb_norm', 'model.vision_tower.vision_model.encoder.layers.4.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.2.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.7.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.24.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.21.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.0.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.16.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.5.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.15.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.21.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.3.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.4.mlp.fc1', 'model.vision_tower.vision_model.encoder.layers.10.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj', 
'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj', 'model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj', 'model.vision_tower.vision_model.embeddings', 'model.vision_tower.vision_model.encoder.layers.17.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.12.layer_norm1', 'model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj', 'model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.26.layer_norm2', 'model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj', 'model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj', 'model.vision_tower.vision_model.encoder.layers.24.mlp.fc2', 'model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj']
02:19.500 operator | ERROR Python: Traceback (most recent call last):
| File "C:\Users\peter\Downloads\ltx_test_.blend\Text.003", line 75, in <module>
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
| return func(*args, **kwargs)
| File "C:\Users\peter\Documents\blender-5.1.0-RC\5.1\python\lib\site-packages\pipeline_ltx2_multimodal.py", line 837, in __call__
| control_tokens, control_coords = self.prepare_control_latents(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
| control_video=control_video,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| ...<9 lines>...
| frame_rate=frame_rate,
| ^^^^^^^^^^^^^^^^^^^^^^
| )
| ^
| File "C:\Users\peter\Documents\blender-5.1.0-RC\5.1\python\lib\site-packages\pipeline_ltx2_multimodal.py", line 329, in prepare_control_latents
| ref_latent = retrieve_latents(self.vae.encode(control_pixels), generator=generator, sample_mode="argmax")
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
| return method(self, *args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 1259, in encode
| h = self._encode(x, causal=causal)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 1235, in _encode
| enc = self.encoder(x, causal=causal)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
| return forward_call(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 827, in forward
| hidden_states = down_block(hidden_states, causal=causal)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
| return forward_call(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 454, in forward
| hidden_states = resnet(hidden_states, temb, generator, causal=causal)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
| return forward_call(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 204, in forward
| hidden_states = self.conv1(hidden_states, causal=causal)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
| return forward_call(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx2.py", line 108, in forward
| hidden_states = self.conv(hidden_states)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
| return forward_call(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\hooks\hooks.py", line 189, in new_forward
| output = function_reference.forward(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\diffusers\hooks\hooks.py", line 189, in new_forward
| output = function_reference.forward(*args, **kwargs)
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\conv.py", line 717, in forward
| return self._conv_forward(input, self.weight, self.bias)
| ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "C:\Program Files\Blender Foundation\Blender 5.1\5.1\python\Lib\site-packages\torch\nn\modules\conv.py", line 712, in _conv_forward
| return F.conv3d(
| ~~~~~~~~^
| input, weight, bias, self.stride, self.padding, self.dilation, self.groups
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| )
| ^
| torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.42 GiB. GPU 0 has a total capacity of 23.99 GiB of which 10.98 GiB is free. Of the allocated memory 4.54 GiB is allocated by PyTorch, and 6.73 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
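
Note on the numbers in the error above: the single requested allocation (25.42 GiB) is larger than the card's total capacity (23.99 GiB), so the `expandable_segments` hint in the message, which targets fragmentation, is unlikely to be enough by itself. The request would probably have to shrink, for example with a smaller or shorter control video. A minimal sketch of that check follows. The helper name `summarize_oom` and the regex are illustrative assumptions, not part of PyTorch or diffusers; the environment variable is the one named in the error text and should be set before torch initialises CUDA (inside Blender that may mean before Blender is launched).

```python
import os
import re

# Allocator hint quoted in the error message. It only takes effect if it is
# set before torch initialises CUDA; setting it here is harmless otherwise.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Hypothetical pattern matching the wording of the OOM message in this log.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB\. "
    r"GPU \d+ has a total capacity of (?P<total>[\d.]+) GiB "
    r"of which (?P<free>[\d.]+) GiB is free"
)


def summarize_oom(message: str) -> dict:
    """Extract the sizes from a CUDA OOM message and compare them."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA OOM message")
    request = float(match["request"])
    total = float(match["total"])
    free = float(match["free"])
    return {
        "request_gib": request,
        "total_gib": total,
        "free_gib": free,
        # True means no allocator setting can satisfy the request on this GPU.
        "exceeds_total": request > total,
    }


message = (
    "CUDA out of memory. Tried to allocate 25.42 GiB. "
    "GPU 0 has a total capacity of 23.99 GiB of which 10.98 GiB is free."
)
summary = summarize_oom(message)
print(summary)
```
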
https://github.com/asomoza/diffusers-recipes/blob/main/models/ltx2_3/scripts/ltx23_two_stages_sdnq_distilled.py
(offloads to disk from the very beginning, despite plenty of unused VRAM)
https://github.com/asomoza/diffusers-recipes/blob/main/models/ltx2_3/scripts/ltx23_ic_lora_sdnq.py