Input contains (near) NaN/+-Inf, LTX-2.3, ComfyUI

Hi! I maintain a Flash Attention 2 adaptation for Tesla V100 / Titan V, and from time to time someone opens an issue related to ComfyUI, assuming that my code is the cause.

The latest one is https://github.com/ai-bond/flash-attention-v100/issues/36, which is exactly about `Input contains (near) NaN/+-Inf, LTX-2.3, ComfyUI`.

The user reported:

<details>
<Summary>With the V100 attention backend, generation fails with an error</Summary>

````
[INFO] Using Flash Attention
aimdo: /project/src-posix/cuda-funchooks.c:52:DEBUG:aimdo_setup_hooks: hooks successfully installed
aimdo: /project/src/control.c:240:INFO:comfy-aimdo inited for GPU: Tesla V100-SXM2-16GB (VRAM: 16144 MB)
[INFO] DynamicVRAM support detected and enabled
[INFO] Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0]
[INFO] ComfyUI version: 0.24.0
[INFO] comfy-aimdo version: 0.4.9
[INFO] comfy-kitchen version: 0.2.10
[INFO] comfyui-frontend-package version: 1.45.15
[INFO] comfyui-workflow-templates version: 0.9.98
[INFO] comfyui-embedded-docs version: 0.5.3
[INFO] comfy-kitchen version: 0.2.10
[INFO] comfy-aimdo version: 0.4.9
[INFO] [Prompt Server] web root: /mnt/qdata/comfy-env/lib/python3.12/site-packages/comfyui_frontend_package/static
[INFO] Asset seeder disabled
[INFO]
Import times for custom nodes:
[INFO]    0.0 seconds: /mnt/qdata/ComfyUI/custom_nodes/websocket_image_save.py
[INFO]
[INFO] Context impl SQLiteImpl.
[INFO] Will assume non-transactional DDL.
[INFO] Using RAM pressure cache.
[INFO] Starting server
[INFO] To see the GUI go to: http://192.168.2.1:8080
[INFO] got prompt
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[INFO] model weight dtype torch.float16, manual cast: None
[INFO] model_type FLUX
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[WARNING] no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
[INFO] Requested to load VideoVAE
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[INFO] Found quantization metadata version 1
[INFO] Using MixedPrecisionOps for text encoder
[INFO] CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
[INFO] Requested to load LTXAVTEModel_
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 11200MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1749 KB.
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 11200MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1749 KB.
[INFO] Requested to load LTXAV
[INFO] Model LTXAV prepared for dynamic VRAM loading. 40050MB Staged. 1660 patches attached. Force pre-loaded 608 weights: 3303 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:35<00:00,  4.40s/it]
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[INFO] Model LTXAV prepared for dynamic VRAM loading. 40050MB Staged. 1660 patches attached. Force pre-loaded 608 weights: 3303 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:19<00:00, 26.66s/it]
[INFO] Requested to load AudioVAE
[INFO] loaded completely;  693.46 MB loaded, full load: True
[INFO] 0 models unloaded.
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[ERROR] Input contains (near) NaN/+-Inf
[ERROR] !!! Exception during processing !!! Invalid argument: 'avcodec_send_frame()' returned 22; last error log: [aac] Input contains (near) NaN/+-Inf
[ERROR] Traceback (most recent call last):
  File "/mnt/qdata/ComfyUI/execution.py", line 542, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/qdata/ComfyUI/execution.py", line 341, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/qdata/ComfyUI/execution.py", line 315, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/mnt/qdata/ComfyUI/execution.py", line 303, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "/mnt/qdata/ComfyUI/comfy_api/internal/__init__.py", line 149, in wrapped_func
    return method(locked_class, **inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/qdata/ComfyUI/comfy_api/latest/_io.py", line 1900, in EXECUTE_NORMALIZED
    to_return = cls.execute(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/qdata/ComfyUI/comfy_extras/nodes_video.py", line 113, in execute
    video.save_to(
  File "/mnt/qdata/ComfyUI/comfy_api/latest/_input_impl/video_types.py", line 558, in save_to
    output.mux(audio_stream.encode(frame))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "av/audio/stream.py", line 25, in av.audio.stream.AudioStream.encode
    @cython.ccall
    ^^^^^^^^^^^
  File "av/audio/stream.py", line 35, in av.audio.stream.AudioStream.encode
    packets = self.codec_context.encode(frame)
    ^^^^^^^^^^^
  File "av/codec/context.py", line 449, in av.codec.context.CodecContext.encode
    for packet in self._send_frame_and_recv(frame):
    ^^^^^^^^^^^
  File "av/codec/context.py", line 372, in _send_frame_and_recv
    err_check(res, "avcodec_send_frame()")
    ^^^^^^^^^^^
  File "av/error.py", line 412, in av.error.err_check
    raise cls(code, message, filename, log)
    ^^^^^^^^^^^
av.error.ArgumentError: Invalid argument: 'avcodec_send_frame()' returned 22; last error log: [aac] Input contains (near) NaN/+-Inf
````
</details>

But when he used pytorch attention, generation succeeded at the same point.
<details>
<Summary>Successful pytorch attention generation log</Summary>

````
[INFO] Using pytorch attention
aimdo: /project/src-posix/cuda-funchooks.c:52:DEBUG:aimdo_setup_hooks: hooks successfully installed
aimdo: /project/src/control.c:240:INFO:comfy-aimdo inited for GPU: Tesla V100-SXM2-16GB (VRAM: 16144 MB)
[INFO] DynamicVRAM support detected and enabled
[INFO] Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0]
[INFO] ComfyUI version: 0.24.0
[INFO] comfy-aimdo version: 0.4.9
[INFO] comfy-kitchen version: 0.2.10
[INFO] comfyui-frontend-package version: 1.45.15
[INFO] comfyui-workflow-templates version: 0.9.98
[INFO] comfyui-embedded-docs version: 0.5.3
[INFO] comfy-kitchen version: 0.2.10
[INFO] comfy-aimdo version: 0.4.9
[INFO] [Prompt Server] web root: /mnt/qdata/comfy-env/lib/python3.12/site-packages/comfyui_frontend_package/static
[INFO] Asset seeder disabled
[INFO]
Import times for custom nodes:
[INFO]    0.0 seconds: /mnt/qdata/ComfyUI/custom_nodes/websocket_image_save.py
[INFO]
[INFO] Context impl SQLiteImpl.
[INFO] Will assume non-transactional DDL.
[INFO] Using RAM pressure cache.
[INFO] Starting server

[INFO] To see the GUI go to: http://192.168.2.1:8080
[INFO] got prompt
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[INFO] model weight dtype torch.float16, manual cast: None
[INFO] model_type FLUX
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[WARNING] no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
[INFO] Requested to load VideoVAE
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[INFO] Found quantization metadata version 1
[INFO] Using MixedPrecisionOps for text encoder
[INFO] CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
[INFO] Requested to load LTXAVTEModel_
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 11200MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1749 KB.
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 11200MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1749 KB.
[INFO] Requested to load LTXAV
[INFO] Model LTXAV prepared for dynamic VRAM loading. 40050MB Staged. 1660 patches attached. Force pre-loaded 608 weights: 3303 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:32<00:00,  4.02s/it]
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[INFO] Model LTXAV prepared for dynamic VRAM loading. 40050MB Staged. 1660 patches attached. Force pre-loaded 608 weights: 3303 KB.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:05<00:00, 21.68s/it]
[INFO] Requested to load AudioVAE
[INFO] loaded completely;  693.46 MB loaded, full load: True
[INFO] 0 models unloaded.
[INFO] Model VideoVAE prepared for dynamic VRAM loading. 2769MB Staged. 0 patches attached.
[INFO] Prompt executed in 189.31 seconds
````
</details>

It looks like he is not alone:

https://github.com/RandomInternetPreson/ComfyUI_LTX-2_VRAM_Memory_Management/issues/10
https://github.com/Lightricks/ComfyUI-LTXVideo/issues/430
https://github.com/Comfy-Org/ComfyUI/issues/10010
https://github.com/Lightricks/ComfyUI-LTXVideo/issues/475
https://github.com/kijai/ComfyUI-KJNodes/issues/608
https://github.com/Comfy-Org/ComfyUI/issues/11664
https://github.com/Lightricks/ComfyUI-LTXVideo/issues/386
https://github.com/Lightricks/ComfyUI-LTXVideo/issues/336

So my investigation led to [ffmpeg/aacenc.c](https://ffmpeg.org/doxygen/trunk/aacenc_8c_source.html):

```c
/* aacenc.c, around lines 903-908 */
for (k = 0; k < 1024; k++) {
    if (!(fabs(cpe->ch[ch].coeffs[k]) < 1E16)) { // Ensure headroom for energy calculation
        av_log(avctx, AV_LOG_ERROR, "Input contains (near) NaN/+-Inf\n");
        return AVERROR(EINVAL);
    }
}
```
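
The guard is written as `!(fabs(x) < 1E16)` rather than `fabs(x) >= 1E16`, and that is what lets it catch NaN too: any comparison involving NaN is false. Below is a minimal Python sketch of the same predicate applied to plain sample values. The helper name `find_bad_samples` is hypothetical, and ffmpeg runs this test on the encoder's internal coefficients (`cpe->ch[ch].coeffs`), not on the raw waveform, so this is only a rough pre-check.

```python
LIMIT = 1e16  # same threshold as the guard quoted above


def find_bad_samples(samples, limit=LIMIT):
    """Return indices of values that the `!(fabs(x) < limit)` test rejects."""
    # `not (abs(x) < limit)` is True for NaN, +Inf, -Inf and huge finite values,
    # because every comparison involving NaN evaluates to False.
    return [i for i, x in enumerate(samples) if not (abs(x) < limit)]


samples = [0.25, float("nan"), -0.5, float("inf"), float("-inf"), 1e17]
print(find_bad_samples(samples))  # [1, 3, 4, 5]
```

Running something like this on the decoded audio right before encoding would show whether the bad values already exist in the tensor that ComfyUI hands to libav.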

This means the error is raised by ffmpeg and only surfaces in ComfyUI through Python (PyAV).
I tried replacing ComfyUI/comfy_api/latest/_input_impl/video_types.py with this [video_types.py](https://github.com/user-attachments/files/29089274/video_types.py), or patching it as follows:

```diff
~577
-                frame = av.AudioFrame.from_ndarray(waveform.float().cpu().contiguous().numpy(), format='fltp', layout=layout)
+                waveform = torch.nan_to_num(waveform, nan=0.0, posinf=1.0, neginf=-1.0)
+                waveform = waveform.clamp(-1.0, 1.0)
+                frame = av.AudioFrame.from_ndarray(waveform.float().cpu().contiguous().numpy(), format='fltp', layout=layout)
~577
```
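
For reference, here is what that patch does to individual sample values, modelled in plain Python so it can be checked without a GPU. `sanitize_sample` is a hypothetical helper that mirrors `torch.nan_to_num(..., nan=0.0, posinf=1.0, neginf=-1.0)` followed by `clamp(-1.0, 1.0)`; it is not code from ComfyUI.

```python
import math


def sanitize_sample(x, lo=-1.0, hi=1.0):
    """Model of nan_to_num(nan=0.0, posinf=hi, neginf=lo) followed by clamp(lo, hi)."""
    if math.isnan(x):
        return 0.0  # NaN becomes silence
    # +-Inf and out-of-range finite values are clipped to the [lo, hi] range.
    return max(lo, min(hi, x))


raw = [0.3, float("nan"), float("inf"), float("-inf"), 2.5, -7.0]
clean = [sanitize_sample(x) for x in raw]
print(clean)  # [0.3, 0.0, 1.0, -1.0, 1.0, -1.0]
```

Every output is finite and inside [-1, 1], so the encoder guard passes, but the audio at those positions is replaced rather than repaired.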

It also looks like in #386, #475 and #11664 this error appears because the AudioVAE in LTX-2.3 may be sensitive to sequence length. At "inconvenient" lengths (e.g., 69 or 121 frames), this may lead to tiny numerical anomalies, which the vocoder amplifies into NaNs. Maybe it is worth ensuring that libav always receives a valid buffer (possibly with a small audio artifact, but without a complete crash)?

Also, in all attempts the log shows:

```
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
```

If `torch.nn.functional.scaled_dot_product_attention` uses the Math backend, it can compute in FP32. Most other backends (FA, Sage and the like), when given fp16 input, also output fp16, and exactly this can cause errors like this one.
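
To illustrate why fp16 output is a plausible source of such values: the largest finite fp16 number is 65504, so a result that is harmless in FP32 can turn into Inf once it is stored as fp16, and Inf minus Inf then gives NaN. The sketch below models an fp16 store with the standard library `struct` module. `to_fp16` is a hypothetical helper, and this is not a reproduction of any attention kernel.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round x to half precision, saturating to +-Inf on overflow."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values outside the fp16 range;
        # a typical float-to-half cast produces Inf instead.
        return math.copysign(math.inf, x)


a = to_fp16(40000.0)         # representable in fp16
total_fp32 = a + a           # 80000.0, fine in a wider accumulator
total_fp16 = to_fp16(a + a)  # overflows once stored back as fp16
print(total_fp32, total_fp16)               # 80000.0 inf
print(math.isnan(total_fp16 - total_fp16))  # True
```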
