perf: run LTXVDilateVideoMask pooling on GPU for massive speedup by Denarius40 · Pull Request #514 · Lightricks/ComfyUI-LTXVideo

Denarius40 · 2026-06-17T21:32:19Z

Summary

LTXVDilateVideoMask was running F.max_pool2d / F.max_pool1d on the CPU regardless of whether a GPU was available. This change moves the mask to the active torch device before the pooling passes (via comfy.model_management.get_torch_device()) and returns it to CPU afterwards.
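The device round trip described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name `dilate_mask_on_device`, the `spatial_radius` parameter, and the `(frames, height, width)` mask layout are assumptions made for the example, and only the spatial `F.max_pool2d` pass is shown. Inside ComfyUI the device would come from `comfy.model_management.get_torch_device()`; here it is passed in explicitly so the sketch runs without ComfyUI.

```python
import torch
import torch.nn.functional as F


def dilate_mask_on_device(
    mask: torch.Tensor, spatial_radius: int, device: torch.device
) -> torch.Tensor:
    """Dilate a (frames, height, width) float mask on `device`, return it on CPU.

    Hypothetical helper for illustration; not the node's real signature.
    """
    kernel = 2 * spatial_radius + 1
    # Move to the compute device and add a channel dim: (frames, 1, H, W).
    work = mask.to(device).unsqueeze(1)
    # Stride 1 with padding == radius keeps the spatial size unchanged.
    work = F.max_pool2d(work, kernel_size=kernel, stride=1, padding=spatial_radius)
    # Hand a CPU tensor back to downstream nodes.
    return work.squeeze(1).cpu()


# Stand-in for comfy.model_management.get_torch_device() outside ComfyUI.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

mask = torch.zeros(2, 5, 5)
mask[0, 2, 2] = 1.0
dilated = dilate_mask_on_device(mask, spatial_radius=1, device=device)
```

With a radius of 1, the single set pixel in the first frame grows into a 3x3 block, the second frame stays empty, and the result lives on the CPU regardless of where the pooling ran.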

Performance

In local testing this yields a ~500x speedup when a GPU is available, since the max-pool operations are dramatically faster on CUDA than on CPU for typical video mask sizes.
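One way to check a claim like this locally is a small timing harness such as the sketch below. It is not the benchmark used for the figure above: the helper name `time_max_pool`, the mask dimensions, and the kernel size are all assumptions chosen for illustration, and real numbers will depend on hardware and mask size. CUDA kernels launch asynchronously, so the sketch calls `torch.cuda.synchronize()` before reading the clock.

```python
import time

import torch
import torch.nn.functional as F


def time_max_pool(
    device: torch.device,
    frames: int = 16,
    size: int = 256,
    kernel: int = 9,
    repeats: int = 5,
) -> float:
    """Return the mean seconds per max_pool2d call on `device` (hypothetical helper)."""
    mask = torch.rand(frames, 1, size, size, device=device)
    pad = kernel // 2

    # Warm-up call so one-time initialization cost is not measured.
    F.max_pool2d(mask, kernel_size=kernel, stride=1, padding=pad)
    if device.type == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(repeats):
        F.max_pool2d(mask, kernel_size=kernel, stride=1, padding=pad)
    if device.type == "cuda":
        # Wait for queued GPU work to finish before stopping the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    cpu_time = time_max_pool(torch.device("cpu"))
    print(f"cpu:  {cpu_time:.6f} s per call")
    if torch.cuda.is_available():
        gpu_time = time_max_pool(torch.device("cuda"))
        print(f"cuda: {gpu_time:.6f} s per call ({cpu_time / gpu_time:.1f}x)")
```

Note that this measures only the pooling itself; the host-to-device and device-to-host copies added by the change are not included and would reduce the end-to-end gain somewhat.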

Max-pooling ops fail when the input tensor is on CPU but ComfyUI is running on CUDA. Move the mask to the active torch device before the pooling passes and return it to CPU afterwards so downstream nodes receive a CPU tensor as expected by ComfyUI conventions.

Denarius40 wants to merge 1 commit into Lightricks:master from Denarius40:fix/dilate-mask-device
