Skip to content

perf: run LTXVDilateVideoMask pooling on GPU for massive speedup#514

Open
Denarius40 wants to merge 1 commit into
Lightricks:masterfrom
Denarius40:fix/dilate-mask-device
Open

perf: run LTXVDilateVideoMask pooling on GPU for massive speedup#514
Denarius40 wants to merge 1 commit into
Lightricks:masterfrom
Denarius40:fix/dilate-mask-device

Conversation

@Denarius40

Copy link
Copy Markdown

Summary

LTXVDilateVideoMask was running F.max_pool2d / F.max_pool1d on the CPU regardless of whether a GPU was available. This change moves the mask to the active torch device before the pooling passes (via comfy.model_management.get_torch_device()) and returns it to CPU afterwards.

Performance

In local testing this yields a ~500x speedup when a GPU is available, since the max-pool operations are dramatically faster on CUDA than on CPU for typical video mask sizes.

Max-pooling ops fail when the input tensor is on CPU but ComfyUI is
running on CUDA. Move the mask to the active torch device before the
pooling passes and return it to CPU afterwards so downstream nodes
receive a CPU tensor as expected by ComfyUI conventions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant