Hi, I have run SDXL with tensor parallelism as well as sequence parallelism. My PR is below; I hope it helps anyone who needs it.
The Motivation:
I wanted to avoid gradient checkpointing in order to get higher throughput when the inputs have a higher resolution, such as 720p.
However, tensor parallelism comes at a cost, and I have not gained any throughput from TP. (Tested with 720*1080 inputs on A100, batch size 16, with AMP.)
In case someone has the same idea or wants to run tensor parallelism with more blocks, here are my code changes:
PR: support tensor parallel for sdxl