We currently support only distributed data parallel.
When working with bigger models (e.g. with language models), the models are so large that they don't fit anymore in a single GPU memory, but need to be distributed across GPUs. This is where Fully Sharded Data Parallel (FSDP) comes in.
Current Opacus already has some kind of support for this, see: https://github.com/meta-pytorch/opacus/blob/main/examples/fsdp_example.py so likely we should be able to support FSDP also.
We currently support only distributed data parallel.
When working with bigger models (e.g. with language models), the models are so large that they don't fit anymore in a single GPU memory, but need to be distributed across GPUs. This is where Fully Sharded Data Parallel (FSDP) comes in.
Current Opacus already has some kind of support for this, see: https://github.com/meta-pytorch/opacus/blob/main/examples/fsdp_example.py so likely we should be able to support FSDP also.