Add FSDP support for large models

We currently support only distributed data parallel.

When working with bigger models (e.g. with language models), the models are so large that they don't fit anymore in a single GPU memory, but need to be distributed across GPUs. This is where Fully Sharded Data Parallel (FSDP) comes in.

Current Opacus already has some kind of support for this, see: https://github.com/meta-pytorch/opacus/blob/main/examples/fsdp_example.py so likely we should be able to support FSDP also.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add FSDP support for large models #15

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add FSDP support for large models #15

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions