Large-scale 4D-parallelism pre-training of 🤗 transformers Mixture-of-Experts models *(still a work in progress)*
Updated Dec 14, 2023 - Python
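As a rough orientation for the repository above, "4D parallelism" in the Mixture-of-Experts setting usually refers to combining data, tensor, pipeline, and expert parallelism. The sketch below is a minimal, hypothetical illustration of how a fixed GPU count might be factored into those four process-group dimensions; the function name and sizes are assumptions, not taken from the repository.

```python
# Hypothetical sketch: factor a pool of GPUs into the four parallelism
# dimensions commonly combined in MoE pre-training (data, tensor,
# pipeline, expert). Names and numbers are illustrative only.
from itertools import product


def build_4d_grid(world_size: int, dp: int, tp: int, pp: int, ep: int):
    """Map each global rank to a (data, tensor, pipeline, expert) coordinate."""
    assert dp * tp * pp * ep == world_size, "dimensions must multiply to world size"
    grid = {}
    for rank, (d, t, p, e) in enumerate(product(range(dp), range(tp), range(pp), range(ep))):
        grid[rank] = {"dp": d, "tp": t, "pp": p, "ep": e}
    return grid


if __name__ == "__main__":
    # e.g. 16 GPUs split as 2-way data x 2-way tensor x 2-way pipeline x 2-way expert
    for rank, coord in build_4d_grid(16, dp=2, tp=2, pp=2, ep=2).items():
        print(rank, coord)
```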
GPT-2 (124M) fixed-work multi-GPU training benchmark on a Slurm cluster (V100 GPUs) using DeepSpeed ZeRO-1 + AMP. Measures 1→4 GPU scaling (3.42× throughput speedup) with reproducible run artifacts (configs, metrics JSON, and commit IDs).
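For readers unfamiliar with the setup named above, the following is a minimal sketch of a DeepSpeed configuration that enables ZeRO stage 1 with fp16 mixed precision (AMP-style). The batch sizes, learning rate, and stand-in model are placeholders, not the values used in the benchmark.

```python
# Hypothetical sketch of a DeepSpeed ZeRO stage-1 + fp16 (AMP) setup.
# All numbers below are placeholders, not the benchmark's actual values.
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 8,       # placeholder
    "gradient_accumulation_steps": 1,          # placeholder
    "zero_optimization": {"stage": 1},         # ZeRO-1: shard optimizer states only
    "fp16": {"enabled": True},                 # mixed-precision training
    "optimizer": {"type": "AdamW", "params": {"lr": 6e-4}},  # placeholder LR
}

model = torch.nn.Linear(768, 768)  # stand-in for a GPT-2 (124M) model

# deepspeed.initialize wraps the model and optimizer according to ds_config;
# launch with the `deepspeed` CLI (or via srun on Slurm) so ranks are set up.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```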