Is multi-node, multi-GPU training supported? #27

Description

@lainxx

Running the command: torchrun --standalone --nnodes=4 --nproc-per-node=8 train_pretrain_stage0.py
fails with: torch.distributed.elastic.rendezvous.api.RendezvousTimeoutError
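
A possible cause, offered as a sketch rather than a confirmed diagnosis: torchrun's --standalone flag starts a rendezvous local to the machine and is meant for single-node jobs, so with --nnodes=4 the launcher would wait for three more nodes that cannot reach it and eventually time out. The helper below only builds the command line one might run on each of the four nodes with a shared c10d rendezvous instead. The function name, the address 10.0.0.1, port 29500, and the job id are hypothetical placeholders, and whether train_pretrain_stage0.py itself handles multi-node runs is not confirmed by this issue.

```python
import shlex


def build_torchrun_cmd(
    rdzv_host,                      # hypothetical: address of one node reachable by all others
    rdzv_port=29500,                # hypothetical: any free TCP port on that node
    job_id="pretrain_stage0",       # hypothetical: must be identical on every node
    nnodes=4,
    nproc_per_node=8,
    script="train_pretrain_stage0.py",
):
    """Build a multi-node torchrun command that uses a shared c10d rendezvous.

    Unlike --standalone, every node points at the same rendezvous endpoint,
    so the same command can be run on each participating node.
    """
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc-per-node={nproc_per_node}",
        "--rdzv-backend=c10d",
        f"--rdzv-endpoint={rdzv_host}:{rdzv_port}",
        f"--rdzv-id={job_id}",
        script,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-ready line.
    print(shlex.join(build_torchrun_cmd("10.0.0.1")))
```

If this is the cause, the printed command would need to be started on all four nodes within the rendezvous timeout, and the chosen port would have to be open between them.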
