MPI_ERR_TRUNCATE when running ResNet18 on the ImageNet dataset #1

@GeKeShi

Hello, I'm trying to test this code on ImageNet. At the first step, when the program reaches self.comm.Bcast([self.model_recv_buf.recv_buf[layer_idx], MPI.DOUBLE], root=0) in the function async_fetch_weights_bcast in distributed_worker.py, it throws the error MPI_ERR_TRUNCATE: message truncated. I checked the memory size in Bcast, and the same code works when the program runs on CIFAR-10/100. Have you encountered this problem?
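For context, MPI_ERR_TRUNCATE generally means the incoming message was larger than the receive buffer, and for Bcast the element count and datatype have to agree on every rank. One way to narrow this down might be to compare each receive buffer's length against the flattened size of the layer it is meant to hold. The sketch below is only a diagnostic idea, not the repository's code: the helper names, the layer shapes, and the buffer lengths are all made up for illustration, and it does not establish that a size mismatch is the actual cause here.

```python
from math import prod


def expected_count(shape):
    """Number of elements a layer of this shape occupies once flattened."""
    return prod(shape)


def find_mismatches(layer_shapes, buffer_lengths):
    """Return (layer_idx, expected, actual) for each buffer whose length
    differs from the flattened size of its layer."""
    mismatches = []
    for idx, (shape, actual) in enumerate(zip(layer_shapes, buffer_lengths)):
        expected = expected_count(shape)
        if expected != actual:
            mismatches.append((idx, expected, actual))
    return mismatches


# Hypothetical example: buffers sized for a CIFAR-style network
# (3x3 first conv, 10 classes) but layers shaped for an ImageNet-style
# network (7x7 first conv, 1000 classes). These numbers are assumptions.
layer_shapes = [(64, 3, 7, 7), (64,), (1000, 512), (1000,)]
buffer_lengths = [64 * 3 * 3 * 3, 64, 10 * 512, 10]

for idx, expected, actual in find_mismatches(layer_shapes, buffer_lengths):
    print(f"layer {idx}: expected {expected} elements, buffer holds {actual}")
```

In a real run, the same comparison would need to be made on every rank (root and workers), since a buffer that is correct on one side and stale on the other would also be truncated.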

Another issue: when I replaced PyTorch 0.3.0 with PyTorch 0.4/1.1, the processing time of the QSGD decode step became significantly higher, almost 10 times that of 0.3.0. Have you tried this?
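To make the slowdown easier to compare across versions, the decode step could be timed in isolation with a small harness like the one below. This is a minimal sketch under assumptions: time_call and fake_decode are hypothetical names, and fake_decode is only a stand-in for the real QSGD decode function. If the tensors live on a GPU, passing torch.cuda.synchronize as the sync argument would be needed so that asynchronous kernels are included in the measurement.

```python
import time


def time_call(fn, *args, repeats=10, sync=None):
    """Run fn(*args) `repeats` times and return (best, mean) wall time in seconds.

    `sync` is an optional zero-argument callable invoked before and after each
    run, e.g. a device synchronization function (assumption: only needed when
    the work is asynchronous).
    """
    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return min(durations), sum(durations) / len(durations)


def fake_decode(n):
    """Placeholder workload standing in for the real decode step."""
    return [i * 0.5 for i in range(n)]


best, mean = time_call(fake_decode, 10000, repeats=5)
print(f"best {best:.6f}s, mean {mean:.6f}s")
```

Running the same harness under each PyTorch version, with identical inputs, would show whether the factor of roughly 10 comes from the decode function itself or from something around it.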
