After multi-node training fails, processes on non-master nodes are not fully killed #416

@frankxyy

Description

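As the title says: after a multi-node training run fails, a libai process is still alive on the non-master node, so it keeps printing logs to the console. The logs look like this: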
image
