training issues #12

@asmodaay

Hi, I tried training the model with only the LJ data and with only my own data, with fp16 and with fp32, on 1 GPU and on 3 GPUs, but in every case I get this:
Screenshot 2020-05-17 at 19 46 32
The loss is always NaN.
When I start from the pretrained checkpoint, your code returns this:
Screenshot 2020-05-17 at 19 52 44
I worked around that by changing def load_checkpoint, but the loss is still NaN.

Do you have any idea what I am doing wrong?
