Training stuck at Epoch 15 #23

@Brunettow

Hello,
I can't train the model because the process gets killed after this message:

Building the data loader. Curriculum = 3/8, length = 32218.
Epoch 15 acc/qa=1.000000 loss=0.046158 loss/qa=0.046158 time/data=0.008719 time/step=1.016501: 100%|##############################| 1006/1006 [18:08<00:00, 1.08s/it]
Epoch 15 (validation) validation/acc/qa=1.000000: 2%|#4 | 20/1094 [00:41<11:04, 1.62it/s]
/home/colors/Desktop/nscl/Jacinle/bin/jac-crun: line 6: 3305 Killed $JACROOT/bin/jac-run "$@"
