I tried to use TUPE in the NMT encoder, but I get a loss-exploding error. Does TUPE need some modification to work in NMT?
The error looks like this:
2021-01-02 14:08:12 | INFO | train_inner | epoch 001: 12110 / 53999 loss=4.403, nll_loss=2.737, ppl=6.67, wps=46907.8, ups=1.07, wpb=43719.3, bsz=1694.4, num_updates=12100, lr=0.000325246, gnorm=0.245, loss_scale=4, train_wall=93, wall=0
2021-01-02 14:09:40 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 2.0
2021-01-02 14:09:41 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 1.0
2021-01-02 14:09:42 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.5
2021-01-02 14:09:43 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.25
2021-01-02 14:09:44 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.125
2021-01-02 14:09:45 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0625
2021-01-02 14:09:46 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.03125
2021-01-02 14:09:46 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.015625
2021-01-02 14:09:47 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2021-01-02 14:09:48 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00390625
2021-01-02 14:09:49 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.001953125
2021-01-02 14:09:50 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0009765625
2021-01-02 14:09:51 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00048828125
2021-01-02 14:09:52 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.000244140625
2021-01-02 14:09:53 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0001220703125
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2816, 34, 34]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([7296, 13, 13]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2304, 38, 38]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.encoder.layers.0.self_attn.dropout_module, shape: torch.Size([2816, 33, 33]), forward input max: nan, input min: nan
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([176, 23, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([144, 40, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([176, 31, 47038]), backward
2021-01-02 14:09:54 | WARNING | fairseq.nan_detector | NaN detected in output of module.decoder.output_projection, shape: torch.Size([456, 12, 47038]), backward
FloatingPointError: Minimum loss scale reached (0.0001). Your loss is probably exploding. Try lowering the learning rate, using gradient clipping or increasing the batch size.
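For reference, the remedies the error message names would correspond to fairseq-train flags roughly like the following (illustrative placeholder values, not tuned recommendations; `data-bin/my-dataset` is a stand-in path):

```shell
# Lower the learning rate, add gradient clipping, and let the fp16
# loss scaler shrink below the default 1e-4 floor it just hit.
fairseq-train data-bin/my-dataset \
    --arch transformer \
    --lr 1e-4 \
    --clip-norm 0.1 \
    --fp16 \
    --min-loss-scale 1e-6
```

I can try these, but I would still like to know whether TUPE itself needs a fix for the NMT setting.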
Any help is appreciated! Thanks.
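For context, my understanding of what TUPE changes in encoder self-attention is roughly the following (a numpy sketch of the untied scoring from the TUPE paper, not fairseq's actual code; all names here are mine):

```python
import numpy as np

def tupe_attention_logits(x, p, wq, wk, uq, uk):
    """TUPE-style untied attention logits.

    x: (seq, d) token embeddings; p: (seq, d) absolute positional embeddings.
    wq/wk project words, uq/uk project positions with separate (untied)
    weights. Both terms use a sqrt(2*d) scale instead of the usual sqrt(d),
    and the word-position cross terms are dropped entirely.
    """
    d = x.shape[-1]
    scale = np.sqrt(2 * d)
    word_logits = (x @ wq) @ (x @ wk).T / scale  # content-to-content term
    pos_logits = (p @ uq) @ (p @ uk).T / scale   # position-to-position term
    return word_logits + pos_logits

# Toy usage with random weights.
rng = np.random.default_rng(0)
seq, d = 5, 8
x = rng.normal(size=(seq, d))
p = rng.normal(size=(seq, d))
wq, wk, uq, uk = (rng.normal(size=(d, d)) for _ in range(4))
logits = tupe_attention_logits(x, p, wq, wk, uq, uk)
print(logits.shape)  # (5, 5)
```

If my understanding above is wrong, that might itself explain the instability, so corrections are welcome.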