Loss becomes infinite while training quant models

Hi, when I try to train a quant model using the config `detectron2/configs/COCO-Detection/retinanet_R_18_FPN_1x-Full-SyncBN-lsq-2bit.yaml`, the loss becomes `nan` at iteration 390:

```bash
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhangjinhe/anaconda3/envs/torch/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/launch.py", line 125, in _distributed_worker
    main_func(*args)
  File "/home/zhangjinhe/QTools/git/detectron2/tools/train_net.py", line 154, in main
    return trainer.train()
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/defaults.py", line 489, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/defaults.py", line 499, in run_step
    self._trainer.run_step()
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/train_loop.py", line 289, in run_step
    self._write_metrics(loss_dict, data_time)
  File "/home/zhangjinhe/QTools/git/detectron2/detectron2/engine/train_loop.py", line 332, in _write_metrics
    f"Loss became infinite or NaN at iteration={self.iter}!\n"
FloatingPointError: Loss became infinite or NaN at iteration=390!
```
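For context, the traceback ends in the metrics-writing step, which raises when a loss value is not finite. Here is a minimal sketch of that kind of check, extended to report which loss term went bad. `find_non_finite_losses` and `check_losses` are hypothetical helpers written for illustration, not detectron2 functions, and the loss names in the usage note are only examples.

```python
import math


def find_non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or infinite."""
    return [
        name
        for name, value in loss_dict.items()
        if not math.isfinite(float(value))
    ]


def check_losses(loss_dict, iteration):
    """Raise FloatingPointError if any loss term is non-finite.

    Hypothetical helper: mirrors the error message seen in the traceback,
    but also names the offending terms. Returns the summed loss otherwise.
    """
    bad = find_non_finite_losses(loss_dict)
    if bad:
        raise FloatingPointError(
            f"Loss became infinite or NaN at iteration={iteration}! "
            f"Non-finite terms: {bad}"
        )
    return sum(float(value) for value in loss_dict.values())
```

Calling `check_losses({"loss_cls": 0.7, "loss_box_reg": float("nan")}, 390)` would raise and point at `loss_box_reg`, which may help narrow down whether the classification or the regression branch diverges first.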
The command I use is `python tools/train_net.py --config-file configs/COCO-Detection/retinanet_R_18_FPN_1x-Full-SyncBN-lsq-2bit.yaml --num-gpus 4 MODEL.WEIGHTS output/coco-detection/retinanet_R_18_FPN_1x-Full_BN/model_final.pth`

I changed the input size from `(640, 672, 704, 736, 768, 800)` to `(800,)`, and the checkpoint file is the result of another experiment that used the config `retinanet_R_18_FPN_1x-Full-BN.yaml`.

Any ideas why?
