OverflowError: out of range integral type conversion attempted

Hi, I'm replicating your training script as the README describes, using the sh train_bart_model.sh command, and this error appears at the end.

{'loss': 0.0193, 'grad_norm': 0.06763239204883575, 'learning_rate': 2.983362019506598e-06, 'epoch': 2.98}                
{'loss': 0.0197, 'grad_norm': 0.07400441914796829, 'learning_rate': 1.835915088927137e-06, 'epoch': 2.99}                
{'loss': 0.0201, 'grad_norm': 0.07796286791563034, 'learning_rate': 6.884681583476765e-07, 'epoch': 2.99}                
{'train_runtime': 8481.4381, 'train_samples_per_second': 105.243, 'train_steps_per_second': 0.411, 'train_loss': 0.0540917304825899, 'epoch': 3.0}
100%|██████████████████████████████████████████████████████████████████████████████| 3486/3486 [2:21:21<00:00,  2.43s/it]
[WARNING|configuration_utils.py:447] 2024-03-30 18:17:42,674 >> Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'early_stopping': True, 'num_beams': 4, 'no_repeat_ngram_size': 3, 'forced_bos_token_id': 0, 'forced_eos_token_id': 2}
***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.0541
  train_runtime            = 2:21:21.43
  train_samples            =     297536
  train_samples_per_second =    105.243
  train_steps_per_second   =      0.411
100%|████████████████████████████████████████████████████████████████████████████████████| 63/63 [26:08<00:00, 21.08s/it]
Traceback (most recent call last):
  File "/var/www/nlp/spelling/run_summarization.py", line 708, in <module>
    main()
  File "/var/www/nlp/spelling/run_summarization.py", line 650, in main
    metrics = trainer.evaluate(max_length=max_length, num_beams=num_beams, metric_key_prefix="eval")
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 180, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3365, in evaluate
    output = eval_loop(
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3656, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/var/www/nlp/spelling/run_summarization.py", line 590, in compute_metrics
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3785, in batch_decode
    return [
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3786, in <listcomp>
    self.decode(
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3825, in decode
    return self._decode(
  File "/var/www/nlp/spelling/venv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 625, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted
100%|████████████████████████████████████████████████████████████████████████████████████| 63/63 [26:09<00:00, 24.91s/it]
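
A possible cause, stated as an assumption rather than something the log confirms: the Trainer pads gathered predictions with -100, and the fast tokenizer cannot convert a negative id, which would produce this OverflowError in batch_decode. The sketch below shows the usual workaround of replacing -100 with the pad token id before decoding. The helper name sanitize_token_ids, the sample batch, and the pad id of 1 are hypothetical and only for illustration; in compute_metrics the pad id would come from tokenizer.pad_token_id.

```python
import numpy as np

IGNORE_INDEX = -100  # placeholder value assumed to be present in the predictions


def sanitize_token_ids(token_ids, pad_token_id):
    """Return a copy of token_ids with every IGNORE_INDEX replaced by pad_token_id."""
    token_ids = np.asarray(token_ids)
    return np.where(token_ids != IGNORE_INDEX, token_ids, pad_token_id)


# Hypothetical batch of generated ids, right-padded with -100.
preds = np.array([
    [0, 713, 16, 2, -100, -100],
    [0, 713, 2, -100, -100, -100],
])

# pad_token_id=1 is an assumption for this example only.
clean = sanitize_token_ids(preds, pad_token_id=1)
print(clean.tolist())
```

If this is the cause, applying the same replacement to preds right before tokenizer.batch_decode(preds, skip_special_tokens=True) in compute_metrics should let the evaluation finish; the padded positions are then dropped by skip_special_tokens.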

My environment is Ubuntu 20.04, 32 GB RAM, 48 cores, RTX 4080.
Sat Mar 30 16:50:57 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:81:00.0  On |                  N/A |
| 53%   68C    P2   196W / 320W |  10981MiB / 16376MiB |     79%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3901      G   /usr/lib/xorg/Xorg                144MiB |
|    0   N/A  N/A      4070      G   /usr/bin/gnome-shell               66MiB |
|    0   N/A  N/A      6775      G   ...3/usr/lib/firefox/firefox       11MiB |
|    0   N/A  N/A     15644      G   ...on=20240329-134507.235000       58MiB |
|    0   N/A  N/A     32210      C   python                          10694MiB |
+-----------------------------------------------------------------------------+


My last checkpoint was 3000.
I don't know whether all 3 epochs actually finished.
Thanks in advance
Martín

OverflowError: out of range integral type conversion attempted #6