Here is a snippet of the error log:
04/03/2021 14:39:09 - WARNING - main - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at /home/shankar5/.cache/torch/pytorch_transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - Model config {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"finetuning_task": "factcc_annotated",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"pad_token_id": 0,
"pruned_heads": {},
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 30522
}
04/03/2021 14:39:10 - INFO - pytorch_transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/shankar5/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
04/03/2021 14:39:10 - INFO - main - Loading model from checkpoint.
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at /home/shankar5/.cache/torch/pytorch_transformers/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
04/03/2021 14:39:15 - INFO - pytorch_transformers.modeling_utils - Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
04/03/2021 14:39:15 - INFO - pytorch_transformers.modeling_utils - Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
04/03/2021 14:39:15 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/data', device=device(type='cpu'), do_eval=True, do_lower_case=True, do_train=False, eval_all_checkpoints=True, evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=100, loss_lambda=0.1, max_grad_norm=1.0, max_seq_length=512, max_steps=-1, model_name_or_path='bert-base-uncased', model_type='bert', n_gpu=0, no_cuda=False, num_train_epochs=3.0, output_dir='/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/factcc-checkpoint', output_mode='classification', overwrite_cache=True, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=12, save_steps=50, seed=42, task_name='factcc_annotated', tokenizer_name='', train_from_scratch=False, warmup_steps=0, weight_decay=0.0)
04/03/2021 14:39:15 - INFO - main - Evaluate the following checkpoints: ['/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/factcc-checkpoint']
04/03/2021 14:39:20 - INFO - main - Creating features from dataset file at /project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/data
04/03/2021 14:39:20 - INFO - utils - Writing example 0 of 5115
label_map label: ['CORRECT', 'INCORRECT']
Traceback (most recent call last):
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 519, in <module>
main()
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 511, in main
result = evaluate(args, model, tokenizer, prefix=global_step)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 215, in evaluate
eval_dataset = load_and_cache_examples(args, eval_task, tokenizer, evaluate=True)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 300, in load_and_cache_examples
pad_token_segment_id=4 if args.model_type in ['xlnet'] else 0)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/utils.py", line 307, in convert_examples_to_features
label_id = label_map[example.label]
KeyError: 0
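For context, the failing lookup at `utils.py:307` indexes `label_map` with the example's label, and the `label_map label: ['CORRECT', 'INCORRECT']` line above suggests the map is keyed by those strings. A minimal sketch of the mismatch (the `label_map` construction here is my assumption based on that log line, not the actual FactCC source):

```python
# label_map appears to be built from the task's string label list,
# so its keys are the strings 'CORRECT' and 'INCORRECT'
label_list = ['CORRECT', 'INCORRECT']
label_map = {label: i for i, label in enumerate(label_list)}

example_label = 0  # integer label, as in my JSONL file below
try:
    label_id = label_map[example_label]
except KeyError as e:
    # integer 0 is not a key of the string-keyed map
    print(f"KeyError: {e}")
```

If that reading is right, the integer labels in my data would never match the string keys, regardless of the field name.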
I formatted the JSONL file according to the instructions, but renamed the `id` field to `label` since keeping `id` threw errors:
{
"label": 0,
"text": "This statistic presents the most common concerns about online versus on-campus learning options according to online students in the United States in 2019 . During the survey period , 21 percent of respondents expressed some concern about the perception of their online degree by prospective employers .\n",
"claim": "This statistic shows the results of a survey conducted in the country in 2019 on the importance of online in a Perception_of_online_learning_degree_by_prospective_employers . Some 31 % of respondents stated that online was Quality_and_instruction_and_academic_support to them in a Perception_of_online_learning_degree_by_prospective_employers because they Quality_and_instruction_and_academic_support feel it .\n"
},
{
"label": 1,
"text": "The statistic shows the divorce rate in Norway from 2009 to 2019 , by gender . The divorce rate overall declined within this decade . In 2019 , there were ten divorces per thousand married and separated males , and 10.3 divorces per thousand married and separated females .\n",
"claim": "This statistic shows the rate Norway of from 2009 to 2019 by gender . In 2019 , Norway 's Males Norway amounted to approximately 10.0 million , while the Females Norway amounted to approximately 10.3 million inhabitants .\n"
},
{
"label": 2,
"text": "This statistic shows the Milliman Medical Index ( MMI ) or the annual medical cost for a family of four in the U.S. from 2013 to 2020 . In 2013 , the projected annual medical cost for a family of four was 22,030 U.S. dollars whereas this cost increased to 28,653 U.S. dollars in 2020 .\n",
"claim": "This statistic presents the Annual medical of cost and for to family in U.S. U.S. U.S. 2013 to 2019 , with a forecast for 2020 . Over this period , the medical of the cost and for industry to family in U.S. U.S. increased , reaching around 28166 million U.S. in 2018 .\n"
},
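One thing I considered is converting the integer labels to the strings the map seems to expect. A sketch of what that could look like, with the caveat that the `0 -> 'CORRECT'`, `1 -> 'INCORRECT'` assignment is a guess, and label `2` has no counterpart in a two-class map (the model config also says `"num_labels": 2`):

```python
import json

# Hypothetical mapping from my integer labels to the strings that
# label_map appears to expect; which integer means which class is a guess
INT_TO_STR = {0: 'CORRECT', 1: 'INCORRECT'}

def convert_line(line: str) -> str:
    """Rewrite one JSONL record, replacing the integer label with a string."""
    record = json.loads(line)
    # raises KeyError for label 2, which has no place in a two-class setup
    record['label'] = INT_TO_STR[record['label']]
    return json.dumps(record)

print(convert_line('{"label": 0, "text": "t", "claim": "c"}'))
```

Even with this conversion, I am not sure what to do with the `"label": 2` examples, since the checkpoint is a two-class classifier.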