Discrepancy between pre-trained and self-trained models #28

Hello,

We are running evaluations for the E3 Cadets dataset and have encountered a discrepancy between the paper's results and our own self-trained model.

The provided **pre-trained model** reproduces the F1-Score reported in the paper (0.9701); precision differs only in the fourth decimal place (0.9441 vs. 0.9440). This gives us confidence that our evaluation setup is correct.

However, our **self-trained model** performs significantly worse. The F1-Score drops from 0.9701 to 0.8972, a relative change of **-7.51%** (0.0729 in absolute terms). We generally consider a difference of up to 2% acceptable, but this is well beyond that.

OUR RESULTS 

| Model | Precision | Recall | F1-Score | % F1 Diff (from Paper) |
| :--- | :--- | :--- | :--- | :--- |
| **Paper (Baseline)** | 0.9440 | 0.9977 | **0.9701** | N/A |
| **Pre-trained** | 0.9441 | 0.9977 | **0.9701** | $0.00\%$ |
| **Own-trained** | 0.8151 | 0.9977 | **0.8972** | $-7.51\%$ |
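
For anyone who wants to double-check the numbers in the table, here is a minimal sketch of how the F1-Score and the "% F1 Diff" column can be recomputed. It assumes F1 is the harmonic mean of precision and recall and that the difference is relative to the paper's F1; the function names `f1_score` and `relative_diff_pct` are ours and are not taken from the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_diff_pct(value: float, baseline: float) -> float:
    """Relative difference of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100


# (precision, recall) pairs as reported in the table above.
rows = {
    "Paper (Baseline)": (0.9440, 0.9977),
    "Pre-trained": (0.9441, 0.9977),
    "Own-trained": (0.8151, 0.9977),
}

for name, (precision, recall) in rows.items():
    # Recomputed from 4-decimal inputs, so the last digit may differ
    # slightly from the F1 values reported in the table.
    print(f"{name}: F1 = {f1_score(precision, recall):.4f}")

# Relative difference computed from the reported F1 values.
print(f"Own-trained vs. paper: {relative_diff_pct(0.8972, 0.9701):+.2f}%")
```

Because the precision and recall values in the table are already rounded to four decimals, the recomputed F1 can be off by one unit in the last digit.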

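Since the pre-trained model reproduces the paper's results, the discrepancy seems to come from the training process itself. Note that recall is unchanged (0.9977) and the entire drop comes from precision (0.9440 to 0.8151), so our self-trained model appears to raise more false positives rather than miss attacks. Any guidance on what could cause such a difference?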
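
To make the size of the precision drop more concrete, here is a rough sketch that converts precision into false positives per true positive. It assumes both models are evaluated against the same ground truth, so that identical recall implies the same number of true positives; the helper name `false_positives_per_true_positive` is hypothetical.

```python
def false_positives_per_true_positive(precision: float) -> float:
    """Return FP / TP implied by a precision value.

    precision = TP / (TP + FP)  =>  FP / TP = 1 / precision - 1
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return 1.0 / precision - 1.0


paper_fp_rate = false_positives_per_true_positive(0.9440)
own_fp_rate = false_positives_per_true_positive(0.8151)

# Assumption: TP is the same for both models (same test set, same recall),
# so the ratio of FP/TP values equals the ratio of false-positive counts.
fp_ratio = own_fp_rate / paper_fp_rate

print(f"Paper:       {paper_fp_rate:.4f} false positives per true positive")
print(f"Own-trained: {own_fp_rate:.4f} false positives per true positive")
print(f"Ratio:       {fp_ratio:.2f}x")
```

Under that assumption, our self-trained model produces roughly 3.8 times as many false positives as the paper's model.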
