I really appreciate the excellent paper.
I tested factCC on the CNN/DM dataset using the gold reference summaries, split into single sentences, as claims.
I strictly followed the README and used the official pre-trained factCC checkpoint.
I labeled all the claims as 'CORRECT' (because they are gold references).
The accuracy reported by factCC is around 42%, which means the model considers only 42% of the reference sentences factually correct.
Is this reasonable, or am I misusing the metric?
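For context, here is roughly how I prepared the evaluation data. This is a minimal sketch: the field names ("id", "text", "claim", "label") and the output file name are my assumptions about the expected input format, not copied from the repo.

```python
import json
from nltk.tokenize import sent_tokenize  # assumes the nltk punkt model is downloaded

def build_examples(article: str, reference: str):
    """Split a gold reference summary into sentences and pair each one
    with its source article as a claim labeled CORRECT."""
    for i, sentence in enumerate(sent_tokenize(reference)):
        yield {
            "id": i,               # per-article sentence index (assumed field)
            "text": article,       # source document (assumed field name)
            "claim": sentence,     # single reference sentence (assumed field name)
            "label": "CORRECT",    # gold references assumed factually correct
        }

def write_jsonl(pairs, path="data-dev.jsonl"):
    """Write one JSON object per line; the path is what I pointed the
    evaluation script at, not necessarily the repo's expected name."""
    with open(path, "w") as f:
        for article, reference in pairs:
            for example in build_examples(article, reference):
                f.write(json.dumps(example) + "\n")
```

I then ran the provided evaluation script on this file with the official checkpoint and read off the reported accuracy.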