I really appreciate the excellent paper.
I tested factCC on the CNN/DM dataset using the gold reference summaries, split into single sentences, as claims.
I strictly followed the README and used the official pre-trained factCC checkpoint.
I labeled all the claims as 'CORRECT' (because they are gold references).
The accuracy reported by factCC is around 42%, which means the model considers only 42% of the reference sentences factually correct.
Is this reasonable, or am I misusing the metric?
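For context, here is roughly how I prepared the evaluation data. This is a minimal sketch: the field names ("id", "text", "claim", "label") and the output file name are my assumptions about the expected input format, not copied from the repo.

```python
import json
from nltk.tokenize import sent_tokenize  # assumes the nltk punkt model is downloaded

def build_examples(article: str, reference: str):
    """Split a gold reference summary into sentences and pair each one
    with its source article as a claim labeled CORRECT."""
    for i, sentence in enumerate(sent_tokenize(reference)):
        yield {
            "id": i,               # per-article sentence index (assumed field)
            "text": article,       # source document (assumed field name)
            "claim": sentence,     # single reference sentence (assumed field name)
            "label": "CORRECT",    # gold references assumed factually correct
        }

def write_jsonl(pairs, path="data-dev.jsonl"):
    """Write one JSON object per line; the path is what I pointed the
    evaluation script at, not necessarily the repo's expected name."""
    with open(path, "w") as f:
        for article, reference in pairs:
            for example in build_examples(article, reference):
                f.write(json.dumps(example) + "\n")
```

I then ran the provided evaluation script on this file with the official checkpoint and read off the reported accuracy.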