implementation clarification

Hi team,
The paper describes the algorithm as follows:
"Within each cluster, we compute all pairwise cosine similarities and set a threshold cosine similarity above which data pairs are considered semantic duplicates. Finally, from each group of semantic duplicates within a cluster, we keep the image with the lowest cosine similarity to the cluster centroid and remove the rest."
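
To make my reading of this passage concrete, here is a minimal sketch of the procedure as quoted. It is not the authors' code: the function names are hypothetical, plain Python lists stand in for embedding tensors, and treating a "group of semantic duplicates" as a connected component of the thresholded similarity graph is my assumption, since the quote does not say how groups are formed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dedup_cluster_as_described(embeddings, centroid, threshold):
    """Return the sorted indices of the points kept in one cluster.

    Assumption: a duplicate group is a connected component of the graph
    that links every pair whose cosine similarity exceeds `threshold`.
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity is above the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                parent[find(i)] = find(j)

    # Collect the members of each duplicate group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # From each group, keep the member least similar to the centroid.
    return sorted(
        min(members, key=lambda i: cosine(embeddings[i], centroid))
        for members in groups.values()
    )
```

In this sketch a point with no duplicates forms a group of one and is always kept.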

However, in the implementation, in both `advance_semdedup.py` and `semdedup.py`, you remove every data point whose maximum pairwise similarity is above the threshold:
`eps_points_to_remove = M > 1 - eps`
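
For comparison, here is a minimal sketch of what I understand this thresholding line to do. How `M` is built is not shown in this issue, so the sketch assumes that `M[j]` holds the largest cosine similarity between point `j` and any point that precedes it in the cluster's ordering. The function names are hypothetical and plain Python lists stand in for tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def points_to_remove(embeddings, eps):
    """Flag each point whose max similarity to an earlier point exceeds 1 - eps."""
    n = len(embeddings)
    # Assumption: M[j] is the largest similarity between point j and any
    # earlier point i < j. The first point has no predecessor, so its
    # entry stays at the initial value 0.0.
    M = [0.0] * n
    for j in range(n):
        for i in range(j):
            M[j] = max(M[j], cosine(embeddings[i], embeddings[j]))
    # Element-wise equivalent of `eps_points_to_remove = M > 1 - eps`.
    return [m > 1 - eps for m in M]
```

In this sketch, which member of a near-duplicate pair survives depends only on the ordering of the points, not on an explicit comparison with the cluster centroid.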

Does this mean the implementation provided in this repository is a simplified version of SemDeDup, and that the non-simplified version is not public?

Also, are the metrics published in the paper based on the simplified version of SemDeDup or on the non-simplified version?

implementation clarification #8
