This repository was archived by the owner on Nov 1, 2024. It is now read-only.
Hi team,
The paper describes the algorithm as follows:
"Within each cluster, we compute all pairwise cosine similarities and set a threshold cosine similarity above which data pairs are considered semantic duplicates. Finally, from each group of semantic duplicates within a cluster, we keep the image with the lowest cosine similarity to the cluster centroid and remove the rest."
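To make sure I am reading that passage correctly, here is a minimal sketch of how I understand it. It is not code from the paper or this repository. The function name `dedup_cluster_as_described` is hypothetical, and I am assuming that a "group of semantic duplicates" means a connected component of pairs whose similarity exceeds the threshold, which the quoted text does not spell out.

```python
import numpy as np

def dedup_cluster_as_described(embeddings, centroid, threshold):
    """Return the sorted indices kept within one cluster (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=float)
    c = np.asarray(centroid, dtype=float)
    # Normalize so that dot products are cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    c = c / np.linalg.norm(c)
    sim = x @ x.T              # all pairwise cosine similarities
    sim_to_centroid = x @ c    # similarity of each item to the centroid
    n = len(x)

    # Assumption: duplicate groups are connected components (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # From each group, keep the item with the LOWEST similarity to the centroid.
    keep = [min(g, key=lambda i: sim_to_centroid[i]) for g in groups.values()]
    return sorted(keep)
```

Under this reading, every duplicate group always retains exactly one representative, and items with no duplicates are kept untouched.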
However, in your implementation, in both `advance_semdedup.py` and `semdedup.py`, you appear to remove every data point whose maximum pairwise similarity exceeds the threshold: `eps_points_to_remove = M > 1 - eps`
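To make the concern concrete, here is a minimal sketch of that rule as I read it. It assumes `M` holds, for each point, its maximum cosine similarity to every other point in the cluster. That definition is my assumption and is not quoted from the repository; the function name `flag_by_max_similarity` is hypothetical.

```python
import numpy as np

def flag_by_max_similarity(embeddings, eps):
    """Return a boolean mask of points flagged for removal (hypothetical helper)."""
    x = np.asarray(embeddings, dtype=float)
    # Normalize so that dot products are cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    # Ignore each point's similarity to itself.
    np.fill_diagonal(sim, -np.inf)
    # Assumption: M is the max similarity of each point to ANY other point.
    M = sim.max(axis=1)
    eps_points_to_remove = M > 1 - eps
    return eps_points_to_remove
```

If `M` were built this way, both members of a duplicate pair would be flagged and no representative would survive, which differs from the paper's description. If `M` is instead built from only a subset of the pairs, the outcome could be different, so the answer depends on how `M` is constructed.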
Does this mean that the implementation in this repository is a simplified version of SemDeDup, and that the non-simplified version is not public?
Also, are the metrics published in the paper based on the simplified version of SemDeDup or the non-simplified one?