This repository was archived by the owner on Nov 1, 2024. It is now read-only.

implementation clarification #8

@n0thing233

Hi team,
The paper describes the algorithm as follows:
"Within each cluster, we compute all pairwise cosine similarities and set a threshold cosine similarity above which data pairs are considered semantic duplicates. Finally, from each group of semantic duplicates within a cluster, we keep the image with the lowest cosine similarity to the cluster centroid and remove the rest."
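
A minimal sketch of the procedure as the quoted passage describes it, for one cluster. It is not taken from the repository. The helper names `cosine` and `dedup_cluster` are hypothetical, and treating a "group of semantic duplicates" as a connected component of above-threshold pairs is an assumption, since the quote does not say how groups are formed. Plain Python lists are used only to keep the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dedup_cluster(embeddings, centroid, threshold):
    """Return the sorted indices to keep within one cluster.

    Assumption: two points belong to the same duplicate group if they are
    linked, directly or through other points, by pairs whose cosine
    similarity exceeds `threshold`.
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity is above the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                parent[find(i)] = find(j)

    # Collect the members of each duplicate group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # From each group, keep the point least similar to the centroid.
    keep = [
        min(members, key=lambda i: cosine(embeddings[i], centroid))
        for members in groups.values()
    ]
    return sorted(keep)
```

With three points where the first two are near-duplicates, `dedup_cluster([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], [1.0, 0.2], 0.95)` keeps the first and third points, because the first is less similar to the centroid than the second.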

However, in your implementation, in both advance_semdedup.py and semdedup.py,
you remove every data point whose maximum pairwise similarity is above the threshold:
eps_points_to_remove = M > 1 - eps
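
To make the comparison concrete, here is a minimal sketch of how a rule of the form `M > 1 - eps` behaves. It is not the repository's code. It assumes that `M[j]` is the maximum cosine similarity between point `j` and any point that precedes it in some fixed ordering, so the first point of each duplicate set is never flagged; that ordering, and the helper names `cosine` and `flag_duplicates`, are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def flag_duplicates(embeddings, eps):
    """Return one boolean per point, True where the point would be removed.

    Assumption: the points are already ordered by preference for keeping,
    and each point is compared only against the points before it.
    """
    flags = []
    for j in range(len(embeddings)):
        # Maximum similarity to any earlier point; -1.0 (the lowest possible
        # cosine similarity) when there is no earlier point.
        m_j = max(
            (cosine(embeddings[i], embeddings[j]) for i in range(j)),
            default=-1.0,
        )
        flags.append(m_j > 1 - eps)
    return flags
```

Under that assumption, `flag_duplicates([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], 0.05)` flags only the second point. If the ordering were by ascending similarity to the cluster centroid, the retained point of each pair would be the one least similar to the centroid, but whether the repository orders points this way is exactly what this issue asks.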

Does this mean the implementation in this repository is a simplified version of SemDeDup,
and that the non-simplified version is not public?

Are the metrics published in the paper based on the simplified or the non-simplified version of SemDeDup?
