feat: semantic deduplication #1

@kcelestinomaria

Description

Embedding-based near-duplicate detection, using FAISS or Annoy, is an important addition.

A configurable “similarity score” threshold will determine which rows are treated as redundant and removed.
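
One way the thresholding could work is sketched below. This is only an illustration under stated assumptions: the function names `cosine_similarity` and `deduplicate`, the default threshold of 0.9, and the greedy "first occurrence wins" policy are all hypothetical and not part of this issue. The sketch uses a brute-force cosine scan in plain Python so that it runs without dependencies; in a real implementation, a FAISS or Annoy index would presumably replace the inner scan for larger datasets.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return the indices of rows to keep.

    A row is dropped when its similarity to any already-kept row is at or
    above `threshold` (greedy; the first occurrence wins). The default of
    0.9 is an illustrative assumption, not a recommended value.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        # Brute-force scan over kept rows; an approximate nearest-neighbor
        # index (e.g. FAISS or Annoy) would replace this for large inputs.
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: rows 0 and 1 point in nearly the same direction.
rows = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
keep = deduplicate(rows, threshold=0.95)
```

With these toy vectors, a threshold of 0.95 drops the second row as a near-duplicate of the first, while a stricter threshold such as 0.999 keeps all three. That is the trade-off the configurable threshold is meant to expose.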

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
