
Feature: Add perceptual image hashing to detect similar photos #2

@vgudur-dev

Description


Feature request

To go beyond exact duplicate detection, CleanSweep could offer perceptual hashing (pHash, dHash) to identify visually similar photos and screenshots, even when their metadata differs or they have been slightly modified. This would help users clean up near-duplicates such as burst shots, edited versions, or resized images.
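For reference, imagehash already provides functions such as `imagehash.dhash()` and `imagehash.phash()` that take a PIL image, and subtracting two of its hash objects gives their Hamming distance. For the "implement a simple dHash" option, a minimal hand-rolled version might look like the sketch below. It assumes images are handled as 8-bit grayscale rows (`pixels[y][x]`), and the function names are hypothetical, not existing CleanSweep code. The pure-Python nearest-neighbour resize is cruder than Pillow's resampling filters and is only there to keep the core logic dependency-free; `dhash_file` assumes Pillow is installed.

```python
from typing import List, Sequence

# A grayscale image as rows of 0-255 intensities (row-major: pixels[y][x]).
Grid = Sequence[Sequence[int]]


def resize_nearest(pixels: Grid, width: int, height: int) -> List[List[int]]:
    """Downsample with nearest-neighbour sampling (cruder than Pillow's filters)."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash_bits(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    The image is shrunk to (hash_size + 1) x hash_size, so each of the
    hash_size rows yields hash_size comparisons, i.e. hash_size**2 bits.
    """
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    value = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness increases from left to right.
            value = (value << 1) | (1 if right > left else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dhash_file(path: str, hash_size: int = 8) -> int:
    """Hash an image file; assumes Pillow is installed (imported lazily)."""
    from PIL import Image

    with Image.open(path) as img:
        gray = img.convert("L").resize((hash_size + 1, hash_size))
    px = gray.load()
    pixels = [[px[x, y] for x in range(hash_size + 1)] for y in range(hash_size)]
    return dhash_bits(pixels, hash_size)
```

Because each bit only records whether a pixel is brighter than its neighbour, a uniform brightness change leaves the hash unchanged. Resizing an image also tends to leave it nearly unchanged, since every input is shrunk to the same 9x8 grid first. That is what lets near-duplicates land within a few bits of each other.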

Suggested approach:

  • Use a Python library such as imagehash (which wraps PIL), or implement a simple dHash.
  • When scanning directories, compute a perceptual hash for each image file and group images whose hashes fall within a small Hamming distance threshold (e.g. <= 5).
  • Add a CLI flag --similar-images to enable or disable this feature.
  • Provide a summary report listing each group of similar images along with file paths and similarity scores.
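The grouping and report steps could be sketched as follows. This assumes hashes are 64-bit integers (for example from an 8x8 dHash) keyed by file path. `group_similar` and `format_report` are hypothetical names. Grouping uses union-find, so membership is transitive: if A is near B and B is near C, all three end up in one group even when A and C differ by more than the threshold. The "similarity" score `1 - distance / 64`, measured against each group's first file, is an assumed metric, not one the issue specifies. Comparing every pair is O(n²), which is fine for modest folders. Large libraries would likely need an index such as a BK-tree.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def group_similar(hashes: Dict[str, int], threshold: int = 5) -> List[List[str]]:
    """Union-find over all pairs whose Hamming distance is <= threshold."""
    paths = sorted(hashes)
    parent = {p: p for p in paths}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b in combinations(paths, 2):
        if hamming(hashes[a], hashes[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for p in paths:
        groups.setdefault(find(p), []).append(p)
    # Only groups with at least two files are near-duplicates.
    return [g for g in groups.values() if len(g) > 1]


def format_report(groups: List[List[str]], hashes: Dict[str, int], bits: int = 64) -> str:
    """List each group with every file's distance and similarity to the group's first file."""
    lines = []
    for i, group in enumerate(groups, start=1):
        anchor = group[0]
        lines.append(f"Group {i} ({len(group)} images):")
        for path in group:
            d = hamming(hashes[anchor], hashes[path])
            lines.append(f"  {path}  distance={d}  similarity={1 - d / bits:.0%}")
    return "\n".join(lines)
```

Given a dict produced by hashing every scanned image, `print(format_report(group_similar(hashes), hashes))` would print one block per group of near-duplicates.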
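If CleanSweep's CLI is built on argparse (an assumption; the issue does not say), the flag could be wired up as in this sketch. `argparse.BooleanOptionalAction` (Python 3.9+) generates both `--similar-images` and `--no-similar-images`, which fits "enable or disable". The positional `paths` argument and the `--similarity-threshold` flag are hypothetical. They only illustrate how the e.g. <= 5 threshold might be exposed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cleansweep")
    # Hypothetical positional argument: directories to scan.
    parser.add_argument("paths", nargs="*", default=["."],
                        help="directories to scan (default: current directory)")
    # BooleanOptionalAction also creates --no-similar-images; off by default.
    parser.add_argument("--similar-images", action=argparse.BooleanOptionalAction,
                        default=False,
                        help="group visually similar images by perceptual hash")
    # Hypothetical flag: max Hamming distance for two images to count as similar.
    parser.add_argument("--similarity-threshold", type=int, default=5, metavar="BITS",
                        help="max Hamming distance between hashes (default: 5)")
    return parser
```

For example, `cleansweep --similar-images ~/Pictures` would parse to `similar_images=True` with the default threshold of 5. Keeping the feature off by default avoids the extra cost of decoding every image for users who only want byte-level duplicate detection.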

Why it matters: Many users have hundreds of nearly identical photos taking up storage. Byte-level duplicate detection misses these near-duplicates. Adding perceptual hashing would make CleanSweep even more powerful for decluttering photo libraries.

If you'd like to work on this, please comment below!

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
