Feature request
To go beyond exact duplicate detection, CleanSweep could offer perceptual hashing (pHash, dHash) to identify visually similar photos and screenshots even when they differ in metadata or have been slightly modified. This would help users clean up near-duplicates such as burst shots, edited versions or resized images.
Suggested approach:
- Use a Python library like `imagehash` (which builds on Pillow/PIL) or implement a simple dHash.
- When scanning directories, compute a perceptual hash for each image file and group images whose hashes are within a small Hamming distance threshold (e.g. <= 5).
- Add a CLI flag `--similar-images` to enable or disable this feature.
- Provide a summary report listing each group of similar images along with file paths and similarity scores.
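
To make the approach above concrete, here is a rough sketch of a hand-rolled dHash plus Hamming-distance grouping. It is only an illustration, not existing CleanSweep code: the names `dhash`, `hamming`, `group_similar` and `load_pixels` are hypothetical, the 64-bit hash size is an assumption, and grouping is done with a simple union-find (so images get chained together if each is close to the next), which may or may not be the behavior we want.

```python
def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image.

    `pixels` is a list of `hash_size` rows, each with `hash_size + 1`
    brightness values. Each bit records whether a pixel is brighter
    than its right-hand neighbour.
    """
    bits = 0
    for row in pixels:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def group_similar(hashes, threshold=5):
    """Group paths whose hashes are within `threshold` bits of each other.

    `hashes` maps path -> integer hash. Returns only groups with more
    than one member. Pairwise comparison is O(n^2), which is fine for a
    sketch but would need an index for very large libraries.
    """
    paths = sorted(hashes)
    parent = {p: p for p in paths}

    def find(p):
        # Follow parent links to the group representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for p in paths:
        groups.setdefault(find(p), []).append(p)
    return [g for g in groups.values() if len(g) > 1]


def load_pixels(path, hash_size=8):
    """Load an image as the small grayscale grid that dhash() expects.

    Assumes Pillow is installed; it is imported lazily so the functions
    above work without it.
    """
    from PIL import Image

    width = hash_size + 1
    with Image.open(path) as img:
        data = img.convert("L").resize((width, hash_size)).tobytes()
    return [list(data[i * width:(i + 1) * width]) for i in range(hash_size)]
```

A scan could then build `{path: dhash(load_pixels(path))}` for every image it finds and pass that dict to `group_similar`. The Hamming distance between group members could double as the similarity score in the summary report.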
Why it matters: Many users have hundreds of nearly identical photos taking up storage. Byte-level duplicate detection misses these near-duplicates. Adding perceptual hashing would make CleanSweep even more powerful for decluttering photo libraries.
If you'd like to work on this, please comment below!