CloneWiper is a high-performance, modern duplicate file detection tool built with Python and PySide6 (Qt). It follows Material Design 3 principles to provide a premium, seamless experience for managing your file library.
- Smart Duplicate Detection: Five hash modes for flexible duplicate detection
- MD5 Only: Fast exact duplicate detection using MD5 checksums (best for identical files)
- Single Perceptual Hash: Detects visually similar images using the pHash algorithm
- Multi-Algorithm Perceptual Hashing (Default): Combines four algorithms (average_hash, phash, dhash, whash) with voting mechanism for superior accuracy
- Uses Hamming distance comparison with voting (requires 3/4 algorithms to agree)
- Detects duplicates even when images are resized, compressed, or slightly modified
- Tier-1 pHash screening skips full multi-hash work for images with a unique perceptual hash
- Optimized with parallel hash calculation, batched size-group prefiltering, and Union-Find similarity grouping
- Single P-Hash + ORB: Single perceptual hash with ORB feature verification for higher-confidence image matches
- Multi-Algo P-Hash + ORB: Multi-algorithm voting plus ORB verification (most accurate, slowest)
- Image Support: Works with common formats (JPEG, PNG, GIF, BMP, TIFF, WebP) and RAW files (CR2, NEF, ARW, etc.)
- Video Support: Perceptual hashing for video files using keyframe extraction
- Cross-Platform Support: Works on Windows (macOS support from source code only)
- High Performance:
- Asynchronous processing with multi-threaded file scanning
- Fast Scanning: Uses os.scandir for efficient file system enumeration (up to 20x faster than traditional scanning)
- Dynamic CPU optimization for hybrid architectures (P-core/E-core detection)
- Adaptive I/O strategy (preloads small files for MD5, chunks large files)
- Similarity grouping tuned for throughput (tiered pHash screening, LSH-style candidate search, Union-Find clustering, priority-ordered pair comparison)
- Batch cache writes to reduce database lock contention
- Persistent Caching:
- Hash Cache: SQLite-backed cache (p-hash and MD5) for fast re-scans
- Thumbnail Cache: Local SQLite database persists only expensive previews (documents, videos, and audio artwork); regular images use memory cache only
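To illustrate the scanning and size-group prefilter items in the list above, here is a minimal sketch. It is not CloneWiper's actual engine code. It walks a folder with os.scandir and keeps only files that share a byte size with at least one other file, since files of different sizes cannot be byte-identical. The names iter_files and size_candidates are hypothetical, and using the size filter only for exact (MD5) matching is an assumption.

```python
import os
from collections import defaultdict


def iter_files(root):
    """Yield (path, size) for every regular file under root (hypothetical helper)."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        # DirEntry caches stat results where the OS allows it
                        yield entry.path, entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue  # unreadable directory: skip it


def size_candidates(root):
    """Group files by size and drop sizes that occur only once."""
    by_size = defaultdict(list)
    for path, size in iter_files(root):
        by_size[size].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}
```

Only the surviving groups would need to be hashed, which is where most of the saving comes from on large libraries.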
- Material Design 3 UI: Clean, modern dark-themed interface with rounded corners (when not maximized)
- Custom Title Bar: Frameless window with native Windows resize, drag-to-snap, and Windows 11 Snap Layout support
- Smart Thumbnails:
- Images: Fast previews, including RAW support (.arw, .cr2, .nef, etc.)
- Video: Frame extraction for common video formats
- Documents: High-quality PDF, EPUB, MOBI, and AZW3 thumbnails using pypdfium2 and PyMuPDF
- Music: Album art extraction and rich metadata display using mutagen
- Interactive File Cards: Hover effects, scrolling text for long filenames, selection management, and visually aligned rounded thumbnail cards
- Pagination: Efficient handling of large result sets with 100 groups per page and clickable page indicator dropdown
- Drag & Drop: Drag and drop folders onto the results area for easy folder selection; remove folders with the inline x, Delete, Backspace, or context menu
- Real-Time Progress: Centered progress indicator with phase detail and percentage; adaptive update intervals for large scans (prefilter, pHash index, and hash phases)
- Quick Selection Strategies:
- Keep Newest: Keeps the most recently modified file
- Keep Oldest: Keeps the oldest file by modification time
- Keep Best: Keeps the highest resolution image (exact width × height); if multiple match, keeps the largest file size; marks sidecars and non-image files in the group for deletion
- Keep Smallest: Keeps the highest resolution image; if multiple match, keeps the smallest file size
- Keep RAW: Prefers RAW files over JPEG when both exist in the same group
- Quick Actions: Delete Selected, Clear Selection (with scope: Current Page or All Pages)
- Footer quick-action bar scrolls horizontally on narrow windows and stays hidden during scans
- Selected quick-selection strategy remains highlighted after use
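The Keep Best and Keep Smallest rules above amount to a two-level sort key: resolution first, file size as the tie-breaker. The sketch below shows one way to express that. The FileInfo record, the function names, and the choice to mark nothing when a group contains no images are all assumptions made for illustration, not the application's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileInfo:
    """Hypothetical record for one file in a duplicate group."""
    path: str
    size: int
    width: int = 0
    height: int = 0
    is_image: bool = True


def pick_keeper(group: List[FileInfo], prefer_largest: bool = True) -> Optional[FileInfo]:
    """Highest resolution wins; ties go to the largest (or smallest) file."""
    images = [f for f in group if f.is_image]
    if not images:
        return None  # assumption: nothing to keep if the group has no images
    sign = 1 if prefer_largest else -1
    return max(images, key=lambda f: (f.width * f.height, sign * f.size))


def mark_for_deletion(group: List[FileInfo], prefer_largest: bool = True) -> List[str]:
    """Return every path except the keeper, including sidecars and non-images."""
    keeper = pick_keeper(group, prefer_largest)
    if keeper is None:
        return []
    return [f.path for f in group if f is not keeper]
```

Passing prefer_largest=False gives the Keep Smallest behaviour with the same resolution rule.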
- Multi-Algorithm Perceptual Hashing:
- Combines four hash algorithms (average, perceptual, difference, wavelet) with parallel calculation
- Uses Hamming distance comparison with voting mechanism (requires 3/4 algorithms to agree)
- Two-phase filtering: quick filter with average_hash, then detailed multi-algorithm comparison
- Detects similar images and videos even if they're slightly modified, resized, or have different compression
- Hybrid CPU Optimization: Automatically detects and optimizes for hybrid CPU architectures (Intel 12th/13th gen, AMD Ryzen)
- Dynamically adjusts worker threads based on P-cores and E-cores
- Optimized thread pool sizes for I/O-intensive and CPU-intensive tasks
- File Type Grouping: Organize duplicates by file type
- Multiple Sorting Options: Sort by count, size, name, or date (ascending/descending)
- Scope Control: Apply actions to current page or all pages
- Safe Deletion: Uses send2trash to move files to the recycle bin/trash
- Batch recycle-bin operations improve delete speed for large selections
- Deleted files are removed from memory and thumbnail caches
- Persistent Cache:
- Hash Cache: Stores calculated hashes (p-hash and MD5)
- Thumbnail Cache: Offloads expensive document/video/audio-art thumbnail generation to a local database (thumbnails.db) while keeping regular image thumbnails memory-only
- Cache persists across sessions for costly media previews without storing thumbnails for every image
- Automatic cache cleanup removes stale entries and prunes formats that are no longer persistently cached
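A persistent hash cache of the kind described above can be sketched with the standard sqlite3 module. This is an illustrative outline only: the HashCache class, the table layout, and keying entries on path, size, and modification time are assumptions, not the schema CloneWiper actually uses.

```python
import sqlite3


class HashCache:
    """Hypothetical SQLite-backed cache mapping a file's identity to its hash."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
        )

    def get(self, path, size, mtime_ns):
        """Return the cached digest, or None if the file changed or is unknown."""
        row = self.conn.execute(
            "SELECT digest FROM hashes WHERE path = ? AND size = ? AND mtime_ns = ?",
            (path, size, mtime_ns),
        ).fetchone()
        return row[0] if row else None

    def put_many(self, rows):
        """Write (path, size, mtime_ns, digest) rows in a single transaction."""
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)", rows
            )

    def prune(self, live_paths):
        """Delete entries whose path is no longer present; return how many."""
        stale = [
            (path,)
            for (path,) in self.conn.execute("SELECT path FROM hashes")
            if path not in live_paths
        ]
        with self.conn:
            self.conn.executemany("DELETE FROM hashes WHERE path = ?", stale)
        return len(stale)
```

Grouping writes into one transaction per batch is what keeps lock contention low when several worker threads produce hashes at once.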
- Python 3.8+
- Windows 10/11 (macOS: run from source code only, executable build not currently supported)
- Clone the repository:
git clone https://github.com/markyip/CloneWiper.git
cd CloneWiper
- Install dependencies:
pip install -r requirements.txt
For full feature support, install optional dependencies:
# Video thumbnails
pip install "opencv-python>=4.8.0"
# PDF/EPUB/MOBI/AZW3 thumbnails
pip install "PyMuPDF>=1.23.0"
pip install "pypdfium2>=0.20.0"
# Music metadata and album art
pip install "mutagen>=1.47.0"
# Using launch script
launch.bat
# Or directly
python main.py
By default the app stays quiet on the console. To enable detailed engine and UI debug logs:
set CLONEWIPER_DEBUG=1
python main.py
(On PowerShell: $env:CLONEWIPER_DEBUG=1 then python main.py.)
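For readers curious how an environment toggle like this is typically wired up, the sketch below maps CLONEWIPER_DEBUG to a logging level. It is a generic pattern and not taken from the CloneWiper source. The function names are hypothetical, and accepting values other than 1 (such as true or yes) is an assumption.

```python
import logging
import os


def debug_enabled(environ=None):
    """Return True when CLONEWIPER_DEBUG is set to a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get("CLONEWIPER_DEBUG", "").strip().lower()
    # Only "1" is documented; the other spellings are an assumption.
    return value in {"1", "true", "yes", "on"}


def configure_logging(environ=None):
    """Stay quiet by default; emit DEBUG records when the toggle is on."""
    level = logging.DEBUG if debug_enabled(environ) else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level
```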
Note: macOS executable build is currently not supported. You can run from source code:
# Install dependencies
pip3 install -r requirements.txt
# Run directly
python3 main.py
CloneWiper offers five hash modes for different use cases:
- MD5 Only (Fastest)
- Best for: Finding exact duplicate files
- Uses: MD5 checksum comparison
- Pros: Very fast, low CPU usage
- Cons: Only detects identical files (byte-for-byte)
- Single Perceptual Hash (Balanced)
- Best for: Finding visually similar images with moderate accuracy
- Uses: phash algorithm
- Pros: Faster than multi-algorithm, detects resized/compressed images
- Cons: Less accurate than multi-algorithm mode
- Multi-Algorithm Perceptual Hash (Most Accurate, Default)
- Best for: Finding visually similar images with highest accuracy without ORB overhead
- Uses: Four algorithms (average, perceptual, difference, wavelet) with voting
- Pros: Highest hash-only accuracy, tier-1 pHash screening speeds up large libraries
- Cons: Slower than single-hash or MD5 modes
- Single P-Hash + ORB (Accurate with verification)
- Best for: Image libraries where false positives must be minimized
- Uses: phash plus ORB feature matching on candidate pairs
- Pros: Strong visual verification on top of perceptual hashing
- Cons: Slower than hash-only modes
- Multi-Algo P-Hash + ORB (Maximum accuracy)
- Best for: Critical deduplication where accuracy matters more than speed
- Uses: Multi-algorithm voting plus ORB verification
- Pros: Strictest matching
- Cons: Slowest mode
Recommendation: Use Multi-Algorithm Perceptual Hash for most cases. Use an ORB mode when you need extra verification on near-duplicate images.
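The voting idea behind the multi-algorithm modes can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the engine's implementation: hashes are modelled as plain 64-bit integers, the max_distance threshold of 8 bits is an assumed value, and every pair is compared directly, whereas the real tool is described as screening candidates first. Matching pairs are merged into groups with a small Union-Find structure.

```python
from itertools import combinations

ALGORITHMS = ("average", "perceptual", "difference", "wavelet")


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integer hashes."""
    return bin(a ^ b).count("1")


def is_similar(h1, h2, max_distance=8, votes_needed=3):
    """True when at least votes_needed of the four hashes are close enough."""
    votes = sum(
        1 for name in ALGORITHMS if hamming(h1[name], h2[name]) <= max_distance
    )
    return votes >= votes_needed


def group_similar(hashes, max_distance=8, votes_needed=3):
    """Cluster indices of similar items with Union-Find (all pairs, O(n^2))."""
    parent = list(range(len(hashes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(hashes)), 2):
        if is_similar(hashes[i], hashes[j], max_distance, votes_needed):
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for i in range(len(hashes)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]
```

Requiring three of four votes means a single algorithm that disagrees (for example, after heavy recompression) does not break a match, while two dissenting algorithms do.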
- Install PyInstaller:
pip install pyinstaller
- Run the build script:
build_windows.bat
This build script will:
- Check and install PyInstaller if needed
- Build an optimized executable with all features
- Exclude unnecessary modules to minimize file size
Notes:
- Python 3.12+: The script must not exclude distutils (PyInstaller 6's distutils hook conflicts with --exclude-module=distutils). The provided build_windows.bat follows this.
- Application icon: the build bundles icons\favicon.ico into an icons/ folder inside the executable so the taskbar and title bar show the correct icon.
- If your executable is larger than expected (>300MB), create a clean virtual environment with only the dependencies you need before building.
Or manually:
pyinstaller --onefile --windowed --icon=favicon.ico --name=CloneWiper main.py
The executable will be in dist/CloneWiper.exe.
Note: macOS executable build is currently not supported. Please run from source code using python3 main.py.
CloneWiper/
├── core/
│   ├── __init__.py
│   ├── engine.py                 # Core scanning and hashing engine
│   └── thumbnail_cache.py        # Persistent SQLite thumbnail cache
├── icons/
│   ├── favicon.ico               # Multi-size application icon (Windows)
│   └── README.md                 # Icon resources documentation
├── main.py                       # Application entry point
├── qt_app.py                     # PySide6 UI implementation
├── verify_thumbnail_cache.py     # Optional utility to inspect thumbnail cache
├── requirements.txt              # Python dependencies
├── favicon.ico                   # Application icon (Windows)
├── launch.bat                    # Windows launch script
├── build_windows.bat             # Windows PyInstaller build script
├── RELEASE_NOTES_v1.1.md         # Release notes for v1.1
├── RELEASE_NOTES_v1.2.md         # Release notes for v1.2
├── RELEASE_NOTES_v1.3.md         # Release notes for v1.3
├── README.md                     # This file
└── LICENSE                       # License file
See RELEASE_NOTES_v1.3.md for the latest changes. Older notes live in RELEASE_NOTES_v1.2.md, RELEASE_NOTES_v1.1.md, and on the GitHub Releases page.
# Add tests when available
python -m pytest
This project follows PEP 8 style guidelines.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PySide6 - Qt for Python
- Pillow - Image processing
- ImageHash - Perceptual hashing
- PyMuPDF - PDF/EPUB rendering
- pypdfium2 - High-quality PDF rendering
- rawpy - RAW image processing
- OpenCV - Video processing
- mutagen - Audio metadata
- psutil - CPU architecture detection
- Material Design 3 - Design guidelines
For issues, questions, or suggestions, please open an issue on GitHub.