CloneReaper Prime is an advanced, cross-platform duplicate file finder and manager written in Python. It is designed to be powerful enough for automated tasks, yet safe and easy to use through its interactive menu. Find duplicate files, recover disk space, and manage your data with confidence. It is a direct descendant of CloneReaper, rewritten to make managing your data and logging changes much easier.
- High-Performance Scanning: Uses parallel processing (multiprocessing) to hash files and find duplicates quickly, especially on multi-core systems.
- Email Notifications: Configure SMTP settings to have scan reports automatically emailed to you upon completion. Guided setup for first-time users.
- Efficient Two-Stage Scan: First identifies files of the same size, then only hashes those potential duplicates, saving significant time.
- Safety First Approach:
  - Dry Run Mode: See what changes would be made without touching a single file.
  - Safe Quarantine: Move duplicates to a quarantine folder for review instead of deleting them permanently.
  - Multiple Confirmations: Requires double or triple confirmation for permanent deletion.
- Intelligent Hardlink Support:
  - Correctly detects hardlinked files on both Windows (NTFS) and Linux/macOS.
  - Can replace duplicate files with hardlinks to save space without altering your directory structure. Perfect for media libraries!
- Flexible & User-Friendly:
  - Interactive Menu: An easy-to-navigate menu system for configuration and execution.
  - Command-Line Mode: Supports arguments for automation and scripting (e.g., weekly cron jobs or scheduled tasks).
  - Persistent Configuration: Automatically saves your settings (paths, email config, etc.) to a clonereaper_config.json file so you don't have to re-enter them every time.
- Comprehensive Reporting:
  - Generate detailed reports of duplicates and hardlinks in JSON, CSV, or plain TXT format.
  - Import a previous JSON report to perform actions later, separating the scanning and cleaning phases.
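The two-stage scan described above can be illustrated with a short sketch. This is not the project's actual code: the function names `file_digest` and `find_duplicates` are hypothetical, `min_size` is assumed to be in bytes, and the hashing here runs in a single process rather than through multiprocessing.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, min_size=1, algo="sha256"):
    """Return a list of groups; each group is a sorted list of identical files."""
    # Stage 1: group by size. A file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are skipped in this sketch
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size >= min_size:  # assumption: min_size is in bytes
                by_size[size].append(path)

    # Stage 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[(size, file_digest(path, algo))].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Because most files in a typical directory tree have a unique size, stage 1 usually removes the bulk of the candidates before any file content is read.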
CloneReaper Prime is designed to be simple to set up.
- Prerequisites:
  - Python 3.8 or newer.
- Clone the repository:
  git clone https://github.com/medy17/Clone-Reaper-Prime.git
  cd Clone-Reaper-Prime
- Install dependencies: The only external dependency is pywin32 for Windows-specific features. The included requirements.txt file handles this automatically.
  pip install -r requirements.txt
You can run CloneReaper Prime in two modes: Interactive (recommended for first-time use) or Non-Interactive (for automation).
Simply run the script without any arguments to launch the full menu-driven interface.
python CloneReaperPrimeProd.py
You will be guided through a series of menus to:
- Configure Scan Settings: Set the target directory, minimum file size, and hashing algorithm.
- Configure Actions: Choose what to do with duplicates (Quarantine, Delete, Link) and enable Dry Run mode.
- Configure Reporting & Email: Set up report generation and email notifications.
- Run the Scan: Execute the scan and review the results.
- Perform Actions: If not in Dry Run mode, confirm and perform the chosen actions on the found duplicates.
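For the Link action, a duplicate is replaced with a hardlink to the file being kept. The sketch below shows one way this could be done safely with the standard library. It is an illustration only, not the tool's implementation: `replace_with_hardlink` and the `.crp-tmp` suffix are hypothetical names, and the caller is assumed to have already confirmed that both files have identical content.

```python
import os


def replace_with_hardlink(keep, duplicate, dry_run=True):
    """Replace `duplicate` with a hardlink to `keep`.

    Returns True if a link was made (or would be made in a dry run),
    False if the two paths are already the same underlying file.
    """
    # os.path.samefile compares device and inode (file index on NTFS),
    # so existing hardlinks are detected and left alone.
    if os.path.samefile(keep, duplicate):
        return False

    if dry_run:
        print(f"[dry run] would link {duplicate} -> {keep}")
        return True

    tmp = duplicate + ".crp-tmp"  # hypothetical temporary name
    os.link(keep, tmp)            # raises OSError across filesystems
    os.replace(tmp, duplicate)    # swap the link in place of the duplicate
    return True
```

Creating the link under a temporary name first means the duplicate is only replaced once the link exists, so a failure in `os.link` leaves the original file untouched.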
CloneReaper Prime can be run from the command line, making it perfect for scheduled tasks.
Example: Scan a directory, permanently delete duplicates, and generate a JSON report.
python CloneReaperPrimeProd.py /path/to/your/media --non-interactive --action delete --report-format json
Note: When using --non-interactive, the script will not ask for confirmation. Use with caution!
The first time you exit the interactive menu, CloneReaper Prime will create a clonereaper_config.json file in the same directory. This file stores all your settings, so they are automatically loaded the next time you start the script.
You can edit this file directly if you prefer, but it's generally safer to manage settings through the interactive menu.
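If you do edit the file by hand, a quick sanity check before the next run can catch typos. The following is a minimal sketch, not part of CloneReaper Prime: `check_config` and `load_config` are hypothetical helpers, and the expected keys and types are assumptions inferred from the example configuration in this README.

```python
import hashlib
import json

# Assumption: these keys and types mirror the README's example config.
EXPECTED_TYPES = {
    "directory": str,
    "min_size": int,
    "hash_algo": str,
    "action_mode": str,
    "dry_run": bool,
}


def check_config(config):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject True/False where an int is expected.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{key} should be {expected.__name__}")
    algo = config.get("hash_algo")
    if isinstance(algo, str) and algo not in hashlib.algorithms_available:
        problems.append(f"unknown hash algorithm: {algo}")
    return problems


def load_config(path="clonereaper_config.json"):
    """Read the config file as a plain dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `check_config(load_config())` from the script's directory returns an empty list when nothing looks wrong.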
Example clonereaper_config.json:
{
  "directory": "D:/Jellyfin/Movies",
  "min_size": 1,
  "hash_algo": "sha256",
  "action_mode": "quarantine",
  "dry_run": false,
  "email_config": {
    "enabled": true,
    "server": "smtp.gmail.com",
    "port": 587,
    "user": "your-email@gmail.com",
    "password": "your-app-password",
    "recipient": "your-email@gmail.com"
  }
}
This is a powerful tool that can delete a large number of files. Please follow these best practices:
- Always run in Dry Run mode first. This will show you exactly which files are identified as duplicates without making any changes.
- Use the Quarantine action instead of Permanent Delete for your first few runs. This moves files to a safe folder, allowing you to verify them and recover any that were incorrectly identified.
- Double-check your target directory. Make sure you are not scanning a system directory or a folder synced with a cloud service that might have its own versioning system.
- Back up your data. Before running any large-scale file operation, ensure you have a reliable backup.
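To make the difference between a dry run and a real quarantine concrete, here is a small sketch of how the two might relate. It is illustrative only: the `quarantine` function and its collision-renaming scheme are assumptions, not the behavior of CloneReaper Prime itself.

```python
import os
import shutil


def quarantine(paths, quarantine_dir, dry_run=True):
    """Plan (and optionally perform) moves of `paths` into `quarantine_dir`.

    Returns a list of (source, destination) pairs. With dry_run=True
    nothing on disk is changed; the list only describes what would happen.
    """
    planned, taken = [], set()
    for src in paths:
        base, ext = os.path.splitext(os.path.basename(src))
        dest = os.path.join(quarantine_dir, base + ext)
        n = 1
        # Never overwrite: add a numeric suffix when the name is already used.
        while dest in taken or os.path.exists(dest):
            dest = os.path.join(quarantine_dir, f"{base}_{n}{ext}")
            n += 1
        taken.add(dest)
        planned.append((src, dest))

    if not dry_run:
        os.makedirs(quarantine_dir, exist_ok=True)
        for src, dest in planned:
            shutil.move(src, dest)  # files can be moved back if needed
    return planned
```

Reviewing the returned plan from a dry run before repeating the call with `dry_run=False` follows the same order of operations recommended in the list above.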
Contributions are welcome! If you have an idea for a new feature or have found a bug, please feel free to:
- Open an issue to discuss the change.
- Fork the repository and submit a pull request.
This project is licensed under the CC BY-NC-SA License. See the LICENSE file for details.