Safe, reviewable disk cleanup. Inventories the filesystem, classifies files into
KEEP / REVIEW / LIKELY_TRASH / SAFE_DELETE, and quarantines (never deletes) what the operator approves.
The defaults are paranoid:
- Dry-run by default. Nothing on disk changes until --execute is passed.
- Whitelist-first. Sensitive path fragments (.env, .ssh/, secret, credential) and protected project roots are forced to KEEP before any other rule runs.
- Quarantine, not delete. --execute moves files marked SAFE_DELETE into _quarantine/<date>/files/... alongside a manifest.json. There is no hard-delete path in the code. Restoring is a manual copy from each manifest entry's dst back to its src.
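The whitelist-first rule can be sketched as a guard that runs before any classification rule. This is a minimal illustration, not the code in agents/safe_guard.py or classifier.py: the names is_protected, classify and Rule are hypothetical, matching is assumed to be case-insensitive, paths are assumed to be absolute and already expanded, and the REVIEW fallback for unmatched files is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical rule type: inspects a path, returns a label or None if it does not apply.
Rule = Callable[[Path], Optional[str]]


def is_protected(path: Path, fragments: Iterable[str], roots: Iterable[Path]) -> bool:
    """Whitelist check: a sensitive substring anywhere in the path, or inside a protected root.

    Assumes paths are already absolute and expanded (~ and env vars resolved).
    """
    text = path.as_posix().lower()  # case-insensitive matching is an assumption
    if any(frag.lower() in text for frag in fragments):
        return True
    return any(path == root or root in path.parents for root in roots)


def classify(path: Path, fragments: Iterable[str], roots: Iterable[Path], rules: List[Rule]) -> str:
    # Protection runs before every other rule, so no rule can downgrade a protected file.
    if is_protected(path, fragments, roots):
        return "KEEP"
    for rule in rules:
        label = rule(path)
        if label is not None:
            return label  # first matching rule wins (assumed ordering)
    return "REVIEW"  # nothing matched: send it to a human, never straight to SAFE_DELETE
```

Because the guard returns before the loop over rules, adding a new validator can never turn a protected path into SAFE_DELETE; that ordering is what "whitelist-first" buys.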
The tool prefers leaving junk on disk over deleting the wrong file. Every other design choice falls out of that asymmetry.
Four steps: scan builds an inventory (size, mtime, extension, quick-hash), classify runs the rules, audit groups results into a readable REVIEW.md, and quarantine performs the move once opted in. Read the report before running --execute; if any entry looks wrong, stop — nothing has touched disk yet.
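The audit step can be pictured as a pure function from decisions to Markdown. A sketch, assuming each line of decisions.jsonl is a JSON object with hypothetical path, label, root and size (bytes) fields; the real schema written by classifier.py may differ, and load_decisions / render_review are illustrative names, not functions of agents/manifest_audit.py.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Most dangerous label first, so the reviewer reads SAFE_DELETE before anything else.
LABEL_ORDER = ["SAFE_DELETE", "LIKELY_TRASH", "REVIEW", "KEEP"]


def load_decisions(jsonl_path: str) -> List[dict]:
    """Read one JSON object per non-empty line (hypothetical decisions.jsonl schema)."""
    with open(jsonl_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def render_review(decisions: List[dict]) -> str:
    """Group decisions by label, then by source root, and render a Markdown report."""
    groups: Dict[str, Dict[str, List[dict]]] = defaultdict(lambda: defaultdict(list))
    for d in decisions:
        groups[d["label"]][d["root"]].append(d)

    lines = ["# REVIEW"]
    for label in LABEL_ORDER:
        if label not in groups:  # membership test does not create an entry
            continue
        lines.append("")
        lines.append(f"## {label}")
        for root in sorted(groups[label]):
            items = groups[label][root]
            total = sum(d["size"] for d in items)
            lines.append(f"### {root} ({len(items)} file(s), {total} bytes)")
            for d in sorted(items, key=lambda d: d["path"]):
                lines.append(f"- {d['path']}")
    return "\n".join(lines) + "\n"
```

Keeping the renderer free of side effects matches the dry-run default: producing the report never touches the scanned files.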
A sample run on a developer workstation flagged 3,221 files totalling 1.34 GB — mostly stale temp directories (2,503), Office lock files like ~$report.docx (705), and 13 old installers. None were deleted; everything sat in _quarantine/ for a week before being wiped by hand.
- Config-driven via cleanup.config.yaml (scan targets, protected roots, sensitive fragments). Ships with a cross-platform template covering ~/Desktop, ~/Downloads, etc.
- Quick-hash dedup: SHA1 of head + tail bytes, so duplicate detection avoids reading every byte of a 4 GB video.
- Five validator modules: dup_hash, encoding_check, installer_age, lock_files, study_material.
- Every dry-run writes a REVIEW.md grouped by label and source root.
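The head + tail quick hash can be sketched with hashlib. Assumptions, labelled as such: the 64 KiB window size, mixing the file size into the key (so same-prefix files of different lengths never share a key), and the names quick_hash / duplicate_candidates are all illustrative rather than taken from validators/dup_hash.py.

```python
import hashlib
import os
from collections import defaultdict
from typing import List

CHUNK = 64 * 1024  # hypothetical window: bytes read from each end of the file


def quick_hash(path: str, chunk: int = CHUNK) -> str:
    """SHA1 over the first and last `chunk` bytes, prefixed with the file size."""
    size = os.path.getsize(path)
    h = hashlib.sha1()
    with open(path, "rb") as f:
        if size <= 2 * chunk:
            h.update(f.read())  # small file: head + tail already cover everything
        else:
            h.update(f.read(chunk))      # head
            f.seek(-chunk, os.SEEK_END)  # jump to the last `chunk` bytes
            h.update(f.read(chunk))      # tail
    return f"{size}:{h.hexdigest()}"


def duplicate_candidates(paths: List[str]) -> List[List[str]]:
    """Bucket paths by quick hash; any bucket with more than one path is a candidate group."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[quick_hash(p)].append(p)
    return [group for group in buckets.values() if len(group) > 1]
```

Because the bytes between head and tail are never read, two same-size files that differ only in the middle land in the same bucket. Confirming each candidate group with a full-content hash before calling the files exact duplicates is the safe follow-up.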
The safety guarantees listed at the top are not configurable: they run first, every time. A protection rule beats every other rule.
git clone https://github.com/lfzds4399-cpu/cleanup-harness.git
cd cleanup-harness
pip install pyyaml
cp cleanup.config.example.yaml cleanup.config.yaml
# edit targets / protected_roots for the host machine
python run.py # dry-run: scan → classify → audit. Writes manifests/<ts>/REVIEW.md.
# review the report. If the SAFE_DELETE entries look correct:
python run.py --execute # opt-in: moves SAFE_DELETE files into _quarantine/<date>/
A --restore <date> flag exists on agents/trash_quarantine.py, but opening _quarantine/<date>/manifest.json first and confirming the entries before restoring is the recommended path.
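A sketch of the opt-in move, assuming manifest entries are {"src": ..., "dst": ...} objects and that each file's absolute path is mirrored under files/ (one reading of _quarantine/<date>/files/...). The function quarantine and its execute / today parameters are hypothetical, not the interface of agents/trash_quarantine.py or phase1_execute.py.

```python
import json
import shutil
from datetime import date
from pathlib import Path
from typing import Iterable, List, Optional


def quarantine(paths: Iterable[str], quarantine_root: str,
               execute: bool = False, today: Optional[date] = None) -> List[dict]:
    """Plan (and, only if execute=True, perform) moves into <root>/<date>/files/."""
    base = Path(quarantine_root) / (today or date.today()).isoformat()
    manifest = base / "manifest.json"
    entries = json.loads(manifest.read_text()) if manifest.exists() else []

    planned = []
    for src in (Path(p).resolve() for p in paths):
        # Mirror the absolute source path under files/ so same-named files cannot collide.
        dst = base / "files" / src.relative_to(src.anchor)
        planned.append({"src": str(src), "dst": str(dst)})

    if not execute:
        return planned  # dry-run: nothing on disk changes

    for entry in planned:
        dst = Path(entry["dst"])
        if dst.exists():
            raise FileExistsError(dst)  # never overwrite something already quarantined
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(entry["src"], entry["dst"])  # a move, never a delete
        entries.append(entry)
        # Rewrite the manifest after every move so an interrupted run still records it.
        manifest.write_text(json.dumps(entries, indent=2))
    return planned
```

Writing the manifest after every move, rather than once at the end, means an interrupted run still records where each moved file went. The sketch drops the drive letter on Windows, so files with the same path on two drives would collide; a real implementation would need to encode the drive.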
All paths support ~ and environment variables ($HOME, $LOCALAPPDATA, etc.):
targets: [~/Desktop, ~/Downloads, ~/.cache]
protected_roots: [~/.claude, ~/.ssh] # always KEEP
sensitive_fragments: ["/.env", "/.git/", secret] # substring match → KEEP
See cleanup.config.example.yaml for the full schema.
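Path expansion can be sketched with os.path.expandvars and os.path.expanduser applied to the parsed YAML (for example the dict returned by yaml.safe_load). Which keys count as paths is an assumption here (PATH_KEYS and both function names are hypothetical, not config.py's API); sensitive_fragments are substrings, so they are deliberately left untouched.

```python
import os
from pathlib import Path

# Hypothetical: config keys whose values are lists of filesystem paths.
PATH_KEYS = ("targets", "protected_roots")


def expand_path(raw: str) -> Path:
    # Expand environment variables first, then ~, so "$HOME/x" and "~/x" both work.
    # Unknown variables are left as-is by os.path.expandvars.
    return Path(os.path.expanduser(os.path.expandvars(raw)))


def expand_config(cfg: dict) -> dict:
    """Return a copy of cfg with every path-valued key expanded; other keys untouched."""
    out = dict(cfg)
    for key in PATH_KEYS:
        out[key] = [expand_path(p) for p in cfg.get(key, [])]
    return out
```

Expanding once at load time keeps the rest of the pipeline (scanner, whitelist guard) working only with concrete absolute paths.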
scanner.py # walks targets, emits inventory.jsonl
classifier.py # applies rules + validators, emits decisions.jsonl
run.py # orchestrator: scan → classify → audit
phase1_execute.py # opt-in quarantine executor
config.py # YAML config loader (env-aware)
agents/
safe_guard.py # whitelist enforcer (sensitive_fragments + protected_roots)
manifest_audit.py # decisions.jsonl → REVIEW.md
trash_quarantine.py # move-to-quarantine
validators/
dup_hash.py # exact-content duplicates
encoding_check.py # files with replacement chars (mojibake)
installer_age.py # old .exe / .msi / .dmg
lock_files.py # ~$ Office lock files
study_material.py # context-aware (e.g. "physics" filename keyword)
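Three of the validators reduce to small predicates. The sketch below shows plausible versions of lock_files, installer_age and encoding_check; the label each returns, the 90-day installer threshold and the function signatures are assumptions, not what the modules in validators/ actually do.

```python
import time
from pathlib import Path
from typing import Optional

INSTALLER_EXTS = {".exe", ".msi", ".dmg"}
MAX_INSTALLER_AGE_DAYS = 90  # hypothetical threshold


def lock_files(path: Path) -> Optional[str]:
    # Office creates "~$<name>" owner files next to documents that are open.
    return "SAFE_DELETE" if path.name.startswith("~$") else None  # label is an assumption


def installer_age(path: Path, mtime: float, now: Optional[float] = None) -> Optional[str]:
    """Flag installers whose modification time is older than the threshold."""
    if path.suffix.lower() not in INSTALLER_EXTS:
        return None
    current = now if now is not None else time.time()
    age_days = (current - mtime) / 86400
    return "LIKELY_TRASH" if age_days > MAX_INSTALLER_AGE_DAYS else None


def encoding_check(text: str) -> Optional[str]:
    # U+FFFD is what decoders substitute for undecodable bytes; its presence signals mojibake.
    # Works on a filename or on decoded content; which one the real module inspects is unknown.
    return "REVIEW" if "\ufffd" in text else None
```

Each predicate returns None when it has no opinion, so they compose with a first-match rule loop that runs only after the whitelist guard.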
Beta. Runs daily on a developer workstation across Windows and macOS. Classification rules are tuned for a single-user directory layout — fork and adjust as needed.
PRs welcome — see CONTRIBUTING.md. Security: SECURITY.md. License: MIT (LICENSE).