cleanup-harness

Safe, reviewable disk cleanup. Inventories the filesystem, classifies files into KEEP / REVIEW / LIKELY_TRASH / SAFE_DELETE, and quarantines (never deletes) what the operator approves.

License: MIT | Python

Chinese README

Safety model

The defaults are paranoid:

  • Dry-run by default. Nothing on disk changes until --execute is passed.
  • Whitelist-first. Sensitive path fragments (.env, .ssh/, secret, credential) and protected project roots are forced to KEEP before any other rule runs.
  • Quarantine, not delete. --execute moves files marked SAFE_DELETE into _quarantine/<date>/files/... alongside a manifest.json. The code contains no hard-delete path. Restoring a file means copying it from dst back to src, as recorded in manifest.json.

The tool prefers leaving junk on disk over deleting the wrong file. Every other design choice falls out of that asymmetry.
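The whitelist-first rule can be illustrated with a minimal sketch. It is not the code in agents/safe_guard.py; the function names, the example roots, and the case-insensitive matching are assumptions made for illustration.

```python
# Fragments taken from the README; the roots below are hypothetical examples.
SENSITIVE_FRAGMENTS = ("/.env", "/.ssh/", "secret", "credential")
PROTECTED_ROOTS = ("/home/me/.ssh", "/home/me/projects/app")


def is_protected(path: str) -> bool:
    """Return True if the path must be forced to KEEP."""
    # Normalise separators and case so Windows paths match too (an assumption).
    normalized = path.replace("\\", "/").lower()
    if any(fragment in normalized for fragment in SENSITIVE_FRAGMENTS):
        return True
    for root in PROTECTED_ROOTS:
        root = root.lower().rstrip("/")
        # Match the root itself or anything below it, not mere prefix lookalikes.
        if normalized == root or normalized.startswith(root + "/"):
            return True
    return False


def classify(path: str, rule_label: str) -> str:
    """Apply the whitelist first; only then accept the label from other rules."""
    return "KEEP" if is_protected(path) else rule_label
```

Because the protection check runs before the label is accepted, a rule that wrongly marks a file SAFE_DELETE cannot override it.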

Pipeline

Four steps: scan builds an inventory (size, mtime, extension, quick-hash), classify runs the rules, audit groups results into a readable REVIEW.md, and quarantine performs the move once opted in. Read the report before running --execute; if any entry looks wrong, stop — nothing has touched disk yet.

A sample run on a developer workstation flagged 3,221 files totalling 1.34 GB: mostly stale temp directories (2,503), Office lock files like ~$report.docx (705), and 13 old installers. The tool deleted none of them; everything sat in _quarantine/ for a week before the operator wiped it by hand.
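The quarantine step of the pipeline might look roughly like the sketch below. It assumes a manifest that is a JSON list of {"src", "dst"} records and mirrors each file's original path under files/; the real layout and schema in phase1_execute.py and agents/trash_quarantine.py may differ.

```python
import json
import shutil
from datetime import date
from pathlib import Path


def quarantine(paths, quarantine_root: Path, execute: bool = False):
    """Plan (and, only if execute=True, perform) moves into <root>/<date>/files/."""
    day_dir = quarantine_root / date.today().isoformat()
    entries = []
    for src in map(Path, paths):
        resolved = src.resolve()
        # Drop the drive/root anchor and mirror the rest so names cannot collide.
        dst = day_dir / "files" / Path(*resolved.parts[1:])
        entries.append({"src": str(resolved), "dst": str(dst)})
        if execute:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    if execute:
        # Hypothetical manifest schema: a flat list of src/dst pairs.
        day_dir.mkdir(parents=True, exist_ok=True)
        (day_dir / "manifest.json").write_text(
            json.dumps(entries, indent=2), encoding="utf-8"
        )
    return entries
```

With execute left at its default the function only returns the plan, which matches the dry-run-by-default behaviour described earlier.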

Features

  • Config-driven via cleanup.config.yaml (scan targets, protected roots, sensitive fragments). Ships with a cross-platform template covering ~/Desktop, ~/Downloads, etc.
  • Quick-hash dedup: SHA1 of head + tail bytes, so duplicate detection avoids reading every byte of a 4 GB video.
  • Five validator modules: dup_hash, encoding_check, installer_age, lock_files, study_material.
  • Every dry-run writes a REVIEW.md grouped by label and source root.
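The quick-hash idea from the feature list can be sketched as follows. The 64 KiB window and the choice to mix the file size into the digest are assumptions for illustration, not values taken from the project.

```python
import hashlib
import os

CHUNK = 64 * 1024  # bytes sampled from each end; an assumed value


def quick_hash(path: str, chunk: int = CHUNK) -> str:
    """SHA-1 over the file size plus the first and last `chunk` bytes."""
    size = os.path.getsize(path)
    digest = hashlib.sha1()
    # Mixing in the size (an added assumption) separates files that share
    # head and tail bytes but differ in length.
    digest.update(str(size).encode("ascii"))
    with open(path, "rb") as fh:
        digest.update(fh.read(chunk))
        if size > chunk:
            # Start the tail no earlier than the end of the head window,
            # so the two reads never overlap.
            fh.seek(max(size - chunk, chunk))
            digest.update(fh.read(chunk))
    return digest.hexdigest()
```

Two files with equal quick hashes are only duplicate candidates: a difference in the unread middle of a large file goes unnoticed, so a full comparison is needed before treating them as identical.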

The guarantees in the Safety model section cannot be switched off: they run first, every time, and a protection rule beats every other rule.

Quick start

git clone https://github.com/lfzds4399-cpu/cleanup-harness.git
cd cleanup-harness
pip install pyyaml

cp cleanup.config.example.yaml cleanup.config.yaml
# edit targets / protected_roots for the host machine

python run.py             # dry-run: scan → classify → audit. Writes manifests/<ts>/REVIEW.md.
# review the report. If the SAFE_DELETE entries look correct:
python run.py --execute   # opt-in: moves SAFE_DELETE files into _quarantine/<date>/

agents/trash_quarantine.py accepts a --restore <date> flag. Before using it, open _quarantine/<date>/manifest.json and confirm that the entries are the ones to restore.
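For a manual restore, something like the sketch below would work, assuming manifest.json is a JSON list of {"src", "dst"} records. That schema and the restore function are hypothetical; check the actual manifest before adapting it.

```python
import json
import shutil
from pathlib import Path


def restore(manifest_path: Path, dry_run: bool = True):
    """Copy quarantined files back to their original locations."""
    entries = json.loads(manifest_path.read_text(encoding="utf-8"))
    restored = []
    for entry in entries:
        src, dst = Path(entry["src"]), Path(entry["dst"])
        # Never overwrite a live file, and skip quarantine copies that are gone.
        if src.exists() or not dst.exists():
            continue
        if not dry_run:
            src.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dst, src)  # copy, so the quarantine copy stays intact
        restored.append(str(src))
    return restored
```

Calling restore(manifest_path) first lists what would be copied back; passing dry_run=False performs the copy.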

Configuration

All paths support ~ and environment variables ($HOME, $LOCALAPPDATA, etc.):

targets:             [~/Desktop, ~/Downloads, ~/.cache]
protected_roots:     [~/.claude, ~/.ssh]           # always KEEP
sensitive_fragments: ["/.env", "/.git/", secret]   # substring match → KEEP

See cleanup.config.example.yaml for the full schema.
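Path expansion of this kind is commonly done with the standard library, as in the sketch below. The helper name expand is hypothetical and config.py may implement it differently.

```python
import os
from pathlib import Path


def expand(raw: str) -> Path:
    """Expand environment variables, then ~, in a configured path."""
    # expandvars runs first so a value like "$HOME/Downloads" is resolved
    # before expanduser handles any leading "~".
    return Path(os.path.expanduser(os.path.expandvars(raw)))


targets = [expand(p) for p in ("~/Desktop", "~/Downloads", "~/.cache")]
```

os.path.expandvars leaves unknown variables untouched rather than raising, so a typo in a variable name yields a literal "$NAME" path segment that is worth validating after loading.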

Project layout

scanner.py           # walks targets, emits inventory.jsonl
classifier.py        # applies rules + validators, emits decisions.jsonl
run.py               # orchestrator: scan → classify → audit
phase1_execute.py    # opt-in quarantine executor
config.py            # YAML config loader (env-aware)
agents/
  safe_guard.py       # whitelist enforcer (sensitive_fragments + protected_roots)
  manifest_audit.py   # decisions.jsonl → REVIEW.md
  trash_quarantine.py # move-to-quarantine
validators/
  dup_hash.py         # exact-content duplicates
  encoding_check.py   # files with replacement chars (mojibake)
  installer_age.py    # old .exe / .msi / .dmg
  lock_files.py       # ~$ Office lock files
  study_material.py   # context-aware (e.g. "physics" filename keyword)
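Two of the simpler validators could be approximated as below. This is a sketch only: the function names and the 90-day threshold are hypothetical, and the real checks in validators/ may use different criteria.

```python
import time
from pathlib import PurePath
from typing import Optional

INSTALLER_EXTENSIONS = {".exe", ".msi", ".dmg"}
MAX_INSTALLER_AGE_DAYS = 90  # hypothetical threshold


def is_office_lock_file(path: str) -> bool:
    """Office writes lock files whose names start with '~$'."""
    return PurePath(path).name.startswith("~$")


def is_old_installer(path: str, mtime: float, now: Optional[float] = None) -> bool:
    """True for installer extensions last modified before the age threshold."""
    now = time.time() if now is None else now
    age_days = (now - mtime) / 86400
    is_installer = PurePath(path).suffix.lower() in INSTALLER_EXTENSIONS
    return is_installer and age_days > MAX_INSTALLER_AGE_DAYS
```

Both functions only inspect metadata already present in the inventory (name, extension, mtime), so they never need to open the file.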

Status

Beta. Runs daily on a developer workstation across Windows and macOS. Classification rules are tuned for a single-user directory layout — fork and adjust as needed.

Contributing & License

PRs welcome — see CONTRIBUTING.md. Security: SECURITY.md. License: MIT (LICENSE).
