hobblewash/document-anonymizer

Document Anonymizer

A complete local-only desktop application for anonymizing sensitive information in Word, Excel, and PDF documents.

✅ No Admin Rights Required!

This application installs and runs completely in your user directory. No administrator privileges needed at any point.

Features

  • Multiple File Formats: Supports .docx, .xlsx, and .pdf
  • Two Modes:
    • Anonymize: Replace entities with generic tokens like [NAME], [EMAIL]
    • Pseudonymize: Replace with consistent fake data (same input always gets same fake output)
  • Layout Preservation: Output PDFs maintain original layout, tables, images, headers/footers
  • Interactive Review: Review and edit detected entities before processing
  • Batch Processing: Process entire folders at once
  • 100% Offline: No network calls, all processing local
  • Cross-Platform: Works on Windows, macOS, and Linux
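
The difference between the two modes can be illustrated with a minimal sketch. It is not the code in core/mapping.py: the class name `EntityMapping`, the `FAKE_NAMES` pool, and the hash-based selection are all assumptions made for illustration. It only shows one way to get the "same input always gets same fake output" behaviour.

```python
import hashlib

# Hypothetical pool of replacement names; a real tool would likely use a larger one.
FAKE_NAMES = ["Alex Morgan", "Sam Taylor", "Jordan Lee", "Casey Brown"]


class EntityMapping:
    """Maps detected values to replacements and remembers earlier choices."""

    def __init__(self, mode: str = "anonymize") -> None:
        if mode not in ("anonymize", "pseudonymize"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._seen: dict[tuple[str, str], str] = {}

    def replacement_for(self, label: str, value: str) -> str:
        """Return the replacement for a detected value with the given label."""
        key = (label, value)
        if key not in self._seen:
            if self.mode == "anonymize":
                # Anonymize: every value of a type collapses to one generic token.
                self._seen[key] = f"[{label}]"
            else:
                # Pseudonymize: derive a stable fake value from the input.
                self._seen[key] = self._pseudonym(label, value)
        return self._seen[key]

    @staticmethod
    def _pseudonym(label: str, value: str) -> str:
        # Hashing the input makes the choice repeatable across runs.
        digest = hashlib.sha256(f"{label}:{value}".encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big")
        if label == "NAME":
            return FAKE_NAMES[index % len(FAKE_NAMES)]
        return f"{label.lower()}-{index % 10000:04d}"
```

With a pool this small, two different inputs can land on the same fake name, so a real implementation would need a larger pool or explicit collision handling.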

Entity Detection

The app detects and can replace:

  • Person names (via NER)
  • Organization names (via NER)
  • Locations and addresses (via NER + regex)
  • UK postcodes
  • Email addresses
  • URLs
  • Phone numbers (international formats)
  • Credit card numbers (with Luhn validation)
  • IBAN bank account numbers
  • UK National Insurance numbers
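
To show why Luhn validation matters for credit card detection, here is a short sketch that pairs a loose candidate pattern with the standard Luhn checksum. The regex and the function names `luhn_valid` and `find_card_numbers` are illustrative assumptions, not taken from core/detector.py.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit runs that look like card numbers and pass the checksum."""
    return [
        match.group(0)
        for match in CARD_CANDIDATE.finditer(text)
        if luhn_valid(match.group(0))
    ]
```

The checksum step filters out most arbitrary digit runs, such as order references, that a pattern alone would flag as card numbers.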

Quick Start (Windows)

🚀 Without Admin Rights (Recommended)

For users without administrator access:

  1. Double-click setup_no_admin.bat - Installs everything to your user folder
  2. Wait 2-3 minutes for installation
  3. Double-click run.bat - Launch the application

All-in-one option: Double-click START_HERE.bat for guided setup

📦 Batch Files Available

  • setup_no_admin.bat - Install without admin rights (recommended)
  • run.bat - Launch the application
  • generate_samples.bat - Create test files
  • build.bat - Create standalone .exe
  • clean_and_setup.bat - Fresh reinstall

That's it! Once run.bat launches, the application opens and you can start anonymizing documents.

Need help? See INSTALL_NO_ADMIN.md for detailed instructions.

Installation

From Source

  1. Requirements: Python 3.11, 3.12, or 3.13

  2. Clone/Download this repository

  3. Create virtual environment:

python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

  4. Install dependencies:

pip install -r requirements.txt

  5. Download spaCy model (required for NER):

python -m spacy download en_core_web_sm

  6. Run the application:

python app.py

Usage

  1. Select Input: Choose a single file or folder containing documents
  2. Select Output Folder: Where anonymized PDFs will be saved
  3. Choose Mode: Anonymize (tokens) or Pseudonymize (fake data)
  4. Options:
    • Enable "Preview before saving" to review entities
    • Enable "Export mapping JSON" to save a mapping file
  5. Click Start: Process your documents
  6. Review Entities (if preview enabled):
    • Review detected entities in the table
    • Edit, accept, or reject each entity
    • Apply changes to generate output
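
Step 1 accepts either a single file or a folder. A minimal sketch of how the input could be resolved into a list of documents follows. The function names, the recursive folder search, and the `_anonymized.pdf` naming scheme are assumptions for illustration. Only the three supported extensions and the PDF output come from this README.

```python
from pathlib import Path

# Formats listed under Features.
SUPPORTED_EXTENSIONS = {".docx", ".xlsx", ".pdf"}


def collect_documents(input_path: Path) -> list[Path]:
    """Resolve a file or folder into a sorted list of supported documents."""
    if input_path.is_file():
        supported = input_path.suffix.lower() in SUPPORTED_EXTENSIONS
        return [input_path] if supported else []
    if input_path.is_dir():
        # Assumption: subfolders are searched too.
        return sorted(
            path
            for path in input_path.rglob("*")
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
        )
    raise FileNotFoundError(input_path)


def output_path_for(source: Path, output_dir: Path) -> Path:
    """Build a hypothetical output name; the real naming scheme may differ."""
    return output_dir / f"{source.stem}_anonymized.pdf"
```

Comparing the lowercased suffix means files such as REPORT.PDF are picked up as well.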

Building Standalone Executables

Windows

pyinstaller --name="DocumentAnonymizer" ^
  --windowed ^
  --onefile ^
  --add-data="venv/Lib/site-packages/spacy/data;spacy/data" ^
  --hidden-import=spacy ^
  --hidden-import=en_core_web_sm ^
  app.py

macOS

pyinstaller --name="DocumentAnonymizer" \
  --windowed \
  --onefile \
  --add-data="venv/lib/python3.11/site-packages/spacy/data:spacy/data" \
  --hidden-import=spacy \
  --hidden-import=en_core_web_sm \
  app.py

Linux

pyinstaller --name="DocumentAnonymizer" \
  --onefile \
  --add-data="venv/lib/python3.11/site-packages/spacy/data:spacy/data" \
  --hidden-import=spacy \
  --hidden-import=en_core_web_sm \
  app.py

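The executable is written to the dist/ folder. The macOS and Linux commands assume a Python 3.11 virtual environment; adjust the python3.11 path segment if your venv uses Python 3.12 or 3.13.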

Project Structure

anonymizer_app/
├── app.py                 # Entry point
├── gui/
│   ├── __init__.py
│   ├── main_window.py     # Main window UI
│   ├── models.py          # Qt table models
│   └── views.py           # Custom widgets
├── core/
│   ├── __init__.py
│   ├── detector.py        # Entity detection
│   ├── mapping.py         # Anonymize/pseudonymize mapping
│   ├── replacer.py        # Replacement engine
│   ├── docx_handler.py    # DOCX processor
│   ├── xlsx_handler.py    # XLSX processor
│   └── pdf_handler.py     # PDF processor
├── tests/
│   ├── __init__.py
│   ├── test_detector.py
│   ├── test_mapping.py
│   └── test_handlers.py
├── samples/               # Sample test files
├── requirements.txt
├── pyproject.toml
└── README.md

Testing

Run tests with pytest:

pytest tests/ -v

Security & Privacy

  • No Network Calls: All processing happens locally
  • No Telemetry: No data leaves your machine
  • Temporary Files: Cleaned up automatically
  • Log Safety: Logs never contain full sensitive values
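
One way to keep full sensitive values out of logs is to mask them before they reach the logger. The sketch below is an assumption about how that could look, not the application's actual logging code. `mask_value`, `log_detection`, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("anonymizer")  # hypothetical logger name


def mask_value(value: str, visible: int = 2) -> str:
    """Keep only the first and last `visible` characters of a value."""
    if len(value) <= visible * 2:
        # Too short to show anything safely: mask everything.
        return "*" * len(value)
    hidden = "*" * (len(value) - visible * 2)
    return f"{value[:visible]}{hidden}{value[-visible:]}"


def log_detection(label: str, value: str) -> str:
    """Log a detection with the value masked and return the logged message."""
    message = f"Detected {label}: {mask_value(value)}"
    logger.info(message)
    return message
```

Short values are masked entirely, because showing two characters at each end of a four-character value would reveal all of it.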

License

MIT License - See LICENSE file for details
