A complete local-only desktop application for anonymizing sensitive information in Word, Excel, and PDF documents.
This application installs and runs completely in your user directory. No administrator privileges needed at any point.
- Multiple File Formats: Supports .docx, .xlsx, and .pdf
- Two Modes:
  - Anonymize: Replace entities with generic tokens like [NAME], [EMAIL]
  - Pseudonymize: Replace with consistent fake data (same input always gets same fake output)
- Layout Preservation: Output PDFs maintain original layout, tables, images, headers/footers
- Interactive Review: Review and edit detected entities before processing
- Batch Processing: Process entire folders at once
- 100% Offline: No network calls, all processing local
- Cross-Platform: Works on Windows, macOS, and Linux
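
The difference between the two modes can be illustrated with a minimal sketch. This is not the application's actual code: the `FAKE_NAMES` pool, the `demo-key` secret, and both function names are assumptions made up for illustration. It shows one common way to get "same input, same fake output", by keying the choice of replacement on an HMAC of the original value.

```python
import hashlib
import hmac

# Hypothetical pool of replacement names; a real tool would need a far larger set.
FAKE_NAMES = ["Alex Morgan", "Sam Taylor", "Jordan Lee", "Casey Brown"]


def anonymize(entity_type: str) -> str:
    """Return the generic token for an entity type, e.g. "name" -> "[NAME]"."""
    return f"[{entity_type.upper()}]"


def pseudonymize(value: str, key: bytes = b"demo-key") -> str:
    """Pick a fake name deterministically from the normalized input value."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).digest()
    # Use the first 8 bytes of the digest as an index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


print(anonymize("name"))          # generic token
print(pseudonymize("Jane Doe"))   # some entry from FAKE_NAMES
print(pseudonymize("jane doe "))  # same entry as the line above
```

With a pool this small, two different inputs can land on the same fake name; a real implementation would need a larger pool or an explicit collision check.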
The app detects and can replace:
- Person names (via NER)
- Organization names (via NER)
- Locations and addresses (via NER + regex)
- UK postcodes
- Email addresses
- URLs
- Phone numbers (international formats)
- Credit card numbers (with Luhn validation)
- IBAN bank account numbers
- UK National Insurance numbers
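
As an illustration of regex detection combined with Luhn validation, here is a small sketch for credit card numbers. The pattern and the function names are assumptions for this example, not the detector's real implementation; the Luhn checksum itself is the standard algorithm.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


sample = "Card 4111-1111-1111-1111 or 1234 5678 9012 3456"
print(find_card_numbers(sample))  # ['4111-1111-1111-1111']
```

The checksum step filters out digit runs that merely look like card numbers, which reduces false positives compared with a regex alone.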
For users without administrator access:
- Double-click setup_no_admin.bat to install everything to your user folder. Wait 2-3 minutes for installation.
- Double-click run.bat to launch the application.
All-in-one option: Double-click START_HERE.bat for guided setup
- setup_no_admin.bat: Install without admin rights (recommended)
- run.bat: Launch the application
- generate_samples.bat: Create test files
- build.bat: Create standalone .exe
- clean_and_setup.bat: Fresh reinstall
Need help? See INSTALL_NO_ADMIN.md for detailed instructions.
That's it! The application will open and you can start anonymizing documents.
- Requirements: Python 3.11, 3.12, or 3.13
- Clone/Download this repository
- Create virtual environment:

      python -m venv venv
      # Windows
      venv\Scripts\activate
      # macOS/Linux
      source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Download spaCy model (required for NER):

      python -m spacy download en_core_web_sm

- Run the application:

      python app.py

- Select Input: Choose a single file or folder containing documents
- Select Output Folder: Where anonymized PDFs will be saved
- Choose Mode: Anonymize (tokens) or Pseudonymize (fake data)
- Options:
  - Enable "Preview before saving" to review entities
  - Enable "Export mapping JSON" to save a mapping file
- Click Start: Process your documents
- Review Entities (if preview enabled):
  - Review detected entities in the table
  - Edit, accept, or reject each entity
  - Apply changes to generate output
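
To make the "Export mapping JSON" option more concrete, the sketch below shows one way a mapping of original values to replacements could be applied to text and written to disk. The file layout (a flat JSON object) and the function names are assumptions for illustration; the application's actual mapping format may differ.

```python
import json
from pathlib import Path


def apply_replacements(text: str, mapping: dict[str, str]) -> str:
    """Replace each original value in `text` with its mapped replacement."""
    # Replace longer originals first so "Jane Doe" is handled before "Jane".
    for original in sorted(mapping, key=len, reverse=True):
        text = text.replace(original, mapping[original])
    return text


def export_mapping(mapping: dict[str, str], path: Path) -> None:
    """Write the mapping as pretty-printed UTF-8 JSON (assumed layout)."""
    path.write_text(json.dumps(mapping, indent=2, ensure_ascii=False), encoding="utf-8")


mapping = {"Jane Doe": "[NAME]", "jane@example.com": "[EMAIL]"}
print(apply_replacements("Jane Doe emailed jane@example.com", mapping))
# [NAME] emailed [EMAIL]
```

A mapping file in this form contains the original sensitive values, so it should be stored as carefully as the source documents.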
Windows:

    pyinstaller --name="DocumentAnonymizer" ^
      --windowed ^
      --onefile ^
      --add-data="venv/Lib/site-packages/spacy/data;spacy/data" ^
      --hidden-import=spacy ^
      --hidden-import=en_core_web_sm ^
      app.py

macOS/Linux (windowed build):

    pyinstaller --name="DocumentAnonymizer" \
      --windowed \
      --onefile \
      --add-data="venv/lib/python3.11/site-packages/spacy/data:spacy/data" \
      --hidden-import=spacy \
      --hidden-import=en_core_web_sm \
      app.py

macOS/Linux (without --windowed):

    pyinstaller --name="DocumentAnonymizer" \
      --onefile \
      --add-data="venv/lib/python3.11/site-packages/spacy/data:spacy/data" \
      --hidden-import=spacy \
      --hidden-import=en_core_web_sm \
      app.py

The executable will be in the dist/ folder.
anonymizer_app/
├── app.py # Entry point
├── gui/
│ ├── __init__.py
│ ├── main_window.py # Main window UI
│ ├── models.py # Qt table models
│ └── views.py # Custom widgets
├── core/
│ ├── __init__.py
│ ├── detector.py # Entity detection
│ ├── mapping.py # Anonymize/pseudonymize mapping
│ ├── replacer.py # Replacement engine
│ ├── docx_handler.py # DOCX processor
│ ├── xlsx_handler.py # XLSX processor
│ └── pdf_handler.py # PDF processor
├── tests/
│ ├── __init__.py
│ ├── test_detector.py
│ ├── test_mapping.py
│ └── test_handlers.py
├── samples/ # Sample test files
├── requirements.txt
├── pyproject.toml
└── README.md
Run tests with pytest:

    pytest tests/ -v

- No Network Calls: All processing happens locally
- No Telemetry: No data leaves your machine
- Temporary Files: Cleaned up automatically
- Log Safety: Logs never contain full sensitive values
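
One way to keep full sensitive values out of logs is to mask them before they reach the logger. The helper below is a sketch under that assumption; `mask_value` and its `visible` parameter are hypothetical names, not part of the application's API.

```python
import logging


def mask_value(value: str, visible: int = 2) -> str:
    """Keep the first `visible` characters and replace the rest with asterisks."""
    if len(value) <= visible:
        # Very short values are masked entirely.
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("anonymizer.demo")

# Only the masked form is passed to the logger.
logger.info("Detected %s: %s", "EMAIL", mask_value("jane@example.com"))
```

Masking at the call site means the raw value is never formatted into a log record, even if the log level or handler changes later.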
MIT License - See LICENSE file for details