Find what your notes are really saying twice.
dedux detects semantically overlapping notes in your knowledge base -- Obsidian vaults, Zettelkasten, or any markdown directory. No embeddings, no API keys, no cloud. Just pure local text analysis.
While hash-based deduplication tools find exact duplicates, dedux finds semantic overlap: notes that cover the same topic in different words or from different angles, often without your realizing it.
dedux uses character n-gram shingling and word n-gram analysis with Jaccard similarity to detect when two files discuss the same concepts, even with different wording.
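Here is a minimal sketch of the two measures named above: character shingling, word n-grams, and Jaccard similarity. The shingle size `k=5`, the bigram size `n=2`, and the function names are assumptions for illustration. The README does not say which sizes dedux uses or how it combines the two scores.

```python
import re


def char_shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of overlapping k-character substrings (shingles).

    k=5 is an assumed default for illustration, not dedux's setting.
    """
    # Lowercase and collapse whitespace so line wrapping doesn't change shingles.
    norm = " ".join(text.lower().split())
    if len(norm) < k:
        return {norm} if norm else set()
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of consecutive n-word tuples (bigrams by default)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


note_a = "Python virtual environments isolate project dependencies."
note_b = "Virtual environments in Python keep project dependencies isolated."

print(f"char 5-shingles: {jaccard(char_shingles(note_a), char_shingles(note_b)):.2f}")
print(f"word bigrams:    {jaccard(word_ngrams(note_a), word_ngrams(note_b)):.2f}")
```

Character shingles tolerate small spelling and inflection changes ("isolate" vs "isolated"), while word n-grams reward shared phrasing; that is one plausible reason to look at both.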
- Extract key phrases from each markdown file (headings, body, tags)
- Compute similarity between all file pairs
- Flag pairs above your threshold
- Show overlapping phrases so you can decide whether to merge
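The four steps above could be wired together roughly as follows. This is a hypothetical sketch, not dedux's actual code. `key_phrases`, `load_vault`, `scan`, the heading and tag regexes, and the use of Jaccard over phrase sets are all assumptions.

```python
import itertools
import re
from pathlib import Path

# Hypothetical patterns: ATX headings ("## Title") and inline tags ("#python").
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)
TAG_RE = re.compile(r"(?<![\w#])#([A-Za-z][\w/-]*)")


def key_phrases(markdown: str, n: int = 2) -> set[str]:
    """Step 1 (assumed): collect headings, tags, and word n-grams from the body."""
    phrases = {h.strip().lower() for h in HEADING_RE.findall(markdown)}
    phrases |= {"#" + tag.lower() for tag in TAG_RE.findall(markdown)}
    words = re.findall(r"[a-z0-9]+", markdown.lower())
    phrases |= {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return phrases


def load_vault(root: str) -> dict[str, str]:
    """Read every .md file under root, keyed by its path relative to root."""
    base = Path(root)
    return {str(p.relative_to(base)): p.read_text(encoding="utf-8")
            for p in sorted(base.rglob("*.md"))}


def scan(notes: dict[str, str],
         threshold: float = 0.5) -> list[tuple[float, str, str, list[str]]]:
    """Steps 2-4: score every pair, keep those at or above threshold, best first."""
    phrases = {name: key_phrases(text) for name, text in notes.items()}
    hits = []
    for a, b in itertools.combinations(sorted(phrases), 2):
        shared = phrases[a] & phrases[b]
        union = phrases[a] | phrases[b]
        score = len(shared) / len(union) if union else 0.0
        if score >= threshold:
            # Keep the shared phrases so a user can judge whether to merge.
            hits.append((score, a, b, sorted(shared)))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


notes = {
    "python-notes.md": "# Python virtual environments\n"
                       "Use venv to isolate project dependencies. #python",
    "python-overview.md": "# Python overview\n"
                          "A venv can isolate project dependencies per project. #python",
    "cli-design.md": "# CLI design\nPrefer clear flags and helpful error messages.",
}

# Short demo notes share few phrases, so the demo uses a low threshold.
for score, a, b, shared in scan(notes, threshold=0.1):
    print(f"[{score:.1%}] {a} <-> {b}: {shared}")
```

`load_vault` shows how a directory could be read, e.g. `scan(load_vault("./vault/"))`; the demo uses an in-memory dict so it runs anywhere. Comparing every pair means n*(n-1)/2 comparisons for n notes.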
```bash
pip install dedux
```

Or install from source:

```bash
git clone https://github.com/izag8216/dedux.git
cd dedux
pip install -e .
```

Then point dedux at a directory of markdown notes:

```bash
# Find overlapping notes (default 50% threshold)
dedux scan ./vault/

# Only high-overlap pairs
dedux scan ./vault/ --threshold 0.7

# Output as JSON
dedux scan ./vault/ --format json -o results.json

# Output as CSV
dedux scan ./vault/ --format csv -o overlaps.csv

# Compare two specific notes
dedux diff note1.md note2.md

# Export a report
dedux export ./vault/ --format json -o results.json
dedux export ./vault/ --format markdown -o report.md
```

Supported output formats:

| Format | Flag | Use Case |
|---|---|---|
| Text | `--format text` | Terminal review (default) |
| JSON | `--format json` | Programmatic processing |
| CSV | `--format csv` | Spreadsheet analysis |
| Markdown | `--format markdown` | Wikis, documentation |
Example `dedux scan` output:

```text
dedux scan results
----------------------------------------
Files scanned: 42
Threshold: 50%
Overlaps found: 3
Time: 0.34s

Top overlapping pairs:
1. [78.0%] ████████████████░░░░
   python-notes.md
   python-overview.md
2. [62.0%] ████████████░░░░░░░░
   cli-design.md
   terminal-tools.md
```
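The bars in the sample output appear to be 20 cells wide, with the filled portion proportional to the score (78% gives 16 filled cells, 62% gives 12). A hypothetical `render_bar` helper that reproduces that layout, assuming round-to-nearest, might look like this. It is not taken from dedux's source.

```python
def render_bar(score: float, width: int = 20) -> str:
    """Hypothetical: fill round(score * width) of `width` cells.

    The 20-cell width and rounding rule are inferred from the sample output.
    """
    clamped = max(0.0, min(1.0, score))  # keep out-of-range scores on the bar
    filled = round(clamped * width)
    return "█" * filled + "░" * (width - filled)


# Reproduce the two pairs from the sample output above.
pairs = [
    (0.78, "python-notes.md", "python-overview.md"),
    (0.62, "cli-design.md", "terminal-tools.md"),
]
for rank, (score, a, b) in enumerate(pairs, start=1):
    print(f"{rank}. [{score:.1%}] {render_bar(score)}")
    print(f"   {a}")
    print(f"   {b}")
```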
| dedux | md-dedupe | Embeddings |
|---|---|---|
| Semantic overlap | Exact hash match | Semantic similarity |
| No API key | No API key | Requires API |
| Zero dependencies | Zero dependencies | Heavy ML stack |
| Offline | Offline | Often online |
| Markdown-aware | Any file | Any text |
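To make the first row of the table concrete, the sketch below contrasts a generic content-hash check (the exact-match approach attributed to hash-based tools, not md-dedupe's actual implementation) with a simple word-set Jaccard score. The word-set scoring is a simplified stand-in chosen for brevity, not dedux's exact method.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """SHA-256 of the raw text: equal only for byte-identical notes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def word_set(text: str) -> set[str]:
    """Lowercased words; a deliberately simple stand-in for dedux's features."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


a = "Use venv to isolate project dependencies."
b = "Isolate project dependencies with venv."

print(content_hash(a) == content_hash(b))      # False: any rewording changes the hash
print(f"{jaccard(word_set(a), word_set(b)):.2f}")  # 0.57: 4 shared words of 7 total
```

A hash check answers "are these the same file?"; an overlap score answers "how much do these notes say in common?", which is the question dedux targets.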
- Python 3.10+ -- No external dependencies
- argparse -- CLI interface
- difflib -- Sequence matching
- collections -- Frequency analysis
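The list above names `difflib` and `collections` without saying exactly where dedux uses them. As an assumption-labeled illustration, `difflib.SequenceMatcher` can surface word runs shared by two notes (the kind of output a `diff` view might show), and `collections.Counter` can rank frequent terms. `shared_runs` and `top_terms` are hypothetical helper names.

```python
import difflib
import re
from collections import Counter


def shared_runs(a: str, b: str, min_words: int = 2) -> list[str]:
    """Hypothetical: word runs appearing in both texts, in matching order."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    # autojunk=False: don't silently ignore frequent words in long notes.
    matcher = difflib.SequenceMatcher(None, wa, wb, autojunk=False)
    # get_matching_blocks() ends with a size-0 sentinel, which the filter drops.
    return [
        " ".join(wa[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


def top_terms(text: str, k: int = 3) -> list[tuple[str, int]]:
    """Hypothetical: the k most frequent words, via Counter.most_common."""
    return Counter(re.findall(r"\w+", text.lower())).most_common(k)


note_a = "Use venv to isolate project dependencies in every project."
note_b = "Always isolate project dependencies with venv."

print(shared_runs(note_a, note_b))
print(top_terms(note_a))
```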
```bash
# Install dev dependencies
pip install -e .
pip install pytest

# Run tests
pytest tests/ -v

# Run CLI directly
python -m dedux.cli scan ./tests/fixtures/
```

MIT License -- see LICENSE for details.