PRIM-006
MAILSIEVE is a command-line tool for discovering publicly listed business email addresses from domains, with an emphasis on rate-limiting, resumability, and evidence logging.
It is designed for research, compliance checks, and operational workflows where polite crawling and auditability matter.
- Domain-based email discovery
- Safe resume via `processed.txt`
- Polite rate-limiting and concurrency controls
- CSV output (append-only)
- Evidence logging (GDPR-trimmed)
- Parallel execution with controlled fan-out
Requires Node.js ≥ 18.

    git clone https://github.com/midiakiasat/MAILSIEVE.git
    cd MAILSIEVE
    npm install
    chmod +x batch-run.sh

Create a file named `domains.txt`:

    example.com
    https://anotherdomain.it
    www.somedomain.org

One domain per line. URLs are normalized automatically.
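
To make the normalization step concrete, here is a minimal sketch of how mixed input lines could be reduced to bare hostnames. It is not MAILSIEVE's actual code: the function name `normalizeDomain` is hypothetical, and stripping a leading `www.` is an assumption about what "normalized" means here.

```javascript
// Hypothetical helper: MAILSIEVE's real normalization rules may differ.
function normalizeDomain(line) {
  const trimmed = line.trim();
  if (trimmed === "") return null; // skip blank lines

  // The WHATWG URL parser needs a scheme, so add one when it is missing.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme ? trimmed : `http://${trimmed}`;

  let host;
  try {
    host = new URL(withScheme).hostname; // the parser lowercases the host
  } catch {
    return null; // not parseable as a URL
  }

  // Assumption: a leading "www." refers to the same site.
  return host.replace(/^www\./, "");
}

const input = ["example.com", "https://anotherdomain.it", "www.somedomain.org"];
console.log(input.map(normalizeDomain));
```

With the three sample lines above, every entry reduces to a plain hostname, so the same domain written in different ways would map to a single key.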

Run the batch script:

    ./batch-run.sh

MAILSIEVE will:
- skip domains already listed in `processed.txt`
- append results to `results.csv`
- log trimmed evidence to `logs/evidence.jsonl`
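
The skip-on-resume behavior can be pictured as a set difference between the two files. The sketch below is only an illustration under that assumption: `pendingDomains` is a hypothetical name, and it assumes `processed.txt` stores one domain per line in the same form as the input.

```javascript
// Hypothetical sketch of resume-by-skip; not MAILSIEVE's actual code.
function pendingDomains(domainsText, processedText) {
  // Split on LF or CRLF, trim, and drop empty lines.
  const toLines = (text) =>
    text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== "");

  const done = new Set(toLines(processedText));
  const seen = new Set(); // also drop duplicates within the input list
  const pending = [];
  for (const domain of toLines(domainsText)) {
    if (done.has(domain) || seen.has(domain)) continue;
    seen.add(domain);
    pending.push(domain);
  }
  return pending;
}

const domainsFile = "example.com\nanotherdomain.it\nsomedomain.org\n";
const processedFile = "example.com\n";
console.log(pendingDomains(domainsFile, processedFile));
```

Because the output files are append-only, an interrupted run can be restarted with the same command and only the remaining domains would be visited.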

Check progress:

    wc -l domains.txt processed.txt results.csv
    tail -n 5 results.csv

You can tune behavior without editing code:

    POLITE_CONCURRENCY=3 ./batch-run.sh
    RATE_MS=1500 TIMEOUT_MS=20000 POLITE_CONCURRENCY=1 ./batch-run.sh
    QUIET_ENV=0 ./batch-run.sh

To start fresh:

    rm -f processed.txt results.csv
    rm -rf .cache/http
    rm -f logs/evidence.jsonl

Then rerun:

    ./batch-run.sh

| File | Purpose |
|---|---|
| `results.csv` | Discovered emails |
| `processed.txt` | Domains already processed |
| `logs/evidence.jsonl` | Minimal evidence trail |

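
Since the evidence log uses the `.jsonl` extension, it can presumably be read as JSON Lines, one JSON object per line. The sketch below is a generic reader under that assumption; `parseJsonl` is a hypothetical helper, and the `domain` field in the sample is made up, because the actual evidence schema is not described here.

```javascript
// Generic JSON Lines reader; the record fields in the demo are invented.
function parseJsonl(text) {
  const records = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      // A partially written last line is possible if a run was interrupted.
      console.warn(`Skipping malformed line ${index + 1}: ${err.message}`);
    }
  });
  return records;
}

// Hypothetical sample: two complete records and one truncated line.
const sample = '{"domain":"example.com"}\n{"domain":"anotherdomain.it"}\n{"dom';
console.log(parseJsonl(sample).length);
```

To inspect a real log, the file contents could be passed in directly, for example `parseJsonl(require("node:fs").readFileSync("logs/evidence.jsonl", "utf8"))`.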
MAILSIEVE only processes publicly available information.
You are responsible for ensuring that your usage complies with:
- local laws and regulations
- website terms of service
- data protection frameworks (e.g. GDPR)
This tool is provided as-is, without warranty.
See LICENSE.