MIRSAD · مرصاد

An adversarial, explainable name-screening engine that matches sanctioned names across Arabic and Latin scripts, survives deliberate evasion, and explains every match.

Type a name in Arabic or Latin. MIRSAD screens it against the open OFAC SDN and UN Consolidated sanctions lists, returns ranked candidate matches with a calibrated confidence score, and shows why each one matched, signal by signal, in enough detail to satisfy both a compliance analyst and a regulator.

Why this exists

Sanctions screening is a never-finished problem in regulated finance, and it is brutal on two fronts:

More than 90% of screening alerts are false positives. Every one is a human analyst-hour. Every miss is a sanctioned party slipping through and a regulatory penalty.
Arabic makes it dramatically harder. A single name like محمد romanizes to Mohammed, Muhammad, Mohamad, Mohamed, Muhammed. Particles (Al, Bin, Abu) shift word order. خ, ع, ق have no clean Latin equivalent. English-first tools underperform here, which is exactly the gap.

MIRSAD is the screening core built for that gap, and stress-tested against the thing real criminals actually do: deliberately misspell their names to evade screening.

What it does

Screen a name and get back, for each candidate:

a calibrated match score and confidence band (strong / possible / weak)
the cross-script name pairing (Arabic ⟷ Latin) when an original-script name is on file
a per-signal explanation (which features drove the score, positive and negative) plus a plain-language reason
accept / dismiss for human-in-the-loop review, because the tool flags and a person decides

When the matched entity has no Arabic name on file (OFAC records are Latin-only), the dossier says so plainly rather than faking it.

Results, measured on real data and reported honestly

On the full lists (8,192 individuals, 21,799 aliases including 332 Arabic-script names), evaluated on a held-out, entity-disjoint test split (no entity's aliases straddle train and test):

| Metric | Learned fusion | Fuzzy baseline | Read |
| --- | --- | --- | --- |
| AUC | 0.806 | 0.770 | learned ranks better overall |
| Adversarial recall @ rank-1 (Tiers 0 to 3) | 0.95 to 0.99 | 0.90 to 0.94 | learned wins under every evasion tier |
| Recall @ ≤1% false-positive rate | 0.450 | 0.490 | the baseline edges it here |
| Expected Calibration Error | 0.021 | 0.105 before isotonic | scores are well-calibrated |
| Latency per screen | 25 ms median | | real-time on a single machine |
| Blocking recall | 1.000 | | nothing lost before scoring |

The honest headline is two-sided, and that is the point. The learned model wins on ranking quality (AUC) and on ranking the true entity first under every evasion tier, but the simple fuzzy baseline edges it at the strict ≤1% false-positive operating point. That gap is a diagnosis, not an embarrassment: it shows exactly where the Pass-2 deep encoder earns its keep (the extreme low-false-positive tail). For a fiduciary-minded reviewer, naming that honestly is more credible than a single inflated number.
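For readers who want the two threshold-sensitive metrics pinned down, here is a minimal sketch of how Expected Calibration Error and recall at a capped false-positive rate are commonly computed. It is not taken from MIRSAD's `eval/` module: the function names, the ten equal-width bins, and the brute-force threshold scan are assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean |observed match rate - mean score| over equal-width bins."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    total = len(scores)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        match_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(match_rate - mean_score)
    return ece


def recall_at_fpr(
    scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.01
) -> float:
    """Best recall over all thresholds whose false-positive rate is <= max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best
```

A model can lead on AUC and still trail on `recall_at_fpr`: AUC averages ranking quality over every threshold, while the capped-FPR metric looks only at the few highest-scoring negatives.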

How it works

flowchart LR
    Q["Query name<br/>AR or EN"] --> N["Normalize<br/>diacritics, particles,<br/>cross-script"]
    N --> B["Block<br/>trigram candidate<br/>generation"]
    B --> F["Featurize<br/>edit-distance, phonetic,<br/>token, structural"]
    F --> S["Learned fusion<br/>logistic regression<br/>+ calibration"]
    S --> X["Explain<br/>per-signal<br/>contributions"]
    X --> R["Ranked matches<br/>+ reasons"]

Three layers, each a thin, single-purpose module:

Normalization and candidate generation. Ingest the sanctions XML, normalize Arabic and Latin names (strip diacritics, unify alef and hamza and taa-marbuta, handle Al/Bin/Abu particles), then use a trigram inverted index to return a small candidate set, so a query is never compared against all 21,799 aliases.
Hybrid learned scoring. Eight deterministic features (Jaro-Winkler, Levenshtein, token-set and token-sort, an Arabic-aware phonetic score, particle-aware overlap, length, first-token) feed a logistic-regression fusion that learns to weight them. The model is trained on real same-entity alias pairs (positives) versus cross-entity pairs that land in the same block (hard negatives).
Explainability. Every match returns its per-signal contributions (the deterministic signals plus the learned weights) as a human-readable rationale, so an analyst can see and defend exactly why a name was flagged.
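The first layer can be sketched in a few dozen lines of standard-library Python. This is an illustration of the technique, not the code in `normalize/` or `match/`: the particle set, the letter-folding table, the choice to drop particles from the blocking key, and the Jaccard cut-off of 0.3 are all assumptions. It also blocks within a single script only; bridging an Arabic query to a Latin alias is outside this sketch.

```python
import unicodedata
from collections import defaultdict

# Hypothetical particle list; the project's real list may differ.
PARTICLES = {"al", "el", "bin", "ibn", "abu"}

# Fold common Arabic letter variants onto one form (assumed mapping).
ARABIC_FOLD = str.maketrans({
    "أ": "ا", "إ": "ا", "آ": "ا", "ٱ": "ا",  # alef with hamza/madda/wasla
    "ة": "ه",                                 # taa-marbuta
    "ى": "ي",                                 # alef maqsura
    "ـ": None,                                # tatweel (elongation mark)
})


def normalize(name: str) -> str:
    text = name.translate(ARABIC_FOLD)
    # After NFKD, Latin accents and Arabic harakat are combining marks (Mn).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    tokens = text.lower().replace("-", " ").split()
    kept = [t for t in tokens if t not in PARTICLES]
    return " ".join(kept or tokens)  # never return an empty key


def trigrams(text: str) -> set[str]:
    padded = f"  {text} "  # padding lets short names still produce trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Inverted index from trigram to the aliases that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._grams: list[set[str]] = []
        self.names: list[str] = []

    def add(self, alias: str) -> None:
        alias_id = len(self.names)
        self.names.append(alias)
        grams = trigrams(normalize(alias))
        self._grams.append(grams)
        for gram in grams:
            self._postings[gram].add(alias_id)

    def candidates(self, query: str, min_jaccard: float = 0.3) -> list[str]:
        q = trigrams(normalize(query))
        seen: set[int] = set()
        for gram in q:
            seen |= self._postings.get(gram, set())
        scored = []
        for alias_id in seen:
            grams = self._grams[alias_id]
            jaccard = len(q & grams) / len(q | grams)
            if jaccard >= min_jaccard:
                scored.append((jaccard, self.names[alias_id]))
        return [name for _, name in sorted(scored, reverse=True)]


index = TrigramIndex()
for alias in ["Muhammad Al-Rashid", "Mohammed bin Salman", "John Smith"]:
    index.add(alias)
shortlist = index.candidates("Mohammad Rashid")
```

Only aliases that share at least one trigram with the query are ever touched, which is what keeps the expensive feature scoring off the full alias list.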

The adversarial benchmark

Most name-matching demos test on clean variants. Real criminals obfuscate. MIRSAD's benchmark generates evasive variants of held-out sanctioned names across four tiers, each grounded in documented evasion typologies (Wolfsberg, FATF, OFAC), and reports recall under that pressure versus a baseline:

Tier 0, clean: a legitimate alternate transliteration (Mohammed to Muhammad)
Tier 1, typo: one or two character edits
Tier 2, structural: particle drops, token reordering, dropped middle name
Tier 3, adversarial: homoglyph substitution, cross-script mixing, vowel manipulation, name splitting

Ethics. The variant generator is dual-use (effectively a sanctions-evasion cookbook), so it is internal-only: used by the evaluation harness, never exposed through the API or UI, never shipped as a runnable tool. This README publishes the methodology and the numbers; the generator itself is withheld deliberately.

Honest limitations

The linear fusion over-weights shared core tokens, so common name components (a lone "Mohammed") trigger false positives. This is visible in the signal panel, not hidden, and it is what the human-in-the-loop review is for.
The phonetic feature is English-biased and contributes nothing on Arabic script (metaphone returns empty for Arabic). The real cross-script bridge is the Pass-2 Siamese encoder.
The model trails the baseline at the strict ≤1% false-positive operating point (see above).
Evaluation uses synthetic adversarial variants for robustness direction and a held-out slice of genuine aliases for real performance. Both are reported.
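The first limitation above is easiest to see in the per-signal view. In a logistic-regression fusion, each signal's contribution to the log-odds is its learned weight times its feature value, so one saturated token feature can carry a match on its own. The sketch below uses four of the eight signals with invented weights, bias, and feature values; none of these numbers come from the trained model. It measures contributions against an all-zero baseline, whereas SHAP's treatment of linear models typically centers each feature on its dataset mean, so the project's panel may show different magnitudes.

```python
import math

# Hypothetical weights and bias, chosen only to illustrate the effect.
WEIGHTS = {
    "jaro_winkler": 2.0,
    "token_set": 3.0,
    "first_token_match": 1.0,
    "length_diff": -1.5,
}
BIAS = -4.0


def explain(features: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return (probability, per-signal log-odds contributions, largest first)."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Assumed feature values for the lone query "Mohammed" against a longer alias
# that contains the same token: token-set overlap saturates at 1.0.
probability, ranked = explain({
    "jaro_winkler": 0.9,
    "token_set": 1.0,
    "first_token_match": 1.0,
    "length_diff": 0.5,
})
for signal, contribution in ranked:
    print(f"{signal:>18}: {contribution:+.2f}")
```

With these made-up numbers the token-set signal alone outweighs every other term, and the length penalty is too small to pull the score back under one half. An analyst reading that breakdown can dismiss the alert with a stated reason.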

Run it

# 1. Backend (Python 3.12)
cd mirsad
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python data/download.py                  # fetch the open OFAC + UN lists
PYTHONPATH=. python scripts/serve_api.py  # screening API on :8000

# 2. Console (second terminal)
cd ui
npm install
npm run dev                              # http://localhost:5173

Reproduce the evaluation and regenerate the charts:

PYTHONPATH=. python scripts/run_eval.py

Note: the adversarial-tier evaluation calls the internal-only variant generator, which is withheld by design (see The adversarial benchmark). That portion runs only where the generator is present locally; the genuine-alias metrics and every figure derived from the open lists reproduce directly.

Project structure

mirsad/
├── data/         download scripts + cached sanctions XML (raw data gitignored)
├── ingest/       OFAC + UN XML parsers -> normalized Entity records
├── normalize/    Arabic/Latin normalization (extracted as a reusable skill)
├── match/        features, blocking, learned fusion scorer, SHAP explanation, screen()
├── adversarial/  Tier 0 to 3 red-team generator (internal-only)
├── eval/         recall@FP, adversarial lift, calibration, latency, charts
├── api/          FastAPI /screen + /benchmark
├── ui/           React + Vite + TS + Tailwind "Watchtower" console
├── docs/         architecture, specs, plans, screenshots
├── PRODUCT.md    design context (users, tone)
└── DESIGN.md     the Watchtower design system

Tech stack

Python 3.12, lxml, rapidfuzz, jellyfish, scikit-learn, shap, pandas, matplotlib, FastAPI, uvicorn, pytest. Front-end: React, Vite, TypeScript, Tailwind, shadcn/ui, Radix.

Data and license

All data is open and public (OFAC SDN, UN Consolidated). No bank data, no private feeds, no NDA. This is a portfolio project built to demonstrate the screening problem class, not a deployed compliance system. The code is released under the MIT License.

Where the deeper knowledge lives

Concepts explained from first principles, the architecture decisions (ADRs), the full engineering journal, and a self-study curriculum live in the Obsidian vault: ../The Dome/MIRSAD/MIRSAD-MOC.md. See also docs/ARCHITECTURE.md and the design system in DESIGN.md. Inherits workspace standards from ../CLAUDE.md.
