0xfaisl/mirsad

MIRSAD · مرصاد

An adversarial, explainable name-screening engine that matches sanctioned names across Arabic and Latin scripts, survives deliberate evasion, and explains every match.

Python 3.12 · FastAPI · React + Vite · Status: Pass 1

[Screenshot: the MIRSAD screening console screening an Arabic name, showing the Arabic to Latin name pairing, a calibrated confidence gauge, and the per-signal match explanation]

Type a name in Arabic or Latin. MIRSAD screens it against the open OFAC SDN and UN Consolidated sanctions lists, returns ranked candidate matches with a calibrated confidence score, and shows why each one matched, signal by signal, in enough detail to satisfy both a compliance analyst and a regulator.


Why this exists

Sanctions screening is a never-finished problem in regulated finance, and it is brutal on two fronts:

  • More than 90% of screening alerts are false positives. Every one is a human analyst-hour. Every miss is a sanctioned party slipping through and a regulatory penalty.
  • Arabic makes it dramatically harder. A single name like محمد romanizes to Mohammed, Muhammad, Mohamad, Mohamed, Muhammed. Particles (Al, Bin, Abu) shift word order. خ, ع, ق have no clean Latin equivalent. English-first tools underperform here, which is exactly the gap.

MIRSAD is the screening core built for that gap, and stress-tested against the thing real criminals actually do: deliberately misspell their names to evade screening.

What it does

Screen a name and get back, for each candidate:

  • a calibrated match score and confidence band (strong / possible / weak)
  • the cross-script name pairing (Arabic ⟷ Latin) when an original-script name is on file
  • a per-signal explanation (which features drove the score, positive and negative) plus a plain-language reason
  • accept / dismiss for human-in-the-loop review, because the tool flags and a person decides
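
How a linear fusion can produce a score, a band, and signed per-signal contributions is easiest to see in a few lines. The sketch below is illustrative only: the feature names, weights, bias, and band cut-offs are hypothetical placeholders rather than MIRSAD's trained values, and the calibration step is left out.

```python
import math

# Hypothetical learned weights and bias, for illustration only.
WEIGHTS = {"jaro_winkler": 3.0, "token_set": 2.0, "length_diff": -1.5}
BIAS = -3.0


def explain(features: dict[str, float]) -> dict:
    """Return an (uncalibrated) score, a band, and signed per-signal contributions."""
    # In a linear model each signal's contribution is simply weight * value.
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    logit = BIAS + sum(contributions.values())
    score = 1.0 / (1.0 + math.exp(-logit))
    # Hypothetical band cut-offs; the real thresholds are not stated in this README.
    if score >= 0.8:
        band = "strong"
    elif score >= 0.5:
        band = "possible"
    else:
        band = "weak"
    # Largest absolute contribution first, so the main driver leads the rationale.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {"score": score, "band": band, "signals": ranked}


result = explain({"jaro_winkler": 0.95, "token_set": 1.0, "length_diff": 0.1})
for name, contribution in result["signals"]:
    print(f"{name:>14}: {contribution:+.2f}")
print(result["band"], round(result["score"], 3))
```

Because the contributions sum (with the bias) to the logit, an analyst can read off both what pushed a match up and what pulled it down.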

When the matched entity has no Arabic name on file (OFAC records are Latin-only), the dossier says so plainly rather than faking it:

[Screenshot: the dossier for a Latin-only entity, showing the name as a clean hero with an honest note that no original-script name is on file]

Results, measured on real data and reported honestly

On the full lists (8,192 individuals, 21,799 aliases including 332 Arabic-script names), evaluated on a held-out, entity-disjoint test split (no entity's aliases straddle train and test):

| Metric | Learned fusion | Fuzzy baseline | Read |
|---|---|---|---|
| AUC | 0.806 | 0.770 | learned ranks better overall |
| Adversarial recall @ rank-1 (Tiers 0 to 3) | 0.95 to 0.99 | 0.90 to 0.94 | learned wins under every evasion tier |
| Recall @ ≤1% false-positive rate | 0.450 | 0.490 | the baseline edges it here |
| Expected Calibration Error | 0.021 | 0.105 before isotonic | scores are well-calibrated |
| Latency per screen | 25 ms median | | real-time on a single machine |
| Blocking recall | 1.000 | | nothing lost before scoring |

The honest headline is two-sided, and that is the point. The learned model wins on ranking quality (AUC) and on ranking the true entity first under every evasion tier, but the simple fuzzy baseline edges it at the strict ≤1% false-positive operating point. That gap is a diagnosis, not an embarrassment: it shows exactly where the Pass-2 deep encoder earns its keep (the extreme low-false-positive tail). For a fiduciary-minded reviewer, naming that honestly is more credible than a single inflated number.
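
For readers who want the two less familiar metrics in the table pinned down, here is a minimal sketch of recall at a capped false-positive rate and of Expected Calibration Error. It assumes binary labels (1 = same entity), scores in [0, 1], and equal-width bins for ECE; the function names and the binning choice are assumptions for illustration, not the evaluation harness itself.

```python
def recall_at_fpr(scores, labels, max_fpr=0.01):
    """Best recall over all thresholds whose false-positive rate is <= max_fpr."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


def expected_calibration_error(scores, labels, n_bins=10):
    """Weighted mean gap between average score and observed match rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        index = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin
        bins[index].append((s, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(s for s, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / len(scores) * abs(confidence - accuracy)
    return ece


# Toy data, not MIRSAD results.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
print(recall_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
print(expected_calibration_error(toy_scores, toy_labels, n_bins=2))
```

A model can lead on AUC and still trail on the first metric, because AUC averages over every threshold while recall at ≤1% false positives looks only at the strictest end of the curve.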

[Figure: the benchmark view, a grouped bar chart of adversarial recall by evasion tier, learned versus baseline, with the per-tier lift annotated, plus AUC, recall at one percent false positives, calibration, and latency]

How it works

```mermaid
flowchart LR
    Q["Query name<br/>AR or EN"] --> N["Normalize<br/>diacritics, particles,<br/>cross-script"]
    N --> B["Block<br/>trigram candidate<br/>generation"]
    B --> F["Featurize<br/>edit-distance, phonetic,<br/>token, structural"]
    F --> S["Learned fusion<br/>logistic regression<br/>+ calibration"]
    S --> X["Explain<br/>per-signal<br/>contributions"]
    X --> R["Ranked matches<br/>+ reasons"]
```

Three layers, each a thin, single-purpose module:

  1. Normalization and candidate generation. Ingest the sanctions XML and normalize Arabic and Latin names (strip diacritics, unify alef and hamza and taa-marbuta, handle Al/Bin/Abu particles). A trigram inverted index then returns a small candidate set, so a query is never compared against all 21,799 aliases.
  2. Hybrid learned scoring. Eight deterministic features (Jaro-Winkler, Levenshtein, token-set and token-sort, an Arabic-aware phonetic score, particle-aware overlap, length, first-token) feed a logistic-regression fusion that learns to weight them. The model is trained on real same-entity alias pairs versus cross-entity pairs that land in the same block.
  3. Explainability. Every match returns its per-signal contributions (the deterministic signals plus the learned weights) as a human-readable rationale, so an analyst can see and defend exactly why a name was flagged.
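
The first layer can be sketched with the standard library alone. The code below is an assumption-laden illustration, not the contents of `normalize/` or `match/`: the particle list, the character mappings, the padding scheme, and the `min_shared` threshold are all hypothetical choices made to show the idea.

```python
import re
import unicodedata
from collections import defaultdict

# Arabic harakat, dagger alef, and tatweel (illustrative ranges).
ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Unify alef variants, taa-marbuta, and alef-maqsura (one common convention).
LETTER_MAP = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ٱ": "ا", "ة": "ه", "ى": "ي"})
PARTICLES = {"al", "el", "bin", "ibn", "abu"}  # hypothetical list


def normalize(name: str) -> str:
    text = ARABIC_MARKS.sub("", name).translate(LETTER_MAP)
    # Decompose, then drop combining marks (Latin accents, leftover hamza marks).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(tok for tok in text.split() if tok not in PARTICLES)


def trigrams(text: str) -> set[str]:
    padded = f"  {text} "  # padding lets short names and name starts form trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Inverted index from trigram to alias ids, used only to shortlist candidates."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, alias_id: str, name: str) -> None:
        for gram in trigrams(normalize(name)):
            self.postings[gram].add(alias_id)

    def candidates(self, query: str, min_shared: int = 2, limit: int = 50) -> list[str]:
        shared = defaultdict(int)
        for gram in trigrams(normalize(query)):
            for alias_id in self.postings.get(gram, ()):
                shared[alias_id] += 1
        hits = [(count, alias_id) for alias_id, count in shared.items() if count >= min_shared]
        hits.sort(key=lambda pair: (-pair[0], pair[1]))  # most shared trigrams first
        return [alias_id for _, alias_id in hits[:limit]]


index = TrigramIndex()
index.add("a1", "Mohammed Al-Rashid")
index.add("a2", "Muhammad Rashid")
index.add("a3", "Ivan Petrov")
print(index.candidates("Mohamed Rashid"))
```

Only the shortlisted aliases go on to the more expensive feature computation, which is why blocking recall matters: anything dropped here can never be scored.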

The adversarial benchmark

Most name-matching demos test on clean variants. Real criminals obfuscate. MIRSAD's benchmark generates evasive variants of held-out sanctioned names across four tiers, each grounded in documented evasion typologies (Wolfsberg, FATF, OFAC), and reports recall under that pressure versus a baseline:

  • Tier 0, clean: a legitimate alternate transliteration (Mohammed to Muhammad)
  • Tier 1, typo: one or two character edits
  • Tier 2, structural: particle drops, token reordering, dropped middle name
  • Tier 3, adversarial: homoglyph substitution, cross-script mixing, vowel manipulation, name splitting

Ethics. The variant generator is dual-use (effectively a sanctions-evasion cookbook), so it is internal-only: used by the evaluation harness, never exposed through the API or UI, never shipped as a runnable tool. This README publishes the methodology and the numbers; the generator itself is withheld deliberately.

Honest limitations

  • The linear fusion over-weights shared core tokens, so common name components (a lone "Mohammed") trigger false positives. This is visible in the signal panel, not hidden, and it is what the human-in-the-loop review is for.
  • The phonetic feature is English-biased and contributes nothing on Arabic script (metaphone returns empty for Arabic). The real cross-script bridge is the Pass-2 Siamese encoder.
  • The model trails the baseline at the strict ≤1% false-positive operating point (see above).
  • Evaluation uses synthetic adversarial variants for robustness direction and a held-out slice of genuine aliases for real performance. Both are reported.
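
The held-out slice is only meaningful if it is entity-disjoint, as described under Results. One simple way to get that property is to assign each entity, not each alias, to a side of the split. The sketch below uses a deterministic hash for this; the function name, the `(entity_id, alias)` input shape, and the 20% test fraction are assumptions for illustration, not the project's actual split code.

```python
import hashlib


def split_by_entity(aliases, test_fraction=0.2):
    """Split (entity_id, alias) pairs so that no entity appears on both sides."""
    train, test = [], []
    for entity_id, alias in aliases:
        # Hash the entity id, not the alias, so all of an entity's aliases move together.
        digest = hashlib.sha256(str(entity_id).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        target = test if bucket < test_fraction else train
        target.append((entity_id, alias))
    return train, test


# Toy records, not real list entries.
records = [(entity, f"alias-{entity}-{k}") for entity in range(50) for k in range(3)]
train, test = split_by_entity(records)
overlap = {e for e, _ in train} & {e for e, _ in test}
print(len(train), len(test), len(overlap))
```

Hashing keeps the assignment stable across runs and across list updates, so an entity does not drift from train to test when new aliases are added.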

Run it

```shell
# 1. Backend (Python 3.12)
cd mirsad
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python data/download.py                  # fetch the open OFAC + UN lists
PYTHONPATH=. python scripts/serve_api.py  # screening API on :8000

# 2. Console (second terminal)
cd ui
npm install
npm run dev                              # http://localhost:5173
```

Reproduce the evaluation and regenerate the charts:

```shell
PYTHONPATH=. python scripts/run_eval.py
```

Note: the adversarial-tier evaluation calls the internal-only variant generator, which is withheld by design (see The adversarial benchmark). That portion runs only where the local component is present; the honest-alias metrics and every figure driven by the open lists reproduce directly.

Project structure

```text
mirsad/
├── data/         download scripts + cached sanctions XML (raw data gitignored)
├── ingest/       OFAC + UN XML parsers -> normalized Entity records
├── normalize/    Arabic/Latin normalization (extracted as a reusable skill)
├── match/        features, blocking, learned fusion scorer, SHAP explanation, screen()
├── adversarial/  Tier 0 to 3 red-team generator (internal-only)
├── eval/         recall@FP, adversarial lift, calibration, latency, charts
├── api/          FastAPI /screen + /benchmark
├── ui/           React + Vite + TS + Tailwind "Watchtower" console
├── docs/         architecture, specs, plans, screenshots
├── PRODUCT.md    design context (users, tone)
└── DESIGN.md     the Watchtower design system
```

Tech stack

Python 3.12, lxml, rapidfuzz, jellyfish, scikit-learn, shap, pandas, matplotlib, FastAPI, uvicorn, pytest. Front-end: React, Vite, TypeScript, Tailwind, shadcn/ui, Radix.

Data and license

All data is open and public (OFAC SDN, UN Consolidated). No bank data, no private feeds, no NDA. This is a portfolio project built to demonstrate the screening problem class, not a deployed compliance system. The code is released under the MIT License.

Where the deeper knowledge lives

Concepts explained from first principles, the architecture decisions (ADRs), the full engineering journal, and a self-study curriculum live in the Obsidian vault: ../The Dome/MIRSAD/MIRSAD-MOC.md. See also docs/ARCHITECTURE.md and the design system in DESIGN.md. Inherits workspace standards from ../CLAUDE.md.

About

Adversarial, explainable Arabic/Latin sanctions name-screening: cross-script matching, calibrated SHAP-explained confidence, and an adversarial-evasion benchmark. Open data only.
