A real-time Anti-Money-Laundering (AML) data engineering pipeline on AWS, with end-to-end streaming detection and a live operational dashboard.
SENTINEL is a production-style anti-money-laundering pipeline built entirely on AWS serverless infrastructure. Synthetic banking transactions are generated on a schedule, streamed through Amazon Kinesis, evaluated against fourteen distinct AML detection rules in real time, and persisted as a partitioned data lake in S3. A FastAPI backend exposes the cleaned data through a REST interface, and a React/TypeScript dashboard renders it as a live operational console for compliance analysts.
The system was built as a Final Year Project to demonstrate a complete, cloud-native data engineering stack: ingestion, streaming, detection, storage, cataloguing, querying, and visualization, with realistic data quality and governance considerations.
The pipeline is organized into three logical layers:
Scheduling — Amazon EventBridge triggers two Lambdas on independent cadences. Entity-Generator runs every 5 minutes to produce customer and account snapshots. AML-pipeline-scheduler runs every minute to invoke a Step Functions workflow.
Data Generation — A Step Functions state machine orchestrates two Lambdas in sequence: GenerateTransactions reads the latest customer/account snapshots from S3 and produces a mix of normal and suspicious transactions, then publishes them to Kinesis; GenerateAlerts runs separate batch alerting for analytical purposes.
Stream Processing — A consumer Lambda subscribes to the Kinesis stream, applies the AML detection rules per transaction, and writes both raw transactions and generated alerts back to S3 in date-partitioned folders.
The downstream layer adds AWS Glue (cataloguing the partitioned S3 data into Hive-style tables), Amazon Athena (SQL views with type casting and data quality filters), a FastAPI backend (a thin REST bridge), and a React dashboard (the user-facing console).
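As a rough illustration of the stream-processing layer, the sketch below shows how a Lambda handler might decode a Kinesis batch. The event shape (`Records[].kinesis.data`, base64-encoded) is the standard Kinesis-to-Lambda format. The JSON payload fields and the `decode_records` helper are assumptions made for this example, not code taken from the project's consumer.

```python
import base64
import json


def decode_records(event: dict) -> list[dict]:
    """Decode the base64 payload of each Kinesis record into a dict."""
    transactions = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        transactions.append(json.loads(payload))
    return transactions


def lambda_handler(event: dict, context: object) -> dict:
    transactions = decode_records(event)
    # Rule evaluation and the S3 writes would happen here.
    return {"processed": len(transactions)}


# Build a fake single-record event to exercise the handler locally.
# The payload fields are hypothetical.
sample = {"transaction_id": "T1", "amount": 12500.0}
fake_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(json.dumps(sample).encode()).decode()}}
    ]
}
result = lambda_handler(fake_event, None)
```

Keeping the decoding step separate from the handler makes it possible to test the parsing logic without deploying to AWS.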
| Layer | Technology |
|---|---|
| Cloud (AWS) | Lambda · Kinesis Data Streams · S3 · EventBridge · Step Functions · IAM · Glue Data Catalog · Athena |
| Data | CSV (entities/transactions), JSONL (real-time alerts), Hive partitioning (dt=YYYY-MM-DD) |
| Backend | Python 3.11 · FastAPI · PyAthena · boto3 · Uvicorn |
| Frontend | React 19 · TypeScript · Vite · Plotly-style custom charts (no chart library, hand-rendered SVG) |
| Detection | 14 AML patterns (structuring, layering, round-trip, shell company, trade-based, high velocity, impossible travel, etc.) |
- Real-time streaming — transactions move from generation to detection in seconds via Kinesis
- 14 AML detection rules — covering placement, layering, and integration phases of money laundering
- Cleaned data layer — Athena views handle type casting (CSV strings → booleans, ISO timestamps → typed `TIMESTAMP`), filter invalid rows, and resolve slowly-changing customer dimensions
- Live dashboard — KPI tiles, alert breakdown chart, top suspicious entities, jurisdictional risk panel, and a streaming alert feed, all driven by real Athena queries
- Multi-view navigation — Overview, Alerts, Transactions, and Entities pages share the same backend
- Partitioned data lake — every dataset is partitioned by date, optimized for Athena scan cost and query speed
- Automated cataloguing — Glue crawlers register new partitions, no manual table maintenance
Financial-Transaction-Monitoring/
├── aml-system/
│ ├── data-generators/ Lambda source for entity, transaction, and alert generation
│ ├── ingestion/ Kinesis consumer Lambda (real-time AML detection)
│ ├── pipelines/ Step Functions definitions
│ ├── frontend/ React + TypeScript dashboard (SENTINEL)
│ ├── ml/ Reserved for future ML-based scoring (planned)
│ ├── services/ Shared utilities
│ └── backend/ FastAPI bridge between Athena and the React UI
├── docs/ Repo-level documentation (this README's images)
└── README.md You are here
Each major folder has its own README.md explaining its contents, dependencies, and run instructions.
The project has three deployable components: AWS infrastructure, the FastAPI backend, and the React frontend. They are independent — the dashboard runs locally against AWS, no hosting required.
- AWS account with admin access (free tier sufficient for the load this project generates)
- Python 3.11+
- Node.js 18+
- AWS CLI configured (`aws configure` with credentials in region `eu-north-1`)
The Lambdas, Step Functions, EventBridge schedules, and Kinesis stream are deployed manually through the AWS console. See aml-system/README.md for step-by-step provisioning instructions and the Glue crawler setup.
The S3 bucket structure assumed by all downstream code is:
aml-data/
├── customers/dt=YYYY-MM-DD/customers_*.csv
├── accounts/dt=YYYY-MM-DD/accounts_*.csv
├── transactions/dt=YYYY-MM-DD/transactions_*.csv
└── alerts/
├── batch/dt=YYYY-MM-DD/alerts_*.csv
└── realtime/dt=YYYY-MM-DD/*.jsonl
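A writer that targets this layout only needs to derive the `dt=YYYY-MM-DD` segment from a timestamp. The helper below is a minimal sketch: `partition_key` and the file name are hypothetical, and it assumes the partition date is taken in UTC, which the source does not state.

```python
from datetime import datetime, timezone


def partition_key(dataset: str, filename: str, when: datetime) -> str:
    """Return an S3 object key under a Hive-style dt=YYYY-MM-DD partition."""
    # Assumption: partitions are cut on UTC dates.
    day = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{dataset}/dt={day}/{filename}"


ts = datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)
key = partition_key("alerts/realtime", "batch_0001.jsonl", ts)
```

With the values above, `key` is `alerts/realtime/dt=2026-01-15/batch_0001.jsonl`, which matches the `alerts/realtime/dt=YYYY-MM-DD/*.jsonl` pattern.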
cd backend
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
cp .env.example .env # then edit with your bucket name
uvicorn main:app --reload --port 8000

The backend serves on http://localhost:8000. Auto-generated API docs are at http://localhost:8000/docs.
See backend/README.md for endpoint reference.
cd aml-system/frontend
npm install
npm run dev

The dashboard opens at http://localhost:5173. The frontend expects the backend to be running at http://localhost:8000.
See aml-system/frontend/README.md for component documentation.
EventBridge (every 1 min)
│
▼
Step Functions: AML-pipeline
├── GenerateTransactions Lambda
│ │
│ ├── Reads latest customers/accounts from S3
│ ├── Builds normal + suspicious transactions
│ ├── Writes CSV to s3://.../transactions/dt=.../
│ └── Publishes records to Kinesis stream
│
└── GenerateAlerts Lambda (batch path, parallel to the real-time path)
└── Reads transactions, writes alerts/batch/
Kinesis Stream
│
▼
AML Consumer Lambda (real-time path)
├── Evaluates 14 AML rules per transaction
├── Writes JSONL alerts to s3://.../alerts/realtime/dt=.../
└── (optional) Updates DynamoDB state for stateful rules
S3 Data Lake (raw)
│
▼
Glue Crawler — auto-detects partitions, registers tables in aml_db
Athena Views (cleaned)
├── customers_clean Latest snapshot, typed pep_flag
├── accounts_clean Typed balances, latest snapshot
├── transactions_clean Filters invalid rows, parses timestamps
├── alerts_clean Score-validated alerts
└── alerts_enriched Pre-joined alerts × customers × transactions
FastAPI Backend
└── 6 endpoints querying the cleaned views
React Dashboard (SENTINEL)
└── Auto-refreshes every 30s
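The flow above notes that stateful rules need per-account history. The sketch below illustrates the idea with a sliding window for a velocity-style check, using the 30-minute window and the lower burst bound of 8 payments quoted for `HIGH_VELOCITY`. The `VelocityTracker` class is hypothetical and keeps state in memory, which is not guaranteed to survive between Lambda invocations; it only stands in for an external store such as the optional DynamoDB table. It also assumes timestamps arrive in order for each account.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)
MIN_BURST = 8  # lower bound of the burst size quoted for HIGH_VELOCITY


class VelocityTracker:
    """Keeps recent payment timestamps per account in memory."""

    def __init__(self) -> None:
        self._recent: dict[str, deque] = defaultdict(deque)

    def observe(self, account_id: str, ts: datetime) -> bool:
        """Record a payment; return True when the window holds MIN_BURST or more."""
        window = self._recent[account_id]
        window.append(ts)
        # Evict timestamps that have fallen out of the 30-minute window.
        while window and ts - window[0] > WINDOW:
            window.popleft()
        return len(window) >= MIN_BURST


tracker = VelocityTracker()
start = datetime(2026, 1, 15, 12, 0)
# Eight payments, one minute apart, from the same (hypothetical) account.
flags = [tracker.observe("ACC-1", start + timedelta(minutes=i)) for i in range(8)]
```

Only the eighth payment in the example trips the check. A payment arriving two hours later would find the window empty again.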
The Kinesis consumer Lambda evaluates each incoming transaction against fourteen rules. Each rule that fires generates a separate alert with a 0–100 severity score.
| Pattern | Description | Typical score |
|---|---|---|
| `HIGH_VALUE` | Single transaction ≥ $10,000 | 85 |
| `HIGH_RISK_COUNTRY` | Sender or receiver in FATF-monitored jurisdiction | 60 |
| `CROSS_BORDER_HIGH` | Cross-border transaction with elevated value | 70 |
| `STRUCTURING` | Multiple deposits just below $10K reporting threshold | 90+ |
| `STRUCTURING_WINDOW` | Time-windowed variant of structuring | 90 |
| `LAYERING` | Funds cycled through 3–7 accounts rapidly | 95+ |
| `ROUND_TRIP` | Same amount returned to origin within hours | 100 |
| `SHELL_COMPANY` | Money flowing through known shell-company accounts | 100 |
| `TRADE_BASED` | Severe over- or under-invoicing on trade payments | 75+ |
| `LARGE_RAPID` | Single large wire to a high-risk jurisdiction | 99+ |
| `HIGH_VELOCITY` | Burst of 8–15 payments from one account in 30 minutes | 73+ |
| `VELOCITY_BURST` | Short-window high-frequency variant | 75 |
| `IMPOSSIBLE_TRAVEL` | Same account transacting in geographically impossible cities | 71+ |
| `RAPID_MOVEMENT` | Rapid net outflow from a single account | 80 |
Rules and scoring logic are implemented in aml-system/ingestion/aml-kinesis-consumer_LAMBDA.py.
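To make the "one alert per firing rule" behaviour concrete, here is a minimal sketch of two stateless rules. The thresholds and scores come from the table above. The field names, the placeholder country codes, and the `evaluate` function are assumptions for illustration and are not taken from the consumer Lambda.

```python
from typing import Callable, Optional

HIGH_VALUE_THRESHOLD = 10_000       # from the HIGH_VALUE row
HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes, not a real list


def high_value(txn: dict) -> Optional[int]:
    """Score 85 when the amount reaches the threshold, else no alert."""
    return 85 if txn["amount"] >= HIGH_VALUE_THRESHOLD else None


def high_risk_country(txn: dict) -> Optional[int]:
    """Score 60 when either side of the transfer is in a listed country."""
    countries = {txn["sender_country"], txn["receiver_country"]}
    return 60 if countries & HIGH_RISK_COUNTRIES else None


RULES: dict[str, Callable[[dict], Optional[int]]] = {
    "HIGH_VALUE": high_value,
    "HIGH_RISK_COUNTRY": high_risk_country,
}


def evaluate(txn: dict) -> list[dict]:
    """Return one alert per rule that fires on this transaction."""
    alerts = []
    for name, rule in RULES.items():
        score = rule(txn)
        if score is not None:
            alerts.append(
                {"transaction_id": txn["transaction_id"], "pattern": name, "score": score}
            )
    return alerts


txn = {
    "transaction_id": "T1",
    "amount": 15000.0,
    "sender_country": "XX",
    "receiver_country": "GB",
}
alerts = evaluate(txn)
```

A registry of small rule functions keeps each pattern independently testable, and adding a rule means adding one entry to `RULES`.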
A common pitfall in data lake design is querying raw data directly. SENTINEL's Athena views enforce a cleaning layer between raw S3 and consumers:
- `pep_flag` is cast from CSV string `"True"`/`"False"` to actual boolean
- ISO 8601 timestamp strings are parsed into `TIMESTAMP` type
- Future-dated transactions are filtered out (handles a generator-side artifact)
- Self-transfers (sender = receiver) and zero/negative amounts are excluded
- Customer snapshots are deduplicated to the latest day's data
- Orphan alerts (referencing customers no longer present) are flagged and reportable
These views are defined in docs/sql/views.sql and cited in the project report as the "data cleaning layer."
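The same cleaning rules can be expressed in Python, which may help when unit-testing expectations against the views. This is only a sketch of the logic listed above, not a translation of `views.sql`. The column names (`timestamp`, `amount`, `sender`, `receiver`) are assumed, and timestamps are assumed to carry a UTC offset.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_pep_flag(value: str) -> bool:
    """Cast the CSV strings "True"/"False" to a real boolean."""
    return value.strip().lower() == "true"


def clean_transaction(row: dict, now: datetime) -> Optional[dict]:
    """Return a typed copy of a raw CSV row, or None if the row is filtered out."""
    ts = datetime.fromisoformat(row["timestamp"])
    amount = float(row["amount"])
    if ts > now:                          # future-dated transaction
        return None
    if row["sender"] == row["receiver"]:  # self-transfer
        return None
    if amount <= 0:                       # zero or negative amount
        return None
    return {**row, "timestamp": ts, "amount": amount}


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
good = {
    "timestamp": "2026-01-15T09:30:00+00:00",
    "amount": "250.00",
    "sender": "A",
    "receiver": "B",
}
cleaned = clean_transaction(good, now)
future = clean_transaction({**good, "timestamp": "2026-02-01T00:00:00+00:00"}, now)
```

Here `cleaned` keeps the row with a parsed timestamp and a float amount, while `future` is `None` because the row is dated after `now`.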
The project is intentionally scoped to the data engineering pipeline; several adjacent capabilities are outlined but not implemented:
- Network Graph, Case Manager, Watchlists, SAR Reports, Audit Log, Rule Config views are present in the navigation as planned future deliverables, with the same FastAPI + Athena pattern reusable for each
- Mobile-responsive navigation is a known limitation; the analyst console is desktop-first by design, and mobile interaction polish is still outstanding
- ML-based scoring is reserved for a future iteration in `aml-system/ml/`; the current detection layer is rule-based
- KPI sparklines display flat indicators; a time-series endpoint would populate them
- Dead-letter handling on the Kinesis consumer is not configured; production deployment would add a DLQ
Moavia Mahmood
BSc Software Engineering, OSTIM Technical University
Final Year Project, 2026
This project is provided for academic review. See LICENSE for details.