Financial Fraud Detection with Machine Learning

What the project does

This project builds an end-to-end, research-oriented fraud detection pipeline for financial transactions. It covers data ingestion and cleaning, feature engineering, supervised model training (including TabNet, XGBoost, CatBoost, and stacking), and causal analyses to explore treatment effects and drivers of fraud.

Business context and objectives

Financial fraud creates significant direct losses and downstream costs (chargebacks, investigations, and reputational risk). Traditional rules struggle to adapt to evolving fraud patterns, so this project focuses on data‑driven detection that can generalize over time.

Key objectives:

Improve fraud detection while limiting false positives that disrupt legitimate users.
Automate triage by surfacing high‑risk transactions for review.
Increase security and trust through interpretable and auditable model outputs.

Expected outcomes:

Reduced fraud losses via earlier identification of suspicious activity.
Operational efficiency by lowering manual review load.
Stronger customer trust through consistent, explainable decisions.

Feature engineering overview

The feature engineering notebook focuses on creating high‑signal tabular features before modeling. Key themes include:

Time-based features: transaction hour/day patterns and temporal aggregates.
- Examples: hour-of-day, day-of-week, weekend/holiday flags, and rolling window stats.
Customer and card behavior: rolling spend statistics, velocity features, and consistency checks.
- Velocity refers to transaction frequency, captured by daily_transaction_count and weekly_transaction_count and by short gaps in time_since_last_txn (see temporal_features_client).
- Burst behavior refers to clusters of transactions in short time windows, reflected by low time_since_last_txn plus elevated daily/weekly counts.
- Volatility is measured via amount_change_rate and amount_change, with extreme shifts flagged by large_amount_change and large_txn_time_diff_change (see calculate_event_features).
Merchant/MCC enrichment: category-level behavior and outlier detection.
- Examples: per‑MCC spend baselines and merchant‑level rarity signals.
Geospatial features: distance between transaction locations to flag abnormal travel patterns.
- Not completed due to the large volume of geocoding API calls required, but the notebook includes the full workflow and rationale for this feature set.
Anomaly signals: isolation‑based scores and rare-pattern indicators.
- Examples: isolation forest scores and frequency‑based rarity flags assessed on an individual level and in combination with other features.
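
The velocity and burst signals described in the list above can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's code: temporal_features_client presumably operates on a pandas DataFrame, and the function name `velocity_features`, the record keys `client_id` and `date`, and the choice of a running same-day count are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def velocity_features(transactions):
    """Add time_since_last_txn (seconds) and daily_transaction_count per client.

    `transactions` is a list of dicts with hypothetical keys "client_id"
    and "date" (a datetime). Returns new dicts sorted by time.
    """
    last_seen = {}                    # client_id -> datetime of previous txn
    daily_counts = defaultdict(int)   # (client_id, calendar day) -> running count
    enriched = []
    for txn in sorted(transactions, key=lambda t: t["date"]):
        client, when = txn["client_id"], txn["date"]
        previous = last_seen.get(client)
        gap = (when - previous).total_seconds() if previous is not None else None
        daily_counts[(client, when.date())] += 1
        enriched.append({
            **txn,
            "time_since_last_txn": gap,
            # Running count: transactions so far that day, including this one.
            "daily_transaction_count": daily_counts[(client, when.date())],
        })
        last_seen[client] = when
    return enriched


sample = [
    {"client_id": 1, "date": datetime(2024, 1, 1, 9, 0)},
    {"client_id": 1, "date": datetime(2024, 1, 1, 9, 2)},
    {"client_id": 1, "date": datetime(2024, 1, 2, 8, 0)},
    {"client_id": 2, "date": datetime(2024, 1, 1, 9, 1)},
]
features = velocity_features(sample)
```

A running count uses only past and current transactions, which avoids leaking future activity into a feature. A burst then shows up as a small time_since_last_txn together with a rising daily count.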

See Pre-processing/feature_engineering.ipynb for the full workflow and rationale.
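
For the geospatial theme, which was not completed in this project, the distance step itself is simple once coordinates are available. The sketch below is an illustration only: it assumes transaction locations have already been geocoded to latitude and longitude, and the helper names `haversine_km` and `implied_speed_kmh` are hypothetical. geopy (listed in the dependencies) offers its own distance utilities; the standard library is used here to keep the example self-contained.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def implied_speed_kmh(distance_km, hours_between):
    """Speed needed to cover the distance between two transactions.

    Returns None when the time gap is zero, since speed is undefined.
    """
    return distance_km / hours_between if hours_between > 0 else None


# Approximate city-centre coordinates for Montreal and Toronto.
distance = haversine_km(45.5019, -73.5674, 43.6532, -79.3832)
```

An implied speed far above what is physically plausible between two consecutive card-present transactions is the kind of abnormal travel pattern this feature set targets.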

Modeling approach (why these models and stacking)

This project treats fraud detection as a tabular classification problem with strong non‑linearities and class imbalance. The predictive workflow is designed to compare complementary model families and then combine their strengths in a stacking ensemble. Class imbalance is handled with sampling strategies in the modeling notebooks (e.g., SMOTE over‑sampling and random under‑sampling) to improve recall on rare fraud cases. Key modeling choices include:

XGBoost + CatBoost first: Gradient-boosted trees are strong baselines for tabular data, and both models are trained to compare performance and decide the most effective baseline to carry forward. CatBoost is robust to categorical features and reduces target leakage with ordered boosting, while XGBoost provides flexible regularization and strong performance on mixed numeric/categorical encodings.
TabNet next: TabNet uses attentive feature selection at each decision step, which can improve performance and interpretability on high‑dimensional tabular data where interactions matter.
Final stacking: The final stack combines CatBoost + TabNet predictions and trains a logistic regression meta‑learner on their probability outputs. The aim is to offset each model's individual bias and variance and to generalize better on rare fraud cases.
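
The stacking step can be pictured with a small dependency-free sketch: a logistic regression fitted by plain gradient descent on two columns of base-model probabilities. This is not the notebook's implementation, which would more naturally use a library estimator such as scikit-learn's LogisticRegression. The function names, the learning rate, the iteration count, and the six made-up rows are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, n_iter=2000):
    """Fit logistic regression on base-model probabilities by gradient descent.

    `base_probs` is a list of (p_catboost, p_tabnet) pairs, ideally
    out-of-fold predictions so the meta-learner is not trained on outputs
    the base models produced for rows they had already seen.
    Returns (weights, bias).
    """
    n_features = len(base_probs[0])
    weights, bias = [0.0] * n_features, 0.0
    n = len(labels)
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(base_probs, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Average log-loss gradient step.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stack_predict(weights, bias, probs):
    """Blend one row of base-model probabilities into a final fraud probability."""
    return sigmoid(sum(w * p for w, p in zip(weights, probs)) + bias)


# Made-up base-model outputs for six transactions (not project data).
base_probs = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit_meta_learner(base_probs, labels)
```

The learned weights indicate how much the meta-learner relies on each base model, which is one reason a linear meta-learner is a common choice for stacks that need to stay auditable.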

The notebook order reflects this design: build strong base learners, then blend them in a stacking model to maximize detection quality.
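
The class-imbalance handling mentioned at the start of this section can be illustrated with random under-sampling, the simpler of the two strategies. The sketch below is a standard-library stand-in and not the notebooks' code; in practice both SMOTE and random under-sampling are available in imbalanced-learn. The function name `random_undersample`, the `ratio` parameter, and the toy data are assumptions.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=42):
    """Keep all fraud rows (label 1) and a random subset of legitimate rows (label 0).

    `ratio` is the desired number of legitimate rows per fraud row.
    """
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(legit_idx), int(round(ratio * len(fraud_idx))))
    kept = sorted(fraud_idx + rng.sample(legit_idx, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 fraud cases among 100 transactions.
labels = [1 if i in (5, 40, 77) else 0 for i in range(100)]
rows = [{"txn_id": i} for i in range(100)]
X_bal, y_bal = random_undersample(rows, labels, ratio=2.0)
```

Resampling of this kind should be applied to the training split only, so that validation and test metrics still reflect the real fraud rate.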

End‑to‑end workflow

flowchart LR
	A[Raw data files] --> B[EDA + preprocessing]
	B --> C[Feature engineering]
	C --> D[XGBoost/CatBoost training]
	C --> E[TabNet training]
	D --> F[Select baseline + save artifacts]
	E --> F
	F --> G[Stacking meta‑learner]
	G --> H[Threshold tuning on PR curve]
	H --> I[Final fraud metrics]

Final tuning and fraud precision/recall

The stacking notebook tunes the decision threshold by maximizing F1 on the precision‑recall curve. In Predictive model/Final_Stacking_Model.ipynb, the best threshold is approximately 0.688. Summary of fraud‑class results on the test split:

Metric	Value
Threshold (best F1)	~0.688
Precision (fraud)	~0.92
Recall (fraud)	~0.59
F1 (fraud)	~0.72
Average precision	~0.728
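
The threshold search described above can be sketched without any libraries. This is an illustrative version, not the notebook's code, which may well rely on a library helper such as scikit-learn's precision_recall_curve. The function name `best_f1_threshold` and the eight made-up scores are assumptions. The last line checks that the reported precision and recall are consistent with the reported F1.

```python
def best_f1_threshold(y_true, y_prob):
    """Return (threshold, precision, recall, f1) maximizing F1 for the fraud class.

    Each distinct predicted probability is tried as a cut-off; a transaction
    is flagged as fraud when its probability is >= the cut-off.
    """
    best = (None, 0.0, 0.0, 0.0)
    for threshold in sorted(set(y_prob)):
        predicted = [p >= threshold for p in y_prob]
        tp = sum(1 for pred, y in zip(predicted, y_true) if pred and y == 1)
        fp = sum(1 for pred, y in zip(predicted, y_true) if pred and y == 0)
        fn = sum(1 for pred, y in zip(predicted, y_true) if not pred and y == 1)
        if tp == 0:
            continue  # precision and F1 are undefined or zero here
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[3]:
            best = (threshold, precision, recall, f1)
    return best


# Made-up scores for eight transactions (not project results).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
threshold, precision, recall, f1 = best_f1_threshold(y_true, y_prob)

# Consistency check on the table above: F1 = 2PR / (P + R).
reported_f1 = 2 * 0.92 * 0.59 / (0.92 + 0.59)  # about 0.72
```

Maximizing F1 weights precision and recall equally. With the reported operating point (precision about 0.92, recall about 0.59), the chosen threshold favors few false positives at the cost of missing a share of fraud cases.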

Repository structure

Pre-processing/: Data exploration, cleaning, and feature engineering notebooks.
- Pre-processing/eda.ipynb – exploratory analysis and preprocessing.
- Pre-processing/feature_engineering.ipynb – feature engineering and transformations.
Predictive model/: Model training notebooks.
Causal inference/: Causal ML analysis notebooks.
- Causal inference/CausalNL_analysis.ipynb
- Causal inference/Extension_Causal_ML.ipynb

How users can get started

Prerequisites

Python 3.x
Jupyter Notebook or JupyterLab

Data setup

This project uses the Kaggle dataset created by CaixaBank Tech for the 2024 AI Hackathon.

Download the data from Kaggle and place the following files in the repository root:
- transactions_data.csv
- cards_data.csv
- users_data.csv
- mcc_codes.json
- train_fraud_labels.json
Dataset link: https://www.kaggle.com/datasets/computingvictor/transactions-fraud-datasets/data?select=transactions_data.csv
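
Before running the notebooks, it can help to confirm that all five files are in place. The following is a small convenience sketch, not part of the repository; the helper name `missing_data_files` is hypothetical, and the file names are taken from the list above.

```python
from pathlib import Path

# File names taken from the data setup list.
REQUIRED_FILES = [
    "transactions_data.csv",
    "cards_data.csv",
    "users_data.csv",
    "mcc_codes.json",
    "train_fraud_labels.json",
]


def missing_data_files(root="."):
    """Return the required data files that are not present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_data_files()` from the repository root returns an empty list once the download is complete.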

Install dependencies

The notebooks and scripts use common data science libraries. Install the core set below, and add model-specific libraries as needed:

Core: pandas, numpy, scikit-learn, matplotlib, seaborn, joblib
Modeling: xgboost, catboost, pytorch-tabnet, torch, optuna
Imbalanced learning: imbalanced-learn
Explainability: shap
Causal inference: econml, causalml, dowhy
Feature engineering extras: geopy, requests, swifter, mlxtend, gdown
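
Since no pinned requirements file is mentioned here, one way to see which of the listed packages still need installing is to probe for their import names. This is a hedged convenience sketch; the helper name `missing_packages` is hypothetical, and only the three pip names that differ from their import names are mapped.

```python
from importlib.util import find_spec

# pip package name -> import name, where the two differ.
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "pytorch-tabnet": "pytorch_tabnet",
    "imbalanced-learn": "imblearn",
}


def missing_packages(packages):
    """Return the pip package names whose top-level module cannot be found."""
    return [pkg for pkg in packages if find_spec(IMPORT_NAMES.get(pkg, pkg)) is None]


CORE = ["pandas", "numpy", "scikit-learn", "matplotlib", "seaborn", "joblib"]
```

For example, `missing_packages(CORE)` lists the core packages that are not yet installed in the active environment.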

Usage examples

1) Run preprocessing and EDA

Open Pre-processing/eda.ipynb and run the notebook end-to-end.

2) Train or review models in notebooks

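Run the predictive modeling notebooks in the order described under the modeling approach: the XGBoost and CatBoost baselines first, then TabNet, then Predictive model/Final_Stacking_Model.ipynb.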

Causal notebooks can be run independently under Causal inference/.

3) Load a trained model artifact

An example artifact is available at Predictive model/catboost_precision.joblib:

import joblib
model = joblib.load("Predictive model/catboost_precision.joblib")
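
Once loaded, the model can be used to flag transactions. The sketch below assumes the artifact is a classifier exposing `predict_proba` in the scikit-learn style, which CatBoost classifiers do; the helper `flag_fraud` and the `StubModel` stand-in are hypothetical and exist only so the example runs without the file or the project's feature set.

```python
def flag_fraud(model, features, threshold=0.5):
    """Return True for each row whose fraud probability meets the threshold.

    Assumes `model.predict_proba` returns one [p_legit, p_fraud] pair per row.
    """
    return [row[1] >= threshold for row in model.predict_proba(features)]


class StubModel:
    """Stand-in for the loaded artifact so the sketch runs without the file."""

    def predict_proba(self, features):
        return [[1 - f["risk"], f["risk"]] for f in features]


flags = flag_fraud(StubModel(), [{"risk": 0.2}, {"risk": 0.9}], threshold=0.688)
```

Note that the 0.688 threshold was tuned for the stacked model, so a standalone CatBoost artifact may need its own threshold. The input must also contain the same engineered features, in the same order, that the model was trained on.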

Where users can get help

Review the notebooks for workflow details and experiments.
If you are using GitHub, open an issue in the repository for questions or bugs.

Who maintains and contributes

Maintained by contributors in the McGill-MMA-EnterpriseAnalytics organization.
