This project builds an end-to-end, research-oriented fraud detection pipeline for financial transactions. It covers data ingestion and cleaning, feature engineering, supervised model training (including TabNet, XGBoost, CatBoost, and stacking), and causal analyses to explore treatment effects and drivers of fraud.
Financial fraud creates significant direct losses and downstream costs (chargebacks, investigations, and reputational risk). Static rule-based systems struggle to adapt to evolving fraud patterns, so this project focuses on data‑driven detection that can generalize over time.
Key objectives:
- Improve fraud detection while limiting false positives that disrupt legitimate users.
- Automate triage by surfacing high‑risk transactions for review.
- Increase security and trust through interpretable and auditable model outputs.
Expected outcomes:
- Reduced fraud losses via earlier identification of suspicious activity.
- Operational efficiency by lowering manual review load.
- Stronger customer trust through consistent, explainable decisions.
The feature engineering notebook focuses on creating high‑signal tabular features before modeling. Key themes include:
- Time-based features: transaction hour/day patterns and temporal aggregates.
  - Examples: hour-of-day, day-of-week, weekend/holiday flags, and rolling window stats.
- Customer and card behavior: rolling spend statistics, velocity features, and consistency checks.
  - Velocity refers to the frequency of transactions, captured by daily_transaction_count and weekly_transaction_count and by short gaps in time_since_last_txn (see temporal_features_client).
  - Burst behavior refers to clusters of transactions in short time windows, reflected by low time_since_last_txn plus elevated daily/weekly counts.
  - Volatility is measured via amount_change_rate and amount_change, with extreme shifts flagged by large_amount_change and large_txn_time_diff_change (see calculate_event_features).
- Merchant/MCC enrichment: category-level behavior and outlier detection.
  - Examples: per‑MCC spend baselines and merchant‑level rarity signals.
- Geospatial features: distance between transaction locations to flag abnormal travel patterns.
  - Not completed due to the large volume of geocoding API calls required, but the notebook includes the full workflow and rationale for this feature set.
- Anomaly signals: isolation‑based scores and rare-pattern indicators.
  - Examples: isolation forest scores and frequency‑based rarity flags, assessed individually and in combination with other features.
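The velocity and volatility signals listed above can be sketched with pandas. This is a minimal illustration, not the notebook's temporal_features_client or calculate_event_features code: the input column names (client_id, date, amount) and the function name are assumptions, and only the output feature names come from the list above.

```python
import pandas as pd


def add_velocity_features(txns: pd.DataFrame) -> pd.DataFrame:
    """Add per-client velocity and volatility columns.

    Assumes hypothetical input columns: client_id, date (datetime64), amount.
    """
    out = txns.sort_values(["client_id", "date"]).copy()

    # Seconds since the same client's previous transaction (NaN for the first one).
    out["time_since_last_txn"] = (
        out.groupby("client_id")["date"].diff().dt.total_seconds()
    )

    # Number of transactions (with a non-null amount) per client per calendar day.
    day = out["date"].dt.normalize()
    out["daily_transaction_count"] = (
        out.groupby(["client_id", day])["amount"].transform("count")
    )

    # Absolute and relative change versus the client's previous amount.
    previous_amount = out.groupby("client_id")["amount"].shift()
    out["amount_change"] = out["amount"] - previous_amount
    out["amount_change_rate"] = out["amount_change"] / previous_amount
    return out


# Tiny made-up example: client 1 has two transactions five minutes apart.
demo = pd.DataFrame(
    {
        "client_id": [1, 1, 1, 2],
        "date": pd.to_datetime(
            ["2024-01-01 10:00", "2024-01-01 10:05", "2024-01-02 09:00", "2024-01-01 12:00"]
        ),
        "amount": [20.0, 30.0, 15.0, 100.0],
    }
)
features = add_velocity_features(demo)
```

A burst, as described above, would then show up as a small time_since_last_txn together with a high daily_transaction_count; any cutoff used to flag it is a modeling choice that this sketch leaves open.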
See Pre-processing/feature_engineering.ipynb for the full workflow and rationale.
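Although the geospatial step was not completed, the distance calculation it depends on is simple once coordinates are available. The sketch below uses the haversine great-circle formula as one common choice. The source does not state which distance measure the notebook uses, and the function names and the 6371 km mean Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation, assumed here)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implied_speed_kmh(distance_km, seconds_between):
    """Speed a cardholder would need to cover the distance between two transactions."""
    if seconds_between <= 0:
        return math.inf
    return distance_km / (seconds_between / 3600.0)


# One degree of latitude along a meridian is roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
```

An implied speed far above what is physically plausible between two consecutive card-present transactions is the kind of abnormal travel pattern this feature set is meant to flag.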
This project treats fraud detection as a tabular classification problem with strong non‑linearities and class imbalance. The predictive workflow is designed to compare complementary model families and then combine their strengths in a stacking ensemble. Class imbalance is handled with sampling strategies in the modeling notebooks (e.g., SMOTE over‑sampling and random under‑sampling) to improve recall on rare fraud cases. Key modeling choices include:
- XGBoost + CatBoost first: Gradient-boosted trees are strong baselines for tabular data, so both models are trained and compared to decide which baseline to carry forward. CatBoost handles categorical features natively and uses ordered boosting to reduce target leakage, while XGBoost offers flexible regularization and strong performance on mixed numeric/categorical encodings.
- TabNet next: TabNet uses attentive feature selection at each decision step, which can improve performance and interpretability on high‑dimensional tabular data where interactions matter.
- Final stacking: The final stack combines CatBoost and TabNet predictions and trains a logistic regression meta‑learner on their probability outputs. The goal is to offset the individual models' bias and variance and to improve generalization on rare fraud cases.
The notebook order reflects this design: build strong base learners, then blend them in a stacking model to maximize detection quality.
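The stacking idea described above can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not the code in Final_Stacking_Model.ipynb: GradientBoostingClassifier and RandomForestClassifier stand in for CatBoost and TabNet, the data is synthetic, and the use of out-of-fold probabilities to train the meta-learner is an assumed (though common) way to avoid leaking training labels into the second stage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic, imbalanced stand-in data (roughly 5% positives); not the project's dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.95], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-ins for the CatBoost and TabNet base learners.
base_models = [
    GradientBoostingClassifier(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

# Out-of-fold probabilities: each training row is scored by a model that never saw it.
oof_probs = np.column_stack(
    [
        cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
        for model in base_models
    ]
)

# Logistic regression meta-learner trained on the base models' probability outputs.
meta_learner = LogisticRegression().fit(oof_probs, y_train)

# Refit each base model on the full training split, then stack its test probabilities.
test_probs = np.column_stack(
    [model.fit(X_train, y_train).predict_proba(X_test)[:, 1] for model in base_models]
)
stacked_scores = meta_learner.predict_proba(test_probs)[:, 1]
```

The resulting stacked_scores are fraud probabilities in [0, 1]; turning them into decisions still requires a threshold.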
flowchart LR
A[Raw data files] --> B[EDA + preprocessing]
B --> C[Feature engineering]
C --> D[XGBoost/CatBoost training]
C --> E[TabNet training]
D --> F[Select baseline + save artifacts]
E --> F
F --> G[Stacking meta‑learner]
G --> H[Threshold tuning on PR curve]
H --> I[Final fraud metrics]
The stacking notebook tunes the decision threshold by maximizing F1 on the precision‑recall curve. In Predictive model/Final_Stacking_Model.ipynb, the best threshold is approximately 0.688. Summary of fraud‑class results on the test split:
| Metric | Value |
|---|---|
| Threshold (best F1) | ~0.688 |
| Precision (fraud) | ~0.92 |
| Recall (fraud) | ~0.59 |
| F1 (fraud) | ~0.72 |
| Average precision | ~0.728 |
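Choosing the threshold that maximizes F1 along the precision‑recall curve can be sketched as follows. This is a minimal version built on scikit-learn's precision_recall_curve; the helper name best_f1_threshold and the toy labels and scores are assumptions for illustration and are unrelated to the results in the table.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) for the point on the PR curve with the highest F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The last (precision=1, recall=0) point has no matching threshold, so drop it.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    # F1 = 2PR / (P + R), defined as 0 where P + R == 0.
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Toy example with made-up labels and scores.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
threshold, f1 = best_f1_threshold(toy_labels, toy_scores)
```

As a consistency check on the table above, F1 = 2PR / (P + R) = 2(0.92)(0.59) / (0.92 + 0.59) ≈ 0.72, which matches the reported value.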
- Pre-processing/: Data exploration, cleaning, and feature engineering notebooks.
  - Pre-processing/eda.ipynb – exploratory analysis and preprocessing.
  - Pre-processing/feature_engineering.ipynb – feature engineering and transformations.
- Predictive model/: Model training notebooks.
- Causal inference/: Causal ML analysis notebooks.
- Python 3.x
- Jupyter Notebook or JupyterLab
This project uses the Kaggle dataset created by Caixabank Tech for the 2024 AI Hackathon.
Download the data from Kaggle and place the following files in the repository root:
- transactions_data.csv
- cards_data.csv
- users_data.csv
- mcc_codes.json
- train_fraud_labels.json
Dataset link: https://www.kaggle.com/datasets/computingvictor/transactions-fraud-datasets/data?select=transactions_data.csv
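Before running the notebooks, it can help to confirm that the five data files are in place. The helper below is a hypothetical convenience, not part of the repository; it only checks for the file names listed in this section and does not read or validate their contents.

```python
from pathlib import Path

# File names as listed in this README.
REQUIRED_FILES = [
    "transactions_data.csv",
    "cards_data.csv",
    "users_data.csv",
    "mcc_codes.json",
    "train_fraud_labels.json",
]


def missing_data_files(root="."):
    """Return the required file names that are not present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    print("All data files found." if not missing else f"Missing: {missing}")
```

Run it from the repository root; an empty result means every expected file was found.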
The notebooks and scripts use common data science libraries. Install the core set below, and add model-specific libraries as needed:
- Core: pandas, numpy, scikit-learn, matplotlib, seaborn, joblib
- Modeling: xgboost, catboost, pytorch-tabnet, torch, optuna
- Imbalanced learning: imbalanced-learn
- Explainability: shap
- Causal inference: econml, causalml, dowhy
- Feature engineering extras: geopy, requests, swifter, mlxtend, gdown
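Some of these packages install under one name and import under another (for example scikit-learn imports as sklearn). The sketch below is a hypothetical helper, not part of the repository, that reports which packages from a chosen subset are not importable in the current environment. The mapping covers only the core, modeling, and imbalanced-learning groups and can be extended as needed.

```python
from importlib.util import find_spec

# pip distribution name -> importable module name.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "joblib": "joblib",
    "xgboost": "xgboost",
    "catboost": "catboost",
    "pytorch-tabnet": "pytorch_tabnet",
    "torch": "torch",
    "optuna": "optuna",
    "imbalanced-learn": "imblearn",
}


def missing_packages(packages):
    """Return the pip names whose top-level module cannot be found."""
    return sorted(
        name for name, module in packages.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    print("Nothing missing." if not missing else "Install: " + " ".join(missing))
```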
Open Pre-processing/eda.ipynb and run the notebook end-to-end.
Run the predictive modeling notebooks in this order:
- Predictive model/xgb_catboost.ipynb
- Predictive model/model_tabnet.ipynb
- Predictive model/Final_Stacking_Model.ipynb
Causal notebooks can be run independently under Causal inference/.
An example artifact is available at Predictive model/catboost_precision.joblib:
import joblib
model = joblib.load("Predictive model/catboost_precision.joblib")
- Review the notebooks for workflow details and experiments.
- If you are using GitHub, open an issue in the repository for questions or bugs.
Maintained by contributors in the McGill-MMA-EnterpriseAnalytics organization.