A production-style, end-to-end machine learning pipeline for detecting fraudulent credit card transactions, featuring a 4-model benchmark, SMOTE class balancing, SHAP per-prediction explainability, DistilGPT-2 plain-English summaries, and an interactive Streamlit dashboard.
- Project Overview
- Key Features
- 8-Stage Pipeline
- Project Structure
- Dataset
- Feature Engineering
- Models Trained
- Evaluation Metrics
- SHAP Explainability
- Hugging Face Plain-English Explanations
- Streamlit Dashboard
- Generated Outputs
- Configuration
- Installation
- Running the Project
- Running the Dashboard
- Requirements
- Roadmap
- License
Credit card fraud costs the global financial system billions of dollars annually. Fraudulent transactions are rare (typically less than 0.2% of all activity), which makes detection extremely difficult with standard machine learning approaches. A naive model that simply predicts "legitimate" for every transaction would achieve over 99% accuracy while catching zero fraud cases.
This project builds a complete, production-style AI pipeline that:
- Ingests raw transaction data from the Kaggle Credit Card Fraud dataset
- Engineers meaningful features from raw inputs
- Handles the severe class imbalance using SMOTE (Synthetic Minority Oversampling Technique)
- Trains and benchmarks four machine learning models side by side
- Evaluates them using metrics designed for imbalanced classification
- Explains every prediction using SHAP (SHapley Additive exPlanations)
- Converts technical SHAP output into plain English using a Hugging Face DistilGPT-2 language model
- Presents everything through an interactive Streamlit dashboard with three input modes
- Auto-generates a full Markdown training report after every run
| Feature | Description |
|---|---|
| Multi-model training | Logistic Regression · Random Forest · XGBoost · LightGBM |
| Class imbalance handling | SMOTE resampling before training (configurable ratio) |
| Rigorous evaluation | ROC-AUC · PR-AUC · Precision · Recall · F1 |
| SHAP explainability | Per-prediction feature contribution breakdown |
| Human-readable explanations | DistilGPT-2 narrates the SHAP output in plain English |
| Interactive UI | Streamlit dashboard with Manual · Random · CSV input modes |
| Reproducible pipeline | Seeded RNGs · saved scaler · saved feature name order |
| Centralized config | All paths and hyperparameters in one config.py |
| Automated reporting | Markdown report + metrics CSV generated after every run |
Run the entire pipeline with a single command: python main.py
Raw CSV
│
▼
Stage 1: Load Data
Reads data/raw/transactions.csv · validates shape · logs fraud rate
│
▼
Stage 2: EDA
Class balance chart · amount histogram · time histogram · correlation heatmap
→ All charts saved to reports/figures/
│
▼
Stage 3: Preprocessing
Feature engineering (Amount_Log, Hour, Is_Large_Amount)
→ train/test split (80/20, stratified) → StandardScaler
→ saves scaler.pkl + feature_names.json to data/processed/
│
▼
Stage 4: SMOTE Resampling
Oversamples the fraud class to SMOTE_RATIO (default 0.2) of the majority class
→ prevents the dominant legitimate class from biasing all models
│
▼
Stage 5: Model Training
Trains all 4 models on SMOTE-resampled training data
→ saves each model to models/
│
▼
Stage 6: Evaluation
Scores all models: ROC-AUC · PR-AUC · Precision · Recall · F1
→ ranks by ROC-AUC · saves best model to models/fraud_model.pkl
→ saves confusion matrix, ROC, PR, feature importance charts per model
│
▼
Stage 7: Explainability
SHAP values computed for best model
→ beeswarm summary plot saved → DistilGPT-2 plain-English narration
│
▼
Stage 8: Report
Writes reports/report.md with full results table
Writes reports/metrics.csv for downstream analysis
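Stage 3 saves feature_names.json so that every later step (evaluation, the dashboard, single-transaction prediction) can feed columns to the model in the exact training order. The following is a minimal standard-library sketch of that idea. It assumes the JSON file holds a plain list of column names, and `save_feature_names` and `to_model_row` are hypothetical helper names, not functions from src/.

```python
import json
import tempfile
from pathlib import Path


def save_feature_names(columns: list[str], path: Path) -> None:
    """Persist the training column order as a JSON list (hypothetical helper)."""
    path.write_text(json.dumps(columns))


def to_model_row(transaction: dict[str, float], path: Path) -> list[float]:
    """Return the transaction's values in the saved training order.

    Raises KeyError listing every feature the model expects but the input lacks.
    """
    columns = json.loads(path.read_text())
    missing = [c for c in columns if c not in transaction]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Extra keys in the input are ignored; the order comes only from the JSON file.
    return [float(transaction[c]) for c in columns]


with tempfile.TemporaryDirectory() as tmp:
    names_path = Path(tmp) / "feature_names.json"
    save_feature_names(["V14", "Amount_Log", "Hour"], names_path)
    row = to_model_row({"Hour": 3, "V14": -2.1, "Amount_Log": 5.3, "note": 0}, names_path)
    print(row)  # [-2.1, 5.3, 3.0]
```

Reordering by the saved list, rather than by dictionary or uploaded-CSV column order, matters because a fitted StandardScaler stores one mean and one scale per column position; when rows are passed as plain arrays, a shuffled row would silently be scaled with the wrong statistics.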
fraud-detection-ai/
│
├── app/
│   └── streamlit_app.py            # Interactive Streamlit dashboard
│
├── data/
│   ├── raw/
│   │   └── transactions.csv        # ← Place Kaggle dataset here
│   ├── processed/
│   │   ├── X_train.csv             # Scaled training features
│   │   ├── X_test.csv              # Scaled test features
│   │   ├── y_train.csv             # Training labels
│   │   ├── y_test.csv              # Test labels
│   │   ├── scaler.pkl              # Fitted StandardScaler
│   │   └── feature_names.json      # Ordered feature column names
│   └── external/
│       └── huggingface_cache/      # Cached HF model weights
│
├── models/
│   ├── fraud_model.pkl             # Best model (selected by ROC-AUC)
│   ├── random_forest.pkl
│   ├── xgboost_model.pkl
│   └── lightgbm_model.pkl
│
├── notebooks/                      # Jupyter notebooks for exploration
│
├── reports/
│   ├── figures/                    # All generated charts
│   ├── report.md                   # Auto-generated training report
│   └── metrics.csv                 # Model comparison table
│
├── src/
│   ├── data/
│   │   ├── load_data.py            # Loads raw CSV from disk
│   │   ├── preprocess.py           # Cleaning, splitting, and scaling
│   │   └── feature_engineering.py  # Derives smart features from raw columns
│   │
│   ├── models/
│   │   ├── train_model.py          # Trains all candidate models
│   │   ├── evaluate_model.py       # Scores and ranks all models
│   │   ├── predict.py              # Single-transaction prediction pipeline
│   │   └── huggingface_model.py    # Hugging Face plain-English explanation
│   │
│   ├── explainability/
│   │   └── shap_explainer.py       # SHAP values and summary plots
│   │
│   ├── visualization/
│   │   ├── eda.py                  # Exploratory data analysis charts
│   │   └── plots.py                # Confusion matrix, ROC, PR, importance
│   │
│   └── utils/
│       ├── config.py               # All paths and hyperparameters
│       └── helpers.py              # Shared utility functions
│
├── main.py                         # Pipeline entry point
├── requirements.txt                # Python dependencies
├── project_structure.md            # Extended structure documentation
├── workflow.md                     # Pipeline workflow documentation
└── README.md
This project uses the Credit Card Fraud Detection dataset published by the Machine Learning Group at Université Libre de Bruxelles (ULB).
Download: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
After downloading, rename creditcard.csv to transactions.csv and place it at data/raw/transactions.csv.
| Property | Value |
|---|---|
| Total Transactions | 284,807 |
| Fraudulent Transactions | 492 (0.17%) |
| Legitimate Transactions | 284,315 (99.83%) |
| Raw Features | 30 (Time · V1–V28 PCA · Amount) |
| Target Column | Class (0 = legitimate Β· 1 = fraud) |
Note: The V1–V28 columns are PCA-transformed by the dataset authors to protect cardholder privacy. Original feature names are not available.
Three smart features are derived from the raw dataset in src/data/feature_engineering.py:
| Feature | Source | Description |
|---|---|---|
| Amount_Log | Amount | Log-transformed transaction amount; handles the wide, right-skewed distribution |
| Hour | Time | Hour of day (0–23) derived from the Time column, which counts seconds elapsed since the first transaction in the dataset |
| Is_Large_Amount | Amount | Binary flag: 1 if Amount > $200 (configurable via LARGE_AMOUNT_THRESHOLD) |
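The three derivations can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the code in feature_engineering.py: it assumes `log1p` (so a zero Amount stays finite), an hour computed as `(Time // 3600) % 24`, and a strict `>` comparison for the large-amount flag. `engineer_features` is a hypothetical name.

```python
import numpy as np

LARGE_AMOUNT_THRESHOLD = 200.0  # mirrors the documented default in config.py


def engineer_features(amount: np.ndarray, time_s: np.ndarray,
                      threshold: float = LARGE_AMOUNT_THRESHOLD) -> dict[str, np.ndarray]:
    """Derive the three engineered columns from raw Amount and Time arrays."""
    amount = np.asarray(amount, dtype=float)
    time_s = np.asarray(time_s, dtype=float)
    return {
        # log1p(x) = log(1 + x) stays finite when Amount is 0
        "Amount_Log": np.log1p(amount),
        # Time counts seconds since the first transaction, so this is an hour-of-day proxy
        "Hour": ((time_s // 3600) % 24).astype(int),
        # 1 when the amount strictly exceeds the threshold, else 0
        "Is_Large_Amount": (amount > threshold).astype(int),
    }


feats = engineer_features(np.array([0.0, 150.0, 250.0]), np.array([0, 90_000, 7_199]))
print(feats["Hour"].tolist(), feats["Is_Large_Amount"].tolist())  # [0, 1, 1] [0, 0, 1]
```

Because the dataset's Time origin is the first recorded transaction rather than midnight, Hour is best read as a relative time-of-day signal, not a wall-clock hour.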
All models are trained on SMOTE-resampled data where the fraud class is resampled to 20% of the majority class size (SMOTE_RATIO = 0.2).
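requirements.txt lists imbalanced-learn, so the pipeline presumably uses its `SMOTE` class with `sampling_strategy` set to SMOTE_RATIO; the exact call site is not shown in this README. To make the 0.2 ratio concrete, here is a simplified from-scratch NumPy sketch of what SMOTE does. `smote_sketch` is a hypothetical helper for illustration only, not a replacement for imbalanced-learn.

```python
import numpy as np


def smote_sketch(X_min: np.ndarray, n_majority: int, ratio: float = 0.2,
                 k: int = 5, seed: int = 42) -> np.ndarray:
    """Return synthetic minority rows so that minority count ≈ ratio * majority count.

    Simplified SMOTE: pick a random minority row, pick one of its k nearest
    minority neighbours, and create a new point on the segment between them.
    """
    rng = np.random.default_rng(seed)
    target = int(ratio * n_majority)
    n_new = max(target - len(X_min), 0)
    # Pairwise Euclidean distances among minority rows only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a row is never its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


X_fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_fraud, n_majority=100, ratio=0.2, k=2)
print(synthetic.shape)  # (16, 2): 4 real + 16 synthetic = 20 = 0.2 * 100
```

Resampling only the training split (Stage 4 runs after the Stage 3 split) keeps synthetic points out of the test set, so evaluation metrics are computed on real transactions only.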
| Model | Key Settings | Class Imbalance Strategy |
|---|---|---|
| Logistic Regression | max_iter=2000 | SMOTE pre-training + class_weight=balanced |
| Random Forest | n_estimators=200 | SMOTE pre-training + class_weight=balanced_subsample |
| XGBoost | n_estimators=200 · learning_rate=0.05 · max_depth=5 | SMOTE pre-training |
| LightGBM | n_estimators=200 · learning_rate=0.05 | SMOTE pre-training |
Standard accuracy is deliberately excluded from model selection β a model predicting "legitimate" for every transaction would achieve 99.83% accuracy while catching zero fraud.
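The 99.83% figure follows directly from the dataset counts. A quick worked check, treating "legitimate for everything" as a classifier with no positive predictions:

```python
legit, fraud = 284_315, 492
total = legit + fraud

# All-"legitimate" baseline: every legitimate row is a true negative,
# every fraud row is a false negative, and nothing is ever flagged.
tn, fn, tp, fp = legit, fraud, 0, 0
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.4f} recall={recall:.1f}")  # accuracy=0.9983 recall=0.0
```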
| Metric | Purpose | Used For |
|---|---|---|
| ROC-AUC | Ranking quality across all thresholds | ✅ Primary model selection |
| PR-AUC | Precision-recall trade-off; more informative than ROC-AUC when fraud is this rare | ✅ Model selection |
| Precision | Of all flagged transactions, how many were actual fraud | ✅ Business impact |
| Recall | Of all actual fraud cases, how many were caught | ✅ Business impact |
| F1 Score | Harmonic mean of precision and recall | ✅ Balanced comparison |
| Accuracy | Reference only | ❌ Not used for selection |
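The metric definitions above can be made concrete with a small pure-Python sketch on made-up toy scores. In practice scikit-learn's `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`, and `average_precision_score` are the usual tools, and evaluate_model.py likely relies on them, but that is an assumption. PR-AUC is shown here as average precision, which is one common way to summarise the precision-recall curve; the project may compute it differently.

```python
def precision_recall_f1(y_true, y_prob, threshold=0.5):
    """Threshold the scores, then compute precision, recall, and F1."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, y_prob):
    """Probability that a random fraud row outscores a random legit row (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Mean of the precision values at each fraud row's rank (assumes no tied scores)."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.10, 0.40, 0.45, 0.20, 0.80, 0.05, 0.60, 0.90]
p, r, f = precision_recall_f1(y_true, y_prob)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f} "
      f"roc_auc={roc_auc(y_true, y_prob):.3f} pr_auc={average_precision(y_true, y_prob):.3f}")
# precision=0.667 recall=0.667 f1=0.667 roc_auc=0.933 pr_auc=0.917
```

ROC-AUC and PR-AUC use only the ranking of scores, while precision, recall, and F1 depend on the THRESHOLD cutoff; that is why the ranking metrics drive model selection and the thresholded ones describe business impact.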
Fraud detection systems must be explainable β regulators and end users need to understand why a specific transaction was flagged.
SHAP assigns each feature a contribution score for every individual prediction:
- Positive score → pushes the model toward predicting fraud
- Negative score → pushes the model toward predicting legitimate
Example SHAP output:
V14 (-2.341) → pushes toward safe 🟢
Amount_Log (+1.823) → pushes toward fraud 🔴
Hour (+0.912) → pushes toward fraud 🔴
V17 (-0.748) → pushes toward safe 🟢
V4 (+0.631) → pushes toward fraud 🔴
Generated SHAP outputs:
- Per-prediction top feature contributions (shown in the Streamlit dashboard)
- Global beeswarm summary plot saved to reports/figures/shap_summary.png
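The per-prediction list in the example output is a ranking by absolute SHAP value with a direction label. A minimal sketch of that formatting step, assuming `shap_values` is a 1-D sequence of contributions for one transaction, already aligned with feature_names.json (how the project extracts that row from the shap library's output is not shown here). `top_contributions` is a hypothetical helper name, and the values reuse the illustrative numbers above.

```python
def top_contributions(shap_values, feature_names, k=5):
    """Return the k largest contributions by absolute value, each with a direction label."""
    pairs = sorted(zip(feature_names, shap_values), key=lambda fv: abs(fv[1]), reverse=True)
    out = []
    for name, value in pairs[:k]:
        if value > 0:
            direction = "pushes toward fraud"
        elif value < 0:
            direction = "pushes toward safe"
        else:
            direction = "no effect"
        out.append((name, value, direction))
    return out


names = ["V4", "Hour", "V14", "Amount_Log", "V17", "V10"]
values = [0.631, 0.912, -2.341, 1.823, -0.748, 0.02]
for name, value, direction in top_contributions(values, names):
    print(f"{name} ({value:+.3f}) → {direction}")
```

Ranking by absolute value keeps strong "safe" signals such as V14 at the top of the list instead of hiding them below weaker fraud signals.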
The SHAP feature summary is passed as a structured prompt to DistilGPT-2, which generates a short, beginner-friendly explanation of the model's decision.
Example output:
"This transaction looks suspicious because the amount is unusually large for this time of day, and several anonymised signals are elevated. The model estimated a fraud risk of 0.91."
If the Hugging Face model is unavailable (no internet or download fails), a clean, readable static fallback explanation is returned automatically.
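The prompt-then-fallback flow described above can be sketched as follows. The project's actual prompt wording and fallback text in huggingface_model.py are not shown in this README, so `build_prompt`, `fallback_explanation`, `load_generator`, and `explain` are hypothetical names. The only real library call is the transformers `pipeline("text-generation", ...)` factory, imported lazily so the sketch still runs without transformers installed.

```python
from typing import Callable, Optional


def build_prompt(prob: float, contributions: list[tuple[str, float]]) -> str:
    """Turn the fraud score and top SHAP contributions into a structured prompt."""
    lines = [f"- {name}: {value:+.2f} ({'toward fraud' if value > 0 else 'toward legitimate'})"
             for name, value in contributions]
    return ("A fraud model scored a card transaction.\n"
            f"Fraud probability: {prob:.2f}\n"
            "Top factors:\n" + "\n".join(lines) +
            "\nIn plain English, the model's decision means that")


def fallback_explanation(prob: float, contributions: list[tuple[str, float]]) -> str:
    """Static, always-available explanation used when generation is impossible."""
    top = ", ".join(name for name, _ in contributions[:3])
    return (f"The model estimated a fraud risk of {prob:.2f}. "
            f"The features that influenced this score most were: {top}.")


def load_generator(model_name: str = "distilgpt2") -> Optional[Callable]:
    """Try to build a text-generation pipeline; return None if that fails."""
    try:
        from transformers import pipeline
        return pipeline("text-generation", model=model_name)
    except Exception:  # missing package, no internet, failed download, ...
        return None


def explain(prob: float, contributions: list[tuple[str, float]],
            generator: Optional[Callable] = None) -> str:
    if generator is None:
        return fallback_explanation(prob, contributions)
    try:
        # Greedy decoding (do_sample=False) keeps the narration repeatable.
        out = generator(build_prompt(prob, contributions), max_new_tokens=60,
                        do_sample=False, return_full_text=False)
        text = out[0]["generated_text"].strip()
        return text or fallback_explanation(prob, contributions)
    except Exception:
        return fallback_explanation(prob, contributions)


contribs = [("V14", -2.341), ("Amount_Log", 1.823), ("Hour", 0.912)]
print(explain(0.91, contribs))  # no generator passed, so the static fallback is used
```

In the dashboard, `explain(prob, contribs, load_generator())` would attempt DistilGPT-2 first. A model as small as DistilGPT-2 can produce text that does not match the numbers, which is one reason to keep the factual fallback close at hand.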
Once main.py has completed at least once, launch the interactive dashboard:
streamlit run app/streamlit_app.py
Three input modes:
| Mode | Description | Best For |
|---|---|---|
| Manual | Type raw transaction values into the form directly | Testing specific or hypothetical transactions |
| Random Sample | Picks a random row from the test dataset | Quick demo on real data |
| Upload CSV | Upload a single-row CSV file | Integration testing |
Every prediction shows:
- Fraud probability score (0.00 to 1.00)
- Risk level: Low / Medium / High
- Pass ✅ or Fail 🚨 banner
- DistilGPT-2 plain-English explanation
- Top SHAP feature contributions ranked by absolute impact
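The Pass/Fail banner follows from THRESHOLD (default 0.5), but the Low / Medium / High cutoffs are not documented here. The sketch below therefore uses hypothetical cutoffs of 0.3 and 0.7 and assumes "Fail" means a probability at or above THRESHOLD; `risk_summary` is an illustrative name, not the dashboard's code.

```python
THRESHOLD = 0.5  # documented default in config.py


def risk_summary(prob: float, low_cut: float = 0.3, high_cut: float = 0.7) -> tuple[str, str]:
    """Map a fraud probability to (risk level, banner).

    low_cut and high_cut are HYPOTHETICAL values chosen for illustration.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    level = "Low" if prob < low_cut else "Medium" if prob < high_cut else "High"
    banner = "Fail" if prob >= THRESHOLD else "Pass"
    return level, banner


print(risk_summary(0.91))  # ('High', 'Fail')
print(risk_summary(0.42))  # ('Medium', 'Pass')
```

Keeping the risk bands separate from THRESHOLD lets the dashboard show graded risk while the binary decision stays tied to the single configured cutoff.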
After running main.py, the following are created automatically:
| File | Description |
|---|---|
| data/processed/X_train.csv | Scaled training feature matrix |
| data/processed/X_test.csv | Scaled test feature matrix |
| data/processed/scaler.pkl | Fitted StandardScaler for inference |
| data/processed/feature_names.json | Ordered feature column list |
| models/fraud_model.pkl | Best model selected by ROC-AUC |
| models/random_forest.pkl | Saved Random Forest |
| models/xgboost_model.pkl | Saved XGBoost |
| models/lightgbm_model.pkl | Saved LightGBM |
| reports/figures/class_balance.png | Fraud vs legitimate count chart |
| reports/figures/correlation_heatmap.png | Feature correlation heatmap |
| reports/figures/*_confusion_matrix.png | Confusion matrix per model |
| reports/figures/*_roc_curve.png | ROC curve per model |
| reports/figures/*_pr_curve.png | Precision-Recall curve per model |
| reports/figures/*_feature_importance.png | Feature importance per model |
| reports/figures/shap_summary.png | SHAP beeswarm plot for best model |
| reports/report.md | Full Markdown training summary |
| reports/metrics.csv | Model comparison table (CSV) |
All paths and hyperparameters are centralised in src/utils/config.py. Nothing is hardcoded elsewhere.
| Setting | Default | Description |
|---|---|---|
| RANDOM_STATE | 42 | Seeds all RNGs for full reproducibility |
| TEST_SIZE | 0.2 | Fraction of data reserved for evaluation |
| THRESHOLD | 0.5 | Minimum probability to classify as fraud |
| SMOTE_RATIO | 0.2 | Target fraud-to-legitimate ratio after resampling |
| LARGE_AMOUNT_THRESHOLD | 200.0 | USD threshold for the Is_Large_Amount flag |
| HF_MODEL_NAME | distilgpt2 | Hugging Face model for explanation generation |
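A config module in this style might look like the sketch below. Only the six setting names and defaults come from the table above; the path variable names (`RAW_DATA_PATH`, `SCALER_PATH`, and so on) are hypothetical, their values mirror the project structure, and the paths assume the pipeline is launched from the repository root, as `python main.py` suggests.

```python
"""Sketch of src/utils/config.py: defaults from the settings table, paths from the repo layout."""
from pathlib import Path

# Paths are relative to the repository root (assumption: commands run from there).
RAW_DATA_PATH = Path("data/raw/transactions.csv")
PROCESSED_DIR = Path("data/processed")
MODELS_DIR = Path("models")
FIGURES_DIR = Path("reports/figures")
HF_CACHE_DIR = Path("data/external/huggingface_cache")

# Documented settings and defaults
RANDOM_STATE = 42
TEST_SIZE = 0.2
THRESHOLD = 0.5
SMOTE_RATIO = 0.2
LARGE_AMOUNT_THRESHOLD = 200.0
HF_MODEL_NAME = "distilgpt2"

# Derived artefact locations (hypothetical names)
BEST_MODEL_PATH = MODELS_DIR / "fraud_model.pkl"
SCALER_PATH = PROCESSED_DIR / "scaler.pkl"
FEATURE_NAMES_PATH = PROCESSED_DIR / "feature_names.json"

# Fail fast on edits that would silently break the pipeline
assert 0.0 < TEST_SIZE < 1.0
assert 0.0 < SMOTE_RATIO <= 1.0
assert 0.0 <= THRESHOLD <= 1.0
```

Other modules would then import names such as `SMOTE_RATIO` or `SCALER_PATH` from this one file instead of repeating literals, which is what keeps "nothing hardcoded elsewhere" true.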
1. Clone the repository:
git clone https://github.com/ibtesaamaslam/Fraud-Detection-Model.git
cd Fraud-Detection-Model
2. Create a virtual environment (recommended):
python -m venv .venv
source .venv/bin/activate # macOS / Linux
.venv\Scripts\activate # Windows
3. Install dependencies:
pip install -r requirements.txt
4. Place the dataset:
Download creditcard.csv from Kaggle, rename it to transactions.csv, and place it at:
data/raw/transactions.csv
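Before the first run, it can help to confirm the file is in place and has the expected layout. The standard-library check below is a sketch: `check_dataset` is a hypothetical helper, not part of load_data.py, and it assumes the Kaggle header (Time, V1 to V28, Amount, Class).

```python
import csv
from pathlib import Path

EXPECTED_COLUMNS = ["Time", *[f"V{i}" for i in range(1, 29)], "Amount", "Class"]


def check_dataset(path) -> tuple[int, int, float]:
    """Verify the header and return (rows, fraud_rows, fraud_rate)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found: download creditcard.csv and rename it")
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected columns: {header[:5]}...")
        class_idx = header.index("Class")
        rows = fraud = 0
        for record in reader:
            rows += 1
            # Class is 0 or 1; float() also accepts values written as "1.0"
            fraud += int(float(record[class_idx]) == 1)
    return rows, fraud, fraud / rows if rows else 0.0
```

Running `check_dataset("data/raw/transactions.csv")` on the full Kaggle file should report 284,807 rows and 492 fraud rows, matching the dataset table.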
Run the full 8-stage pipeline:
python main.py
This executes all stages in order and produces:
- Processed data in data/processed/
- Trained models in models/
- EDA and evaluation charts in reports/figures/
- Final report at reports/report.md
After main.py completes:
streamlit run app/streamlit_app.py
→ Opens at http://localhost:8501
pandas
numpy
scikit-learn
imbalanced-learn
xgboost
lightgbm
shap
transformers
joblib
matplotlib
seaborn
streamlit
tabulate
pip install -r requirements.txt
- FastAPI endpoint: expose /predict for banking system integration
- LIME explainability: add alongside SHAP for comparison
- Threshold optimisation: auto-tune decision threshold by maximising F1
- Deep learning baseline: neural network benchmark vs tree models
- Drift detection: flag when retraining is needed
- Docker containerisation: Dockerfile for reproducible deployment
- MLflow experiment tracking: log all runs and metrics
- Hugging Face Spaces deployment: public demo
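For the threshold-optimisation item, one simple approach is to sweep every distinct predicted probability as a candidate cutoff and keep the one with the highest F1. The sketch below is a possible starting point, not a committed design; `best_f1_threshold` is a hypothetical name and the scores are toy values. The sweep should run on a validation split, not the test set, so the reported test metrics stay unbiased.

```python
def best_f1_threshold(y_true, y_prob):
    """Try each distinct predicted probability as the cutoff; return (threshold, best F1)."""
    best = (0.5, -1.0)
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best


y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.10, 0.40, 0.45, 0.20, 0.80, 0.05, 0.60, 0.90]
t, f1 = best_f1_threshold(y_true, y_prob)
print(f"threshold={t:.2f} f1={f1:.3f}")  # threshold=0.45 f1=0.857
```

On this toy data, lowering the cutoff from the default 0.5 to 0.45 catches the third fraud row at the cost of one extra false alarm, raising F1 from 0.667 to 0.857.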
MIT License. Copyright (c) 2024 Ibtesaam Aslam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies, subject to the copyright notice appearing in all copies.
Provided "as is", without warranty of any kind.
Built with ❤️ by Ibtesaam Aslam
⭐ If this project helped you learn fraud detection or ML pipelines, please give it a star!
Explainable AI for financial fraud detection.