Fraud-Detection-Classification

A machine learning model to detect fraudulent credit card transactions, focusing on handling imbalanced data.

Credit Card Fraud Detection

Project Overview

This project builds a machine learning model to detect fraudulent credit card transactions. The primary challenge is the highly imbalanced dataset: only about 0.17% of the transactions are fraudulent. The goal is a reliable classification model that identifies fraud effectively while keeping false positives low, so that legitimate customers are not inconvenienced.

Dataset

The project uses a public Kaggle dataset of credit card transactions made over a period of two days. It contains 284,807 transactions, of which only 492 are fraudulent. Features V1 through V28 are the output of a PCA transformation applied to protect user privacy.
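The degree of imbalance follows directly from the two counts quoted above. The short sketch below only redoes that arithmetic; the function names are illustrative and do not come from the project notebook.

```python
# Counts quoted in the Dataset section.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_rate(n_fraud: int, n_total: int) -> float:
    """Fraction of all transactions that are labelled as fraud."""
    return n_fraud / n_total


def legit_to_fraud_ratio(n_fraud: int, n_total: int) -> float:
    """Number of legitimate transactions per fraudulent one."""
    return (n_total - n_fraud) / n_fraud


rate = fraud_rate(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)
ratio = legit_to_fraud_ratio(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)

print(f"Fraud rate: {rate:.2%}")
print(f"Legitimate transactions per fraud: {ratio:.0f}")
```

The fraud rate comes out at roughly 0.17%, which works out to several hundred legitimate transactions for every fraudulent one.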

Link to Original Kaggle Dataset

Methodology

The project followed a structured machine learning workflow:

Data Exploration (EDA): Initial analysis confirmed the extreme class imbalance and identified that the Time and Amount features required scaling.
Preprocessing: Applied StandardScaler from Scikit-learn to the Time and Amount columns to standardize their scales.
Model Training & Comparison:
- Baseline Model (Logistic Regression): A simple model was first trained to establish a performance baseline. It achieved high recall (92%) but very poor precision (6%), making it impractical due to a high number of false alarms.
- Advanced Model (Random Forest): A Random Forest Classifier was then trained. This model demonstrated a much better balance between precision and recall.
Evaluation: Accuracy is misleading on data this imbalanced, because a model that labels every transaction as legitimate would still be about 99.8% accurate while catching no fraud at all. The models were therefore evaluated on Precision, Recall, and F1-Score, with particular attention to the minority (fraud) class.
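The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code from Fraud_Detection.ipynb: it runs on synthetic data generated by a hypothetical `make_synthetic_transactions` helper (with only three stand-in PCA columns), and the model hyperparameters, split ratio, and helper names are all assumptions. The scaler is wrapped in a pipeline so that it is fitted on the training split only, which is one common way to avoid leaking test-set statistics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_transactions(n_rows=5000, fraud_rate=0.02, seed=0):
    """Hypothetical stand-in for the Kaggle CSV (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n_rows) < fraud_rate).astype(int)
    data = {f"V{i}": rng.normal(size=n_rows) for i in range(1, 4)}
    data["V1"] = data["V1"] + 3.0 * y  # shift fraud rows so there is signal
    data["Time"] = rng.uniform(0, 172_800, size=n_rows)  # two days in seconds
    data["Amount"] = rng.exponential(scale=80.0, size=n_rows)
    return pd.DataFrame(data), pd.Series(y, name="Class")


def build_pipeline(model):
    """Scale only Time and Amount; pass the other columns through unchanged."""
    scale_time_amount = ColumnTransformer(
        transformers=[("scale", StandardScaler(), ["Time", "Amount"])],
        remainder="passthrough",
    )
    return Pipeline([("preprocess", scale_time_amount), ("model", model)])


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit on the training split and score the fraud class on the test split."""
    pipeline = build_pipeline(model)
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    return {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }


X, y = make_synthetic_transactions()
# Stratify so both splits keep the same (small) share of fraud cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Hyperparameters here are illustrative defaults, not the project's settings.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
results = {
    name: evaluate(model, X_train, X_test, y_train, y_test)
    for name, model in candidates.items()
}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scores on this synthetic data will not match the figures reported in this README; the sketch only shows the shape of the comparison.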

Results & Business Impact

The final Random Forest model achieved:

Precision: 96%
Recall: 76%

This reflects a precision-recall trade-off that favors precision. The model misses roughly one in four fraudulent transactions (76% recall), but the transactions it does flag are highly likely to be fraudulent (96% precision).
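To make the two percentages more concrete, the sketch below derives the F1-Score from them and translates them into alerts per 100 actual fraud cases. It uses only the precision and recall figures reported above; the function names are illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def alerts_per_100_frauds(precision: float, recall: float):
    """Return (frauds caught, false alarms) for every 100 real fraud cases."""
    caught = 100 * recall              # true positives
    total_alerts = caught / precision  # precision = TP / (TP + FP)
    false_alarms = total_alerts - caught
    return caught, false_alarms


# Random Forest figures reported in this section.
precision, recall = 0.96, 0.76
f1 = f1_from(precision, recall)
caught, false_alarms = alerts_per_100_frauds(precision, recall)

print(f"F1-Score: {f1:.2f}")
print(f"Per 100 frauds: {caught:.0f} caught, {false_alarms:.1f} false alarms")
```

For the Random Forest this gives an F1-Score of about 0.85 and roughly three false alarms for every 76 frauds caught. Applying the same function to the Logistic Regression baseline (6% precision, 92% recall) gives about 1,441 false alarms for every 92 frauds caught, which is why that model was judged impractical.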

From a business perspective, this is a highly valuable outcome. It provides the fraud investigation team with a manageable and reliable list of alerts, drastically reducing the time wasted on false alarms and preventing the frustration of blocking legitimate customer transactions.

Technologies Used

Python 3
Pandas & NumPy: For data manipulation and analysis.
Matplotlib & Seaborn: For data visualization.
Scikit-learn: For preprocessing, model training (Logistic Regression, Random Forest), and evaluation.

➡️ Explore the details of the project

