yeick010/Fraud-Detection-Classification


Fraud-Detection-Classification

A machine learning model to detect fraudulent credit card transactions, focusing on handling imbalanced data.

Credit Card Fraud Detection

Project Overview

This project builds a machine learning model to detect fraudulent credit card transactions. The primary challenge is the extreme class imbalance in the dataset: fraudulent transactions make up only about 0.17% of the total. The goal is a reliable classifier that identifies fraud while keeping false positives low, so that legitimate customers are not inconvenienced.


Dataset

The dataset used is a public dataset from Kaggle containing credit card transactions made over a period of two days. It consists of 284,807 transactions, of which only 492 are fraudulent. Features V1 through V28 are the result of a PCA transformation to protect user privacy.
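To make the imbalance concrete, here is a minimal sketch that works only from the two counts quoted above. The function and variable names are illustrative and do not come from the project's code.

```python
# Counts quoted in the Dataset section; nothing is loaded from disk here.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_rate(n_fraud: int, n_total: int) -> float:
    """Fraction of all transactions that are fraudulent."""
    return n_fraud / n_total


rate = fraud_rate(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)

# Number of legitimate transactions for every fraudulent one.
legit_per_fraud = (TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS) / FRAUD_TRANSACTIONS

# Accuracy of a trivial model that labels every transaction as legitimate.
trivial_accuracy = 1.0 - rate

print(f"Fraud rate: {rate:.2%}")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
print(f"Accuracy of 'always legitimate': {trivial_accuracy:.2%}")
```

The last figure shows why a near-perfect accuracy score says little about fraud detection on this data.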

Link to Original Kaggle Dataset


Methodology

The project followed a structured machine learning workflow:

  1. Data Exploration (EDA): Initial analysis confirmed the extreme class imbalance and identified that the Time and Amount features required scaling.
  2. Preprocessing: Applied StandardScaler from Scikit-learn to the Time and Amount columns to standardize their scales.
  3. Model Training & Comparison:
    • Baseline Model (Logistic Regression): A simple model was trained first to establish a performance baseline. It achieved high recall (92%) but very poor precision (6%), meaning roughly 94 of every 100 flagged transactions were false alarms, which makes it impractical.
    • Advanced Model (Random Forest): A Random Forest Classifier was then trained. It gave a much better balance between precision and recall (96% precision and 76% recall, as reported under Results).
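  4. Evaluation: The key challenge was selecting the right evaluation metric. Accuracy is misleading on data this imbalanced, since a model that labels every transaction as legitimate would score about 99.8% on this dataset while catching no fraud. The models were therefore evaluated on Precision, Recall, and F1-Score, with the focus on the minority (fraud) class.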
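The workflow above can be sketched roughly as follows. This is not the project's actual code: it uses synthetic data from `make_classification` in place of the Kaggle file, and the helper names, hyperparameters, 70/30 stratified split, and the choice to fit the scaler inside a pipeline on the training split only are all assumptions made for illustration. Scores on this synthetic data will not match the figures reported in this README.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_transactions(n_samples=5000, seed=0):
    """Hypothetical stand-in for the real dataset: imbalanced, with unscaled Time/Amount."""
    X, y = make_classification(
        n_samples=n_samples, n_features=6, n_informative=4, n_redundant=0,
        weights=[0.98, 0.02], flip_y=0.0, random_state=seed,
    )
    df = pd.DataFrame(X, columns=["V1", "V2", "V3", "V4", "Time", "Amount"])
    # Stretch two columns so they sit on a very different scale from the rest.
    df["Time"] = (df["Time"] - df["Time"].min()) * 10_000
    df["Amount"] = np.abs(df["Amount"]) * 100
    return df, pd.Series(y, name="Class")


def evaluate(model, X_train, X_test, y_train, y_test):
    """Scale Time and Amount, fit the model, and score the fraud class (label 1)."""
    preprocess = ColumnTransformer(
        [("scale", StandardScaler(), ["Time", "Amount"])],
        remainder="passthrough",  # leave the V* columns untouched
    )
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, pos_label=1, average="binary", zero_division=0
    )
    return {"precision": float(precision), "recall": float(recall), "f1": float(f1)}


X, y = make_synthetic_transactions()
# Stratify so the rare fraud class appears in both splits in the same proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {
    "logistic_regression": evaluate(
        LogisticRegression(max_iter=1000), X_train, X_test, y_train, y_test
    ),
    "random_forest": evaluate(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X_train, X_test, y_train, y_test,
    ),
}

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Putting the scaler inside the pipeline means its mean and variance are learned from the training split alone, which avoids leaking test-set statistics into the model. Whether the original notebook did this is not stated here.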

Results & Business Impact

The final Random Forest model achieved:

  • Precision: 96%
  • Recall: 76%

This is a deliberate precision-recall trade-off. Compared with the baseline, the Random Forest gives up some recall (76% versus 92%) in exchange for far higher precision (96% versus 6%). The model misses about 24% of fraudulent transactions, but the transactions it does flag are very likely to be fraudulent.
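As a rough worked example of what these percentages imply, the sketch below derives the F1-score and the expected alert volume per 100 actual fraud cases from the rounded precision and recall figures quoted in this README. The function names are illustrative, and because the inputs are rounded, the outputs are approximate.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def alerts_for_frauds(n_frauds: float, precision: float, recall: float):
    """Return (total alerts, false alarms) expected for a given number of real frauds."""
    true_positives = n_frauds * recall          # frauds the model catches
    total_alerts = true_positives / precision   # every flag raised, right or wrong
    false_alarms = total_alerts - true_positives
    return total_alerts, false_alarms


# Rounded figures reported in this README.
reported = {
    "Logistic Regression (baseline)": {"precision": 0.06, "recall": 0.92},
    "Random Forest": {"precision": 0.96, "recall": 0.76},
}

summary = {}
for name, m in reported.items():
    alerts, false_alarms = alerts_for_frauds(100, m["precision"], m["recall"])
    summary[name] = {
        "f1": f1_from(m["precision"], m["recall"]),
        "alerts_per_100_frauds": alerts,
        "false_alarms_per_100_frauds": false_alarms,
    }
    print(
        f"{name}: F1={summary[name]['f1']:.2f}, "
        f"alerts={alerts:.0f}, false alarms={false_alarms:.0f}"
    )
```

Under these assumptions, the baseline raises well over a thousand alerts per 100 real frauds, almost all of them false, while the Random Forest raises about 79, of which roughly 3 are false.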

From a business perspective, high precision is what makes the model usable. It gives the fraud investigation team a manageable, reliable list of alerts, so far less time is spent on false alarms and fewer legitimate customer transactions are blocked.


Technologies Used

  • Python 3
  • Pandas & NumPy: For data manipulation and analysis.
  • Matplotlib & Seaborn: For data visualization.
  • Scikit-learn: For preprocessing, model training (Logistic Regression, Random Forest), and evaluation.

➡️ Explore the details of the project
