- Real-Time, AI-Driven Fraud Detection System
This repository contains my Final Year Project (FYP) at FAST NUCES: an end-to-end system for detecting credit card fraud in real time.
Project Status: Phase 1 (In Progress)
- 🎯 Project Overview
This project aims to design and build a scalable data pipeline that ingests a high-velocity stream of credit card transactions, scores each transaction for fraud with a trained Machine Learning model as it arrives, and flags suspicious activity for immediate review.
This system addresses three core computer science challenges:
Volume (Big Data): Handling a massive scale of transaction data.
Velocity (Real-Time): Making a fraud decision in milliseconds, before a transaction is approved.
Class Imbalance (AI): Training an accurate model when fraudulent transactions are extremely rare (less than 0.2% of all data).
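To make the class imbalance challenge concrete, here is a small worked example. The counts are hypothetical (100,000 transactions with 200 frauds, i.e. 0.2%), chosen only to match the order of magnitude mentioned above. It shows that a "model" which never flags anything still reaches 99.8% accuracy while catching no fraud at all, which is why accuracy alone is a poor metric for this problem.

```python
# Hypothetical counts: 100,000 transactions, 200 of them fraudulent (0.2%).
total = 100_000
fraud = 200

# Ground truth: 1 = fraud, 0 = legitimate.
y_true = [1] * fraud + [0] * (total - fraud)

# A useless baseline that labels every transaction as legitimate.
y_pred = [0] * total

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

accuracy = correct / total   # share of all transactions labelled correctly
recall = caught / fraud      # share of real frauds that were caught

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.998
print(f"recall   = {recall:.3f}")    # recall   = 0.000
```

Metrics such as recall and precision on the fraud class are therefore likely to be more informative when evaluating the model.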
- 🛠️ Technology Stack
The components named in the architecture below are Scikit-learn (model training), Apache Kafka (transaction stream), and Apache Spark Streaming (real-time processing).
- ⚙️ How It Works (Proposed Architecture)
The system is built in three distinct phases:
Phase 1: The "AI Brain" (Offline Model Training)
Goal: To train and save an AI model that can accurately detect anomalies.
Data: Using the public Kaggle "Credit Card Fraud" dataset.
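Model: An Isolation Forest is trained using Scikit-learn. Because it is an unsupervised anomaly detector that isolates unusual points rather than learning from fraud labels, it is well suited to the severe class imbalance.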
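A minimal sketch of what Phase 1 could look like is shown below. It is not the project's actual training script: the data is synthetic (random numbers standing in for the Kaggle features), and the hyperparameters, the `contamination` value, and the file name `isolation_forest.joblib` are all assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for the real dataset (assumption): 2,000 "normal" rows
# plus 4 far-away rows playing the role of fraud (about 0.2% of the data).
normal = rng.normal(loc=0.0, scale=1.0, size=(2000, 5))
outliers = rng.normal(loc=8.0, scale=1.0, size=(4, 5))
X = np.vstack([normal, outliers])

# contamination is the expected share of anomalies; 0.002 is an assumed value.
model = IsolationForest(n_estimators=100, contamination=0.002, random_state=42)
model.fit(X)  # unsupervised: no fraud labels are passed in

labels = model.predict(X)            # +1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower score = more anomalous

# Save the trained model so a later phase can load it (hypothetical file name).
model_path = os.path.join(tempfile.gettempdir(), "isolation_forest.joblib")
joblib.dump(model, model_path)
reloaded = joblib.load(model_path)

print("flagged as anomalies:", int((labels == -1).sum()))
```

With the real dataset, `X` would instead be the feature columns loaded from the Kaggle CSV, and the known fraud labels would be used only to evaluate the flagged rows, not to train.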
Phase 2: The "Data Highway" (Real-Time Pipeline)
Goal: To simulate a high-speed stream of new transactions.
Process: An Apache Kafka "producer" will send transaction data to a topic, mimicking a real bank.
Technology: Apache Spark (Streaming) will consume this data from the Kafka topic as it arrives.
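The producer side of Phase 2 might be sketched as follows. Everything here is illustrative: the transaction fields, the topic name `transactions`, and the helper names are hypothetical, and an in-memory stand-in replaces the real Kafka client so the example runs without a broker.

```python
import json
import random
import time


def make_transaction(txn_id, rng):
    """Build one synthetic transaction (field names are hypothetical)."""
    return {
        "transaction_id": txn_id,
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "timestamp": time.time(),
    }


def serialize(txn):
    """Encode a transaction as UTF-8 JSON bytes, a common Kafka payload format."""
    return json.dumps(txn).encode("utf-8")


def stream_transactions(producer, topic, count, seed=0):
    """Send `count` synthetic transactions to `topic` through `producer`."""
    rng = random.Random(seed)
    for txn_id in range(count):
        producer.send(topic, value=serialize(make_transaction(txn_id, rng)))
    producer.flush()


class InMemoryProducer:
    """Stand-in exposing the send/flush methods used above, for local testing."""

    def __init__(self):
        self.messages = []

    def send(self, topic, value=None):
        self.messages.append((topic, value))

    def flush(self):
        pass


producer = InMemoryProducer()
stream_transactions(producer, "transactions", count=3)
print(len(producer.messages), "messages queued")
```

Against a running broker, the stand-in could be swapped for `KafkaProducer(bootstrap_servers="localhost:9092")` from the kafka-python package (an assumed choice of client and address). On the consuming side, Spark Structured Streaming offers a Kafka source via `spark.readStream.format("kafka")` with the `kafka.bootstrap.servers` and `subscribe` options.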
Phase 3: The "Decision Engine" (Real-Time Integration)
Goal: To combine the AI model and the data pipeline.
Process: The Spark Streaming application will load the saved model from Phase 1.
Action: For each new transaction arriving from Kafka, Spark will apply the model to obtain a prediction. Transactions flagged as fraud will be sent to an alert queue.
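The scoring-and-routing step of Phase 3 could be sketched roughly as below. This is an assumption-laden illustration, not the project's implementation: `flag_fraud`, `ThresholdModel`, the `amount` feature, and the plain list used as an alert queue are all hypothetical. The only convention taken from Scikit-learn is that an Isolation Forest's `predict` returns -1 for anomalies and +1 for inliers.

```python
def flag_fraud(model, transactions, feature_names):
    """Return the transactions that `model` labels as anomalies (-1)."""
    if not transactions:
        return []
    # Turn each transaction dict into a feature row in a fixed column order.
    rows = [[txn[name] for name in feature_names] for txn in transactions]
    predictions = model.predict(rows)
    return [txn for txn, label in zip(transactions, predictions) if label == -1]


class ThresholdModel:
    """Toy stand-in for the saved model: flags amounts above a limit."""

    def __init__(self, limit):
        self.limit = limit

    def predict(self, rows):
        return [-1 if row[0] > self.limit else 1 for row in rows]


# One hypothetical micro-batch of transactions read from the stream.
batch = [
    {"transaction_id": 1, "amount": 25.0},
    {"transaction_id": 2, "amount": 9500.0},
    {"transaction_id": 3, "amount": 40.0},
]

alert_queue = []  # placeholder for the real alert destination
alert_queue.extend(flag_fraud(ThresholdModel(limit=1000.0), batch, ["amount"]))
print([txn["transaction_id"] for txn in alert_queue])  # [2]
```

In the Spark application, a function of this shape could be invoked per micro-batch, for example through `foreachBatch` on the streaming query, with the stand-in replaced by the model loaded from Phase 1 and the list replaced by a dedicated Kafka topic for alerts.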