Skip to content

prakhar-189/Financial-Fraud-Analysis

Repository files navigation

🔍 Financial Fraud Analysis

A multi-tool data analytics capstone project that investigates credit card transaction fraud using MySQL, Python (EDA), Microsoft Excel, and Tableau. The project covers the full analytics pipeline — from raw data ingestion and SQL querying to exploratory analysis in Python and an interactive Tableau dashboard.


📁 Repository Structure

Financial-Fraud-Analysis/
│
├── Financial Fraud.sql            # MySQL script: DB setup, data import & SQL analysis
├── EDA & Python.ipynb             # Jupyter Notebook: EDA & visualizations in Python
├── Excel Tasks.xlsb               # Excel-based analysis and tasks
├── Financial Fraud Dashboard.twb  # Tableau workbook for the interactive dashboard
├── Problem Statement.pdf          # Project brief and objectives
├── requirements.txt               # Python dependencies
├── Dataset/                       # Raw dataset files
├── SQL Outputs/                   # Screenshots / exports of SQL query results
└── LICENSE                        # MIT License

🗂️ Dataset

The dataset contains credit card transaction records with both legitimate and fraudulent entries. Key columns include:

Column Description
TRANS_DATE_TRANS_TIME Date and time of the transaction
CC_NUM Credit card number (anonymized)
MERCHANT Merchant name
CATEGORY Transaction category (e.g., grocery, travel)
AMT Transaction amount
FIRST_NAME / LAST_NAME Cardholder name
GENDER Cardholder gender
CITY / STATE / ZIP Cardholder location
LAT / LONGITUDE Cardholder's geographic coordinates
CITY_POP Population of cardholder's city
JOB Cardholder's occupation
DOB Date of birth
TRANS_NUM Unique transaction identifier
MERCH_LAT / MERCH_LONG Merchant's geographic coordinates
IS_FRAUD Fraud label — 1 = Fraudulent, 0 = Legitimate

A separate Location Data table (Location Data.csv) is used for geographic joins in SQL.


🛠️ Tools & Technologies

Tool Purpose
MySQL Database setup, data import, SQL-based analysis
Python (Jupyter Notebook) Exploratory Data Analysis & visualizations
Microsoft Excel Supplementary analysis and task documentation
Tableau Interactive fraud dashboard
pandas, matplotlib, seaborn, scikit-learn Python libraries for analysis & ML

🔄 Project Workflow

1. SQL Analysis (Financial Fraud.sql)

Database Setup:

  • Created FINANCE database with a CC_DATA table mirroring the raw CSV schema.
  • Loaded the dataset using LOAD DATA LOCAL INFILE.
  • Created a separate LOCATION_DATA table for geographic coordinates, joined to CC_DATA via CC_NUM.

SQL Queries Performed:

Analysis Description
Total Transactions Count of all records in the dataset
Top 10 Merchants Most frequent merchants by transaction count
Avg. Transaction Amount by Category Average spend per transaction category
Fraud Count & Percentage Total fraudulent transactions and their share of all transactions
Transaction Geolocation Joins CC_DATA with LOCATION_DATA to map each transaction to lat/long
City with Highest Population Identifies the most populous city in the dataset
Transaction Date Range Earliest and latest transaction timestamps
Total Transaction Value Sum of all transaction amounts
Transactions by Category Count of transactions per spending category
Avg. Amount by Gender Average transaction amount split by cardholder gender
Transactions by Day of Week Average transaction amount grouped by weekday

2. Python EDA (EDA & Python.ipynb)

The Jupyter Notebook covers end-to-end exploratory data analysis using pandas, matplotlib, seaborn, and scikit-learn:

  • Data Loading & Inspection — shape, dtypes, null checks, descriptive statistics
  • Fraud Distribution — class imbalance visualization (fraudulent vs. legitimate)
  • Transaction Amount Analysis — distribution plots, outlier detection
  • Category-wise Analysis — fraud rates per spending category
  • Time-based Analysis — fraud patterns by hour, day, and month
  • Geographic Analysis — mapping transaction and fraud locations
  • Correlation Analysis — heatmaps to identify feature relationships
  • Feature Engineering — extracting time-based features from transaction timestamps

3. Excel Analysis (Excel Tasks.xlsb)

Supplementary analysis covering pivot tables, summaries, and structured task outputs aligned with the project's problem statement.


4. Tableau Dashboard (Financial Fraud Dashboard.twb)

An interactive Tableau dashboard presenting:

  • 🗺️ Geographic fraud map — fraud hotspots by location
  • 📊 Fraud by category — which transaction types are most vulnerable
  • 👤 Demographic breakdown — fraud patterns by gender and age group
  • 📅 Time trends — fraud activity over time
  • 💳 Top fraudulent merchants — most targeted merchant names

Open Financial Fraud Dashboard.twb in Tableau Desktop to explore the visuals.


🚀 How to Run This Project

Prerequisites

  • MySQL Server (v8.0+) with local_infile enabled
  • Python 3.8+ with Jupyter Notebook
  • Tableau Desktop (for .twb file)
  • Microsoft Excel (for .xlsb file)

Steps

1. Clone the repository

git clone https://github.com/PrakharSri18-data/Financial-Fraud-Analysis.git
cd Financial-Fraud-Analysis

2. Set up the database

  • Locate the dataset files inside the Dataset/ folder
  • Open Financial Fraud.sql in MySQL Workbench
  • Update the LOAD DATA LOCAL INFILE path to match your local file location
  • Run the full script

3. Run the Python notebook

pip install -r requirements.txt
jupyter notebook "EDA & Python.ipynb"

4. Open the Tableau dashboard

  • Launch Tableau Desktop
  • Open Financial Fraud Dashboard.twb
  • Re-connect the data source if prompted

📦 Python Dependencies

pandas==3.0.2
numpy==2.4.4
matplotlib==3.10.8
seaborn==0.13.2
scikit-learn==1.8.0
scipy==1.17.1

Full list available in requirements.txt


💡 Key Insights

  • The dataset is heavily imbalanced — fraudulent transactions make up only a small percentage of all records, highlighting the real-world challenge of fraud detection.
  • Certain transaction categories show disproportionately high fraud rates relative to their volume.
  • Geographic clustering reveals specific regions with elevated fraud activity.
  • Time-of-day patterns suggest fraudulent transactions tend to spike during off-peak hours.
  • Gender and city population show measurable differences in average transaction amounts.

📄 License

This project is licensed under the MIT License © 2026 Prakhar Srivastava.


🙋 Author

Prakhar Srivastava

Data Analyst, Data Scientist & AI Engineer | Machine Learning, Deep Learning, Generative AI, Prompt Engineering & Agentic AI

About

Multi-tool financial fraud analysis using MySQL, Python, Excel & Tableau — uncovering fraud patterns, geographic hotspots, and transaction trends across credit card data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors