🔍 Financial Fraud Analysis

A multi-tool data analytics capstone project that investigates credit card transaction fraud using MySQL, Python (EDA), Microsoft Excel, and Tableau. The project covers the full analytics pipeline — from raw data ingestion and SQL querying to exploratory analysis in Python and an interactive Tableau dashboard.

📁 Repository Structure

```text
Financial-Fraud-Analysis/
│
├── Financial Fraud.sql            # MySQL script: DB setup, data import & SQL analysis
├── EDA & Python.ipynb             # Jupyter Notebook: EDA & visualizations in Python
├── Excel Tasks.xlsb               # Excel-based analysis and tasks
├── Financial Fraud Dashboard.twb  # Tableau workbook for the interactive dashboard
├── Problem Statement.pdf          # Project brief and objectives
├── requirements.txt               # Python dependencies
├── Dataset/                       # Raw dataset files
├── SQL Outputs/                   # Screenshots / exports of SQL query results
└── LICENSE                        # MIT License
```

🗂️ Dataset

The dataset contains credit card transaction records with both legitimate and fraudulent entries. Key columns include:

| Column | Description |
|---|---|
| `TRANS_DATE_TRANS_TIME` | Date and time of the transaction |
| `CC_NUM` | Credit card number (anonymized) |
| `MERCHANT` | Merchant name |
| `CATEGORY` | Transaction category (e.g., grocery, travel) |
| `AMT` | Transaction amount |
| `FIRST_NAME` / `LAST_NAME` | Cardholder name |
| `GENDER` | Cardholder gender |
| `CITY` / `STATE` / `ZIP` | Cardholder location |
| `LAT` / `LONGITUDE` | Cardholder's geographic coordinates |
| `CITY_POP` | Population of cardholder's city |
| `JOB` | Cardholder's occupation |
| `DOB` | Date of birth |
| `TRANS_NUM` | Unique transaction identifier |
| `MERCH_LAT` / `MERCH_LONG` | Merchant's geographic coordinates |
| `IS_FRAUD` | Fraud label: `1` = fraudulent, `0` = legitimate |

A separate location table (`Location Data.csv`) is used for geographic joins in SQL.

🛠️ Tools & Technologies

| Tool | Purpose |
|---|---|
| MySQL | Database setup, data import, SQL-based analysis |
| Python (Jupyter Notebook) | Exploratory data analysis & visualizations |
| Microsoft Excel | Supplementary analysis and task documentation |
| Tableau | Interactive fraud dashboard |
| pandas, matplotlib, seaborn, scikit-learn | Python libraries for analysis & ML |

🔄 Project Workflow

1. SQL Analysis (`Financial Fraud.sql`)

Database Setup:

- Created the `FINANCE` database with a `CC_DATA` table mirroring the raw CSV schema.
- Loaded the dataset using `LOAD DATA LOCAL INFILE`.
- Created a separate `LOCATION_DATA` table for geographic coordinates, joined to `CC_DATA` via `CC_NUM`.
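The setup steps above can be sketched in a self-contained way. This is not the project's script: Python's built-in `sqlite3` and `csv` modules stand in for MySQL and `LOAD DATA LOCAL INFILE` so it runs without a server, and the column subset, types, and sample rows (including the `LOCATION_DATA` layout of `CC_NUM`, `LAT`, `LONGITUDE`) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in the raw CSV layout (a subset of the real columns).
CC_CSV = """TRANS_NUM,TRANS_DATE_TRANS_TIME,CC_NUM,CATEGORY,AMT,IS_FRAUD
t1,2019-01-07 02:15:00,1111,grocery_pos,120.50,1
t2,2019-01-07 14:30:00,2222,travel,45.00,0
t3,2019-01-08 09:05:00,1111,grocery_pos,30.25,0
"""

# Hypothetical layout for the location file: one coordinate pair per card.
LOCATION_CSV = """CC_NUM,LAT,LONGITUDE
1111,40.71,-74.01
2222,34.05,-118.24
"""


def build_database() -> sqlite3.Connection:
    """Create both tables in memory and bulk-load the CSV text into them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE CC_DATA (
            TRANS_NUM TEXT PRIMARY KEY,
            TRANS_DATE_TRANS_TIME TEXT,
            CC_NUM TEXT,
            CATEGORY TEXT,
            AMT REAL,
            IS_FRAUD INTEGER
        );
        CREATE TABLE LOCATION_DATA (
            CC_NUM TEXT PRIMARY KEY,
            LAT REAL,
            LONGITUDE REAL
        );
        """
    )
    # Stand-in for MySQL's LOAD DATA LOCAL INFILE: parse the CSV, then insert in bulk.
    # SQLite column affinity converts numeric-looking text (AMT, IS_FRAUD, LAT) to numbers.
    conn.executemany(
        "INSERT INTO CC_DATA VALUES "
        "(:TRANS_NUM, :TRANS_DATE_TRANS_TIME, :CC_NUM, :CATEGORY, :AMT, :IS_FRAUD)",
        csv.DictReader(io.StringIO(CC_CSV)),
    )
    conn.executemany(
        "INSERT INTO LOCATION_DATA VALUES (:CC_NUM, :LAT, :LONGITUDE)",
        csv.DictReader(io.StringIO(LOCATION_CSV)),
    )
    conn.commit()
    return conn


conn = build_database()
total = conn.execute("SELECT COUNT(*) FROM CC_DATA").fetchone()[0]
# Map each transaction to the coordinates stored for its card (joined via CC_NUM).
geolocated = conn.execute(
    """
    SELECT c.TRANS_NUM, l.LAT, l.LONGITUDE
    FROM CC_DATA AS c
    JOIN LOCATION_DATA AS l ON l.CC_NUM = c.CC_NUM
    ORDER BY c.TRANS_NUM
    """
).fetchall()
print(total, geolocated)
```

In MySQL the parsing and inserting collapse into a single `LOAD DATA LOCAL INFILE` statement, which is why the project script only needs the file path to be correct.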

SQL Queries Performed:

| Analysis | Description |
|---|---|
| Total Transactions | Count of all records in the dataset |
| Top 10 Merchants | Most frequent merchants by transaction count |
| Avg. Transaction Amount by Category | Average spend per transaction category |
| Fraud Count & Percentage | Total fraudulent transactions and their share of all transactions |
| Transaction Geolocation | Joins `CC_DATA` with `LOCATION_DATA` to map each transaction to lat/long |
| City with Highest Population | Identifies the most populous city in the dataset |
| Transaction Date Range | Earliest and latest transaction timestamps |
| Total Transaction Value | Sum of all transaction amounts |
| Transactions by Category | Count of transactions per spending category |
| Avg. Amount by Gender | Average transaction amount split by cardholder gender |
| Transactions by Day of Week | Average transaction amount grouped by weekday |
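Several of the queries in the table above reduce to simple aggregates. The sketch below runs them against an in-memory SQLite table with four invented rows, so it is an illustration rather than the project's SQL. One dialect difference to keep in mind: SQLite has no `DAYNAME()`, so the weekday query uses `strftime('%w', ...)` (`'0'` = Sunday); in MySQL the equivalent would typically use `DAYNAME()` or `DAYOFWEEK()`. It also assumes timestamps are stored as ISO `YYYY-MM-DD HH:MM:SS` strings.

```python
import sqlite3

# Hypothetical sample rows: (TRANS_DATE_TRANS_TIME, CATEGORY, GENDER, AMT, IS_FRAUD).
SAMPLE_ROWS = [
    ("2019-01-06 23:10:00", "shopping_net", "F", 900.00, 1),  # Sunday
    ("2019-01-07 12:00:00", "grocery_pos", "M", 60.00, 0),    # Monday
    ("2019-01-07 18:30:00", "grocery_pos", "F", 40.00, 0),    # Monday
    ("2019-01-08 09:45:00", "travel", "M", 300.00, 0),        # Tuesday
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CC_DATA (TRANS_DATE_TRANS_TIME TEXT, CATEGORY TEXT, "
    "GENDER TEXT, AMT REAL, IS_FRAUD INTEGER)"
)
conn.executemany("INSERT INTO CC_DATA VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)

# Fraud count and fraud share (%) of all transactions.
fraud_count, fraud_pct = conn.execute(
    "SELECT SUM(IS_FRAUD), ROUND(100.0 * SUM(IS_FRAUD) / COUNT(*), 2) FROM CC_DATA"
).fetchone()

# Average transaction amount per category.
avg_by_category = conn.execute(
    "SELECT CATEGORY, ROUND(AVG(AMT), 2) FROM CC_DATA GROUP BY CATEGORY ORDER BY CATEGORY"
).fetchall()

# Average transaction amount per cardholder gender.
avg_by_gender = conn.execute(
    "SELECT GENDER, ROUND(AVG(AMT), 2) FROM CC_DATA GROUP BY GENDER ORDER BY GENDER"
).fetchall()

# Average amount by weekday; strftime('%w') yields '0' (Sunday) through '6' (Saturday).
avg_by_weekday = conn.execute(
    "SELECT strftime('%w', TRANS_DATE_TRANS_TIME) AS weekday, ROUND(AVG(AMT), 2) "
    "FROM CC_DATA GROUP BY weekday ORDER BY weekday"
).fetchall()

# Earliest and latest timestamps (ISO-formatted strings sort chronologically).
first_ts, last_ts = conn.execute(
    "SELECT MIN(TRANS_DATE_TRANS_TIME), MAX(TRANS_DATE_TRANS_TIME) FROM CC_DATA"
).fetchone()

print(fraud_count, fraud_pct)  # 1 25.0
```

Multiplying by `100.0` rather than `100` keeps the percentage in floating point in SQLite, where integer division would otherwise truncate; MySQL's `/` already returns a decimal.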

2. Python EDA (`EDA & Python.ipynb`)

The Jupyter Notebook covers end-to-end exploratory data analysis using pandas, matplotlib, seaborn, and scikit-learn:

- Data Loading & Inspection: shape, dtypes, null checks, descriptive statistics
- Fraud Distribution: class imbalance visualization (fraudulent vs. legitimate)
- Transaction Amount Analysis: distribution plots, outlier detection
- Category-wise Analysis: fraud rates per spending category
- Time-based Analysis: fraud patterns by hour, day, and month
- Geographic Analysis: mapping transaction and fraud locations
- Correlation Analysis: heatmaps to identify feature relationships
- Feature Engineering: extracting time-based features from transaction timestamps
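The time-based feature engineering and per-group fraud rates listed above can be illustrated with the standard library alone. This is a minimal sketch, not the notebook's code: the timestamp format, the `hour_band` feature and its 22:00 to 05:59 "night" window, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Assumed layout of TRANS_DATE_TRANS_TIME; adjust to match the real file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_time_features(record):
    """Return a copy of the record with hour, weekday (Monday=0), month and an hour band."""
    ts = datetime.strptime(record["TRANS_DATE_TRANS_TIME"], TIMESTAMP_FORMAT)
    return {
        **record,
        "hour": ts.hour,
        "weekday": ts.weekday(),
        "month": ts.month,
        # Hypothetical band: "night" covers 22:00-05:59, everything else is "day".
        "hour_band": "night" if ts.hour >= 22 or ts.hour < 6 else "day",
    }


def fraud_rate_by(records, key):
    """Share of fraudulent transactions (IS_FRAUD == 1) within each value of `key`."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        frauds[r[key]] += int(r["IS_FRAUD"])
    return {k: frauds[k] / totals[k] for k in sorted(totals)}


# Invented records, only to exercise the functions.
raw = [
    {"TRANS_DATE_TRANS_TIME": "2019-01-06 23:10:00", "CATEGORY": "shopping_net", "IS_FRAUD": 1},
    {"TRANS_DATE_TRANS_TIME": "2019-01-07 02:05:00", "CATEGORY": "shopping_net", "IS_FRAUD": 1},
    {"TRANS_DATE_TRANS_TIME": "2019-01-07 12:00:00", "CATEGORY": "grocery_pos", "IS_FRAUD": 0},
    {"TRANS_DATE_TRANS_TIME": "2019-01-07 18:30:00", "CATEGORY": "shopping_net", "IS_FRAUD": 0},
    {"TRANS_DATE_TRANS_TIME": "2019-02-01 13:00:00", "CATEGORY": "grocery_pos", "IS_FRAUD": 0},
]
enriched = [add_time_features(r) for r in raw]
print(fraud_rate_by(enriched, "hour_band"))  # {'day': 0.0, 'night': 1.0}
```

With pandas the same steps are usually written as `pd.to_datetime(df["TRANS_DATE_TRANS_TIME"]).dt.hour` for the feature and `df.groupby("CATEGORY")["IS_FRAUD"].mean()` for the rate, since the mean of a 0/1 label is the fraud rate.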

3. Excel Analysis (`Excel Tasks.xlsb`)

Supplementary analysis covering pivot tables, summaries, and structured task outputs aligned with the project's problem statement.

4. Tableau Dashboard (`Financial Fraud Dashboard.twb`)

An interactive Tableau dashboard presenting:

- 🗺️ Geographic fraud map: fraud hotspots by location
- 📊 Fraud by category: which transaction types are most vulnerable
- 👤 Demographic breakdown: fraud patterns by gender and age group
- 📅 Time trends: fraud activity over time
- 💳 Top fraudulent merchants: merchants most often targeted by fraudulent transactions

Open `Financial Fraud Dashboard.twb` in Tableau Desktop to explore the visuals.
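The demographic view needs an age group per transaction, which the dataset only provides indirectly through `DOB`. One possible way to prepare it is sketched below: compute age at the transaction date, bucket it, and write a small CSV summary that Tableau could read as a text-file source. The DOB format, the band edges, the output column names, and the idea of a pre-aggregated extract are all assumptions; how the actual workbook derives age groups is not stated here.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format for DOB and the date part of the timestamp


def age_at(dob: date, on: date) -> int:
    """Whole years of age on `on`; subtract one if the birthday hasn't occurred yet that year."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def age_group(age: int) -> str:
    # Hypothetical band edges; use whatever the dashboard actually shows.
    if age < 30:
        return "<30"
    if age < 45:
        return "30-44"
    if age < 60:
        return "45-59"
    return "60+"


def gender_age_summary(records):
    """Map (GENDER, age group) -> (transaction count, fraud rate)."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for r in records:
        dob = datetime.strptime(r["DOB"], DATE_FORMAT).date()
        trans_day = datetime.strptime(r["TRANS_DATE_TRANS_TIME"][:10], DATE_FORMAT).date()
        key = (r["GENDER"], age_group(age_at(dob, trans_day)))
        totals[key] += 1
        frauds[key] += int(r["IS_FRAUD"])
    return {k: (totals[k], frauds[k] / totals[k]) for k in sorted(totals)}


def write_extract(summary, out):
    """Write the summary as CSV to any text stream (a file or io.StringIO)."""
    writer = csv.writer(out)
    writer.writerow(["GENDER", "AGE_GROUP", "TRANSACTIONS", "FRAUD_RATE"])
    for (gender, group), (n, rate) in summary.items():
        writer.writerow([gender, group, n, round(rate, 4)])


# Invented records, only to exercise the functions.
records = [
    {"GENDER": "F", "DOB": "1990-05-20", "TRANS_DATE_TRANS_TIME": "2019-01-07 10:00:00", "IS_FRAUD": 1},
    {"GENDER": "F", "DOB": "1960-01-01", "TRANS_DATE_TRANS_TIME": "2019-01-07 11:00:00", "IS_FRAUD": 0},
    {"GENDER": "M", "DOB": "1985-12-31", "TRANS_DATE_TRANS_TIME": "2019-06-15 09:00:00", "IS_FRAUD": 0},
    {"GENDER": "M", "DOB": "1950-03-03", "TRANS_DATE_TRANS_TIME": "2019-03-03 22:00:00", "IS_FRAUD": 1},
    {"GENDER": "F", "DOB": "1992-07-01", "TRANS_DATE_TRANS_TIME": "2019-08-01 15:00:00", "IS_FRAUD": 0},
]
summary = gender_age_summary(records)
buffer = io.StringIO()
write_extract(summary, buffer)
print(buffer.getvalue())
```

Computing age at the transaction date, rather than today, keeps each cardholder in the band they were in when the transaction happened.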

🚀 How to Run This Project

Prerequisites

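- MySQL Server (v8.0+) with `local_infile` enabled on the server (e.g. `SET GLOBAL local_infile = 1;`) and allowed by the client connection, since `LOAD DATA LOCAL INFILE` requires both
- Python 3.11+ with Jupyter Notebook (the pinned pandas 3.x and NumPy 2.4 releases do not support older Python versions)
- Tableau Desktop (for the `.twb` file)
- Microsoft Excel (for the `.xlsb` file)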

Steps

1. Clone the repository

```bash
git clone https://github.com/PrakharSri18-data/Financial-Fraud-Analysis.git
cd Financial-Fraud-Analysis
```

2. Set up the database

- Locate the dataset files inside the `Dataset/` folder.
- Open `Financial Fraud.sql` in MySQL Workbench.
- Update the `LOAD DATA LOCAL INFILE` path to match your local file location.
- Run the full script.

3. Run the Python notebook

```bash
pip install -r requirements.txt
jupyter notebook "EDA & Python.ipynb"
```

4. Open the Tableau dashboard

- Launch Tableau Desktop.
- Open `Financial Fraud Dashboard.twb`.
- Reconnect the data source if prompted.

📦 Python Dependencies

```text
pandas==3.0.2
numpy==2.4.4
matplotlib==3.10.8
seaborn==0.13.2
scikit-learn==1.8.0
scipy==1.17.1
```

The full list is available in `requirements.txt`.

💡 Key Insights

- The dataset is heavily imbalanced: fraudulent transactions make up only a small percentage of all records, which reflects the real-world challenge of fraud detection.
- Certain transaction categories show disproportionately high fraud rates relative to their transaction volume.
- Geographic clustering reveals specific regions with elevated fraud activity.
- Time-of-day patterns suggest that fraudulent transactions tend to spike during off-peak hours.
- Average transaction amounts differ measurably by cardholder gender and by city population.
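The first two insights can be checked with a couple of small measures: the fraud share (and the number of legitimate transactions per fraudulent one), and a per-category lift, defined here as the category's share of fraud divided by its share of all transactions. A lift above 1 means the category is over-represented among fraud relative to its volume. This is a sketch of one way to quantify "disproportionate"; the lift measure is a choice made for this example, and the records are invented purely to exercise the code, since the README does not report the actual figures.

```python
from collections import Counter


def imbalance_summary(labels):
    """Fraud share (%) and number of legitimate transactions per fraudulent one."""
    counts = Counter(labels)
    fraud, legit = counts[1], counts[0]
    return {
        "fraud_pct": 100 * fraud / len(labels),
        "legit_per_fraud": legit / fraud if fraud else float("inf"),
    }


def category_lift(records):
    """(category share of fraud) / (category share of all transactions).

    Equivalent to the category's fraud rate divided by the overall fraud rate.
    """
    volume = Counter(r["CATEGORY"] for r in records)
    fraud = Counter(r["CATEGORY"] for r in records if r["IS_FRAUD"] == 1)
    n_all, n_fraud = sum(volume.values()), sum(fraud.values())
    if n_fraud == 0:
        return {c: 0.0 for c in sorted(volume)}
    return {c: (fraud[c] / n_fraud) / (volume[c] / n_all) for c in sorted(volume)}


# Invented records: 10 transactions, 2 of them fraudulent.
records = (
    [{"CATEGORY": "grocery_pos", "IS_FRAUD": 0}] * 6
    + [{"CATEGORY": "shopping_net", "IS_FRAUD": 1}] * 2
    + [{"CATEGORY": "shopping_net", "IS_FRAUD": 0}]
    + [{"CATEGORY": "travel", "IS_FRAUD": 0}]
)
labels = [r["IS_FRAUD"] for r in records]
print(imbalance_summary(labels))  # {'fraud_pct': 20.0, 'legit_per_fraud': 4.0}
print(category_lift(records))
```

Lift is useful next to raw fraud counts because a high-volume category can have many fraud cases simply by being large, while lift normalizes for volume.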

📄 License

This project is licensed under the MIT License. See the `LICENSE` file for details.

🙋 Author

Prakhar Srivastava

Data Analyst, Data Scientist & AI Engineer | Machine Learning, Deep Learning, Generative AI, Prompt Engineering & Agentic AI
