🔬 AI-Driven Criminal Scam Analysis & Case Tracking System

Module: AI-Powered Criminology Tools
Author: Ahmad Raza

📋 Project Overview

A full-stack forensics application that uses a Naive Bayes model to classify each submitted message into one of three verdicts, based on the model's estimated scam probability (shown as "AI confidence"):

Badge	Verdict	Threshold
🔴	Critical Scam	AI confidence ≥ 70%
🟡	Suspicious	40% ≤ AI confidence < 70%
🟢	Legitimate	AI confidence < 40%
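
The threshold table can be sketched as a small mapping function. This is a minimal illustration rather than the project's actual code: the function name `classify_verdict` is hypothetical, and treating the bands as half-open intervals (so that 69.5% counts as Suspicious) is an assumption.

```python
def classify_verdict(scam_probability: float) -> str:
    """Map P(scam) in [0, 1] to one of the three verdict labels."""
    if not 0.0 <= scam_probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {scam_probability}")
    if scam_probability >= 0.70:   # 70% and above
        return "CRITICAL SCAM"
    if scam_probability >= 0.40:   # 40% up to, but not including, 70%
        return "SUSPICIOUS"
    return "LEGITIMATE"            # below 40%


if __name__ == "__main__":
    for probability in (0.93, 0.55, 0.12):
        print(f"{probability:.0%} -> {classify_verdict(probability)}")
```

Checking the upper band first keeps each branch to a single comparison and leaves no gap between bands.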

Every analysis is logged to a persistent CSV case database, and a formal PDF Forensic Report can be downloaded for each case.
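
One way the append-only case ledger could work is sketched below using only the standard library. The column names, the `append_case` and `new_case_id` helpers, and the ID construction (first eight hex digits of a UUID4) are assumptions made for illustration; the real schema in `database_manager.py` may differ.

```python
import csv
import uuid
from datetime import datetime
from pathlib import Path

# Hypothetical column layout; the real crime_database.csv schema may differ.
LEDGER_COLUMNS = ["case_id", "timestamp", "evidence_text", "scam_probability", "verdict"]


def new_case_id() -> str:
    """Build an ID shaped like AR-A3F1B2C4 from the first 8 hex digits of a UUID4."""
    return "AR-" + uuid.uuid4().hex[:8].upper()


def append_case(ledger_path, evidence_text: str, scam_probability: float, verdict: str) -> str:
    """Append one case row; write the header only when the file is new or empty."""
    path = Path(ledger_path)
    needs_header = not path.exists() or path.stat().st_size == 0
    case_id = new_case_id()
    # Mode "a" never rewrites existing rows, which keeps the ledger append-only.
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LEDGER_COLUMNS)
        if needs_header:
            writer.writeheader()
        writer.writerow({
            "case_id": case_id,
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "evidence_text": evidence_text,
            "scam_probability": f"{scam_probability:.4f}",
            "verdict": verdict,
        })
    return case_id
```

Using `csv.DictWriter` rather than manual string joins means commas, quotes, and line breaks inside the evidence text are escaped correctly.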

📁 File Structure

scam_analysis/
│
├── model_trainer.py        # Train the AI model → saves spam_model.pkl
├── database_manager.py     # All CSV/TXT file I/O operations
├── pdf_report_generator.py # ReportLab PDF forensic report builder
├── app.py                  # Main Streamlit UI (3 pages)
│
├── spam.csv                # Training dataset (label, message)
├── spam_model.pkl          # Generated by model_trainer.py (do not edit)
├── crime_database.csv      # Auto-generated: append-only case ledger
├── session_report.txt      # Auto-generated: per-session audit summary
│
├── requirements.txt        # Python dependencies
└── README.md               # This file

⚙️ Setup Instructions

1. Create a Virtual Environment (Recommended)

python -m venv venv

# Windows
venv\Scripts\activate

# macOS / Linux
source venv/bin/activate

2. Install Dependencies

pip install -r requirements.txt

3. Train the AI Model (run once)

python model_trainer.py

Expected output:

============================================================
  AR Forensics & CyberSecurity Labs — Model Training Session
============================================================

[1/5] Loading & validating dataset …
  ✓ Loaded 5572 records  |  Dropped 0 nulls
  ✓ Class distribution:
    ham     4825
    spam    747

[2/5] Cleaning evidence text …
[3/5] Splitting into training and evaluation sets …
  ✓ Training samples : 4179
  ✓ Test samples     : 1393

[4/5] Training Forensic AI Pipeline …
[5/5] Evaluating model performance …

  ACCURACY  : 98.28%
  ----------------------------------------
  ...classification report...

  ✓ Model saved  →  'spam_model.pkl'

4. Launch the Streamlit App

streamlit run app.py

The app opens at http://localhost:8501

🖥️ Application Pages

📊 Dashboard

Live KPI cards: Total Cases, Critical Scams, Suspicious, Legitimate
Pie Chart — verdict distribution across all cases
Bar Chart — red-flag keyword frequency in the evidence database
Recent cases table
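
The keyword-frequency figures behind the bar chart could be computed along the following lines. This is a sketch under stated assumptions: the keyword list and the `count_red_flags` helper are hypothetical, since this README does not list the project's actual red-flag terms.

```python
import re
from collections import Counter

# Hypothetical red-flag list; the project's real keyword set is not given here.
RED_FLAG_KEYWORDS = ["winner", "urgent", "prize", "free", "claim"]


def count_red_flags(messages):
    """Count how many messages contain each keyword (whole word, case-insensitive)."""
    counts = Counter({keyword: 0 for keyword in RED_FLAG_KEYWORDS})
    for message in messages:
        # A set means a keyword repeated within one message is counted once.
        words = set(re.findall(r"[a-z']+", message.lower()))
        for keyword in RED_FLAG_KEYWORDS:
            if keyword in words:
                counts[keyword] += 1
    return counts


if __name__ == "__main__":
    sample = ["WINNER!! Claim your FREE prize now", "Urgent: claim it today"]
    for keyword, hits in count_red_flags(sample).most_common():
        print(f"{keyword:<8} {hits}")
```

The resulting mapping can be passed to a plotting call such as matplotlib's `bar` with the keywords on one axis and the counts on the other.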

🔬 New Analysis

Paste any suspicious message text into the evidence field
Quick-fill examples provided (scam, suspicious, legitimate)
AI returns colour-coded verdict + confidence gauge
Download PDF Forensic Report (ReportLab, A4, court-ready format)

📁 Crime Records

Search cases by keyword in evidence text
Search by Case ID (e.g., AR-A3F1B2C4)
Filter by verdict type
Detail viewer: expand any case to inspect full evidence + download its PDF
Export filtered results as CSV

📄 Forensic PDF Report Contents

Each PDF includes:

AR Forensics & CyberSecurity Labs letterhead
Case metadata (Case ID, Timestamp, AI Engine, Model version)
Full evidence text (Exhibit A)
AI Verdict panel with colour-coded badge
Analyst interpretation notes
Signature block for physical signing

🧪 Self-Test Commands

Run the database manager in isolation to verify I/O:

python database_manager.py

🗂️ Dataset Format

The trainer accepts spam.csv in either of two formats and detects which one is in use:

Standard format (manual/custom datasets):

label,message
spam,"WINNER!! You have been selected..."
ham,"Hey, are you coming to the lecture..."

Kaggle format (SMS Spam Collection — recommended):

v1,v2
spam,"WINNER!! You have been selected..."
ham,"Hey, are you coming to the lecture..."

The trainer auto-remaps v1 → label and v2 → message, and drops any extra columns (v3, v4, v5).
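
The remapping described above might look like the following pandas sketch. The helper name `normalise_columns` is hypothetical, and the exact validation and null handling in `model_trainer.py` are assumptions.

```python
import pandas as pd

REQUIRED_COLUMNS = ["label", "message"]


def normalise_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename Kaggle-style v1/v2 headers, keep only label and message, drop nulls."""
    # rename() ignores keys that are absent, so standard-format files pass through.
    renamed = raw.rename(columns={"v1": "label", "v2": "message"})
    missing = [column for column in REQUIRED_COLUMNS if column not in renamed.columns]
    if missing:
        raise ValueError(
            f"expected columns {REQUIRED_COLUMNS}, found {list(raw.columns)}"
        )
    # Selecting the two required columns discards any extra ones.
    return renamed[REQUIRED_COLUMNS].dropna().reset_index(drop=True)


if __name__ == "__main__":
    kaggle_style = pd.DataFrame({
        "v1": ["spam", "ham"],
        "v2": ["WINNER!! You have been selected...", "Hey, are you coming to the lecture..."],
        "extra": [None, None],
    })
    print(normalise_columns(kaggle_style))
```

Because the rename is a no-op for files that already use `label` and `message`, one code path can serve both formats.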

To use the Kaggle dataset:

Download from: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset
Place the downloaded file in the project root as spam.csv
Run python model_trainer.py; no other changes are needed

Expected dataset stats (Kaggle):

Metric	Value
Total records	5,572
Ham (legitimate)	4,825 (87%)
Spam	747 (13%)
Training samples (75%)	4,179
Test samples (25%)	1,393
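
The figures in this table follow from the record counts and a 75/25 split, as the short check below shows. Rounding the test set up and giving the remainder to training is an assumption about how the split is computed.

```python
import math

total_records = 5572
ham_records, spam_records = 4825, 747
test_fraction = 0.25

# Test size is rounded up; the training set takes whatever remains.
test_samples = math.ceil(total_records * test_fraction)
train_samples = total_records - test_samples

ham_share = round(100 * ham_records / total_records)
spam_share = round(100 * spam_records / total_records)

print(f"train={train_samples} test={test_samples}")
print(f"ham={ham_share}% spam={spam_share}%")
# prints:
# train=4179 test=1393
# ham=87% spam=13%
```

The roughly 87/13 class imbalance is worth keeping in mind when reading the accuracy figure, since always predicting ham would already be right about 87% of the time.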

🔑 Key Variable Name Reference (Viva Prep)

Variable	Purpose
`crime_evidence_text`	Raw input message from investigator
`cleaned_evidence`	Sanitised text after noise removal
`forensic_ai_model`	Loaded sklearn Pipeline (TF-IDF + NaiveBayes)
`scam_probability`	P(scam) from `predict_proba()`
`ai_verdict`	Final classification: CRITICAL SCAM / SUSPICIOUS / LEGITIMATE
`assigned_case_id`	UUID-based case identifier (e.g. AR-A3F1B2C4)
`crime_dataset`	Full pandas DataFrame loaded from CSV
`forensic_ml_pipeline`	sklearn Pipeline object
`pdf_byte_buffer`	In-memory BytesIO object for PDF generation
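
To show how several of these names relate, here is a minimal sketch of a TF-IDF plus MultinomialNB pipeline and a `predict_proba()` lookup. The helper functions, the default vectorizer settings, and the assumption that the positive class is labelled `"spam"` are illustrative; the actual training code lives in `model_trainer.py`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def train_pipeline(messages, labels) -> Pipeline:
    """Fit a TF-IDF + Multinomial Naive Bayes pipeline on labelled messages."""
    forensic_ml_pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True)),
        ("classifier", MultinomialNB()),
    ])
    forensic_ml_pipeline.fit(messages, labels)
    return forensic_ml_pipeline


def scam_probability_for(forensic_ai_model: Pipeline, crime_evidence_text: str) -> float:
    """Return P(spam) for one message; assumes the positive class is labelled 'spam'."""
    # predict_proba columns follow classes_, so look the index up instead of guessing.
    spam_index = list(forensic_ai_model.classes_).index("spam")
    probabilities = forensic_ai_model.predict_proba([crime_evidence_text])[0]
    return float(probabilities[spam_index])


if __name__ == "__main__":
    toy_messages = [
        "win a free prize now", "claim your free cash prize",
        "are you coming to the lecture", "see you at lunch tomorrow",
    ]
    toy_labels = ["spam", "spam", "ham", "ham"]
    model = train_pipeline(toy_messages, toy_labels)
    print(round(scam_probability_for(model, "free prize inside"), 3))
```

A fitted pipeline like this can be saved and reloaded with `joblib.dump` and `joblib.load`, which matches the role of `spam_model.pkl` in the file structure.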

🛡️ Error Handling Coverage

Scenario	Handler
Model file not found	`st.error()` with instructions to run trainer
Dataset file missing	`FileNotFoundError` with descriptive message
CSV columns wrong	`ValueError` listing expected vs found columns
Database write fails	`IOError` surfaced to Streamlit UI
PDF generation fails	`try`/`except` block in the Crime Records detail view
Empty database reads	Returns empty DataFrame with correct columns
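
The last row of the table, an empty read that still returns the right columns, could be handled as in this sketch. The column list and the `load_cases` helper are hypothetical and stand in for whatever `database_manager.py` actually defines.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column layout; the real crime_database.csv schema may differ.
LEDGER_COLUMNS = ["case_id", "timestamp", "evidence_text", "scam_probability", "verdict"]


def load_cases(ledger_path) -> pd.DataFrame:
    """Load the case ledger, or an empty frame with the expected columns."""
    path = Path(ledger_path)
    # A missing or zero-byte file is treated as "no cases yet", not as an error.
    if not path.exists() or path.stat().st_size == 0:
        return pd.DataFrame(columns=LEDGER_COLUMNS)
    return pd.read_csv(path)


if __name__ == "__main__":
    cases = load_cases("crime_database.csv")
    print(f"{len(cases)} case(s) loaded")
```

Returning a correctly shaped empty frame lets dashboard code compute counts and filters without special-casing a fresh install.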

📚 Technologies Used

Library	Role
`scikit-learn`	ML pipeline: TF-IDF vectorizer + MultinomialNB
`joblib`	Model serialisation / deserialisation
`streamlit`	Web UI framework
`pandas`	Data manipulation and CSV I/O
`matplotlib`	Pie chart + bar chart visualisations
`reportlab`	Professional A4 PDF report generation

AI-Driven Criminal Scam Analysis & Case Tracking System — For Academic Use & Learning Purposes Only
