📡 TeleSentry AI

Explainable Telecom Fraud Detection Platform using Machine Learning, Rule-Based Intelligence, SHAP, FastAPI, and Streamlit



📖 Overview

TeleSentry AI is an end-to-end Telecom Fraud Detection Platform designed to identify suspicious calling behavior using a combination of:

  • Rule-Based Fraud Intelligence
  • Isolation Forest Anomaly Detection
  • Random Forest Classification
  • SHAP Explainability
  • Interactive Streamlit Dashboard
  • FastAPI Prediction Service

The system simulates realistic telecom users and fraudsters, engineers behavioral telecom features, detects suspicious activities, explains predictions, and exposes results through a dashboard and API.


🎯 Problem Statement

Telecommunication fraud has become increasingly sophisticated.

Common fraud patterns include:

  • Digital Arrest Scams
  • Mass Calling Operations
  • Long Distance Fraud Rings
  • Social Engineering Networks
  • Automated Calling Bots

Traditional rule-based systems tend to miss fraud patterns they were not written for, while purely machine-learning systems often lack interpretability.

TeleSentry AI combines both approaches to deliver:

  • High detection accuracy
  • Transparent predictions
  • Real-time fraud assessment

🏗 Architecture

┌─────────────────────────────────────────┐
│          Synthetic Data Generator       │
│        (Telecom User Simulation)        │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│           Raw Synthetic Dataset         │
│      generated_dataset.csv (13k+)       │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Data Preprocessing Layer        │
│                                         │
│ • Cleaning                              │
│ • Validation                            │
│ • Train/Test Split                      │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│        Feature Engineering Layer        │
│                                         │
│ • call_intensity                        │
│ • distance_per_call                     │
│ • contact_circle_ratio                  │
│ • delivery_pattern                      │
│ • high_freq_long_distance               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│          Rule Engine Layer              │
│                                         │
│ • Digital Arrest Detection              │
│ • Mass Calling Detection                │
│ • Long Distance Scam Detection          │
│ • Traveler Detection                    │
│ • Business User Detection               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
      ┌─────────────────────────┐
      │      ML Layer           │
      │                         │
      │ Isolation Forest        │
      │ Random Forest           │
      └───────────┬─────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│          Evaluation Layer               │
│                                         │
│ Accuracy                                │
│ Precision                               │
│ Recall                                  │
│ F1 Score                                │
│ ROC-AUC                                 │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Explainability Layer            │
│                                         │
│ SHAP Summary                            │
│ SHAP Waterfall                          │
│ Feature Importance                      │
└──────────────────┬──────────────────────┘
                   │
          ┌────────┴─────────┐
          ▼                  ▼
┌────────────────┐  ┌──────────────────┐
│ Streamlit UI   │  │   FastAPI API    │
│                │  │                  │
│ Dashboard      │  │ /predict         │
│ Analytics      │  │ /health          │
│ Live Predict   │  │ Swagger Docs     │
└────────────────┘  └──────────────────┘

System Flow

Synthetic Data Generation
          ↓
Data Preprocessing
          ↓
Feature Engineering
          ↓
Rule Engine
          ↓
Machine Learning Layer
          ↓
Evaluation Layer
          ↓
SHAP Explainability
          ↓
Streamlit Dashboard + FastAPI

📂 Project Structure

TeleSentry-AI/
│
├── api/
├── dashboard/
├── data/
├── notebooks/
├── reports/
├── saved_models/
├── src/
├── tests/
│
├── README.md
├── requirements.txt
├── requirements-lock.txt
├── LICENSE
├── VERSION
└── .env.example

⚙️ Features

Synthetic Telecom Dataset Generator

Generates realistic telecom profiles:

Legitimate Users

  • Delivery Partners
  • Business Users
  • Regular Subscribers
  • Traveling Professionals

Fraud Profiles

  • Digital Arrest Bots
  • Traditional Scammers
  • Low Volume Fraudsters

Feature Engineering

Generated telecom intelligence features:

| Feature | Description |
| --- | --- |
| call_intensity | Calling activity level |
| distance_per_call | Average call distance ratio |
| contact_circle_ratio | Contact diversity ratio |
| delivery_pattern | Delivery behavior pattern |
| high_freq_long_distance | Suspicious high-volume calling |
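The exact formulas behind these features are not given in this README. As a rough sketch, assuming each record carries the five raw fields used by the `/predict` request (`avg_duration`, `call_frequency`, `unique_contacts`, `avg_distance`, `circle_diversity`), the ratios might be derived along these lines. Every formula, unit, and threshold below is an illustrative assumption rather than the project's actual definition, and `delivery_pattern` is left out because its meaning cannot be inferred from the description alone.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    """Raw per-subscriber metrics (field names borrowed from the /predict request)."""
    avg_duration: float      # average call length (unit assumed: minutes)
    call_frequency: float    # calls per observation period
    unique_contacts: float   # distinct numbers contacted
    avg_distance: float      # average call distance (unit assumed: km)
    circle_diversity: float  # distinct telecom circles reached


def engineer_features(p: CallProfile) -> dict:
    """Derive illustrative behavioral features; all formulas are assumptions."""
    calls = max(p.call_frequency, 1.0)      # guard against division by zero
    contacts = max(p.unique_contacts, 1.0)
    return {
        # hypothetical: many short calls give a high intensity
        "call_intensity": p.call_frequency / max(p.avg_duration, 1.0),
        # hypothetical: distance normalized by call volume
        "distance_per_call": p.avg_distance / calls,
        # hypothetical: circles reached per distinct contact
        "contact_circle_ratio": p.circle_diversity / contacts,
        # hypothetical flag with made-up thresholds
        "high_freq_long_distance": int(p.call_frequency > 100 and p.avg_distance > 500),
    }


features = engineer_features(CallProfile(5, 150, 100, 600, 8))
```

Keeping the derivation in one pure function makes it easy to apply the same transformation at training time and inside the prediction service.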

Rule Engine

Fraud intelligence layer:

  • Digital Arrest Detection
  • Mass Calling Detection
  • Long Distance Scam Detection
  • Traveler Detection
  • Business User Detection
  • Delivery Pattern Detection
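One way such a rule layer could be organized is as a list of named predicates evaluated against each record. The sketch below is a minimal illustration only: the rule names, field names, and thresholds are hypothetical and are not taken from `src.rule_engine.rules`.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical thresholds, chosen only to make the example concrete.
RULES: List[Rule] = [
    ("mass_calling", lambda r: r["call_frequency"] > 100 and r["unique_contacts"] > 80),
    ("long_distance_scam", lambda r: r["avg_distance"] > 500 and r["avg_duration"] < 10),
    ("traveler", lambda r: r["circle_diversity"] >= 5 and r["call_frequency"] < 30),
]


def triggered_rules(record: Record) -> List[str]:
    """Return the names of all rules whose predicate holds for this record."""
    return [name for name, predicate in RULES if predicate(record)]


suspicious = {"avg_duration": 5, "call_frequency": 150, "unique_contacts": 100,
              "avg_distance": 600, "circle_diversity": 8}
regular = {"avg_duration": 12, "call_frequency": 20, "unique_contacts": 15,
           "avg_distance": 40, "circle_diversity": 1}

suspicious_hits = triggered_rules(suspicious)  # ["mass_calling", "long_distance_scam"]
regular_hits = triggered_rules(regular)        # []
```

Returning rule names rather than a single boolean keeps the output explainable: the names can be shown directly in a dashboard or added as extra features for the ML layer.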

Machine Learning Models

Isolation Forest

Purpose:

  • Unsupervised anomaly detection
  • Detection of unusual telecom behavior

Random Forest

Purpose:

  • Supervised fraud classification
  • Fraud probability estimation
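The two roles can be illustrated with scikit-learn on toy data. This is a sketch under stated assumptions, not the project's training code: the two-column dataset, class sizes, and hyperparameters (`n_estimators`, `contamination`) are invented for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for the synthetic dataset: columns are (call_frequency, avg_distance).
legit = rng.normal(loc=[20, 50], scale=[5, 20], size=(450, 2))
fraud = rng.normal(loc=[150, 600], scale=[20, 80], size=(50, 2))
X = np.vstack([legit, fraud])
y = np.array([0] * 450 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Unsupervised: labels are not used. predict() returns -1 for anomalies, 1 for inliers.
iso = IsolationForest(n_estimators=100, contamination=0.1, random_state=0).fit(X_train)
anomaly_flags = iso.predict(X_test)

# Supervised: column 1 of predict_proba is the estimated probability of class 1 (fraud).
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
fraud_probability = clf.predict_proba(X_test)[:, 1]
```

The Isolation Forest can flag unusual behavior without labels, while the Random Forest needs labeled examples but yields a probability that can be thresholded into risk levels.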

📊 Model Performance

| Metric | Score |
| --- | --- |
| Accuracy | 98%+ |
| Precision | 97%+ |
| Recall | 98%+ |
| F1 Score | 98%+ |
| ROC-AUC | 99%+ |
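For reference, the threshold-based metrics above follow directly from confusion-matrix counts. The helper below is a minimal sketch; the counts passed to it are made up for illustration and are not this project's results. ROC-AUC is omitted because it depends on the ranking of predicted scores, not on a single set of counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=895)
```

With a small fraud class, accuracy can stay high even when many fraud cases are missed, which is why precision and recall are reported alongside it.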

🧠 Explainable AI

TeleSentry AI uses SHAP (SHapley Additive Explanations).

Generated explanations include:

  • SHAP Summary Plot
  • SHAP Waterfall Plot
  • Feature Importance Analysis

Top fraud indicators:

  • avgCallDistance
  • circleDiversity
  • call_intensity
  • avgDuration
  • high_freq_long_distance
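To make the idea behind these attributions concrete, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical scoring function. It does not use the `shap` library, which computes the same kind of attribution far more efficiently for real models; the toy model, its coefficients, and the baseline values are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the baseline input
        previous = model(current)
        for name in order:
            current[name] = x[name]   # switch this feature to its real value
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(f: Features) -> float:
    """Hypothetical fraud score with an interaction term (not the real model)."""
    return (0.002 * f["call_intensity"]
            + 0.3 * f["high_freq_long_distance"]
            + 0.01 * f["call_intensity"] * f["high_freq_long_distance"])


x = {"call_intensity": 30.0, "high_freq_long_distance": 1.0}
baseline = {"call_intensity": 5.0, "high_freq_long_distance": 0.0}
contributions = shapley_values(toy_score, x, baseline)
```

The contributions always sum to the difference between the score for `x` and the score for `baseline`. This additive property is what a waterfall plot visualizes, one feature at a time.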

📈 Dashboard

Interactive Streamlit dashboard provides:

Dataset Overview

  • Dataset statistics
  • Fraud distribution
  • User type analysis
  • Operator analysis

Model Analytics

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • ROC Curve
  • Confusion Matrix

Live Fraud Prediction

Predict fraud risk using telecom activity metrics.

Rule Engine Analytics

Visualize fraud intelligence triggers.

SHAP Explainability

Interpret model decisions.


🚀 FastAPI Backend

Endpoints:

Root

GET /

Health Check

GET /health

Prediction

POST /predict

Example Request:

{
  "avg_duration": 5,
  "call_frequency": 150,
  "unique_contacts": 100,
  "avg_distance": 600,
  "circle_diversity": 8
}

Example Response:

{
  "prediction": "FRAUD",
  "fraud_probability": 0.98,
  "risk_level": "CRITICAL"
}
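A client call to this endpoint could look like the sketch below, which uses only the Python standard library. The URL assumes the API is running locally on uvicorn's default host and port (`127.0.0.1:8000`); `build_predict_request` and `predict` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict"  # assumed: uvicorn default host and port


def build_predict_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (requires the API to be running)."""
    request = build_predict_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request({
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
})
# result = predict({...})  # uncomment once the API is running
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a live server.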

🛠 Installation

Clone Repository

git clone https://github.com/7vik2005/TeleSentry-AI.git

cd TeleSentry-AI

Install Dependencies

pip install -r requirements.txt

▶ Running The Project

Generate Dataset

python -m src.data_generation.generator

Apply Rule Engine

python -m src.rule_engine.rules

Train Models

python -m src.models.random_forest

Generate SHAP Explanations

python -m src.explainability.shap_explainer

Launch Dashboard

python -m streamlit run dashboard/app.py

Launch API

python -m uvicorn api.app:app --reload

📚 Technologies Used

  • Python
  • Pandas
  • NumPy
  • Scikit-Learn
  • SHAP
  • FastAPI
  • Streamlit
  • Plotly
  • Matplotlib
  • Faker

🔮 Future Enhancements

  • XGBoost Integration
  • Real Telecom Data Support
  • Real-Time Streaming Detection
  • Docker Deployment
  • Cloud Deployment
  • Automated Retraining Pipeline
  • MLOps Integration

👨‍💻 Author

Satvik Jambagi

Machine Learning | Data Science | AI Engineering


📜 License

This project is licensed under the MIT License.