📡 TeleSentry AI

Explainable Telecom Fraud Detection Platform using Machine Learning, Rule-Based Intelligence, SHAP, FastAPI, and Streamlit

📖 Overview

TeleSentry AI is an end-to-end Telecom Fraud Detection Platform designed to identify suspicious calling behavior using a combination of:

Rule-Based Fraud Intelligence
Isolation Forest Anomaly Detection
Random Forest Classification
SHAP Explainability
Interactive Streamlit Dashboard
FastAPI Prediction Service

The system simulates realistic telecom users and fraudsters, engineers behavioral features from their call activity, flags suspicious activity, explains each prediction, and exposes the results through a dashboard and an API.

🎯 Problem Statement

Telecommunication fraud has become increasingly sophisticated.

Common fraud patterns include:

Digital Arrest Scams
Mass Calling Operations
Long Distance Fraud Rings
Social Engineering Networks
Automated Calling Bots

Rule-based systems on their own tend to miss fraud patterns they were not written for, while purely machine-learning systems often lack interpretability.

TeleSentry AI combines both approaches to deliver:

High detection accuracy
Transparent predictions
Real-time fraud assessment

🏗 Architecture

┌─────────────────────────────────────────┐
│          Synthetic Data Generator       │
│        (Telecom User Simulation)        │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│           Raw Synthetic Dataset         │
│      generated_dataset.csv (13k+)       │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Data Preprocessing Layer        │
│                                         │
│ • Cleaning                              │
│ • Validation                            │
│ • Train/Test Split                      │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│        Feature Engineering Layer        │
│                                         │
│ • call_intensity                        │
│ • distance_per_call                     │
│ • contact_circle_ratio                  │
│ • delivery_pattern                      │
│ • high_freq_long_distance               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│          Rule Engine Layer              │
│                                         │
│ • Digital Arrest Detection              │
│ • Mass Calling Detection                │
│ • Long Distance Scam Detection          │
│ • Traveler Detection                    │
│ • Business User Detection               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
      ┌─────────────────────────┐
      │      ML Layer           │
      │                         │
      │ Isolation Forest        │
      │ Random Forest           │
      └───────────┬─────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│          Evaluation Layer               │
│                                         │
│ Accuracy                                │
│ Precision                               │
│ Recall                                  │
│ F1 Score                                │
│ ROC-AUC                                 │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Explainability Layer            │
│                                         │
│ SHAP Summary                            │
│ SHAP Waterfall                          │
│ Feature Importance                      │
└──────────────────┬──────────────────────┘
                   │
          ┌────────┴─────────┐
          ▼                  ▼
┌────────────────┐  ┌──────────────────┐
│ Streamlit UI   │  │   FastAPI API    │
│                │  │                  │
│ Dashboard      │  │ /predict         │
│ Analytics      │  │ /health          │
│ Live Predict   │  │ Swagger Docs     │
└────────────────┘  └──────────────────┘

System Flow

Synthetic Data Generation
          ↓
Data Preprocessing
          ↓
Feature Engineering
          ↓
Rule Engine
          ↓
Machine Learning Layer
          ↓
Evaluation Layer
          ↓
SHAP Explainability
          ↓
Streamlit Dashboard + FastAPI

📂 Project Structure

TeleSentry-AI/
│
├── api/
├── dashboard/
├── data/
├── notebooks/
├── reports/
├── saved_models/
├── src/
├── tests/
│
├── README.md
├── requirements.txt
├── requirements-lock.txt
├── LICENSE
├── VERSION
└── .env.example

⚙️ Features

Synthetic Telecom Dataset Generator

Generates realistic telecom profiles:

Legitimate Users

Delivery Partners
Business Users
Regular Subscribers
Traveling Professionals

Fraud Profiles

Digital Arrest Bots
Traditional Scammers
Low Volume Fraudsters
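
As a rough illustration of how labelled profiles like these could be simulated, the sketch below draws raw call metrics from per-profile ranges using only the standard library. The profile keys, the numeric ranges, the units, and the `is_fraud` column name are assumptions made for this example; they are not taken from the project's generator.

```python
import random

# Hypothetical (low, high) ranges per profile. Illustrative values only,
# not the settings used by the project's generator.
PROFILES = {
    "regular_subscriber": {"is_fraud": 0, "call_frequency": (5, 30),
                           "avg_duration": (60, 300), "avg_distance": (5, 80)},
    "delivery_partner": {"is_fraud": 0, "call_frequency": (60, 150),
                         "avg_duration": (10, 40), "avg_distance": (1, 20)},
    "digital_arrest_bot": {"is_fraud": 1, "call_frequency": (120, 400),
                           "avg_duration": (3, 15), "avg_distance": (300, 1500)},
}


def generate_user(profile: str, rng: random.Random) -> dict:
    """Draw one synthetic user record from the ranges of the given profile."""
    spec = PROFILES[profile]
    return {
        "user_type": profile,
        "call_frequency": rng.randint(*spec["call_frequency"]),
        "avg_duration": round(rng.uniform(*spec["avg_duration"]), 1),
        "avg_distance": round(rng.uniform(*spec["avg_distance"]), 1),
        "is_fraud": spec["is_fraud"],
    }


def generate_dataset(n: int, seed: int = 42) -> list:
    """Generate n records; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    names = list(PROFILES)
    return [generate_user(rng.choice(names), rng) for _ in range(n)]


rows = generate_dataset(1000)
```

In this sketch delivery partners also make many short calls, which shows why a legitimate high-volume profile is useful in the training data: call volume alone should not be enough to separate them from fraud profiles.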

Feature Engineering

Engineered features derived from the raw call metrics:

Feature	Description
call_intensity	Calling activity level
distance_per_call	Average call distance ratio
contact_circle_ratio	Contact diversity ratio
delivery_pattern	Delivery behavior pattern
high_freq_long_distance	Suspicious high-volume calling
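
The exact formulas behind these features are not given here, so the following is only a sketch of what such derivations might look like. The input keys are borrowed from the `/predict` example request; every formula and threshold below is an assumption chosen for illustration.

```python
def engineer_features(row: dict) -> dict:
    """Add derived features to one raw record (all definitions are assumed)."""
    calls = max(row["call_frequency"], 1)      # guard against division by zero
    circles = max(row["circle_diversity"], 1)  # guard against division by zero

    features = dict(row)
    # Assumed: overall activity as call count times average duration.
    features["call_intensity"] = row["call_frequency"] * row["avg_duration"]
    # Assumed: average distance normalised by call count.
    features["distance_per_call"] = row["avg_distance"] / calls
    # Assumed: unique contacts per distinct circle reached.
    features["contact_circle_ratio"] = row["unique_contacts"] / circles
    # Assumed: many short, local calls resemble delivery behavior.
    features["delivery_pattern"] = int(
        row["call_frequency"] >= 50
        and row["avg_duration"] <= 30
        and row["avg_distance"] <= 50
    )
    # Assumed: high call volume combined with long average distance.
    features["high_freq_long_distance"] = int(
        row["call_frequency"] > 100 and row["avg_distance"] > 500
    )
    return features


example = engineer_features({
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
})
```

With these assumed definitions, the example record gets `call_intensity` 750, `distance_per_call` 4.0, `contact_circle_ratio` 12.5, `delivery_pattern` 0, and `high_freq_long_distance` 1.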

Rule Engine

The rule engine flags known behavior patterns:

Digital Arrest Detection
Mass Calling Detection
Long Distance Scam Detection
Traveler Detection
Business User Detection
Delivery Pattern Detection
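
One common way to structure such a layer is a table of named predicates applied to each record. The sketch below follows that pattern; the rule names, the thresholds, and the idea that the traveler rule marks legitimate behavior are assumptions for illustration, not the project's actual rules.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Hypothetical rules and thresholds, keyed by a short rule name.
RULES: Dict[str, Rule] = {
    "mass_calling": lambda r: r["call_frequency"] >= 100
    and r["unique_contacts"] >= 80,
    "long_distance_scam": lambda r: r["avg_distance"] >= 500
    and r["circle_diversity"] >= 5,
    # Assumed to describe a legitimate traveler: many circles, few calls.
    "traveler": lambda r: r["circle_diversity"] >= 5
    and r["call_frequency"] < 40,
}


def apply_rules(record: dict) -> List[str]:
    """Return the names of all rules that fire for this record."""
    return [name for name, rule in RULES.items() if rule(record)]


triggered = apply_rules({
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
})
print(triggered)  # ['mass_calling', 'long_distance_scam']
```

Keeping the rules in a dictionary means each trigger can be reported by name, which suits a dashboard view of which rules fired.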

Machine Learning Models

Isolation Forest

Purpose:

Unsupervised anomaly detection
Detection of unusual telecom behavior

Random Forest

Purpose:

Supervised fraud classification
Fraud probability estimation
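
The two model roles can be sketched with scikit-learn on toy data. The two-column feature matrix, the class proportions, and hyperparameters such as `contamination=0.05` are assumptions made for this example; the project's real training code may differ.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data with two assumed columns: [call_frequency, avg_distance].
legit = rng.normal(loc=[20, 40], scale=[5, 10], size=(950, 2))
fraud = rng.normal(loc=[150, 600], scale=[20, 80], size=(50, 2))
X = np.vstack([legit, fraud])
y = np.array([0] * 950 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Unsupervised: fitted without labels. predict() returns -1 for anomalies
# and 1 for inliers.
iso = IsolationForest(contamination=0.05, random_state=0).fit(X_train)
anomaly_flags = iso.predict(X_test) == -1

# Supervised: fitted with labels. Column 1 of predict_proba is the
# probability of class 1 (fraud), following the order in clf.classes_.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
fraud_probability = clf.predict_proba(X_test)[:, 1]
```

The Isolation Forest needs no labels, so it can surface unusual behavior that was never labelled as fraud, while the Random Forest supplies a fraud probability for each record.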

📊 Model Performance

Approximate scores on the synthetic dataset (real telecom data is not yet supported):

Metric	Score
Accuracy	98%+
Precision	97%+
Recall	98%+
F1 Score	98%+
ROC-AUC	99%+
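
For reference, the first four metrics follow directly from confusion-matrix counts. The sketch below uses small made-up counts purely to show the arithmetic; they are not the project's results. ROC-AUC is not included because it depends on the ranking of predicted probabilities, not on a single confusion matrix.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=89)
```

With these toy counts, accuracy is 0.97 and precision is 0.8, while recall is 8/9 and F1 is 16/19. On an imbalanced fraud dataset, accuracy can stay high even when recall is weak, which is why all four are reported together.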

🧠 Explainable AI

TeleSentry AI uses SHAP (SHapley Additive Explanations).

Generated explanations include:

SHAP Summary Plot
SHAP Waterfall Plot
Feature Importance Analysis

Top fraud indicators:

avgCallDistance
circleDiversity
call_intensity
avgDuration
high_freq_long_distance
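
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over every feature ordering. The `toy_model` function and its inputs are hypothetical; the `shap` library uses much faster algorithms on real models, so this brute-force loop only illustrates the additivity property.

```python
from itertools import permutations


def shapley_values(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley values by enumerating all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline input
        previous = model(current)
        for name in order:
            current[name] = x[name]       # switch one feature to its real value
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f: dict) -> float:
    """Hypothetical risk score with an interaction term."""
    a, b = f["call_intensity"], f["high_freq_long_distance"]
    return 2 * a + 3 * b + a * b


x = {"call_intensity": 1, "high_freq_long_distance": 1}
baseline = {"call_intensity": 0, "high_freq_long_distance": 0}
phi = shapley_values(toy_model, x, baseline)
print(phi)  # {'call_intensity': 2.5, 'high_freq_long_distance': 3.5}
```

The two values sum to 6, which equals `toy_model(x) - toy_model(baseline)`. A waterfall plot visualizes the same decomposition: a base value plus per-feature contributions that add up to the model output.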

📈 Dashboard

The interactive Streamlit dashboard provides:

Dataset Overview

Dataset statistics
Fraud distribution
User type analysis
Operator analysis

Model Analytics

Accuracy
Precision
Recall
F1 Score
ROC Curve
Confusion Matrix

Live Fraud Prediction

Predict fraud risk using telecom activity metrics.

Rule Engine Analytics

Visualize fraud intelligence triggers.

SHAP Explainability

Interpret model decisions.

🚀 FastAPI Backend

Endpoints:

Root

GET /

Health Check

GET /health

Prediction

POST /predict

Example Request:

{
  "avg_duration": 5,
  "call_frequency": 150,
  "unique_contacts": 100,
  "avg_distance": 600,
  "circle_diversity": 8
}

Example Response:

{
  "prediction": "FRAUD",
  "fraud_probability": 0.98,
  "risk_level": "CRITICAL"
}
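
A minimal client sketch using only the standard library is shown below. It assumes the API is served locally at uvicorn's default address, `http://127.0.0.1:8000`; the helper names `build_predict_request` and `predict` are made up for this example.

```python
import json
from urllib import request

# Assumption: the API runs locally on uvicorn's default host and port.
API_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(payload: dict, url: str = API_URL) -> request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict) -> dict:
    """Send the request and decode the JSON response (needs a running API)."""
    with request.urlopen(build_predict_request(payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
}
req = build_predict_request(payload)
# result = predict(payload)  # uncomment once the API is running
```

Once the server is up, `predict(payload)` should return a dictionary with the `prediction`, `fraud_probability`, and `risk_level` keys shown in the example response.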

🛠 Installation

Clone Repository

git clone https://github.com/7vik2005/TeleSentry-AI.git

cd TeleSentry-AI

Install Dependencies

pip install -r requirements.txt

▶ Running The Project

Generate Dataset

python -m src.data_generation.generator

Apply Rule Engine

python -m src.rule_engine.rules

Train Models

python -m src.models.random_forest

Generate SHAP Explanations

python -m src.explainability.shap_explainer

Launch Dashboard

python -m streamlit run dashboard/app.py

Launch API

python -m uvicorn api.app:app --reload

📚 Technologies Used

Python
Pandas
NumPy
Scikit-Learn
SHAP
FastAPI
Streamlit
Plotly
Matplotlib
Faker

🔮 Future Enhancements

XGBoost Integration
Real Telecom Data Support
Real-Time Streaming Detection
Docker Deployment
Cloud Deployment
Automated Retraining Pipeline
MLOps Integration
