Code-r4Life/Fraud-Call-Detection

🔍 Real-Time Scam Call Detection System

Team: AUDIO

Members:

This project was developed during a national-level hackathon hosted at Jadavpur University (6–8 March).

Out of 2000+ registrations, only 30 teams were selected for the offline hackathon. Our team ranked in the top 4 in the first round (PPT submission) and progressed to the main hackathon stage.

Although we did not reach the final pitching round, the project successfully demonstrated a fully functional real-time scam call detection pipeline during the live mentor demo.

The system focuses on protecting vulnerable users (especially elderly people) from scam calls by detecting fraudulent patterns during live conversations using AI and speech analysis.


📌 1. Project Overview

This repository contains a real-time AI system capable of detecting scam calls during live conversations.

The system analyzes audio streams in 4-second chunks, extracting multiple layers of features and combining the results of four machine learning models to produce a final risk score.
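As a rough illustration of the 4-second chunking, here is a minimal sketch of a buffer that collects incoming PCM bytes and hands back complete chunks. The sample rate, the 16-bit mono format, and the `ChunkBuffer` name are assumptions made for this example; they are not taken from the project code.

```python
from typing import List

SAMPLE_RATE = 16_000      # assumed sample rate in Hz (not stated in this README)
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
CHUNK_SECONDS = 4         # chunk length used by the system
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS


class ChunkBuffer:
    """Hypothetical helper: accumulates PCM bytes and returns full chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES) -> None:
        self.chunk_bytes = chunk_bytes
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Append new bytes and return every complete chunk now available."""
        self._buffer.extend(data)
        chunks: List[bytes] = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            # Drop the bytes that were just emitted; the remainder waits
            # for the next call to feed().
            del self._buffer[: self.chunk_bytes]
        return chunks
```

Each network packet can be passed to `feed()`; the returned list is empty until at least one full chunk has arrived.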

The architecture combines:

  • Speech signal processing
  • Acoustic feature analysis
  • Prosody detection
  • Scam keyword detection
  • Speech-to-text semantic analysis

The final system outputs a real-time risk score indicating whether a call is safe or potentially fraudulent.

YouTube Pitch

Watch the video

Repository Scope

This repository focuses on the Machine Learning components of the complete Scam Call Detection System.

It includes:

  • Training pipelines for the ML models
  • Dataset preprocessing and feature extraction
  • Model experimentation notebooks
  • Inference scripts for testing models
  • Evaluation utilities

The full end-to-end system also includes:

  • Android VoIP application for capturing call audio
  • FastAPI backend server for real-time inference
  • WebSocket streaming infrastructure

Those components are maintained in the main system repository, while this repository focuses specifically on the ML research and development layer.

🔗 Full System Repository: Spam Call Detection System


โš™๏ธ 2. Key Features

โœ”๏ธ Real-time scam detection during live calls

โœ”๏ธ Multi-model AI ensemble

โœ”๏ธ Speech signal processing using MFCC features

โœ”๏ธ Urgency detection using prosody analysis

โœ”๏ธ Scam keyword detection using wakeword-style models

โœ”๏ธ Conversation stage tracking using Whisper transcription

โœ”๏ธ FastAPI backend with WebSocket streaming

โœ”๏ธ Android VoIP integration for live audio streaming


🧠 3. System Architecture

┌─────────────────┐         WebSocket          ┌──────────────────┐
│  Android App    │ ◄────────────────────────► │  Python Backend  │
│  (Kotlin)       │    4-sec audio chunks      │  (FastAPI)       │
└─────────────────┘                            └──────────────────┘
        │                                               │
        │                                               ▼
        │                                      ┌────────────────┐
        │                                      │  4 AI Models   │
        │                                      │  - Phoneme CNN │
        │                                      │  - Urgency XGB │
        │                                      │  - Repetition  │
        │                                      │  - Stage Track │
        │                                      └────────────────┘
        │                                               │
        │                                               ▼
        │                                      ┌────────────────┐
        │                                      │  Risk Score    │
        │ ◄────────────────────────────────────│  (0.0 - 1.0)   │
        │         Risk Assessment              └────────────────┘
        ▼
┌─────────────────┐
│  User Alert     │
│  ✅ SAFE        │
│  🟡 LOW RISK    │
│  🟠 MODERATE    │
│  🔴 HIGH RISK   │
│  🚨 SCAM ALERT  │
└─────────────────┘

🧩 4. Machine Learning Architecture

The detection system is composed of four independent AI layers, each focusing on a different aspect of scam behavior.


4.1 Phoneme Layer (MFCC + CNN)

This model analyzes phonetic patterns in speech.

Input:

  • 4-second audio chunk
  • 120 MFCC features

Model:

  • CNN-based binary classifier

Output:

  • Probability of scam speech patterns.

4.2 Prosody Layer (Urgency Detection)

Scammers often speak with high urgency and pressure.

This layer analyzes:

  • Pitch variation
  • Speech energy
  • Speech rate

Output score example:

1 → Normal speech
2 → Slight urgency
3 → High urgency (potential scam)
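Because the fused risk score lies between 0.0 and 1.0, the 1-3 urgency level probably needs rescaling before it is combined with the other models. The sketch below shows one simple way to do that; the linear mapping and the `urgency_to_unit_score` name are assumptions for illustration, not the project's confirmed method.

```python
# Urgency levels as listed above.
URGENCY_LABELS = {
    1: "Normal speech",
    2: "Slight urgency",
    3: "High urgency (potential scam)",
}


def urgency_to_unit_score(level: int) -> float:
    """Hypothetical helper: linearly rescale urgency level 1..3 to 0.0..1.0."""
    if level not in URGENCY_LABELS:
        raise ValueError(f"unknown urgency level: {level!r}")
    lowest, highest = min(URGENCY_LABELS), max(URGENCY_LABELS)
    return (level - lowest) / (highest - lowest)
```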

4.3 Repetition Detection Layer

This layer detects scam-related keywords using a wakeword-style detection model.

Example scam keywords:

OTP
UPI
Refund
Lottery
Reward
Emergency
Code

Since no public dataset existed for scam wakewords, we created our own dataset using recordings from friends and family.

Dataset statistics:

  • 15 keyword classes
  • 30+ recordings per class
  • ~1200 positive samples
  • Large negative dataset (Google Speech Commands + noise)

Model performance:

  • Validation Accuracy: 98.21%
  • Test Accuracy: 98.44%

4.4 Semantic Layer (Speech-to-Text Analysis)

To reduce false positives, we implemented a speech transcription stage.

Using Whisper / Whisper.cpp, the system transcribes each audio chunk and analyzes the text for scam conversation patterns.

Example scam phrases:

Hello sir, I am calling from SBI bank
Sir I have sent you an OTP
You have won a lottery
Can you confirm your card details?

The system tracks conversation stages, such as:

  1. Greeting
  2. Authority claim
  3. Problem creation
  4. Urgency
  5. Data request

Each stage increases the risk score.
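A minimal sketch of stage tracking over transcribed text is shown below. The regular expressions, the `StageTracker` class, and the rule that every newly observed stage adds an equal share to the score are all assumptions made for this example; the actual pipeline may match stages differently (for instance with sentence embeddings).

```python
import re
from typing import Dict, List, Set

# Hypothetical patterns for each conversation stage listed above.
STAGE_PATTERNS: Dict[str, List[str]] = {
    "greeting": [r"\bhello\b", r"\bgood (morning|afternoon|evening)\b"],
    "authority_claim": [r"\bcalling from\b", r"\bbank\b"],
    "problem_creation": [r"\bblocked\b", r"\bsuspended\b", r"\bwon a lottery\b"],
    "urgency": [r"\bimmediately\b", r"\bright now\b", r"\burgent\b"],
    "data_request": [r"\botp\b", r"\bcard details\b", r"\bpin\b"],
}


class StageTracker:
    """Remembers which stages have appeared so far in one call."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def update(self, transcript: str) -> float:
        """Scan one chunk's transcript and return a score in [0, 1].

        The score is the fraction of stages observed so far, so each
        new stage raises it by the same amount (an assumed rule).
        """
        text = transcript.lower()
        for stage, patterns in STAGE_PATTERNS.items():
            if any(re.search(pattern, text) for pattern in patterns):
                self.seen.add(stage)
        return len(self.seen) / len(STAGE_PATTERNS)
```

One tracker instance would be kept per call and updated with each chunk's transcript, so the score only grows as the conversation moves through more stages.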


⚡ 5. Risk Fusion Engine

Outputs from all four models are combined using a weighted linear fusion.

Final Risk Score =
w1 × semantic_score +
w2 × repetition_score +
w3 × phoneme_score +
w4 × urgency_score

The result is mapped into risk categories:

✅ SAFE
🟡 LOW RISK
🟠 MODERATE RISK
🔴 HIGH RISK
🚨 SCAM ALERT
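The fusion and category mapping can be sketched as follows. The weight values and category thresholds here are placeholders chosen for illustration, since this README does not state the real ones; the only property assumed is that each model score lies in [0, 1] and the weights sum to 1, which keeps the fused score in [0, 1].

```python
from typing import Dict

# Hypothetical weights (w1..w4); they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "semantic": 0.40,
    "repetition": 0.25,
    "phoneme": 0.20,
    "urgency": 0.15,
}

# Hypothetical upper bounds for each category, checked in order.
THRESHOLDS = [
    (0.2, "SAFE"),
    (0.4, "LOW RISK"),
    (0.6, "MODERATE RISK"),
    (0.8, "HIGH RISK"),
]


def fuse(scores: Dict[str, float]) -> float:
    """Weighted linear fusion of the four model scores, clamped to [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing model scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return min(1.0, max(0.0, total))


def categorize(risk: float) -> str:
    """Map a fused risk score to one of the five categories."""
    for upper_bound, label in THRESHOLDS:
        if risk < upper_bound:
            return label
    return "SCAM ALERT"
```

For example, `categorize(fuse({"semantic": 0.5, "repetition": 0.5, "phoneme": 0.5, "urgency": 0.5}))` falls in the MODERATE RISK band under these placeholder thresholds.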

📡 6. Real-Time Audio Pipeline

The backend processes audio using the following pipeline:

  1. Receive 4-second audio chunk
  2. Apply WebRTC Voice Activity Detection
  3. Remove silence
  4. Extract audio features
  5. Run models in parallel
  6. Combine results
  7. Send risk prediction back to mobile app
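To illustrate steps 2 and 3, the sketch below drops silent frames using a plain RMS energy threshold. This is a deliberately simplified stand-in and not WebRTC VAD itself, which uses a trained speech model. The sample rate, frame length, threshold, and function names are assumptions made for this example.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000                              # assumed sample rate (Hz)
FRAME_MS = 30                                     # assumed frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit samples."""
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def remove_silence(samples: List[int], threshold: float = 500.0) -> List[int]:
    """Keep only frames whose RMS energy reaches the (assumed) threshold.

    A trailing partial frame is discarded to keep the sketch short.
    """
    voiced: List[int] = []
    last_start = len(samples) - FRAME_SAMPLES
    for start in range(0, last_start + 1, FRAME_SAMPLES):
        frame = samples[start : start + FRAME_SAMPLES]
        if frame_rms(frame) >= threshold:
            voiced.extend(frame)
    return voiced
```

In the real backend this function would be replaced by per-frame decisions from WebRTC VAD, with the same idea of keeping voiced frames and discarding the rest before feature extraction.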

Running the models concurrently, rather than one after another, reduces per-chunk latency and improves real-time performance.
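One way to run blocking model calls concurrently inside an async backend is `asyncio.to_thread` combined with `asyncio.gather` (Python 3.9+). The four model functions below are hypothetical placeholders that return fixed scores; in the real system each would wrap an actual inference call.

```python
import asyncio
from typing import Callable, Dict


# Hypothetical placeholders standing in for the four model inference calls.
def phoneme_model(chunk: bytes) -> float:
    return 0.1


def urgency_model(chunk: bytes) -> float:
    return 0.2


def repetition_model(chunk: bytes) -> float:
    return 0.3


def semantic_model(chunk: bytes) -> float:
    return 0.4


MODELS: Dict[str, Callable[[bytes], float]] = {
    "phoneme": phoneme_model,
    "urgency": urgency_model,
    "repetition": repetition_model,
    "semantic": semantic_model,
}


async def score_chunk(chunk: bytes) -> Dict[str, float]:
    """Run every model on the same chunk in worker threads and collect scores."""
    names = list(MODELS)
    results = await asyncio.gather(
        *(asyncio.to_thread(MODELS[name], chunk) for name in names)
    )
    # gather() preserves argument order, so names and results line up.
    return dict(zip(names, results))


if __name__ == "__main__":
    print(asyncio.run(score_chunk(b"\x00" * 16)))
```

With this layout the time per chunk is roughly that of the slowest model rather than the sum of all four, provided the underlying inference calls spend their time in native code or I/O.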


📁 7. Repository Structure

FRAUD-CALL-DETECTION/
│
├── features/                         # Raw audio features
│   ├── NORMAL_CALLS/                 # Normal call features
│   └── SCAM_CALLS/                   # Scam call features
│
├── mfcc_labels.csv                   # MFCC feature labels
├── prosody_labels.csv                # Prosody feature labels
│
├── models/                           # Trained ML models
│   ├── best_phoneme_model.keras
│   ├── best_prosody_xgb_model.pkl
│   └── best_repetition_model.keras
│
├── notebooks/                        # Training notebooks
│   ├── phoneme_layer.ipynb
│   ├── prosody_layer.ipynb
│   ├── repetition_layer.ipynb
│   └── scam_detection_whisper_pipeline.ipynb
│
├── rep_features/                     # Repetition detection features
│   ├── train/                        # Train features
│   ├── val/                          # Validation features
│   └── test/                         # Test features
│
├── train_labels.csv                  # Train labels
├── val_labels.csv                    # Validation labels
├── test_labels.csv                   # Test labels
│
├── scam_detection/                   # Core inference pipeline
│   ├── __init__.py
│   ├── audio_pipeline.py
│   ├── config.py
│   ├── feature_extraction.py
│   └── main.py
│
├── convert.py                        # Dataset conversion utilities
├── preprocessing.py                  # Dataset preprocessing
├── repetition_preprocessing.py       # Repetition dataset preprocessing
├── realtime_whisper.py               # Whisper-based transcription pipeline
├── final_inference.py                # End-to-end inference testing
│
├── ASA_Meeting_Spam_Detection_in_Voice.pdf  # Research reference
│
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md

📊 8. Technologies Used

Backend

  • FastAPI
  • TensorFlow / Keras
  • XGBoost
  • Librosa
  • WebRTC VAD
  • Whisper.cpp
  • Sentence Transformers
  • NumPy / Pandas

โš ๏ธ Android components are part of the full system repository.


🔮 9. Future Improvements

  • Multi-language scam detection (Hindi + regional languages)
  • On-device inference with TensorFlow Lite
  • Larger scam speech dataset
  • Sentiment-based conversation analysis
  • Integration with telecom spam detection systems

📬 Interested in a Similar Project?

I build smart, ML-integrated applications and responsive web platforms. Let's build something powerful together!

📧 shinjansaha00@gmail.com

🔗 LinkedIn Profile
