Members:
- Shinjan Saha - AI/ML Development
- Arya Gupta - All-rounder (Team Lead + System Design + contributions across all other areas)
- Srijan Sarkar - Android App Development
- Pritam Paul - AI/ML Development
This project was developed during a national-level hackathon hosted at Jadavpur University (6–8 March).
Out of 2000+ registrations, only 30 teams were selected for the offline hackathon. Our team ranked in the top 4 in the first round (PPT submission) and progressed to the main hackathon stage.
Although we did not reach the final pitching round, the project successfully demonstrated a fully functional real-time scam call detection pipeline during the live mentor demo.
The system focuses on protecting vulnerable users (especially elderly people) from scam calls by detecting fraudulent patterns during live conversations using AI and speech analysis.
This repository contains a real-time AI system capable of detecting scam calls during live conversations.
The system analyzes audio streams in 4-second chunks, extracting multiple layers of features and combining the results of four machine learning models to produce a final risk score.
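The chunking step can be sketched as follows. This is a minimal illustration, not the repository's code: the 16 kHz sample rate, the zero-padding of a short trailing chunk, and the name `iter_chunks` are all assumptions.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz; not stated in this README
CHUNK_SECONDS = 4     # chunk length used by the system


def iter_chunks(
    samples: Iterable[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> Iterator[List[float]]:
    """Yield consecutive fixed-length chunks from a stream of samples.

    A short trailing chunk is zero-padded to full length (an assumption;
    the real pipeline may drop or buffer it instead).
    """
    chunk_len = sample_rate * chunk_seconds
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_len:
            yield buffer
            buffer = []
    if buffer:
        buffer.extend([0.0] * (chunk_len - len(buffer)))
        yield buffer
```

Each yielded chunk would then be passed to feature extraction and the four models.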
The architecture combines:
- Speech signal processing
- Acoustic feature analysis
- Prosody detection
- Scam keyword detection
- Speech-to-text semantic analysis
The final system outputs a real-time risk score indicating whether a call is safe or potentially fraudulent.
This repository focuses on the Machine Learning components of the complete Scam Call Detection System.
It includes:
- Training pipelines for the ML models
- Dataset preprocessing and feature extraction
- Model experimentation notebooks
- Inference scripts for testing models
- Evaluation utilities
The full end-to-end system also includes:
- Android VoIP application for capturing call audio
- FastAPI backend server for real-time inference
- WebSocket streaming infrastructure
Those components are maintained in the main system repository, while this repository focuses specifically on the ML research and development layer.
Full System Repository: Spam Call Detection System
✔️ Real-time scam detection during live calls
✔️ Multi-model AI ensemble
✔️ Speech signal processing using MFCC features
✔️ Urgency detection using prosody analysis
✔️ Scam keyword detection using wakeword-style models
✔️ Conversation stage tracking using Whisper transcription
✔️ FastAPI backend with WebSocket streaming
✔️ Android VoIP integration for live audio streaming
┌─────────────────┐         WebSocket          ┌──────────────────┐
│   Android App   │ ─────────────────────────► │  Python Backend  │
│    (Kotlin)     │     4-sec audio chunks     │    (FastAPI)     │
└─────────────────┘                            └──────────────────┘
         │                                              │
         │                                              ▼
         │                                     ┌────────────────┐
         │                                     │  4 AI Models   │
         │                                     │ - Phoneme CNN  │
         │                                     │ - Urgency XGB  │
         │                                     │ - Repetition   │
         │                                     │ - Stage Track  │
         │                                     └────────────────┘
         │                                              │
         │                                              ▼
         │                                     ┌────────────────┐
         │                                     │   Risk Score   │
         │ ◄────────────────────────────────── │  (0.0 - 1.0)   │
         │          Risk Assessment            └────────────────┘
         ▼
┌─────────────────┐
│   User Alert    │
│ ✅ SAFE         │
│ 🟡 LOW RISK     │
│ 🟠 MODERATE     │
│ 🔴 HIGH RISK    │
│ 🚨 SCAM ALERT   │
└─────────────────┘
The detection system is composed of four independent AI layers, each focusing on a different aspect of scam behavior.
This model analyzes phonetic patterns in speech.
Input:
- 4-second audio chunk
- 120 MFCC features
Model:
- CNN-based binary classifier
Output:
- Probability of scam speech patterns.
Scammers often speak with high urgency and pressure.
This layer analyzes:
- Pitch variation
- Speech energy
- Speech rate
Output score example:
1 → Normal speech
2 → Slight urgency
3 → High urgency (potential scam)
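As a rough illustration of the kind of prosody features such a layer could feed to its classifier, the sketch below computes frame-level energy and a simple autocorrelation pitch estimate in plain Python. It is not the repository's feature extractor: the function names, the 50 ms frame size, and the 75 to 400 Hz pitch range are assumptions, and speech rate is left out because it needs syllable or word timing.

```python
import math
import statistics
from typing import Dict, List


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_pitch_hz(frame: List[float], sample_rate: int,
                      fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Rough pitch estimate: strongest autocorrelation lag within [fmin, fmax].

    Returns 0.0 when no positive correlation is found (e.g. silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


def prosody_features(samples: List[float], sample_rate: int,
                     frame_ms: int = 50) -> Dict[str, float]:
    """Mean energy and pitch variation (standard deviation) over a chunk."""
    frame_len = sample_rate * frame_ms // 1000
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    pitches = [p for p in (estimate_pitch_hz(f, sample_rate) for f in frames) if p > 0]
    energies = frame_rms(samples, frame_len)
    return {
        "mean_energy": statistics.fmean(energies) if energies else 0.0,
        "pitch_std_hz": statistics.pstdev(pitches) if len(pitches) > 1 else 0.0,
    }
```

A feature dictionary like this would be turned into a vector and scored by the trained classifier; a production extractor would more likely rely on a library such as Librosa than on pure-Python loops.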
This layer detects scam-related keywords using a wakeword-style detection model.
Example scam keywords:
OTP
UPI
Refund
Lottery
Reward
Emergency
Code
Since no public dataset existed for scam wakewords, we created our own dataset using recordings from friends and family.
Dataset statistics:
- 15 keyword classes
- 30+ recordings per class
- ~1200 positive samples
- Large negative dataset (Google Speech Commands + noise)
Model performance:
- Validation Accuracy: 98.21%
- Test Accuracy: 98.44%
To reduce false positives, we implemented a speech transcription stage.
Using Whisper / Whisper.cpp, the system transcribes each audio chunk and analyzes the text for scam conversation patterns.
Example scam phrases:
Hello sir, I am calling from SBI bank
Sir I have sent you an OTP
You have won a lottery
Can you confirm your card details?
The system tracks conversation stages, such as:
- Greeting
- Authority claim
- Problem creation
- Urgency
- Data request
Each stage increases the risk score.
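The stage-tracking idea can be sketched with simple pattern matching over the transcribed text. The regular expressions, the `StageTracker` class, and the "fraction of stages seen" scoring below are illustrative assumptions; the repository's semantic analysis may work differently (for example, with sentence embeddings).

```python
import re
from typing import Dict, List, Set

# Hypothetical patterns per conversation stage, loosely based on the
# example phrases above. Real rules would need to be far broader.
STAGE_PATTERNS: Dict[str, List[str]] = {
    "greeting": [r"\bhello\b", r"\bgood (morning|afternoon|evening)\b"],
    "authority_claim": [r"\bcalling from\b", r"\bbank\b", r"\bofficer\b"],
    "problem_creation": [r"\bblocked\b", r"\bsuspended\b", r"\bwon a lottery\b"],
    "urgency": [r"\bimmediately\b", r"\bright now\b", r"\burgent\b"],
    "data_request": [r"\botp\b", r"\bcard details\b", r"\bpin\b"],
}


class StageTracker:
    """Accumulates the conversation stages seen so far across chunks."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def update(self, transcript: str) -> float:
        """Record stages matched in this transcript and return the new score."""
        text = transcript.lower()
        for stage, patterns in STAGE_PATTERNS.items():
            if any(re.search(pattern, text) for pattern in patterns):
                self.seen.add(stage)
        return self.score

    @property
    def score(self) -> float:
        """Fraction of known stages observed so far, in [0.0, 1.0]."""
        return len(self.seen) / len(STAGE_PATTERNS)


tracker = StageTracker()
tracker.update("Hello sir, I am calling from SBI bank")  # greeting + authority claim
tracker.update("Sir I have sent you an OTP")             # data request
```

Because the tracker keeps state between chunks, the score only rises as more stages of a typical scam script appear over the course of a call.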
Outputs from all four models are combined using a weighted linear fusion.
Final Risk Score =
w1 × semantic_score +
w2 × repetition_score +
w3 × phoneme_score +
w4 × urgency_score
The result is mapped into risk categories:
✅ SAFE
🟡 LOW RISK
🟠 MODERATE RISK
🔴 HIGH RISK
🚨 SCAM ALERT
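A minimal sketch of the fusion and category mapping is shown below. The weight values and category thresholds are hypothetical placeholders, since the README does not state them, and each model score is assumed to be normalized to the range 0.0 to 1.0 before fusion.

```python
from typing import Dict, List, Tuple

# Hypothetical weights (w1..w4); they sum to 1 so the fused score stays in [0, 1].
WEIGHTS: Dict[str, float] = {
    "semantic": 0.35,
    "repetition": 0.25,
    "phoneme": 0.25,
    "urgency": 0.15,
}

# Hypothetical upper bounds for each category, checked in order.
THRESHOLDS: List[Tuple[float, str]] = [
    (0.2, "SAFE"),
    (0.4, "LOW RISK"),
    (0.6, "MODERATE RISK"),
    (0.8, "HIGH RISK"),
]


def fuse(scores: Dict[str, float]) -> float:
    """Weighted linear fusion of per-model scores (each assumed in [0, 1])."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def categorize(risk: float) -> str:
    """Map a fused risk score to one of the five alert categories."""
    for upper_bound, label in THRESHOLDS:
        if risk < upper_bound:
            return label
    return "SCAM ALERT"


risk = fuse({"semantic": 0.9, "repetition": 0.8, "phoneme": 0.7, "urgency": 1.0})
label = categorize(risk)
```

In practice the weights would be tuned on validation data so that the more reliable layers dominate the final score.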
The backend processes audio using the following pipeline:
- Receive 4-second audio chunk
- Apply WebRTC Voice Activity Detection
- Remove silence
- Extract audio features
- Run models in parallel
- Combine results
- Send risk prediction back to mobile app
Running models asynchronously reduces latency and improves real-time performance.
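The pipeline above can be sketched with `asyncio`, running each model in a worker thread so that the slowest model, rather than the sum of all four, bounds the latency. Everything here is a stand-in: the amplitude gate replaces WebRTC VAD, and the four model functions are placeholders returning fixed values instead of real predictions.

```python
import asyncio
from typing import Callable, Dict, List

Chunk = List[float]


def remove_silence(chunk: Chunk, threshold: float = 0.01) -> Chunk:
    """Crude amplitude gate used here in place of WebRTC VAD (assumption)."""
    return [x for x in chunk if abs(x) >= threshold]


# Placeholder models: each takes a chunk and returns a score in [0, 1].
def phoneme_model(chunk: Chunk) -> float:
    return 0.5


def urgency_model(chunk: Chunk) -> float:
    return 0.25


def repetition_model(chunk: Chunk) -> float:
    return 0.75


def semantic_model(chunk: Chunk) -> float:
    return 1.0


MODELS: Dict[str, Callable[[Chunk], float]] = {
    "phoneme": phoneme_model,
    "urgency": urgency_model,
    "repetition": repetition_model,
    "semantic": semantic_model,
}


async def score_chunk(chunk: Chunk) -> Dict[str, float]:
    """Drop silence, then run all models concurrently in worker threads."""
    voiced = remove_silence(chunk)
    if not voiced:
        # Nothing but silence: skip inference entirely.
        return {name: 0.0 for name in MODELS}
    results = await asyncio.gather(
        *(asyncio.to_thread(model, voiced) for model in MODELS.values())
    )
    return dict(zip(MODELS, results))


scores = asyncio.run(score_chunk([0.0, 0.5, -0.5, 0.0]))
```

`asyncio.to_thread` requires Python 3.9 or later. Threads help most when the models release the GIL during inference; otherwise a process pool may be a better fit.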
FRAUD-CALL-DETECTION/
│
├── features/                        # Raw audio features
│   ├── NORMAL_CALLS/                # Normal call features
│   └── SCAM_CALLS/                  # Scam call features
│
├── mfcc_labels.csv                  # MFCC feature labels
├── prosody_labels.csv               # Prosody feature labels
│
├── models/                          # Trained ML models
│   ├── best_phoneme_model.keras
│   ├── best_prosody_xgb_model.pkl
│   └── best_repetition_model.keras
│
├── notebooks/                       # Training notebooks
│   ├── phoneme_layer.ipynb
│   ├── prosody_layer.ipynb
│   ├── repetition_layer.ipynb
│   └── scam_detection_whisper_pipeline.ipynb
│
├── rep_features/                    # Repetition detection features
│   ├── train/                       # Train features
│   ├── val/                         # Validation features
│   └── test/                        # Test features
│
├── train_labels.csv                 # Train labels
├── val_labels.csv                   # Validation labels
├── test_labels.csv                  # Test labels
│
├── scam_detection/                  # Core inference pipeline
│   ├── __init__.py
│   ├── audio_pipeline.py
│   ├── config.py
│   ├── feature_extraction.py
│   └── main.py
│
├── convert.py                       # Dataset conversion utilities
├── preprocessing.py                 # Dataset preprocessing
├── repetition_preprocessing.py      # Repetition dataset preprocessing
├── realtime_whisper.py              # Whisper-based transcription pipeline
├── final_inference.py               # End-to-end inference testing
│
├── ASA_Meeting_Spam_Detection_in_Voice.pdf   # Research reference
│
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
Tech stack:
- FastAPI
- TensorFlow / Keras
- XGBoost
- Librosa
- WebRTC VAD
- Whisper.cpp
- Sentence Transformers
- NumPy / Pandas
Future improvements:
- Multi-language scam detection (Hindi + regional languages)
- On-device inference with TensorFlow Lite
- Larger scam speech dataset
- Sentiment-based conversation analysis
- Integration with telecom spam detection systems
I build smart, ML-integrated applications and responsive web platforms. Let's build something powerful together!
LinkedIn Profile
