


Code-With-Samuel/Spam_Email_Classifier


Email Spam Classifier

A machine learning project for detecting and classifying spam messages using Python, built with a Jupyter Notebook for interactive exploration and model development.

Overview

This project implements an SMS/email spam detection system using machine learning. It covers exploratory data analysis, feature engineering, model training, evaluation, and visualization.

Features

  • Data Analysis: Exploratory data analysis (EDA) of spam and ham (legitimate) messages
  • Text Preprocessing: Tokenization, cleaning, and vectorization of text data
  • Multiple Models: Training and comparison of several classification algorithms
  • Visualization: Plots of data distributions and model performance
  • Model Evaluation: Metrics including confusion matrices, ROC curves, and side-by-side performance comparisons
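The cleaning and tokenization step can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the stop-word list, the `urltoken`/`numtoken` placeholders, and the function name `clean_text` are all hypothetical choices.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the notebook may use a different list (or none at all).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "and", "of", "for", "in"}

# Hypothetical normalisation rules: collapse every URL and every run of digits
# into a single placeholder token so the vocabulary stays small.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\d+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(message: str) -> list[str]:
    """Lowercase, replace URLs/numbers, strip punctuation, split, drop stop words."""
    text = message.lower()
    text = URL_RE.sub(" urltoken ", text)   # URLs first, so their digits are not split off
    text = NUM_RE.sub(" numtoken ", text)   # phone numbers, prize amounts, short codes
    text = NON_ALNUM_RE.sub(" ", text)      # punctuation and symbols become spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

Because scikit-learn's `TfidfVectorizer` accepts a callable for its `analyzer` parameter, a function like this could be plugged in as `TfidfVectorizer(analyzer=clean_text)` to cover the vectorization step as well.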

Project Structure

email_spam_classifier/
├── sms-spam-detection.ipynb   # Main Jupyter notebook with full analysis and modeling
├── spam.csv                   # Dataset containing SMS messages labeled as spam/ham
├── requirements.txt           # Python dependencies
├── pyproject.toml             # Project configuration
├── hello.py                   # Basic project entry point
└── README.md                  # This file

Dataset

The project uses the spam.csv file containing SMS messages with labels:

  • Ham: Legitimate messages
  • Spam: Unsolicited or unwanted messages
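A minimal sketch for loading the dataset and taking a first EDA look with pandas. It assumes spam.csv follows the widely circulated Kaggle "SMS Spam Collection" layout (label in column `v1`, message in column `v2`, Latin-1 encoded); check the file's header and adjust if it differs. The helper names `load_spam_csv` and `summarize` are hypothetical.

```python
import pandas as pd


def load_spam_csv(source) -> pd.DataFrame:
    """Load spam.csv (a path or binary file-like object) into 'label' and 'text' columns.

    Assumption: labels are in column 'v1' and messages in 'v2', Latin-1 encoded.
    Labels are mapped to 0 = ham, 1 = spam so that spam is the positive class.
    """
    df = pd.read_csv(source, encoding="latin-1", usecols=["v1", "v2"])
    df = df.rename(columns={"v1": "label", "v2": "text"})
    df["label"] = df["label"].str.strip().str.lower().map({"ham": 0, "spam": 1})
    # Rows with an unrecognised label or an empty message are dropped.
    return df.dropna(subset=["label", "text"]).astype({"label": int})


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-class message count and mean message length in characters."""
    lengths = df["text"].str.len()
    return df.assign(length=lengths).groupby("label")["length"].agg(["count", "mean"])
```

Usage would be `df = load_spam_csv("spam.csv")` followed by `summarize(df)`, which shows both the class balance and whether spam and ham messages differ in typical length.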

Getting Started

Prerequisites

  • Python 3.10 or higher
  • pip or conda for package management

Installation

  1. Clone or download this repository
  2. Install dependencies:
       pip install -r requirements.txt
  3. Launch Jupyter Notebook:
       jupyter notebook
  4. Open sms-spam-detection.ipynb and run the cells

Dependencies

Core dependencies:

  • notebook - Jupyter notebook support
  • pandas - Data manipulation
  • scikit-learn - Machine learning algorithms
  • matplotlib/seaborn - Data visualization
  • numpy - Numerical computing

See requirements.txt for the complete list.

Notebook Contents

The sms-spam-detection.ipynb notebook includes:

  1. Data Loading & Exploration - Load and examine the dataset structure
  2. Exploratory Data Analysis - Distribution analysis, text statistics
  3. Data Preprocessing - Cleaning, tokenization, and vectorization
  4. Model Training - Multiple classifier implementations
  5. Model Evaluation - Performance metrics and comparisons
  6. Visualization - Confusion matrices, ROC curves, feature importance
  7. Predictions - Making predictions on new messages
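The evaluation step can be illustrated with `sklearn.metrics`. This sketch assumes spam is encoded as 1 and ham as 0; the `evaluate` helper is hypothetical, and the notebook may report a different set of metrics.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score


def evaluate(y_true, y_pred, y_score) -> dict:
    """Headline metrics for a binary spam classifier (1 = spam, 0 = ham).

    y_pred holds hard 0/1 predictions; y_score holds spam probabilities or
    decision-function values, which ROC AUC needs instead of hard labels.
    """
    return {
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        # Of the messages flagged as spam, how many really were spam.
        "precision": precision_score(y_true, y_pred, pos_label=1),
        # Of the real spam messages, how many were caught.
        "recall": recall_score(y_true, y_pred, pos_label=1),
        "roc_auc": roc_auc_score(y_true, y_score),
    }
```

Precision and recall are reported alongside ROC AUC because ham messages usually far outnumber spam, so plain accuracy can look high even for a weak spam detector. For the plots, scikit-learn (1.0 and later) provides `ConfusionMatrixDisplay.from_predictions` and `RocCurveDisplay.from_predictions`, which draw with matplotlib.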

Usage

Run the Jupyter notebook cells sequentially to:

  1. Load and explore the spam dataset
  2. Preprocess text data
  3. Train multiple classification models
  4. Evaluate and compare model performance
  5. Generate visualizations and insights
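The train, compare, and predict workflow can be sketched with scikit-learn pipelines. The two models here (multinomial Naive Bayes and logistic regression on TF-IDF features) are common baselines for this task, not necessarily the classifiers the notebook uses; the six toy messages and the helper names `build_models`, `train_all`, and `predict_messages` are hypothetical stand-ins.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data standing in for spam.csv (1 = spam, 0 = ham).
TEXTS = [
    "WIN a FREE prize now, call to claim",
    "Free entry: text WIN to claim your cash prize",
    "Urgent! You have won a free holiday, claim now",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class",
    "I'll be home late, see you tonight",
]
LABELS = [1, 1, 1, 0, 0, 0]


def build_models() -> dict:
    """Candidate classifiers, each with its own TF-IDF vectorizer."""
    return {
        "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "log_reg": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    }


def train_all(texts, labels) -> dict:
    """Fit every candidate pipeline on the same training data."""
    models = build_models()
    for model in models.values():
        model.fit(texts, labels)
    return models


def predict_messages(model, messages) -> list:
    """Map the model's 0/1 output back to the dataset's 'ham'/'spam' labels."""
    return ["spam" if p == 1 else "ham" for p in model.predict(messages)]
```

On the real dataset, you would hold out a test set first, for example with `train_test_split(texts, labels, test_size=0.2, stratify=labels, random_state=42)`, fit each pipeline on the training part, and compare the models on the held-out part. Keeping the vectorizer inside each pipeline means the TF-IDF vocabulary is learned from training data only.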

Configuration

Project settings are defined in pyproject.toml:

  • Project name: email-spam-classifier
  • Version: 0.1.0
  • Python requirement: >=3.10
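To read these settings programmatically, here is a small sketch that assumes they live in the standard PEP 621 `[project]` table (`name`, `version`, `requires-python`). `tomllib` is in the standard library from Python 3.11; because the project also supports 3.10, the sketch falls back to the third-party `tomli` package, which has the same API. The function name `read_project_metadata` is hypothetical.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.10: third-party 'tomli' offers the same load() API
    import tomli as tomllib


def read_project_metadata(path: str = "pyproject.toml") -> dict:
    """Return the name, version, and Python requirement from a pyproject.toml."""
    with open(path, "rb") as fh:  # tomllib.load requires a binary file handle
        project = tomllib.load(fh)["project"]
    return {
        "name": project["name"],
        "version": project["version"],
        "requires_python": project.get("requires-python"),
    }
```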

License

This project is provided as-is for educational purposes.

Contributing

Feel free to fork, modify, and improve this project for your learning purposes.


Note: This is a learning/development project. The notebook contains experimental code and iterative model development.
