AI-Powered Hybrid Intrusion Detection System (IDS)

A State-of-the-Art Hybrid Machine Learning Pipeline for Real-Time Network Traffic Classification and Zero-Day Anomaly Detection.

📌 Project Overview

Modern network infrastructures are constantly exposed to sophisticated cyber threats. Traditional Signature-based Intrusion Detection Systems (SIDS) fail against novel, zero-day attacks, while Anomaly-based Intrusion Detection Systems (AIDS) tend to suffer from high false-alarm rates.

This project implements an AI-Powered Hybrid Intrusion Detection System (IDS) that harmonizes supervised classification and unsupervised anomaly detection:

Supervised Classification (XGBoost): Trained on labeled traffic, it assigns incoming flows to known attack classes (e.g., DDoS, Brute Force, Port Scan). It acts as a learned signature engine with high precision and low latency.
Unsupervised Anomaly Detection (PyTorch Autoencoder): Learns to reconstruct normal traffic patterns. Flows whose reconstruction error (Mean Squared Error) exceeds a threshold calibrated on benign traffic are flagged as potential zero-day anomalies.
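Written out for a single flow with $d$ scaled features $x = (x_1, \dots, x_d)$ and autoencoder output $\hat{x}$, the anomaly score and decision rule look like this. This is a sketch that assumes an unweighted mean over features and one global threshold $\tau$:

```latex
\text{MSE}(x) = \frac{1}{d} \sum_{i=1}^{d} \left( x_i - \hat{x}_i \right)^2,
\qquad
\text{anomaly}(x) =
\begin{cases}
1 & \text{if } \text{MSE}(x) > \tau \\
0 & \text{otherwise}
\end{cases}
```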

📐 System Architecture Diagram

graph TD
    %% Define Styles
    classDef inputStyle fill:#e1f5fe,stroke:#0288d1,stroke-width:2px,font-weight:bold;
    classDef preprocessStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,font-weight:bold;
    classDef supervisedStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,font-weight:bold;
    classDef unsupervisedStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,font-weight:bold;
    classDef stStyle fill:#eceff1,stroke:#455a64,stroke-width:2px,font-weight:bold;
    classDef outputStyle fill:#ffebee,stroke:#c62828,stroke-width:2px,font-weight:bold;

    %% Nodes
    A["Raw Network Traffic <br>(CIC-IDS2017 CSV)"]:::inputStyle
    
    B["Data Preprocessing <br>(SimpleImputer & StandardScaler)"]:::preprocessStyle
    
    %% Split Parallel Paths
    subgraph Supervised Pipeline
        C1["XGBoost Classifier"]:::supervisedStyle
        C2["Signature Detection"]:::supervisedStyle
        C3["Predicts 'Known Attacks'"]:::supervisedStyle
    end
    
    subgraph Unsupervised Pipeline
        D1["PyTorch Autoencoder"]:::unsupervisedStyle
        D2["Reconstruction Error (MSE)"]:::unsupervisedStyle
        D3["Predicts 'Zero-Day Anomalies'"]:::unsupervisedStyle
    end
    
    E["Streamlit Hybrid Dashboard <br>(Decision Consolidation)"]:::stStyle
    
    F["Final Alert: BENIGN or THREAT"]:::outputStyle

    %% Connections
    A --> B
    B -->|"Parallel Flow"| C1
    B -->|"Parallel Flow"| D1
    
    C1 --> C2 --> C3
    D1 --> D2 --> D3
    
    C3 --> E
    D3 --> E
    
    E --> F
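The diagram does not specify the consolidation rule used by the dashboard. One plausible rule raises a THREAT when either engine alerts, and it checks the classifier first so that known attacks keep their class name. The sketch below assumes that rule. The function and field names are hypothetical and do not come from app.py. It also assumes the benign class is labeled "BENIGN", as it is in the CIC-IDS2017 CSVs.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed benign class name (CIC-IDS2017 convention); the real label set
# is stored in label_encoder.pkl.
BENIGN_LABEL = "BENIGN"


@dataclass
class HybridVerdict:
    alert: str               # "BENIGN" or "THREAT"
    source: Optional[str]    # engine that raised the alert, or None
    detail: str              # human-readable explanation


def consolidate(xgb_label: str, recon_mse: float, threshold: float) -> HybridVerdict:
    """Combine both engines with an OR rule (hypothetical, not taken from app.py)."""
    if xgb_label != BENIGN_LABEL:
        # The supervised engine recognised a known attack class.
        return HybridVerdict("THREAT", "xgboost", f"known attack: {xgb_label}")
    if recon_mse > threshold:
        # The classifier saw nothing familiar, but the autoencoder could not
        # reconstruct the flow well: treat it as a possible zero-day.
        return HybridVerdict(
            "THREAT",
            "autoencoder",
            f"anomaly: MSE {recon_mse:.4f} > threshold {threshold:.4f}",
        )
    return HybridVerdict("BENIGN", None, "no engine raised an alert")
```

For example, `consolidate("BENIGN", 0.613365, 0.180529)` returns a THREAT attributed to the autoencoder, because that MSE (the reported average for malicious traffic) is well above the decision threshold.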

📂 Project Structure

Below is the directory structure showing the clean isolation of data pipelines, model weights, notebook experiments, and modular source code:

├── data/
│   ├── raw/
│   │   └── dataset.csv                 # Original raw network traffic captures
│   └── processed/
│       └── clean_traffic.csv           # Normalized and engineered baseline traffic
├── models/
│   ├── autoencoder.pth                 # Trained PyTorch Autoencoder weights
│   ├── xgb_model.pkl                   # Trained XGBoost Booster object
│   ├── preprocessor.pkl                # Fitted preprocessing artifact (imputer + feature scaler)
│   └── label_encoder.pkl               # LabelEncoder mapping for known classes
├── notebooks/
│   └── exploration.ipynb               # Jupyter notebook containing EDA & modeling experiments
├── src/
│   ├── data_pipeline.py                # Preprocessing, normalization, and split scripts
│   ├── train_supervised.py             # Supervised XGBoost training routines
│   ├── train_anomaly.py                # PyTorch Autoencoder training execution
│   └── evaluate.py                     # Performance testing & validation framework
├── app.py                              # Hardware-aware Streamlit web dashboard
├── Dockerfile                          # Production container specification
├── requirements.txt                    # Project dependency specification
├── .gitignore                          # Excluded files and dataset boundaries
└── README.md                           # Project documentation
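According to the project structure, src/data_pipeline.py handles preprocessing and normalization, and the architecture diagram names SimpleImputer and StandardScaler. The sketch below shows what that pair computes (mean imputation, which is SimpleImputer's default strategy, followed by z-score scaling) in plain NumPy so the arithmetic is visible. The function names, the dict format, and the choice to treat infinite values as missing are assumptions. This is not the project's code.

```python
import numpy as np


def fit_preprocessor(X_train: np.ndarray) -> dict:
    """Learn imputation and scaling statistics from training rows only."""
    X = np.asarray(X_train, dtype=np.float64).copy()
    # Assumption: infinite values (e.g. a rate divided by a zero duration)
    # are treated as missing before imputation.
    X[~np.isfinite(X)] = np.nan
    col_mean = np.nanmean(X, axis=0)           # analogue of SimpleImputer(strategy="mean")
    X = np.where(np.isnan(X), col_mean, X)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)                      # population std (ddof=0), as StandardScaler uses
    sigma[sigma == 0] = 1.0                    # constant columns are centred but not scaled
    return {"fill": col_mean, "mean": mu, "scale": sigma}


def transform(X: np.ndarray, params: dict) -> np.ndarray:
    """Apply the fitted statistics to new rows (test set or live traffic)."""
    X = np.asarray(X, dtype=np.float64).copy()
    X[~np.isfinite(X)] = np.nan
    X = np.where(np.isnan(X), params["fill"], X)
    return (X - params["mean"]) / params["scale"]
```

The statistics are fitted on the training split only and then reused for the test split and for live traffic. This avoids leaking test-set information into the scaler.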

🌟 Key Features

Dual-Engine Hybrid Architecture: Combines supervised classification for fast detection of known threats with a deep unsupervised autoencoder for discovering zero-day threats.
Hardware-Aware Adaptive Loading: Automatically detects CUDA support to accelerate PyTorch inference on NVIDIA GPUs, and falls back gracefully to CPU mode in containerized cloud environments (e.g., Hugging Face Spaces).
Interactive Analytics Dashboard: Real-time evaluation dashboard powered by Streamlit, allowing manual single-row feature entry or batch CSV uploads with immediate comparison against baseline MSE metrics.
Enterprise-Grade Containerization: Fully containerized using Docker, isolating environmental dependencies and facilitating seamless on-premise or cloud deployments.
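Hardware-aware loading in PyTorch typically comes down to two calls: `torch.cuda.is_available()` to pick the device, and `torch.load(..., map_location=device)` so that tensors saved on a GPU can still be loaded on a CPU-only host. The sketch below follows that pattern. The helper names are hypothetical, and it assumes models/autoencoder.pth holds a state_dict.

```python
import importlib.util


def select_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so there is nothing to accelerate.
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_autoencoder_state(path: str = "models/autoencoder.pth"):
    """Load saved autoencoder weights onto whichever device is available.

    Assumes the file holds a state_dict; the caller rebuilds the model
    class and passes the result to model.load_state_dict(...).
    """
    import torch
    device = select_device()
    # map_location remaps GPU-saved tensors so a CPU-only Space can load them.
    state = torch.load(path, map_location=device)
    return state, device
```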

⚙️ Installation & Local Setup

Ensure you have Python 3.10+ and pip installed on your system.

1. Clone and Navigate to the Repository

git clone https://github.com/your-username/ids-hybrid-system.git
cd ids-hybrid-system

2. Create and Activate a Virtual Environment

On Windows:

python -m venv venv
.\venv\Scripts\activate

On Linux/macOS:

python3 -m venv venv
source venv/bin/activate

3. Install Core Dependencies

Install the required packages, including PyTorch, XGBoost, scikit-learn, and Streamlit:

pip install -r requirements.txt

🚀 Usage

Running the Streamlit Dashboard locally:

Once dependencies are installed and training artifacts are generated in the models/ directory, launch the application:

streamlit run app.py

Open your browser and navigate to http://localhost:8501 to interact with the visual dashboard.
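The dashboard expects the four artifacts listed under models/ in the project structure. A small pre-flight check, such as the hypothetical helper below (it is not part of the repository), can report exactly which files are still missing. This is clearer than a stack trace from Streamlit.

```python
from pathlib import Path

# File names from the models/ layout listed in the Project Structure section.
REQUIRED_ARTIFACTS = (
    "autoencoder.pth",
    "xgb_model.pkl",
    "preprocessor.pkl",
    "label_encoder.pkl",
)


def missing_artifacts(models_dir: str = "models") -> list[str]:
    """Return the required artifact names that are not files in models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]
```

Calling `missing_artifacts()` before `streamlit run app.py`, or at the top of the dashboard script, returns an empty list once training has produced every file.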

🐳 Docker Deployment

To build and run the application as a portable, production-ready container:

1. Build the Docker Image

docker build -t hybrid-ids:latest .

2. Run the Container

docker run -p 8501:8501 hybrid-ids:latest

Access the application at http://localhost:8501.

☁️ Cloud Deployment

This project supports both local and cloud deployment and is currently hosted live on Hugging Face Spaces. The cloud instance runs on CPU-only hardware and relies on the hardware-aware model loader to stay stable in low-resource environments.

🔗 Live Hugging Face Spaces App: https://spandan228-ids-dashboard.hf.space

📊 Model Performance

The hybrid system was evaluated with standard classification metrics on a held-out test partition ($N = 45{,}149$ samples):

Supervised Classifier (XGBoost)

Provides fast classification of traffic into known attack classes, with near-perfect accuracy on the test partition.

| Metric | Calculation Formula | Score (Known Attacks) |
|---|---|---|
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | 99.998% (45,148 / 45,149) |
| Precision | $\frac{TP}{TP + FP}$ | 100.00% |
| Recall | $\frac{TP}{TP + FN}$ | 99.99% |
| F1-Score | $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ | 99.99% |
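The formulas in the table map directly onto confusion-matrix counts. The helper below is a plain-Python sketch. The README does not publish the underlying TP/FP/TN/FN counts, so the split used in the example is illustrative only. For several known attack classes, these metrics would be computed per class and then averaged; the README does not say which averaging was used.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical split of 45,149 test rows containing a single missed attack.
m = classification_metrics(tp=30_000, fp=0, tn=15_148, fn=1)
```

Whatever the split, a single error out of 45,149 gives an accuracy of 45,148 / 45,149 ≈ 99.998%. If that error is a false negative, precision stays at exactly 100%.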

Unsupervised Anomaly Engine (Autoencoder)

Acts as a fallback for zero-day or unknown attacks by flagging flows that the autoencoder reconstructs poorly.

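| Metric / Baseline Parameter | Value | Description |
|---|---|---|
| Baseline Normal MSE | 0.063459 | Average reconstruction error for benign traffic |
| Average Malicious MSE | 0.613365 | Average reconstruction error for attack traffic |
| Error Multiplier | 9.67x | Ratio of average reconstruction errors ($\text{Attack} / \text{Benign} = 0.613365 / 0.063459$) |
| Anomaly Decision Threshold | 0.180529 | 95th percentile of benign reconstruction error; by construction, about 5% of the benign flows used for calibration exceed it |
| Zero-Day Detection Rate | 63.82% | Share of attack flows whose MSE exceeds the threshold using the autoencoder alone, without any signature matching |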
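Each row of this table can be derived from two arrays of per-flow reconstruction errors, one for benign flows and one for attack flows. The NumPy sketch below shows how. It assumes NumPy's default linear-interpolation percentile, and the function names are illustrative rather than taken from src/evaluate.py.

```python
import numpy as np


def per_sample_mse(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Mean squared reconstruction error of each row (one row = one flow)."""
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=1)


def calibrate_and_score(benign_mse: np.ndarray, attack_mse: np.ndarray,
                        percentile: float = 95.0) -> dict:
    """Compute the summary statistics reported for the anomaly engine."""
    threshold = float(np.percentile(benign_mse, percentile))
    return {
        "baseline_normal_mse": float(np.mean(benign_mse)),
        "average_malicious_mse": float(np.mean(attack_mse)),
        "error_multiplier": float(np.mean(attack_mse) / np.mean(benign_mse)),
        "threshold": threshold,
        # Fraction of attack flows the autoencoder alone would flag.
        "detection_rate": float(np.mean(attack_mse > threshold)),
        # Roughly (100 - percentile)% of the calibration flows exceed it.
        "benign_flag_rate": float(np.mean(benign_mse > threshold)),
    }
```

Raising the percentile lowers the benign false-alarm rate but also lowers the detection rate. The 95th percentile is one point on that trade-off.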

⏱️ Detection Latency Analysis

The hybrid system is optimized for high-throughput, low-latency enterprise environments:

XGBoost Inference: $\approx 0.08\text{ ms}$ per flow record.
Autoencoder Inference: $\approx 0.20\text{ ms}$ per flow record (on CPU).
Total Pipeline Latency: $\approx 0.28\text{ ms}$ per flow record ($\approx 280\ \mu\text{s}$) when both engines run sequentially on one thread.
Throughput Capacity: about 3,570 flow records per second ($1000 / 0.28$) on a single CPU thread, and up to ~15,000 records per second with multi-threaded batching on GPU-accelerated local deployments.
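The throughput figure follows directly from the latency: one sequential worker at 0.28 ms per record handles $1000 / 0.28 \approx 3{,}571$ records per second. To check such numbers on your own hardware, a simple timing harness like the one below can wrap any single-record predict function. Such a function might be a hypothetical `predict_one` that runs preprocessing, XGBoost, and the autoencoder. The harness measures sequential single-thread latency only and does not model batched GPU throughput.

```python
import time
from collections.abc import Callable, Sequence


def throughput_from_latency(latency_ms: float) -> float:
    """Records per second for one worker processing records back to back."""
    return 1000.0 / latency_ms if latency_ms > 0 else float("inf")


def benchmark(predict: Callable[[object], object], samples: Sequence[object],
              warmup: int = 10) -> dict:
    """Time a per-record predict function; report mean latency and throughput."""
    for s in samples[:warmup]:
        predict(s)                      # warm up caches and lazy initialisation
    start = time.perf_counter()
    for s in samples:
        predict(s)
    elapsed = time.perf_counter() - start
    mean_ms = elapsed * 1000.0 / len(samples)
    return {"mean_latency_ms": mean_ms,
            "throughput_per_s": throughput_from_latency(mean_ms)}
```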
