Spandan228/AI-Intrusion-Detection-System

AI-Powered Hybrid Intrusion Detection System (IDS)

A State-of-the-Art Hybrid Machine Learning Pipeline for Real-Time Network Traffic Classification and Zero-Day Anomaly Detection.


📌 Project Overview

Modern network infrastructures are constantly exposed to sophisticated cyber threats. Traditional Signature-based Intrusion Detection Systems (SIDS) fail against novel, zero-day attacks, while Anomaly-based Intrusion Detection Systems (AIDS) tend to suffer from high false-alarm rates.

This project implements an AI-Powered Hybrid Intrusion Detection System (IDS) that harmonizes supervised classification and unsupervised anomaly detection:

  1. Supervised Classification (XGBoost): Classifies incoming traffic features into known attack classes (e.g., DDoS, Brute Force, Port Scan) with high precision and low latency.
  2. Unsupervised Anomaly Detection (PyTorch Autoencoder): Learns to reconstruct normal traffic patterns. A sample whose reconstruction error (Mean Squared Error) exceeds a threshold derived from benign traffic is flagged as a previously unseen or zero-day anomaly.
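
The way the two engines combine into a single verdict can be sketched as follows. The README does not show the dashboard's consolidation logic, so the OR rule, the function name `consolidate`, and the use of the string `"BENIGN"` for the classifier's benign class are assumptions made for illustration. The default threshold is the 95th-percentile value reported under Model Performance.

```python
def consolidate(xgb_label: str, recon_mse: float, threshold: float = 0.180529) -> str:
    """Return "THREAT" if either engine raises an alarm, else "BENIGN".

    Assumption: a simple OR over the two engines; the real dashboard may differ.
    """
    known_attack = xgb_label != "BENIGN"   # supervised engine predicted a known attack class
    anomaly = recon_mse > threshold        # autoencoder reconstruction error is unusually high
    return "THREAT" if (known_attack or anomaly) else "BENIGN"


print(consolidate("BENIGN", 0.05))   # both engines quiet
print(consolidate("DDoS", 0.05))     # known attack class from the classifier
print(consolidate("BENIGN", 0.61))   # unseen pattern with high reconstruction error
```

An OR rule favors recall: the autoencoder can only add alerts, never suppress one raised by the classifier.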

πŸ“ System Architecture Diagram

graph TD
    %% Define Styles
    classDef inputStyle fill:#e1f5fe,stroke:#0288d1,stroke-width:2px,font-weight:bold;
    classDef preprocessStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,font-weight:bold;
    classDef supervisedStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,font-weight:bold;
    classDef unsupervisedStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,font-weight:bold;
    classDef stStyle fill:#eceff1,stroke:#455a64,stroke-width:2px,font-weight:bold;
    classDef outputStyle fill:#ffebee,stroke:#c62828,stroke-width:2px,font-weight:bold;

    %% Nodes
    A["Raw Network Traffic <br>(CIC-IDS2017 CSV)"]:::inputStyle
    
    B["Data Preprocessing <br>(SimpleImputer & StandardScaler)"]:::preprocessStyle
    
    %% Split Parallel Paths
    subgraph Supervised Pipeline
        C1["XGBoost Classifier"]:::supervisedStyle
        C2["Signature Detection"]:::supervisedStyle
        C3["Predicts 'Known Attacks'"]:::supervisedStyle
    end
    
    subgraph Unsupervised Pipeline
        D1["PyTorch Autoencoder"]:::unsupervisedStyle
        D2["Reconstruction Error (MSE)"]:::unsupervisedStyle
        D3["Predicts 'Zero-Day Anomalies'"]:::unsupervisedStyle
    end
    
    E["Streamlit Hybrid Dashboard <br>(Decision Consolidation)"]:::stStyle
    
    F["Final Alert: BENIGN or THREAT"]:::outputStyle

    %% Connections
    A --> B
    B -->|"Parallel Flow"| C1
    B -->|"Parallel Flow"| D1
    
    C1 --> C2 --> C3
    D1 --> D2 --> D3
    
    C3 --> E
    D3 --> E
    
    E --> F

📂 Project Structure

Below is the directory structure showing the clean isolation of data pipelines, model weights, notebook experiments, and modular source code:

├── data/
│   ├── raw/
│   │   └── dataset.csv                 # Original raw network traffic captures
│   └── processed/
│       └── clean_traffic.csv           # Normalized and engineered baseline traffic
├── models/
│   ├── autoencoder.pth                 # Trained PyTorch Autoencoder weights
│   ├── xgb_model.pkl                   # Trained XGBoost Booster object
│   ├── preprocessor.pkl                # Fitted RobustScaler/MinMaxScaler artifact
│   └── label_encoder.pkl               # LabelEncoder mapping for known classes
├── notebooks/
│   └── exploration.ipynb               # Jupyter notebook containing EDA & modeling experiments
├── src/
│   ├── data_pipeline.py                # Preprocessing, normalization, and split scripts
│   ├── train_supervised.py             # Supervised XGBoost training routines
│   ├── train_anomaly.py                # PyTorch Autoencoder training execution
│   └── evaluate.py                     # Performance testing & validation framework
├── app.py                              # Hardware-aware Streamlit web dashboard
├── Dockerfile                          # Production container specification
├── requirements.txt                    # Project dependency specification
├── .gitignore                          # Excluded files and dataset boundaries
└── README.md                           # Project documentation

🌟 Key Features

  • Dual-Engine Hybrid Architecture: Combines supervised classification for fast detection of known threats with a deep unsupervised autoencoder for zero-day threat discovery.
  • Hardware-Aware Adaptive Loading: Auto-detects CUDA hardware at startup to accelerate PyTorch tensor computation on NVIDIA GPUs, and falls back to CPU mode in containerized cloud environments (e.g., Hugging Face Spaces).
  • Interactive Analytics Dashboard: Real-time evaluation dashboard powered by Streamlit, allowing manual single-row feature entry or batch CSV uploads with immediate comparison against baseline MSE metrics.
  • Enterprise-Grade Containerization: Fully containerized using Docker, isolating environmental dependencies and facilitating seamless on-premise or cloud deployments.
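
The hardware-aware loading mentioned above usually comes down to a single device check. The sketch below is an assumption about how `app.py` might do it, not a copy of it: `pick_device` is a hypothetical helper, and treating a missing PyTorch install as CPU-only is a choice made here so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is nothing to accelerate, so report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running inference on: {device}")

# With PyTorch installed, saved weights can then be mapped onto that device, e.g.:
#   state = torch.load("models/autoencoder.pth", map_location=device)
```

Passing `map_location` matters on CPU-only hosts: weights saved from a GPU session would otherwise fail to load there.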

βš™οΈ Installation & Local Setup

Ensure you have Python 3.10+ and pip installed on your system.

1. Clone and Navigate to the Repository

git clone https://github.com/Spandan228/AI-Intrusion-Detection-System.git
cd AI-Intrusion-Detection-System

2. Create and Activate a Virtual Environment

  • On Windows:
    python -m venv venv
    .\venv\Scripts\activate
  • On Linux/macOS:
    python3 -m venv venv
    source venv/bin/activate

3. Install Core Dependencies

Install the required packages, including PyTorch, XGBoost, Scikit-Learn, and Streamlit:

pip install -r requirements.txt

🚀 Usage

Running the Streamlit Dashboard locally:

Once dependencies are installed and training artifacts are generated in the models/ directory, launch the application:

streamlit run app.py

Open your browser and navigate to http://localhost:8501 to interact with the visual dashboard.


🐳 Docker Deployment

To build and run the application as a portable, production-ready container:

1. Build the Docker Image

docker build -t hybrid-ids:latest .

2. Run the Container

docker run -p 8501:8501 hybrid-ids:latest

Access the application at http://localhost:8501.


☁️ Cloud Deployment

This project supports both local (or Docker) and cloud deployment, and is currently hosted live on Hugging Face Spaces. The hosted application runs on a CPU-only hardware allocation, where the hardware-aware model loader falls back to CPU inference so the app stays stable in a low-resource environment.

🔗 Live Hugging Face Spaces App: https://spandan228-ids-dashboard.hf.space


📊 Model Performance

The hybrid system was evaluated with standard classification metrics on the test partition of the reference benchmark ($N = 45{,}149$ samples):

Supervised Classifier (XGBoost)

Classifies traffic into known attack classes at high speed and with high accuracy.

| Metric | Calculation Formula | Score (Known Attacks) |
| --- | --- | --- |
| Accuracy | $\frac{TP + TN}{\text{Total}}$ | 99.99% (45,148 / 45,149) |
| Precision | $\frac{TP}{TP + FP}$ | 100.00% |
| Recall | $\frac{TP}{TP + FN}$ | 99.99% |
| F1-Score | $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ | 99.99% |
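
The four formulas above can be checked with a few lines of Python. This is a minimal sketch for the binary case (attack vs. benign); `binary_metrics` is a hypothetical helper, and the counts in the example are made up to exercise the formulas. They are not the project's confusion matrix, which the README does not report.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=8, fp=2, fn=0, tn=10)
print({name: round(value, 4) for name, value in metrics.items()})
```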

Unsupervised Anomaly Engine (Autoencoder)

Acts as a fallback that detects zero-day or unknown attacks by measuring how poorly a sample is reconstructed.

| Metric / Baseline Parameter | Value | Description |
| --- | --- | --- |
| Baseline Normal MSE | 0.063459 | Average reconstruction error for benign traffic |
| Average Malicious MSE | 0.613365 | Average reconstruction error for attack traffic |
| Error Multiplier | 9.67x | Reconstruction error ratio ($\text{Attack} / \text{Benign}$) |
| Anomaly Decision Threshold | 0.180529 | 95th percentile of normal benign traffic error |
| Zero-Day Detection Rate | 63.82% | Attacks flagged without any signature matching |
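
To show how a percentile-based threshold of this kind is derived, here is a minimal pure-Python sketch. The helper names and the benign error values are illustrative assumptions. The nearest-rank percentile used here is one of several definitions (NumPy's default interpolates linearly), and the README does not say which one the project uses.

```python
import math


def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def nearest_rank_percentile(values, q):
    """Smallest value with at least q percent of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up benign reconstruction errors standing in for a validation set.
benign_errors = [0.02, 0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.08, 0.10, 0.30]
threshold = nearest_rank_percentile(benign_errors, 95)

sample_error = mse([0.0, 1.0, 0.5], [0.1, 0.2, 0.5])
print(f"threshold={threshold:.2f}, sample_error={sample_error:.3f}, "
      f"anomaly={sample_error > threshold}")

# Ratio of the average attack and benign errors reported in the table.
print(round(0.613365 / 0.063459, 2))
```

By construction, a 95th-percentile threshold leaves roughly 5% of benign traffic above it, so that false-alarm rate is the price paid for the detection rate reported above.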

⏱️ Detection Latency Analysis

The hybrid system is optimized for high-throughput, low-latency enterprise environments:

  • XGBoost Inference: $\approx 0.08\text{ ms}$ per packet.
  • Autoencoder Inference: $\approx 0.20\text{ ms}$ per packet (on CPU).
  • Total Pipeline Latency: $\approx 0.28\text{ ms}$ per packet ($280\ \mu\text{s}$), the sum of the two stages above.
  • Throughput Capacity: ~3,570 Packets Per Second (PPS) under a CPU-bound single-thread regime ($1 / 0.28\text{ ms}$), and up to ~15,000 PPS with multi-threaded batching on GPU-accelerated local deployments.
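
The single-thread throughput figure follows directly from the per-stage latencies. The arithmetic below assumes the two stages run one after the other for each packet, which is what the summed total implies.

```python
xgb_ms = 0.08          # per-packet XGBoost inference time quoted above
autoencoder_ms = 0.20  # per-packet autoencoder inference time on CPU quoted above

total_ms = xgb_ms + autoencoder_ms    # stages run sequentially
throughput_pps = 1000.0 / total_ms    # 1000 ms in one second

print(f"{total_ms:.2f} ms per packet, about {throughput_pps:.0f} packets/s")
```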

About

An AI-Powered Hybrid Intrusion Detection System (IDS) combining Supervised ML (XGBoost) for signature threats and Unsupervised Deep Learning (PyTorch Autoencoder) for zero-day network anomaly detection. Features a hardware-aware real-time Streamlit dashboard.
