This project builds a complete Machine Learning pipeline to detect and analyze Distributed Denial of Service (DDoS) attacks using both supervised and unsupervised learning techniques.
It follows the ML4N Project 2 specifications from Politecnico di Torino.
The dataset contains network traffic flows (benign and DDoS) generated with CICFlowMeter-V3.
Each flow includes detailed features about packets, bytes, timing, and flags.
The project focuses on:
- Data Exploration and Preprocessing
- Supervised Learning (Classification)
- Unsupervised Learning (Clustering)
- Explainability and Feature Analysis
This project uses Conda for dependency management.
Clone this repository and create the environment using the provided YAML file:
git clone git@github.com:POOYASP2/DDoS-attacks-detection.git
cd DDoS-attacks-detection
conda env create -f environment.yml
conda activate DDoS-attacks-detection

Check that core packages are available:

python -c "import pandas, sklearn, xgboost; print('Environment ready!')"

If you want to use the environment in Jupyter Notebooks:

python -m ipykernel install --user --name=DDoS-attacks-detection --display-name "DDoS ML Env"

Then, select "DDoS ML Env" from the Jupyter kernel list.
DDoS-attacks-detection/
│
├── data/ # Dataset files (raw and processed)
├── notebooks/ # Jupyter notebooks for each section
│ ├── 01_data_preprocessing.ipynb
│ ├── 02_supervised_learning.ipynb
│ ├── 03_unsupervised_learning.ipynb
│ └── 04_explainability.ipynb
│
├── reports/ # Results, plots, and summary analyses
├── environment.yml # Conda environment configuration
├── README.md # This file
└── .gitignore
- Launch Jupyter Lab:
jupyter lab
- Open the notebooks in order:
  - 01_data_preprocessing.ipynb
  - 02_supervised_learning.ipynb
  - 03_unsupervised_learning.ipynb
  - 04_explainability.ipynb
- Execute all cells to reproduce the results.
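If you prefer to run the notebooks without opening Jupyter Lab, the steps above could be scripted roughly as follows. This is a sketch that assumes `jupyter nbconvert` is available in the active environment and that the notebooks live in `notebooks/`. The helper names `build_commands` and `run_all` are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path

# Notebooks in the order given above.
NOTEBOOKS = [
    "01_data_preprocessing.ipynb",
    "02_supervised_learning.ipynb",
    "03_unsupervised_learning.ipynb",
    "04_explainability.ipynb",
]


def build_commands(notebook_dir: str = "notebooks") -> list[list[str]]:
    """Build one nbconvert command per notebook, preserving the order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(notebook_dir) / name)]
        for name in NOTEBOOKS
    ]


def run_all(notebook_dir: str = "notebooks", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(notebook_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing notebook


run_all(dry_run=True)  # only prints the commands
```

Calling `run_all(dry_run=False)` would execute each notebook in place, overwriting its stored outputs.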
| Type | Libraries |
|---|---|
| Core ML | scikit-learn, xgboost, lightgbm |
| Data Handling | pandas, numpy, scipy |
| Visualization | matplotlib, seaborn, plotly, yellowbrick |
| Clustering | hdbscan, umap-learn |
| Explainability | shap |
| Notebook Tools | jupyterlab, tqdm |
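
To check which of the libraries in the table can be imported from the active environment, a small script along these lines may help. It is a sketch: the list uses the usual import names (for example, scikit-learn imports as `sklearn` and umap-learn as `umap`), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names for the libraries listed in the table above.
IMPORT_NAMES = [
    "sklearn", "xgboost", "lightgbm", "pandas", "numpy", "scipy",
    "matplotlib", "seaborn", "plotly", "yellowbrick", "hdbscan", "umap",
    "shap", "jupyterlab", "tqdm",
]


def find_missing(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(IMPORT_NAMES)
print("Missing packages:", missing if missing else "none")
```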
| Section | Description |
|---|---|
| 1. Data Preparation & EDA | Load, clean, visualize, and analyze dataset features. |
| 2. Supervised Learning | Train classifiers and evaluate performance using confusion matrices. |
| 3. Unsupervised Learning | Apply clustering (K-Means, DBSCAN, HDBSCAN) to discover attack families. |
| 4. Explainability | Identify key features and interpret cluster behaviors. |
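
To make the supervised step concrete, here is a minimal sketch of training a classifier and evaluating it with a confusion matrix. It uses synthetic data from scikit-learn in place of the real flows. The choice of a random forest, the 70/30 split, and the label encoding (0 = benign, 1 = DDoS) are assumptions for illustration and not necessarily what the notebooks use.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flow features (assumed: 0 = benign, 1 = DDoS).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out 30% of the samples, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["benign", "ddos"]))
```
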
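Update the environment after editing environment.yml:

conda env update -f environment.yml --prune

Remove the environment:

conda remove --name DDoS-attacks-detection --all

Export the environment (only explicitly requested packages):

conda env export --from-history > environment.yml

- Feature correlation and PCA plots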
- Classification metrics and confusion matrices
- Cluster visualizations (2D embeddings, ECDFs)
- SHAP and permutation feature importance plots
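
As an illustration of the per-cluster ECDFs mentioned in the list, the sketch below clusters synthetic data with K-Means and computes an empirical CDF of one feature within each cluster. The data, the number of clusters, and the `ecdf` helper are assumptions for illustration. The notebooks may use other features and clustering settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def ecdf(values):
    """Return sorted values and the fraction of samples <= each value."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Synthetic stand-in for scaled flow features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# ECDF of the first feature within each cluster.
for cluster_id in np.unique(labels):
    x, y = ecdf(X[labels == cluster_id, 0])
    print(f"cluster {cluster_id}: n={len(x)}, median={np.median(x):.2f}")
```

Plotting `x` against `y` for each cluster (for example with `matplotlib.pyplot.step`) gives one ECDF curve per cluster.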
Developed for ML4N Project – Politecnico di Torino
Based on dataset and specifications from Luca Gioacchini and Giordano Paoletti.
© 2025 — DDoS Attacks Detection Group Project