A modular, ML-powered framework designed to detect command-and-control (C2) malware using network traffic data. It was developed as part of a research project on detecting malware with sequence-aware deep learning models (e.g., LSTM).
The framework focuses on experimentation, extensibility, and reproducibility, with full MLflow support and Python environment management via Poetry.
malware-model-training/
│
├── data/
│   ├── raw/
│   │   ├── csv/
│   │   │   └── [raw data files for malware in csv]
│   │   └── pcap/
│   │       └── [raw data files for malware in pcap]
│   ├── processed/
│   │   ├── malware_1.csv
│   │   └── malware_2.csv
│   └── labelled/
│       ├── malware_1.csv
│       └── malware_2.csv
│
├── models/
│   ├── malware_1/
│   │   └── [trained models for malware_1]
│   └── malware_2/
│       └── [trained models for malware_2]
│
├── notebooks/
│   ├── data_processing/
│   │   ├── malware_1.ipynb
│   │   └── malware_2.ipynb
│   ├── modeling/
│   │   ├── malware_1.ipynb
│   │   └── malware_2.ipynb
│   ├── data_labelling/
│   │   ├── malware_1.ipynb
│   │   └── malware_2.ipynb
│   └── data_parsing/
│       ├── malware_1.py
│       └── malware_2.py
│
├── variables/
│   ├── malware_1/
│   │   └── scaler.pkl
│   └── malware_2/
│       └── scaler.pkl
│
└── [other project files, e.g., README.md, requirements.txt, etc.]

- End-to-end ML pipeline for malware traffic detection
- Reproducible experiments with MLflow
- Dependency isolation using Poetry
- Real-world datasets including Dridex and Emotet
- Deep learning models (LSTM-based) for temporal pattern recognition
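- Clone the Repository
    git clone https://github.com/Yousinator/C2-ML-Detection-Framework.git
    cd C2-ML-Detection-Framework
- Install Poetry (if you haven’t already)
    curl -sSL https://install.python-poetry.org | python3 -
- Install Dependencies
    poetry install
- Activate the Virtual Environment
    poetry shell
  On Poetry 2.0 and later, the shell command is provided by the poetry-plugin-shell plugin; without it, prefix commands with poetry run instead.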
All experimentation is done through notebooks inside the notebooks/ directory. Each notebook is self-contained and includes:
- Data parsing
- Data loading and preprocessing
- Feature engineering
- Model training and evaluation
MLflow artifacts and metrics will be logged automatically to the mlruns/ folder.
The framework supports labeled datasets for C2 malware such as:
- Dridex C2 traffic
- Emotet C2 traffic
Data is under the data/ directory. Structure and preprocessing steps are detailed in the relevant Jupyter notebooks under notebooks/.
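To make the directory layout concrete, here is a minimal sketch of a path helper. The function name `dataset_paths`, the `family` argument, and the assumption that each stage stores one `<family>.csv` file are illustrative, inferred from the directory listing rather than taken from the framework's code.

```python
from pathlib import Path


def dataset_paths(root, family):
    """Return the expected locations of one malware family's files.

    Hypothetical helper: the names mirror the directory listing, not
    code shipped with the framework.
    """
    root = Path(root)
    return {
        "processed": root / "data" / "processed" / f"{family}.csv",
        "labelled": root / "data" / "labelled" / f"{family}.csv",
        "models": root / "models" / family,
        "scaler": root / "variables" / family / "scaler.pkl",
    }


paths = dataset_paths(".", "malware_1")
for name, path in paths.items():
    # Report which of the expected files are present in this checkout.
    print(f"{name:<10} {path} exists={path.exists()}")
```

Running this from the repository root is a quick way to see which stages (processing, labelling, training) have already produced output for a given family.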
- Core model: LSTM-based malware traffic classifier
- Input features: Sequence of flow-level and packet-level statistics
- Output: Binary label (malicious / benign)
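As an illustration of what a sequence of flow-level statistics can look like as model input, the sketch below groups consecutive flow records into fixed-length windows and thresholds a score into the binary label. The window length, the stride, the 0.5 threshold, and the example feature values are all assumptions made for illustration; the notebooks define the actual preprocessing.

```python
def make_windows(flows, window, stride=1):
    """Slice a chronological list of per-flow feature vectors into
    overlapping fixed-length sequences (one sequence per model input)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        flows[start:start + window]
        for start in range(0, len(flows) - window + 1, stride)
    ]


def to_label(score, threshold=0.5):
    """Map a classifier score in [0, 1] to the binary output."""
    return "malicious" if score >= threshold else "benign"


# Hypothetical features per flow: [duration_s, bytes_sent, packet_count]
flows = [
    [0.12, 540, 6],
    [0.10, 512, 5],
    [3.40, 98000, 120],
    [0.11, 530, 6],
    [0.09, 505, 5],
]

windows = make_windows(flows, window=3, stride=1)
print(len(windows), "sequences of", len(windows[0]), "flows each")
print(to_label(0.91), to_label(0.07))
```

The resulting nesting (sequences, time steps, features) matches the three-dimensional input that recurrent layers such as LSTMs typically expect once converted to an array.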
If you use this framework in your research or project, please consider citing:
@misc{musabeh2025c2ml,
  author = {Yousef Musabeh},
  title = {A Machine Learning Framework for Detecting Command-and-Control Malware via Network Behavior},
  year = {2025},
  url = {https://github.com/Yousinator/C2-ML-Detection-Framework}
}
This project is licensed under the MIT License. See the LICENSE file for more details.