This project implements a machine learning-based Intrusion Detection and Prevention System (IDPS) for detecting and mitigating Denial-of-Service (DoS) attacks in corporate networks.
It uses the Random Forest algorithm trained on the CIC-DDoS2019 dataset, optimized for accuracy and practical deployment in Kenyan enterprise environments.
A Graphical User Interface (GUI) provides real-time traffic monitoring, alert management, and report generation.
- Detect and classify malicious DoS traffic with high accuracy.
- Isolate and block suspicious traffic in real-time.
- Provide a usable GUI dashboard for administrators.
- Support explainability with feature importance and SHAP analysis.
- Deliver a modular, scalable solution aligned with enterprise security needs.
- Data Preprocessing: Cleaning, scaling, and feature engineering pipeline
- Model Training: Random Forest classifier with evaluation metrics (Accuracy, Precision, Recall, F1, AUC)
- Testing & Evaluation: CIC-DDoS2019 dataset split into training/test sets; evaluated for robustness
- Real-time Detection: Live traffic monitoring and DoS attack detection
- Desktop GUI: PyQt5-based dashboard for system management
- Multi-Factor Authentication: TOTP-based 2FA with Google Authenticator
- User Management: Role-based access control (Admin/Analyst)
- Alert Management: Real-time alert monitoring and response
- Automated Setup: One-command installation and configuration
- Virtual Environment: Isolated Python environment for stability
- Database Integration: PostgreSQL with Alembic migrations
- Audit Logging: Comprehensive event logging for security
- API Documentation: Auto-generated Swagger/OpenAPI docs
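
The data preprocessing feature listed above (cleaning and scaling) can be illustrated with a small, dependency-free sketch. This is not the project's pipeline: the function names, column names and sample rows are hypothetical. It assumes that flow CSV exports may carry padded header names and non-finite rate values such as `Infinity` or `NaN`.

```python
import math
from typing import Dict, List


def clean_rows(rows: List[Dict[str, str]], label_key: str = "Label") -> List[Dict[str, object]]:
    """Strip header whitespace, parse numbers, and drop rows with bad values."""
    cleaned = []
    for row in rows:
        parsed: Dict[str, object] = {}
        keep = True
        for raw_key, raw_value in row.items():
            key = raw_key.strip()
            if key == label_key:
                parsed[key] = raw_value.strip()
                continue
            try:
                value = float(raw_value)
            except ValueError:
                keep = False  # unparseable cell: discard the whole row
                break
            if not math.isfinite(value):
                keep = False  # Infinity / NaN: discard the whole row
                break
            parsed[key] = value
        if keep:
            cleaned.append(parsed)
    return cleaned


def min_max_scale(rows: List[Dict[str, object]], feature_keys: List[str]) -> List[Dict[str, object]]:
    """Scale each listed feature to [0, 1]; constant columns map to 0.0."""
    if not rows:
        return []
    scaled = [dict(row) for row in rows]
    for key in feature_keys:
        values = [row[key] for row in rows]
        low, span = min(values), max(values) - min(values)
        for row in scaled:
            row[key] = (row[key] - low) / span if span else 0.0
    return scaled


# Made-up sample rows, for illustration only.
raw = [
    {" Flow Duration": "100", "Flow Bytes/s": "2000.0", " Label": "BENIGN"},
    {" Flow Duration": "300", "Flow Bytes/s": "Infinity", " Label": "Syn"},
    {" Flow Duration": "500", "Flow Bytes/s": "6000.0", " Label": "Syn"},
]
rows = clean_rows(raw)
scaled = min_max_scale(rows, ["Flow Duration", "Flow Bytes/s"])
print(scaled)
```

In a real pipeline the minimum and maximum would normally be computed on the training split only and then reused for the test split, so that no information leaks from test data into training.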
Random-Forest-Based-IDPS/
│
├── Automation Scripts
│ ├── setup.sh # Complete project setup
│ ├── run_backend.sh # Start backend with venv
│ ├── run_gui.sh # Start GUI with venv
│ └── run_full_system.sh # Start both backend & GUI
│
├── GUI Application
│ ├── gui/
│ │ ├── main.py # GUI entry point
│ │ ├── login_window.py # Login & MFA dialogs
│ │ ├── dashboard_window.py # Main dashboard
│ │ └── api_client.py # Backend communication
│
├── Backend API
│ ├── backend/
│ │ ├── app/
│ │ │ ├── main.py # FastAPI application
│ │ │ ├── auth.py # Authentication logic
│ │ │ ├── totp.py # MFA implementation
│ │ │ ├── models.py # Database models
│ │ │ └── routers/ # API endpoints
│
├── Documentation
│ ├── README.md # Main project docs
│ ├── README_MFA.md # MFA overview
│ ├── QUICK_START_MFA.md # Quick MFA setup
│ ├── MFA_SETUP_GUIDE.md # Complete MFA guide
│ └── MFA_VISUAL_GUIDE.md # Visual MFA walkthrough
│
├── Analysis & Models
│ ├── notebooks/ # Jupyter notebooks
│ ├── config/ # Model configurations
│ ├── models/ # Trained ML models
│ └── reports/ # Evaluation reports
│
└── Configuration
    ├── requirements.txt # Python dependencies
    ├── .gitignore # Ignored files
    └── venv/ # Virtual environment (created by setup)
- Python – Core development
- scikit-learn – Random Forest training & evaluation
- pandas, numpy – Data preprocessing
- matplotlib, seaborn – Visualization
- PyQt5 – Graphical User Interface
- SHAP – Explainability
- VirtualBox + Kali Linux – Traffic simulation
- Clone the repository:

      git clone https://github.com/annKimani-ICS/Random-Forest-Based-IDPS.git
      cd Random-Forest-Based-IDPS

- Run automated setup:

      chmod +x setup.sh
      ./setup.sh

- Start the system:

      # Start backend only (defaults to port 3000; override with PORT=8000)
      ./run_backend.sh

      # Or specify a custom port
      PORT=8000 ./run_backend.sh

      # Or start GUI only (in a new terminal)
      ./run_gui.sh

      # Or start both together
      ./run_full_system.sh
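
Once the backend has been started, a quick reachability check can confirm that it is listening. The sketch below is an illustration, not part of the repository: `backend_base_url` and `backend_is_up` are hypothetical helper names, and the `/docs` path is an assumption based on FastAPI's default location for its Swagger UI, which the project may have changed or disabled.

```python
import os
import urllib.error
import urllib.request


def backend_base_url(host: str = "127.0.0.1") -> str:
    """Build the base URL, honouring the PORT override (default 3000)."""
    port = os.environ.get("PORT", "3000")
    return f"http://{host}:{port}"


def backend_is_up(base_url: str, path: str = "/docs", timeout: float = 2.0) -> bool:
    """Return True if the backend answers the given path with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    url = backend_base_url()
    print(url, "is up" if backend_is_up(url) else "is not reachable")
```

Running the script with `PORT=8000` set in the environment checks port 8000, mirroring the override accepted by `run_backend.sh`.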
If you prefer manual setup or encounter issues with the automated scripts:
- Python 3.8+ (3.10+ recommended)
- Git
- Virtual environment support
- Clone the repository:

      git clone https://github.com/annKimani-ICS/Random-Forest-Based-IDPS.git
      cd Random-Forest-Based-IDPS

- Create virtual environment:

      python3 -m venv venv
      source venv/bin/activate    # Linux/Mac
      # or
      venv\Scripts\activate       # Windows

- Install backend dependencies:

      cd backend
      pip install -r requirements.txt

- Install GUI dependencies:

      cd ../gui
      pip install -r requirements.txt

- Initialize database (if needed):

      cd ../backend
      alembic upgrade head    # Run migrations

- Run the system:

      # Terminal 1 - Backend (recommended: local venv inside backend)
      cd backend
      python3 -m venv .venv && source .venv/bin/activate
      pip install -r requirements.txt
      uvicorn app.main:app --reload --host 0.0.0.0 --port 3000

      # Terminal 2 - GUI
      cd gui
      source ../venv/bin/activate
      python main.py
This system includes TOTP-based Multi-Factor Authentication using Google Authenticator:
- After logging in, navigate to the Security tab
- Click "Enable Two-Factor Authentication"
- Scan the QR code with the Google Authenticator app
- Enter the verification code to activate MFA
- Save the recovery codes for backup access
Detailed MFA guides:
- QUICK_START_MFA.md - Quick 5-minute setup
- MFA_SETUP_GUIDE.md - Complete admin guide
- README_MFA.md - MFA documentation index
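
For readers curious how the codes shown by an authenticator app are derived, the following is a minimal sketch of TOTP as specified in RFC 6238, using the parameters such apps commonly default to (HMAC-SHA1, 30-second steps, 6 digits). It is an illustration only and is not taken from the project's `totp.py`; the function names, the `window` drift tolerance and the test secret are assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code for a base32 secret at a given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    counter = int(now // step)  # number of elapsed time steps
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    binary = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(binary % (10 ** digits)).zfill(digits)


def verify(secret_b32: str, candidate: str, at: Optional[float] = None,
           window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        moment = now + drift * step
        if moment < 0:
            continue
        expected = totp(secret_b32, moment, step)
        # Constant-time comparison to avoid leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), candidate.encode()):
            return True
    return False


# Shared secret from the RFC 6238 test vectors (never use it in production).
secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp(secret, at=59, digits=8))  # RFC 6238 test vector: 94287082
```

Allowing one step of drift in `verify` tolerates small clock differences between the phone and the server, at the cost of each code staying valid slightly longer.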
# Results (Fourth Iteration)
Performance Metrics:
- Accuracy: 90.48%
- F1-Score: 90.51%
- Precision: 90.62%
- Recall: 90.48%
- Holdout Validation F1-Score: 89.76%
- Performance Consistency: 0.0076 (Excellent)
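
The reported recall equals the reported accuracy, which is what support-weighted averaging across classes always produces. The README does not state the averaging mode, so this is an inference. The sketch below shows how such weighted metrics can be computed from a confusion matrix; the function name and the example matrix are made up for illustration and are not the project's results.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """confusion[i][j] = samples of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])  # true samples of class c
        predicted = sum(confusion[r][c] for r in range(n_classes))
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total  # class share of the dataset
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical two-class confusion matrix (rows: true, columns: predicted).
metrics = weighted_metrics([[90, 10], [20, 80]])
print(metrics)
```

Weighted recall reduces to the sum of true positives divided by the total sample count, which is the definition of accuracy, so the two values coincide for any confusion matrix.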
Key Features:
- hour (0.218) - Time-based attack patterns
- day_of_week (0.182) - Weekly traffic behavior
- Fwd Packet Length Max (0.057) - Network traffic analysis
- Packet Length Mean (0.057) - Statistical network metrics
- Subflow Fwd Bytes (0.050) - Flow analysis
- Max Packet Length (0.050) - Traffic volume indicators
- Fwd Packet Length Mean (0.048) - Forward packet statistics
- Avg Fwd Segment Size (0.043) - Segment-level analysis
- Total Length of Fwd Packets (0.039) - Packet aggregation
- Average Packet Size (0.032) - Size-based detection
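
The two highest-ranked features, `hour` and `day_of_week`, are presumably derived from each flow's timestamp. The sketch below shows one way to derive them. The timestamp format, the sample value and the Monday=0 day numbering (Python's `weekday()` convention) are assumptions, not details confirmed by the project.

```python
from datetime import datetime
from typing import Dict


def time_features(timestamp: str, fmt: str = "%Y-%m-%d %H:%M:%S.%f") -> Dict[str, int]:
    """Derive hour-of-day and day-of-week (Monday=0) from a flow timestamp."""
    moment = datetime.strptime(timestamp, fmt)
    return {"hour": moment.hour, "day_of_week": moment.weekday()}


# Made-up timestamp for illustration.
print(time_features("2018-12-01 10:51:39.813448"))
# {'hour': 10, 'day_of_week': 5}
```

Because such features depend on when traffic was captured, it is worth checking that they generalize to traffic recorded on days outside the training capture.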
Technical Achievements:
- Training Time: < 15 minutes (99%+ speed improvement)
- Data Optimization: 581K → 50K samples (91% reduction)
- Feature Selection: 87 → 30 features (65% reduction)
- Model Architecture: Voting Ensemble (Random Forest + Random Forest)
- Class Balancing: SMOTE applied for balanced training
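
To make the class-balancing step concrete, here is a simplified, dependency-free sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. The function name, parameters and sample points are hypothetical. The README does not say which implementation the project uses; in practice a library version such as imbalanced-learn's `SMOTE` would normally be preferred.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Four made-up minority samples at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Oversampling of this kind is usually applied to the training split only, so that the test set keeps the original class distribution.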
Application/Capability: Detects DDoS attack flows in real time with 90.48% test accuracy. Training completes in under 15 minutes, which supports rapid model deployment and updates in production network environments.
Improvement Over Previous Iterations:
- +25.63% F1-Score improvement over Iteration 3
- +20.31% Accuracy improvement over Iteration 3
- 99%+ faster training compared to initial iterations
# Roadmap
- Sprint 1 – Data Cleaning & Preprocessing
- Sprint 2 – Model Training & Evaluation
- Sprint 3 – GUI Development (PyQt5 Dashboard)
- Sprint 4 – Integration with VM Simulation (Ubuntu + Kali)
- Sprint 5 – Final Evaluation & Defense
# Author
- Kimani Ann Wangari
- BSc Informatics and Computer Science, Strathmore University, Nairobi, Kenya
- Supervisor: Mr. James Gikera
# License
This project is for academic and research purposes only. Use in production environments is not advised without further security hardening.
Git cheatsheet: https://philomatics.com/git-cheatsheet-release