An end-to-end MLOps project for detecting phishing threats in network data, featuring automated machine learning pipelines, containerized deployment, and continuous integration/delivery (CI/CD).
Live Web Application: https://net-sec.onrender.com
- Project Overview
- Key Features
- Architecture
- Tech Stack
- Quick Start
- Model Performance
- CI/CD Pipeline
- Project Structure
- Local Development
- Monitoring & Logging
Network security threats, particularly phishing attacks, pose significant risks to organizations and individuals. This project implements a comprehensive MLOps solution that automatically detects malicious network activities using machine learning algorithms.
Phishing attacks attempt to steal sensitive information (passwords, financial credentials, personal data) through deceptive online activities. Traditional signature-based detection methods often fail against evolving threats, which makes machine learning-based detection a valuable complement in modern cybersecurity.
This project delivers an end-to-end automated system that:
- Ingests and validates network security data from multiple sources
- Trains multiple ML models to detect phishing threats with high accuracy
- Provides real-time prediction capabilities via a web interface and API
- Maintains model performance through automated retraining pipelines
- Ensures reliable deployment with comprehensive CI/CD automation
- Data Ingestion: Automated data collection from MongoDB with validation
- Data Processing: Schema validation, drift detection, and feature engineering
- Model Training: Multi-algorithm training with hyperparameter optimization
- Model Evaluation: Comprehensive metrics tracking and comparison
- Deployment: Automated containerized deployment with zero-downtime updates
- Multiple Algorithms: Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting
- Feature Engineering: Automated preprocessing and transformation pipelines
- Model Selection: Automated best model selection based on performance metrics
- Batch Predictions: Support for bulk prediction on CSV uploads
- Interactive UI: User-friendly interface for training and predictions
- RESTful API: Complete API documentation with Swagger UI
- Real-time Processing: Instant predictions on uploaded data
- Responsive Design: Mobile-friendly interface
- Experiment Tracking: MLflow integration via DagsHub
- Performance Monitoring: Model metrics and drift detection
- Logging: Comprehensive application and pipeline logging
- Health Checks: API health monitoring and status endpoints
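Drift detection compares the distribution of each feature in newly ingested data against the data the current model was trained on. A minimal sketch, assuming numeric columns held as plain Python lists, uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The helper names `ks_statistic` and `detect_drift` and the `0.1` threshold are illustrative, not the project's actual code or settings.

```python
from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so checking every point of the pooled sample finds the maximum gap.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)  # fraction of ref <= x
        cdf_cur = bisect_right(cur, x) / len(cur)  # fraction of cur <= x
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def detect_drift(reference: dict[str, list[float]],
                 current: dict[str, list[float]],
                 threshold: float = 0.1) -> dict[str, bool]:
    """Flag every column present in both datasets whose KS statistic
    exceeds `threshold` (0.1 is an illustrative default)."""
    return {
        column: ks_statistic(reference[column], current[column]) > threshold
        for column in reference
        if column in current
    }


train = {"url_length": [1, 2, 3, 4, 5], "redirects": [0, 0, 1, 1, 2]}
fresh = {"url_length": [1, 2, 3, 4, 5], "redirects": [3, 4, 4, 5, 6]}
print(detect_drift(train, fresh))  # {'url_length': False, 'redirects': True}
```

Production implementations commonly use the p-value from a library test such as SciPy's `ks_2samp` rather than a fixed statistic threshold; the pure-Python version just makes the computation visible.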
```mermaid
flowchart TD
    A[Data Sources] --> B[Data Ingestion]
    B --> C[Data Validation]
    C --> D[Data Transformation]
    D --> E[Model Training]
    E --> F[Model Evaluation]
    F --> G[Model Registry]
    G --> H[Deployment Pipeline]
    H --> I[Production API]
    I --> J[Web Interface]
    K[GitHub Actions] --> H
    L[Docker Registry] --> H
    M[AWS S3] --> G
    N[MongoDB] --> B
    O[MLflow/DagsHub] --> F
```
- Ingestion: Raw network data collected from MongoDB
- Validation: Schema validation and data quality checks
- Transformation: Feature engineering and preprocessing
- Training: Multi-model training with cross-validation
- Evaluation: Performance metrics and model comparison
- Storage: Best model artifacts saved to AWS S3
- Deployment: Automated deployment via CI/CD pipeline
- Serving: Real-time predictions via FastAPI
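The stages above hand artifacts from one step to the next. A minimal sketch of that chaining, assuming in-memory rows instead of MongoDB and a hypothetical two-column schema (`url_length`, `label`), covers ingestion, validation, and transformation; training, evaluation, and storage would plug in as further functions with the same `Artifact -> Artifact` signature. All names here are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical schema; the real project defines its own column list.
REQUIRED_COLUMNS = {"url_length", "label"}


@dataclass
class Artifact:
    """Rows plus a record of which stages have run."""
    rows: list[dict[str, float]] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)


def ingest(art: Artifact) -> Artifact:
    # Stand-in for reading documents from MongoDB.
    art.rows = [
        {"url_length": 20.0, "label": 0.0},
        {"url_length": 80.0, "label": 1.0},
        {"url_length": 50.0, "label": 1.0},
    ]
    return art


def validate(art: Artifact) -> Artifact:
    # Reject the whole batch if any row lacks a required column.
    bad = [i for i, row in enumerate(art.rows)
           if not REQUIRED_COLUMNS.issubset(row)]
    if bad:
        raise ValueError(f"rows {bad} do not match the schema")
    return art


def transform(art: Artifact) -> Artifact:
    # Min-max scale url_length into [0, 1].
    values = [row["url_length"] for row in art.rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    for row in art.rows:
        row["url_length"] = (row["url_length"] - lo) / span
    return art


Stage = Callable[[Artifact], Artifact]


def run_pipeline(stages: list[Stage]) -> Artifact:
    art = Artifact()
    for stage in stages:
        art = stage(art)
        art.completed.append(stage.__name__)
    return art


result = run_pipeline([ingest, validate, transform])
print(result.completed)                            # ['ingest', 'validate', 'transform']
print([row["url_length"] for row in result.rows])  # [0.0, 1.0, 0.5]
```

Keeping every stage behind one signature makes it easy to reorder stages, rerun a single stage, or insert a new check without touching the others.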
| Category | Technologies | Purpose |
|---|---|---|
| Machine Learning | scikit-learn, pandas, numpy | Model development and data processing |
| MLOps & Tracking | MLflow, DagsHub | Experiment tracking and model versioning |
| Data Storage | MongoDB, AWS S3 | Data persistence and artifact storage |
| Backend Framework | FastAPI, Uvicorn | High-performance API development |
| Frontend | Jinja2 Templates, HTML5/CSS3 | User interface and visualization |
| Containerization | Docker, Docker Hub | Application packaging and distribution |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Cloud Platform | Render | Production hosting and deployment |
| Development | Python 3.10+, Git | Core development tools |
- Python 3.10 or higher
- Docker (optional, for containerized deployment)
- Git
Visit https://net-sec.onrender.com to:
- Train models directly in your browser
- Upload CSV files for batch predictions
- Explore the interactive API documentation
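Batch prediction on an uploaded CSV comes down to reading each row, scoring it, and returning the rows with a prediction attached. The sketch below assumes the model is any callable that maps a row to a label; `predict_csv` and the threshold-based `toy_model` are hypothetical stand-ins, not the app's actual upload handler or trained model.

```python
import csv
import io
from collections.abc import Callable


def predict_csv(csv_text: str, predict: Callable[[dict[str, str]], int]) -> str:
    """Score every row of an uploaded CSV and return it with a
    `prediction` column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["prediction"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        row["prediction"] = predict(row)  # model sees the original columns only
        writer.writerow(row)
    return out.getvalue()


def toy_model(row: dict[str, str]) -> int:
    # Hypothetical stand-in for the trained model: flag long URLs.
    return int(int(row["url_length"]) > 75)


uploaded = "url_length,redirects\n40,0\n120,3\n"
print(predict_csv(uploaded, toy_model))
```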
```bash
# Clone the repository
git clone https://github.com/ananthakr1shnan/netsecurity-mlops.git
cd netsecurity-mlops

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -e .

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Run the application
python app.py
```

Or run the prebuilt Docker image:

```bash
# Pull the latest image
docker pull ananthakrishnank/netsecurity-mlops:latest

# Run the container
docker run -p 8000:8000 ananthakrishnank/netsecurity-mlops:latest
```

Access the application at http://localhost:8000.
| Algorithm | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 0.94 | 0.93 | 0.95 | 0.94 |
| Gradient Boosting | 0.92 | 0.91 | 0.93 | 0.92 |
| AdaBoost | 0.89 | 0.88 | 0.90 | 0.89 |
| Logistic Regression | 0.86 | 0.85 | 0.87 | 0.86 |
| Decision Tree | 0.84 | 0.83 | 0.85 | 0.84 |
Results are based on cross-validation on the latest dataset.
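The F1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the figures in the table are consistent with it (for Random Forest, 2 × 0.93 × 0.95 / 1.88 ≈ 0.94). Automated best-model selection can be sketched as picking the highest-scoring candidate; using F1 as the selection criterion here is an assumption, and the project may rank models on a different metric.

```python
# Precision and recall per model, copied from the table above.
RESULTS = {
    "Random Forest": (0.93, 0.95),
    "Gradient Boosting": (0.91, 0.93),
    "AdaBoost": (0.88, 0.90),
    "Logistic Regression": (0.85, 0.87),
    "Decision Tree": (0.83, 0.85),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_best(results: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the model name with the highest F1 and its score."""
    scores = {name: f1(p, r) for name, (p, r) in results.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


best, score = select_best(RESULTS)
print(best, round(score, 2))  # Random Forest 0.94
```

F1 is a reasonable default for phishing detection because it penalizes a model that buys high precision by missing many attacks, or vice versa.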
The top features contributing to phishing detection:
- URL Length - Suspicious URLs are typically longer
- Domain Age - Newer domains are more likely to be malicious
- SSL Certificate - Missing or invalid certificates may indicate a threat
- Redirect Count - Multiple redirects suggest obfuscation attempts
- Special Character Count - URLs with excessive special characters are more likely to be malicious
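Some of these signals can be computed from the URL string alone. A minimal sketch, assuming raw URLs as input, extracts URL length, a special-character count (here: any non-alphanumeric character), and whether the scheme is HTTPS. The function name and feature encoding are hypothetical and may differ from the project's dataset; domain age, certificate validity, and redirect count require external lookups (WHOIS, a TLS handshake, following redirects) and are left out.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict[str, int]:
    """Lexical features computable from the URL string alone."""
    return {
        "url_length": len(url),
        # Counts every non-alphanumeric character (":", "/", ".", "?", "=", ...).
        "special_char_count": sum(not ch.isalnum() for ch in url),
        # Only checks the scheme; it does NOT verify that the certificate is valid.
        "uses_https": int(urlparse(url).scheme == "https"),
    }


print(url_features("http://a.b/c?d=1"))
# {'url_length': 16, 'special_char_count': 7, 'uses_https': 0}
```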
This automated pipeline ensures reliable and consistent deployments. It is triggered by:
- Push to the `main` branch
- Pull request creation/update
- Manual workflow dispatch

```yaml
stages:
  - name: Build & Deploy
    steps:
      - Docker image build
      - Push to Docker Hub
      - Deploy to Render
      - Health check validation
```

- Automated Testing: Every commit is tested
- Security Scanning: Vulnerability checks on dependencies
- Zero-Downtime Deployment: Rolling updates with health checks
- Rollback Capability: Automatic rollback on deployment failures
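The health-check validation step can be sketched as a poller that a CI job runs after deployment: keep requesting the service until it answers HTTP 200, or give up after a timeout. `wait_for_healthy` is hypothetical and uses only the standard library; which URL to poll (the site root or a dedicated health endpoint) is an assumption, as is the idea that the project's workflow uses Python rather than, say, `curl` for this step.

```python
import http.client
import time
import urllib.request


def wait_for_healthy(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # urlopen follows redirects and raises HTTPError for 4xx/5xx.
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses; the service may still be starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In a workflow step, exiting nonzero when this returns `False` (for example with `sys.exit(0 if ok else 1)`) fails the job, so a broken deployment is caught instead of silently going live.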
```bash
# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/ -v --cov=src

# Start the development server with hot reload
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

Create a `.env` file with the following variables:
```bash
# Database configuration
MONGO_DB_URL=your_mongodb_connection_string
DATABASE_NAME=your_database_name

# AWS configuration
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_BUCKET_NAME=your_s3_bucket_name

# MLflow configuration
MLFLOW_TRACKING_URI=your_mlflow_tracking_uri
MLFLOW_TRACKING_USERNAME=your_username
MLFLOW_TRACKING_PASSWORD=your_password

# Application configuration
DEBUG=True
LOG_LEVEL=INFO
```

Track and compare model experiments:
- Metrics: Accuracy, precision, recall, F1-score
- Parameters: Hyperparameters and configuration
- Artifacts: Model files, preprocessors, and visualizations
- Comparison: Side-by-side model performance analysis
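Experiment tracking and data ingestion only work when the variables from the `.env` template above are set. A fail-fast check, sketched below, reports every missing setting at once instead of failing later inside a pipeline stage. The function names are hypothetical, and which variables count as required is an assumption; if `.env` is loaded with python-dotenv, run the check after `load_dotenv()`.

```python
import os
from collections.abc import Mapping

# Assumed-required subset of the variables in the .env template.
REQUIRED_VARS = (
    "MONGO_DB_URL",
    "DATABASE_NAME",
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_settings(env: Mapping[str, str] = os.environ) -> None:
    """Raise once, listing every missing variable, before any work starts."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("missing configuration: " + ", ".join(missing))
```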
- Live Application: https://net-sec.onrender.com
- MLflow Dashboard: DagsHub Experiments
- Docker Image: Docker Hub Repository
- Project Board: GitHub Issues