This project implements a robust bot detection system based on the research paper's approach, combining web log analysis and mouse movement patterns with intelligent fusion for enhanced detection accuracy.
The system consists of three main components that work together to provide comprehensive bot detection:
- Web Log Detection (web_log_detection_bot.py) - Analyzes server logs for bot patterns
- Mouse Movement Detection (mouse_movements_detection_bot.py) - Analyzes mouse movement patterns using CNN
- Fusion Module (fusion.py) - Combines both signals using decision-level fusion
- Purpose: Analyzes Apache web server logs to extract session-based features
- Method: Ensemble classifier (SVM, Random Forest, AdaBoost, MLP)
- Features: HTTP requests, status codes, content types, browsing behavior
- Output: Bot probability score (0-1)
- Purpose: Analyzes mouse movement patterns to detect non-human behavior
- Method: Convolutional Neural Network (CNN)
- Input: Mouse movement matrices (spatial-temporal data)
- Output: Bot probability score (0-1)
- Purpose: Combines scores from both detection modules
- Method: Decision-level fusion with intelligent thresholds
- Logic:
- If mouse score > 0.7 or < 0.3: Use mouse score only
- Otherwise: Weighted average (0.5 × mouse + 0.5 × web_log)
- Output: Final bot classification and confidence score
CAPSTONE/
├── web_log_detection_bot.py                  # Web log analysis module
├── mouse_movements_detection_bot.py          # Mouse movement analysis module
├── fusion.py                                 # Score fusion module
├── main.py                                   # Main system integration
├── bot.py                                    # Humanoid bot simulator (for testing)
├── dataset/                                  # Training and test data
│   ├── phase1/                               # Phase 1 datasets
│   │   ├── D1/                               # Humans vs Moderate Bots
│   │   └── D2/                               # Humans vs Advanced Bots
│   └── phase2/                               # Phase 2 datasets
│       ├── D1/                               # Humans vs Moderate & Advanced Bots
│       └── D2/                               # Humans vs Advanced Bots
├── web_log_detector_comprehensive.pkl        # Trained web log model
├── mouse_movement_detector_comprehensive.h5  # Trained mouse movement model
├── .gitignore                                # Git ignore file
└── README.md                                 # This file
pip3 install -r requirements.txt
# Train web log detection model (sequential training on all phases)
python3 web_log_detection_bot.py
# Train mouse movement detection model (sequential training on all phases)
python3 mouse_movements_detection_bot.py
# Run the main system (trains models and demonstrates fusion)
python3 main.py
# Test fusion logic with example scenarios
python3 fusion.py
Using the Main System:
# Run the complete system with all components
python3 main.py
The main system will:
- Train web log detection model on all phases
- Train mouse movement detection model on all phases
- Demonstrate fusion logic with example scenarios
- Show example session detection results
Web Log Detection:
from web_log_detection_bot import WebLogDetectionBot
# Initialize and train
detector = WebLogDetectionBot()
detector.train_sequentially() # Trains on all phases sequentially
# Predict on new data
score = detector.predict(web_log_features)
Mouse Movement Detection:
from mouse_movements_detection_bot import MouseMovementDetectionBot
# Initialize and train
detector = MouseMovementDetectionBot()
detector.train_sequentially() # Trains on all phases sequentially
# Predict on new data
score = detector.predict_session(mouse_matrices)
from fusion import BotDetectionFusion
# Initialize fusion with trained models
fusion = BotDetectionFusion(
web_log_model_path='models/web_log_detector_comprehensive.pkl',
mouse_movement_model_path='models/mouse_movement_detector_comprehensive.h5'
)
# Process a session
result = fusion.process_session(
mouse_score=0.85, # High bot probability from mouse movements
web_log_score=0.45 # Moderate bot probability from web logs
)
print(f"Final Classification: {'BOT' if result['is_bot'] else 'HUMAN'}")
print(f"Confidence Score: {result['final_score']:.3f}")
print(f"Fusion Method: {result['fusion_method']}")
Option 1: Using Main System (Simplest)
# Run everything with one command
python3 main.py
Option 2: Manual Step-by-Step
# Step 1: Train web log detection model
from web_log_detection_bot import WebLogDetectionBot
web_log_detector = WebLogDetectionBot()
web_log_detector.train_sequentially()
# Step 2: Train mouse movement detection model
from mouse_movements_detection_bot import MouseMovementDetectionBot
mouse_detector = MouseMovementDetectionBot()
mouse_detector.train_sequentially()
# Step 3: Use fusion for final classification
from fusion import BotDetectionFusion
fusion = BotDetectionFusion(
web_log_model_path='models/web_log_detector_comprehensive.pkl',
mouse_movement_model_path='models/mouse_movement_detector_comprehensive.h5'
)
# Step 4: Detect bot in a session
result = fusion.process_session(
mouse_score=0.75,
web_log_score=0.60
)
The system uses a comprehensive dataset with two phases:
Phase 1:
- D1: Humans vs Moderate Bots
  - Web logs: access_1.log to access_5.log (humans) + access_moderate_bots.log
  - Mouse movements: JSON files with session data
- D2: Humans vs Advanced Bots
  - Web logs: access_1.log to access_5.log (humans) + access_advanced_bots.log
  - Mouse movements: JSON files with session data
Phase 2:
- D1: Humans vs Moderate & Advanced Bots
  - Web logs: Multiple human files + access_moderate_and_advanced_bots.log
  - Mouse movements: URL sequence data
- D2: Humans vs Advanced Bots
  - Web logs: Multiple human files + access_moderate_and_advanced_bots.log
  - Mouse movements: URL sequence data
Data formats:
- Web Logs: Apache server logs with session IDs and comprehensive request data
- Mouse Movements: JavaScript-collected mouse movement sequences and URL patterns
- Annotations: Ground truth labels for training and evaluation
The fusion module implements intelligent decision-making:
IF mouse_score > 0.7 OR mouse_score < 0.3:
    final_score = mouse_score  # High confidence mouse movement
    fusion_method = "mouse_only"
ELSE:
    final_score = 0.5 × mouse_score + 0.5 × web_log_score  # Weighted average
    fusion_method = "weighted_average"
IF final_score > 0.5:
    classification = "BOT"
ELSE:
    classification = "HUMAN"
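The pseudocode above translates almost line for line into Python. The following is a minimal sketch rather than the contents of fusion.py: the function name `fuse_scores` is hypothetical, and the result keys simply mirror the `is_bot`, `final_score`, and `fusion_method` fields used in the usage example earlier in this README.

```python
def fuse_scores(mouse_score: float, web_log_score: float) -> dict:
    """Decision-level fusion of two bot-probability scores in [0, 1]."""
    if mouse_score > 0.7 or mouse_score < 0.3:
        # The mouse model is confident either way, so its score is used alone.
        final_score = mouse_score
        fusion_method = "mouse_only"
    else:
        # The mouse model is unsure, so both signals are weighted equally.
        final_score = 0.5 * mouse_score + 0.5 * web_log_score
        fusion_method = "weighted_average"
    return {
        "final_score": final_score,
        "fusion_method": fusion_method,
        "is_bot": final_score > 0.5,
    }


confident = fuse_scores(mouse_score=0.85, web_log_score=0.45)
ambiguous = fuse_scores(mouse_score=0.55, web_log_score=0.40)
print(confident)
print(ambiguous)
```

In the second call the mouse score falls inside the 0.3 to 0.7 band, so the web log score pulls the result to about 0.475 and the session is classified as human.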
The system achieves robust performance through:
- Sequential Training: Models train on Phase 1 first, then incrementally on Phase 2
- Ensemble Methods: Multiple classifiers (SVM, Random Forest, AdaBoost, MLP) for web log analysis
- CNN Architecture: Deep learning for mouse movement pattern recognition
- Intelligent Fusion: Decision-level combination of signals with adaptive thresholds
- Majority Voting: For mouse movement matrices within sessions
- Single Model Files: Each component saves one comprehensive model after all training phases
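The majority voting over a session's mouse movement matrices can be illustrated as follows. This is a sketch under assumptions: the function name `vote_on_session`, the per-matrix threshold of 0.5, and the rule that a tie counts as human are all illustrative choices, not details taken from mouse_movements_detection_bot.py.

```python
def vote_on_session(matrix_scores, threshold=0.5):
    """Majority vote over per-matrix bot probabilities for one session."""
    if not matrix_scores:
        raise ValueError("a session needs at least one matrix score")
    # Each matrix casts one vote: bot if its score exceeds the threshold.
    bot_votes = sum(score > threshold for score in matrix_scores)
    fraction = bot_votes / len(matrix_scores)
    # Assumption: a strict majority is required, so a tie counts as human.
    return {"is_bot": fraction > 0.5, "bot_vote_fraction": fraction}


print(vote_on_session([0.9, 0.8, 0.2]))
print(vote_on_session([0.9, 0.1]))
```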
- Robust Detection: Harder for advanced bots to evade due to dual-signal approach
- Sequential Learning: Incremental training across phases preserves learned features
- Modular Design: Each component can be used independently or together
- Comprehensive Evaluation: Performance metrics for all components and fusion
- Production Ready: Single model files for easy deployment
- Research Paper Implementation: Follows the exact approach described in the paper
- Web Log Features: Modify extract_features() in web_log_detection_bot.py
- Mouse Movement Features: Modify matrix generation in mouse_movements_detection_bot.py
- Fusion Logic: Adjust thresholds and weights in fusion.py
# Test the complete system (recommended)
python3 main.py
# Test individual components
python3 web_log_detection_bot.py
python3 mouse_movements_detection_bot.py
python3 fusion.py
- Web Log Model: web_log_detector_comprehensive.pkl (contains model, scaler, and selected features)
- Mouse Movement Model: mouse_movement_detector_comprehensive.h5 (Keras CNN model)
This system implements the approach described in the research paper:
- Session Extraction: PHP session IDs from Apache web logs
- Feature Engineering: 19 comprehensive web log features + mouse movement matrices
- Model Training: Ensemble classifier + CNN with sequential learning across phases
- Decision Fusion: Intelligent combination of detection signals with adaptive thresholds
- Evaluation: Comprehensive metrics across all datasets (D1, D2, Phase 1, Phase 2)
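To make the session extraction and feature engineering steps concrete, here is a small sketch that groups log lines by PHP session ID and derives a few per-session features. It rests on assumptions: the log layout (Apache combined format with a `PHPSESSID=` value at the end of each line), the names `LOG_PATTERN`, `sessions_from_lines`, and `session_features`, and the four example features are all illustrative and are not the project's actual 19 features.

```python
import re
from collections import defaultdict

# Assumed layout: Apache combined log format with the PHPSESSID cookie
# appended to each line. The project's real log format may differ.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'.*PHPSESSID=(?P<session>\w+)'
)


def sessions_from_lines(lines):
    """Group parsed requests by PHP session ID."""
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # lines without a session ID are skipped
            sessions[match["session"]].append(match.groupdict())
    return dict(sessions)


def session_features(requests):
    """Compute a few illustrative features for one session."""
    total = len(requests)
    return {
        "total_requests": total,
        "pct_4xx": sum(r["status"].startswith("4") for r in requests) / total,
        "pct_get": sum(r["method"] == "GET" for r in requests) / total,
        "unique_paths": len({r["path"] for r in requests}),
    }


sample = [
    '192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/5.0" PHPSESSID=abc123',
    '192.0.2.1 - - [01/Jan/2024:10:00:02 +0000] "GET /missing.php HTTP/1.1" 404 128 "-" "Mozilla/5.0" PHPSESSID=abc123',
    '192.0.2.9 - - [01/Jan/2024:10:00:03 +0000] "POST /login.php HTTP/1.1" 200 256 "-" "curl/8.0" PHPSESSID=def456',
]
for session_id, requests in sessions_from_lines(sample).items():
    print(session_id, session_features(requests))
```

Feature vectors built this way would then be scaled and passed to the ensemble classifier.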
✅ Web Log Detection: Fully implemented and tested
✅ Mouse Movement Detection: Fully implemented and tested
✅ Fusion Module: Fully implemented and tested
✅ Sequential Training: Working across all phases
✅ Model Persistence: Single comprehensive model files
✅ Documentation: Complete with examples
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests and documentation
- Submit a pull request
This project is for academic research purposes. Please cite the original research paper if using this implementation.
For issues and questions:
- Check the documentation in each module
- Review the example usage in the README
- Examine the test outputs for debugging information
- Check the .gitignore file for excluded files
Note: This system is designed for research and educational purposes. The models should be retrained with your specific data for production use.
A comprehensive web application that implements advanced bot detection using machine learning models, combining web log analysis and mouse movement patterns to identify automated behavior.
CAPSTONE-main/
├── login_page/                               # Frontend React Application
│   ├── src/
│   │   ├── components/                       # React Components
│   │   │   ├── LoginPage.jsx                 # Main login interface
│   │   │   ├── Dashboard.jsx                 # Post-login dashboard
│   │   │   ├── VisualCaptcha.jsx             # Gamified CAPTCHA system
│   │   │   ├── BotDetectionAlert.jsx         # Bot detection notifications
│   │   │   ├── HoneypotAlert.jsx             # Honeypot trap alerts
│   │   │   └── MLDetectionMonitor.jsx        # Real-time ML monitoring
│   │   ├── api/                              # Backend API handlers
│   │   │   ├── botDetection.js               # ML bot detection endpoint
│   │   │   └── log.js                        # Data logging endpoint
│   │   ├── logs/                             # Session data storage
│   │   │   ├── mouse_movements.json
│   │   │   ├── web_logs.json
│   │   │   ├── behavior.json
│   │   │   └── login_attempts.json
│   │   └── utils/                            # Utility functions
│   │       └── eventLogger.js                # Event tracking utilities
│   ├── server.js                             # Express.js backend server
│   ├── package.json                          # Node.js dependencies
│   └── vite.config.js                        # Vite build configuration
├── src/                                      # Core ML Detection System
│   ├── core/                                 # ML Detection Modules
│   │   ├── optimized_bot_detection.py        # Fast ML processing
│   │   ├── web_log_detection_bot.py          # Web log analysis
│   │   ├── mouse_movements_detection_bot.py  # Mouse pattern analysis
│   │   └── fusion.py                         # Score fusion algorithm
│   └── utils/                                # ML utilities
│       └── session_processor.py              # Session data processing
├── models/                                   # Pre-trained ML Models
│   ├── web_log_detector_comprehensive.pkl
│   └── mouse_movement_detector_comprehensive.h5
├── scripts/                                  # Test and Demo Scripts
│   ├── main.py                               # Main demonstration script
│   ├── bot.py                                # Bot simulation script
│   ├── login_bot.py                          # Login automation bot
│   └── run_demo.py                           # Demo runner
├── requirements.txt                          # Python dependencies
└── README.md                                 # This file
The React frontend (login_page/) collects comprehensive user behavior data:
- Mouse Movements: Real-time tracking of cursor coordinates
- Web Logs: HTTP requests, page interactions, and navigation patterns
- Behavior Signals: Keystroke timing, scroll patterns, focus/blur events
- Honeypot Traps: Hidden form fields to catch automated tools
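The honeypot idea in the last bullet is simple to express in code: a field that humans never see should always arrive empty. The sketch below is illustrative only; the field names in `HONEYPOT_FIELDS` and the function name `honeypot_triggered` are hypothetical, and the project's own check lives in the React frontend rather than in Python.

```python
# Hypothetical names of form fields that are hidden from human users via CSS.
HONEYPOT_FIELDS = ("website", "nickname")


def honeypot_triggered(form_data: dict) -> bool:
    """Return True when any hidden honeypot field holds a non-blank value."""
    return any(
        str(form_data.get(field) or "").strip() for field in HONEYPOT_FIELDS
    )


human_submission = {"username": "alice", "password": "secret", "website": ""}
bot_submission = {"username": "bot", "password": "x", "website": "http://spam.example"}
print(honeypot_triggered(human_submission))  # False
print(honeypot_triggered(bot_submission))    # True
```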
User Interaction → Event Logging → Data Storage → ML Analysis → Decision
- Mouse Tracking: Continuous coordinate logging with session IDs
- Event Logging: All user interactions captured via eventLogger.js
- Behavior Analysis: Keystroke intervals, click trustworthiness, scroll variance
- Storage: JSON files in login_page/src/logs/ for ML processing
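As an illustration of the behavior analysis step, the sketch below derives keystroke intervals, scroll variance, and an untrusted-click count from a list of logged events. The event schema (`type`, `timestamp` in milliseconds, `scrollY`, `isTrusted`) and the function names are assumptions made for this example; the actual structure of the JSON logs may differ.

```python
from statistics import mean, pvariance


def keystroke_intervals(events):
    """Milliseconds between consecutive keydown events."""
    times = sorted(e["timestamp"] for e in events if e["type"] == "keydown")
    return [later - earlier for earlier, later in zip(times, times[1:])]


def behavior_features(events):
    """Summarize one session's events into a few numeric signals."""
    intervals = keystroke_intervals(events)
    scroll_positions = [e["scrollY"] for e in events if e["type"] == "scroll"]
    return {
        "mean_key_interval_ms": mean(intervals) if intervals else 0.0,
        "key_interval_variance": pvariance(intervals) if len(intervals) > 1 else 0.0,
        "scroll_variance": pvariance(scroll_positions) if len(scroll_positions) > 1 else 0.0,
        # Clicks dispatched by scripts typically report isTrusted == False.
        "untrusted_clicks": sum(
            1 for e in events if e["type"] == "click" and not e.get("isTrusted", True)
        ),
    }


events = [
    {"type": "keydown", "timestamp": 0},
    {"type": "keydown", "timestamp": 100},
    {"type": "keydown", "timestamp": 200},
    {"type": "keydown", "timestamp": 300},
    {"type": "scroll", "timestamp": 350, "scrollY": 0},
    {"type": "scroll", "timestamp": 400, "scrollY": 100},
    {"type": "click", "timestamp": 450, "isTrusted": False},
]
print(behavior_features(events))
```

Perfectly regular typing, as in this sample, yields zero interval variance, which is the kind of signal that separates scripted input from human input.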
A. reCAPTCHA v3 Integration
- Google's invisible bot detection
- Score-based analysis (0.0 - 1.0)
- No user friction or challenges
- Real-time risk assessment
- Thresholds: High (0.7+), Medium (0.5+), Low (0.3+), Critical (0.1+)
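The threshold list above can be read as a lookup from score to tier. The sketch below shows one way to do that; the names `SCORE_TIERS` and `recaptcha_tier` are hypothetical, and treating scores below 0.1 as critical is an assumption, since the list does not say what happens there.

```python
# Thresholds from the list above, checked from highest to lowest.
SCORE_TIERS = [(0.7, "high"), (0.5, "medium"), (0.3, "low"), (0.1, "critical")]


def recaptcha_tier(score: float) -> str:
    """Map a reCAPTCHA v3 score (1.0 = likely human) to a named tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("reCAPTCHA v3 scores lie between 0.0 and 1.0")
    for threshold, tier in SCORE_TIERS:
        if score >= threshold:
            return tier
    # Assumption: anything below the lowest threshold is also critical.
    return "critical"


for score in (0.9, 0.55, 0.35, 0.1, 0.05):
    print(score, recaptcha_tier(score))
```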
B. Web Log Detection (web_log_detection_bot.py)
- Analyzes HTTP request patterns
- Features: request counts, status codes, timing patterns
- Model: Ensemble classifier (Random Forest, XGBoost)
- Output: Bot probability score (0-1)
C. Mouse Movement Detection (mouse_movements_detection_bot.py)
- CNN-based pattern recognition
- Input: 480x1320 normalized mouse movement matrices
- Features: Movement trajectories, acceleration, click patterns
- Output: Bot probability score (0-1)
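One way to picture the 480x1320 input is as a grid onto which cursor samples are rasterized. The sketch below is a guess at such a preprocessing step, not the project's implementation: the (rows, columns) order, the scaling from viewport size, and the normalization by the busiest cell are all assumptions, and it uses plain lists where the real code would more likely use NumPy.

```python
ROWS, COLS = 480, 1320  # shape stated above; the (rows, cols) order is assumed


def movements_to_matrix(points, viewport_width, viewport_height):
    """Rasterize (x, y) cursor samples into a ROWS x COLS grid scaled to [0, 1]."""
    matrix = [[0.0] * COLS for _ in range(ROWS)]
    for x, y in points:
        # Scale viewport coordinates to grid indices and clamp to the grid.
        col = max(0, min(int(x / viewport_width * COLS), COLS - 1))
        row = max(0, min(int(y / viewport_height * ROWS), ROWS - 1))
        matrix[row][col] += 1.0
    peak = max(max(row_values) for row_values in matrix)
    if peak > 0:
        # Normalize visit counts so the busiest cell equals 1.0.
        matrix = [[value / peak for value in row_values] for row_values in matrix]
    return matrix


points = [(0, 0), (0, 0), (1919, 1079)]
matrix = movements_to_matrix(points, viewport_width=1920, viewport_height=1080)
print(len(matrix), len(matrix[0]), matrix[0][0], matrix[479][1319])
```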
D. Intelligent Fusion (fusion.py)
- Multi-layer decision fusion
- Combines reCAPTCHA + ML scores
- Logic:
- If mouse score > 0.65 or < 0.35: Use mouse score only
- Otherwise: Weighted average (60% mouse + 40% web log)
- Final threshold: 0.45 for bot classification
- Combined risk assessment with reCAPTCHA validation
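The mouse and web log part of this logic can be sketched with a small configuration object. The field names `high_threshold`, `low_threshold`, and `final_threshold` follow the settings shown later for src/core/fusion.py; `mouse_weight`, `web_log_weight`, and the names `FusionConfig` and `fuse` are hypothetical. The sketch assumes a strict greater-than comparison at the final threshold and leaves out the reCAPTCHA combination, because this README does not specify how that score is merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConfig:
    high_threshold: float = 0.65   # above this, trust the mouse score alone
    low_threshold: float = 0.35    # below this, trust the mouse score alone
    mouse_weight: float = 0.6      # hypothetical name for the 60% weight
    web_log_weight: float = 0.4    # hypothetical name for the 40% weight
    final_threshold: float = 0.45  # scores above this are classified as bot


def fuse(mouse_score, web_log_score, config=FusionConfig()):
    """Fuse mouse and web log scores according to the configured thresholds."""
    if mouse_score > config.high_threshold or mouse_score < config.low_threshold:
        final_score, method = mouse_score, "mouse_only"
    else:
        final_score = (config.mouse_weight * mouse_score
                       + config.web_log_weight * web_log_score)
        method = "weighted_average"
    return {
        "final_score": final_score,
        "fusion_method": method,
        "is_bot": final_score > config.final_threshold,
    }


print(fuse(0.5, 0.4))  # weighted average of about 0.46, above 0.45
print(fuse(0.5, 0.3))  # weighted average of about 0.42, below 0.45
```

Because the final threshold sits below 0.5, an undecided mouse score combined with a mildly suspicious web log score is already enough to flag a session.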
- Model Caching: Singleton pattern for fast model loading
- Preprocessing: Optimized feature extraction
- Parallel Processing: Concurrent analysis of multiple data streams
- Result Fusion: Intelligent score combination
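Model caching means each model file is read from disk once and then reused for every request. The sketch below gets that effect with `functools.lru_cache` instead of a hand-written singleton class, which is a substitution made for brevity; the function name `load_model` and the stand-in pickle file are illustrative. Pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(path: str):
    """Load a pickled model once per path; later calls reuse the cached object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demonstration with a stand-in object instead of a real trained model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "demo_model.pkl")
    with open(model_path, "wb") as handle:
        pickle.dump({"name": "stand-in model"}, handle)
    first = load_model(model_path)
    second = load_model(model_path)  # served from the cache, no disk read

print(first is second)
print(load_model.cache_info())
```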
- reCAPTCHA v3: Invisible Google bot detection
- Honeypot Detection: Immediate CAPTCHA trigger
- ML Analysis: Comprehensive behavior scoring
- Combined Analysis: reCAPTCHA + ML fusion
- Adaptive CAPTCHA: Difficulty based on combined risk
- Visual Indicators: Real-time security status display
- Python 3.8+ with ML libraries (TensorFlow, scikit-learn, pandas)
- Node.js 16+ with npm
- Modern web browser with JavaScript enabled
- Google reCAPTCHA v3 API keys (optional but recommended)
# Install Python dependencies
pip install -r requirements.txt
# Verify model files exist
ls models/
# Should show:
# - web_log_detector_comprehensive.pkl
# - mouse_movement_detector_comprehensive.h5
cd login_page/
# Install Node.js dependencies
npm install
# Configure reCAPTCHA v3 (optional)
# Copy login_page/src/config/recaptcha.js and update with your keys
# Get keys from: https://www.google.com/recaptcha/admin
# Start development server
npm run dev
# This runs both Vite dev server and Express backend concurrently
# 1. Visit https://www.google.com/recaptcha/admin
# 2. Create a new site with reCAPTCHA Enterprise
# 3. Add your domain (localhost for development)
# 4. Copy the Site Key and API Key
# 5. Update login_page/src/config/recaptcha.js with your keys
# 6. Set environment variables:
# REACT_APP_RECAPTCHA_SITE_KEY=your_site_key
# REACT_APP_RECAPTCHA_API_KEY=your_api_key
# Enterprise API Configuration:
# - Project ID: endless-gamma-457506-a0
# - Site Key: 6LekL9ArAAAAAFGpIoMxyUuz5GkXnhT-DQocifhO
# - API Endpoint: https://recaptchaenterprise.googleapis.com/v1/projects/endless-gamma-457506-a0/assessments
# Terminal 1: Start ML backend
cd scripts/
python main.py
# Terminal 2: Start web application
cd login_page/
npm run dev
# Build and start production server
cd login_page/
npm run build
npm run server
Edit src/core/fusion.py to adjust detection thresholds:
high_threshold: float = 0.65 # High confidence mouse threshold
low_threshold: float = 0.35 # Low confidence mouse threshold
final_threshold: float = 0.45 # Final bot classification threshold
Edit login_page/src/config/recaptcha.js to customize:
scoreThresholds: {
high: 0.7, // High confidence human
medium: 0.5, // Medium confidence
low: 0.3, // Low confidence - likely bot
critical: 0.1 // Very likely bot
}
Modify login_page/src/components/LoginPage.jsx:
- CAPTCHA difficulty levels
- Honeypot field configuration
- ML analysis triggers
- reCAPTCHA execution timing
# Run automated bot tests
python scripts/bot.py
# Test login automation
python scripts/login_bot.py
# Run comprehensive demo
python scripts/run_demo.py
- Human Behavior: Normal mouse movements, realistic timing
- Bot Simulation: Automated clicks, rapid movements
- Edge Cases: Mixed behavior patterns
- Triple Layer Architecture: reCAPTCHA v3 + Web logs + Mouse movements
- Intelligent Fusion: Adaptive score combination with Google validation
- Real-time Processing: Sub-second detection
- Model Caching: Optimized performance
- reCAPTCHA v3: Google's invisible bot detection
- Honeypot Traps: Hidden form fields
- Visual CAPTCHA: Gamified verification
- Behavior Analysis: Keystroke timing, scroll patterns
- Adaptive Responses: Dynamic security levels
- Modern UI: Material-UI with dark theme
- Real-time Feedback: Live ML analysis status
- Progressive Enhancement: Graceful degradation
- Responsive Design: Mobile-friendly interface
- Local Storage: Session data stored locally
- No External ML APIs: ML analysis runs locally; the optional reCAPTCHA v3 check does contact Google
- Anonymized Logs: No personal data collection
- Secure Transmission: HTTPS in production
- Multiple Signals: reCAPTCHA + ML patterns make evasion extremely difficult
- Google Validation: Leverages Google's massive bot detection database
- Temporal Analysis: Time-based behavior validation
- Adaptive Thresholds: Dynamic detection sensitivity
- Triple Fusion Logic: reCAPTCHA + ML + Behavioral analysis
- reCAPTCHA v3: Google's neural network with 0.0-1.0 scoring
- Web Log Model: Ensemble of Random Forest + XGBoost
- Mouse Model: CNN with 480x1320x1 input shape
- Triple Fusion: reCAPTCHA + ML decision-level combination with confidence weighting
- Model Caching: Singleton pattern for memory efficiency
- Batch Processing: Parallel data analysis
- Lazy Loading: On-demand model initialization
- Result Caching: Avoid redundant computations
- Detection Accuracy: Bot vs Human classification rates
- Processing Speed: ML analysis timing
- User Behavior: Interaction patterns and trends
- System Performance: Resource utilization
- Session Tracking: Complete user journey mapping
- Behavior Profiling: Detailed interaction analysis
- Security Events: Honeypot triggers and CAPTCHA challenges
- System Health: Error rates and performance metrics
- Feature Development: Create feature branches
- Testing: Comprehensive bot simulation tests
- Code Review: ML model validation
- Documentation: Update README and inline comments
- Python: PEP 8 compliance, type hints
- JavaScript: ESLint configuration, modern ES6+
- React: Functional components, hooks
- Documentation: Comprehensive inline comments
This project is developed for educational and research purposes. Please ensure compliance with applicable laws and regulations when implementing bot detection systems in production environments.
- Install Dependencies: pip install -r requirements.txt && cd login_page && npm install
- Start System: cd login_page && npm run dev
- Access Application: Open http://localhost:3001
- Test Detection: Try both human and bot-like behavior patterns
- Monitor Results: Check console logs and ML analysis results
The system provides a complete end-to-end bot detection solution with modern web interface and advanced machine learning capabilities.