SentinelAI does not just detect threats — it teaches users how to think critically in the face of cyber manipulation.
- Overview
- Key Features
- Tech Stack
- Installation
- Usage
- Demo Mode
- Model Performance
- Architecture
- Project Structure
- Screenshots
- Contributing
- License
SentinelAI is an advanced, full-stack cybersecurity application that combines machine learning with educational psychology to detect phishing attempts and teach users how to recognize cyber threats. Unlike traditional security tools that simply flag threats, SentinelAI provides comprehensive explanations, helping users develop critical thinking skills to protect themselves.
- 98.65% Detection Accuracy - Trained and evaluated on 5,500+ real-world messages (spam/phishing and legitimate)
- Educational Focus - Explains WHY messages are dangerous, not just IF they are
- Cognitive Analysis - Identifies 8 psychological manipulation tactics used by attackers
- Real-time Feedback - Instant analysis with actionable recommendations
- Student-Friendly - Designed for cybersecurity education and awareness training
- Three-tier classification: SAFE / WARNING / DANGER
- Phishing probability score: 0-100% with confidence metrics
- Ensemble ML model: Combines Logistic Regression, Naive Bayes, and Random Forest
- Real-time analysis: Instant results with detailed breakdowns
Analyzes 8 psychological manipulation tactics:
- 😨 Fear-based language - Panic-inducing threats
- ⏰ Urgency pressure - Time-sensitive demands
- 👔 Authority impersonation - Fake official communications
- 💰 Financial threats - Money-related pressure
- 🎁 Reward bait - Too-good-to-be-true offers
- ⚡ Scarcity tactics - Limited availability claims
- 😔 Guilt manipulation - Emotional pressure
- 🔒 Trust exploitation - False security claims
- Risk Meter: Animated progress bar with color-coded threat levels
- Radar Chart: 8-axis visualization of manipulation tactics
- Feature Importance: Ranked list of detected threat indicators
- Statistics Dashboard: Track scans, threats, and safe messages
Dynamic educational guidance that explains:
- Why it's dangerous: Plain-language explanation of threats
- Attacker's goal: What they want you to do
- What to do instead: Actionable security steps
- 30-second safety tips: Quick, memorable advice
- Suspicious word detection: Highlights dangerous phrases
- Feature contribution weights: Shows impact levels (Critical/High/Medium)
- Plain-language explanations: No technical jargon
- Context-aware feedback: Tailored to detected threats
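As a rough illustration of suspicious word detection with impact levels, the sketch below scans a message for a small phrase list. The phrase list, the impact assigned to each phrase, and the function name `find_suspicious_phrases` are all hypothetical; the actual detection logic in app.py may work differently.

```python
import re

# Hypothetical phrases and impact levels, for illustration only.
SUSPICIOUS_PHRASES = {
    "verify your account": "Critical",
    "password": "Critical",
    "urgent": "High",
    "click here": "High",
    "limited time": "Medium",
}


def find_suspicious_phrases(message):
    """Return (phrase, impact, start_index) tuples in order of appearance."""
    hits = []
    for phrase, impact in SUSPICIOUS_PHRASES.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, message, flags=re.IGNORECASE):
            hits.append((phrase, impact, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "URGENT: click here to verify your account."
for phrase, impact, start in find_suspicious_phrases(sample):
    print(f"{impact:<8} {phrase!r} at index {start}")
```

The start index is returned so a frontend could wrap each match in a highlight element.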
- 15 rotating security insights: Educational tips on various topics
- Interactive learning: "Get New Insight" button for continuous education
- Best practices: Password security, 2FA, link verification, and more
Perfect for presentations and training:
- Safe Email: Legitimate business communication
- Medium Risk: Suspicious verification request
- High Risk Phishing: Multi-vector attack with red flags
- Python 3.8+ - Core programming language
- Flask 2.0+ - Web framework
- Scikit-learn - Machine learning library
- Pandas - Data manipulation
- NumPy - Numerical computing
- TF-IDF Vectorizer - Text feature extraction (10,000 features, 1-3 grams)
- Logistic Regression - Primary classifier
- Naive Bayes - Probabilistic classifier
- Random Forest - Ensemble decision trees
- Voting Classifier - Ensemble model combining all three
- HTML5 - Structure
- CSS3 - Glassmorphism design with cyber theme
- Vanilla JavaScript - Interactive functionality
- Chart.js - Data visualizations (radar charts, doughnut charts)
- Dark theme (#0d1117 background)
- Neon accents (Green #00ff88, Cyan #00ccff)
- Glassmorphism - Frosted glass effect panels
- Responsive layout - Mobile-friendly design
- Python 3.8 or higher
- pip (Python package manager)
- 4GB RAM minimum
- Modern web browser (Chrome, Firefox, Safari, Edge)
git clone https://github.com/yourusername/sentinelai.git
cd sentinelai
pip install -r requirements.txt
Required packages:
- Flask==3.0.0
- scikit-learn==1.3.2
- pandas==2.1.4
- numpy==1.26.2
Option A: Use provided dataset (113 samples)
cd model
python train.py
Option B: Use large spam.csv dataset (5,572 samples) - Recommended
- Place spam.csv in the project root directory
- Run training:
cd model
python train.py
The script will automatically detect and use spam.csv if available.
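The dataset selection could look roughly like the sketch below. It assumes the file locations shown in the project structure (spam.csv in the project root, data/phishing_dataset.csv as the bundled fallback); the function name `choose_dataset` is hypothetical and train.py may implement the check differently.

```python
from pathlib import Path


def choose_dataset(project_root):
    """Prefer spam.csv in the project root; otherwise use the bundled dataset."""
    root = Path(project_root)
    large = root / "spam.csv"
    bundled = root / "data" / "phishing_dataset.csv"
    return large if large.is_file() else bundled


# When run from the model/ directory, the project root is one level up.
print(choose_dataset(".."))
```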
Training output:
- Model accuracy metrics
- Confusion matrix
- Cross-validation scores
- Sample predictions
- Saved files: phishing_model.pkl and vectorizer.pkl
python app.py
Open your browser and navigate to:
http://localhost:5000
- Enter Text: Paste any email or message into the text area
- Analyze: Click "Analyze Threat" or press Ctrl+Enter
- Review Results:
  - Threat level (SAFE/WARNING/DANGER)
  - Phishing probability percentage
  - Cognitive manipulation analysis
  - Suspicious words detected
  - Educational feedback
  - Recommended actions
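The same analysis can also be requested without the browser through the /analyze POST endpoint. The sketch below only builds the request using the standard library. The JSON field name `text` is an assumption, as is the helper name `build_analyze_request`; check app.py for the payload it actually expects.

```python
import json
import urllib.request


def build_analyze_request(message, base_url="http://localhost:5000"):
    """Build a POST request for the /analyze endpoint."""
    # The "text" key is an assumed field name, not confirmed by the project.
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("Your account is locked. Verify now.")
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```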
- Ctrl+Enter - Analyze message
- Esc - Clear results (if implemented)
- 🟢 SAFE (0-30%): Low risk, appears legitimate
- 🟡 WARNING (30-70%): Moderate risk, exercise caution
- 🔴 DANGER (70-100%): High risk, likely phishing
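A minimal sketch of this mapping is shown below. The listed ranges overlap at 30% and 70%, so the sketch assumes those boundary values fall into the higher tier; the function name `classify_threat` is hypothetical and the application may treat the boundaries differently.

```python
def classify_threat(probability_percent):
    """Map a phishing probability (0-100) to SAFE, WARNING, or DANGER."""
    if not 0 <= probability_percent <= 100:
        raise ValueError("probability must be between 0 and 100")
    if probability_percent < 30:
        return "SAFE"
    if probability_percent < 70:  # assumption: exactly 30 counts as WARNING
        return "WARNING"
    return "DANGER"  # assumption: exactly 70 counts as DANGER


for value in (12, 45, 88):
    print(value, classify_threat(value))
```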
Each manipulation tactic is scored 0-100:
- 0-30: Low presence
- 30-70: Moderate presence
- 70-100: High presence
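One simple way to produce per-tactic scores on a 0-100 scale is keyword counting, sketched below. The keyword lists, the points awarded per hit, and the function names are all assumptions made for illustration; the project's cognitive analysis may use a different method entirely.

```python
# Hypothetical keyword lists, one per tactic, for illustration only.
TACTIC_KEYWORDS = {
    "fear": ["suspended", "locked", "breach"],
    "urgency": ["immediately", "urgent", "within 24 hours"],
    "authority": ["security team", "administrator", "compliance department"],
    "financial": ["payment", "refund", "invoice"],
    "reward": ["winner", "prize", "gift card"],
    "scarcity": ["limited", "only a few", "last chance"],
    "guilt": ["you failed", "disappointed", "your fault"],
    "trust": ["secure link", "verified", "guaranteed safe"],
}
POINTS_PER_HIT = 35  # assumed weight; scores are capped at 100


def score_tactics(message):
    """Return a 0-100 score for each of the eight tactics."""
    text = message.lower()
    scores = {}
    for tactic, keywords in TACTIC_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in text)
        scores[tactic] = min(100, hits * POINTS_PER_HIT)
    return scores


def presence_band(score):
    """Translate a score into the Low / Moderate / High bands."""
    if score < 30:
        return "Low"
    if score < 70:
        return "Moderate"
    return "High"


sample = "Urgent: your account is suspended. Contact the security team immediately."
for tactic, score in score_tactics(sample).items():
    print(f"{tactic:<10} {score:>3} {presence_band(score)}")
```

The resulting dictionary has one value per axis, which maps directly onto an 8-axis radar chart.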
Perfect for live demonstrations, training sessions, and judging:
- Click the "🎯 Demo Attack Mode" button
- Select from three preloaded examples
- Safe Email
  - Legitimate team meeting reminder
  - Professional business communication
  - No threat indicators
  - Expected result: SAFE classification
- Medium Risk
  - Order confirmation with verification request
  - Suspicious external link
  - Moderate urgency language
  - Expected result: WARNING classification
- High Risk Phishing
  - Urgent security alert
  - Multiple psychological triggers
  - Requests sensitive information (SSN, password)
  - Fear + urgency + authority tactics
  - Expected result: DANGER classification
- 🎓 Cybersecurity training sessions
- 🏆 Competition demonstrations
- 👥 Team awareness workshops
- 📊 Stakeholder presentations
- Total samples: 5,572 messages
- Spam/Phishing: 747 (13.4%)
- Ham/Safe: 4,825 (86.6%)
- Training set: 4,457 samples (80%)
- Test set: 1,115 samples (20%)
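The split sizes and class shares above can be reproduced with a few lines of arithmetic. The sketch assumes the test set size is rounded up, which yields the reported 1,115 samples; the names are illustrative and the actual split in train.py may be computed differently.

```python
import math

TOTAL = 5572
SPAM = 747
HAM = 4825
TEST_FRACTION = 0.20


def split_sizes(total, test_fraction):
    """Return (train_size, test_size), rounding the test set up (assumption)."""
    test_size = math.ceil(total * test_fraction)
    return total - test_size, test_size


train_size, test_size = split_sizes(TOTAL, TEST_FRACTION)
spam_share = round(100 * SPAM / TOTAL, 1)
ham_share = round(100 * HAM / TOTAL, 1)
print(train_size, test_size, spam_share, ham_share)
```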
| Model | Accuracy | Notes |
|---|---|---|
| Ensemble | 98.65% | ⭐ Best overall |
| Logistic Regression | 98.57% | Fast, reliable |
| Naive Bayes | 98.30% | Probabilistic |
| Random Forest | 97.49% | Robust |
| Metric | Safe/Ham | Spam/Phishing |
|---|---|---|
| Precision | 99% | 99% |
| Recall | 100% | 91% |
| F1-Score | 99% | 95% |
| | Predicted Safe | Predicted Spam | Correct |
|---|---|---|---|
| Actual Safe | 960 | 6 | 99.4% |
| Actual Spam | 10 | 139 | 93.3% |
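As a cross-check, per-class metrics can be recomputed directly from the confusion matrix counts. This is a sketch in plain Python with illustrative variable names. Values derived this way reflect only this matrix, so they may differ slightly from figures that come from another run or were rounded differently.

```python
# Counts from the confusion matrix, treating Spam as the positive class.
TN, FP = 960, 6    # actual Safe: predicted Safe, predicted Spam
FN, TP = 10, 139   # actual Spam: predicted Safe, predicted Spam

total = TN + FP + FN + TP
accuracy = (TP + TN) / total

spam_precision = TP / (TP + FP)
spam_recall = TP / (TP + FN)
spam_f1 = 2 * spam_precision * spam_recall / (spam_precision + spam_recall)

ham_precision = TN / (TN + FN)
ham_recall = TN / (TN + FP)
ham_f1 = 2 * ham_precision * ham_recall / (ham_precision + ham_recall)

print(f"accuracy       {accuracy:.2%}")
print(f"spam P/R/F1    {spam_precision:.2%} {spam_recall:.2%} {spam_f1:.2%}")
print(f"ham  P/R/F1    {ham_precision:.2%} {ham_recall:.2%} {ham_f1:.2%}")
```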
- 5-fold CV: 98.20% (+/- 0.55%)
- Consistency: Very stable across folds
- Generalization: Excellent performance on unseen data
- ✅ Only 6 false positives (0.6% of safe emails)
- ✅ Only 10 false negatives (6.7% of spam emails)
- ✅ 98.65% overall accuracy
- ✅ Production-ready performance
User Input → Flask Backend → ML Pipeline → Analysis Engine → Frontend Display
Supporting flows:
- Vectorizer → TF-IDF
- Model Prediction → Probability Score → Threat Level
- Cognitive Analysis → Teach-Back Engine → Educational Content
- Text Preprocessing
  - Lowercase conversion
  - Unicode normalization
  - Stop word removal
- Feature Extraction
  - TF-IDF vectorization
  - 10,000 features
  - 1-3 word n-grams
  - Sublinear term frequency
- Model Prediction
  - Ensemble voting (soft)
  - Probability estimation
  - Confidence calculation
- Post-Processing
  - Threat level classification
  - Cognitive score calculation
  - Suspicious word detection
  - Educational content generation
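The preprocessing, feature extraction, and prediction stages can be approximated with a scikit-learn pipeline, as in the sketch below. The TF-IDF settings (10,000 features, 1-3 grams, sublinear term frequency) and soft voting come from the description above. Everything else is an assumption: the classifier hyperparameters, the use of `strip_accents="unicode"` to stand in for Unicode normalization, the built-in English stop word list, and the tiny toy dataset, which exists only so the example runs.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_pipeline():
    """TF-IDF features feeding a soft-voting ensemble of three classifiers."""
    vectorizer = TfidfVectorizer(
        lowercase=True,
        strip_accents="unicode",   # assumed stand-in for Unicode normalization
        stop_words="english",      # assumed stop word list
        max_features=10_000,
        ngram_range=(1, 3),
        sublinear_tf=True,
    )
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ],
        voting="soft",  # average predicted probabilities across models
    )
    return make_pipeline(vectorizer, ensemble)


# Toy data so the sketch runs; 1 = phishing, 0 = safe.
texts = [
    "Urgent: your account is suspended, verify your password now",
    "You won a free prize, click here to claim",
    "Final notice: confirm your bank details immediately",
    "Team meeting moved to 3pm tomorrow",
    "Here are the notes from today's lecture",
    "Lunch on Friday? Let me know",
]
labels = [1, 1, 1, 0, 0, 0]

model = build_pipeline()
model.fit(texts, labels)

probabilities = model.predict_proba(["Urgent: verify your password now"])[0]
phishing_probability = probabilities[1]  # column order follows sorted labels
print(f"Phishing probability: {phishing_probability:.0%}")
```

Soft voting is what makes a 0-100% probability score possible, since hard voting would return only the majority label.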
app.py
├── /analyze (POST) # Main analysis endpoint
├── /get-insight (GET) # Random security tip
└── /get-demo-example/<type> # Demo examples
Analysis Pipeline:
1. Text vectorization
2. Model prediction
3. Cognitive manipulation analysis
4. Suspicious word detection
5. Teach-back generation
6. Response formatting
main.js
├── analyzeMessage() # Main analysis function
├── displayResults() # Render results
├── createRadarChart() # Cognitive visualization
├── updateRiskMeter() # Risk level display
├── displaySuspiciousWords() # Explainable AI
└── loadDemoExample() # Demo mode
sentinelai/
├── 📄 app.py # Flask application & API endpoints
├── 📄 requirements.txt # Python dependencies
├── 📄 README.md # This file
├── 📄 .gitignore # Git ignore rules
│
├── 📁 model/ # Machine Learning
│ ├── train.py # Training script
│ ├── phishing_model.pkl # Trained model (generated)
│ └── vectorizer.pkl # TF-IDF vectorizer (generated)
│
├── 📁 data/ # Datasets
│ └── phishing_dataset.csv # Training data (113 samples)
│
├── 📁 templates/ # HTML templates
│ └── index.html # Main UI
│
└── 📁 static/ # Static assets
    ├── 📁 css/
    │   └── style.css # Cyber-themed styles
    └── 📁 js/
        └── main.js # Frontend logic
External:
spam.csv # Large dataset (5,572 samples)
Dark cyber-themed interface with glassmorphism design
Real-time threat detection with visual indicators
8-axis visualization of psychological manipulation tactics
Educational guidance with plain-language explanations
Contributions are welcome! Here's how you can help:
- 🎨 UI/UX improvements
- 🧠 Additional ML models
- 📊 More visualization options
- 🌐 Internationalization (i18n)
- 📱 Mobile app version
- 🔌 Browser extension
- 📚 Documentation improvements
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Test thoroughly
- Commit: git commit -m "Add feature"
- Push: git push origin feature-name
- Create a Pull Request
- Follow PEP 8 for Python
- Use ESLint for JavaScript
- Comment complex logic
- Write descriptive commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
- Your Name - Initial work - YourGitHub
- SMS Spam Collection Dataset
- Scikit-learn community
- Flask framework
- Chart.js library
- Cybersecurity education community
- Email: neevmodh205@gmail.com
- GitHub: @neevmodh
- LinkedIn: Neev Modh
- Email integration (Gmail, Outlook)
- Browser extension for real-time protection
- Mobile app (iOS/Android)
- Multi-language support
- API for third-party integration
- Advanced reporting and analytics
- User accounts and history tracking
- Custom model training interface
- Threat intelligence feed integration
- Automated phishing simulation training
Made with ❤️ for cybersecurity education
⭐ Star this repo if you find it helpful!