🧠 Sentry: Multimodal Mental Health Assessment Framework
A deep learning system for real-time mental health assessment using facial expressions and body posture analysis.
Overview • Quick Start • Training Models • All Commands • Project Structure • Documentation
Sentry combines facial emotion recognition with body posture analysis to assess mental health indicators like stress, depression, and anxiety.
| Feature | Description |
|---|---|
| Multimodal AI | DenseNet121 (Face) + TCN-LSTM (Posture) + Cross-Attention Fusion |
| 6 Emotions | Neutral, Happy, Sad, Surprise, Fear, Anger |
| 6 Predictions | Stress, Depression, Anxiety, Posture, Stress Indicators, Trajectory |
| Face Meshgrid | 468-point MediaPipe FaceMesh with color-coded regions |
| Real-time | 20-30 FPS with GPU acceleration |
| Privacy First | 100% local processing - no data sent externally |
┌─────────────────────────────────────────────────────────────┐
│                         VIDEO INPUT                         │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
       ┌───────────────┐               ┌───────────────┐
       │     FACE      │               │     BODY      │
       │  DenseNet121  │               │   MediaPipe   │
       │   → Emotion   │               │    → Pose     │
       │    → 512D     │               │  → Features   │
       └───────────────┘               └───────────────┘
               │                               │
               │                       ┌───────────────┐
               │                       │   TCN-LSTM    │
               │                       │   Temporal    │
               │                       │    → 512D     │
               │                       └───────────────┘
               │                               │
               └───────────────┬───────────────┘
                               ▼
                   ┌───────────────────────┐
                   │    CROSS-ATTENTION    │
                   │    FUSION (1024D)     │
                   └───────────────────────┘
                               │
                               ▼
        ┌─────────────────────────────────────────────┐
        │             6 PREDICTION HEADS              │
        ├─────────────────────────────────────────────┤
        │ • Stress (low/moderate/high)                │
        │ • Depression (minimal/mild/moderate/severe) │
        │ • Anxiety (minimal/mild/moderate/severe)    │
        │ • Posture (upright/slouched/open/closed)    │
        │ • Stress Indicator (calm/fidgeting/...)     │
        │ • Trajectory (stable/deteriorating/...)     │
        └─────────────────────────────────────────────┘
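The prediction stage in the diagram can be read as six independent classifiers over the fused feature vector. As a minimal sketch of the decoding step only, assuming each head emits one raw score (logit) per class, the snippet below maps scores to the class names shown in the diagram. `HEAD_CLASSES`, `softmax`, and `decode_heads` are hypothetical names, not part of Sentry's code, and the Stress Indicator and Trajectory heads are left out because the diagram elides their full class lists.

```python
import math

# Class names copied from the architecture diagram. The two heads whose
# class lists are elided there ("...") are intentionally omitted.
HEAD_CLASSES = {
    "stress": ["low", "moderate", "high"],
    "depression": ["minimal", "mild", "moderate", "severe"],
    "anxiety": ["minimal", "mild", "moderate", "severe"],
    "posture": ["upright", "slouched", "open", "closed"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_heads(head_logits):
    """Map {head: logits} to {head: (label, probability)}.

    Assumes one logit per class, in the order listed in HEAD_CLASSES.
    """
    decoded = {}
    for head, logits in head_logits.items():
        classes = HEAD_CLASSES[head]
        if len(logits) != len(classes):
            raise ValueError(
                f"{head}: expected {len(classes)} logits, got {len(logits)}"
            )
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        decoded[head] = (classes[best], probs[best])
    return decoded


if __name__ == "__main__":
    example = {"stress": [0.2, 1.5, -0.3], "posture": [2.0, 0.1, 0.0, -1.0]}
    for head, (label, prob) in decode_heads(example).items():
        print(f"{head}: {label} ({prob:.2f})")
```

Keeping the class lists in one dictionary means a head's output size and its label names cannot drift apart silently; a mismatch raises an error instead.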
pip install -r requirements.txt
python main.py --video path/to/video.mp4
Emotion Model (FER2013 - Recommended)
# Balanced training - 5000 samples per class with augmentation
python train.py emotion --data data/fer2013 --epochs 40 --balance --aggressive
Emotion Model (CK+ - Small but Clean)
# Balanced training - 400 samples per class
python train.py emotion --data data/ck --epochs 40 --balance --aggressive
Emotion Model (AffectNet)
python train.py emotion --data data/affectnet --epochs 40 --balance
Posture Model
python train.py posture --data data/posture --epochs 50
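The `--balance` option is described as equalizing the number of samples per class (5000 for FER2013, 400 for CK+). How `train.py` does this internally is not documented here. As a rough illustration of the general idea, the sketch below undersamples large classes and oversamples small ones with replacement until each class reaches a fixed target. `balance_indices` is a hypothetical helper, not a Sentry function.

```python
import random
from collections import defaultdict


def balance_indices(labels, target_per_class, seed=0):
    """Return sample indices with exactly target_per_class entries per label.

    Classes larger than the target are randomly undersampled; smaller
    classes are oversampled with replacement. Illustrative only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    balanced = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= target_per_class:
            chosen = rng.sample(pool, target_per_class)
        else:
            # Keep every original sample, then draw duplicates to fill the gap.
            extra = rng.choices(pool, k=target_per_class - len(pool))
            chosen = pool + extra
        balanced.extend(chosen)
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    labels = ["happy"] * 10 + ["fear"] * 3
    picked = balance_indices(labels, target_per_class=5)
    print(sorted(labels[i] for i in picked))
```

Oversampled entries are exact duplicates at this stage, which is presumably why the README pairs `--balance` with augmentation: random transforms make the repeated samples differ at training time.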
| Flag | Description |
|---|---|
| `--data` | Path to dataset (required) |
| `--epochs` | Number of training epochs (default: 20) |
| `--batch-size` | Batch size (default: 64) |
| `--balance` | Balance classes (CK=400, FER2013=5000 samples each) |
| `--aggressive` | Extra strong augmentation (use with `--balance`) |
| `--target-samples` | Custom samples per class when balancing |
| `--cpu` | Force CPU training |
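To make the documented defaults concrete, here is a hypothetical reconstruction of the option parsing using Python's standard `argparse` module. It mirrors only the flags and defaults listed above and simplifies the subcommands to a single positional `task` argument; the real `train.py` may define them differently, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented training flags (a sketch)."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Simplification: the real CLI has more subcommands than these two.
    parser.add_argument("task", choices=["emotion", "posture"],
                        help="which model to train")
    parser.add_argument("--data", required=True, help="path to dataset")
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--balance", action="store_true",
                        help="balance classes")
    parser.add_argument("--aggressive", action="store_true",
                        help="stronger augmentation; meant for use with --balance")
    parser.add_argument("--target-samples", type=int, default=None,
                        help="samples per class when balancing")
    parser.add_argument("--cpu", action="store_true", help="force CPU training")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["emotion", "--data", "data/fer2013", "--epochs", "40", "--balance"]
    )
    print(args.epochs, args.batch_size, args.balance)
```

Note that `argparse` exposes `--batch-size` and `--target-samples` as `args.batch_size` and `args.target_samples`.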
# Webcam demo
python main.py --demo
# Process video file
python main.py --video path/to/video.mp4
# Use trained emotion model
python main.py --demo --trained-model models/emotion_trained/best_model.pth
# Emotion training
python train.py emotion --data data/fer2013 --epochs 40 --balance --aggressive
# Posture training
python train.py posture --data data/posture --epochs 50
# Classifier training (for mental health heads)
python train.py classifier --features path/to/features --labels path/to/labels.json
# Evaluate a trained model
python train.py evaluate --model models/emotion_trained/best_model.pth --data data/fer2013
# Show download instructions
python train.py download --dataset fer2013
python train.py download --dataset affectnet
python train.py download --dataset posture
# Download posture datasets automatically
python scripts/download_video_posture_datasets.py --dataset all
sentry/
├── main.py                         # Application entry point
├── train.py                        # Training CLI
├── requirements.txt                # Dependencies
│
├── src/                            # Source code
│   ├── facial/                     # Face detection & emotion
│   │   ├── emotion.py              # EmotionClassifier (DenseNet121)
│   │   ├── detector.py             # BlazeFace face detector
│   │   ├── facemesh_analyzer.py    # FaceMesh 468 landmarks
│   │   └── postprocessor.py        # Emotion post-processing
│   ├── posture/                    # Pose estimation
│   │   ├── pose_estimator.py       # MediaPipe wrapper
│   │   ├── features.py             # Feature extraction
│   │   └── temporal_model.py       # TCN-LSTM model
│   ├── fusion/                     # Multimodal fusion
│   │   └── fusion_network.py       # Cross-attention fusion
│   ├── prediction/                 # Mental health prediction
│   │   └── classifier.py           # 6-head classifier
│   ├── visualization/              # Display & overlays
│   │   ├── monitor.py              # Real-time dashboard
│   │   └── facemesh_visualizer.py  # Meshgrid overlay
│   └── config.py                   # Configuration
│
├── training/                       # Training utilities
│   ├── datasets/                   # Dataset loaders
│   │   ├── fer2013.py              # FER2013 loader
│   │   ├── affectnet.py            # AffectNet loader
│   │   └── transforms.py           # Data augmentation
│   └── trainers/                   # Training loops
│       ├── emotion_trainer.py
│       └── posture_trainer.py
│
├── models/                         # Saved checkpoints
├── data/                           # Datasets
└── docs/                           # Documentation
| Dataset | Classes | Size | Download |
|---|---|---|---|
| FER2013 | 6 (disgust excluded) | ~28K train | `kaggle datasets download -d msambare/fer2013` |
| CK+ | 6 (small, clean) | ~1K train | `kaggle datasets download zhiguocui/ck-dataset` |
| AffectNet | 6 | ~290K train | `kaggle datasets download -d mstjebashazida/affectnet` |
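FER2013 is normally distributed with seven emotion classes, so "disgust excluded" implies a filtering and relabeling step before training. The sketch below shows one way that step could look, assuming the dataset is unpacked into one folder per emotion and that class indices follow the order of the emotion list in this README. Both are assumptions, and `folder_to_index` and `filter_samples` are hypothetical helpers rather than Sentry's loader API.

```python
# Target class order follows the README's emotion list; whether Sentry uses
# these exact indices internally is an assumption.
EMOTIONS = ["neutral", "happy", "sad", "surprise", "fear", "anger"]

# Folder names are assumed here; "angry" is mapped onto "anger".
FOLDER_TO_EMOTION = {
    "neutral": "neutral",
    "happy": "happy",
    "sad": "sad",
    "surprise": "surprise",
    "fear": "fear",
    "angry": "anger",
}


def folder_to_index(folder_name):
    """Return the 6-class index for a folder name, or None if excluded."""
    emotion = FOLDER_TO_EMOTION.get(folder_name.lower())
    if emotion is None:
        return None  # e.g. "disgust" is dropped
    return EMOTIONS.index(emotion)


def filter_samples(samples):
    """Keep (path, index) pairs for samples whose folder maps to a class."""
    kept = []
    for path, folder in samples:
        index = folder_to_index(folder)
        if index is not None:
            kept.append((path, index))
    return kept


if __name__ == "__main__":
    demo = [("a.png", "happy"), ("b.png", "disgust"), ("c.png", "angry")]
    print(filter_samples(demo))
```

Dropping unmapped folders rather than raising an error keeps the same helper usable across datasets that carry extra classes.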
# Download all posture datasets
python scripts/download_video_posture_datasets.py --dataset all
GPU: Use CUDA for 20-30 FPS real-time processing
Batch Size: Reduce to 32 if out of memory
Workers: Set --workers 4 for faster data loading
FP16: Automatically enabled on GPU
MIT License - See LICENSE file for details.