Master's-Level HCI + ML Project | Indian/American Sign Language → Spoken Text with Affective Computing
SignSense is a real-time, web-based assistive communication platform that:
- Detects hand, face, and pose landmarks via MediaPipe Holistic (543 landmarks: 33 pose, 468 face, 21 per hand)
- Translates ISL/ASL signs into text using a Transformer / LSTM sequence model
- Detects emotion using a CNN on facial micro-expressions (Happy / Serious / Urgent)
- Speaks translated text with emotion-matched voice via Web Speech API
- Predicts next words using an ML-powered suggestion engine (reducing signer fatigue)
This project targets the Deaf/Hard-of-Hearing community and focuses on low cognitive load, low (real-time) latency, and WCAG 2.1 accessibility compliance.
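Emotion-matched speech ultimately comes down to choosing `rate` and `pitch` for a Web Speech API `SpeechSynthesisUtterance`, which accepts rates from 0.1 to 10 and pitches from 0 to 2 (1 is the default for both). A minimal sketch of that mapping, written in Python to sit alongside `backend/utils/tts_modifier.py`, is below. The preset values, the 0.6 confidence threshold, and the names `TTSParams` and `tts_params` are illustrative assumptions, not the project's actual settings.

```python
from dataclasses import dataclass

# Ranges the Web Speech API allows for SpeechSynthesisUtterance.rate / .pitch.
RATE_MIN, RATE_MAX = 0.1, 10.0
PITCH_MIN, PITCH_MAX = 0.0, 2.0


@dataclass(frozen=True)
class TTSParams:
    rate: float
    pitch: float


# Hypothetical presets (assumptions, not tuned values); 1.0 is the API default.
EMOTION_PRESETS = {
    "happy": TTSParams(rate=1.1, pitch=1.3),
    "serious": TTSParams(rate=0.9, pitch=0.9),
    "urgent": TTSParams(rate=1.4, pitch=1.1),
}
NEUTRAL = TTSParams(rate=1.0, pitch=1.0)


def _clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def tts_params(emotion: str, confidence: float, threshold: float = 0.6) -> TTSParams:
    """Map an emotion label and classifier confidence to TTS rate/pitch.

    Unknown labels or confidences below `threshold` fall back to the neutral
    voice, so an uncertain emotion prediction never colors the speech.
    Otherwise the preset is interpolated from neutral by `confidence`.
    """
    preset = EMOTION_PRESETS.get(emotion.lower())
    if preset is None or confidence < threshold:
        return NEUTRAL
    w = _clamp(confidence, 0.0, 1.0)
    rate = NEUTRAL.rate + w * (preset.rate - NEUTRAL.rate)
    pitch = NEUTRAL.pitch + w * (preset.pitch - NEUTRAL.pitch)
    return TTSParams(_clamp(rate, RATE_MIN, RATE_MAX), _clamp(pitch, PITCH_MIN, PITCH_MAX))
```

In the browser, the two values would be copied onto `utterance.rate` and `utterance.pitch` before calling `speechSynthesis.speak(utterance)`. Falling back to neutral below a threshold keeps a misclassified emotion from changing the meaning a listener infers.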
SignSense/
│
├── public/                           # Static assets
│   └── index.html
│
├── src/                              # React frontend
│   ├── components/
│   │   ├── CameraFeed.jsx            # Webcam capture + MediaPipe overlay
│   │   ├── TranslationPanel.jsx      # Real-time sign → text output
│   │   ├── EmotionBadge.jsx          # Live emotion indicator
│   │   ├── PredictiveBar.jsx         # Word suggestion strip
│   │   ├── SpeechControls.jsx        # TTS controls (rate, pitch, voice)
│   │   ├── LatencyMonitor.jsx        # System response time display (HCI eval)
│   │   ├── AccessibilityAudit.jsx    # WCAG heuristic checker panel
│   │   └── Navbar.jsx
│   ├── hooks/
│   │   ├── useMediaPipe.js           # MediaPipe Holistic integration
│   │   ├── useWebSocket.js           # Real-time backend communication
│   │   └── useSpeech.js              # Web Speech API hook
│   ├── utils/
│   │   ├── landmarkUtils.js          # Normalize/flatten landmark arrays
│   │   ├── emotionMap.js             # Emotion → TTS pitch/rate mapping
│   │   └── wcagChecker.js            # WCAG 2.1 heuristic evaluator
│   ├── App.jsx
│   ├── main.jsx
│   └── index.css
│
├── backend/                          # Python FastAPI backend
│   ├── main.py                       # FastAPI app + WebSocket endpoint
│   ├── routes/
│   │   ├── predict.py                # Sign prediction route
│   │   └── emotion.py                # Emotion analysis route
│   ├── models/
│   │   ├── lstm_model.py             # LSTM sequence classifier
│   │   ├── transformer_model.py      # Transformer-based sign recognizer
│   │   └── emotion_cnn.py            # CNN facial emotion classifier
│   └── utils/
│       ├── landmark_processor.py     # Preprocess MediaPipe landmarks
│       └── tts_modifier.py           # Emotion-aware TTS parameter output
│
├── notebooks/                        # Jupyter notebooks for model training
│   ├── 01_data_collection.ipynb
│   ├── 02_landmark_extraction.ipynb
│   ├── 03_lstm_training.ipynb
│   ├── 04_emotion_cnn_training.ipynb
│   └── 05_evaluation_metrics.ipynb
│
├── docs/                             # HCI evaluation documentation
│   ├── HCI_Evaluation_Report.md
│   ├── Usability_Test_Protocol.md
│   ├── SUS_Score_Template.xlsx
│   └── Latency_Study.md
│
├── .github/
│   └── workflows/
│       └── ci.yml                    # GitHub Actions CI
│
├── requirements.txt                  # Python dependencies
├── package.json                      # Node dependencies
├── .env.example                      # Environment variable template
├── .gitignore
└── LICENSE
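`landmark_processor.py` (and its browser counterpart `landmarkUtils.js`) is described as normalizing and flattening landmark arrays. One common way to do that is sketched below, assuming each Holistic part arrives as a list of `(x, y, z)` tuples, or `None` when the part was not detected. The shoulder-based normalization and the `flatten_holistic` name are assumptions for illustration rather than the repository's code.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

# Landmarks per part in MediaPipe Holistic output (33 + 468 + 21 + 21 = 543).
N_POSE, N_FACE, N_HAND = 33, 468, 21
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12  # MediaPipe pose landmark indices


def _block(points: Optional[Sequence[Point]], n: int) -> List[Point]:
    """Return exactly n points, zero-filling a part that was not detected."""
    if not points:
        return [(0.0, 0.0, 0.0)] * n
    if len(points) != n:
        raise ValueError(f"expected {n} landmarks, got {len(points)}")
    return list(points)


def flatten_holistic(pose, face, left_hand, right_hand) -> List[float]:
    """Normalize one Holistic frame and flatten it to 543 * 3 = 1629 floats.

    Points are shifted so the shoulder midpoint is the origin and divided by
    shoulder width, reducing sensitivity to where the signer stands and how
    far they are from the camera. Undetected parts stay all-zero.
    """
    raw = (pose, face, left_hand, right_hand)
    sizes = (N_POSE, N_FACE, N_HAND, N_HAND)
    pose_pts = _block(pose, N_POSE)
    ls, rs = pose_pts[LEFT_SHOULDER], pose_pts[RIGHT_SHOULDER]
    cx, cy, cz = ((ls[i] + rs[i]) / 2 for i in range(3))
    scale = hypot(ls[0] - rs[0], ls[1] - rs[1]) or 1.0  # avoid division by zero

    features: List[float] = []
    for points, n in zip(raw, sizes):
        if not points:
            features.extend([0.0] * (3 * n))  # missing part: keep a fixed-size zero block
            continue
        for x, y, z in _block(points, n):
            features.extend(((x - cx) / scale, (y - cy) / scale, (z - cz) / scale))
    return features
```

A fixed-length vector per frame is what lets consecutive frames be stacked into the `(frames, 1629)` sequences a sequence model expects.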
Webcam Input
     │
     ▼
MediaPipe Holistic (Browser)
     │  500+ landmarks (x,y,z)
     ▼
WebSocket ──────────────────► FastAPI Backend
                                     │
                          ┌──────────┴──────────┐
                          ▼                     ▼
                 LSTM / Transformer        Emotion CNN
                    (Sign → Word)       (Face → Emotion)
                          │                     │
                          └──────────┬──────────┘
                                     ▼
                       Translated Text + Emotion Tag
                                     │
                                     ▼
                    Web Speech API (Emotion-aware TTS)
                                     │
                                     ▼
                      User hears spoken, emotionally
                            nuanced translation
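The diagram streams landmarks frame by frame, while an LSTM or Transformer classifies a whole sequence. A minimal server-side sketch of bridging the two with a sliding window and simple debouncing follows. `SignStream`, the 30-frame window, and the thresholds are hypothetical, and `classify` stands in for whichever sequence model is loaded.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple

# Stand-in for the sequence model: takes `window` feature vectors, returns (label, confidence).
Classifier = Callable[[List[Sequence[float]]], Tuple[str, float]]


class SignStream:
    """Buffers per-frame feature vectors and emits debounced word predictions."""

    def __init__(self, classify: Classifier, window: int = 30, stride: int = 5,
                 threshold: float = 0.8, consecutive: int = 2):
        self.classify = classify
        self.frames: Deque[Sequence[float]] = deque(maxlen=window)
        self.stride = stride              # classify every `stride` frames
        self.threshold = threshold        # minimum confidence to count a prediction
        self.consecutive = consecutive    # agreeing windows needed before emitting
        self._since_last = 0
        self._candidate: Optional[str] = None
        self._streak = 0
        self._last_emitted: Optional[str] = None

    def push(self, features: Sequence[float]) -> Optional[str]:
        """Add one frame; return a newly recognized word, or None."""
        self.frames.append(features)
        self._since_last += 1
        if len(self.frames) < self.frames.maxlen or self._since_last < self.stride:
            return None
        self._since_last = 0
        label, confidence = self.classify(list(self.frames))
        if confidence < self.threshold:
            # A low-confidence window (e.g. a pause) resets the streak and
            # allows the same word to be emitted again afterwards.
            self._candidate, self._streak, self._last_emitted = None, 0, None
            return None
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        if self._streak >= self.consecutive and label != self._last_emitted:
            self._last_emitted = label
            return label
        return None
```

Debouncing trades latency for stability: at 30 fps, `stride=5` with `consecutive=2` waits at least 5 extra frames (about 167 ms) before a word can be emitted, a large share of the < 200 ms target, so these parameters are worth measuring with `LatencyMonitor`.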
- Set up MediaPipe Holistic in React
- Extract and save landmark CSV files for 50 signs
- Augment dataset (flipping, noise injection)
- Train LSTM on landmark sequences (ISL/ASL vocabulary)
- Train CNN on facial expression dataset (FER-2013 or AffectNet)
- Evaluate: Precision, Recall, F1 per sign class
- Build FastAPI WebSocket for real-time inference
- Integrate both models with a unified prediction endpoint
- Optimize for < 200ms latency
- Build React UI with camera feed + translation panel
- Add emotion badge, predictive word bar, TTS controls
- Implement WCAG 2.1 heuristic audit panel
- Conduct usability study (n ≥ 5 participants)
- Calculate SUS (System Usability Scale) score
- Document latency measurements and optimizations
- Heuristic evaluation report
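The augmentation step (flipping and noise injection) can be applied directly to landmark coordinates rather than to video. The sketch below assumes image-normalized coordinates with x in [0, 1] and frames stored as dicts keyed `pose`, `face`, `left_hand`, `right_hand`; the function names and the default noise level are assumptions. Mirroring swaps the two hand blocks but does not re-index left/right pose or face points, and whether a mirrored sign is still a valid sample (it effectively simulates a left-handed signer) is a dataset-design decision.

```python
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Frame = Dict[str, Optional[List[Point]]]  # keys: pose, face, left_hand, right_hand


def mirror_frame(frame: Frame) -> Frame:
    """Horizontally flip one frame of image-normalized landmarks.

    x becomes 1 - x, and the hand blocks are swapped because a mirrored left
    hand appears as a right hand. Pose/face left-right index pairs (e.g. pose
    11/12, the shoulders) are NOT re-indexed in this sketch.
    """
    def flip(points):
        return None if points is None else [(1.0 - x, y, z) for x, y, z in points]

    return {
        "pose": flip(frame.get("pose")),
        "face": flip(frame.get("face")),
        "left_hand": flip(frame.get("right_hand")),
        "right_hand": flip(frame.get("left_hand")),
    }


def jitter_sequence(frames: List[Frame], sigma: float = 0.005,
                    rng: Optional[random.Random] = None) -> List[Frame]:
    """Add i.i.d. Gaussian noise (std `sigma`) to every detected coordinate."""
    rng = rng or random.Random()

    def noisy(points):
        if points is None:
            return None  # undetected parts stay undetected
        return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

    return [{k: noisy(v) for k, v in f.items()} for f in frames]
```

Passing a seeded `random.Random` keeps augmented training sets reproducible across notebook runs.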
- Node.js 18+
- Python 3.10+
- Webcam
- (Optional) NVIDIA GPU for faster model training
git clone https://github.com/YOUR_USERNAME/SignSense.git
cd SignSense
npm install
npm run dev
# App runs at http://localhost:5173
# In a second terminal, from the repository root:
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r ../requirements.txt
uvicorn main:app --reload --port 8000
# From the repository root:
cp .env.example .env
# Edit .env with your config
| Metric | Tool | Target |
|---|---|---|
| Task Success Rate | Usability Test | ≥ 85% |
| System Usability Scale (SUS) | SUS Questionnaire | ≥ 75 / 100 |
| System Response Time | LatencyMonitor component | < 200 ms |
| WCAG 2.1 Compliance | AccessibilityAudit component | AA Level |
| Word Prediction Accuracy | ML evaluation | ≥ 70% top-3 |
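The SUS target is scored with the standard System Usability Scale rule: ten items rated 1 to 5, where odd-numbered (positively worded) items contribute `score - 1`, even-numbered (negatively worded) items contribute `5 - score`, and the 0 to 40 sum is multiplied by 2.5 to give a 0 to 100 score. A small sketch of that rule, with hypothetical function names, usable as a cross-check for `SUS_Score_Template.xlsx`:

```python
from statistics import mean
from typing import Iterable, Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire (10 items, each rated 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers 1-5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def study_sus(all_responses: Iterable[Sequence[int]]) -> float:
    """Mean SUS score across participants, to compare against the target."""
    return mean(sus_score(r) for r in all_responses)
```

With only n ≥ 5 participants the mean is a coarse estimate, so reporting individual scores alongside it makes the comparison with the ≥ 75 target easier to interpret.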
- WLASL (Word-Level American Sign Language): link
- INCLUDE (Indian Sign Language): link
- FER-2013 (Facial Emotion Recognition): Kaggle
- AffectNet (high-resolution facial expressions)
- MediaPipe Holistic: Google AI
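FER-2013 labels each face with one of seven classes (0 angry, 1 disgust, 2 fear, 3 happy, 4 sad, 5 surprise, 6 neutral), while SignSense reports three tones (Happy / Serious / Urgent). One way to bridge them is to sum the CNN's softmax probabilities within each tone, as sketched below; the grouping in `TONE_OF` and the name `to_tone` are assumptions for illustration and would need validation with participants.

```python
from typing import Dict, Sequence, Tuple

# FER-2013 class names, in the dataset's label order (0-6).
FER2013_LABELS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

# Hypothetical grouping into SignSense's three tones (an assumption, not an
# established mapping).
TONE_OF = {
    "happy": "Happy",
    "angry": "Urgent", "fear": "Urgent", "surprise": "Urgent",
    "disgust": "Serious", "sad": "Serious", "neutral": "Serious",
}


def to_tone(probs: Sequence[float]) -> Tuple[str, float]:
    """Collapse a 7-way FER-2013 softmax into (tone, summed probability)."""
    if len(probs) != len(FER2013_LABELS):
        raise ValueError("expected one probability per FER-2013 class")
    totals: Dict[str, float] = {"Happy": 0.0, "Serious": 0.0, "Urgent": 0.0}
    for label, p in zip(FER2013_LABELS, probs):
        totals[TONE_OF[label]] += p
    tone = max(totals, key=totals.get)
    return tone, totals[tone]
```

Summing before taking the maximum can change the answer: if happy and neutral each score 0.3 but anger, fear, and surprise together score 0.4, the tone is Urgent even though no single urgent class wins.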
Shubham Sharma
MSc Human-Computer Interaction / AI
University of Siegen
MIT License: see LICENSE