
🤟 SignSense: Gesture-Driven Assistive Interface

Master's-Level HCI + ML Project | Indian/American Sign Language → Spoken Text with Affective Computing

License: MIT React Python MediaPipe


📌 Project Overview

SignSense is a real-time, web-based assistive communication platform that:

  1. Detects hand, face, and pose landmarks via MediaPipe Holistic (543 landmarks per frame: 33 pose, 468 face, 21 per hand)
  2. Translates ISL/ASL signs into text using a Transformer / LSTM sequence model
  3. Detects emotion using a CNN on facial micro-expressions (Happy / Serious / Urgent)
  4. Speaks translated text with emotion-matched voice via Web Speech API
  5. Predicts next words using an ML-powered suggestion engine (reducing signer fatigue)
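As a rough illustration of step 1, the sketch below flattens one Holistic frame into a fixed-length feature vector that a sequence model could consume. It assumes the usual Holistic landmark counts (33 pose, 468 face, 21 per hand). The function name `flatten_frame` and the zero-fill policy for undetected body parts are illustrative assumptions, not the project's actual code.

```python
from typing import List, Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z)

# Landmark counts per MediaPipe Holistic component.
POSE, FACE, HAND = 33, 468, 21
TOTAL_LANDMARKS = POSE + FACE + 2 * HAND  # 543


def flatten_frame(
    pose: Optional[Sequence[Landmark]],
    face: Optional[Sequence[Landmark]],
    left_hand: Optional[Sequence[Landmark]],
    right_hand: Optional[Sequence[Landmark]],
) -> List[float]:
    """Concatenate the four landmark groups into one flat [x, y, z, ...] list.

    A group that was not detected (None) is zero-filled, so every frame
    yields the same vector length (assumed policy, hypothetical helper).
    """
    groups = [(pose, POSE), (face, FACE), (left_hand, HAND), (right_hand, HAND)]
    flat: List[float] = []
    for points, expected in groups:
        if points is None:
            flat.extend([0.0] * (expected * 3))
            continue
        if len(points) != expected:
            raise ValueError(f"expected {expected} landmarks, got {len(points)}")
        for x, y, z in points:
            flat.extend((x, y, z))
    return flat
```

With these counts, every frame becomes a vector of 543 × 3 = 1629 floats, whether or not both hands are visible.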

This project targets the Deaf/Hard-of-Hearing community and focuses on low cognitive load, real-time response (target latency under 200 ms), and WCAG 2.1 AA accessibility compliance.


πŸ—‚οΈ Project Structure

SignSense/
│
├── public/                    # Static assets
│   └── index.html
│
├── src/                       # React frontend
│   ├── components/
│   │   ├── CameraFeed.jsx         # Webcam capture + MediaPipe overlay
│   │   ├── TranslationPanel.jsx   # Real-time sign → text output
│   │   ├── EmotionBadge.jsx       # Live emotion indicator
│   │   ├── PredictiveBar.jsx      # Word suggestion strip
│   │   ├── SpeechControls.jsx     # TTS controls (rate, pitch, voice)
│   │   ├── LatencyMonitor.jsx     # System response time display (HCI eval)
│   │   ├── AccessibilityAudit.jsx # WCAG heuristic checker panel
│   │   └── Navbar.jsx
│   ├── hooks/
│   │   ├── useMediaPipe.js        # MediaPipe Holistic integration
│   │   ├── useWebSocket.js        # Real-time backend communication
│   │   └── useSpeech.js           # Web Speech API hook
│   ├── utils/
│   │   ├── landmarkUtils.js       # Normalize/flatten landmark arrays
│   │   ├── emotionMap.js          # Emotion → TTS pitch/rate mapping
│   │   └── wcagChecker.js         # WCAG 2.1 heuristic evaluator
│   ├── App.jsx
│   ├── main.jsx
│   └── index.css
│
├── backend/                   # Python FastAPI backend
│   ├── main.py                    # FastAPI app + WebSocket endpoint
│   ├── routes/
│   │   ├── predict.py             # Sign prediction route
│   │   └── emotion.py             # Emotion analysis route
│   ├── models/
│   │   ├── lstm_model.py          # LSTM sequence classifier
│   │   ├── transformer_model.py   # Transformer-based sign recognizer
│   │   └── emotion_cnn.py         # CNN facial emotion classifier
│   └── utils/
│       ├── landmark_processor.py  # Preprocess MediaPipe landmarks
│       └── tts_modifier.py        # Emotion-aware TTS parameter output
│
├── notebooks/                 # Jupyter notebooks for model training
│   ├── 01_data_collection.ipynb
│   ├── 02_landmark_extraction.ipynb
│   ├── 03_lstm_training.ipynb
│   ├── 04_emotion_cnn_training.ipynb
│   └── 05_evaluation_metrics.ipynb
│
├── docs/                      # HCI evaluation documentation
│   ├── HCI_Evaluation_Report.md
│   ├── Usability_Test_Protocol.md
│   ├── SUS_Score_Template.xlsx
│   └── Latency_Study.md
│
├── .github/
│   └── workflows/
│       └── ci.yml             # GitHub Actions CI
│
├── requirements.txt           # Python dependencies
├── package.json               # Node dependencies
├── .env.example               # Environment variable template
├── .gitignore
└── LICENSE
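To make the role of `tts_modifier.py` and `emotionMap.js` more concrete, here is a minimal sketch of an emotion → TTS parameter lookup. The rate and pitch values are placeholders chosen for illustration, and `TtsParams` and `tts_params_for` are hypothetical names. The clamping ranges follow the Web Speech API's `SpeechSynthesisUtterance` (rate 0.1 to 10, pitch 0 to 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsParams:
    rate: float   # speaking speed, 1.0 is the default
    pitch: float  # voice pitch, 1.0 is the default


# Placeholder values (assumptions); tune them in a listening test.
EMOTION_TTS = {
    "happy": TtsParams(rate=1.1, pitch=1.3),
    "serious": TtsParams(rate=0.9, pitch=0.8),
    "urgent": TtsParams(rate=1.4, pitch=1.1),
}
NEUTRAL = TtsParams(rate=1.0, pitch=1.0)


def _clamp(value: float, low: float, high: float) -> float:
    return min(max(value, low), high)


def tts_params_for(emotion: str) -> TtsParams:
    """Return TTS parameters for an emotion tag, falling back to neutral."""
    params = EMOTION_TTS.get(emotion.strip().lower(), NEUTRAL)
    # Keep values inside the ranges SpeechSynthesisUtterance accepts.
    return TtsParams(
        rate=_clamp(params.rate, 0.1, 10.0),
        pitch=_clamp(params.pitch, 0.0, 2.0),
    )
```

The backend could send the resulting `rate` and `pitch` alongside the translated text, and the frontend would copy them onto the utterance before speaking.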

🧠 Technical Architecture

Webcam Input
    │
    ▼
MediaPipe Holistic (Browser)
    │  500+ landmarks (x,y,z)
    ▼
WebSocket ──────────────────► FastAPI Backend
                                    │
                          ┌─────────┴──────────┐
                          ▼                     ▼
                  LSTM / Transformer        Emotion CNN
                  (Sign → Word)         (Face → Emotion)
                          │                     │
                          └─────────┬───────────┘
                                    ▼
                          Translated Text + Emotion Tag
                                    │
                                    ▼
                          Web Speech API (Emotion-aware TTS)
                                    │
                                    ▼
                          User hears spoken, emotionally
                          nuanced translation ✅
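A sequence model classifies a run of frames rather than a single frame, so the backend needs some form of buffering between the WebSocket and the LSTM / Transformer. The sketch below shows one possible approach, a sliding window with a stride. The class name `FrameWindow` and the default window size and stride are assumptions for illustration only.

```python
from collections import deque
from typing import Deque, List, Optional

Frame = List[float]  # one flattened landmark vector


class FrameWindow:
    """Collect incoming frames and emit fixed-length windows for inference."""

    def __init__(self, window_size: int = 30, stride: int = 10) -> None:
        if window_size < 1 or stride < 1:
            raise ValueError("window_size and stride must be positive")
        self.window_size = window_size
        self.stride = stride
        # deque with maxlen drops the oldest frame automatically.
        self._frames: Deque[Frame] = deque(maxlen=window_size)
        self._since_emit = 0

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Add a frame; return a full window every `stride` frames, else None."""
        self._frames.append(frame)
        self._since_emit += 1
        if len(self._frames) == self.window_size and self._since_emit >= self.stride:
            self._since_emit = 0
            return list(self._frames)
        return None
```

A smaller stride gives more frequent predictions at the cost of more inference calls, which matters for the latency target.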

🚀 Roadmap

Phase 1: Data & Landmarks ✅

  • Set up MediaPipe Holistic in React
  • Extract and save landmark CSV files for 50 signs
  • Augment dataset (flipping, noise injection)
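The two augmentations named above can be sketched on raw landmark lists as follows. This assumes MediaPipe's normalized coordinates (x and y in the range 0 to 1). The function names and the default noise level are illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z), x and y normalized to [0, 1]


def flip_horizontal(points: List[Landmark]) -> List[Landmark]:
    """Mirror landmarks around the vertical center line of the image."""
    return [(1.0 - x, y, z) for x, y, z in points]


def flip_hands(
    left: List[Landmark], right: List[Landmark]
) -> Tuple[List[Landmark], List[Landmark]]:
    """Mirror both hands and swap them.

    After mirroring, the signer's right hand appears as a left hand,
    so the two groups trade places. Returns (new_left, new_right).
    """
    return flip_horizontal(right), flip_horizontal(left)


def add_noise(
    points: List[Landmark],
    sigma: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[Landmark]:
    """Add independent Gaussian jitter to every coordinate."""
    rng = rng or random.Random()
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
        for x, y, z in points
    ]
```

Passing a seeded `random.Random` keeps the augmented dataset reproducible. Note that mirroring may change the meaning of signs that depend on handedness or direction, so flipped samples are worth reviewing per sign.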

Phase 2: ML Model Training

  • Train LSTM on landmark sequences (ISL/ASL vocabulary)
  • Train CNN on facial expression dataset (FER-2013 or AffectNet)
  • Evaluate: Precision, Recall, F1 per sign class
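For the per-class evaluation, a dependency-free sketch of precision, recall, and F1 is shown below. The helper name `per_class_metrics` is hypothetical, and a library routine such as scikit-learn's `classification_report` would serve equally well in the notebooks.

```python
from collections import Counter
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for every sign label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label

    metrics: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Reporting the scores per sign rather than as a single average makes it easier to spot signs that are systematically confused with each other.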

Phase 3: Backend API

  • Build FastAPI WebSocket for real-time inference
  • Integrate both models with a unified prediction endpoint
  • Optimize for < 200 ms latency
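One way to check the latency target is to time each inference call and compare a high percentile against the budget. The sketch below assumes that the 95th percentile (nearest-rank method) is the figure of interest; the source only states the 200 ms target, so the choice of percentile and the helper names are assumptions.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

LATENCY_BUDGET_MS = 200.0  # target from the roadmap


def timed_ms(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def summarize_latency(
    samples_ms: List[float], budget_ms: float = LATENCY_BUDGET_MS
) -> Dict[str, Any]:
    """Summarize latency samples and compare the p95 with the budget."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    p95 = ordered[p95_index]
    return {
        "mean_ms": mean(ordered),
        "p95_ms": p95,
        "max_ms": ordered[-1],
        "within_budget": p95 < budget_ms,
    }
```

Measuring only backend inference understates what the user experiences; network transfer and browser-side landmark extraction add to the end-to-end time.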

Phase 4: Frontend Dashboard

  • Build React UI with camera feed + translation panel
  • Add emotion badge, predictive word bar, TTS controls
  • Implement WCAG 2.1 heuristic audit panel
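A heuristic audit panel could include automated checks such as text contrast. The sketch below implements the WCAG 2.1 contrast-ratio formula and the AA thresholds (4.5:1 for normal text, 3:1 for large text). It is written in Python for consistency with the other examples here, whereas the project's `wcagChecker.js` would do the equivalent in JavaScript; the function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Return the contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: RGB, background: RGB, large_text: bool = False) -> bool:
    """Check the WCAG 2.1 AA contrast threshold for text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Contrast is only one success criterion; keyboard access, focus order, and captions still need manual review.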

Phase 5: HCI Evaluation

  • Conduct usability study (n ≥ 5 participants)
  • Calculate SUS (System Usability Scale) score
  • Document latency measurements and optimizations
  • Heuristic evaluation report
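The SUS score follows a fixed scoring rule: each of the 10 items is answered on a 1 to 5 scale, odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. A small sketch, with hypothetical function names:

```python
from statistics import mean
from typing import List, Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire on a 0 to 100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be between 1 and 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def mean_sus(all_responses: List[Sequence[int]]) -> float:
    """Average the SUS score across participants."""
    return mean(sus_score(r) for r in all_responses)
```

The study-level result is the mean across participants, which can then be compared with the target in the evaluation metrics.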

βš™οΈ Setup & Installation

Prerequisites

  • Node.js 18+
  • Python 3.10+
  • Webcam
  • (Optional) NVIDIA GPU for faster model training

1. Clone the repository

git clone https://github.com/sharmashubham99/SignSense.git
cd SignSense

2. Frontend Setup

npm install
npm run dev
# App runs at http://localhost:5173

3. Backend Setup

cd backend
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r ../requirements.txt   # requirements.txt sits in the repository root
uvicorn main:app --reload --port 8000

4. Environment Variables

# Run from the repository root, where .env.example is located
cp .env.example .env
# Edit .env with your config

📊 HCI Evaluation Metrics

| Metric | Tool | Target |
|---|---|---|
| Task Success Rate | Usability Test | ≥ 85% |
| System Usability Scale (SUS) | SUS Questionnaire | ≥ 75 / 100 |
| System Response Time | LatencyMonitor component | < 200 ms |
| WCAG 2.1 Compliance | AccessibilityAudit component | AA Level |
| Word Prediction Accuracy | ML evaluation | ≥ 70% top-3 |
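The "top-3" target for word prediction means a suggestion counts as correct when the word the signer actually wanted appears among the first three entries of the suggestion strip. A minimal sketch of that metric, with a hypothetical function name and made-up sample words:

```python
from typing import List, Sequence


def top_k_accuracy(
    suggestions: Sequence[Sequence[str]], targets: Sequence[str], k: int = 3
) -> float:
    """Fraction of cases where the target word is among the first k suggestions."""
    if len(suggestions) != len(targets):
        raise ValueError("suggestions and targets must have the same length")
    if not targets:
        raise ValueError("no evaluation samples")
    hits = sum(
        1 for ranked, target in zip(suggestions, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Illustrative data only: each inner list is a ranked suggestion list.
ranked_lists = [
    ["you", "me", "we"],
    ["go", "eat", "sleep", "drink"],
    ["yes", "no", "ok"],
]
wanted = ["me", "drink", "maybe"]
accuracy = top_k_accuracy(ranked_lists, wanted, k=3)  # 1 hit out of 3
```

In this toy example only the first case is a top-3 hit, because "drink" is ranked fourth and "maybe" is never suggested.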

📚 References & Datasets

  • WLASL (Word-Level American Sign Language): link
  • INCLUDE (Indian Sign Language): link
  • FER-2013 (Facial Emotion Recognition): Kaggle
  • AffectNet (High-res facial expressions)
  • MediaPipe Holistic: Google AI

👀 Author

Shubham Sharma
MSc Human-Computer Interaction / AI
Siegen University


📄 License

MIT License; see LICENSE
