Using MediaPipe Hand Landmarks + TensorFlow Neural Network
This project recognizes A–Z American Sign Language (ASL) gestures in real time using:
- MediaPipe Hand Tracking (21 hand landmarks)
- Custom ASL Dataset (User-Captured)
- MLP Deep Learning Model (42-Dimensional Landmark Features)
- OpenCV for Live Video Feed
This system is designed for communication support for deaf and speech-impaired users, gesture-controlled interfaces, and educational use.
| Feature | Description |
|---|---|
| Real-time Hand Tracking | Detects the hand and draws its landmark points and connections in red |
| Custom Dataset Support | User can add their own ASL samples |
| High Accuracy | Achieves 95%–99% accuracy in well-lit conditions |
| Fast Training | Model trains in about 5–18 minutes, depending on dataset size (no GPU required) |
| Lightweight Model | Uses only 42 numerical features per frame |
| Works on Normal Laptops | No NVIDIA GPU required |
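The 42 features per frame correspond to 21 MediaPipe landmarks × 2 coordinates. The sketch below shows one way to turn those landmarks into a feature vector. It assumes the project keeps only the normalized x and y values (MediaPipe also reports z, and 21 × 2 = 42 suggests z is dropped). The wrist-relative shift and scaling are a common normalization choice, not something this README confirms, and the function name `landmarks_to_features` is hypothetical.

```python
from typing import Sequence, Tuple

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def landmarks_to_features(points: Sequence[Tuple[float, float]]) -> list[float]:
    """Flatten 21 (x, y) landmarks into a 42-element feature vector.

    Assumption (not confirmed by the README): coordinates are shifted so the
    wrist (landmark 0) is the origin, then divided by the largest absolute
    offset, so the features do not depend on where the hand sits in the frame
    or how large it appears.
    """
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
    wrist_x, wrist_y = points[0]
    rel = [(x - wrist_x, y - wrist_y) for x, y in points]
    # Guard against division by zero if every landmark coincides with the wrist.
    scale = max(max(abs(dx), abs(dy)) for dx, dy in rel) or 1.0
    features: list[float] = []
    for dx, dy in rel:
        features.extend((dx / scale, dy / scale))
    return features
```

With MediaPipe's `mp.solutions.hands` API, the input could be built as `[(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]`. Normalizing relative to the wrist is what keeps the feature count small while staying robust to hand position.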
- Python 3.10
- MediaPipe
- OpenCV
- TensorFlow / Keras
- NumPy
- scikit-learn
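- Clone the repository: `git clone https://github.com/YourUsername/ASL-Sign-Recognition.git`, then `cd ASL-Sign-Recognition`
- Create and activate a virtual environment (Windows): `py -3.10 -m venv handenv`, then `handenv\Scripts\activate`
- Install dependencies: `pip install -r requirements.txt`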
Run the dataset capture script: `python capture_dataset.py`
- Enter the letter (A–Z).
- Show the hand sign in front of the webcam.
- Red dots and lines show the tracked landmarks.
- About 200 samples per letter are recommended.
- Press `Q` to stop and move to the next letter.
Your dataset is stored with one folder per letter: `dataset/A/`, `dataset/B/`, `dataset/C/`, …
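The README states only that each letter gets its own folder, not how samples are written inside it. As a minimal sketch, assume each captured frame's 42 features are appended as one row of a CSV file. The file name `samples.csv` and the helpers `save_sample` and `count_samples` are hypothetical, not taken from `capture_dataset.py`.

```python
import csv
from pathlib import Path
from typing import Sequence


def save_sample(root: Path, letter: str, features: Sequence[float]) -> Path:
    """Append one 42-value sample to <root>/<LETTER>/samples.csv (hypothetical layout)."""
    letter = letter.strip().upper()
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"expected a single letter A-Z, got {letter!r}")
    if len(features) != 42:
        raise ValueError(f"expected 42 features, got {len(features)}")
    folder = root / letter
    folder.mkdir(parents=True, exist_ok=True)  # one folder per letter
    path = folder / "samples.csv"
    with path.open("a", newline="") as f:  # append, so capture sessions accumulate
        csv.writer(f).writerow(features)
    return path


def count_samples(root: Path, letter: str) -> int:
    """Number of rows saved so far for a letter (useful for the ~200 target)."""
    path = root / letter.strip().upper() / "samples.csv"
    if not path.exists():
        return 0
    with path.open(newline="") as f:
        return sum(1 for _ in csv.reader(f))
```

Storing landmark vectors rather than images keeps the dataset small and lets training skip MediaPipe entirely.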
After collecting the dataset, train the model: `python train_asl_landmarks.py`
Output model files: `model/asl_landmarks_mlp.h5` and `model/labels.txt`
Training time: 5–18 minutes, depending on dataset size.
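Before training, the per-letter folders have to become a feature matrix, integer class indices, and a label list whose order matches the model's output units. The sketch below assumes the hypothetical layout of one `samples.csv` file (one 42-value row per sample) in each letter folder, and assumes `labels.txt` holds one label per line, with line *i* naming class index *i*. Neither format is confirmed by this README, and `load_dataset` and `write_labels` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Sequence


def load_dataset(root: Path) -> tuple[list[list[float]], list[int], list[str]]:
    """Read <root>/<LETTER>/samples.csv into features X, class indices y, and labels."""
    # Only folders that actually contain samples become classes; sorting keeps
    # the label order stable between training and prediction.
    labels = sorted(p.name for p in root.iterdir() if (p / "samples.csv").is_file())
    X: list[list[float]] = []
    y: list[int] = []
    for idx, letter in enumerate(labels):
        with (root / letter / "samples.csv").open(newline="") as f:
            for row in csv.reader(f):
                X.append([float(v) for v in row])
                y.append(idx)
    return X, y, labels


def write_labels(labels: Sequence[str], path: Path) -> None:
    """Write one label per line so line i names class index i (assumed format)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(labels) + "\n")
```

`X` and `y` can then be split with scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)` and fed to an MLP with 42 inputs and one softmax output per label.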
Run live prediction: `python predict_asl_live.py`
Controls:
| Key | Action |
|---|---|
| `Q` | Quit application |
| `+` | Increase confidence threshold |
| `-` | Decrease confidence threshold |
Example output: `Pred: A (97%) FPS: 28`
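The confidence threshold decides whether the top prediction is shown at all. A minimal sketch of that logic follows, assuming the model outputs one softmax probability per label and the `+`/`-` keys move the threshold by a fixed step. The step size of 0.05 and the helper names `adjust_threshold` and `format_prediction` are hypothetical.

```python
from typing import Optional, Sequence

THRESHOLD_STEP = 0.05  # hypothetical step for the + / - keys


def adjust_threshold(threshold: float, key: str, step: float = THRESHOLD_STEP) -> float:
    """Raise the threshold on '+', lower it on '-', and clamp it to [0, 1]."""
    if key == "+":
        threshold += step
    elif key == "-":
        threshold -= step
    # Rounding hides floating-point drift such as 0.8500000000000001.
    return min(1.0, max(0.0, round(threshold, 4)))


def format_prediction(
    probs: Sequence[float], labels: Sequence[str], threshold: float
) -> Optional[str]:
    """Return 'Pred: X (NN%)' for the most likely class, or None if it is below threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None
    return f"Pred: {labels[best]} ({probs[best] * 100:.0f}%)"
```

In an OpenCV loop, `key = cv2.waitKey(1) & 0xFF` gives the pressed key code; comparing it with `ord("+")`, `ord("-")`, or `ord("q")` selects the action. Returning `None` below the threshold lets the display stay blank instead of flickering between low-confidence guesses.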
- Use a plain background for best results
- Ensure good lighting (avoid shadows)
- Keep your hand centered in the webcam frame
- Letters J and Z involve movement; because the model classifies single frames, use the final (ending) pose for training
- Do not upload the dataset to GitHub; add `dataset/` to `.gitignore`
| Metric | Score |
|---|---|
| Training Accuracy | ~97–99% |
| Validation Accuracy | ~96–99% |
| Real-Time Accuracy | ~90–98% (depends on lighting & distance) |
Best Performance Conditions:
- Stable hand
- Good lighting
- Clean background
- Convert Recognized Gesture → Voice Output (Text-to-Speech)
- Convert Continuous Letters → Word Builder Mode
- Add Numbers (0–9) & Common Words (HELLO, THANK YOU, YES, NO)
- Develop GUI App (Tkinter / PyQt / Flet)
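For the planned Word Builder mode, one possible approach is to commit a letter only after the same prediction has been held for several consecutive frames, so brief misclassifications are ignored. This is a hypothetical sketch, not part of the current project. The `WordBuilder` class and the `hold_frames` value of 15 are assumptions, and `None` stands for "no confident prediction this frame".

```python
from typing import Optional


class WordBuilder:
    """Append a letter once it has been predicted for `hold_frames` consecutive frames."""

    def __init__(self, hold_frames: int = 15) -> None:
        self.hold_frames = hold_frames
        self.word = ""
        self._candidate: Optional[str] = None
        self._count = 0
        self._committed = False  # prevents adding the same held letter twice

    def update(self, letter: Optional[str]) -> str:
        """Feed one frame's prediction (or None) and return the word so far."""
        if letter != self._candidate:
            # A new letter (or no hand) resets the hold counter.
            self._candidate = letter
            self._count = 0
            self._committed = False
        if letter is None:
            return self.word
        self._count += 1
        if self._count >= self.hold_frames and not self._committed:
            self.word += letter
            self._committed = True
        return self.word
```

Because a held letter is committed only once, a double letter such as the "LL" in HELLO needs a short break (a frame with no confident prediction) between the two signs.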
Project Developer: Shivam Soni
If you use this project on GitHub, credit: Shivam09xc