A real-time computer vision application that detects and translates the American Sign Language (ASL) alphabet using a live webcam feed.
Communication barriers exist between people who use sign language and those who do not. This project aims to bridge that gap with an automated, real-time translation tool that interprets hand gestures as written letters, making communication more accessible.
- Real-Time Inference: Captures the live webcam feed and predicts sign language gestures with low latency.
- Transfer Learning: Utilizes a pre-trained MobileNetV2 base for highly efficient feature extraction, ensuring the model is lightweight and fast enough to run on standard CPUs.
- Modern Web Interface: Provides a clean web application built with Flask, making the tool accessible through a standard web browser.
- Targeting Assistance: On-screen visual guides help users frame their hand correctly for maximum prediction accuracy.
- Deep Learning: TensorFlow, Keras
- Computer Vision: OpenCV, NumPy
- Web Framework: Flask
- Data Loading: The dataset is loaded and augmented with Keras' ImageDataGenerator to improve model robustness (first sketch below).
- Model Training: A MobileNetV2 architecture is fine-tuned on the sign language dataset to extract deep visual features (second sketch below).
- Inference Pipeline: The Flask application uses OpenCV to capture the webcam feed, isolates a Region of Interest (ROI), and pre-processes the frame to match the model's expected input.
- Prediction: The pre-processed frame is passed through the network to produce the most likely letter and its confidence score, which is then displayed back to the user (third and fourth sketches below).
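A minimal sketch of the data-loading step. The dataset path, image size, batch size, and augmentation values here are illustrative assumptions, not settings taken from this repo:

```python
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

DATA_DIR = "dataset"      # assumed layout: one sub-folder per letter (dataset/A, dataset/B, ...)
IMG_SIZE = (224, 224)     # MobileNetV2's default input resolution

# Augmentation values are illustrative, not taken from this repo.
datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,  # scales pixels to [-1, 1] as MobileNetV2 expects
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    validation_split=0.2,                     # hold out 20% of each class for validation
)

train_gen = datagen.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=32,
    class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(
    DATA_DIR, target_size=IMG_SIZE, batch_size=32,
    class_mode="categorical", subset="validation")
```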
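A sketch of what the transfer-learning setup could look like, assuming a frozen ImageNet-pretrained base with a new classification head; the head layers, dropout rate, epoch count, and saved filename are assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

NUM_CLASSES = 26  # assumed: one class per static ASL letter

# Pre-trained base acts as the feature extractor.
base = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze for the first phase; top blocks can be unfrozen later

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)
model.save("asl_model.h5")  # hypothetical filename, reused in the inference sketch
```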
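The ROI extraction and prediction steps, sketched as a standalone OpenCV loop for readability (the actual app runs these same steps inside the Flask stream). The ROI coordinates, model filename, and class ordering are assumptions:

```python
import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.models import load_model

model = load_model("asl_model.h5")                 # hypothetical filename
class_names = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")   # assumed class order

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Fixed ROI where the on-screen targeting guide asks the user to place a hand.
    x0, y0, x1, y1 = 100, 100, 324, 324            # illustrative coordinates
    cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
    roi = frame[y0:y1, x0:x1]

    # Match the model's expected input: 224x224 RGB with MobileNetV2 scaling.
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, (224, 224))
    batch = preprocess_input(roi.astype("float32"))[np.newaxis, ...]

    probs = model.predict(batch, verbose=0)[0]
    letter, conf = class_names[int(np.argmax(probs))], float(np.max(probs))
    cv2.putText(frame, f"{letter} ({conf:.0%})", (x0, y0 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow("ASL Translator", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```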
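Finally, a minimal example of the MJPEG-streaming pattern commonly used to serve such a feed through Flask; the route names and template are assumptions, not this repo's actual layout:

```python
from flask import Flask, Response, render_template
import cv2

app = Flask(__name__)
cap = cv2.VideoCapture(0)

def generate_frames():
    # Yields JPEG-encoded frames as a multipart stream the browser renders as video.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # ... run the ROI / prediction steps from the sketch above here ...
        ok, buf = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + buf.tobytes() + b"\r\n")

@app.route("/")
def index():
    return render_template("index.html")  # assumed template with <img src="/video_feed">

@app.route("/video_feed")
def video_feed():
    return Response(generate_frames(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(debug=True)
```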
The model is trained on a custom dataset of American Sign Language alphabet signs: cropped images of hands performing each sign, organized into one folder per class.
By using transfer learning with MobileNetV2, the model converges rapidly and achieves high accuracy on validation data. During real-time use, it reliably identifies static gestures when the hand is framed correctly within the on-screen target.
1. Install Dependencies: Ensure you have Python installed, then run:

   ```bash
   pip install -r requirements.txt
   ```

2. Run the Application: Start the Flask web server:

   ```bash
   python app.py
   ```

3. Open in Browser: Open your web browser and navigate to `http://127.0.0.1:5000/`.
- Move from raw image classification to hand-landmark detection with MediaPipe for better robustness to background clutter and lighting changes (see the sketch after this list).
- Expand the dataset to include dynamic gestures (words and phrases) rather than only static letters.
- Improve the UI to save and display sequences of translated letters so they form complete sentences.
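As a rough illustration of the first item, a MediaPipe Hands sketch (using the classic `mp.solutions` API) that extracts the 21 hand landmarks which could replace raw pixels as the classifier input:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# 21 3-D landmarks per hand; feeding their (x, y, z) coordinates to a small
# classifier instead of raw pixels removes most background/lighting dependence.
with mp_hands.Hands(static_image_mode=False, max_num_hands=1,
                    min_detection_confidence=0.5) as hands:
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            features = [(p.x, p.y, p.z) for p in lm]  # 21 x 3 feature vector
            # features would replace the cropped ROI as the classifier input
        cv2.imshow("Landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
cv2.destroyAllWindows()
```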
- Chirayu (Developed independently)