SignDecode: Real-Time Sign Language Recognition System

Overview

SignDecode is a computer vision application that performs real-time American Sign Language (ASL) recognition using skeletal hand tracking and neural network classification. The system processes webcam input to detect hand landmarks and classify gestures into alphanumeric characters (A-Z, 0-9).

Key Technical Achievement: By using skeletal coordinate data instead of raw pixels, the system cuts input dimensionality by roughly 98.5% (63 features vs 4,096 pixels for a 64x64 image) while reaching 92% training accuracy and 88% validation accuracy.
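The size of the reduction follows directly from the two input sizes. A quick check, assuming a single-channel (grayscale) 64x64 image as the pixel baseline:

```python
# Compare the landmark feature vector with a 64x64 single-channel image.
LANDMARKS = 21          # hand landmarks per detected hand
COORDS = 3              # x, y, z per landmark
IMAGE_SIDE = 64         # baseline image is 64x64 (grayscale assumed)

landmark_features = LANDMARKS * COORDS
pixel_features = IMAGE_SIDE * IMAGE_SIDE
reduction = 1 - landmark_features / pixel_features

print(landmark_features, pixel_features)   # 63 4096
print(f"{reduction:.1%}")                  # 98.5%
```

An RGB baseline would triple the pixel count, making the reduction larger still.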

Architecture

System Pipeline

Camera Input → MediaPipe Hand Detection → Landmark Extraction (21 points × 3 coords) 
→ MLP Classifier → Temporal Smoothing → Output Display

Technology Stack

Computer Vision: OpenCV, Google MediaPipe
Machine Learning: TensorFlow/Keras (Multi-Layer Perceptron)
Backend: Flask (REST API)
Frontend: Vanilla JavaScript, HTML5, CSS3
Data Processing: NumPy, Pandas

Model Architecture

Input Layer:    63 features (21 hand landmarks × xyz coordinates)
Hidden Layer 1: 128 neurons, ReLU activation, 30% dropout
Hidden Layer 2: 64 neurons, ReLU activation, 30% dropout  
Hidden Layer 3: 64 neurons, ReLU activation
Output Layer:   36 neurons (A-Z, 0-9), Softmax activation
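
To give a sense of how small this network is, the sketch below counts its trainable parameters from the layer sizes listed above. It assumes plain fully connected layers with biases; dropout adds no parameters. The helper name dense_param_count is made up for this illustration.

```python
# Count trainable parameters of a fully connected network:
# each dense layer has (inputs * outputs) weights plus one bias per output.
LAYER_SIZES = [63, 128, 64, 64, 36]  # input, three hidden layers, output

def dense_param_count(sizes):
    """Return the total number of weights and biases across all dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total_params = dense_param_count(LAYER_SIZES)
print(total_params)  # 22948
```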

Training Configuration:

Optimizer: Adam
Loss Function: Sparse Categorical Crossentropy
Epochs: 50
Batch Size: 16
Train/Test Split: 80/20
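
For readers unfamiliar with the loss, here is a minimal pure-Python sketch of softmax followed by sparse categorical crossentropy for a single sample. "Sparse" means the target is an integer class index rather than a one-hot vector. This only illustrates the math; it is not the Keras implementation, and the toy example uses 3 classes instead of 36.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_index):
    """Loss for one sample: negative log-probability of the true class index."""
    return -math.log(probs[true_index])

# Toy example: the true class is index 0, which also has the highest score.
probs = softmax([2.0, 1.0, 0.1])
loss = sparse_categorical_crossentropy(probs, true_index=0)
print(round(loss, 3))  # 0.417
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.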

Performance Metrics:

Training Accuracy: 92%
Validation Accuracy: 88%
Inference Latency: <50ms per frame
Model Size: ~500KB

Project Structure

SignDecode/
├── src/                        # Application source code
│   ├── app.py                 # Flask server, API endpoints
│   ├── model.py               # Model wrapper class
│   ├── utils.py               # Keypoint extraction utilities
│   ├── labels.py              # Class label mappings
│   ├── text_to_speech.py      # Audio output module
│   ├── static/                # Frontend assets (CSS, JS)
│   └── templates/             # HTML templates
├── training/                   # Model training pipeline
│   ├── collect_data.py        # Data collection utility
│   ├── train_model.py         # Model training script
│   └── dataset/               # Training data storage
├── models/                     # Trained model artifacts
│   └── sign_language_model.h5
├── run.py                      # Application entry point
├── requirements.txt            # Python dependencies
└── README.md

Installation

Prerequisites

Python 3.8+
Webcam
Modern web browser (Chrome/Firefox recommended)

Setup

# Clone repository
git clone https://github.com/thesakshidigg/SignDecode-Sign-Language-Recognition-.git
cd SignDecode-Sign-Language-Recognition-

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run application
python run.py

Navigate to http://localhost:5000 in your browser.

Usage

Running the Application

python run.py

The Flask server starts on port 5000. The web interface provides:

Real-time hand tracking visualization
ASL character recognition
Text-to-speech output
Interactive sign language game

Training a Custom Model

Step 1: Collect Training Data

cd training
python collect_data.py

Follow the prompts to record hand gestures for each character. Data is saved to training/dataset/sign_data.csv.

Step 2: Train Model

python train_model.py

Trains the neural network and saves the model to models/sign_language_model.h5.
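
As a rough illustration of the data preparation step, the sketch below reads landmark rows and performs an 80/20 split. The CSV layout (a label in the first column followed by 63 floats) is an assumption, not something confirmed by the repository, and load_samples and split_samples are hypothetical helpers; train_model.py may do this differently.

```python
import csv
import io
import random

NUM_FEATURES = 63  # 21 landmarks x (x, y, z)

def load_samples(csv_file):
    """Read rows of the ASSUMED form: label, then 63 float features."""
    samples = []
    for row in csv.reader(csv_file):
        if len(row) != 1 + NUM_FEATURES:
            continue  # skip blank or malformed rows
        try:
            features = [float(value) for value in row[1:]]
        except ValueError:
            continue  # skip a header row or non-numeric data
        samples.append((row[0], features))
    return samples

def split_samples(samples, test_fraction=0.2, seed=42):
    """Shuffle deterministically and return (train, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# In-memory stand-in for sign_data.csv: 10 identical placeholder rows.
fake_csv = io.StringIO("\n".join(
    ",".join(["A"] + ["0.5"] * NUM_FEATURES) for _ in range(10)
))
train, test = split_samples(load_samples(fake_csv))
print(len(train), len(test))  # 8 2
```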

Technical Implementation Details

Hand Landmark Detection

MediaPipe detects 21 anatomical landmarks per hand:

Wrist (1 point)
Thumb (4 points)
Index, Middle, Ring, Pinky fingers (4 points each)

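Each landmark provides (x, y, z) coordinates. MediaPipe normalizes x and y to [0, 1] by image width and height; z is a relative depth measured from the wrist, with smaller values closer to the camera, and is not confined to [0, 1].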
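
The 21 landmarks become the 63-element model input by concatenating their coordinates. A minimal sketch of that flattening follows; the Landmark namedtuple is a stand-in for MediaPipe's landmark objects (which expose .x, .y and .z), and flatten_landmarks is a hypothetical helper. The actual code in utils.py may order or rescale the values differently.

```python
from collections import namedtuple

# Stand-in for a MediaPipe landmark, which exposes .x, .y and .z attributes.
Landmark = namedtuple("Landmark", ["x", "y", "z"])

NUM_LANDMARKS = 21

def flatten_landmarks(landmarks):
    """Flatten 21 (x, y, z) landmarks into a 63-element feature list."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features = []
    for point in landmarks:
        features.extend([point.x, point.y, point.z])
    return features

# Dummy hand: every landmark at the image centre with zero relative depth.
dummy_hand = [Landmark(0.5, 0.5, 0.0) for _ in range(NUM_LANDMARKS)]
features = flatten_landmarks(dummy_hand)
print(len(features))  # 63
```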

Classification Approach

The system uses a Multi-Layer Perceptron (MLP) rather than a Convolutional Neural Network (CNN) because:

Input Type: Structured coordinate data (not spatial image data)
Efficiency: 63 input features vs 4,096+ for image-based approaches
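Invariance: Landmark coordinates carry no pixel color, so the classifier input is largely insensitive to lighting, background, and skin tone (the upstream hand detector can still be affected by them)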
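
Whichever network type is used, its 36-way softmax output still has to be mapped back to a character. The sketch below shows one way to do that. The label ordering (A-Z followed by 0-9) and the optional min_confidence cutoff are assumptions for illustration; the real mapping lives in labels.py.

```python
import string

# 36 classes: A-Z followed by 0-9 (this ordering is an assumption).
LABELS = string.ascii_uppercase + string.digits

def decode_prediction(probabilities, min_confidence=0.0):
    """Return (label, confidence) for the most probable class.

    The label is None when the top probability is below min_confidence.
    """
    if len(probabilities) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities")
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best_index]
    if confidence < min_confidence:
        return None, confidence
    return LABELS[best_index], confidence

# Fake softmax output where index 2 (the letter C) dominates.
probs = [0.0] * 36
probs[2] = 0.9
probs[30] = 0.1
print(decode_prediction(probs))  # ('C', 0.9)
```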

Temporal Smoothing

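A 15-frame consistency filter prevents prediction flickering: a character is appended to the output only after the same label has been predicted for THRESHOLD_FRAMES (15) frames in a row.

if predicted_label == last_predicted_label:
    frame_count += 1
    if frame_count >= THRESHOLD_FRAMES:
        output_text += predicted_character
        frame_count = 0  # reset so a held sign is not appended on every later frame
else:
    last_predicted_label = predicted_label  # label changed: start counting the new one
    frame_count = 1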
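
The same idea can be packaged as a small, testable class. This is a sketch rather than the project's code: StabilityFilter is a hypothetical name, and this variant emits a label exactly once per unbroken run, so a sign held for a long time is not repeated.

```python
class StabilityFilter:
    """Emit a label only after it has been predicted on N consecutive frames."""

    def __init__(self, threshold_frames=15):
        self.threshold_frames = threshold_frames
        self.last_label = None
        self.frame_count = 0

    def update(self, label):
        """Feed one per-frame prediction; return the label when it becomes stable, else None."""
        if label is not None and label == self.last_label:
            self.frame_count += 1
        else:
            # New label (or no hand detected): restart the run.
            self.last_label = label
            self.frame_count = 1 if label is not None else 0
        if label is not None and self.frame_count == self.threshold_frames:
            return label
        return None

# Short threshold so the behaviour is easy to follow by eye.
smoother = StabilityFilter(threshold_frames=3)
stream = ["A", "A", "B", "A", "A", "A", "A"]
emitted = [smoother.update(label) for label in stream]
print(emitted)  # [None, None, None, None, None, 'A', None]
```

The stray "B" interrupts the first run of "A", so nothing is emitted until "A" has been seen three times in a row.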

API Endpoints

POST /process_frame

Processes a single video frame for hand detection and classification.

Request:

{
  "image": "data:image/jpeg;base64,..."
}

Response:

{
  "prediction": "A",
  "image": "data:image/jpeg;base64,..."
}
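
A client has to wrap each frame in a base64 data URL and unwrap the annotated frame that comes back. The sketch below builds and parses those JSON bodies without touching the network; build_frame_request and parse_frame_response are hypothetical helper names, and the placeholder bytes are not a real JPEG.

```python
import base64
import json

DATA_URL_PREFIX = "data:image/jpeg;base64,"

def build_frame_request(jpeg_bytes):
    """Wrap raw JPEG bytes in the JSON body shown for POST /process_frame."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({"image": DATA_URL_PREFIX + encoded})

def parse_frame_response(body):
    """Return (prediction, jpeg_bytes) from a /process_frame JSON response."""
    payload = json.loads(body)
    # Everything after the first comma is the base64 data.
    _, _, encoded = payload["image"].partition(",")
    return payload["prediction"], base64.b64decode(encoded)

# Round trip with placeholder bytes standing in for a camera frame.
fake_jpeg = b"\xff\xd8placeholder\xff\xd9"
request_body = build_frame_request(fake_jpeg)
fake_response = json.dumps(
    {"prediction": "A", "image": json.loads(request_body)["image"]}
)
prediction, image_bytes = parse_frame_response(fake_response)
print(prediction, image_bytes == fake_jpeg)  # A True
```

The request body can then be sent with any HTTP client using a Content-Type of application/json.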

GET /status

Returns current recognized text.

POST /clear_text

Clears the output text buffer.

Performance Considerations

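Latency: Sub-50ms inference supports real-time processing at 20 FPS or better; sustaining a full 30 FPS requires staying under roughly 33ms per frame
Accuracy Trade-off: Temporal smoothing reduces false positives at the cost of ~500ms recognition delay (15 frames at 30 FPS)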
Scalability: Lightweight model enables deployment on edge devices without GPU
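
The timing figures above can be checked with a few lines of arithmetic. The 30 FPS capture rate is an assumption; the other constants come from this document.

```python
THRESHOLD_FRAMES = 15     # consistency window from the temporal smoothing step
CAMERA_FPS = 30           # assumed capture rate
MAX_INFERENCE_MS = 50     # upper bound quoted for per-frame inference

smoothing_delay_ms = THRESHOLD_FRAMES / CAMERA_FPS * 1000
frame_budget_ms = 1000 / CAMERA_FPS
worst_case_fps = 1000 / MAX_INFERENCE_MS

print(smoothing_delay_ms)          # 500.0
print(round(frame_budget_ms, 1))   # 33.3
print(worst_case_fps)              # 20.0
```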

Future Enhancements

Support for dynamic gestures (word-level recognition using LSTM/GRU)
Multi-language sign language support (BSL, ISL, etc.)
Mobile deployment using TensorFlow Lite
Data augmentation for improved robustness
Stable, documented REST API for third-party integration

Contributing

Contributions are welcome. Please follow the standard Git workflow:

Fork the repository
Create a feature branch (git checkout -b feature/improvement)
Commit changes (git commit -m 'Add feature')
Push to branch (git push origin feature/improvement)
Open a Pull Request

License

MIT License - see LICENSE file for details.

Contact

Sakshi Diggikar
GitHub: @thesakshidigg

Acknowledgments

Google MediaPipe for hand tracking framework
TensorFlow team for ML infrastructure
ASL dataset contributors
