bug: fix computer vision & model integration issues (label crash, out-of-bound crop, image size mismatch) #33

Description

@anushkakannawar
  1. Model Labels Index Out-of-Bounds Crash

The Problem: The list of labels is hardcoded to the 26 letters of the alphabet, but the model was trained on a dataset with 27 labels (including a custom thank-you label). If the model predicts the 27th class (index 26), the code looks up an index that does not exist in the list, and the backend crashes.

Solution:

At application startup, the application should dynamically open and read Model/labels.txt line by line.
For each line, parse the text to extract the class name (usually by separating the label index from the name string).
Build the labels array dynamically from this file instead of using a hardcoded list of letters. This guarantees the labels array always matches the model's output classes.

  2. Data Collection Boundary Crash

The Problem: The hand-tracking bounding box is expanded by a fixed margin (offset). When the hand moves close to the edge of the camera frame, the expanded coordinates become negative or exceed the frame's pixel dimensions, which produces an invalid (typically empty) crop slice. Passing this empty image to the resize function crashes the program.

Solution:

Before slicing the webcam frame array, obtain the frame's actual width and height.
Clamp the start crop coordinates so they never drop below 0 (using a maximum value check).
Clamp the end crop coordinates so they never exceed the frame's height or width (using a minimum value check).
Add a safety guard clause: check if the cropped image dimensions are valid (non-zero) before passing it to OpenCV's resize functions. If the crop is empty, skip processing for that frame.
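A minimal sketch of the clamping and guard logic, in plain Python so it can be checked without a camera. It assumes the hand detector reports the bounding box as `(x, y, w, h)` in pixels; the function name `clamp_crop_box` is hypothetical.

```python
def clamp_crop_box(x, y, w, h, offset, frame_w, frame_h):
    """Expand a bounding box by `offset` and clamp it to the frame.

    Returns (x1, y1, x2, y2) suitable for frame[y1:y2, x1:x2],
    or None if the clamped box has no area and the frame should be skipped.
    """
    # Start coordinates never drop below 0.
    x1 = max(0, x - offset)
    y1 = max(0, y - offset)
    # End coordinates never exceed the frame dimensions.
    x2 = min(frame_w, x + w + offset)
    y2 = min(frame_h, y + h + offset)

    # Guard clause: an empty crop must not reach the resize step.
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2
```

In the capture loop this would be used with `frame_h, frame_w = frame.shape[:2]`, then `box = clamp_crop_box(x, y, w, h, offset, frame_w, frame_h)`; when `box` is `None` the loop continues to the next frame, otherwise the crop is `frame[y1:y2, x1:x2]`.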

  3. Image Resolution & Padding Inconsistencies

The Problem: The live app and the data capture script normalize hand crops onto a 300x300 canvas, while the training script resizes images to 224x224. During live usage each crop is therefore rescaled twice (once to 300x300, then again to 224x224), which degrades sharpness, loses fine hand details, and lowers model accuracy.

Solution:

Harmonize the target image dimensions to a single standardized size (e.g., 224x224 or 300x300) across all three core files: app.py, dataCollection.py, and train_model.py.
Ensure that the white padding canvas size, the crop resize target size, and the model's input layer shape all use exactly the same dimensions. Standardizing on 224x224 is recommended: it is the smaller of the two sizes, so it uses less memory and should be faster for real-time inference.
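One way to keep the three files in sync is to define the size once and derive the resize and padding values from it. The sketch below assumes the crop is scaled to fit the canvas while preserving aspect ratio and is then centered on the white background; `IMG_SIZE` and `fit_to_canvas` are hypothetical names, and the constant would live in a shared module imported by app.py, dataCollection.py, and train_model.py.

```python
IMG_SIZE = 224  # single source of truth for canvas, resize target, and model input


def fit_to_canvas(crop_w: int, crop_h: int, img_size: int = IMG_SIZE):
    """Compute the resize target and padding for a crop on a square canvas.

    Returns (new_w, new_h, pad_x, pad_y): the crop is resized to
    new_w x new_h and pasted at offset (pad_x, pad_y) on an
    img_size x img_size canvas.
    """
    if crop_w <= 0 or crop_h <= 0:
        raise ValueError("crop dimensions must be positive")

    if crop_h > crop_w:
        # Tall crop: height fills the canvas, width is scaled (rounded up).
        new_h = img_size
        new_w = max(1, (img_size * crop_w + crop_h - 1) // crop_h)
    else:
        # Wide or square crop: width fills the canvas, height is scaled.
        new_w = img_size
        new_h = max(1, (img_size * crop_h + crop_w - 1) // crop_w)

    # Center the resized crop on the canvas.
    pad_x = (img_size - new_w) // 2
    pad_y = (img_size - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

With this in place, the canvas would be created as an `IMG_SIZE` x `IMG_SIZE` white image, the crop resized to `(new_w, new_h)`, and the training script would use the same `IMG_SIZE` for its input shape, so no second rescale is needed at inference time.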
