The data-preprocessing code is in data_process.ipynb, which:
- gets keypoints with MediaPipe for each image
- flips each image and gets the keypoints from the flipped image
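As a rough sketch of these two steps: the helper names below and the MediaPipe Hands settings are my assumptions, not necessarily what data_process.ipynb does.

```python
import numpy as np

def landmarks_to_array(hand_landmarks):
    """Convert one MediaPipe hand (21 landmarks) to a (21, 3) array of x, y, z."""
    return np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark],
                    dtype=np.float32)

def extract_keypoints(image_bgr):
    """Run MediaPipe Hands on a BGR image; return (21, 3) keypoints or None."""
    # Imports kept local so the pure-numpy helper above works without
    # MediaPipe/OpenCV installed.
    import cv2
    import mediapipe as mp

    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected in this image
    return landmarks_to_array(results.multi_hand_landmarks[0])

def preprocess(image_bgr):
    """Keypoints for the original image and for its horizontal flip."""
    import cv2
    flipped = cv2.flip(image_bgr, 1)  # 1 = flip around the vertical axis
    return extract_keypoints(image_bgr), extract_keypoints(flipped)
```

Both keypoint sets can then be saved per image, so no MediaPipe call is needed at training time.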
The Dataset class is in dataset.py, where the data augmentation is applied:
- random flip
  - with a certain probability, use the keypoints from the flipped image
  - if the flipped image has no keypoints, fall back to the original ones
- keypoint rotation
  - apply a random $[-\frac{\pi}{6}, \frac{\pi}{6}]$ rotation about each of the x, y, and z axes of the keypoints
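The flip-selection and rotation augmentations could be sketched like this; it is a minimal illustration with hypothetical helper names, not the exact code in dataset.py (in particular, rotating about the keypoint centroid is my assumption).

```python
import numpy as np

def choose_keypoints(orig_kp, flipped_kp, p_flip=0.5, rng=None):
    """With probability p_flip use the flipped-image keypoints,
    falling back to the original ones if the flip has no detection."""
    if rng is None:
        rng = np.random.default_rng()
    if flipped_kp is not None and rng.random() < p_flip:
        return flipped_kp
    return orig_kp

def random_rotation_matrix(max_angle=np.pi / 6, rng=None):
    """Compose random rotations about the x, y, and z axes,
    each angle drawn uniformly from [-max_angle, max_angle]."""
    if rng is None:
        rng = np.random.default_rng()
    ax, ay, az = rng.uniform(-max_angle, max_angle, size=3)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    return rz @ ry @ rx

def augment_keypoints(keypoints, max_angle=np.pi / 6, rng=None):
    """Rotate (N, 3) keypoints about their centroid."""
    center = keypoints.mean(axis=0)
    r = random_rotation_matrix(max_angle, rng)
    return (keypoints - center) @ r.T + center
```

Rotating about the centroid (rather than the origin) keeps the hand in place while changing its orientation.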
The training script for the models is in train_val_cnn_aug.ipynb.
To run inference, use the model weights cnn_model_new.pth and run:

```shell
python runMediaPipe_CNN.py
```
Compared to the previous version, I have added the new Dataset class that applies the random transformations to the keypoints and to the flipped keypoints of each image.
Some things I have tried:
- running MediaPipe during training to obtain keypoints, so that image transformations can be applied first; this works but is very slow
- for the random rotation I originally used the range $[-\pi, \pi]$, but then the model confused letters like O and E

The current hyperparameters are the ones that do best so far. I have tried:
- removing the second CNN layer
- using 16 channels in the second CNN layer
- 2 and 3 hidden layers
- batch size 32
- hidden sizes 64 and 256
- epochs: 50, 100, 200
- lr: 0.01 and 0.005, along with the same weight decay
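Purely for illustration, here is a hypothetical model in the spirit described above (two 1-D CNN layers over the 21 keypoints, the second with more channels, followed by a hidden layer). The real architecture is in train_val_cnn_aug.ipynb; the channel counts, kernel size, and num_classes=26 below are my assumptions.

```python
import torch
import torch.nn as nn

class KeypointCNN(nn.Module):
    """Illustrative 1-D CNN over 21 hand keypoints with 3 coordinates each."""

    def __init__(self, num_classes=26, hidden_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=3, padding=1),   # first CNN layer
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),  # second CNN layer
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 21, hidden_size),  # hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x):
        # x: (batch, 3, 21) — coordinates as channels, keypoints as positions
        return self.classifier(self.features(x))
```

With this layout, the keypoint array from preprocessing would be transposed to (3, 21) before being fed to the model.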