MarcosTavar3s/LSTM_Classifier


Classification of Human Movements in Time Series Using Long Short-Term Memory (LSTM)

License: GPL v3 Python TensorFlow


📋 Description

This project evaluates the performance of a Long Short-Term Memory (LSTM) architecture for human movement classification in the UCF50 dataset.


📜 Summary


📂 Project Tree

LSTM_Classifier/
├── code/
│   ├── examples/ # videos to test the network
│   ├── module.py
│   ├── test.py
│   ├── train.py
│   └── requirements.txt
├── statistics/ # statistical results
├── README.md
└── LICENSE

1. Introduction

Human movements are actions that a single image cannot properly capture; they are defined by a set of images in a specific sequence. This project therefore addresses movement identification using videos (multi-frame containers) and a time-series neural network module. A Long Short-Term Memory (LSTM) architecture was chosen for its ability to retain information from previous time steps. To evaluate the network, input sequences ranging from 15 to 120 frames were tested.

Regarding the dataset, this study uses Realistic Action Recognition: UCF50 [1]. It was chosen mainly for the variety of human movements it contains and for its widespread use.


Example of Diving, HorseRace, and Mixing classes in UCF50 dataset


2. Architecture

To carry out this study, the following LSTM architecture was used, based on Bleed AI Academy's YouTube video [2]:

 
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=4, Kernel=(3,3), Activation=Tanh   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |        TimeDistributed + Dropout             |
                              ------------------------------------------------
                              |                Dropout=0.2                   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=14, Kernel=(3,3), Activation=Tanh  |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |        TimeDistributed + Dropout             |
                              ------------------------------------------------
                              |                Dropout=0.2                   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=16, Kernel=(3,3), Activation=Tanh  |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                  Flatten                     |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                  Dense                       |
                              ------------------------------------------------
                              |        7 classes, Activation=SoftMax         |
                              ------------------------------------------------
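
The diagram above can be approximated in Keras roughly as follows. This is a minimal sketch rather than the repository's actual code: the function name `build_model`, the 64x64 RGB frame size, the default convolution padding, and `return_sequences=True` on every ConvLSTM2D layer (needed so that MaxPooling3D receives a 5D tensor) are assumptions. The number of classes is left as a parameter; its default follows the seven classes listed in Section 3.1.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(sequence_length=60, height=64, width=64, num_classes=7):
    """Stack the layers in the order given by the diagram (assumed settings)."""
    return tf.keras.Sequential([
        # One sample is a clip: (frames, height, width, RGB channels).
        tf.keras.Input(shape=(sequence_length, height, width, 3)),
        layers.ConvLSTM2D(filters=4, kernel_size=(3, 3), activation="tanh",
                          return_sequences=True),
        # Pool only over the spatial axes; the time axis keeps its length.
        layers.MaxPooling3D(pool_size=(1, 2, 2), padding="same"),
        layers.TimeDistributed(layers.Dropout(0.2)),
        layers.ConvLSTM2D(filters=14, kernel_size=(3, 3), activation="tanh",
                          return_sequences=True),
        layers.MaxPooling3D(pool_size=(1, 2, 2), padding="same"),
        layers.TimeDistributed(layers.Dropout(0.2)),
        layers.ConvLSTM2D(filters=16, kernel_size=(3, 3), activation="tanh",
                          return_sequences=True),
        layers.MaxPooling3D(pool_size=(1, 2, 2), padding="same"),
        layers.Flatten(),
        # One softmax output per class.
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_model().summary()` prints the resulting layer shapes, which is a quick way to compare the sketch against the diagram.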
  

3. Methodology

3.1. Classes

To assess which configuration performs best, the number of classes was first fixed at seven: WalkingWithDog, Skiing, Swing, Diving, Mixing, HorseRace, and HorseRiding. The class labels are one-hot encoded, since the classes have no inherent ordering.


Example of Skiing, HorseRiding, Swing, and WalkingWithDog classes in UCF50 dataset
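
As an illustration of the one-hot encoding mentioned above, the sketch below maps each class name to a vector with a single 1. The helper name `one_hot` and the column order are assumptions made for this example; the module may order the classes differently.

```python
# Class names from this section; the column order is an assumption.
CLASSES = ["WalkingWithDog", "Skiing", "Swing", "Diving",
           "Mixing", "HorseRace", "HorseRiding"]


def one_hot(labels, classes=CLASSES):
    """Return one row per label, with a 1.0 in that class's column."""
    index = {name: i for i, name in enumerate(classes)}
    return [
        [1.0 if index[name] == col else 0.0 for col in range(len(classes))]
        for name in labels
    ]


print(one_hot(["Skiing", "HorseRiding"]))
```

Because every vector has the same distance to every other one, the encoding does not suggest that any class is "closer" to another.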

3.2. Quantity of Frames

With the classes fixed, the next step was to vary the number of frames collected from each video, from 15 to 120. Each configuration was first trained for 5 epochs to estimate its overall performance, and the most promising ones were then selected for longer training (30 epochs).
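
One common way to collect a fixed number of frames from videos of different lengths is to sample them at a regular stride. The sketch below shows that idea; the function name `sample_frame_indices` and the stride rule are assumptions, and the repository's frame_features_extraction method may select frames differently.

```python
def sample_frame_indices(total_frames, sequence_length):
    """Pick up to `sequence_length` frame indices spread across a video."""
    if total_frames <= 0 or sequence_length <= 0:
        raise ValueError("total_frames and sequence_length must be positive")
    # Stride between sampled frames; never smaller than one frame.
    step = max(total_frames // sequence_length, 1)
    indices = [i * step for i in range(sequence_length)]
    # Videos shorter than the requested length yield fewer indices.
    return [i for i in indices if i < total_frames]


# A 300-frame video sampled at the 60-frame setting uses every 5th frame.
print(sample_frame_indices(300, 60)[:4])
```

With this rule, a larger frame count samples the same video more densely, which is what the 15-to-120 comparison varies.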

3.3. Performance Evaluation

Loss, accuracy, recall, and precision were the metrics used to select the best network for this context. By these metrics, the assessment was deemed successful.
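
For reference, accuracy, precision, and recall can be computed from true and predicted class indices as sketched below. The source does not state how precision and recall are averaged across classes, so the macro average used here is an assumption, as is the function name `classification_metrics`.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision and recall (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
    }


print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Precision penalizes false alarms for a class and recall penalizes misses, so the two together show more than accuracy alone.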


4. Module.py methods

As a by-product of this study, I created a structured module that wraps the LSTM architecture shown above. Its main methods are:

| Method | Description |
| --- | --- |
| create_dataset | Creates a dataset from the input path |
| frame_features_extraction | Extracts features from each class and stores them to build the dataset |
| architecture | Assembles the LSTM architecture |
| predict | Classifies an input video and stores the result in an output file |
| train | Trains the LSTM model (split: 70% train, 15% validation, 15% test) |
| evaluate | Generates a .json file with loss, accuracy, precision, and recall metrics |
| load_model | Loads an existing model |
| save_architecture_image | Saves an image of the LSTM architecture |
| save_metric | Saves training metrics over epochs in a .csv file |
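
The 70/15/15 split used by train can be pictured with the sketch below, which shuffles sample indices and cuts them into three parts. The function name `split_indices`, the fixed seed, and the use of a plain shuffle (with no per-class stratification) are assumptions; the module's own splitting logic is not shown in this README.

```python
import random


def split_indices(num_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and cut them into train/val/test parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train_end = round(num_samples * train)
    val_end = train_end + round(num_samples * val)
    # Whatever remains after train and validation becomes the test set.
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```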

5. Results and Discussion

Figure 01 shows that the best accuracy, loss, recall, and precision are obtained when 60 frames are collected from each video. For longer videos (more than 10 seconds), however, collecting more frames may be desirable to capture the action represented throughout the video in more detail.

Regarding the epochs, I chose thirty because the network's results start to decline beyond that point. Even so, with early stopping at a patience of 10 epochs, none of the configurations trained beyond epoch 23, which means the maximum number of epochs, and hence the training time, could be reduced.
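
The effect of the patience parameter can be illustrated with a small simulation: training stops once the monitored value has failed to improve for `patience` consecutive epochs. The sketch assumes validation loss is the monitored value, which the source does not state, and `stopping_epoch` is a hypothetical helper, not part of the module.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which patience-based stopping triggers."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out: training used every scheduled epoch.
    return len(val_losses)


# Made-up losses: the best value is at epoch 3, then no improvement.
losses = [1.0, 0.9, 0.8] + [0.85] * 27
print(stopping_epoch(losses, patience=10))
```

In this toy run the stop comes 10 epochs after the best one, so a 30-epoch budget is never fully used.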


Figure 01: Accuracy, loss, recall, and precision plots

In terms of classification, Figure 02 compares the original video with the predicted one. Note the short delay before the first classification: the network needs to receive a number of frames before it can make a proper inference.


Figure 02: Comparison between the original and the predicted video
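
The initial delay follows naturally if frames are fed through a sliding window: no prediction is possible until the window holds a full sequence. The sketch below illustrates this; `stream_predictions` and the `classify` callable are hypothetical stand-ins, and the repository's predict method may be implemented differently.

```python
from collections import deque


def stream_predictions(frames, sequence_length, classify):
    """Yield one result per frame; None until the window is full."""
    window = deque(maxlen=sequence_length)  # oldest frame drops out automatically
    for frame in frames:
        window.append(frame)
        if len(window) < sequence_length:
            yield None  # not enough frames yet: this is the delay
        else:
            yield classify(list(window))


# Toy run: integers stand in for frames and `sum` stands in for the model.
print(list(stream_predictions(range(5), 3, sum)))
```

With a 60-frame window, for instance, the first 59 frames of a video would carry no label under this scheme.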

These results on UCF50 show that Long Short-Term Memory networks are a viable solution to human movement classification problems.


6. Conclusion

The study demonstrated that LSTMs are a viable solution to human movement classification problems. Despite the small, educational dataset, the trained model produced satisfactory results. For the UCF50 dataset, the best overall setting was 60 frames captured from each video.


7. Future Steps

This repository only scratches the surface of the LSTM's potential for identifying human movements. Possible future work includes adding continuous learning, designing an accessible terminal interface to run functions (such as training, creating a dataset, and evaluating performance), and testing different architectures.


8. Running Locally

📥 Clone the repository:

git clone https://github.com/MarcosTavar3s/LSTM_Classifier.git
cd LSTM_Classifier/code

🐍 Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate # Linux or MacOS
venv/Scripts/activate # Windows

📦 Install the dependencies:

pip install -r requirements.txt

🚀 Run the project:

python train.py # for training
python test.py # for testing

📌 To use only module.py, import it in your Python code:

from lstm import classifier_model

9. Developer Team

Marcos Aurélio
Researcher
Helton Maia
Academic Advisor

10. References

[1] P. Ahmad, "Realistic Action Recognition - UCF50," Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/pypiahmad/realistic-action-recognition-ucf50.

[2] Bleed AI Academy, "Human Activity Recognition using TensorFlow (CNN + LSTM) | 2 Methods," YouTube, 2021. [Online]. Available: https://www.youtube.com/watch?v=QmtSkq3DYko.


11. License

This project is licensed under the terms of the MIT License.
