Classification of Human Movements in Time Series Using Long Short-Term Memory (LSTM)

📋 Description

This project evaluates the performance of a Long Short-Term Memory (LSTM) architecture for human movement classification on the UCF50 dataset.

📜 Summary

1. Introduction
2. Architecture
3. Methodology
4. Module.py methods
5. Results and Discussion
6. Conclusion
7. Future Steps
8. Running Locally
9. Developer Team
10. References
11. License

📂 Project Tree

LSTM_Classifier/
├── code/
│   ├── examples/ # videos to test the network
│   ├── module.py
│   ├── test.py
│   ├── train.py
│   └── requirements.txt
├── statistics/ # statistical results
├── README.md
└── LICENSE

1. Introduction

Human movements are actions that cannot be reliably classified from a single image; they are recognized from a set of images in a specific sequence. This project therefore addresses movement identification from videos (multi-frame containers) by means of a time-series neural network module. A Long Short-Term Memory (LSTM) architecture was chosen for its ability to retain information from previous steps. To evaluate the network, input lengths from 15 to 120 frames were tested.

Regarding the dataset, this study uses Realistic Action Recognition: UCF50 [1]. It was chosen for the variety of human movements it covers and for its widespread use.

Example of the Diving, HorseRace, and Mixing classes in the UCF50 dataset

2. Architecture

To carry out this study, the following LSTM architecture was used, based on Bleed AI Academy’s YouTube video [2]:

 
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=4, Kernel=(3,3), Activation=Tanh   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |        TimeDistributed + Dropout             |
                              ------------------------------------------------
                              |                Dropout=0.2                   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=14, Kernel=(3,3), Activation=Tanh  |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |        TimeDistributed + Dropout             |
                              ------------------------------------------------
                              |                Dropout=0.2                   |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                 ConvLSTM2D                   |
                              ------------------------------------------------
                              |   Filters=16, Kernel=(3,3), Activation=Tanh  |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                MaxPooling3D                  |
                              ------------------------------------------------
                              |       Padding=Same, Pool_Size=(1,2,2)        |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                  Flatten                     |
                              ------------------------------------------------
                                                   ↓
                              ------------------------------------------------
                              |                  Dense                       |
                              ------------------------------------------------
                              |        6 classes, Activation=SoftMax         |
                              ------------------------------------------------
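
To see how the stack above transforms a clip, the sketch below traces tensor shapes layer by layer in plain Python. It is only an illustration under stated assumptions: the 64x64 RGB frame size, the `valid` padding of the ConvLSTM2D layers (the Keras default), and sequence outputs from every ConvLSTM2D layer are not given in the diagram, and `trace_shapes` is a hypothetical helper, not a method of module.py.

```python
import math

# Filters of the three ConvLSTM2D blocks, as listed in the diagram.
FILTERS = (4, 14, 16)


def trace_shapes(frames, height=64, width=64, channels=3):
    """Return (layer name, output shape) pairs for a single clip.

    Assumptions (not stated in the diagram): 64x64 RGB frames and
    ConvLSTM2D layers with 'valid' padding that return full sequences.
    """
    shapes = [("Input", (frames, height, width, channels))]
    h, w = height, width
    for i, filters in enumerate(FILTERS):
        # A 3x3 kernel with 'valid' padding trims one pixel per border.
        h, w = h - 2, w - 2
        shapes.append(("ConvLSTM2D", (frames, h, w, filters)))
        # Pool_Size=(1,2,2) with 'same' padding halves height and width
        # (rounding up) and leaves the time axis untouched.
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        shapes.append(("MaxPooling3D", (frames, h, w, filters)))
        if i < len(FILTERS) - 1:
            # Dropout never changes the shape.
            shapes.append(("TimeDistributed(Dropout)", (frames, h, w, filters)))
    shapes.append(("Flatten", (frames * h * w * FILTERS[-1],)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(frames=60):
        print(f"{name:<26}{shape}")
```

Under these assumptions the Flatten output grows linearly with the number of frames (7 x 7 x 16 = 784 values per frame), so the final Dense layer has more weights for longer input sequences.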

3. Methodology

3.1. Classes

To compare configurations on equal terms, the number of classes was fixed at seven: WalkingWithDog, Skiing, Swing, Diving, Mixing, HorseRace, and HorseRiding. The labels are one-hot encoded, since the classes have no inherent order among themselves.
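
As a minimal illustration of one-hot labels for these seven classes, the sketch below maps a class name to a vector with a single 1.0. The order of the names in `CLASSES` is arbitrary and `one_hot` is a hypothetical helper, not a method of module.py; in a Keras pipeline the same result is usually obtained with `to_categorical`.

```python
# The seven classes named above; the order chosen here is arbitrary.
CLASSES = [
    "WalkingWithDog", "Skiing", "Swing", "Diving",
    "Mixing", "HorseRace", "HorseRiding",
]


def one_hot(class_name, classes=CLASSES):
    """Return a list with 1.0 at the class index and 0.0 elsewhere."""
    index = classes.index(class_name)  # raises ValueError for unknown names
    return [1.0 if i == index else 0.0 for i in range(len(classes))]


if __name__ == "__main__":
    print(one_hot("Diving"))
```

Because every vector has the same length and exactly one active position, no class is treated as "larger" or "closer" to another, which is the reason for preferring this encoding over integer labels.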

Example of the Skiing, HorseRiding, Swing, and WalkingWithDog classes in the UCF50 dataset

3.2. Quantity of Frames

With the classes fixed, the next step was to vary the number of frames collected per video from 15 to 120. I first trained each network for 5 epochs to estimate its overall performance, and then selected the most promising ones for longer training (30 epochs).
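
One way to collect a fixed number of frames from videos of different lengths is to take them at evenly spaced positions. The sketch below shows that idea only; the README does not state how frames are actually selected, so the even spacing and the helper name `sample_frame_indices` are assumptions.

```python
def sample_frame_indices(total_frames, sequence_length):
    """Pick up to `sequence_length` evenly spaced frame indices from a video."""
    if total_frames <= 0 or sequence_length <= 0:
        raise ValueError("total_frames and sequence_length must be positive")
    # Step between sampled frames; at least 1 so short videos still advance.
    step = max(total_frames // sequence_length, 1)
    indices = [i * step for i in range(sequence_length)]
    # Videos shorter than the requested length yield fewer frames.
    return [i for i in indices if i < total_frames]


if __name__ == "__main__":
    # A hypothetical 300-frame video sampled at two of the tested lengths.
    for length in (15, 60):
        picked = sample_frame_indices(300, length)
        print(length, picked[:4], "...", picked[-1])
```

With this scheme a larger frame count samples the same video more densely rather than covering a longer stretch of it.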

3.3. Performance Evaluation

Loss, accuracy, recall, and precision were the metrics used to select the best network for this context. The outcome of this assessment is presented in Section 5.
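
For reference, the sketch below computes accuracy, precision, and recall from integer class labels. Macro-averaging over classes is an assumption, since the README does not say how the per-class values are combined, and `classification_metrics` is a hypothetical helper. Loss is left out because it needs the predicted probabilities rather than the predicted labels.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy and macro-averaged precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        # Use 0.0 when a class is never predicted or never present.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
    }


if __name__ == "__main__":
    # Toy labels for illustration only, not results from this project.
    print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```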

4. Module.py methods

As a by-product of this study, I created a structured module for the LSTM architecture shown above. Its main methods are:

Method	Description
create_dataset	creates a dataset from the input path
frame_features_extraction	extracts features from each class and stores them to create the dataset
architecture	assembles the LSTM architecture
predict	classifies an input video and stores the result in an output file
train	trains the LSTM model (split: 70% train, 15% validation, 15% test)
evaluate	generates a .json file with loss, accuracy, precision, and recall metrics
load_model	loads an existing model
save_architecture_image	saves an image of the LSTM architecture
save_metric	saves the training metrics over epochs in a .csv file
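
The 70/15/15 split mentioned for `train` can be sketched in plain Python as follows. This is not the code of module.py: the shuffling, the fixed seed, and the helper name `split_dataset` are assumptions made for illustration.

```python
import random


def split_dataset(samples, train=0.70, val=0.15, seed=42):
    """Shuffle samples and split them into train, validation, and test lists."""
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("train and val must leave a positive share for test")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )


if __name__ == "__main__":
    train_set, val_set, test_set = split_dataset(range(100))
    print(len(train_set), len(val_set), len(test_set))
```

Keeping the test portion separate from the validation portion matters here because the validation data already influences when training stops.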

5. Results and Discussion

Figure 01 shows that the best accuracy, loss, recall, and precision are obtained when 60 frames are collected from each video. For longer videos (more than 10 seconds), however, collecting more frames may be desirable to capture the action in more detail throughout the video.

Regarding the epochs, I chose thirty because the network results start to decline beyond this threshold. Even so, with a patience parameter of 10 epochs, none of the settings trained past epoch 23, which suggests that fewer epochs would suffice and training time could be reduced.
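
The effect of a patience parameter can be illustrated with a small sketch. It assumes that validation loss is the monitored quantity and that training halts after `patience` consecutive epochs without a new best value, which is how the Keras `EarlyStopping` callback is commonly configured; the README does not show the actual setup, and `stopping_epoch` is a hypothetical helper.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which early stopping would halt training.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss; otherwise it runs through all given epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Made-up loss curve for illustration: improves for 5 epochs, then stalls.
    losses = [0.9, 0.8, 0.7, 0.6, 0.5] + [0.55] * 25
    print(stopping_epoch(losses, patience=10))
```

In this scheme the stopping epoch is always the epoch of the best validation loss plus the patience, so a run that ends well before the epoch limit reached its best result even earlier.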

Figure 01: Accuracy, loss, recall, and precision plots

In terms of classification, Figure 02 compares the original video with the predicted one. Note the short delay before the first classification: the network needs to receive a number of frames before it can make a proper inference.
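
The delay can be pictured with a rolling buffer that only produces a label once it holds a full sequence. The sketch below is an assumption about how such a loop might look, not the code of `predict` in module.py; `classify_stream` and the `predict` callable passed to it are hypothetical stand-ins for the real model call.

```python
from collections import deque


def classify_stream(frames, sequence_length, predict):
    """Yield one label per incoming frame; None until the buffer is full."""
    window = deque(maxlen=sequence_length)  # keeps only the newest frames
    for frame in frames:
        window.append(frame)
        if len(window) == sequence_length:
            yield predict(list(window))
        else:
            yield None  # not enough frames yet for an inference


if __name__ == "__main__":
    # Stand-in "model": reports the first and last frame number it was given.
    labels = classify_stream(range(6), 3, lambda w: (w[0], w[-1]))
    print(list(labels))
```

With a sequence length of N, the first N - 1 frames carry no label, which corresponds to the short unlabeled stretch at the start of the predicted video.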

Figure 02: Comparison between the original and the predicted video

These results on UCF50 show that Long Short-Term Memory networks are a possible solution to human movement classification problems.

6. Conclusion

The study showed that LSTMs are a viable solution to human movement classification problems. Despite the small, educational dataset, the trained model presented satisfactory results. For the UCF50 dataset, the best overall setting was obtained when 60 frames are captured from each video.

7. Future Steps

This repository only scratches the surface of the LSTM's potential for identifying human movements. Possible future work includes adding continual learning, designing an accessible user terminal for tasks such as training, dataset creation, and performance evaluation, and testing different architectures.

8. Running Locally

📥 Clone the repository:

git clone https://github.com/MarcosTavar3s/LSTM_Classifier.git
cd LSTM_Classifier/code

🐍 Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate # Linux or MacOS
venv\Scripts\activate # Windows

📦 Install the dependencies:

pip install -r requirements.txt

🚀 Run the project:

python train.py # for training
python test.py # for testing

📌 To use only module.py, import it in your Python code:

from lstm import classifier_model

9. Developer Team

Marcos Aurélio (Researcher)	Helton Maia (Academic Advisor)

10. References

[1] P. Ahmad, "Realistic Action Recognition - UCF50," Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/pypiahmad/realistic-action-recognition-ucf50.

[2] Bleed AI Academy, "Human Activity Recognition using TensorFlow (CNN + LSTM) | 2 Methods," YouTube, 2021. [Online]. Available: https://www.youtube.com/watch?v=QmtSkq3DYko.

11. License

This project is licensed under the terms of the MIT License.
