This repository provides an implementation of MobileViViT (Mobile Video Vision Transformers) — an adaptation of MobileViT designed for higher-dimensional tasks such as video.
⚠️ Note: This is not a research-based project. It is an adaptation of MobileViT with careful consideration to preserve the integrity of the original work.
Three model variants are provided; a minimal construction sketch follows the list:
- MobileViViT-S
- MobileViViT-XS
- MobileViViT-XXS
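The variants are presumably constructed the same way. Only MobileViViTXXS appears in the usage example further down, so the MobileViViTS and MobileViViTXS class names (and their export from the package root) are assumptions based on the file names in the project structure:

from MobileViViT import MobileViViTS, MobileViViTXS, MobileViViTXXS  # S/XS class names are assumed

# Each variant is built with the number of output units for its classification head
model_s = MobileViViTS(num_output_units=2)      # largest of the three
model_xs = MobileViViTXS(num_output_units=2)    # mid-sized
model_xxs = MobileViViTXXS(num_output_units=2)  # smallest; used in the usage example below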
The implementation is built from modular components that can be reused independently or combined to construct new architectures. Utility modules (e.g., custom training loops, video data generators) are included under utils/ and come with documentation for easier understanding and extension.
This repository includes complete or partial implementations inspired by or directly adapted from the following works:
- Squeeze-and-Excitation Networks
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- MobileNetV2: Inverted Residuals and Linear Bottlenecks
- Bag of Tricks for Image Classification with Convolutional Neural Networks
- Searching for MobileNetV3
- Mish: A Self Regularized Non-Monotonic Neural Activation Function
- Implicit Neural Representations with Periodic Activation Functions
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- ViViT: A Video Vision Transformer
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
All references have been properly acknowledged. If any material violates terms of use, please contact me, and I will promptly address it.
The repository was developed with Python 3.11.x, TensorFlow, and Keras.
Clone the repository and install dependencies:
git clone https://github.com/AliKHaliliT/MobileViViT.git
cd MobileViViT
pip install -r requirements.txt
(Optional) Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Linux/Mac
venv\Scripts\activate # On Windows
⚠️ Depending on your system, you may need to install additional packages beyond those listed in requirements.txt.
The project is structured as follows:
MobileViViT/
├── __init__.py
├── assets/
│ ├── __init__.py
│ ├── activations/
│ │ ├── __init__.py
│ │ ├── hard_swish.py
│ │ ├── mish.py
│ │ └── sine.py
│ ├── blocks/
│ │ ├── __init__.py
│ │ ├── mlp.py
│ │ ├── mobilevivit.py
│ │ ├── mvnblock.py
│ │ ├── siren.py
│ │ └── transformer.py
│ ├── layers/
│ │ ├── __init__.py
│ │ ├── conv2plus1d.py
│ │ ├── conv_layer.py
│ │ ├── fc_layer.py
│ │ ├── fold.py
│ │ ├── positional_encoder.py
│ │ ├── sae_3d.py
│ │ ├── sine_layer.py
│ │ ├── ssae_3d.py
│ │ ├── transformer_layer.py
│ │ ├── tubelet_embedding.py
│ │ └── unfold.py
│ └── utils/
│ ├── __init__.py
│ ├── activation_function.py
│ ├── low_resource_training_scheme.py
│ ├── move_column_to_the_beginning.py
│ ├── progress_bar.py
│ ├── sine_layer_initializer.py
│ ├── squeeze_and_excitation.py
│ ├── video_data_generator.py
│ ├── video_file_to_numpy_array.py
│ └── video_frame_unifier.py
├── mobilevivit_s.py
├── mobilevivit_xs.py
└── mobilevivit_xxs.py
The following example trains a MobileViViT-XXS model on a single sample video using the bundled VideoDataGenerator:
import pandas as pd
from MobileViViT.assets.utils.video_data_generator import VideoDataGenerator
from MobileViViT import MobileViViTXXS
# Config
num_output_units = 2
batch_size = 1
epochs = 1
# Sample input video
path_to_video = "util_resources/test_video.mp4"
video_data = pd.DataFrame({
    "Address + FileName": [path_to_video],
    "0": [0],
    "1": [1]
})
# Data generator
data_generator = VideoDataGenerator(dataframe=video_data, batch_size=batch_size)
# Initialize and train model
model = MobileViViTXXS(num_output_units=num_output_units)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(data_generator, epochs=epochs)
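After training, the model can be used like any other Keras model. A minimal follow-up sketch, assuming MobileViViTXXS behaves as a standard tf.keras.Model (as its compile/fit usage above implies) and reusing the generator defined above; the output file name is illustrative:

# Run inference on the generator; one row of class scores per video
predictions = model.predict(data_generator)
print(predictions.shape)  # expected: (number of videos, num_output_units)

# Persist the trained weights (file name is an example, not part of the repository)
model.save_weights("mobilevivit_xxs.weights.h5")

This work is released under the MIT License.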