bskkimm/Simple-ViT-Implementation

📚 Simple-ViT-Implementation

This repo offers a simple implementation of ViT (Vision Transformer) from scratch using PyTorch.

Find a more detailed implementation here.

🚀 Getting started

Please follow the instructions below.

git clone https://github.com/bskkimm/Simple-ViT-Implementation.git
cd Simple-ViT-Implementation
conda create -n ViT python=3.10 -y
conda activate ViT
pip install -r requirements.txt

Then implement ViT step by step by following tutorial_from_scratch.ipynb.
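As a rough idea of what the first step of a from-scratch ViT can look like, here is a minimal sketch of patch embedding with a class token and learned position embeddings. It is not taken from the notebook: the class name `PatchEmbedding`, the patch size of 4, and the embedding width of 768 are assumptions chosen for 32×32 CIFAR-10 inputs.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into patches and project each patch to an embedding.

    Hypothetical helper for illustration; the notebook may structure this differently.
    """

    def __init__(self, image_size=32, patch_size=4, in_channels=3, embed_dim=768):
        super().__init__()
        assert image_size % patch_size == 0, "image size must be divisible by patch size"
        self.num_patches = (image_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch
        # and applying one shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embeddings (zero-initialized here).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):
        batch = x.shape[0]
        x = self.proj(x)                            # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)            # (B, N, D)
        cls = self.cls_token.expand(batch, -1, -1)  # (B, 1, D)
        x = torch.cat([cls, x], dim=1)              # (B, N + 1, D)
        return x + self.pos_embed


embed = PatchEmbedding()
tokens = embed(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # torch.Size([2, 65, 768])
```

With these assumed settings, a 32×32 image yields an 8×8 grid of 64 patches, plus one class token, so the encoder sees a sequence of 65 tokens per image.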

📊 Results

| Model | Dataset | Train Accuracy | Test Accuracy | GPU Used | Training Time |
| --- | --- | --- | --- | --- | --- |
| ViT-B/12 | CIFAR-10 | 98.88% | 77.40% | RTX 4070 Laptop | 2.0 hours |

🔍 Attention Map Visualization

Because CIFAR-10 images are small (32×32 pixels), I implemented attention map visualization on the Food-101 dataset instead, whose higher-resolution samples are better suited to visual interpretation.

ViT Attention Map on Food101
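One common way to produce such a map is to take the class token's attention over the patch tokens, reshape it to the patch grid, and upsample it to the image size. The sketch below illustrates that idea only and may differ from what this repo does: the function name `cls_attention_map`, the 224×224 input size, the patch size of 16, and the token layout (class token at index 0) are all assumptions, and random weights stand in for a trained model's attention.

```python
import torch
import torch.nn.functional as F


def cls_attention_map(attn, image_size, patch_size):
    """Turn one layer's attention weights into a per-pixel heatmap.

    attn: (heads, N + 1, N + 1) attention weights for a single image,
          where index 0 is the class token (an assumed layout).
    """
    grid = image_size // patch_size
    # Average over heads, then take the class token's attention to each patch.
    cls_to_patches = attn.mean(dim=0)[0, 1:]            # (N,)
    heatmap = cls_to_patches.reshape(1, 1, grid, grid)  # (1, 1, grid, grid)
    # Upsample the coarse patch grid to the input resolution.
    heatmap = F.interpolate(heatmap, size=(image_size, image_size),
                            mode="bilinear", align_corners=False)
    heatmap = heatmap[0, 0]
    # Rescale to [0, 1] so the map can be overlaid on the image.
    heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    return heatmap


# Random attention weights stand in for a trained model's output here.
image_size, patch_size, heads = 224, 16, 12
n_tokens = (image_size // patch_size) ** 2 + 1
attn = torch.softmax(torch.randn(heads, n_tokens, n_tokens), dim=-1)
heatmap = cls_attention_map(attn, image_size, patch_size)
print(heatmap.shape)  # torch.Size([224, 224])
```

Under these assumptions the patch grid is 14×14, which gives a noticeably finer map than the 8×8 grid a 32×32 image would produce with 4×4 patches.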
