Sachinxmpl/Vision_transformer_from_scratch

Vision Transformer (ViT) from Scratch

This repository contains a PyTorch implementation of the Vision Transformer (ViT), built from scratch without relying on high-level Transformer libraries.
It follows the approach introduced in the paper:

📄 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy et al., ICLR 2021


🚀 Features

  • Custom implementation of core ViT components:
    • Patch Embedding
    • Multi-Head Self Attention (MHSA)
    • Position Embeddings
    • Transformer Encoder Layers
    • Classification Head
  • Training pipeline for the CIFAR-10 dataset
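
To show how the components listed above fit together, here is a minimal PyTorch sketch. It is not the code in models/ViT.py: the class names (PatchEmbedding, MultiHeadSelfAttention, EncoderBlock, TinyViT) and every hyperparameter (patch size 4, embedding dimension 64, 4 heads, depth 2) are assumptions chosen for illustration. The only thing taken from the dataset is the input shape, since CIFAR-10 images are 32x32 RGB with 10 classes.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to `dim`."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, dim=64):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A convolution with stride == kernel size is equivalent to flattening
        # each patch and applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, N, dim)


class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product self-attention over the token sequence."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)     # joint projection for Q, K, V
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)   # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)            # attention weights over keys
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)  # merge heads
        return self.out(x)


class EncoderBlock(nn.Module):
    """Pre-norm Transformer encoder layer: attention and MLP, each with a residual."""

    def __init__(self, dim=64, num_heads=4, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = MultiHeadSelfAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class TinyViT(nn.Module):
    """Patch embedding + class token + position embeddings + encoder + head."""

    def __init__(self, num_classes=10, dim=64, depth=2, num_heads=4):
        super().__init__()
        self.patch_embed = PatchEmbedding(dim=dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Learned position embeddings, one per patch plus one for the class token.
        self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.num_patches + 1, dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        self.blocks = nn.Sequential(*[EncoderBlock(dim, num_heads) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                      # x: (B, 3, 32, 32)
        x = self.patch_embed(x)                # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed   # (B, N + 1, dim)
        x = self.norm(self.blocks(x))
        return self.head(x[:, 0])              # classify from the class token


if __name__ == "__main__":
    logits = TinyViT()(torch.randn(8, 3, 32, 32))
    print(logits.shape)  # torch.Size([8, 10])
```

With a patch size of 4, a 32x32 image yields (32 / 4)^2 = 64 patches, so the encoder sees 65 tokens once the class token is prepended. The paper's 16x16 patches would leave only 4 tokens at this resolution, which is why a smaller patch size is assumed in this sketch.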

📂 Project Structure

vision_transformers_from_scratch/
│── dataset.py 
│── train.py 
│── models/
│ └── ViT.py 
│── best_vit_cifar10.pth  
