ViT-Implementation

AI Club DC Mini Project

This project is a step-by-step journey into modern deep learning architectures, with a special focus on understanding and implementing Vision Transformers (ViT) using PyTorch. The tasks below are designed to build foundational knowledge and practical coding skills.


🚀 Tasks Overview

✅ Task 1: Convolutional Neural Networks (CNNs)

  • Objective: Understand and implement a basic CNN.
  • What to do:
    • Study CNN architecture fundamentals (convolutional layers, pooling, activation functions).
    • Implement a simple CNN from scratch in PyTorch (e.g., on MNIST or CIFAR-10).
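A minimal sketch of what Task 1 might look like, assuming MNIST-style 28×28 grayscale inputs (the layer sizes and channel counts here are illustrative, not taken from the repo's notebook):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A minimal CNN for 28x28 grayscale images such as MNIST."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling: 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling: 14 -> 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # flatten feature maps per sample

model = SimpleCNN()
out = model(torch.randn(4, 1, 28, 28))  # a dummy batch of 4 images
print(out.shape)  # torch.Size([4, 10])
```

Training it reduces to the usual PyTorch loop: `nn.CrossEntropyLoss` on the logits plus an optimizer such as `torch.optim.Adam`.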

✅ Task 2: Attention Mechanism & Transformer Encoder-Decoder

  1. Paper Reading:

  2. Practical Exploration:

    • Go through a blog/tutorial where an encoder-decoder transformer model is implemented from scratch.
    • Suggested blog: The Annotated Transformer (or choose your preferred one).
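The core building block covered by that tutorial is scaled dot-product attention. A minimal sketch (the function name and tensor shapes are illustrative, following the standard formulation rather than any particular blog's code):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(QK^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (batch, heads, seq_len, d_k).
    mask: optional tensor broadcastable to (batch, heads, seq_len, seq_len);
          positions where mask == 0 are excluded from attention.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution per query
    return weights @ v, weights

q = k = v = torch.randn(1, 2, 5, 8)  # 1 batch, 2 heads, 5 tokens, d_k = 8
out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape, weights.shape)  # torch.Size([1, 2, 5, 8]) torch.Size([1, 2, 5, 5])
```

Multi-head attention wraps this with learned linear projections of Q, K, and V; the decoder additionally applies a causal mask so each position attends only to earlier ones.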

✅ Task 3: Vision Transformer (ViT)

  1. Paper Reading:

  2. Implementation:

    • Code the ViT architecture from scratch using only PyTorch (no high-level transformer libraries like Hugging Face).
    • Understand and build:
      • Patch embeddings
      • Positional encodings
      • Transformer encoder blocks
      • Classification head
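The four components above can be sketched in plain PyTorch as follows. This is a minimal, illustrative version: the hyperparameters (32×32 input, patch size 4, embedding dim 64, 2 blocks) are assumptions for brevity, and `nn.MultiheadAttention` stands in for a hand-written attention layer, not the repo's actual notebook code:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Patch embeddings: a strided conv splits the image into patches and projects each."""
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size // patch_size) ** 2

    def forward(self, x):
        x = self.proj(x)                     # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

class EncoderBlock(nn.Module):
    """Transformer encoder block (pre-norm): self-attention + MLP, each with a residual."""
    def __init__(self, dim=64, heads=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class ViT(nn.Module):
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=64,
                 depth=2, heads=4, num_classes=10):
        super().__init__()
        self.patches = PatchEmbedding(img_size, patch_size, in_ch, dim)
        n = self.patches.num_patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Learned positional encodings, one per patch plus one for the [CLS] token.
        self.pos_embed = nn.Parameter(torch.zeros(1, n + 1, dim))
        self.blocks = nn.Sequential(*[EncoderBlock(dim, heads) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)  # classification head

    def forward(self, x):
        x = self.patches(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.blocks(x)
        return self.head(x[:, 0])  # classify from the [CLS] token's final state

model = ViT()
logits = model(torch.randn(2, 3, 32, 32))  # dummy batch of 2 RGB images
print(logits.shape)  # torch.Size([2, 10])
```

A fully from-scratch version would replace `nn.MultiheadAttention` with the scaled dot-product attention built in Task 2; the surrounding structure stays the same.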

πŸ“ Repository Structure (Implemented)

ViT-Implementation/
├── task1_cnn/
│   └── cnn_model.ipynb
├── task3_vit/
│   └── vit_from_scratch.ipynb
└── README.md
