ViT-Homework-AML

*For this homework you will need to hand in a report (.pdf) with the required information.

The paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale shows that Transformers applied directly to image patches and pre-trained on large datasets work really well on image recognition tasks. The results from this paper using various arquitectures and multiple datasets are in the table below.

model	dataset	dropout=0.1
R50+ViT-B_16	cifar10	98.94%, 10.1h (V100), tb.dev
R50+ViT-B_16	cifar100	92.30%, 10.1h (V100), tb.dev
R50+ViT-B_16	imagenet2012	85.08%, 24.2h (V100), tb.dev
ViT-B_16	cifar10	98.76%, 7.8h (V100), tb.dev
ViT-B_16	cifar100	91.92%, 7.8h (V100), tb.dev
ViT-B_16	imagenet2012	84.12%, 19.3h (V100), tb.dev
ViT-B_32	cifar10	98.75%, 1.8h (V100), tb.dev
ViT-B_32	cifar100	92.05%, 1.8h (V100), tb.dev
ViT-B_32	imagenet2012	81.31%, 4.9h (V100), tb.dev
ViT-L_16	cifar10	99.14%, 24.7h (V100), tb.dev
ViT-L_16	cifar100	93.22%, 24.4h (V100), tb.dev
ViT-L_16	imagenet2012	85.05%, 59.7h (V100), tb.dev
ViT-L_32	cifar10	99.09%, 6.1h (V100), tb.dev
ViT-L_32	cifar100	93.34%, 6.2h (V100), tb.dev
ViT-L_32	imagenet2012	81.13%, 15.0h (V100), tb.dev

First Task (1 point):

This first task takes about 6 hours to run so plan ahead

Recreate the results from the ViT-B_16 architecture with the cifar10 dataset. Attach evidence from your results in your report.

#Previously install the requirements
$ pip install -r requirements.txt

# Download the imagenet21k pre-train model
$ wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz

#Run train.py
$ python train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir ViT-B_16.npz

Use Automatic Mixed Precision(Amp) to reduce memory usage and train faster by setting --fp16_opt_level to O2 (--fp16 --fp16_opt_level O2).

*Note: Set train batch size and number of steps to 1000, otherwise it will take about 50 hours to run. *Note: When GPU memory is insufficient, you can proceed with training by adjusting the value of --gradient_accumulation_steps to 10.

Second task (1 point):

Open the modeling.py file in the models folder. Explain in your own words whats happening between lines 83 to 91 (Find the #Start and #Finish comments) and why. Explain the relevance of the operations.

Third Task (3 points):

As you already know, self-attention is essential in ViT. The attention map for the input image can be visualized through the attention score of self-attention. Open the ViT_Visualize.py file.

Part A (1.5 point):

Read the code and answer the questions/instructions included in the code.

Part B (1 point):

Choose 6 images of 6 different categories (ej. golden retriever, piano, person, etc) from the internet.

Specifications:

Google Chrome recommended
The image must be .jpg
Copy the image address and include it in the corresponding part of the code

Visualize the attention maps of the 6 images. Attach a subplot which includes the original images and the attention maps obtained. Include the predictions in your report.

An example of an attention map is shown below:

Part C (0.5 point):

Choose 3 images from the previous exercise and visualize the attention maps from the first, middle and last layer. Attach a subplot with the original images and the corresponding 3 attention maps. Briefly analyze the different layers.

BONO (0.25 points):

Instead of using internet images, use 6 images taken by you.

Hand In

Attach the required files to your homework repository.

Report (.pdf)
ViT_Visualize.py completed

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
models		models
utils		utils
README.md		README.md
ViT_Visualize.py		ViT_Visualize.py
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ViT-Homework-AML

First Task (1 point):

Second task (1 point):

Third Task (3 points):

Part A (1.5 point):

Part B (1 point):

Part C (0.5 point):

BONO (0.25 points):

Hand In

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ViT-Homework-AML

First Task (1 point):

Second task (1 point):

Third Task (3 points):

Part A (1.5 point):

Part B (1 point):

Part C (0.5 point):

BONO (0.25 points):

Hand In

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages