[Survey, Paper Review] ViT-Adapter, flash attention, ......

# Vision Transformer Adapter for Dense Predictions

Info.
- ICLR 2023 spotlight
- https://github.com/czczup/ViT-Adapter
- https://arxiv.org/abs/2205.08534



### Summary
- plain ViT
  - tends to perform poorly on dense prediction because it lacks vision-specific inductive biases (weak prior assumptions)
  - for a general-purpose model, the **transformer structure** is essential for masked data modeling and multi-modal pre-training
  - but vision-specific models still outperform plain transformers on these tasks -> an adapter can be a solution
- adapter: a pre-training-free module added to the plain ViT backbone when transferring to downstream tasks
  - the backbone is the part pre-trained on large-scale multi-modal data; the adapter injects image-related inductive biases during fine-tuning, so the backbone architecture needs no task-specific redesign
- transformers for various vision tasks
  - such as instance, semantic, and panoptic segmentation, visual grounding, detection...
- achieves SOTA without using extra detection data
  - 😮 is the performance simply due to a strong backbone already pre-trained on a multi-modal dataset?
  - => The authors address this point: they compare models under the same pre-training strategy for a fair comparison.


### Questions before reading the paper
- is the "adapter" concept the same as NLP's? https://intelligentcm.tistory.com/340
  - Yes! The authors cite the NLP adapter paper in the introduction.
  - e.g., object detection on COCO val2017:
![image](https://github.com/sghong977/Daily_AIML/assets/46152199/8cd9c5bc-4844-4cc7-a334-5c950ad4af2a)
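
To make the NLP comparison concrete, here is a minimal sketch of the bottleneck-style adapter commonly used in NLP: a down-projection, a nonlinearity, an up-projection, and a residual connection. This is only an illustration of the general idea, not the ViT-Adapter architecture (which, as far as I understand, uses its own spatial prior and cross-attention modules). The class name `BottleneckAdapter`, the dimensions, and the zero initialization of the up-projection are assumptions made for this example.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical NLP-style adapter: x + W_up @ gelu(W_down @ x)."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: dim -> bottleneck, small random weights.
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection: bottleneck -> dim, zero-initialized (an assumption)
        # so the adapter starts out as an identity mapping.
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [gelu(v) for v in matvec(self.W_down, x)]
        delta = matvec(self.W_up, hidden)
        # Residual connection: the frozen backbone feature passes through.
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        return (len(self.W_down) * len(self.W_down[0])
                + len(self.W_up) * len(self.W_up[0]))


adapter = BottleneckAdapter(dim=8, bottleneck=2)
feature = [0.5, -1.0, 2.0, 0.0, 1.5, -0.25, 3.0, -2.0]
print(adapter(feature) == feature)  # identity at initialization
print(adapter.num_params())         # far fewer than a full dim x dim layer
```

The point of the design is that only the small adapter weights need training for a new task, while the pre-trained backbone features flow through the residual path unchanged at the start.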


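- The GitHub repo mentions applying flash attention, and I keep seeing this keyword lately. What is it?
  - [explanation link](https://taewan2002.medium.com/%EC%84%B1%EB%8A%A5-%EC%B5%9C%EC%A0%81%ED%99%94%EB%A5%BC-%EC%9C%84%ED%95%9C-flash-attention-2-41a345808005) After reading this: it is simply a technique for computing attention more efficiently. I also asked ChatGPT and Bard, and (unsurprisingly) the computed result is the same as standard attention.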
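
The claim that the result matches standard attention can be checked with a small sketch. Below, `naive_attention` computes softmax attention for one query in the usual way, while `tiled_attention` visits keys and values block by block and keeps a running maximum and running sum (an "online softmax"), which is the core idea that lets flash attention avoid materializing the full score matrix. This is a pure-Python, single-query illustration under my own assumptions: the function names and `block_size` are hypothetical, and the real implementation is a fused GPU kernel that this sketch does not represent.

```python
import math


def naive_attention(q, K, V):
    """Standard softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def tiled_attention(q, K, V, block_size=2):
    """Same output, but K/V are processed in blocks with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(V[0])  # unnormalized weighted sum of values
    for start in range(0, len(K), block_size):
        K_blk = K[start:start + block_size]
        V_blk = V[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new reference maximum.
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[j] for w, v in zip(weights, V_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.1, -0.3, 0.7]
K = [[0.2, 0.1, -0.5], [1.0, -1.2, 0.3], [0.0, 0.4, 0.9],
     [-0.7, 0.6, 0.1], [0.5, 0.5, -0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]

ref = naive_attention(q, K, V)
out = tiled_attention(q, K, V, block_size=2)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(ref, out)))
```

The two functions agree up to floating-point rounding, since the blockwise version only reorders the same arithmetic; the efficiency gain in practice comes from reduced memory traffic rather than from approximating the result.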
