[20230413] Weekly VLM2 - CoCoOp

Paper
[Conditional Prompt Learning for Vision-Language Models](https://arxiv.org/abs/2203.05557) (a.k.a. CoCoOp)

**Summary**
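Previous work, CoOp: brought the idea of prompt learning to VLMs.
It replaces the context words in the prompt with learnable vectors and trains them. The upside is strong performance,
but the downsides are high time and data-construction cost and low generality. --> In other words, the learned context is optimized only for the specific classes seen during training.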
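
To make the CoOp setup above concrete, here is a minimal NumPy sketch of how a static prompt could be assembled: one set of learnable context vectors shared by all classes, followed by a class-name embedding. The toy sizes, the class names, the random stand-in embeddings, and the helper `build_coop_prompts` are all assumptions for illustration and are not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_ctx, dim = 4, 8                      # assumed toy sizes
class_names = ["cat", "dog", "car"]    # hypothetical base classes

# Learnable context vectors, shared by every class and every image.
ctx = rng.normal(scale=0.02, size=(n_ctx, dim))

# Random stand-in for frozen word embeddings of the class names (one token each).
class_emb = rng.normal(size=(len(class_names), 1, dim))

def build_coop_prompts(ctx, class_emb):
    """Return prompts of shape (n_classes, n_ctx + 1, dim): [ctx_1..ctx_M, CLASS]."""
    n_cls = class_emb.shape[0]
    shared = np.broadcast_to(ctx, (n_cls,) + ctx.shape)
    return np.concatenate([shared, class_emb], axis=1)

prompts = build_coop_prompts(ctx, class_emb)
```

Because the same `ctx` is reused for every image, whatever it learns is tied to the training classes, which is the generality problem noted above.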

![image](https://user-images.githubusercontent.com/86551201/230847642-f48c1255-7ff9-40fe-9886-a14beab77036.png)

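CoCoOp uses Conditional Prompt Learning: instead of a fixed context, the prompt is conditioned on each input instance.
As shown in the figure below, the context vectors and the Meta-Net parameters are optimized, while the pretrained CLIP encoders stay frozen.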

![image](https://user-images.githubusercontent.com/86551201/230848202-1853e0a2-e044-42de-a385-ff6b68ca9fd8.png)

CoCoOp resembles the image captioning paradigm. Because the prompt depends on the image instance, it generalizes better.

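On top of the image encoder, a lightweight network called Meta-Net is used.
- Two-layer bottleneck structure (Linear-ReLU-Linear)
- For each input, it generates a conditional token that is combined with (added to) the context vectors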
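
The Meta-Net description above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the toy dimensions, the random weights, and the helper names `meta_net` and `conditional_context` are assumptions. The 16x bottleneck reduction follows my reading of the paper and is not stated in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

feat_dim, ctx_dim, n_ctx = 32, 8, 4    # assumed toy sizes
hidden = feat_dim // 16                # bottleneck width (16x reduction is an assumption)

# Meta-Net parameters: Linear -> ReLU -> Linear.
W1 = rng.normal(scale=0.1, size=(feat_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, ctx_dim))
b2 = np.zeros(ctx_dim)

# Learnable context vectors, shared across images (as in CoOp).
ctx = rng.normal(scale=0.02, size=(n_ctx, ctx_dim))

def meta_net(image_features):
    """Map image features (batch, feat_dim) to one conditional token per image."""
    h = np.maximum(image_features @ W1 + b1, 0.0)   # ReLU bottleneck
    return h @ W2 + b2                              # (batch, ctx_dim)

def conditional_context(image_features):
    """Shift every context vector by the image's conditional token."""
    pi = meta_net(image_features)                   # (batch, ctx_dim)
    return ctx[None, :, :] + pi[:, None, :]         # (batch, n_ctx, ctx_dim)

# Random stand-in for features from a frozen image encoder.
image_features = rng.normal(size=(2, feat_dim))
cond_ctx = conditional_context(image_features)
```

Each image now gets its own set of context vectors, so the prompt adapts to the instance rather than to a fixed class set.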

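To summarize, there are three advantages:
1. It generalizes to new classes (less prone to overfitting).
2. It avoids focusing on one specific class set, and it performs well even though only a lightweight network is added.
3. Because it is more general, it also performs well under class shift.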


