ViT web12m training recipe and loss log/final loss value

Hi @mk-minchul, could you describe the training recipe for `ViT-Kprpe-web12m`? I am training with the recipe given in the paper, i.e. **lr = 1e-4** and a batch size of **1024** (using DDP over two GPUs), plus the `gridsample` config file with `adamw`, with momentum and decay as given in the paper's supplementary material and `pfc=0.3` (since it is faster), but my results are not as good. Could you confirm whether this is the same recipe used for your pretrained model, or whether my `partial_fc` value is what is causing the issue?
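To make the numbers above concrete, here is a minimal sketch of the arithmetic behind this setup: how a global batch of 1024 splits across two DDP ranks, and roughly how many class centers a partial-FC sample rate of 0.3 would keep per step. The function names and the `num_classes` argument are hypothetical and are not taken from the repository's config; the sampled-center count assumes the sample rate is applied directly to the number of classes, which may differ from what the actual implementation does.

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across DDP ranks (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size


def sampled_class_centers(num_classes: int, sample_rate: float) -> int:
    """Approximate number of class centers kept per step under partial FC.

    Assumption: the sample rate is applied directly to the class count.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    return int(num_classes * sample_rate)


if __name__ == "__main__":
    # Values from the post: global batch 1024 over two GPUs.
    print(per_gpu_batch_size(1024, 2))
    # num_classes is a placeholder here; substitute the dataset's identity count.
    print(sampled_class_centers(num_classes=1000, sample_rate=0.5))
```

If the pretrained model was trained with a different sample rate or a different effective batch size, these two quantities are the ones that would change, so they are worth comparing first.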
