Hi @mk-minchul, could you describe the training recipe for ViT-Kprpe-web12m? I am training with the recipe given in the paper, i.e. lr = 1e-4 and a batch size of 1024 (using DDP over two GPUs), plus the gridsample config file with adamw, with momentum and decay as given in the supplementary of the paper, and pfc=0.3 (since it is faster), but my results are not as good. Could you confirm whether this is the same recipe you used for your pretrained model, or whether my partial_fc value is what is causing the issue?