Hi authors,
Thank you for sharing this interesting work and releasing the code. I have been reading the paper carefully and have a question regarding the ablation studies.
Could you clarify which image encoder is used in Tables 5–7 when DCFormer is removed? Is it replaced by ViT3D, as in the encoder comparison in Table 8, or is a different encoder/projector configuration used?
It would be very helpful if you could provide more details about the exact ablation configurations, especially:
What encoder replaces DCFormer in Tables 5–7?
What projector is used when MLP-Mixer is removed together with DCFormer?