Dear Author,

I have a few questions about the training code and hope you can help.

First, the paper states that the entire encoder is frozen during training. If so, why is a learning rate set for it? Does training in the pi3 project only update the geometric prediction head?

Second, I noticed a global point cloud in the code, and it appears to use the first frame for cross-attention. This seems to differ from the pi3 approach. Does it need to be enabled during training?

Finally, if I want to do full fine-tuning of pi3 or pi3x, should I unfreeze the encoder and train everything in a single stage (including the confidence prediction), skipping the phased training process?

Thank you for your response.