With AIMV2 as the encoder, unfreezing it and setting the learning rate to 1e-6 results in the LLaVA model reaching a loss of 0 and a grad_norm of NaN

When using AIMV2 as the encoder, unfreezing it and setting the learning rate to 1e-6 leads to the LLaVA model reaching a loss of 0, with grad_norm reported as NaN, after 5000 steps. The original paper kept the encoder frozen. Why is unfreezing it for training not recommended? If I do decide to unfreeze it, what should I do?
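
To make the symptom concrete, here is a minimal sketch of how the failing step could be located in the training logs. It assumes that a loss of exactly 0 together with a non-finite grad_norm means the run has diverged numerically rather than converged, which is a guess about this run and not something confirmed. The function name `step_is_healthy` and the log entries are hypothetical; only the values at step 5000 come from the report above.

```python
import math


def step_is_healthy(loss: float, grad_norm: float) -> bool:
    """Return True only if both values are finite and the loss is not exactly zero."""
    if not (math.isfinite(loss) and math.isfinite(grad_norm)):
        return False
    # Assumption: a loss of exactly 0.0 is treated as suspicious, not as convergence.
    return loss != 0.0


# Hypothetical log entries; only the step-5000 values match the report.
logs = [
    {"step": 4990, "loss": 1.12, "grad_norm": 0.83},
    {"step": 5000, "loss": 0.0, "grad_norm": float("nan")},
]

# First step whose logged values fail the check, or None if all pass.
first_bad = next(
    (entry["step"] for entry in logs
     if not step_is_healthy(entry["loss"], entry["grad_norm"])),
    None,
)
print(first_bad)
```

Running a check like this over the full log would show whether the loss collapsed abruptly at one step or degraded gradually, which may help narrow down the cause.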