Your project attempts to fine-tune a model on cosmology knowledge. I checked your history, and it seems you tried to overfit the model on the QA dataset you generated plus other SFT datasets. Have you considered doing a second round of pre-training on Mistral or another base model, and then fine-tuning with a chat-style SFT dataset?
I'm asking because some people say fine-tuning only teaches the model how to respond to questions and instructions, while knowledge acquisition happens in the pre-training stage. Still, some projects do seem to teach the model new knowledge via fine-tuning.
So my thought is: instead of overfitting the model over a lot of epochs, why not do a second pre-training pass and then fine-tune?