Dear @ChenDaiwei-99,
Thank you very much for sharing your remarkable work. I have thoroughly reviewed your project and find it highly innovative and insightful. The creativity and depth of your contributions are truly inspiring.
After studying the codebase and documentation, I am keenly interested in applying your work to my own project. Before proceeding further, I would like to ensure that my understanding is aligned with your design intentions. During the dataset construction phase, I encountered some confusion regarding specific details. For instance:
Dataset Structure: I have a custom preference dataset that I aim to use for training, but I am uncertain about the proper methodology for converting it into the required format. Could you please elaborate on how the training dataset (e.g., pickapicv2_cliph_original_ds.yaml) is structured? Specifically, I would appreciate more clarity on the configuration settings and the overall data organization.
Hugging Face Embeddings: Regarding the embedding data you released on Hugging Face, I would like to understand how to use it effectively. Is it necessary to have both the .pt files and their corresponding .joblib files? If I wish to incorporate my own data, what steps should I follow to adapt it into a trainable format?
I would be extremely grateful if you could kindly shed light on these points. Your guidance would significantly help me ensure that my approach is consistent with the vision of your work.
Thank you once again for your valuable contributions and for making this project accessible to the community. I look forward to your insights.