
Seeking guidance on dataset construction and training with released PickaPic embeddings #9

@3038543815


Dear @ChenDaiwei-99,
Thank you very much for sharing your remarkable work. I have thoroughly reviewed your project and find it highly innovative and insightful. The creativity and depth of your contributions are truly inspiring.
After studying the codebase and documentation, I am keen to apply your work to my own project. Before going further, I would like to make sure my understanding matches your design intentions. While constructing the dataset, I was unsure about a few specific details:
Dataset structure: I have a custom preference dataset that I would like to use for training, but I am unsure how to convert it into the required format. Could you explain how the training dataset configuration (e.g., pickapicv2_cliph_original_ds.yaml) is structured? In particular, I would appreciate more detail on the configuration settings and on how the underlying data is organized.
Hugging Face embeddings: I would like to understand how to use the embedding data you released on Hugging Face. Are both the .pt files and their corresponding .joblib files required? If I want to add my own data, what steps should I follow to convert it into a trainable format?
I would be extremely grateful for any clarification on these points. Your guidance would help me make sure my approach is consistent with the design of your work.
Thank you once again for your valuable contributions and for making this project accessible to the community. I look forward to your insights.
