Can we concatenate the image embedding and text embedding and input them into the network?  #18

@Valentina-tinder

Hi, dear author. I am retraining your network to adapt it to other object-level tasks, and I found that the predicted metal values are too small, close to 0. Your training pipeline feeds image embeddings into the model, whereas a diffusion model is usually conditioned on text embeddings (which would let me supply a material prompt). Can you tell me why? If text embeddings were used as input instead, would the trained model's predictions get better or worse? Or can we concatenate the image embedding and the text embedding and feed both into the network? Thank you very much if you can answer.
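
To make the last question concrete, here is a minimal sketch of what concatenation could look like. It assumes both encoders produce token sequences and that the two sequences are joined along the token axis before being used as conditioning. The function name, the shapes, and the random projection matrix are all hypothetical and are not taken from this repository; in a real model the projection would be a learned layer.

```python
import numpy as np

def concat_condition(image_emb, text_emb, proj=None):
    """Concatenate image and text tokens along the sequence axis.

    image_emb: (batch, n_img, d_img); text_emb: (batch, n_txt, d_txt).
    proj: optional (d_img, d_txt) matrix mapping image tokens to the text
          width (hypothetical stand-in for a learned linear layer).
    """
    if proj is not None:
        image_emb = image_emb @ proj  # -> (batch, n_img, d_txt)
    if image_emb.shape[-1] != text_emb.shape[-1]:
        raise ValueError("embedding widths differ; pass a projection")
    # Result has n_img + n_txt tokens, all with the text embedding width.
    return np.concatenate([image_emb, text_emb], axis=1)

rng = np.random.default_rng(0)
image_emb = rng.standard_normal((2, 1, 1024))  # assumed: one pooled image token
text_emb = rng.standard_normal((2, 77, 768))   # assumed: 77 text tokens
proj = rng.standard_normal((1024, 768)) / np.sqrt(1024)

context = concat_condition(image_emb, text_emb, proj)
print(context.shape)  # (2, 78, 768)
```

The combined `context` would then take the place of the single embedding that is currently passed as conditioning. Whether this helps the metal prediction is exactly what I am asking about.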
