Improve HF integration #5

@NielsRogge

Hi @WangHelin1997,

Niels here from the open-source team at Hugging Face. I discovered your work through the daily papers: https://huggingface.co/papers/2409.08425, congrats! I work together with AK on improving the visibility of researchers' work on the Hub.

I see you already have a model and demo on the 🤗 Hub, which is great. I've got a couple of suggestions for improving the HF integration:

Uploading datasets

I see the datasets are currently inside the model repo: https://huggingface.co/westbrook/SoloAudio/tree/main. It would be great to make them available as dataset repos. See https://huggingface.co/docs/datasets/loading for details. The datasets could then also be made compatible with the Datasets library so that people can load the data in 2 lines of code. There's also the dataset viewer (https://huggingface.co/docs/hub/en/datasets-viewer), which allows people to explore the data right from the browser.

Uploading models

Regarding the models, we encourage researchers to push each model checkpoint to a separate model repository, so that things like download stats also work. Currently it seems that the VAE and SoloAudio checkpoints are in a single repo: https://huggingface.co/westbrook/SoloAudio/tree/main.

See here for a guide: https://huggingface.co/docs/hub/models-uploading. In case the models are custom PyTorch models, we could probably leverage the PyTorchModelHubMixin class, which adds from_pretrained and push_to_hub to each model. Alternatively, one can leverage the hf_hub_download one-liner to download a checkpoint from the Hub.

Let me know if you're interested/need any help regarding this!

Cheers,

Niels
ML Engineer @ HF 🤗
