Hi @WangHelin1997,
Niels here from the open-source team at Hugging Face. I discovered your work through the daily papers: https://huggingface.co/papers/2409.08425, congrats! I work together with AK on improving the visibility of researchers' work on the hub.
I see you already have a model and demo on the 🤗 hub, which is great. I have a couple of suggestions for improving the HF integration:
Uploading datasets
I see the datasets are currently inside the model repo: https://huggingface.co/westbrook/SoloAudio/tree/main. It would be great to make them available as Dataset repos. See https://huggingface.co/docs/datasets/loading for details. The datasets could then also be made compatible with the Datasets library so that people can load the data in 2 lines of code. There's also the dataset viewer, which allows people to explore the data right from the browser: https://huggingface.co/docs/hub/en/datasets-viewer.
Uploading models
Regarding the models, we encourage researchers to push each model checkpoint to a separate model repository, so that things like download stats also work. Currently it seems that the VAE and SoloAudio checkpoints are in a single repo: https://huggingface.co/westbrook/SoloAudio/tree/main.
See here for a guide: https://huggingface.co/docs/hub/models-uploading. In case the models are custom PyTorch models, we could probably leverage the PyTorchModelHubMixin class, which adds from_pretrained and push_to_hub to each model. Alternatively, one can leverage the hf_hub_download one-liner to download a checkpoint from the hub.
Let me know if you're interested/need any help regarding this!
Cheers,
Niels
ML Engineer @ HF 🤗