Include version in DatasetInfo YAML so push_to_hub preserves it #8218
Open
adityasingh2400 wants to merge 1 commit into
DatasetInfo._INCLUDED_INFO_IN_YAML omits "version", so the config-level version set via load_dataset(..., version=...) is dropped when push_to_hub writes the README.md front matter. Reloading the dataset from the Hub therefore yields version=None even though the local DatasetInfo had one set, and users have to add the version block to the README by hand. Add "version" to the included keys so it round-trips through the YAML; DatasetInfo.__post_init__ already accepts a version string, so the reconstruction logic needs no changes. Fixes huggingface#7378.
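The effect of the allowlist can be sketched with a minimal, self-contained model. SketchInfo, to_yaml_dict, KEYS_BEFORE, and KEYS_AFTER are hypothetical stand-ins, not the datasets implementation. The key lists are a small subset chosen for illustration, and skipping None values follows the PR's description of how _to_yaml_dict treats unset fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SketchInfo:
    # Hypothetical stand-in for DatasetInfo with only a few fields.
    config_name: Optional[str] = None
    dataset_size: Optional[int] = None
    version: Optional[str] = None


# Hypothetical allowlists: before and after adding "version".
KEYS_BEFORE = ["config_name", "dataset_size"]
KEYS_AFTER = KEYS_BEFORE + ["version"]


def to_yaml_dict(info: SketchInfo, included: list) -> dict:
    """Emit only allowlisted fields whose values are set (not None)."""
    out = {}
    for f in fields(info):
        value = getattr(info, f.name)
        if f.name in included and value is not None:
            out[f.name] = value
    return out


info = SketchInfo(config_name="default", dataset_size=1024, version="1.2.0")
print(to_yaml_dict(info, KEYS_BEFORE))  # {'config_name': 'default', 'dataset_size': 1024}
print(to_yaml_dict(info, KEYS_AFTER))   # {'config_name': 'default', 'dataset_size': 1024, 'version': '1.2.0'}
```

With the old list the version is silently filtered out even though it is set on the instance; adding the key is the whole fix on the write side.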
DatasetInfo._INCLUDED_INFO_IN_YAML controls which fields land in the README.md dataset_info block written by push_to_hub, and it currently omits version. Setting version on a Dataset (e.g. via load_dataset(..., version=...)) and pushing it to the Hub silently drops the version from the rendered front matter, so reloading the dataset returns version=None unless the user edits the README by hand. Closes #7378.
The reload side already handles a version string in DatasetInfo.__post_init__ via the Version dataclass, so adding "version" to the included keys is enough to round-trip the value through YAML without any extra coercion. When no version is set, the front matter is unchanged, because _to_yaml_dict only emits keys whose values are set on the instance.
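The reload half of the round trip can be sketched the same way. SketchVersion and SketchInfo below are hypothetical stand-ins for datasets' Version and DatasetInfo, assuming (as the PR describes) that __post_init__ turns a plain version string into a version object; the real classes carry more fields and validation.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, Union


@dataclass
class SketchVersion:
    # Hypothetical stand-in for Version: parses "MAJOR.MINOR.PATCH".
    version_str: str
    major: int = field(init=False)
    minor: int = field(init=False)
    patch: int = field(init=False)

    def __post_init__(self):
        self.major, self.minor, self.patch = (int(p) for p in self.version_str.split("."))


@dataclass
class SketchInfo:
    # Hypothetical stand-in for DatasetInfo.
    config_name: Optional[str] = None
    version: Optional[Union[str, SketchVersion]] = None

    def __post_init__(self):
        # Reload side: coerce a plain YAML string into a version object.
        if isinstance(self.version, str):
            self.version = SketchVersion(self.version)

    def to_yaml_dict(self) -> dict:
        # Write side: unset fields are skipped, so the empty case emits no key.
        out = {}
        if self.config_name is not None:
            out["config_name"] = self.config_name
        if self.version is not None:
            out["version"] = self.version.version_str  # back to a plain string
        return out

    @classmethod
    def from_yaml_dict(cls, data: dict) -> "SketchInfo":
        # Keep only keys that are dataclass fields; ignore anything else.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})


original = SketchInfo(config_name="default", version="1.2.0")
reloaded = SketchInfo.from_yaml_dict(original.to_yaml_dict())
print(reloaded.version.version_str)                       # 1.2.0
print(SketchInfo(config_name="default").to_yaml_dict())  # {'config_name': 'default'}
```

Serializing the version back to its original string keeps the YAML value a plain scalar, which is what lets the string coercion in __post_init__ rebuild an equal object on reload.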
Added test_dataset_info_to_yaml_dict_preserves_version to tests/test_info.py; it asserts that the version string survives a round trip through _to_yaml_dict and _from_yaml_dict. tests/test_info.py and tests/test_hub.py pass locally.