
Include version in DatasetInfo YAML so push_to_hub preserves it #8218

Open
adityasingh2400 wants to merge 1 commit into huggingface:main from adityasingh2400:fix-include-version-in-dataset-info-yaml

Conversation

@adityasingh2400

DatasetInfo._INCLUDED_INFO_IN_YAML controls which fields land in the README.md dataset_info block written by push_to_hub, and it currently omits version. As a result, when a version is set on a Dataset (e.g. via load_dataset(..., version=...)) and the dataset is pushed to the Hub, the version is silently dropped from the rendered front matter, and reloading the dataset returns version=None unless the user edits the README by hand. Closes #7378.

The reload side already handles a version string in DatasetInfo.__post_init__ via the Version dataclass, so adding "version" to the included keys is enough to round-trip the value through YAML without any extra coercion. The empty case is preserved because _to_yaml_dict only emits keys whose values are set on the instance.
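To illustrate the round trip described above, here is a minimal sketch built on stand-in classes. MiniInfo and MiniVersion are hypothetical, heavily simplified models written for this example; they are not the library's DatasetInfo or Version, and the real _to_yaml_dict and _from_yaml_dict do more than this. The sketch only assumes the mechanism stated in the description: an allow-list decides which fields are emitted, unset values are skipped, and a plain string is coerced back into a version object on construction.

```python
from dataclasses import dataclass, fields
from typing import ClassVar, Optional, Union


@dataclass
class MiniVersion:
    """Hypothetical stand-in for the library's Version dataclass."""

    version_str: str

    def __str__(self) -> str:
        return self.version_str


@dataclass
class MiniInfo:
    """Hypothetical, simplified stand-in for DatasetInfo."""

    description: str = ""
    version: Optional[Union[str, MiniVersion]] = None

    # Allow-list of fields written to the YAML block.
    # Without "version" here, the field would never be emitted.
    _INCLUDED_INFO_IN_YAML: ClassVar[list] = ["description", "version"]

    def __post_init__(self) -> None:
        # Coerce a plain string into the version type on (re)construction.
        if isinstance(self.version, str):
            self.version = MiniVersion(self.version)

    def to_yaml_dict(self) -> dict:
        out = {}
        for key in self._INCLUDED_INFO_IN_YAML:
            value = getattr(self, key)
            if value:  # skip unset or empty values
                out[key] = str(value) if isinstance(value, MiniVersion) else value
        return out

    @classmethod
    def from_yaml_dict(cls, data: dict) -> "MiniInfo":
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


info = MiniInfo(description="demo", version="1.2.0")
yaml_dict = info.to_yaml_dict()
restored = MiniInfo.from_yaml_dict(yaml_dict)
```

In this sketch, yaml_dict carries the version as a plain string, restored.version is a MiniVersion again because __post_init__ performs the coercion, and MiniInfo().to_yaml_dict() emits nothing because no value is set.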

Added test_dataset_info_to_yaml_dict_preserves_version in tests/test_info.py, which asserts that the version string survives a round trip through _to_yaml_dict and _from_yaml_dict. Both tests/test_info.py and tests/test_hub.py pass locally.

DatasetInfo._INCLUDED_INFO_IN_YAML omits "version", so the config-level
version set via load_dataset(..., version=...) is dropped when
push_to_hub writes the README.md front matter. The dataset on the Hub
therefore comes back with version=None even though the local
DatasetInfo had one set, forcing users to add the version block to the
README by hand. Add "version" to the included keys so it round-trips
through the YAML, matching the existing reconstruction logic in
DatasetInfo.__post_init__ that already accepts a version string.

Fixes huggingface#7378.

Development

Successfully merging this pull request may close these issues.

Allow pushing config version to hub

1 participant