Feature request
Currently, when datasets are created, they can be versioned by passing the `version` argument to `load_dataset(...)`. For example, after creating `outcomes.csv` on the command line and creating a dataset from it, the version info is stored in the `info` and can be accessed, e.g., by `next(iter(dataset.values())).info.version`.
This dataset can be uploaded to the hub with `dataset.push_to_hub(repo_id = "maomlab/example_dataset")`. This creates a dataset on the hub and writes its metadata to the `README.md`, but it doesn't upload the version information. As a result, when I download the dataset from the hub, the version information is missing.
I can add the version information manually on the hub by appending it to the end of the config section. Then, when I download the dataset, the version information is correct.
Motivation
Why adding version information for each config makes sense:
The version information is already recorded in the dataset config's info data structure, and the library can already parse it correctly, so it makes sense to sync it with `push_to_hub`.
Keeping the version info at the config level is different from keeping it at the branch level, as the former relates to the version of the specific dataset the config refers to rather than the version of the dataset curation itself.
An explanation for the current behavior:
In datasets/src/datasets/info.py:159, the `_INCLUDED_INFO_IN_YAML` variable doesn't include `"version"`.
If my reading of the code is right, adding `"version"` to `_INCLUDED_INFO_IN_YAML` would allow the version information to be uploaded to the hub.
Your contribution
Request: add `"version"` to `_INCLUDED_INFO_IN_YAML` in datasets/src/datasets/info.py:159
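To illustrate the explanation above, here is a minimal, standard-library-only sketch of how an allow-list such as `_INCLUDED_INFO_IN_YAML` could cause `version` to be dropped on export. It is not the actual `datasets` code: the classes `ConfigInfoSketch` and `ConfigInfoWithVersion`, the method `to_yaml_dict`, and the field names in the list are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar, Optional


@dataclass
class ConfigInfoSketch:
    """Hypothetical, simplified stand-in for a dataset config's info object."""

    config_name: str = "default"
    version: Optional[str] = None
    dataset_size: Optional[int] = None

    # Allow-list of fields exported to the README.md YAML block.
    # These entries are illustrative; the real list lives in info.py.
    _INCLUDED_INFO_IN_YAML: ClassVar[list] = ["config_name", "dataset_size"]

    def to_yaml_dict(self) -> dict:
        # Export only fields that are allow-listed and have a value.
        return {
            key: value
            for key, value in asdict(self).items()
            if key in self._INCLUDED_INFO_IN_YAML and value is not None
        }


@dataclass
class ConfigInfoWithVersion(ConfigInfoSketch):
    """Same sketch, with "version" added to the allow-list (the requested change)."""

    _INCLUDED_INFO_IN_YAML: ClassVar[list] = (
        ConfigInfoSketch._INCLUDED_INFO_IN_YAML + ["version"]
    )


# The version is set on the object but is filtered out on export.
exported_before = ConfigInfoSketch(version="1.0.0", dataset_size=42).to_yaml_dict()

# With "version" allow-listed, it survives the export.
exported_after = ConfigInfoWithVersion(version="1.0.0", dataset_size=42).to_yaml_dict()

print(exported_before)
print(exported_after)
```

In this sketch the version is present in memory in both cases; only the allow-list decides whether it reaches the exported metadata, which matches the behavior described in this issue if my reading of the code is right.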