I have a DatasetDict containing 10 splits (‘fold_0’ to ‘fold_9’). Every Dataset in the DatasetDict has 2 features: “label” and “text”. Here’s a small overview:
print(my_dataset_dict)
>>> DatasetDict({
    fold_0: Dataset({
        features: ['label', 'text'],
        num_rows: 85087
    })
    fold_1: Dataset({
        features: ['label', 'text'],
        num_rows: 85076
    })
    ....
    fold_9: Dataset({
        features: ['label', 'text'],
        num_rows: 85159
    })
})
For each Dataset, the “label” column was encoded with ClassLabel, and the “text” column is just a bunch of sentences:
print(my_dataset_dict['fold_0'].features)
>>> {'label': ClassLabel(names=['MA211', 'MA221', ..., 'V39'], id=None),
     'text': Value(dtype='string', id=None)}
So far so good, it’s exactly what I’m expecting.

However, if I push it to the Hub and then load it again (in another script or in the same one, it doesn’t matter), then the labels disappear:
# Just to be sure the previous DatasetDict is removed first
huggingface_hub.delete_repo(repo_id=dataset_path, repo_type='dataset', missing_ok=True)

# No problem, I see it on the Hub after that (and the real labels appear)
my_dataset_dict.push_to_hub(dataset_path)

# Reloading it from the same path
test_dataset_dict = datasets.load_dataset(dataset_path)
print(test_dataset_dict['fold_0'].features)
>>> {'label': Value(dtype='string', id=None),
     'text': Value(dtype='string', id=None)}
As you can see, I don’t have the labels anymore. That’s a problem for me because I need to curate the data and create the dataset in one notebook, then load the data back and perform some ML tasks in another notebook, and the real labels get lost along the way.

I tried loading with test_dataset_dict = datasets.load_dataset(dataset_path, download_mode=datasets.DownloadMode.FORCE_REDOWNLOAD) but it doesn’t change anything. The text and the labels (just the integers) are loaded, but I don’t have the names of the labels. The names of the labels are pushed to the Hub, because I can see them in the viewer under the label column (I see the integer and the associated code right next to it).
Thanks for your help!