It seems like every available method of downloading the dataset comes up about 6k samples short. Hugging Face's load_dataset() option, downloading the tar.xz files manually, git lfs, etc. all run into the same issue as of this date.
The downloaded dataset ends up containing ~94k samples (94,164 per my most recent attempt), which makes it quite challenging to reproduce the work or leverage the excellent dataset/dataloader work that has already been done.
If I'm eyeballing it, the data_10.tar.xz file looks like the most likely culprit: the other .tar.xz files are all around ~7.8 GB in size, while data_10.tar.xz is only 3.25 GB.
It's certainly possible I'm missing something, but I haven't been able to figure out an effective way around this issue. Any assistance in the matter would be appreciated!