FEATURE: Add support for non-TorchVision datasets #77
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #77 +/- ##
==========================================
- Coverage 71.81% 71.66% -0.15%
==========================================
Files 86 86
Lines 14903 14941 +38
==========================================
+ Hits 10702 10708 +6
- Misses 4201 4233 +32
... and 3 files with indirect coverage changes
As discussed, can you add a new key to the dataset entry that holds the download URL, specifically for CUB200? Then the cub folder and the code in the new files are not needed.
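A minimal sketch of what such an entry might look like, assuming the key names listed in the commit message later in this thread (url/md5/image_root + optional index_file/split_file). The entry name `CUB200_ENTRY`, the helper `md5_matches`, the URL, and the paths inside the archive are all hypothetical placeholders, not the project's actual mapping.

```python
import hashlib

# Hypothetical mapping entry. Key names follow the commit message in this
# thread; every value below is a placeholder, not the real configuration.
CUB200_ENTRY = {
    "download": {
        "url": "https://example.invalid/CUB_200_2011.tgz",  # placeholder URL
        "md5": "<expected md5 hex digest>",                  # placeholder
        "image_root": "CUB_200_2011/images",                 # assumed archive layout
        "index_file": "CUB_200_2011/images.txt",             # optional
        "split_file": "CUB_200_2011/train_test_split.txt",   # optional
    }
}


def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the given MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A loader could call `md5_matches` on the downloaded archive before extracting it and skip the download when a verified copy is already present.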
This is redundant; we store our model and data in the /data folder.
We don't need this, as we can download the data and model directly to the folder. So you only need to set the path to /data.
As discussed, the preferred solution is to amend only act/front_end/torchvision_loader/data_model_loader.py and act/front_end/torchvision_loader/data_model_mapping.py.
@guanqin-123 I'll address the comments to do with

Just a download link is not enough. The CUB-200-2011 dataset is a

This is not a

I got my code from here, which needs both the decompressed

metadata = {'id': img_id, 'img_path': img_path, 'class_label': i,
            'attribute_label': attribute_labels_all[img_id], 'attribute_certainty': attribute_certainties_all[img_id],
            'uncertain_attribute_label': attribute_uncertain_labels_all[img_id]}

The authors of that project have uploaded pre-processed pickled files, but they are on Codalab, and I would need to install a command-line tool to download them programmatically. I chose to implement

Other solutions include:

I definitely need some custom
…load config

Non-TorchVision datasets (e.g. CUB-200-2011) are declared with a 'download' key (url/md5/image_root + optional index_file/split_file). The loader downloads and extracts the archive into data/torchvision/<name>/raw, wraps image_root in ImageFolder, and applies the dataset's official deterministic train/test split via Subset.

The custom/cub package is removed: ImageFolder covers the CUB layout, the official split comes from images.txt + train_test_split.txt, and the ConceptBottleneck-derived attribute/pickle code was unused by ACT's pipeline.

Verified on real CUB-200-2011: 5994 train / 5794 test.
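The split step described in the commit message can be sketched roughly as follows. This is an illustration, not the loader's actual code: the function names are hypothetical, and it assumes the CUB-200-2011 file formats, where images.txt holds `<image_id> <relative path>` per line and train_test_split.txt holds `<image_id> <flag>` with flag 1 marking a training image.

```python
def read_id_table(lines):
    """Parse '<image_id> <value>' lines into a dict keyed by integer id."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        image_id, value = line.split(maxsplit=1)
        table[int(image_id)] = value
    return table


def official_split_indices(sample_paths, index_lines, split_lines):
    """Return (train_indices, test_indices) as positions in sample_paths.

    sample_paths: image paths relative to image_root, in dataset order
    index_lines:  lines of images.txt            ('<id> <relative path>')
    split_lines:  lines of train_test_split.txt  ('<id> <1 if train else 0>')
    """
    id_to_path = read_id_table(index_lines)
    id_to_flag = read_id_table(split_lines)
    # Map each relative path to its position in the dataset.
    position = {path: i for i, path in enumerate(sample_paths)}
    train, test = [], []
    for image_id in sorted(id_to_path):
        idx = position[id_to_path[image_id]]
        if id_to_flag[image_id] == "1":
            train.append(idx)
        else:
            test.append(idx)
    return train, test
```

With torchvision, `sample_paths` could be derived from the `samples` attribute of an `ImageFolder` (paths made relative to image_root), and each returned index list passed to `torch.utils.data.Subset`. Because the split depends only on the two text files, it is the same on every run.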
Data acquisition.

This PR adds support for the CUB-200-2011 dataset, and creates a system that can support other non-TorchVision datasets which inherit from torch.utils.data.Dataset.

Instructions to add a custom dataset type are in act/front_end/torchvision_loader/custom.

There is some dead code in act/front_end/torchvision_loader/custom/cub that I can remove to increase coverage.