FEATURE: Add support for non-TorchVision datasets by hanyuone · Pull Request #77 · SVF-tools/ACT

hanyuone · 2026-06-09T06:33:51Z

This PR adds support for the CUB-200-2011 dataset, and creates a system that can support other non-TorchVision datasets which inherit torch.utils.data.Dataset.

Instructions to add a custom dataset type are in act/front_end/torchvision_loader/custom.

There is some dead code in act/front_end/torchvision_loader/custom/cub that I can remove to increase coverage.

codecov · 2026-06-09T06:40:23Z

Codecov Report

❌ Patch coverage is 33.89831% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.66%. Comparing base (e831a0f) to head (4f53a1c).

Files with missing lines	Patch %	Lines
.../front_end/torchvision_loader/data_model_loader.py	33.89%	39 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   71.81%   71.66%   -0.15%     
==========================================
  Files          86       86              
  Lines       14903    14941      +38     
==========================================
+ Hits        10702    10708       +6     
- Misses       4201     4233      +32

Flag	Coverage Δ
bab	`46.55% <3.38%> (-0.11%)`	⬇️
backend-float32	`48.96% <3.38%> (-0.12%)`	⬇️
backend-float64	`49.03% <3.38%> (-0.09%)`	⬇️
frontend	`38.43% <33.89%> (-0.07%)`	⬇️
pipeline-fuzz	`19.12% <3.38%> (-0.04%)`	⬇️
pipeline-verify	`38.14% <30.50%> (-0.07%)`	⬇️

Files with missing lines	Coverage Δ
...front_end/torchvision_loader/data_model_mapping.py	`80.00% <ø> (ø)`
.../front_end/torchvision_loader/data_model_loader.py	`56.26% <33.89%> (-2.96%)`	⬇️

... and 3 files with indirect coverage changes
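
The percentages in the report can be reproduced from the hit and line counts in the coverage diff. The sketch below assumes the two-decimal figures are truncated rather than rounded, which is consistent with the numbers shown but is not confirmed by the report itself. The patch total of 59 lines is inferred from "39 missing" at 33.89831% and does not appear in the report.

```python
def truncated_percent(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises near the cut-off.
    return (hits * 10_000 // lines) / 100


base = truncated_percent(10702, 14903)  # main at e831a0f
head = truncated_percent(10708, 14941)  # PR head at 4f53a1c

# Inferred: 39 missing lines at 33.89831% implies 59 patch lines, 20 covered.
patch_lines, patch_missing = 59, 39
patch = truncated_percent(patch_lines - patch_missing, patch_lines)

print(base, head, round(head - base, 2), patch)
# prints: 71.81 71.66 -0.15 33.89
```

Rounding instead of truncating would give 71.67% for the head and 33.90% for the patch, so the reported -0.15 is the difference of the two truncated values.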

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

guanqin-123 · 2026-06-09T08:19:20Z

As discussed, can you add a new key that holds the dataset's download URL, specifically for CUB200? With that, the cub folder and the code in the new files are no longer needed.

guanqin-123 · 2026-06-09T08:21:08Z

This is redundant: we store our model and data in the /data folder.

guanqin-123 · 2026-06-09T08:23:50Z

We don't need this, as we can download the data and model directly into that folder. So you only need to set the path to /data.

guanqin-123 · 2026-06-09T08:25:49Z

As discussed, the preferred solution is to amend only act/front_end/torchvision_loader/data_model_loader.py and act/front_end/torchvision_loader/data_model_mapping.py.

hanyuone · 2026-06-10T01:14:25Z

@guanqin-123 I'll address the comments to do with custom/cub files here.

Just a download link is not enough. The CUB-200-2011 dataset is a .tar.gz file of images and some text files (structure below):

attributes/image_attribute_labels.txt specifies which attributes each image has, image_class_labels.txt specifies which class each image belongs to, train_test_split.txt marks which images belong to the training or testing set, and so on.

This is not a torch.utils.data.Dataset, which is what TorchVisionSpecCreator works with. So at the very least I need a custom Dataset type that reads these images and text files, and outputs those images with the proper classes and attributes.
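
As a rough illustration of what such a type has to do, here is a minimal, dependency-free sketch that joins the index files into (image path, label) pairs for one split. `CubIndex` and `read_pairs` are hypothetical names, not code from this PR. The sketch assumes each file holds `<image_id> <value>` lines, that `1` in train_test_split.txt marks a training image, and that class ids start at 1. A real implementation would subclass torch.utils.data.Dataset and load the image in `__getitem__`.

```python
from pathlib import Path


def read_pairs(path: Path) -> dict[int, str]:
    """Parse lines of the form '<image_id> <value>' into a dict."""
    pairs = {}
    for line in path.read_text().splitlines():
        if line.strip():
            image_id, value = line.split(maxsplit=1)
            pairs[int(image_id)] = value.strip()
    return pairs


class CubIndex:
    """Map-style view over one split of CUB-200-2011 (hypothetical helper)."""

    def __init__(self, root: Path, train: bool = True):
        names = read_pairs(root / "images.txt")
        labels = read_pairs(root / "image_class_labels.txt")
        split = read_pairs(root / "train_test_split.txt")
        wanted = "1" if train else "0"  # assumption: 1 marks a training image
        self.samples = [
            (root / "images" / names[i], int(labels[i]) - 1)  # 0-based label
            for i in sorted(names)
            if split[i] == wanted
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> tuple[Path, int]:
        # A real Dataset would open the image here and apply transforms.
        return self.samples[index]
```

Keeping the join on image id, rather than on line order, means the three files do not have to list images in the same order.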

I got my code from here, which needs both the decompressed CUB-200-2011.tar.gz and pickled files generated from data_processing.py, which are metadata maps of the format

metadata = {'id': img_id, 'img_path': img_path, 'class_label': i,
            'attribute_label': attribute_labels_all[img_id],
            'attribute_certainty': attribute_certainties_all[img_id],
            'uncertain_attribute_label': attribute_uncertain_labels_all[img_id]}
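
To make the record shape concrete, the following sketch builds two such maps from toy lookup tables and round-trips them through pickle. All values, the image paths, and the file name `train.pkl` are made up for illustration. That the pickled files hold a list of these maps is an assumption, not something stated above.

```python
import pickle
import tempfile
from pathlib import Path

# Toy per-image attribute tables keyed by image id (values are invented).
attribute_labels_all = {1: [0, 1, 0], 2: [1, 1, 0]}
attribute_certainties_all = {1: [4, 3, 1], 2: [2, 4, 3]}
attribute_uncertain_labels_all = {1: [0.0, 0.75, 0.0], 2: [0.5, 1.0, 0.0]}

# (image id, image path, class index) triples; paths are placeholders.
images = [(1, "class_a/img_1.jpg", 0), (2, "class_b/img_2.jpg", 1)]

records = [
    {
        "id": img_id,
        "img_path": img_path,
        "class_label": class_label,
        "attribute_label": attribute_labels_all[img_id],
        "attribute_certainty": attribute_certainties_all[img_id],
        "uncertain_attribute_label": attribute_uncertain_labels_all[img_id],
    }
    for img_id, img_path, class_label in images
]

# Write the records to disk and read them back, as a loader would.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.pkl"  # hypothetical file name
    path.write_bytes(pickle.dumps(records))
    restored = pickle.loads(path.read_bytes())
```

Because the pickle stores plain dicts and lists, a loader only needs the standard library to read it, but it does need the file to exist, which is the hosting problem described in this comment.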

The authors of that project have uploaded pre-processed pickled files, but they are on Codalab, and downloading them programmatically would require installing a command-line tool.

I chose to implement CUBDataset this way for three reasons: the dataset here already does what I need, it changes as little of the rest of ACT's frontend as possible, and I don't know of a place where we can upload files for programmatic download, like Caltech Data, which hosts CUB-200-2011.

Other solutions include:

Download the Codalab files and host them somewhere that I can directly download from
Rewrite CUBDataset to not need the Codalab files (I think this is too much work for this)

I definitely need some custom Dataset class at the very least though, for the reasons above. What do you think?

…load config

Non-TorchVision datasets (e.g. CUB-200-2011) are declared with a 'download' key (url/md5/image_root + optional index_file/split_file). The loader downloads and extracts the archive into data/torchvision/<name>/raw, wraps image_root in ImageFolder, and applies the dataset's official deterministic train/test split via Subset.

The custom/cub package is removed: ImageFolder covers the CUB layout, the official split comes from images.txt + train_test_split.txt, and the ConceptBottleneck-derived attribute/pickle code was unused by ACT's pipeline.

Verified on real CUB-200-2011: 5994 train / 5794 test.
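
The download-and-extract step described in this commit message could look roughly like the sketch below. It is not the code from the PR: `CUB_ENTRY`, `fetch`, `md5_of` and `extract_verified` are hypothetical names, the URL and checksum are placeholders, and the relative paths inside the entry are guesses at the archive layout. Only the key names and the data/torchvision/<name>/raw target come from the message.

```python
import hashlib
import tarfile
import urllib.request
from pathlib import Path

# Hypothetical mapping entry: key names follow the commit message,
# every value is a placeholder.
CUB_ENTRY = {
    "download": {
        "url": "https://example.org/cub.tgz",
        "md5": "<md5 of the archive>",
        "image_root": "CUB_200_2011/images",
        "index_file": "CUB_200_2011/images.txt",
        "split_file": "CUB_200_2011/train_test_split.txt",
    }
}


def fetch(url: str, target: Path) -> Path:
    """Download url to target unless the file is already present."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_verified(archive: Path, expected_md5: str, name: str,
                     data_dir: Path = Path("data")) -> Path:
    """Check the MD5, then unpack into data_dir/torchvision/<name>/raw."""
    actual = md5_of(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {archive}: {actual} != {expected_md5}")
    raw_dir = data_dir / "torchvision" / name / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(raw_dir)  # only for archives from a trusted source
    return raw_dir
```

Checking the checksum before extraction means a truncated or replaced download fails loudly instead of producing a partial image tree that ImageFolder would silently accept.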

Data acquisition.

hanyuone added 2 commits June 9, 2026 16:26

feat: add support for non-TorchVision datasets

55dbd83

fix: use get_dataset_info over MAPPING directly

277f8a7

guanqin-123 reviewed Jun 9, 2026

Comment thread ipynb/cub_load.ipynb Outdated

guanqin-123 reviewed Jun 9, 2026

hanyuone and others added 3 commits June 10, 2026 11:22

chore: remove cub_load.ipynb

bd0c0b8

Merge pull request #1 from guanqin-123/data

4f53a1c

Data acquisition.

hanyuone wants to merge 5 commits into SVF-tools:main from hanyuone:cub-200-2011
