Each evaluation task expects a dataset root directory configured via an environment variable (set in your .env or shell). The table below summarises all variables; detailed download instructions follow.
| Variable | Task(s) | Section |
|---|---|---|
| NAVI_ROOT | correspondence_navi | NAVI |
| SCANNET_ROOT | correspondence_scannet | ScanNet |
| SPAIR_ROOT | correspondence_spair | SPair-71k |
| NYU_ROOT | depth, snorm | NYU |
| ADE20K_ROOT | segmentation_ade20k | ADE20K |
| IMAGENET_ROOT | classification_imagenet_knn, classification_imagenet_linear | ImageNet |
| IMAGENET3D_ROOT | pose_imagenet3d | ImageNet3D |
| TAPVID_DAVIS_ROOT | tracking_tapvid | TAP-Vid DAVIS |
| AP10K_ROOT | correspondence_ap10k | AP-10K |
| SOCO_ROOT | correspondence_soco | SOCO |
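
Before launching an evaluation, it can help to confirm that the variables in the table are set and point at something on disk. The following is a minimal sketch, not part of the repository: the function name `find_unusable_roots` is hypothetical, and it only checks that each path exists, not that the dataset inside is complete.

```python
import os

# Variable names copied from the table above.
DATASET_ROOT_VARS = [
    "NAVI_ROOT", "SCANNET_ROOT", "SPAIR_ROOT", "NYU_ROOT", "ADE20K_ROOT",
    "IMAGENET_ROOT", "IMAGENET3D_ROOT", "TAPVID_DAVIS_ROOT", "AP10K_ROOT",
    "SOCO_ROOT",
]


def find_unusable_roots(env=None):
    """Return {variable: reason} for each variable that is unset or points nowhere.

    Hypothetical helper for illustration; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in DATASET_ROOT_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.exists(value):
            problems[name] = f"path does not exist: {value}"
    return problems
```

Calling `find_unusable_roots()` with no arguments inspects the current shell environment; you only need the variables for the tasks you actually plan to run, so entries for unused datasets can be ignored.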
NAVI is a multi-view dataset depicting 36 objects in varied scenes and poses, with high-quality meshes and precise image-object alignment. NAVI images are very high-resolution, so we recommend downsampling them first to avoid slow data loading.
cd data/
wget http://storage.googleapis.com/gresearch/navi-dataset/navi_v1.tar.gz
tar -xzf navi_v1.tar.gz
# downsample images (writes downsampled_ prefixed copies next to originals)
cd ../data_processing
python resize_navi.py --data-root ../data/navi_v1

Set NAVI_ROOT to the navi_v1/ directory.
ScanNet is a large RGB-D video dataset of indoor scenes. We use the SuperGlue test split (1500 image pairs), downloaded from the LoFTR website.
cd data/
# download the tar file provided by LoFTR
gdown --id 1wtl-mNicxGlXZ-UQJxFnKuWPvvssQBwd
tar -xvf scannet_test_1500.tar
rm scannet_test_1500.tar
cd scannet_test_1500
wget https://raw.githubusercontent.com/zju3dv/LoFTR/master/assets/scannet_test_1500/intrinsics.npz
wget https://raw.githubusercontent.com/zju3dv/LoFTR/master/assets/scannet_test_1500/test.npz

Set SCANNET_ROOT to the scannet_test_1500/ directory.
SPair-71k consists of image pairs depicting instances of the same class with keypoint annotations and viewpoint attributes.
cd data/
wget http://cvlab.postech.ac.kr/research/SPair-71k/data/SPair-71k.tar.gz
tar -xvf SPair-71k.tar.gz
rm SPair-71k.tar.gz

Set SPAIR_ROOT to the SPair-71k/ directory.
The NYU Depth V2 dataset is a standard depth estimation benchmark. We evaluate on the labeled test set using surface normal annotations from Ladicky et al. For training we use the larger set annotated by Bansal et al. / Qi et al. (GeoNet).
Step 1. Download the GeoNet training data (data1.zip and data2.zip from the GeoNet repository):
cd data/
# Download data1.zip and data2.zip from https://github.com/xjqi/GeoNet
unzip data1.zip
unzip data2.zip
mkdir nyu_geonet
mv data1/* nyu_geonet/
mv data2/* nyu_geonet/
rmdir data1 data2

Step 2. Download the test set and surface normal annotations:
cd data/
mkdir nyuv2 && cd nyuv2
wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
wget https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/nyuv2_surfacenormal_metadata.zip
unzip nyuv2_surfacenormal_metadata.zip
mv surfacenormal_metadata/* .
rm nyuv2_surfacenormal_metadata.zip
rmdir surfacenormal_metadata

Step 3. Pack into a single pkl:
cd ../../data_processing
python create_nyu_pkl.py --root ../data/nyuv2

Set NYU_ROOT to the parent directory containing both nyu_geonet/ and nyuv2/.
ADE20K (MIT Scene Parsing Benchmark) is used for linear-probe semantic segmentation evaluation. Download from the official site or via:
cd data/
wget http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
unzip ADEChallengeData2016.zip
rm ADEChallengeData2016.zip

Expected structure:
ADEChallengeData2016/
    images/training/
    images/validation/
    annotations/training/
    annotations/validation/
Set ADE20K_ROOT to the ADEChallengeData2016/ directory.
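
The expected structure above can be checked programmatically before training. This is a minimal sketch, assuming only the four subdirectories listed; `missing_ade20k_dirs` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Subdirectories listed under "Expected structure" above.
ADE20K_SUBDIRS = [
    "images/training",
    "images/validation",
    "annotations/training",
    "annotations/validation",
]


def missing_ade20k_dirs(root):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [sub for sub in ADE20K_SUBDIRS if not (root / sub).is_dir()]
```

An empty list means the layout matches; a non-empty list usually indicates that ADE20K_ROOT points one level too high or too low.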
ImageNet (ILSVRC 2012) is used for kNN and linear-probe classification evaluation. Download from the official site (requires registration).
The code loads splits via torchvision.datasets.ImageFolder from {IMAGENET_ROOT}/{split}/. The default split names in the task configs are ordered_train and ordered; override via CLI if your layout differs (e.g. task.train_split=train task.val_split=val).
Set IMAGENET_ROOT to the directory containing the split subdirectories.
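
Because `ImageFolder` treats every subdirectory of a split as one class, counting those subdirectories is a quick way to confirm that IMAGENET_ROOT and the split names line up. The sketch below is illustrative only: `count_class_dirs` is a hypothetical helper, and it does not import torchvision or inspect the images themselves.

```python
from pathlib import Path


def count_class_dirs(imagenet_root, split):
    """Count class subdirectories in {imagenet_root}/{split}/.

    Raises FileNotFoundError when the split directory is missing, which
    typically means the split name needs to be overridden.
    """
    split_dir = Path(imagenet_root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"split directory not found: {split_dir}")
    return sum(1 for entry in split_dir.iterdir() if entry.is_dir())
```

For a complete ILSVRC 2012 copy, each split should contain 1000 class directories, for example `count_class_dirs(root, "ordered_train")` with the default train split name.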
ImageNet3D provides 3D pose annotations for ImageNet images. Follow the download instructions in the official repository.
Expected structure:
ImageNet3D/
    JPEGImages/{class_name}/*.JPEG
    Segmentation/{class_name}/*.png
    PairAnnotation/{split}/*.json
    ImageAnnotation/{class_name}/*.json
Set IMAGENET3D_ROOT to the ImageNet3D/ directory.
TAP-Vid is a benchmark for tracking any point through video. We use the DAVIS subset. Download from the TAP-Vid data page:
cd data/
wget https://storage.googleapis.com/dm-tapnet/tapvid_davis.zip
unzip tapvid_davis.zip
rm tapvid_davis.zip

Set TAPVID_DAVIS_ROOT to the path of the extracted .pkl file (not a directory); the code opens it directly as a pickle file.
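
Since this variable must name a file rather than a directory, a small loader that fails early on a directory path can catch the most common misconfiguration. This is a sketch under that assumption; `load_tapvid_pickle` is a hypothetical name, and the code makes no claim about the structure of the unpickled object.

```python
import pickle
from pathlib import Path


def load_tapvid_pickle(path):
    """Load the pickle at `path`, rejecting directories and missing files.

    Only unpickle files obtained from a source you trust.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"TAPVID_DAVIS_ROOT must point to a .pkl file, got: {path}"
        )
    with path.open("rb") as f:
        return pickle.load(f)
```

After loading, inspecting `type(data)` and `len(data)` is a reasonable first step to see what the file contains.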
AP-10K is an animal pose estimation dataset with keypoint annotations across species. Follow the download instructions in the official repository.
Expected structure:
AP-10K/
    JPEGImages/{family}/{species}/*.jpg
    ImageAnnotation/{family}/{species}/*.json
    PairAnnotation/{split}/*.json
Set AP10K_ROOT to the AP-10K/ directory.
SOCO is a semantic object correspondence dataset with 100 categories (40 images each), per-view keypoint annotations, and within- and cross-category pair annotations (20k intra-category, 20k cross-category, plus a predefined 10k / 10k intra-category train/test split).
The dataset is distributed on the Hugging Face Hub as three zip archives (Images.zip, KeypointAnnotations.zip, PairAnnotations.zip) plus an unpacked Metadata/ folder. Download the repository and unzip the archives in place:
huggingface-cli download GenIntelLab/SOCO --repo-type dataset --local-dir data/SOCOv1
cd data/SOCOv1 && for z in *.zip; do unzip -q "$z" && rm "$z"; done

Expected structure after extraction:
SOCOv1/
    Images/{category}/*.JPEG
    KeypointAnnotations/{category}/*.json
    PairAnnotations/
        intra/{category}/*.json        # all within-category pairs (20k)
        cross/{category}/*.json        # cross-category pairs (20k)
        trainsplits/
            train/{category}/*.json    # training split (10k)
            test/{category}/*.json     # test split (10k)
    Metadata/
        keypoint_taxonomy.json
        filename_mapping.json
Set SOCO_ROOT to the downloaded SOCOv1/ directory.
The pair_subdir config parameter selects which pair annotations to use (default: PairAnnotations/intra).
For linear probe training, train_pair_subdir and test_pair_subdir select the predefined splits.
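
To see what a given pair_subdir value would select, you can count the annotation files under it per category. The sketch below assumes one JSON file per pair, which the directory listing suggests but does not state; `count_pairs_per_category` is a hypothetical helper rather than part of the codebase.

```python
from pathlib import Path


def count_pairs_per_category(soco_root, pair_subdir="PairAnnotations/intra"):
    """Return {category: number of *.json files} under soco_root/pair_subdir.

    Assumption: each JSON file describes one pair.
    """
    pair_dir = Path(soco_root) / pair_subdir
    if not pair_dir.is_dir():
        raise FileNotFoundError(f"pair directory not found: {pair_dir}")
    return {
        category.name: sum(1 for _ in category.glob("*.json"))
        for category in sorted(pair_dir.iterdir())
        if category.is_dir()
    }
```

Summing the returned values for the default subdirectory and comparing against the stated 20k intra-category pairs gives a rough check that the archives were extracted completely.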