| 1 | +## Tools Directory Usage |
| 2 | + |
| 3 | +The `tools` directory contains utility scripts to help you work with OSL (Open Sports Lab) datasets, particularly for downloading annotated datasets and associated videos from Hugging Face. Below you'll find an explanation and usage instructions. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +### 1. Download OSL Dataset and Videos from Hugging Face |
| 8 | + |
| 9 | +**Script:** `tools/download_osl_hf.py` |
| 10 | + |
| 11 | +This script automates the download of an OSL-format JSON file (annotation file) and all referenced videos from a Hugging Face dataset repository. |
| 12 | + |
| 13 | +#### **Features:** |
| 14 | + |
| 15 | +* Downloads a specific OSL JSON annotation file. |
| 16 | +* Parses the JSON to identify referenced video files and downloads them as well. |
| 17 | +* Can perform a “dry run” to show which files would be downloaded and their total size, without actually downloading. |
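The parsing step in the feature list can be illustrated with a minimal sketch. The OSL schema is not shown in this section, so the example makes no assumption about field names: it walks the whole JSON tree and keeps every string that ends in a video extension. The function names, the extension list, and the sample keys (`data`, `inputs`, `path`) are all hypothetical and may differ from what `tools/download_osl_hf.py` actually does.

```python
import json
from typing import Any, Iterator, List

# Assumed set of video extensions; adjust to whatever the dataset really uses.
VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov")


def iter_video_paths(node: Any) -> Iterator[str]:
    """Yield every string in a parsed JSON tree that looks like a video path."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_video_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_video_paths(item)
    elif isinstance(node, str) and node.lower().endswith(VIDEO_EXTENSIONS):
        yield node


def referenced_videos(annotation_text: str) -> List[str]:
    """Return unique video paths in first-seen order."""
    paths = iter_video_paths(json.loads(annotation_text))
    return list(dict.fromkeys(paths))  # dict keys preserve insertion order


# Hypothetical annotation snippet, for illustration only.
sample = json.dumps({
    "data": [
        {"inputs": [{"path": "test/action_0/clip_0.mp4"},
                    {"path": "test/action_0/clip_1.mp4"}]},
        {"inputs": [{"path": "test/action_0/clip_0.mp4"}], "label": "foul"},
    ]
})
print(referenced_videos(sample))
```

A schema-agnostic walk like this is tolerant of layout changes but can pick up strings that merely look like video paths; a real implementation would more likely read the specific fields defined by the OSL format.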
| 18 | + |
| 19 | + |
| 20 | +#### ⚠️ Authentication Required for Gated Datasets |
| 21 | +Some Hugging Face datasets, including the SoccerNetPro localization and classification datasets, are gated (access-restricted). |
| 22 | + |
| 23 | +To download files from these datasets, you must: |
| 24 | + |
| 25 | +1. Have access to the dataset on Hugging Face. |
| 26 | + |
| 27 | +2. Be authenticated locally with your Hugging Face account. |
| 28 | + |
| 29 | +#### Login to Hugging Face (Required) |
| 30 | +Before running the script, authenticate once on your machine: |
| 31 | +```bash |
| 32 | +huggingface-cli login |
| 33 | +``` |
| 34 | +<img width="1182" height="710" alt="b6a32f46-9962-49cc-9882-a5dba710d606" src="https://github.com/user-attachments/assets/d848f451-58f6-40c6-96e3-e65cde7b4dc1" /> |
| 35 | + |
| 36 | +Follow the instructions to paste your Hugging Face access token. |
| 37 | + |
| 38 | +You can verify that authentication is working with: |
| 39 | + |
| 40 | +```bash |
| 41 | +python -c "from huggingface_hub import HfApi; print(HfApi().whoami())" |
| 42 | +``` |
| 43 | + |
| 44 | +If authentication is missing or access is not granted, the script will fail with a |
| 45 | +`GatedRepoError (401)`. |
| 46 | + |
| 47 | +#### **Requirements** |
| 48 | + |
| 49 | +* Python 3.x |
| 50 | +* `huggingface_hub` Python package (install with `pip install huggingface_hub`) |
| 51 | + |
| 52 | +#### **Usage** |
| 53 | + |
| 54 | + |
| 55 | +**Basic Command:** |
| 56 | + |
| 57 | +```bash |
| 58 | +python tools/download_osl_hf.py \ |
| 59 | + --url https://huggingface.co/datasets/<org>/<dataset>/blob/<revision>/<annotations.json> \ |
| 60 | + --output-dir <output_directory> |
| 61 | +``` |
| 62 | +- Copy the URL directly from the Hugging Face web interface |
| 63 | +(i.e. a `blob/...` URL). |
| 64 | +- The script automatically converts it to the correct `resolve/...` format internally. |
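The `blob` to `resolve` conversion described above can be sketched as follows. This is an illustrative reimplementation, not the script's actual code: the function name `blob_to_resolve` is hypothetical, and the sketch assumes the URL layout `datasets/<org>/<dataset>/blob/<revision>/<path>` used in this document.

```python
from urllib.parse import urlparse


def blob_to_resolve(url: str) -> str:
    """Rewrite a Hugging Face web ("blob") URL into a direct-download ("resolve") URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: datasets/<org>/<dataset>/blob/<revision>/<path...>
    if len(parts) < 6 or parts[0] != "datasets" or parts[3] != "blob":
        raise ValueError(f"Not a Hugging Face dataset blob URL: {url}")
    parts[3] = "resolve"  # only this path segment changes
    return f"{parsed.scheme}://{parsed.netloc}/" + "/".join(parts)


print(blob_to_resolve(
    "https://huggingface.co/datasets/OpenSportsLab/"
    "soccernetpro-classification-vars/blob/svfouls/annotations_test.json"
))
```

Replacing only the fourth path segment, rather than doing a plain string substitution of `blob`, avoids corrupting URLs whose organization, dataset, or file name happens to contain the word "blob".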
| 65 | + |
| 66 | +**Arguments:** |
| 67 | + |
| 68 | +* `--url`: (required) The Hugging Face URL of the OSL JSON file, in `blob/<revision>/...` form as shown in the web interface (for example `blob/main/...`). |
| 69 | +* `--output-dir`: (optional) Path to the directory where the dataset and videos should be downloaded. Defaults to `downloaded_data` if not specified. |
| 70 | +* `--dry-run`: (optional) If provided, lists all files that would be downloaded and their total size, but does not download anything. |
| 71 | + |
| 72 | + |
| 73 | +**Example:** |
| 74 | +Classification – svfouls |
| 75 | + |
| 76 | +```bash |
| 77 | +python tools/download_osl_hf.py \ |
| 78 | + --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/svfouls/annotations_test.json \ |
| 79 | + --output-dir Test_Data/Classification/svfouls |
| 80 | +``` |
| 81 | + |
| 82 | +Classification – mvfouls |
| 83 | + |
| 84 | +```bash |
| 85 | +python tools/download_osl_hf.py \ |
| 86 | + --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/mvfouls/annotations_test.json \ |
| 87 | + --output-dir Test_Data/Classification/mvfouls |
| 88 | +``` |
| 89 | + |
| 90 | +Localization – Action Spotting |
| 91 | + |
| 92 | +```bash |
| 93 | +python tools/download_osl_hf.py \ |
| 94 | + --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-localization-snbas/blob/224p/annotations-test.json \ |
| 95 | + --output-dir Test_Data/Localization |
| 96 | +``` |
| 97 | + |
| 98 | +**Dry Run Example:** |
| 99 | +Before downloading large video files, run the script in dry-run mode: |
| 100 | +```bash |
| 101 | +python tools/download_osl_hf.py \ |
| 102 | + --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/svfouls/annotations_test.json \ |
| 103 | + --dry-run |
| 104 | +``` |
| 105 | +Dry-run mode will: |
| 106 | +- List all video files that would be downloaded |
| 107 | +- Show the estimated total storage required |
| 108 | +- Report missing files (if any) |
| 109 | +- Download nothing |
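The dry-run summary listed above amounts to totalling known file sizes and collecting the paths that could not be found. The sketch below shows that bookkeeping only. The helper names are hypothetical, and the input is assumed to be a mapping from repository path to size in bytes (`None` for a missing file). How the script actually obtains sizes is not shown in this document; `HfApi.get_paths_info` in `huggingface_hub` is one possible source.

```python
from typing import Dict, List, Optional, Tuple


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units, capped at GiB."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"  # unreachable; keeps type checkers satisfied


def summarize_dry_run(sizes: Dict[str, Optional[int]]) -> Tuple[int, List[str]]:
    """Return (total bytes of files found, list of missing paths)."""
    total = sum(size for size in sizes.values() if size is not None)
    missing = [path for path, size in sizes.items() if size is None]
    return total, missing


# Hypothetical sizes, for illustration only.
total, missing = summarize_dry_run({
    "test/action_0/clip_0.mp4": 1024,
    "test/action_0/clip_1.mp4": 512,
    "test/action_1/clip_0.mp4": None,  # not present in the repository
})
print(f"Would download {human_size(total)}; missing: {missing}")
```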
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +**Output Structure:** |
| 115 | +After downloading, the output directory will contain: |
| 116 | +- The annotation JSON file |
| 117 | +- All referenced video files |
| 118 | +- The original Hugging Face repository folder structure |
| 119 | + |
| 120 | + |
| 121 | +Example: |
| 122 | + |
| 123 | +```text |
| 124 | +output_dir/ |
| 125 | +├── annotations-test.json |
| 126 | +└── test/ |
| 127 | + └── action_0/ |
| 128 | + ├── clip_0.mp4 |
| 129 | + └── clip_1.mp4 |
| 130 | +``` |
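Preserving the repository layout means each repository-relative path is joined onto the output directory. A minimal sketch of that mapping follows; the function name `local_destination` is hypothetical, and the rejection of absolute paths and `..` segments is a precaution added for this example rather than documented behavior of the script.

```python
from pathlib import Path, PurePosixPath


def local_destination(output_dir: str, repo_path: str) -> Path:
    """Map a repository-relative path to its location under output_dir."""
    rel = PurePosixPath(repo_path)  # repository paths always use "/"
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"Unsafe repository path: {repo_path}")
    return Path(output_dir).joinpath(*rel.parts)


print(local_destination("output_dir", "test/action_0/clip_0.mp4"))
```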
| 131 | + |
| 132 | + |
| 133 | +### 2. Zip the Folder (Optional) |
| 134 | + |
| 135 | +```bash |
| 136 | +zip -r DatasetAnnotationTool.zip * |
| 137 | +``` |
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +### **Notes** |
| 142 | + |
| 143 | +* The script automatically converts Hugging Face “blob” URLs to the proper “resolve” format for direct file access. |
| 144 | +* After downloading, the output directory will contain the JSON annotation and all video files referenced in it, keeping the original folder structure. |
| 145 | +* For datasets with a large number of videos, downloads will be parallelized for efficiency. |
| 146 | +* If a video is missing from the repository, it is reported (especially useful in dry-run mode). |
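The parallel-download note can be pictured with a thread pool. The sketch below is an assumption about the general approach, not the script's implementation: `download_all` and its `download_one` parameter are hypothetical names, and the worker is injected so the example runs without network access. In a real run the worker might wrap `huggingface_hub.hf_hub_download(..., repo_type="dataset")`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple


def download_all(
    paths: Iterable[str],
    download_one: Callable[[str], object],
    max_workers: int = 4,
) -> Tuple[List[str], Dict[str, Exception]]:
    """Run download_one on every path concurrently; collect successes and failures."""
    done: List[str] = []
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                done.append(path)
            except Exception as exc:  # one failure must not abort the others
                failed[path] = exc
    return sorted(done), failed


def fake_download(path: str) -> None:
    """Stand-in worker: pretends one file is missing from the repository."""
    if path.endswith("clip_9.mp4"):
        raise FileNotFoundError(path)


done, failed = download_all(
    ["test/action_0/clip_0.mp4", "test/action_0/clip_1.mp4", "test/action_0/clip_9.mp4"],
    fake_download,
)
print(done, list(failed))
```

Threads suit this workload because downloads are I/O-bound; collecting failures per file instead of raising immediately lets missing videos be reported together at the end.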