## Tools Directory Usage

The `tools` directory contains utility scripts for working with OSL (Open Sports Lab) datasets, in particular for downloading annotated datasets and their associated videos from Hugging Face. The sections below explain what each script does and how to use it.

---

### 1. Download OSL Dataset and Videos from Hugging Face

**Script:** `tools/download_osl_hf.py`

This script automates downloading an OSL-format JSON annotation file and all of the videos it references from a Hugging Face dataset repository.

#### **Features:**

* Downloads a specific OSL JSON annotation file.
* Parses the JSON to identify the referenced video files and downloads them as well.
* Can perform a “dry run” that shows which files would be downloaded and their total size, without downloading anything.

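The JSON-parsing step can be pictured with the minimal Python sketch below. The OSL schema is not described in this README, so the sketch assumes only that video paths appear somewhere in the annotation file as string values ending in a video extension. `VIDEO_EXTENSIONS`, `iter_video_paths`, and `referenced_videos` are hypothetical names, not the script's actual code, and the sample JSON is invented for illustration.

```python
from __future__ import annotations

import json
from typing import Any, Iterator

# Assumption: an illustrative extension list; the real script may read
# specific OSL fields instead of matching file extensions.
VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm")


def iter_video_paths(node: Any) -> Iterator[str]:
    """Recursively yield string values that look like video file paths."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_video_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_video_paths(item)
    elif isinstance(node, str) and node.lower().endswith(VIDEO_EXTENSIONS):
        yield node


def referenced_videos(annotation_text: str) -> list[str]:
    """Parse annotation JSON text and return unique video paths in first-seen order."""
    unique: dict[str, None] = {}  # dicts keep insertion order, so duplicates drop stably
    for path in iter_video_paths(json.loads(annotation_text)):
        unique.setdefault(path, None)
    return list(unique)


if __name__ == "__main__":
    # Invented sample; field names are NOT taken from the OSL format.
    sample = json.dumps({
        "version": "1.0",
        "data": [
            {"inputs": [{"path": "test/action_0/clip_0.mp4"},
                        {"path": "test/action_0/clip_1.mp4"}]},
            {"inputs": [{"path": "test/action_0/clip_0.mp4"}]},
        ],
    })
    print(referenced_videos(sample))
```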

#### ⚠️ Authentication Required for Gated Datasets

Some Hugging Face datasets (including the SoccerNetPro localization and classification datasets) are gated, meaning access is restricted.

To download files from these datasets, you must:

1. Have been granted access to the dataset on Hugging Face.
2. Be authenticated locally with your Hugging Face account.

#### Login to Hugging Face (Required)

Before running the script, authenticate once on your machine:

```bash
huggingface-cli login
```
<img width="1182" height="710" alt="b6a32f46-9962-49cc-9882-a5dba710d606" src="https://github.com/user-attachments/assets/d848f451-58f6-40c6-96e3-e65cde7b4dc1" />

Follow the prompts to paste your Hugging Face access token.

You can verify that authentication is working with:

```bash
python -c "from huggingface_hub import HfApi; print(HfApi().whoami())"
```

If authentication is missing or access has not been granted, the script fails with a `GatedRepoError (401)`.

#### **Requirements**

* Python 3.x
* The `huggingface_hub` Python package (install with `pip install huggingface_hub`)

#### **Usage**

**Basic Command:**

```bash
python tools/download_osl_hf.py \
  --url https://huggingface.co/datasets/<org>/<dataset>/blob/<revision>/<annotations.json> \
  --output-dir <output_directory>
```

- Copy the URL directly from the Hugging Face web interface (i.e., a `blob/...` URL).
- The script automatically converts it to the corresponding `resolve/...` download URL internally.

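The conversion from `blob` to `resolve` amounts to swapping one path segment. Below is a minimal Python sketch using only the standard library. It assumes dataset URLs of the form `https://huggingface.co/datasets/<org>/<dataset>/blob/<revision>/<path>` with a single-segment revision; `blob_to_resolve` and `split_blob_url` are hypothetical helpers, not functions from `download_osl_hf.py`. The `(repo_id, revision, file_path)` triple lines up with the `repo_id`, `revision`, and `filename` arguments of `huggingface_hub.hf_hub_download`.

```python
from __future__ import annotations

from urllib.parse import urlparse


def _blob_parts(url: str) -> list[str]:
    """Split a dataset blob URL path and check the expected layout."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected: datasets/<org>/<dataset>/blob/<revision>/<file path...>
    if len(parts) < 6 or parts[0] != "datasets" or parts[3] != "blob":
        raise ValueError(f"Not a Hugging Face dataset blob URL: {url}")
    return parts


def blob_to_resolve(url: str) -> str:
    """Return the direct-download ('resolve') form of a web ('blob') URL."""
    parts = _blob_parts(url)
    parts[3] = "resolve"
    return urlparse(url)._replace(path="/" + "/".join(parts)).geturl()


def split_blob_url(url: str) -> tuple[str, str, str]:
    """Return (repo_id, revision, file_path) parsed from a dataset blob URL."""
    parts = _blob_parts(url)
    return f"{parts[1]}/{parts[2]}", parts[4], "/".join(parts[5:])


if __name__ == "__main__":
    url = ("https://huggingface.co/datasets/OpenSportsLab/"
           "soccernetpro-classification-vars/blob/svfouls/annotations_test.json")
    print(blob_to_resolve(url))
    print(split_blob_url(url))
```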

**Arguments:**

* `--url`: (required) The Hugging Face URL of the OSL JSON file, in the `blob/<revision>/...` form shown in the web interface (e.g., `blob/main/...` or `blob/svfouls/...`).
* `--output-dir`: (optional) Directory where the annotation file and videos are saved. Defaults to `downloaded_data` if not specified.
* `--dry-run`: (optional) If set, lists all files that would be downloaded and their total size, without downloading anything.

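For reference, the documented options map onto a command-line parser like the minimal `argparse` sketch below. It is reconstructed from the argument list above, not copied from `download_osl_hf.py`, so help text and other details may differ from the real script; `build_parser` is a hypothetical name. Calling `build_parser().parse_args()` would read `sys.argv`, and argparse exposes `--output-dir` and `--dry-run` as `args.output_dir` and `args.dry_run`.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the options documented in this README (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(
        description="Download an OSL annotation JSON and its referenced videos from Hugging Face."
    )
    # Required: the web ('blob') URL of the annotation file.
    parser.add_argument("--url", required=True,
                        help="Hugging Face blob URL of the OSL JSON file")
    # Optional: where files are written; the default matches the README.
    parser.add_argument("--output-dir", default="downloaded_data",
                        help="download directory (default: downloaded_data)")
    # Flag: list files and total size without downloading.
    parser.add_argument("--dry-run", action="store_true",
                        help="list files and total size without downloading")
    return parser
```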

**Example:**

Classification – svfouls

```bash
python tools/download_osl_hf.py \
  --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/svfouls/annotations_test.json \
  --output-dir Test_Data/Classification/svfouls
```

Classification – mvfouls

```bash
python tools/download_osl_hf.py \
  --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/mvfouls/annotations_test.json \
  --output-dir Test_Data/Classification/mvfouls
```

Localization – Action Spotting

```bash
python tools/download_osl_hf.py \
  --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-localization-snbas/blob/224p/annotations-test.json \
  --output-dir Test_Data/Localization
```

**Dry Run Example:**

Before downloading large video files, run the script in dry-run mode:

```bash
python tools/download_osl_hf.py \
  --url https://huggingface.co/datasets/OpenSportsLab/soccernetpro-classification-vars/blob/svfouls/annotations_test.json \
  --dry-run
```

Dry-run mode will:

- List all video files that would be downloaded
- Show the estimated total storage required
- Report missing files (if any)
- Download nothing

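The dry-run summary can be sketched as below. This is an illustrative Python sketch only: it assumes file sizes have already been looked up and are passed in as a plain mapping, with `None` marking a file that could not be found in the repository. How the real script obtains sizes is not documented here; `format_size` and `dry_run_report` are hypothetical names, and the 1024-based units are a choice of this sketch.

```python
from __future__ import annotations


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary (1024-based) units."""
    size = float(num_bytes)
    units = ["B", "KB", "MB", "GB", "TB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"


def dry_run_report(sizes: dict[str, int | None]) -> str:
    """Summarize planned downloads without fetching anything.

    `sizes` maps repository paths to byte sizes; None marks a missing file.
    """
    lines: list[str] = []
    missing: list[str] = []
    total = 0
    for path, size in sorted(sizes.items()):
        if size is None:
            missing.append(path)
        else:
            total += size
            lines.append(f"{path}  {format_size(size)}")
    found = len(sizes) - len(missing)
    lines.append(f"Total: {format_size(total)} in {found} file(s)")
    lines.extend(f"MISSING: {path}" for path in missing)
    return "\n".join(lines)


if __name__ == "__main__":
    print(dry_run_report({
        "test/action_0/clip_0.mp4": 3_500_000,
        "test/action_0/clip_1.mp4": None,
    }))
```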

---

**Output Structure:**

After downloading, the output directory will contain:

- The annotation JSON file
- All referenced video files
- The same folder structure as in the Hugging Face repository

Example:

```text
output_dir/
├── annotations-test.json
└── test/
    └── action_0/
        ├── clip_0.mp4
        └── clip_1.mp4
```

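Preserving the repository layout amounts to joining each file's repository-relative path onto the output directory. A minimal Python sketch follows; `local_path_for` is a hypothetical helper, and the check that rejects absolute paths and `..` segments is a safety measure added in this sketch, not documented behavior of the script.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def local_path_for(repo_path: str, output_dir: str | Path) -> Path:
    """Map a repository-relative path (always '/'-separated) into output_dir."""
    rel = PurePosixPath(repo_path)
    # Refuse paths that would escape output_dir.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"Unsafe repository path: {repo_path}")
    return Path(output_dir).joinpath(*rel.parts)


if __name__ == "__main__":
    for repo_path in ["annotations-test.json", "test/action_0/clip_0.mp4"]:
        print(local_path_for(repo_path, "output_dir"))
```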
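
### 2. Zip the Folder (Optional)

To archive a folder, run the following from inside it. The shell expands `*` to every non-hidden file and subfolder in the current directory, and `-r` recurses into subfolders:

```bash
zip -r DatasetAnnotationTool.zip *
```
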
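If the `zip` command-line tool is not available (a stock Windows install does not ship one), Python's standard library can produce a comparable archive. A minimal sketch, with `make_zip` as a hypothetical helper name: unlike `zip -r ... *`, `shutil.make_archive` also includes hidden files, and this sketch expects the archive to be written outside the folder being archived.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def make_zip(folder: str | Path, archive_base: str | Path) -> str:
    """Zip the contents of `folder` recursively and return the .zip path.

    `archive_base` is the archive path without the .zip suffix and should
    lie outside `folder`.
    """
    folder = Path(folder).resolve()
    # root_dir stores entries relative to `folder`, as when running
    # `zip -r` from inside it.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))
```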
---

### **Notes**

* The script automatically converts Hugging Face “blob” URLs to the corresponding “resolve” URLs for direct file access.
* After downloading, the output directory contains the JSON annotation file and all videos it references, with the original folder structure preserved.
* For datasets with many videos, downloads are parallelized for efficiency.
* If a video referenced in the JSON is missing from the repository, it is reported (especially useful in dry-run mode).

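Parallel downloading can be sketched with a thread pool, which suits I/O-bound transfers. In this minimal Python sketch, `download_all` is a hypothetical helper and `download_one` is any callable that fetches one repository path and returns its local path; in practice it might wrap `huggingface_hub.hf_hub_download(repo_id=..., filename=..., repo_type="dataset", revision=..., local_dir=...)`. The default of 8 workers is arbitrary, not the script's setting. Failures are collected per file instead of aborting the run, which is one way missing videos could be reported.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def download_all(
    paths: list[str],
    download_one: Callable[[str], str],
    max_workers: int = 8,
) -> tuple[list[str], dict[str, Exception]]:
    """Run download_one(path) for every path on a thread pool.

    Returns (local paths of successful downloads, {repo path: exception}).
    """
    done: list[str] = []
    failed: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                done.append(future.result())
            except Exception as exc:  # record the failure and keep going
                failed[path] = exc
    return done, failed


if __name__ == "__main__":
    def fake_download(path: str) -> str:
        # Offline stand-in for a real network download.
        if path.endswith("missing.mp4"):
            raise FileNotFoundError(path)
        return f"output_dir/{path}"

    ok, errors = download_all(["test/a.mp4", "test/missing.mp4"], fake_download)
    print(sorted(ok), {p: type(e).__name__ for p, e in errors.items()})
```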
