diff --git a/README.md b/README.md
index 9c084cb..a2da831 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ OpenSportsLib is designed for **researchers, ML engineers, and sports analytics
 ## Quick links
 
 - **Documentation:** https://opensportslab.github.io/opensportslib/
+- **OSL JSON format:** https://opensportslab.github.io/opensportslib/data/osl-json-format/
 - **PyPI:** https://pypi.org/project/opensportslib/
 - **Issues:** https://github.com/OpenSportsLab/opensportslib/issues
 
@@ -81,7 +82,82 @@ Use it as the main entry point to find:
 See the [Model Zoo](docs/model-zoo.md) for available pretrained models,
 reported scores, datasets, and loading snippets.
 
---
+---
+
+## Dataset format
+
+OpenSportsLib annotation files use the **OSL JSON v2.0** format. A dataset JSON
+contains top-level metadata, a shared `labels` schema, and a `data` array where
+each sample points to one or more inputs.
+
+Minimal classification sample:
+
+```json
+{
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot"]
+    }
+  },
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "labels": {
+        "action": {
+          "label": "shot"
+        }
+      }
+    }
+  ]
+}
+```
+
+Minimal localization sample:
+
+```json
+{
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot"]
+    }
+  },
+  "data": [
+    {
+      "id": "game_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "games/game_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "events": [
+        {
+          "head": "action",
+          "label": "pass",
+          "position_ms": 1240
+        }
+      ]
+    }
+  ]
+}
+```
+
+Relative paths in `inputs[].path` are resolved from the split media root in the
+YAML config, for example `DATA.train.video_path`. See the full
+[OSL JSON format guide](docs/data/osl-json-format.md) for field definitions,
+multi-modal examples, prediction payloads, and conversion notes.
+
+---
 
 ## Quickstart
 
@@ -188,8 +264,8 @@ from opensportslib.tools import (
 ### Scripts
 
 ```bash
-python tools/download_osl_hf.py --repo-id <org/repo> --revision main --split test --format parquet --output-dir downloaded_data
-python tools/upload_osl_hf.py --repo-id <org/repo> --json-path <local_dataset.json> --split test --revision main
+python tools/download/download_osl_hf.py --repo-id <org/repo> --revision main --split test --format parquet --output-dir downloaded_data
+python tools/download/upload_osl_hf.py --repo-id <org/repo> --json-path <local_dataset.json> --split test --revision main
 ```
 
 Downloads are placed under `<output-dir>/<revision>/<split>`.
@@ -206,9 +282,13 @@ Predict when key events happen in long untrimmed sports videos.
 
 ### Action Retrieval
 Search and retrieve relevant clips or moments from a collection of sports videos.
+This is part of the roadmap and OSL data model, not a first-class OpenSportsLib
+training workflow yet.
 
 ### Action Description / Captioning
 Generate text descriptions for sports events and temporal segments.
+This is part of the roadmap and OSL data model, not a first-class OpenSportsLib
+training workflow yet.
 
 ---
 
@@ -228,6 +308,7 @@ Generate text descriptions for sports events and temporal segments.
 Use the README for the fast start, then go deeper through:
 
 - Full documentation: https://opensportslab.github.io/opensportslib/
+- OSL JSON format: [docs/data/osl-json-format.md](docs/data/osl-json-format.md)
 - High-level API guide: [opensportslib/apis/README.md](opensportslib/apis/README.md)
 - Configuration guide: https://opensportslab.github.io/opensportslib/tni/config-guide/
 - Example configs: [examples/configs/](examples/configs/)
diff --git a/docs/api/api.md b/docs/api/api.md
index d2a7957..8a61751 100644
--- a/docs/api/api.md
+++ b/docs/api/api.md
@@ -38,6 +38,32 @@ High-level entry points for training and inference.
 - **`localization.py`**  
   API for temporal action spotting tasks.
 
+#### Public task wrapper contract
+
+Use the high-level wrappers from `opensportslib.apis`:
+
+```python
+from opensportslib.apis import ClassificationModel, LocalizationModel
+```
+
+Both wrappers inherit the shared `BaseTaskModel` contract:
+
+| Method | Purpose | Return value |
+| --- | --- | --- |
+| `load_weights(weights=...)` | Load a local checkpoint or Hugging Face model ID. | `None` |
+| `train(train_set=..., valid_set=...)` | Train on OSL JSON split files. | Best checkpoint path or `None` |
+| `infer(test_set=...)` | Run prediction on an OSL JSON split file. | In-memory OSL JSON-style prediction dict |
+| `evaluate(test_set=...)` | Compute task metrics against ground truth. | Metrics dict |
+| `evaluate(test_set=..., predictions=...)` | Evaluate an existing prediction dict or prediction file. | Metrics dict |
+| `save_predictions(output_path=..., predictions=...)` | Persist a prediction dict returned by `infer()`. | Saved file path |
+
+`infer()` is prediction-focused and returns a payload to the caller. Use
+`save_predictions(...)` when a workflow needs an explicit prediction file. Do
+not rely on task-specific trainer artifacts as the public persistence API.
+
+Annotation and prediction payloads follow the OSL JSON data model. See
+[OSL JSON Format](../data/osl-json-format.md) for the user-facing schema.
+
 ---
 
 ### `core/`
@@ -194,4 +220,4 @@ This is where you can modify:
 
 ### High-Level Workflow
 
-YAML Config -> APIs (apis/) -> Datasets (datasets/) -> Models (models/) -> Trainer (core/trainer/) -> Metrics (metrics/)
\ No newline at end of file
+YAML Config -> APIs (apis/) -> Datasets (datasets/) -> Models (models/) -> Trainer (core/trainer/) -> Metrics (metrics/)
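The wrapper contract added to `docs/api/api.md` lets `evaluate(test_set=..., predictions=...)` take an existing prediction payload, and the format guide says a stable `id` is required for prediction matching. The following is a minimal sketch of that matching step for classification payloads, using plain dictionaries shaped like the examples in this change. The helper names `match_predictions` and `accuracy` are hypothetical and are not part of `opensportslib`; the library's own `evaluate()` remains the supported way to compute metrics.

```python
def match_predictions(ground_truth, predictions, head="action"):
    """Pair true and predicted labels by sample `id` for one label head.

    Hypothetical helper for illustration; not an opensportslib function.
    """
    truth = {}
    for sample in ground_truth["data"]:
        label = sample.get("labels", {}).get(head, {}).get("label")
        if label is not None:
            truth[sample["id"]] = label

    pairs = []
    for sample in predictions["data"]:
        sample_id = sample.get("id")
        predicted = sample.get("labels", {}).get(head, {}).get("label")
        # Predictions without a known id or without a label are skipped.
        if sample_id in truth and predicted is not None:
            pairs.append((sample_id, truth[sample_id], predicted))
    return pairs


def accuracy(pairs):
    """Fraction of matched samples whose predicted label equals the true label."""
    if not pairs:
        return None
    return sum(true == pred for _, true, pred in pairs) / len(pairs)


ground_truth = {
    "data": [
        {"id": "clip_0001", "labels": {"action": {"label": "shot"}}},
        {"id": "clip_0002", "labels": {"action": {"label": "pass"}}},
    ]
}
predictions = {
    "data": [
        {"id": "clip_0001", "labels": {"action": {"label": "shot", "confidence": 0.91}}},
        {"id": "clip_0002", "labels": {"action": {"label": "shot", "confidence": 0.55}}},
        {"id": "clip_9999", "labels": {"action": {"label": "pass", "confidence": 0.40}}},
    ]
}

pairs = match_predictions(ground_truth, predictions)
print(len(pairs), accuracy(pairs))
```

A prediction whose `id` does not appear in the ground truth is dropped here without any warning, which is one reason the checklist in the format guide asks for stable, unique sample IDs.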
diff --git a/docs/contributing.md b/docs/contributing.md
index e079654..b3bcc3f 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -1 +1,102 @@
---8<-- "CONTRIBUTING.md"
\ No newline at end of file
+# CONTRIBUTING.md
+This guide outlines the workflow and standards for developers looking to extend or maintain the OpenSportsLib library.
+
+## AI Agent Contributions
+For AI-agent driven development, follow `AGENTS.md` in the repository root.
+
+## 1. Development Environment Setup
+To begin contributing, set up a local development environment in "editable" mode so your changes are immediately reflected in the package.
+
+#### Step 1: Clone the Repository
+```bash
+git clone https://github.com/OpenSportsLab/opensportslib.git
+cd opensportslib
+```
+#### Step 2: Create a Virtual Environment
+Use Conda to manage dependencies and ensure Python 3.12 compatibility.
+```bash
+conda create -n osl python=3.12 pip
+conda activate osl
+```
+#### Step 3: Install in Editable Mode
+Install the core package in editable mode. Optional dependencies are installed in Step 4.
+```bash
+# Install core package in editable mode
+pip install -e .
+```
+
+#### Step 4: Set Up the Environment (CUDA-aware PyTorch and Optional Dependencies)
+```bash
+# Install PyTorch (CPU/GPU auto-detected)
+opensportslib setup
+
+# Optional: install PyTorch Geometric support
+opensportslib setup --pyg
+
+# Optional: install DALI support
+opensportslib setup --dali
+```
+
+## 2. Branching and Merging - Daily workflow for developers
+
+#### Branches
+- `main` → stable, production-ready
+- `dev` → active development integration branch
+- `dev-<name>` → developer personal branch
+- `feature-<name>` → new features
+- `fix-<name>` → bug fixes
+
+#### Rules
+- ❌ Never push directly to `main`
+- ❌ Never commit directly to `dev`
+- ✅ Always create a feature branch from `dev`
+- ✅ Always use Pull Requests
+- ✅ PRs must target `dev`, NOT `main`
+
+### 1. Sync Repo
+Switch to `dev` and pull the latest changes before starting work.
+```bash
+git checkout dev
+git pull origin dev
+```
+
+### 2. Create Feature Branch
+Create a new branch from `dev` and give it a descriptive name.
+```bash
+git checkout -b feature-<feature_name>
+```
+Naming Examples:
+- *feature-model*
+- *feature-new-dataset*
+
+### 3. Work Locally
+Commit your work often using the following commit style guidelines:
+
+- *feat:* New feature
+- *fix:* Bug fix
+- *refactor:* Code cleanup
+- *docs:* Documentation update
+
+Example commit:
+```bash
+git add .    # stage all changes, including new files
+# or
+git add -u   # stage changes to tracked files only
+
+git commit -m "feat: add model registry"
+```
+
+### 4. Push Branch (just once)
+Push your feature branch to the remote repository.
+```bash
+git push origin feature-<feature_name>
+```
+
+### 5. Open Pull Request (PR) → dev
+Raise a Pull Request (PR) to merge your branch back into the `dev` branch.
+
+✅ PR Checklist:
+- [ ] Tests Pass: All existing logic remains functional.
+- [ ] Runs on GPU: Code is compatible with CUDA environments.
+- [ ] Config Works: YAML configurations resolve correctly.
+- [ ] Docs Updated: Relevant documentation reflects your changes.
diff --git a/docs/data/osl-json-format.md b/docs/data/osl-json-format.md
new file mode 100644
index 0000000..56b5e18
--- /dev/null
+++ b/docs/data/osl-json-format.md
@@ -0,0 +1,541 @@
+# OSL JSON Format
+
+OSL JSON is the canonical annotation format used across OpenSportsLab tools.
+OpenSportsLib uses it for dataset manifests, ground-truth annotations, and
+prediction payloads returned by the high-level APIs.
+
+An OSL JSON file is a single JSON object with project metadata, a shared label
+schema, and a `data` array of samples. Each sample points to one or more input
+files and can carry task-specific annotations.
+
+The current OpenSportsLib implementation supports classification and
+localization workflows. The format also reserves payloads for description,
+dense description, and question/answer tasks so datasets can stay compatible
+with the broader OpenSportsLab ecosystem.
+
+## Minimal Structure
+
+The smallest useful file is a JSON object with a `data` list. For training and
+evaluation, include a root `labels` schema and task-specific sample payloads.
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "soccer-demo",
+  "description": "Example OSL dataset.",
+  "modalities": ["video"],
+  "metadata": {
+    "sport": "soccer",
+    "split": "train"
+  },
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "foul"]
+    }
+  },
+  "data": []
+}
+```
+
+## Top-Level Fields
+
+| Field | Type | Required | Notes |
+| --- | --- | --- | --- |
+| `version` | string | Recommended | Current canonical version is `"2.0"`. |
+| `date` | string | Recommended | ISO date such as `"2026-05-19"`. |
+| `dataset_name` | string | Recommended | Human-readable dataset or split name. |
+| `description` | string | Optional | Free-text dataset description. |
+| `modalities` | array[string] | Recommended | Input types present in `data[].inputs[]`, such as `["video"]`. |
+| `metadata` | object | Optional | Dataset-level custom metadata. |
+| `labels` | object | Required for supervised tasks | Shared label schema by annotation head. |
+| `data` | array[object] | Required | Sample list. Must be a list. |
+
+Unknown top-level keys are preserved by conversion tools where possible. Keep
+custom dataset metadata under `metadata` unless another key is part of a
+documented workflow.
+
+## Label Schema
+
+The root `labels` object defines annotation heads. Each head name is a key, and
+each definition should include:
+
+| Field | Type | Notes |
+| --- | --- | --- |
+| `type` | string | Use `single_label` for one class per sample/event, or `multi_label` for several labels. |
+| `labels` | array[string] | Allowed class names for this head. |
+
+```json
+{
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "foul"]
+    },
+    "attributes": {
+      "type": "multi_label",
+      "labels": ["left_foot", "header", "set_piece"]
+    }
+  }
+}
+```
+
+OpenSportsLib classification currently reads the `action` head by default:
+`data[].labels.action.label`. Localization event heads should also point to the
+same root schema, for example `data[].events[].head == "action"`.
+
+## Sample Objects
+
+Each entry in `data` is one sample.
+
+| Field | Type | Notes |
+| --- | --- | --- |
+| `id` | string | Stable sample ID. Required for reliable evaluation and prediction matching. |
+| `inputs` | array[object] | Media, feature, or tracking files for the sample. |
+| `metadata` | object | Optional sample-level metadata such as match, game, clip, or timing fields. |
+| `labels` | object | Classification annotations keyed by label head. |
+| `events` | array[object] | Timestamped localization events. |
+| `captions` | array[object] | Clip-level description captions. |
+| `dense_captions` | array[object] | Timestamped dense descriptions. |
+| `answers` | array[object] | Grouped question/answer annotations. |
+
+Unknown sample keys are preserved by conversion tools where possible.
+
+## Input Objects
+
+Every sample should include `inputs`, even if it has only one input file.
+
+```json
+{
+  "inputs": [
+    {
+      "type": "video",
+      "path": "clips/clip_0001.mp4",
+      "fps": 25.0
+    }
+  ]
+}
+```
+
+Supported input types used by current OpenSportsLib workflows:
+
+| Type | Typical path | Notes |
+| --- | --- | --- |
+| `video` | `clips/clip_0001.mp4` | Raw video clip or full game video. |
+| `frames_npy` | `frames/clip_0001.npy` | NumPy frame array. The legacy alias `frame_npy` is normalized by annotation tooling. |
+| `tracking_parquet` | `tracking/clip_0001.parquet` | Parquet tracking data. |
+
+`fps` is recommended for video and frame-array inputs. Tracking inputs can also
+include `fps` as a fallback when timestamps are not available.
+
+### Relative Path Resolution
+
+OpenSportsLib stores input paths as relative paths inside JSON whenever
+possible.
+
+- Classification and localization training/inference resolve `inputs[].path`
+  from the configured split media root, usually `DATA.<split>.video_path`.
+- Feature-based localization resolves `inputs[0].path` from the configured
+  feature directory.
+- Conversion tools resolve `inputs[].path` from the `media_root` argument passed
+  to `convert_json_to_parquet(...)` or the CLI wrapper.
+- Hugging Face upload/download tools treat `inputs[].path` as repository paths
+  relative to the JSON file or selected split.
+
+Example directory layout:
+
+```text
+dataset/
+├── train/
+│   ├── clips/clip_0001.mp4
+│   └── annotations_train.json
+├── valid/
+│   ├── clips/clip_0101.mp4
+│   └── annotations_valid.json
+└── test/
+    ├── clips/clip_0201.mp4
+    └── annotations_test.json
+```
+
+For the train split, set `DATA.train.video_path` to `dataset/train` and store
+the sample path as `clips/clip_0001.mp4`.
+
+### Multi-Input And Multi-View Samples
+
+Use multiple `inputs` entries when a sample has more than one synchronized view
+or modality.
+
+```json
+{
+  "id": "play_0001",
+  "inputs": [
+    {
+      "type": "video",
+      "path": "wide/play_0001.mp4",
+      "fps": 25.0
+    },
+    {
+      "type": "video",
+      "path": "close/play_0001.mp4",
+      "fps": 25.0
+    },
+    {
+      "type": "tracking_parquet",
+      "path": "tracking/play_0001.parquet"
+    }
+  ]
+}
+```
+
+Classification multi-view loading groups samples by `id` when the config uses
+`DATA.view_type: multi`. The current grouping helper also supports IDs with a
+`_view` suffix, such as `play_0001_view1` and `play_0001_view2`.
+
+## Classification Payload
+
+Classification labels live under `data[].labels`. The key under `labels` should
+match a root label head.
+
+```json
+{
+  "id": "clip_0001",
+  "inputs": [
+    {
+      "type": "video",
+      "path": "clips/clip_0001.mp4",
+      "fps": 25.0
+    }
+  ],
+  "labels": {
+    "action": {
+      "label": "shot"
+    },
+    "attributes": {
+      "labels": ["left_foot", "set_piece"]
+    }
+  }
+}
+```
+
+For classification training, OpenSportsLib expects `labels.action.label` by
+default. Samples without labels can be used for test/inference splits, but
+training and validation need labels.
+
+Smart predictions may include a confidence score. The annotation-tool convention
+is `confidence_score`; current OpenSportsLib prediction exporters use
+`confidence`.
+
+```json
+{
+  "labels": {
+    "action": {
+      "label": "shot",
+      "confidence_score": 0.91
+    }
+  }
+}
+```
+
+## Localization Payload
+
+Localization annotations live under `data[].events`. Each event is a point
+timestamp in milliseconds.
+
+```json
+{
+  "events": [
+    {
+      "head": "action",
+      "label": "pass",
+      "position_ms": 1240
+    },
+    {
+      "head": "action",
+      "label": "shot",
+      "position_ms": 4320,
+      "gameTime": "1 - 00:04",
+      "confidence_score": 0.84
+    }
+  ]
+}
+```
+
+OpenSportsLib localization prefers `position_ms` when present. If
+`position_ms` is missing, feature-based JSON loaders fall back to `gameTime`.
+For predictions and evaluation, current OpenSportsLib spotting outputs use
+`confidence` rather than the annotation-tool key `confidence_score`.
+
+## Description, Dense Description, And Q/A Payloads
+
+These payloads are part of the OSL JSON ecosystem. They are useful for datasets
+that need to round-trip through OpenSportsLab annotation tools, but they are not
+yet first-class OpenSportsLib training tasks.
+
+Clip-level captions:
+
+```json
+{
+  "captions": [
+    {
+      "lang": "en",
+      "text": "A quick attack ends with a shot on goal."
+    }
+  ]
+}
+```
+
+Timestamped dense captions:
+
+```json
+{
+  "dense_captions": [
+    {
+      "position_ms": 1100,
+      "lang": "en",
+      "text": "The midfielder plays a forward pass."
+    },
+    {
+      "position_ms": 3650,
+      "lang": "en",
+      "text": "The striker shoots from inside the area."
+    }
+  ]
+}
+```
+
+Grouped question/answer annotations:
+
+```json
+{
+  "answers": [
+    {
+      "question": "What happens after the pass?",
+      "answers": ["The receiving player shoots."]
+    }
+  ]
+}
+```
+
+## Complete Classification Example
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "task": "action_classification",
+  "dataset_name": "soccer-classification-demo",
+  "description": "Clip-level action labels.",
+  "modalities": ["video"],
+  "metadata": {
+    "sport": "soccer",
+    "split": "train"
+  },
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "foul"]
+    },
+    "attributes": {
+      "type": "multi_label",
+      "labels": ["left_foot", "header", "set_piece"]
+    }
+  },
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "labels": {
+        "action": {
+          "label": "shot"
+        },
+        "attributes": {
+          "labels": ["left_foot"]
+        }
+      },
+      "metadata": {
+        "match_id": "match_01"
+      }
+    }
+  ]
+}
+```
+
+## Complete Localization Example
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "task": "action_spotting",
+  "dataset_name": "soccer-localization-demo",
+  "description": "Timestamped action events.",
+  "modalities": ["video"],
+  "metadata": {
+    "sport": "soccer",
+    "split": "train"
+  },
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot", "save"]
+    }
+  },
+  "data": [
+    {
+      "id": "attack_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/attack_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "events": [
+        {
+          "head": "action",
+          "label": "pass",
+          "position_ms": 1100,
+          "gameTime": "1 - 00:01"
+        },
+        {
+          "head": "action",
+          "label": "shot",
+          "position_ms": 3650,
+          "gameTime": "1 - 00:04"
+        }
+      ]
+    }
+  ]
+}
+```
+
+## Multi-Modal Tracking Example
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "task": "action_classification",
+  "dataset_name": "soccer-gar-multimodal-demo",
+  "description": "Frames and tracking inputs for one action sample.",
+  "modalities": ["frames_npy", "tracking_parquet"],
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["PASS", "SHOT"]
+    }
+  },
+  "data": [
+    {
+      "id": "train_000001",
+      "inputs": [
+        {
+          "type": "frames_npy",
+          "path": "frames_npy/train/train_000001.npy",
+          "fps": 2.0
+        },
+        {
+          "type": "tracking_parquet",
+          "path": "tracking_parquet/train/train_000001.parquet"
+        }
+      ],
+      "labels": {
+        "action": {
+          "label": "PASS"
+        }
+      },
+      "metadata": {
+        "game_id": "game_001",
+        "position_ms": 124000,
+        "source_fps": 30.0,
+        "effective_fps": 2.0,
+        "window_size": 16,
+        "frame_interval": 15
+      }
+    }
+  ]
+}
+```
+
+## Prediction Payloads
+
+`infer()` returns predictions as an in-memory dictionary. It does not require
+the caller to provide an output path. Use `save_predictions(...)` when you want
+to write that dictionary to disk.
+
+Classification prediction example:
+
+```json
+{
+  "version": "2.0",
+  "task": "action_classification",
+  "date": "2026-05-19",
+  "metadata": {
+    "type": "predictions"
+  },
+  "data": [
+    {
+      "id": "clip_0001",
+      "labels": {
+        "action": {
+          "label": "shot",
+          "confidence": 0.91
+        }
+      }
+    }
+  ]
+}
+```
+
+Localization prediction example:
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "task": "action_spotting",
+  "metadata": {
+    "type": "predictions"
+  },
+  "data": [
+    {
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/attack_0001.mp4",
+          "fps": 2.0
+        }
+      ],
+      "events": [
+        {
+          "head": "action",
+          "label": "shot",
+          "frame": 73,
+          "position_ms": 36500,
+          "gameTime": "1 - 00:36",
+          "confidence": 0.84
+        }
+      ]
+    }
+  ]
+}
+```
+
+## Validation Checklist
+
+- `data` is a list.
+- Every supervised file has a root `labels` schema.
+- Classification samples use `labels.action.label` unless your code explicitly
+  passes a different task head.
+- Localization samples use `events[].position_ms` whenever possible.
+- Labels in samples/events are present in the matching root label list.
+- `inputs[].path` resolves from the expected split root or conversion
+  `media_root`.
+- Sample IDs are stable and unique, especially for evaluation.
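The validation checklist that closes `docs/data/osl-json-format.md` can be sketched as a small standalone checker. This is an illustrative sketch only: `validate_osl_json` and its `media_root` parameter are hypothetical names, not functions shipped with `opensportslib`, and the checks cover annotation files rather than prediction payloads (which may omit `id` or the root `labels` schema).

```python
from pathlib import Path


def validate_osl_json(doc, media_root=None):
    """Return a list of problems found in an OSL JSON annotation dict.

    Hypothetical helper for illustration; not an opensportslib function.
    """
    data = doc.get("data")
    if not isinstance(data, list):
        return ["`data` must be a list"]

    # Allowed class names per annotation head, from the root `labels` schema.
    schema = {
        head: set(spec.get("labels", []))
        for head, spec in doc.get("labels", {}).items()
    }

    problems = []
    seen_ids = set()
    for index, sample in enumerate(data):
        sample_id = sample.get("id")
        where = f"data[{index}] (id={sample_id!r})"
        if sample_id is None:
            problems.append(f"{where}: missing `id`")
        elif sample_id in seen_ids:
            problems.append(f"{where}: duplicate `id`")
        seen_ids.add(sample_id)

        # Collect (head, label) pairs from classification labels and events.
        pairs = []
        for head, payload in sample.get("labels", {}).items():
            if "label" in payload:
                pairs.append((head, payload["label"]))
            pairs.extend((head, name) for name in payload.get("labels", []))
        for event in sample.get("events", []):
            pairs.append((event.get("head"), event.get("label")))
            # `position_ms` is preferred; `gameTime` is the documented fallback.
            if "position_ms" not in event and "gameTime" not in event:
                problems.append(f"{where}: event has no `position_ms` or `gameTime`")

        for head, name in pairs:
            if name not in schema.get(head, set()):
                problems.append(f"{where}: label {name!r} is not in root head {head!r}")

        # Optional: check that relative input paths resolve under the split root.
        if media_root is not None:
            for item in sample.get("inputs", []):
                if not (Path(media_root) / item.get("path", "")).is_file():
                    problems.append(f"{where}: input path {item.get('path')!r} not found")
    return problems


example = {
    "labels": {"action": {"type": "single_label", "labels": ["pass", "shot"]}},
    "data": [
        {
            "id": "clip_0001",
            "inputs": [{"type": "video", "path": "clips/clip_0001.mp4", "fps": 25.0}],
            "labels": {"action": {"label": "shot"}},
        }
    ],
}
print(validate_osl_json(example))
```

Passing `media_root="dataset/train"` would also check that `clips/clip_0001.mp4` exists under that directory, mirroring how `DATA.train.video_path` is described as the split media root.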
diff --git a/docs/index.md b/docs/index.md
index eda5eb6..4387f82 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -32,6 +32,7 @@ From action recognition and temporal event spotting to retrieval and automatic c
 ## Quick links
 
 - [Installation](getting-started/installation.md)
+- [OSL JSON Format](data/osl-json-format.md)
 - [Project Structure](getting-started/project_structure.md)
 - [SLURM Guide (salloc, srun)](getting-started/slurm.md)
 - [Training & Inference](tni/tni.md)
@@ -49,4 +50,4 @@ This project offers two licensing options to suit different needs:
   Designed for commercial use, this option allows integration of the software into proprietary products and services without the open-source obligations of AGPL-3.0.  
   For commercial deployment, please contact the project maintainers to obtain a commercial license.
 
-**Contact:** OpenSportsLab / project maintainers
\ No newline at end of file
+**Contact:** OpenSportsLab / project maintainers
diff --git a/docs/tni/tni.md b/docs/tni/tni.md
index 69605d1..029a4ed 100644
--- a/docs/tni/tni.md
+++ b/docs/tni/tni.md
@@ -12,44 +12,305 @@ For full key-by-key config documentation and Python-only override workflow, see
 ---
 ## Configuration Sample (.yaml) file
 
-The examples below are included directly from the latest YAML files in
-`opensportslib/config/`, so the documentation stays aligned with the runnable
-configs.
+The snippets below show the main structure of the runnable configs in
+`opensportslib/config/`. Use the source files when you need the complete
+experiment defaults.
 
 ### 1. Classification
 
 ```yaml
---8<-- "opensportslib/config/classification.yaml"
+TASK: classification
+
+DATA:
+  dataset_name: mvfouls
+  data_dir: /path/to/OSL-XFoul/224p
+  data_modality: video
+  view_type: multi
+  train:
+    video_path: ${DATA.data_dir}/train
+    path: ${DATA.train.video_path}/train.json
+    dataloader:
+      batch_size: 8
+      shuffle: true
+      num_workers: 4
+  valid:
+    video_path: ${DATA.data_dir}/valid
+    path: ${DATA.valid.video_path}/valid.json
+    dataloader:
+      batch_size: 1
+      shuffle: false
+  test:
+    video_path: ${DATA.data_dir}/test
+    path: ${DATA.test.video_path}/test.json
+    dataloader:
+      batch_size: 1
+      shuffle: false
+  num_frames: 16
+  input_fps: 25
+  target_fps: 17
+  frame_size: [224, 224]
+
+MODEL:
+  type: custom
+  backbone:
+    type: mvit_v2_s
+  neck:
+    type: MV_Aggregate
+    agr_type: max
+  head:
+    type: MV_LinearLayer
+
+TRAIN:
+  monitor: balanced_accuracy
+  mode: max
+  epochs: 20
+  criterion:
+    type: CrossEntropyLoss
+  optimizer:
+    type: AdamW
+    lr: 0.0001
+
+SYSTEM:
+  save_dir: ./checkpoints
+  device: cuda
+  GPU: 4
 ```
 
 ### 2. Classification (Tracking)
 
 ```yaml
---8<-- "opensportslib/config/sngar-tracking.yaml"
+TASK: classification
+
+DATA:
+  dataset_name: sngar
+  data_modality: tracking_parquet
+  data_dir: /path/to/soccernetpro-classification-GAR/tracking-parquet
+  train:
+    video_path: ${DATA.data_dir}/train
+    path: ${DATA.train.video_path}/train.json
+    dataloader:
+      batch_size: 32
+      shuffle: true
+  valid:
+    video_path: ${DATA.data_dir}/valid
+    path: ${DATA.valid.video_path}/valid.json
+    dataloader:
+      batch_size: 32
+      shuffle: false
+  test:
+    video_path: ${DATA.data_dir}/test
+    path: ${DATA.test.video_path}/test.json
+    dataloader:
+      batch_size: 32
+      shuffle: false
+  num_frames: 16
+  frame_interval: 9
+  normalize: true
+  num_objects: 23
+  feature_dim: 8
+
+MODEL:
+  type: custom
+  backbone:
+    type: graph_conv
+    encoder: gin
+    hidden_dim: 64
+    num_layers: 20
+  neck:
+    type: TemporalAggregation
+    agr_type: maxpool
+  head:
+    type: TrackingClassifier
+    num_classes: 10
+  edge: positional
+  k: 8
+  r: 15.0
+
+TRAIN:
+  monitor: loss
+  mode: min
+  epochs: 100
+  optimizer:
+    type: Adam
+    lr: 0.001
+
+SYSTEM:
+  save_dir: ./checkpoints_tracking
+  device: cuda
+  GPU: 1
 ```
 
 ### 3. Localization
 
 ```yaml
---8<-- "opensportslib/config/localization.yaml"
+TASK: localization
+dali: true
+
+DATA:
+  dataset_name: SoccerNet
+  data_dir: /path/to/OSL-SNBAS/224p-2024
+  classes:
+    - PASS
+    - DRIVE
+    - HEADER
+    - HIGH PASS
+    - OUT
+    - CROSS
+    - THROW IN
+    - SHOT
+    - BALL PLAYER BLOCK
+    - PLAYER SUCCESSFUL TACKLE
+    - FREE KICK
+    - GOAL
+  modality: rgb
+  clip_len: 100
+  input_fps: 25
+  extract_fps: 2
+  target_height: 224
+  target_width: 398
+  train:
+    type: VideoGameWithDali
+    video_path: ${DATA.data_dir}/train
+    path: ${DATA.train.video_path}/train.json
+    dataloader:
+      batch_size: 8
+      shuffle: true
+  valid:
+    type: VideoGameWithDali
+    video_path: ${DATA.data_dir}/valid
+    path: ${DATA.valid.video_path}/valid.json
+    dataloader:
+      batch_size: 8
+      shuffle: true
+  test:
+    type: VideoGameWithDaliVideo
+    video_path: ${DATA.data_dir}/test
+    path: ${DATA.test.video_path}/test.json
+    results: results_spotting_test
+    nms_window: 2
+    metric: tight
+    overlap_len: 50
+
+MODEL:
+  type: E2E
+  runner:
+    type: runner_e2e
+  backbone:
+    type: rny008_gsm
+  head:
+    type: gru
+  multi_gpu: true
+
+TRAIN:
+  type: trainer_e2e
+  num_epochs: 10
+  criterion_valid: map
+  criterion:
+    type: CrossEntropyLoss
+  optimizer:
+    type: AdamWithScaler
+    lr: 0.01
+
+SYSTEM:
+  save_dir: ./checkpoints
+  work_dir: ${SYSTEM.save_dir}
+  device: cuda
+  GPU: 4
 ```
 
-## Annotations (train/valid/test) (.json) Format
-
-Download annotation files from the links below.
-
-### 1. Classification
+## Annotations (train/valid/test) JSON Format
+
+OpenSportsLib uses the OSL JSON v2.0 format for annotation files. Each split
+file is a JSON object with a root `labels` schema and a `data` array of samples.
+For the full schema, supported input types, multi-modal examples, and prediction
+payloads, see [OSL JSON Format](../data/osl-json-format.md).
+
+### Classification annotations
+
+Classification samples use `data[].labels.action.label` by default. The label
+must be present in the root `labels.action.labels` list.
+
+```json
+{
+  "version": "2.0",
+  "task": "action_classification",
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot"]
+    }
+  },
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "labels": {
+        "action": {
+          "label": "shot"
+        }
+      }
+    }
+  ]
+}
+```
 
-- **MVFouls**  
-  https://huggingface.co/datasets/OpenSportsLab/opensportslib-classification-vars/tree/mvfouls  
+For video classification, `inputs[].path` is resolved from the split media root
+in the YAML config, such as `DATA.train.video_path`. For tracking
+classification, use `type: tracking_parquet` and set
+`DATA.data_modality: tracking_parquet`.
+
+### Localization annotations
+
+Localization samples use `data[].events[]`. OpenSportsLib prefers
+`position_ms` and falls back to `gameTime` in feature-based JSON loaders.
+
+```json
+{
+  "version": "2.0",
+  "task": "action_spotting",
+  "labels": {
+    "action": {
+      "type": "single_label",
+      "labels": ["pass", "shot"]
+    }
+  },
+  "data": [
+    {
+      "id": "game_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "games/game_0001.mp4",
+          "fps": 25.0
+        }
+      ],
+      "events": [
+        {
+          "head": "action",
+          "label": "pass",
+          "position_ms": 1240,
+          "gameTime": "1 - 00:01"
+        }
+      ]
+    }
+  ]
+}
+```
 
-- **SVFouls**  
-  https://huggingface.co/datasets/OpenSportsLab/opensportslib-classification-vars/tree/svfouls  
+### Public example datasets
 
-### 2. Localization
+Download or inspect annotation files from:
 
-- **Ball Action Spotting**  
-  https://huggingface.co/datasets/OpenSportsLab/opensportslib-localization-snbas/tree/main  
+- **Classification: MVFouls and SVFouls**<br>
+  https://huggingface.co/datasets/OpenSportsLab/opensportslib-classification-vars
+- **Localization: Ball Action Spotting**<br>
+  https://huggingface.co/datasets/OpenSportsLab/opensportslib-localization-snbas
 
 
 ---
@@ -87,7 +348,7 @@ myModel.load_weights(weights=weights)
 ```
 
 ## Train on SINGLE GPU
-```bash
+```python
 from opensportslib import model
 import wandb
 
@@ -110,7 +371,7 @@ myModel.train(
 ```
 
 ## Train on Multiple GPU (DDP)
-```bash
+```python
 from opensportslib import model
 
 def main():
@@ -136,7 +397,7 @@ if __name__ == "__main__":
 
 
 ## Test / Inference on SINGLE GPU
-```bash
+```python
 from opensportslib import model
 
 # Load trained model
@@ -155,18 +416,27 @@ predictions = myModel.infer(
     test_set="/path/to/test_annotations.json",
 )
 
+saved_predictions = myModel.save_predictions(
+    output_path="/path/to/predictions.json",
+    predictions=predictions,
+)
+
 metrics = myModel.evaluate(
     test_set="/path/to/test_annotations.json",
 )
 
 metrics_from_saved_predictions = myModel.evaluate(
     test_set="/path/to/test_annotations.json",
-    predictions="/path/to/predictions.json",
+    predictions=saved_predictions,
 )
 ```
 
+`infer()` returns an in-memory OSL JSON-style prediction payload. It does not
+require an output path. `save_predictions(...)` is the explicit API for writing
+that payload to disk.
+
 ## Test / Inference on Multiple GPU (DDP)
-```bash
+```python
 from opensportslib import model
 
 def main():
diff --git a/docs/tools/dataset-conversion.md b/docs/tools/dataset-conversion.md
new file mode 100644
index 0000000..5f2e549
--- /dev/null
+++ b/docs/tools/dataset-conversion.md
@@ -0,0 +1,220 @@
+# Convert Tools
+
+Scripts for building OpenSportsLib (OSL) datasets from raw sources, and for
+converting OSL JSON annotations to and from a Parquet + WebDataset
+representation suited for large-scale training. For the annotation schema, see
+the OSL JSON format guide in `docs/data/osl-json-format.md`.
+
+## Scripts
+
+Build (raw source -> OSL JSON):
+
+- `build_soccernet_gar.py`: PFF FC raw data -> SoccerNet-GAR classification dataset (OSL JSON manifest of windowed action clips).
+- `build_soccernet_gar_action_spotting.py`: SoccerNet-GAR classification manifest -> SoccerNet-GAR action-spotting dataset (per-game manifest with
+  event timestamps).
+
+Convert (OSL JSON <-> Parquet + WebDataset):
+
+- `osl_json_to_parquet_webdataset.py`: OSL JSON -> Parquet + WebDataset.
+- `parquet_webdataset_to_osl_json.py`: Parquet + WebDataset -> OSL JSON.
+
+## Pipeline overview
+
+Stage 1 and stage 2 are SoccerNet-GAR-specific (they know about PFF schemas,
+event labels, and clip windowing). The conversion scripts are generic OSL
+tooling: they accept any OSL JSON manifest and do not assume a particular sport
+or task.
+
+## Build scripts
+
+### `build_soccernet_gar.py`
+
+Builds the SoccerNet-GAR classification dataset from raw PFF FC data. The script has two CLI subcommands:
+
+- `convert`: Organize raw PFF data into per-split (train/valid/test) folders and produce a per-split manifest JSON file. Tracking files (`.jsonl.bz2`) are converted to Parquet; video files are copied as-is.
+- `extract`: Read a `convert` output and emit fixed-length action clips centered on each annotated event. Each clip carries a tracking-window
+  (Parquet) and/or a frames-window (NumPy) modality.
+
+Per-sample metadata is written into each entry's `metadata` block in the OSL JSON, not at the top level. Fields: `game_id`, `game_time`, `position_ms`, `team`, `source_fps` (rate of the underlying video), `effective_fps` (rate the clip was sampled at), `window_size`, `frame_interval`.
+
+The `convert` subcommand reads from raw PFF FC data, available on Hugging Face: https://huggingface.co/datasets/OpenSportsLab/PFF. Download it so the local layout matches the structure below. If you only want to run `extract` against a previously-built dataset, you can skip this download.
+
+Expected input layout:
+
+```
+PFF-FC/
+├── RawEventsData/      # one .json per game (PFF event format)
+├── PlayerPoseTracking/ # one .jsonl.bz2 per game (PFF tracking)
+└── 224p/               # one .mp4 per game (broadcast video)
+```
+
+CLI usage:
+
+```bash
+# Stage 1a: convert tracking files (.jsonl.bz2 -> .parquet) per split
+python tools/convert/build_soccernet_gar.py convert \
+    --modality tracking \
+    --events-dir PFF-FC/RawEventsData \
+    --tracking-dir PFF-FC/PlayerPoseTracking \
+    --output-dir data/tracking_dataset \
+    --num-workers 24 \
+    --fps 30
+
+# Stage 1b: copy videos per split
+python tools/convert/build_soccernet_gar.py convert \
+    --modality video \
+    --events-dir PFF-FC/RawEventsData \
+    --video-dir PFF-FC/224p \
+    --output-dir data/video_dataset \
+    --fps 30
+
+# Stage 2: extract windowed clips (modality: frames, tracking, or both)
+python tools/convert/build_soccernet_gar.py extract \
+    --video-dir data/video_dataset \
+    --tracking-dir data/tracking_dataset \
+    --output-dir data/soccernet_gar \
+    --modality both \
+    --window-size 16 \
+    --frame-interval 9 \
+    --num-workers 24
+# Stage 2 alternative: express the sampling rate directly.
+# --target-fps replaces --frame-interval. The two are mutually exclusive.
+# Stride is derived per game as round(source_fps / target_fps);
+# a 16-frame window at 2 Hz covers 8 seconds.
+python tools/convert/build_soccernet_gar.py extract \
+    --video-dir data/video_dataset \
+    --tracking-dir data/tracking_dataset \
+    --output-dir data/soccernet_gar \
+    --modality both \
+    --window-size 16 \
+    --target-fps 2 \
+    --num-workers 24
+```
+
+Stage 1 has a skip-if-output-exists guard. If you change upstream data or the conversion logic, delete the output directory before rerunning.
+
+### `build_soccernet_gar_action_spotting.py`
+
+Reads a SoccerNet-GAR classification manifest (clip-level) and emits a SoccerNet-GAR action-spotting dataset (per-game manifest with all events
+sorted by `position_ms`). Splits are inherited from the input manifest, so each game stays in the split it had for classification.
+
+The script does not re-derive events; it groups the clips from the input manifest by `game_id` and reformats them. Two modalities are supported and are run independently:
+
+- `video`: copy `{game_id}.mp4` from `--source-dir` to
+  `{output-dir}/{split}/{game_id}.mp4`.
+- `tracking`: read `{game_id}.parquet` from
+  `{source-dir}/{split}/videos/`, sort by `(videoTimeMs, frameNum)`, drop
+  duplicate rows, and write to `{output-dir}/{split}/{game_id}.parquet`.
+
+CLI usage:
+
+```bash
+# video spotting dataset
+python tools/convert/build_soccernet_gar_action_spotting.py \
+    --modality video \
+    --manifest-dir sngar-frames \
+    --source-dir /path/to/PFF-FC/720p \
+    --output-dir data/spotting_video
+
+# tracking spotting dataset
+python tools/convert/build_soccernet_gar_action_spotting.py \
+    --modality tracking \
+    --manifest-dir sngar-frames \
+    --source-dir data/tracking_dataset \
+    --output-dir data/spotting_tracking
+```
+
+## Convert scripts
+
+### JSON -> Parquet + WebDataset
+
+```bash
+python tools/convert/osl_json_to_parquet_webdataset.py <json_path> <media_root> <output_dir> [options]
+```
+
+### Parquet + WebDataset -> JSON
+
+```bash
+python tools/convert/parquet_webdataset_to_osl_json.py <dataset_dir> <output_json_path> [options]
+```
+
+## Python API
+
+```python
+from opensportslib.tools import convert_json_to_parquet, convert_parquet_to_json
+```
+
+## Round-trip examples
+
+```bash
+# Localization
+python tools/convert/osl_json_to_parquet_webdataset.py \
+    /path/to/Localization/gymnastics/annotations.json \
+    /path/to/Localization/gymnastics \
+    /tmp/gymnastics_wds \
+    --overwrite
+
+python tools/convert/parquet_webdataset_to_osl_json.py \
+    /tmp/gymnastics_wds \
+    /tmp/gymnastics_reconstructed.json
+
+# Classification
+python tools/convert/osl_json_to_parquet_webdataset.py \
+    /path/to/Classification/svfouls/annotations_test.json \
+    /path/to/Classification/svfouls \
+    /path/to/svfouls_parquet_webdataset \
+    --shard-size 500MB \
+    --missing-policy skip \
+    --overwrite
+
+python tools/convert/parquet_webdataset_to_osl_json.py \
+    /path/to/svfouls_parquet_webdataset \
+    /path/to/svfouls_back_to_json/reconstructed_annotations.json \
+    --extract-media \
+    --output-media-root /path/to/svfouls_back_to_json \
+    --indent 2
+
+# SN-GAR-tracking
+python tools/convert/osl_json_to_parquet_webdataset.py \
+    /path/to/sngar-tracking/annotations_test.json \
+    /path/to/sngar-tracking \
+    /path/to/sngar-tracking_parquet_webdataset \
+    --shard-size 500MB \
+    --missing-policy skip \
+    --overwrite
+
+python tools/convert/parquet_webdataset_to_osl_json.py \
+    /path/to/sngar-tracking_parquet_webdataset \
+    /path/to/sngar-tracking_back_to_json/reconstructed_annotations.json \
+    --extract-media \
+    --output-media-root /path/to/sngar-tracking_back_to_json \
+    --indent 2
+```
+
+## End-to-end SoccerNet-GAR example
+
+Build the classification dataset from raw PFF, derive the spotting variant, then convert both to Parquet + WebDataset:
+
+```bash
+# 1. classification: raw PFF -> OSL JSON (clip-level)
+python tools/convert/build_soccernet_gar.py convert --modality tracking
+python tools/convert/build_soccernet_gar.py convert --modality video
+python tools/convert/build_soccernet_gar.py extract --modality both \
+    --output-dir data/sngar_frames
+
+# 2. spotting: classification manifest -> per-game OSL JSON
+python tools/convert/build_soccernet_gar_action_spotting.py \
+    --modality tracking \
+    --manifest-dir data/sngar_frames \
+    --source-dir data/tracking_dataset \
+    --output-dir data/spotting_tracking
+
+# 3. either dataset -> Parquet + WebDataset for training
+python tools/convert/osl_json_to_parquet_webdataset.py \
+    data/sngar_frames/annotations_train.json \
+    data/sngar_frames \
+    data/sngar_frames_wds \
+    --shard-size 500MB \
+    --missing-policy skip \
+    --overwrite
+```
diff --git a/docs/tools/hf-dataset-transfer.md b/docs/tools/hf-dataset-transfer.md
new file mode 100644
index 0000000..9fe65bd
--- /dev/null
+++ b/docs/tools/hf-dataset-transfer.md
@@ -0,0 +1,159 @@
+# Download Tools
+
+Scripts to download and upload OSL datasets via Hugging Face Hub. These tools
+read file references from OSL JSON `data[].inputs[]`; see
+`docs/data/osl-json-format.md` for the dataset schema.
+
+## Scripts
+
+- `download_osl_hf.py`
+	- Downloads an OSL split by repo, revision, and split name.
+	- JSON mode downloads `<split>.json` and all referenced inputs; Parquet mode downloads `<split>/`.
+- `download_hf_repo.py`
+	- Downloads a full Hugging Face repository snapshot for a given repo and revision.
+	- Best when you want the entire repo content for a branch/tag/commit.
+- `upload_osl_hf.py`
+	- Uploads local dataset inputs from JSON to a Hugging Face dataset repo.
+	- Automatically creates the target dataset repo if it does not exist.
+	- Automatically creates the target revision branch when `--revision` is not `main` and the branch is missing.
+
+## Full-repo download (recommended for complete branches)
+
+Basic usage:
+
+```bash
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/OSL-XFoul \
+	--revision main-parquet \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/OSL-XFoul/main-parquet
+```
+
+Examples for individual repos and revisions:
+
+```bash
+# OSL-XFoul
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/OSL-XFoul \
+	--revision 224p \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/OSL-XFoul/224p
+
+# SoccerNet localization SNAS (224p and 720p)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-localization-snas \
+	--revision 224p \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snas/224p
+
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-localization-snas \
+	--revision 720p \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snas/720p
+
+# SoccerNet localization SNAS (ResNET_PCA512)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-localization-snas \
+	--revision ResNET_PCA512 \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snas/ResNET_PCA512
+
+# SoccerNet localization SNBAS (224p-2023)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-localization-snbas \
+	--revision 224p-2023 \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snbas/224p-2023
+
+# SoccerNet classification VARS (mvfouls)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-classification-vars \
+	--revision mvfouls \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-classification-vars/mvfouls
+
+# SoccerNet classification GAR (tracking-parquet, gated)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-classification-GAR \
+	--revision tracking-parquet \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-classification-GAR/tracking-parquet \
+	--token hf_xxx
+
+# SoccerNet classification GAR (frames-parquet, gated)
+python tools/download/download_hf_repo.py \
+	--repo-id OpenSportsLab/soccernetpro-classification-GAR \
+	--revision frames-parquet \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/soccernetpro-classification-GAR/frames-parquet \
+	--token hf_xxx
+```
+
+SLURM equivalent using positional args:
+
+```bash
+sbatch tools/slurm/datasets/download_hf_repo.sbatch \
+	OpenSportsLab/soccernetpro-localization-snas \
+	224p \
+	/ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snas/224p
+```
+
+## Targeted download from OSL JSON or folder URL
+
+```bash
+for revision in 224p 720p; do
+for split in test valid train; do
+python tools/download/download_osl_hf.py \
+	--repo-id OpenSportsLab/OSL-XFoul --revision $revision --split $split --format parquet \
+	--output-dir /ibex/project/c2134/opensportslab/datasets/OSL-XFoul
+done
+done
+```
+
+The split downloader treats `--output-dir` as a root and writes files under
+`<output-dir>/<revision>/<split>`.
+
+## Upload
+
+```bash
+# JSON mode (upload dataset JSON + referenced input files)
+python tools/download/upload_osl_hf.py \
+	--repo-id <org/repo> \
+	--json-path <local_dataset.json> \
+	--split test \
+	--format json \
+	--revision main
+
+# Parquet mode (convert to Parquet + WebDataset and upload folder)
+python tools/download/upload_osl_hf.py \
+	--repo-id <org/repo> \
+	--json-path <local_dataset.json> \
+	--split test \
+	--format parquet \
+	--shard-size 1GB \
+	--revision main
+```
+
+```bash
+for revision in ResNET_PCA512 224p 720p; do
+case "$revision" in
+ResNET_PCA512) shard_size="1GB" ;;
+224p) shard_size="5GB" ;;
+720p) shard_size="20GB" ;;
+*) shard_size="5GB" ;;
+esac
+
+for split in test valid train challenge; do
+python tools/download/upload_osl_hf.py \
+	--repo-id OpenSportsLab/OSL-SoccerNet --revision $revision --split $split --format parquet --shard-size $shard_size \
+	--json-path /ibex/project/c2134/opensportslab/datasets/soccernetpro-localization-snas/$revision/$split.json
+done
+done
+```
+
+
+
+## Notes
+
+- Gated repos require accepted access terms and authentication (`huggingface-cli login` or `--token`).
+- `download_hf_repo.py` accepts `--repo-type` (`dataset`, `model`, `space`) and optional `--ignore` glob patterns.
+- `upload_osl_hf.py` accepts `--format` (`json`, `parquet`).
+- In Parquet mode, output is uploaded under a folder named after the JSON file stem.
+
+## Python API
+
+```python
+from opensportslib.tools import download_dataset_split_from_hf, upload_dataset_inputs_from_json_to_hf
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 42a74df..e357d72 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -38,6 +38,13 @@ nav:
       - Training & Inference: tni/tni.md
       - Configuration Guide: tni/config-guide.md
 
+  - Data Formats:
+      - OSL JSON Format: data/osl-json-format.md
+
+  - Tools:
+      - Dataset Conversion: tools/dataset-conversion.md
+      - Hugging Face Dataset Transfer: tools/hf-dataset-transfer.md
+
   - API Reference:
       - API: api/api.md
   
diff --git a/opensportslib/apis/README.md b/opensportslib/apis/README.md
index 318e081..1ea33aa 100644
--- a/opensportslib/apis/README.md
+++ b/opensportslib/apis/README.md
@@ -30,13 +30,22 @@ Each task model exposes:
 
 Current behavior:
 
-- `infer()` runs the model on `test_set` and returns predictions directly as an in-memory OSL JSON payload (including confidence scores when provided by the task output format)
-- `infer()` does not write predictions to disk
-- `evaluate()` runs inference on `test_set` and computes metrics against that same test set ground truth when `predictions` is not provided
-- `evaluate(predictions=...)` skips inference and evaluates an in-memory predictions dict or prediction file path directly
-- `save_predictions(output_path=..., predictions=...)` saves an OSL JSON predictions payload to a file
-- `ClassificationModel(config=..., weights=...)` uses constructor weights as the default for later `train()` / `infer()` calls
-- `LocalizationModel(config=..., weights=...)` stores constructor weights lazily and loads them on the first `train()` / `infer()` call that needs them
+| Method | Main inputs | Returns | Notes |
+| --- | --- | --- | --- |
+| `load_weights(weights=...)` | Local checkpoint path or Hugging Face model ID | `None` | Loads weights into the task wrapper. |
+| `train(train_set=..., valid_set=...)` | OSL JSON train/validation files | Best checkpoint path or `None` | Split paths can also come from the YAML config. |
+| `infer(test_set=...)` | OSL JSON test/inference file | In-memory OSL JSON-style prediction dict | The public API does not require an output path. Use `save_predictions(...)` for explicit persistence. |
+| `evaluate(test_set=...)` | OSL JSON test file | Metrics dict | Runs inference first when `predictions` is not provided. |
+| `evaluate(test_set=..., predictions=...)` | OSL JSON test file plus prediction dict/path | Metrics dict | Skips inference and evaluates the provided predictions. |
+| `save_predictions(output_path=..., predictions=...)` | Output file path plus the prediction dict returned by `infer()` | Saved file path | Explicitly writes an OSL JSON prediction payload to disk. |
+
+Additional weight behavior:
+
+- `ClassificationModel(config=..., weights=...)` uses constructor weights as the default for later `train()` / `infer()` calls.
+- `LocalizationModel(config=..., weights=...)` stores constructor weights lazily and loads them on the first `train()` / `infer()` call that needs them.
+
+Annotation and prediction payloads follow the OSL JSON data model. For the full
+schema, see the docs page `docs/data/osl-json-format.md`.
 
 ## Minimal Usage
 
diff --git a/tools/README.md b/tools/README.md
index 33f328b..71034a9 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -35,6 +35,7 @@ tools/
 
 ## Folder guides
 
+- See [docs/data/osl-json-format.md](../docs/data/osl-json-format.md) for the OSL JSON schema used by dataset tools.
 - See [tools/convert/README.md](convert/README.md) for conversion scripts and examples.
 - See [tools/download/README.md](download/README.md) for HuggingFace download/upload scripts.
 - See [tools/slurm/README.md](slurm/README.md) for Ibex SLURM workflows (`salloc`, `srun`, `sbatch`).
diff --git a/tools/convert/README.md b/tools/convert/README.md
index 97259e1..5f2e549 100644
--- a/tools/convert/README.md
+++ b/tools/convert/README.md
@@ -1,7 +1,9 @@
 # Convert Tools
 
-Scripts for building OpenSportsLib (OSL) datasets from raw sources, and for converting OSL JSON annotations to and from a Parquet + WebDataset
-representation suited for large-scale training.
+Scripts for building OpenSportsLib (OSL) datasets from raw sources, and for
+converting OSL JSON annotations to and from a Parquet + WebDataset
+representation suited for large-scale training. For the annotation schema, see
+the OSL JSON format guide in `docs/data/osl-json-format.md`.
 
 ## Scripts
 
@@ -18,7 +20,10 @@ Convert (OSL JSON <-> Parquet + WebDataset):
 
 ## Pipeline overview
 
-Stage 1 and stage 2 are SoccerNet-GAR-specific (they know about PFF schemas, event labels, and clip windowing). The conversion scripts at the right are generic OSL tooling: they accept any OSL JSON manifest and don't assume a particular sport or task.
+Stage 1 and stage 2 are SoccerNet-GAR-specific (they know about PFF schemas,
+event labels, and clip windowing). The conversion scripts are generic OSL
+tooling: they accept any OSL JSON manifest and do not assume a particular sport
+or task.
 
 ## Build scripts
 
@@ -72,9 +77,10 @@ python tools/convert/build_soccernet_gar.py extract \
     --window-size 16 \
     --frame-interval 9 \
     --num-workers 24
-```
-
-# Stage 2 alternative: express the sampling rate directly. --target-fps replaces --frame-interval. The two are mutually exclusive. Stride is derived per game as round(source_fps / target_fps); a 16-frame window at 2 Hz covers 8 seconds.
+# Stage 2 alternative: express the sampling rate directly.
+# --target-fps replaces --frame-interval. The two are mutually exclusive.
+# Stride is derived per game as round(source_fps / target_fps);
+# a 16-frame window at 2 Hz covers 8 seconds.
 python tools/convert/build_soccernet_gar.py extract \
     --video-dir data/video_dataset \
     --tracking-dir data/tracking_dataset \
@@ -211,4 +217,4 @@ python tools/convert/osl_json_to_parquet_webdataset.py \
     --shard-size 500MB \
     --missing-policy skip \
     --overwrite
-```
\ No newline at end of file
+```
diff --git a/tools/download/README.md b/tools/download/README.md
index 61ed064..9fe65bd 100644
--- a/tools/download/README.md
+++ b/tools/download/README.md
@@ -1,6 +1,8 @@
 # Download Tools
 
-Scripts to download and upload OSL datasets via HuggingFace Hub.
+Scripts to download and upload OSL datasets via Hugging Face Hub. These tools
+read file references from OSL JSON `data[].inputs[]`; see
+`docs/data/osl-json-format.md` for the dataset schema.
 
 ## Scripts
 
@@ -143,7 +145,6 @@ done
 
 
 
-```
 ## Notes
 
 - Gated repos require accepted access terms and authentication (`huggingface-cli login` or `--token`).