isu-score-parser is a versatile tool designed to extract and structure figure skating data, including results, metadata, and technical protocols.
While it is optimized for Synchronized Skating, it supports most artistic disciplines. The project is split into two independent modules, so you can use only the one you need.
Note: Maintenance is primarily focused on Synchronized Skating. While other disciplines are generally supported, full compatibility is not guaranteed if their PDF or page structures deviate significantly from the tested standards.
For sample outputs, see the test-outputs folder in the test branch.
Extracts locations, results, and panel data from ISU event pages.
- Retro-compatible: Supports both modern and legacy competition page layouts.
- Wayback Machine Integration: Automatically explores archived pages and their relative links using the archive.org API.
- Robust Parsing: Handles various statuses (Ranked, Withdrawn, Did Not Reach Final).
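
As an illustration of the Wayback Machine lookup mentioned above, here is a minimal sketch that asks archive.org for the closest archived snapshot of a page. It is not the project's own code: the README does not say which archive.org endpoint the scraper uses, so the availability endpoint is an assumption, and the function names `build_availability_url`, `closest_snapshot`, and `find_archived_page` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL for a page, optionally near a given timestamp."""
    params = {"url": page_url}
    if timestamp:
        # Timestamp format: YYYYMMDD, optionally followed by hhmmss.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the URL of the closest available snapshot, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_page(page_url, timestamp=None):
    """Query archive.org for a snapshot (requires network access)."""
    query = build_availability_url(page_url, timestamp)
    with urlopen(query, timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Keeping the response parsing in `closest_snapshot` separate from the network call makes it possible to check the logic against a saved response without contacting archive.org.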
`pip install requests lxml pandas beautifulsoup4 regex`

1. Scrape an event page
Extract metadata and results. You can also trigger the PDF download immediately with the -d flag.
`python3 main.py event scrape <url> [OPTIONS]`

| Option | Description |
|---|---|
| `-d`, `--download-pdf` | Download the score PDFs found during scraping. |
| `-o`, `--output-dir` | Output directory. Created if it doesn't exist. If not specified, a generic directory will be created. |

2. Download PDFs from a JSON output
If you already have a JSON result from a previous scrape, use this to fetch the PDF files.
`python3 main.py event dl <FILE.json> [OPTIONS]`

| Option | Description |
|---|---|
| `-o`, `--output-dir` | Output directory name. Created if it doesn't exist. Defaults to the same directory as the JSON file. |

Examples
`python3 main.py event dl example.json -o Directory`

`python3 main.py event scrape https://example.com -o Directory`

A tool to extract score tables from synchronized skating score PDFs using Python. The extracted score tables are stored in JSON files and can be supplemented by passing a YAML file to the parser. The parser also supports other artistic disciplines.

Features:
- Backward compatible (back to 2005)
- Multiple discipline support
- Base value bonus support
- Deduction votes support
- No call support
Requires:
- Python 3.10+
- Python dependencies:
  - pandas (2.3.3)
  - camelot-py (1.0.9)
  - pdfplumber (0.11.9)
  - PyYAML (6.0.3) (optional: install it only if you intend to use a YAML file to complete your output)

`pip install pandas "camelot-py[base]" pdfplumber pyyaml`

Use the following options to parse your PDF:

| Option | Required | Description |
|---|---|---|
| `-p`, `--pdf` | yes | PDF file path |
| `-y`, `--yaml` | no | YAML file path used to complete the competition info |
| `-b`, `--begin` | yes | First page to parse |
| `-e`, `--end` | no | Last page to parse. If not specified, only the page given by `--begin` is parsed. |
| `-o`, `--output` | no | Output directory. Created if it doesn't exist. If not specified, a generic output directory is created for the generated JSON files. |

Usage:

`python3 main.py [OPTIONS]`

With a YAML file following this pattern:

    schema_version: 1
    competition:
      name: ISU World Synchronized Skating Championships
      location:
        country: SWE
        city: Stockholm
      date: 2018-04-06
      season: 2017-2018
      source_url: example.org

None of the entries (except `schema_version`) are required when parsing. You can remove any entry for which data is missing.

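
To make the options above concrete, the following sketch assembles the parser command line from Python and runs it once per PDF. It is only an illustration under stated assumptions: the helper names `build_parse_command` and `parse_many` are hypothetical, the script is assumed to be launched from the directory containing the PDF parser's `main.py`, and only the flags documented in the options table are used.

```python
import subprocess


def build_parse_command(pdf, begin, end=None, yaml_file=None, output_dir=None):
    """Assemble the argument list for one run of the PDF parser."""
    # --pdf and --begin are the two required options.
    command = ["python3", "main.py", "--pdf", str(pdf), "--begin", str(begin)]
    if end is not None:
        command += ["--end", str(end)]
    if yaml_file is not None:
        command += ["--yaml", str(yaml_file)]
    if output_dir is not None:
        command += ["--output", str(output_dir)]
    return command


def parse_many(jobs, output_dir=None):
    """Run the parser once per (pdf, begin, end) tuple, stopping on the first failure."""
    for pdf, begin, end in jobs:
        command = build_parse_command(pdf, begin, end=end, output_dir=output_dir)
        subprocess.run(command, check=True)
```

For example, `parse_many([("short.pdf", 1, 4), ("free.pdf", 2, None)], output_dir="out")` would invoke the parser twice, one PDF per run. The file names here are placeholders.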
- Possibility to parse multiple PDFs at a time.
- Connect the two modules.