isu-score-parser

isu-score-parser is a versatile tool designed to extract and structure figure skating data, including results, metadata, and technical protocols.

While it is optimized for Synchronized Skating, it supports most artistic disciplines. The project is split into two independent modules, so you can use only the one you need.

Note

Maintenance is primarily focused on Synchronized Skating. While other disciplines are generally supported, full compatibility is not guaranteed if their PDF or page structures deviate significantly from the tested standards.

Example outputs are available in the test-outputs folder on the test branch.

Event Data Scraper

Extracts locations, results, and panel data from ISU event pages.

Features

Backward compatible: Supports both modern and legacy competition page layouts.
Wayback Machine Integration: Automatically explores archived pages and their relative links using the archive.org API.
Robust Parsing: Handles various statuses (Ranked, Withdrawn, Did Not Reach Final).
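
The Wayback Machine lookup can be illustrated with archive.org's public availability endpoint. This is a minimal sketch rather than the scraper's actual code: the README does not say which archive.org endpoint the tool calls, and the helper names `availability_url` and `closest_snapshot` are hypothetical.

```python
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
# Assumption: the scraper may rely on a different archive.org API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking for the snapshot closest to `timestamp` (YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

With requests installed, the decoded result of `requests.get(availability_url(page_url), timeout=30).json()` can be passed straight to `closest_snapshot`; a `None` result means no archived copy was found.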

Installation

pip install requests lxml pandas beautifulsoup4 regex

Usage

1. Scrape an event page

Extract metadata and results. You can also trigger the PDF download immediately with the -d flag.

python3 main.py event scrape <url> [OPTIONS]

Option	Description
`-d, --download-pdf`	Download the score PDFs found during scraping.
`-o, --output-dir`	Output directory. Created if it doesn't exist. If not specified, a default directory is created.

2. Download PDFs from a JSON output

If you already have a JSON result from a previous scrape, use this to fetch the PDF files.

python3 main.py event dl <FILE.json> [OPTIONS]

Option	Description
`-o, --output-dir`	Output directory name. Created if it doesn't exist. Defaults to the same directory as the JSON file.

Examples

python3 main.py event dl example.json -o Directory

python3 main.py event scrape https://example.com -o Directory

Extract PDF scores

A tool to extract score tables from Synchronized Skating score PDFs using Python. The extracted score tables are stored in JSON files and can be completed by passing a YAML file to the parser. The parser also supports other artistic disciplines.

Features

Backward compatible (back to 2005)
Multiple discipline support
Base value bonus support
Deduction votes support
No-call support

Installation

Requires:

Python 3.10+
Python dependencies:
- pandas (2.3.3)
- camelot-py (1.0.9)
- pdfplumber (0.11.9)
- PyYAML (6.0.3) (optional: install it only if you intend to use a YAML file to complete your output)

pip install pandas "camelot-py[base]" pdfplumber pyyaml

Usage

Use the following options to parse your PDF:

Option	Required	Description
`-p`, `--pdf`	yes	PDF file path
`-y`, `--yaml`	no	YAML file path to complete the competition info
`-b`, `--begin`	yes	First page to parse
`-e`, `--end`	no	Last page to parse. If not specified, only the page given with `--begin` is parsed.
`-o`, `--output`	no	Output directory for the generated JSON files. Created if it doesn't exist. If not specified, a default output directory is created.

Usage:

python3 main.py [OPTIONS]
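
To make the option semantics concrete, here is a hedged sketch of how the flags in the table could be declared with `argparse`. It is not the project's actual code: `build_parser` and `page_range` are hypothetical names, and rejecting an end page smaller than the first page is an assumption the README does not state.

```python
import argparse


def build_parser():
    """Argument parser mirroring the options table (a sketch, not the project's code)."""
    parser = argparse.ArgumentParser(description="Extract score tables from a PDF")
    parser.add_argument("-p", "--pdf", required=True, help="PDF file path")
    parser.add_argument("-y", "--yaml", help="YAML file completing the competition info")
    parser.add_argument("-b", "--begin", type=int, required=True, help="First page to parse")
    parser.add_argument("-e", "--end", type=int, help="Last page to parse")
    parser.add_argument("-o", "--output", help="Output directory")
    return parser


def page_range(args):
    """Pages to parse, inclusive; only the first page when --end is omitted."""
    last = args.end if args.end is not None else args.begin
    if last < args.begin:
        # Assumption: the real tool's behavior for this case is not documented.
        raise ValueError("--end must not be smaller than --begin")
    return range(args.begin, last + 1)


args = build_parser().parse_args(["-p", "scores.pdf", "-b", "3", "-e", "5"])
print(list(page_range(args)))  # [3, 4, 5]
```

Omitting `-e` in the call above would yield a single-page range, which matches the behavior described for `--end`.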

Add info to the generated JSON files

Provide a YAML file following this pattern:

schema_version: 1
competition:
  name: ISU World Synchronized Skating Championships
  location:
    country: SWE
    city: Stockholm
  date: 2018-04-06
season: 2017-2018
source_url: example.org

None of the entries (except schema_version) are required when parsing. You can remove any entry for which data is missing.
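
The sketch below shows one way such a file could be loaded and combined with parsed scores. It is an illustration under assumptions, not the parser's implementation: the function names, the `results` key, and the flat merge layout are all hypothetical. One detail worth noting is that PyYAML's `safe_load` turns `2018-04-06` into a `datetime.date`, which `json.dumps` cannot serialize without a fallback such as `default=str`.

```python
import json
from datetime import date


def load_metadata(yaml_path):
    """Read the YAML file; PyYAML is imported lazily because it is optional."""
    import yaml

    with open(yaml_path, encoding="utf-8") as handle:
        data = yaml.safe_load(handle) or {}
    if "schema_version" not in data:
        raise ValueError("schema_version is required")
    return data


def merge_metadata(scores, metadata):
    """Return a copy of `scores` with the metadata keys added (hypothetical layout)."""
    merged = dict(scores)
    merged.update({k: v for k, v in metadata.items() if k != "schema_version"})
    return merged


def to_json(document):
    """Serialize; default=str covers dates, which safe_load returns as datetime.date."""
    return json.dumps(document, indent=2, default=str)


# Stand-in for what load_metadata would return for a trimmed version of the sample file.
metadata = {
    "schema_version": 1,
    "competition": {"date": date(2018, 4, 6)},
    "season": "2017-2018",
}
merged = merge_metadata({"results": []}, metadata)
print(to_json(merged))
```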

Future Objectives

Parse multiple PDFs at a time.
Connect the two modules.
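
Until batch parsing is built in, several PDFs can be processed by calling the CLI once per file. The following is a rough workaround sketch, not part of the project: `build_commands` and `run_all` are hypothetical helpers, the script is assumed to run from the repository root, and the same page range is applied to every PDF, which will not suit files with different layouts.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(pdf_dir, first_page, last_page, output_dir):
    """One parser command per PDF in `pdf_dir`, using the documented options."""
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        commands.append([
            sys.executable, "main.py",
            "-p", str(pdf),
            "-b", str(first_page),
            "-e", str(last_page),
            # Assumption: one output sub-directory per PDF, named after the file.
            "-o", str(Path(output_dir) / pdf.stem),
        ])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(build_commands("pdfs", 1, 4, "out"))` would parse pages 1 to 4 of every PDF found in a `pdfs` folder.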
