This repository contains an end-to-end pipeline that starts from raw tennis match audio and ends with structured rally transcriptions and automatically generated broadcast-style commentary. It combines classic signal processing, machine learning (SVM, XGBoost, Random Forest), transformer-based audio models, and large language models (LLMs) to understand and narrate tennis points.
- `Data/`: Processed datasets used across detection, classification, and rally-level modeling.
  - `Processed/Detection/`: Windowed event-vs-no-event datasets built around audio peaks for multiple recording sessions. For each CSV, `processed_data_description.json` documents the source audio, window size, positive/negative sampling strategy, and which time spans were held out as rally test segments.
  - `Processed/Classification/`: Aggregated features for downstream classification tasks (e.g., shot type, spin vs slice, net events), as well as rally-level feature summaries.
  - `audio and annotations/`: Intended location for raw audio files and manual annotations. This folder is empty in the repository because the original audio cannot be shared on GitHub; you must provide your own compatible recordings and annotations to fully reproduce the results.
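As a minimal sketch of how the dataset metadata could be consumed: the snippet below parses a description file and looks up the held-out rally spans for one CSV. The key names (`source_audio`, `window_size_s`, `neg_to_pos_ratio`, `held_out_rally_spans`) are illustrative assumptions, not the real schema of `processed_data_description.json`; consult the file itself.

```python
import json

# Hypothetical sketch: the key names below are assumptions for illustration,
# not the real schema of processed_data_description.json.
raw = """
{
  "session1_detection.csv": {
    "source_audio": "session1.wav",
    "window_size_s": 0.5,
    "neg_to_pos_ratio": 5,
    "held_out_rally_spans": [[120.0, 135.5]]
  }
}
"""
description = json.loads(raw)  # in practice: json.load() on the description file

def held_out_spans(desc, csv_name):
    """Rally test segments (start, end in seconds) held out for a dataset CSV."""
    return desc.get(csv_name, {}).get("held_out_rally_spans", [])

print(held_out_spans(description, "session1_detection.csv"))  # [[120.0, 135.5]]
```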
- `Event Classification/`: Models and notebooks for classifying what happened during a tennis shot, using mostly per-shot features.
  - Spin / slice classification (`Spin Slice Classification/Spin Slice Transformer Classifier.ipynb`): Uses a Wav2Vec2-based transformer (and SVM baselines) to distinguish spin vs slice (binary) and, in extended versions, none/slice/spin. Includes preprocessing from raw audio, handling of class imbalance (weighting and sampling), and comparison to PCA + SVM pipelines.
  - Net shot classification (`Net_Shot_Classifier/net_shot_classifier.ipynb`, `xgb_net_shot_model.joblib`): Binary classifier to determine whether a shot hit the net (net = 1) or not (net = 0) using MFCC and spectral features. Trains SVM and XGBoost models with cross-validation and standard evaluation (confusion matrices, ROC curves), and can batch-predict on unseen data.
  - TDOA and distance estimation (`TDOA_Calculator/tdoa_calculator.ipynb`, `distance_classifier.pkl`, `distance_scaler.pkl`): Computes time-difference-of-arrival (TDOA) and other spatial audio features from stereo recordings to classify:
    - Side of court: left / right / center
    - Distance from net: deep / short / at net

    A Random Forest classifier is trained on features like ILD, RMS, and spectral centroid to estimate shot position relative to the net.
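To make the spatial features concrete, here is a minimal sketch of estimating TDOA via plain cross-correlation and computing the interaural level difference (ILD) from per-channel RMS. This is an assumption about the general technique, not the exact method in `tdoa_calculator.ipynb`; sample rate, window, and peak-picking details would differ on real recordings.

```python
import numpy as np

def estimate_tdoa(left, right, sr):
    """Estimate time difference of arrival between two channels via full
    cross-correlation; a positive lag means the sound reached the left
    channel later. A generic sketch, not the notebook's exact method."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / sr

def interaural_level_difference(left, right):
    """ILD in dB from per-channel RMS; a common side-of-court cue."""
    rms_l = np.sqrt(np.mean(left ** 2))
    rms_r = np.sqrt(np.mean(right ** 2))
    return 20 * np.log10(rms_l / rms_r)

# Synthetic check: the same click, delayed by 5 samples on the left channel.
sr = 8000
click = np.zeros(256)
click[100] = 1.0
delayed = np.roll(click, 5)
print(estimate_tdoa(delayed, click, sr))  # 5 samples / 8000 Hz = 0.000625 s
```

On real stereo audio, GCC-PHAT or band-limited cross-correlation is usually more robust than raw correlation, but the lag-to-seconds conversion is the same.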
  - Commentary prototype (LLM-based, described in `Event Classification/info.md`): A notebook referred to as `llm.ipynb` generates template commentary from rally events (serves, shots, outcomes), rewrites it with an instruction-tuned LLM (e.g., Mistral-7B-Instruct) into natural broadcast-style text, and optionally applies text-to-speech (gTTS) to produce audio commentary clips.
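A minimal sketch of the template stage: turn structured rally events into a plain-language draft that the LLM (and optionally gTTS) would then polish. The event field names (`server`, `shots`, `winner`, etc.) are hypothetical, not the notebook's actual schema.

```python
# Hypothetical sketch of template commentary generation; field names
# are assumptions, not the actual rally-event schema used in llm.ipynb.
def rally_to_template(rally):
    parts = [f"{rally['server']} serves"]
    for shot in rally.get("shots", []):
        parts.append(f"{shot['player']} replies with a {shot['type']}")
    parts.append(f"point goes to {rally['winner']}")
    return ", ".join(parts) + "."

rally = {
    "server": "Player A",
    "shots": [{"player": "Player B", "type": "slice"}],
    "winner": "Player A",
}
print(rally_to_template(rally))
# Player A serves, Player B replies with a slice, point goes to Player A.
```

The LLM rewrite step then receives this draft as context and is prompted to produce broadcast-style phrasing.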
- `Event Classification PT/`: PyTorch/SVM experiments focused on side and serve detection.
  - `court_side.ipynb`: Explores stereo channel waveforms and RMS values to validate simple heuristics for determining which side of the court a shot originates from. Demonstrates that alternating shots in a long rally can often be captured with per-channel RMS thresholds, and also shows cases where RMS alone is insufficient for event vs no-event separation.
  - `serve_detection.ipynb`: Binary SVM classifier for detecting serve events. Compares different MFCC configurations and performs hyperparameter tuning via cross-validation. The `Models/` folder contains the best-performing SVM checkpoints (optimized separately for recall and F1).
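The per-channel RMS heuristic can be sketched in a few lines: whichever stereo channel is louder is taken as the side the shot came from. The decibel margin below is an illustrative assumption, not a value taken from `court_side.ipynb`.

```python
import numpy as np

# Minimal sketch of the per-channel RMS side-of-court heuristic.
# The 3 dB margin is an illustrative assumption, not the notebook's value.
def side_from_rms(left, right, margin_db=3.0):
    rms_l = np.sqrt(np.mean(np.square(left)))
    rms_r = np.sqrt(np.mean(np.square(right)))
    diff_db = 20 * np.log10((rms_l + 1e-12) / (rms_r + 1e-12))
    if diff_db > margin_db:
        return "left"
    if diff_db < -margin_db:
        return "right"
    return "uncertain"  # RMS alone is sometimes insufficient, as the notebook shows

loud = 0.5 * np.ones(1024)
quiet = 0.05 * np.ones(1024)
print(side_from_rms(loud, quiet))  # left
```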
- `Event Detetction/` (directory name preserved as-is): Pipelines for event detection: deciding whether a given short audio window contains any tennis event (serve/shot) or just background.
  - `create_detection_dataset.ipynb`: Builds labeled detection datasets from longer session recordings, centering windows around signal peaks and sampling negatives at a fixed ratio (typically 1:5 positives-to-negatives). Certain time intervals are excluded to preserve clean rally segments for testing; these are documented in `processed_data_description.json`.
  - `EDA.ipynb`: Exploratory data analysis and visualization of audio segments used for detection, including waveform plots and feature distributions.
  - `binary_classification.ipynb`: Trains an SVM-based binary classifier for event vs non-event detection using MFCC features. Performs MFCC configuration sweeps and cross-validated hyperparameter search. Serves are intentionally excluded here because serve detection is handled separately in `Event Classification PT/serve_detection.ipynb`. The `Models/` folder stores the best SVM models tuned for recall and F1 on the detection task.
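The peak-centered windowing strategy can be sketched as follows: cut a fixed window around each peak as a positive, then sample negatives away from all peaks at the 1:5 ratio mentioned above. Peak picking itself is omitted here, and the exclusion margin is an assumption; the notebook's exact procedure may differ.

```python
import numpy as np

# Sketch of peak-centered detection windowing with 1:5 negative sampling.
# The 2*half exclusion margin around peaks is an illustrative assumption.
def build_detection_windows(signal, sr, peaks, window_s=0.5, neg_ratio=5, seed=0):
    half = int(window_s * sr / 2)
    rng = np.random.default_rng(seed)
    windows, labels = [], []
    for p in peaks:  # positives: windows centered on detected peaks
        if half <= p < len(signal) - half:
            windows.append(signal[p - half:p + half])
            labels.append(1)
    n_neg = neg_ratio * len(labels)
    while n_neg > 0:  # negatives: random windows clear of every peak
        c = int(rng.integers(half, len(signal) - half))
        if all(abs(c - p) > 2 * half for p in peaks):
            windows.append(signal[c - half:c + half])
            labels.append(0)
            n_neg -= 1
    return np.array(windows), np.array(labels)

sr = 8000
audio = np.random.default_rng(1).normal(size=sr * 10)
X, y = build_detection_windows(audio, sr, peaks=[20000, 50000])
print(X.shape, int(y.sum()))  # (12, 4000) 2
```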
- `Rally/`: Rally-level processing and commentary generation.
  - `tennis_pipeline.ipynb`: Implements a finite state machine (FSM) to track points and events across an input rally audio clip. It uses outputs from the detection and classification models (including optional net detection) to produce a structured JSON transcription of each rally: serves, shots, net events, outcomes, and timing.
    - Supports two operating modes: with net detection activated, or with it disabled.
    - JSON outputs for sample rallies are stored in `rallies_with_net.json` and `rallies_without_net.json`.
  - `commentary_generation_pipeline.ipynb`: Takes the JSON rally transcription (e.g., from `tennis_pipeline.ipynb`) plus the underlying test audio and generates natural-language commentary. It constructs a template description of the point, then uses an LLM (via the OpenAI API) to create polished broadcast-style commentary, which can be overlaid on the rally audio. The notebook can also apply TTS to produce a finalized audio track.
  - `data_split.ipynb`: Exploratory notebook for splitting rally data into train/test and performing initial EDA on rally configurations; primarily for internal experimentation (often marked as safe to ignore).
  - `flash_new.wav`: Example rally audio used in presentation/demo materials.
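A minimal sketch of a rally-tracking FSM like the one described above: it consumes a stream of detected events and accumulates a structured summary of the point. The state names, event names, and output fields here are illustrative assumptions, not the schema used in `tennis_pipeline.ipynb`.

```python
# Hypothetical FSM sketch; states, events, and fields are assumptions,
# not the actual schema of tennis_pipeline.ipynb.
class RallyFSM:
    def __init__(self):
        self.state = "waiting"
        self.rally = {"serves": 0, "shots": 0, "net": False, "outcome": None}

    def feed(self, event):
        """Advance the FSM on one detected event: 'serve', 'shot', 'net', 'end'."""
        if self.state == "waiting" and event == "serve":
            self.rally["serves"] += 1
            self.state = "in_rally"
        elif self.state == "in_rally":
            if event == "shot":
                self.rally["shots"] += 1
            elif event == "net":
                self.rally["net"] = True
            elif event == "end":
                self.rally["outcome"] = "point over"
                self.state = "done"
        return self.state

fsm = RallyFSM()
for ev in ["serve", "shot", "shot", "end"]:
    fsm.feed(ev)
print(fsm.rally)
# {'serves': 1, 'shots': 2, 'net': False, 'outcome': 'point over'}
```

The real pipeline would attach timestamps and classifier confidences to each event and serialize the accumulated dict to JSON.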
1. Raw audio & annotations (external to repo): Match and practice session recordings, plus manual event/rally annotations, live outside this repo. They conceptually reside in `Data/audio and annotations/`, which is empty here because the audio is not checked in.
2. Event detection datasets: Long sessions are windowed into short segments centered on peaks, labeled event vs non-event, and documented in `Data/Processed/Detection/`.
3. Event & attribute classifiers:
   - Event detection SVMs (`Event Detetction/`) decide whether a window contains a tennis event.
   - Serve detection (`Event Classification PT/`) and net / spin / slice / spatial classifiers (`Event Classification/`) specialize in particular event types and shot attributes.
4. Rally-level transcription: The FSM in `Rally/tennis_pipeline.ipynb` combines model outputs into a point-by-point JSON description of rallies, with timing and outcomes.
5. Automated commentary: LLM-based notebooks (the `Event Classification` commentary prototype and `Rally/commentary_generation_pipeline.ipynb`) turn the structured rally JSON into human-like commentary text and, optionally, synthesized speech overlaid on the rally audio.
- Exploration and reproduction: Open the notebooks in Jupyter (or VS Code / Colab) to inspect the full pipelines, experiments, and evaluation plots. Most notebooks assume a Python environment with common scientific and audio libraries (e.g., NumPy, pandas, scikit-learn, librosa, PyTorch/Transformers, and an LLM/TTS client such as OpenAI or gTTS).
- Working with your own data: Place your raw audio and annotations in a local mirror of `Data/audio and annotations/`, follow the dataset creation notebooks (`Event Detetction/create_detection_dataset.ipynb` and the classification dataset builders), then retrain or fine-tune the models as needed before running the rally pipeline and commentary generators.