This repository contains an end-to-end pipeline that starts from raw tennis match audio and ends with structured rally transcriptions and automatically generated broadcast-style commentary. It combines classic signal processing, machine learning (SVM, XGBoost, Random Forest), transformer-based audio models, and large language models (LLMs) to understand and narrate tennis points.
- `Data/`: Processed datasets used across detection, classification, and rally-level modeling.
  - `Processed/Detection/`: Windowed event-vs-no-event datasets built around audio peaks for multiple recording sessions. For each CSV, `processed_data_description.json` documents the source audio, window size, positive/negative sampling strategy, and which time spans were held out as rally test segments.
  - `Processed/Classification/`: Aggregated features for downstream classification tasks (e.g., shot type, spin vs slice, net events), as well as rally-level feature summaries.
  - `audio and annotations/`: Intended location for raw audio files and manual annotations. This folder is empty in the repository because the original audio cannot be shared on GitHub; you must provide your own compatible recordings and annotations to fully reproduce the results.
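As a minimal sketch of how the dataset metadata could be consumed: the snippet below parses a description file and looks up the held-out rally spans for one CSV. The key names (`source_audio`, `window_size_s`, `neg_to_pos_ratio`, `held_out_rally_spans`) are illustrative assumptions, not the real schema of `processed_data_description.json`; consult the file itself.

```python
import json

# Hypothetical sketch: the key names below are assumptions for illustration,
# not the real schema of processed_data_description.json.
raw = """
{
  "session1_detection.csv": {
    "source_audio": "session1.wav",
    "window_size_s": 0.5,
    "neg_to_pos_ratio": 5,
    "held_out_rally_spans": [[120.0, 135.5]]
  }
}
"""
description = json.loads(raw)  # in practice: json.load() on the description file

def held_out_spans(desc, csv_name):
    """Rally test segments (start, end in seconds) held out for a dataset CSV."""
    return desc.get(csv_name, {}).get("held_out_rally_spans", [])

print(held_out_spans(description, "session1_detection.csv"))  # [[120.0, 135.5]]
```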
- `Event Classification/`: Models and notebooks for classifying what happened during a tennis shot, using mostly per-shot features.
  - Spin / slice classification (`Spin Slice Classification/Spin Slice Transformer Classifier.ipynb`): Uses a Wav2Vec2-based transformer (and SVM baselines) to distinguish spin vs slice (binary) and, in extended versions, none/slice/spin. Includes preprocessing from raw audio, handling of class imbalance (weighting and sampling), and comparison to PCA + SVM pipelines.
  - Net shot classification (`Net_Shot_Classifier/net_shot_classifier.ipynb`, `xgb_net_shot_model.joblib`): Binary classifier to determine whether a shot hit the net (net = 1) or not (net = 0) using MFCC and spectral features. Trains SVM and XGBoost models with cross-validation and standard evaluation (confusion matrices, ROC curves), and can batch-predict on unseen data.
  - TDOA and distance estimation (`TDOA_Calculator/tdoa_calculator.ipynb`, `distance_classifier.pkl`, `distance_scaler.pkl`): Computes time-difference-of-arrival (TDOA) and other spatial audio features from stereo recordings to classify:
    - Side of court: left / right / center
    - Distance from net: deep / short / at net

    A Random Forest classifier is trained on features like ILD, RMS, and spectral centroid to estimate shot position relative to the net.
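To make the spatial features concrete, here is a minimal sketch of estimating TDOA via plain cross-correlation and computing the interaural level difference (ILD) from per-channel RMS. This is an assumption about the general technique, not the exact method in `tdoa_calculator.ipynb`; sample rate, window, and peak-picking details would differ on real recordings.

```python
import numpy as np

def estimate_tdoa(left, right, sr):
    """Estimate time difference of arrival between two channels via full
    cross-correlation; a positive lag means the sound reached the left
    channel later. A generic sketch, not the notebook's exact method."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    return lag / sr

def interaural_level_difference(left, right):
    """ILD in dB from per-channel RMS; a common side-of-court cue."""
    rms_l = np.sqrt(np.mean(left ** 2))
    rms_r = np.sqrt(np.mean(right ** 2))
    return 20 * np.log10(rms_l / rms_r)

# Synthetic check: the same click, delayed by 5 samples on the left channel.
sr = 8000
click = np.zeros(256)
click[100] = 1.0
delayed = np.roll(click, 5)
print(estimate_tdoa(delayed, click, sr))  # 5 samples / 8000 Hz = 0.000625 s
```

On real stereo audio, GCC-PHAT or band-limited cross-correlation is usually more robust than raw correlation, but the lag-to-seconds conversion is the same.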
  - Commentary prototype (LLM-based, described in `Event Classification/info.md`): A notebook referred to as `llm.ipynb` generates template commentary from rally events (serves, shots, outcomes), rewrites it with an instruction-tuned LLM (e.g., Mistral-7B-Instruct) into natural broadcast-style text, and optionally applies text-to-speech (gTTS) to produce audio commentary clips.
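A minimal sketch of the template stage: turn structured rally events into a plain-language draft that the LLM (and optionally gTTS) would then polish. The event field names (`server`, `shots`, `winner`, etc.) are hypothetical, not the notebook's actual schema.

```python
# Hypothetical sketch of template commentary generation; field names
# are assumptions, not the actual rally-event schema used in llm.ipynb.
def rally_to_template(rally):
    parts = [f"{rally['server']} serves"]
    for shot in rally.get("shots", []):
        parts.append(f"{shot['player']} replies with a {shot['type']}")
    parts.append(f"point goes to {rally['winner']}")
    return ", ".join(parts) + "."

rally = {
    "server": "Player A",
    "shots": [{"player": "Player B", "type": "slice"}],
    "winner": "Player A",
}
print(rally_to_template(rally))
# Player A serves, Player B replies with a slice, point goes to Player A.
```

The LLM rewrite step then receives this draft as context and is prompted to produce broadcast-style phrasing.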
- `Event Classification PT/`: PyTorch/SVM experiments focused on side and serve detection.
  - `court_side.ipynb`: Explores stereo channel waveforms and RMS values to validate simple heuristics for determining which side of the court a shot originates from. Demonstrates that alternating shots in a long rally can often be captured with per-channel RMS thresholds, and also shows cases where RMS alone is insufficient for event vs no-event separation.
  - `serve_detection.ipynb`: Binary SVM classifier for detecting serve events. Compares different MFCC configurations and performs hyperparameter tuning via cross-validation. The `Models/` folder contains the best-performing SVM checkpoints (optimized separately for recall and F1).
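The per-channel RMS heuristic can be sketched in a few lines: whichever stereo channel is louder is taken as the side the shot came from. The decibel margin below is an illustrative assumption, not a value taken from `court_side.ipynb`.

```python
import numpy as np

# Minimal sketch of the per-channel RMS side-of-court heuristic.
# The 3 dB margin is an illustrative assumption, not the notebook's value.
def side_from_rms(left, right, margin_db=3.0):
    rms_l = np.sqrt(np.mean(np.square(left)))
    rms_r = np.sqrt(np.mean(np.square(right)))
    diff_db = 20 * np.log10((rms_l + 1e-12) / (rms_r + 1e-12))
    if diff_db > margin_db:
        return "left"
    if diff_db < -margin_db:
        return "right"
    return "uncertain"  # RMS alone is sometimes insufficient, as the notebook shows

loud = 0.5 * np.ones(1024)
quiet = 0.05 * np.ones(1024)
print(side_from_rms(loud, quiet))  # left
```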
- `Event Detetction/` (directory name preserved as-is): Pipelines for event detection: deciding whether a given short audio window contains any tennis event (serve/shot) or just background.
  - `create_detection_dataset.ipynb`: Builds labeled detection datasets from longer session recordings, centering windows around signal peaks and sampling negatives at a fixed ratio (typically 1:5 positives-to-negatives). Certain time intervals are excluded to preserve clean rally segments for testing; these are documented in `processed_data_description.json`.
  - `EDA.ipynb`: Exploratory data analysis and visualization of audio segments used for detection, including waveform plots and feature distributions.
  - `binary_classification.ipynb`: Trains an SVM-based binary classifier for event vs non-event detection using MFCC features. Performs MFCC configuration sweeps and cross-validated hyperparameter search. Serves are intentionally excluded here because serve detection is handled separately in `Event Classification PT/serve_detection.ipynb`. The `Models/` folder stores the best SVM models tuned for recall and F1 on the detection task.
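The peak-centered windowing strategy can be sketched as follows: cut a fixed window around each peak as a positive, then sample negatives away from all peaks at the 1:5 ratio mentioned above. Peak picking itself is omitted here, and the exclusion margin is an assumption; the notebook's exact procedure may differ.

```python
import numpy as np

# Sketch of peak-centered detection windowing with 1:5 negative sampling.
# The 2*half exclusion margin around peaks is an illustrative assumption.
def build_detection_windows(signal, sr, peaks, window_s=0.5, neg_ratio=5, seed=0):
    half = int(window_s * sr / 2)
    rng = np.random.default_rng(seed)
    windows, labels = [], []
    for p in peaks:  # positives: windows centered on detected peaks
        if half <= p < len(signal) - half:
            windows.append(signal[p - half:p + half])
            labels.append(1)
    n_neg = neg_ratio * len(labels)
    while n_neg > 0:  # negatives: random windows clear of every peak
        c = int(rng.integers(half, len(signal) - half))
        if all(abs(c - p) > 2 * half for p in peaks):
            windows.append(signal[c - half:c + half])
            labels.append(0)
            n_neg -= 1
    return np.array(windows), np.array(labels)

sr = 8000
audio = np.random.default_rng(1).normal(size=sr * 10)
X, y = build_detection_windows(audio, sr, peaks=[20000, 50000])
print(X.shape, int(y.sum()))  # (12, 4000) 2
```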
- `Rally/`: Rally-level processing and commentary generation.
  - `tennis_pipeline.ipynb`: Implements a finite state machine (FSM) to track points and events across an input rally audio clip. It uses outputs from the detection and classification models (including optional net detection) to produce a structured JSON transcription of each rally: serves, shots, net events, outcomes, and timing.
    - Supports two operating modes: with net detection activated, or with it disabled.
    - JSON outputs for sample rallies are stored in `rallies_with_net.json` and `rallies_without_net.json`.
  - `commentary_generation_pipeline.ipynb`: Takes the JSON rally transcription (e.g., from `tennis_pipeline.ipynb`) plus the underlying test audio and generates natural-language commentary. It constructs a template description of the point, then uses an LLM (via the OpenAI API) to create polished broadcast-style commentary, which can be overlaid on the rally audio. The notebook can also apply TTS to produce a finalized audio track.
  - `data_split.ipynb`: Exploratory notebook for splitting rally data into train/test and performing initial EDA on rally configurations; primarily for internal experimentation (often marked as safe to ignore).
  - `flash_new.wav`: Example rally audio used in presentation/demo materials.
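A minimal sketch of a rally-tracking FSM like the one described above: it consumes a stream of detected events and accumulates a structured summary of the point. The state names, event names, and output fields here are illustrative assumptions, not the schema used in `tennis_pipeline.ipynb`.

```python
# Hypothetical FSM sketch; states, events, and fields are assumptions,
# not the actual schema of tennis_pipeline.ipynb.
class RallyFSM:
    def __init__(self):
        self.state = "waiting"
        self.rally = {"serves": 0, "shots": 0, "net": False, "outcome": None}

    def feed(self, event):
        """Advance the FSM on one detected event: 'serve', 'shot', 'net', 'end'."""
        if self.state == "waiting" and event == "serve":
            self.rally["serves"] += 1
            self.state = "in_rally"
        elif self.state == "in_rally":
            if event == "shot":
                self.rally["shots"] += 1
            elif event == "net":
                self.rally["net"] = True
            elif event == "end":
                self.rally["outcome"] = "point over"
                self.state = "done"
        return self.state

fsm = RallyFSM()
for ev in ["serve", "shot", "shot", "end"]:
    fsm.feed(ev)
print(fsm.rally)
# {'serves': 1, 'shots': 2, 'net': False, 'outcome': 'point over'}
```

The real pipeline would attach timestamps and classifier confidences to each event and serialize the accumulated dict to JSON.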
1. Raw audio & annotations (external to repo): Match and practice session recordings, plus manual event/rally annotations, live outside this repo. They conceptually reside in `Data/audio and annotations/`, which is empty here because the audio is not checked in.
2. Event detection datasets: Long sessions are windowed into short segments centered on peaks, labeled event vs non-event, and documented in `Data/Processed/Detection/`.
3. Event & attribute classifiers:
   - Event detection SVMs (`Event Detetction/`) decide whether a window contains a tennis event.
   - Serve detection (`Event Classification PT/`) and net / spin / slice / spatial classifiers (`Event Classification/`) specialize in particular event types and shot attributes.
4. Rally-level transcription: The FSM in `Rally/tennis_pipeline.ipynb` combines model outputs into a point-by-point JSON description of rallies, with timing and outcomes.
5. Automated commentary: LLM-based notebooks (the `Event Classification` commentary prototype and `Rally/commentary_generation_pipeline.ipynb`) turn the structured rally JSON into human-like commentary text and, optionally, synthesized speech overlaid on the rally audio.
- Exploration and reproduction: Open the notebooks in Jupyter (or VS Code / Colab) to inspect the full pipelines, experiments, and evaluation plots. Most notebooks assume a Python environment with common scientific and audio libraries (e.g., NumPy, pandas, scikit-learn, librosa, PyTorch/Transformers, and an LLM/TTS client such as OpenAI or gTTS).
- Working with your own data: Place your raw audio and annotations in a local mirror of `Data/audio and annotations/`, follow the dataset creation notebooks (`Event Detetction/create_detection_dataset.ipynb` and the classification dataset builders), then retrain or fine-tune the models as needed before running the rally pipeline and commentary generators.