Par-t/tennis-commentary

Tennis Audio Event Detection, Classification & Commentary

This repository contains an end-to-end pipeline that starts from raw tennis match audio and ends with structured rally transcriptions and automatically generated broadcast-style commentary. It combines classic signal processing, machine learning (SVM, XGBoost, Random Forest), transformer-based audio models, and large language models (LLMs) to understand and narrate tennis points.

Top-level structure

  • Data/
    Processed datasets used across detection, classification, and rally-level modeling.

    • Processed/Detection/: Windowed event–vs–no-event datasets built around audio peaks for multiple recording sessions. The file processed_data_description.json documents, for each CSV, the source audio, window size, positive/negative sampling strategy, and which time spans were held out as rally test segments.
    • Processed/Classification/: Aggregated features for downstream classification tasks (e.g., shot type, spin vs slice, net events), as well as rally-level feature summaries.
    • audio and annotations/: Intended location for raw audio files and manual annotations. This folder is empty in the repository because the original audio cannot be shared on GitHub; you must provide your own compatible recordings and annotations to fully reproduce the results.
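To make the dataset documentation concrete, an entry in processed_data_description.json might look roughly like the following. The field names and values here are hypothetical, chosen only to illustrate the information the file is described as recording (source audio, window size, sampling strategy, held-out rally spans); the actual schema may differ.

```json
{
  "session3_detection.csv": {
    "source_audio": "session3.wav",
    "window_size_s": 0.2,
    "sampling": {"strategy": "peak-centred positives", "neg_per_pos": 5},
    "held_out_rally_spans_s": [[120.0, 135.5], [410.2, 426.0]]
  }
}
```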
  • Event Classification/
    Models and notebooks for classifying what happened during a tennis shot, using mostly per-shot features.

    • Spin / slice classification (Spin Slice Classification/Spin Slice Transformer Classifier.ipynb): Uses a Wav2Vec2-based transformer (and SVM baselines) to distinguish spin vs slice (binary) and, in extended versions, none/slice/spin. Includes preprocessing from raw audio, handling of class imbalance (weighting and sampling), and comparison to PCA + SVM pipelines.
    • Net shot classification (Net_Shot_Classifier/net_shot_classifier.ipynb, xgb_net_shot_model.joblib): Binary classifier to determine whether a shot hit the net (net = 1) or not (net = 0) using MFCC and spectral features. Trains SVM and XGBoost models with cross-validation and standard evaluation (confusion matrices, ROC curves), and can batch-predict on unseen data.
    • TDOA and distance estimation (TDOA_Calculator/tdoa_calculator.ipynb, distance_classifier.pkl, distance_scaler.pkl): Computes time-difference-of-arrival (TDOA) and other spatial audio features from stereo recordings to classify:
      • Side of court: left / right / center
      • Distance from net: deep / short / at net
      A Random Forest classifier is trained on features such as ILD, RMS, and spectral centroid to estimate shot position relative to the net.
    • Commentary prototype (LLM-based) (described in Event Classification/info.md): A notebook referred to as llm.ipynb generates template commentary from rally events (serves, shots, outcomes), rewrites it using an instruction-tuned LLM (e.g., Mistral-7B-Instruct) into natural broadcast-style text, and optionally applies text-to-speech (gTTS) to produce audio commentary clips.
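The ILD/RMS features used for spatial classification can be sketched in a few lines of NumPy. This is not the repository's code: the 3 dB decision threshold and the coarse side labels are illustrative assumptions, standing in for the trained Random Forest.

```python
import numpy as np

def stereo_features(left, right, eps=1e-12):
    """Per-channel RMS, interaural level difference (ILD, in dB), and a
    coarse side-of-court label for a stereo audio window.

    `left` / `right` are mono numpy arrays for the two microphone channels.
    The 3 dB threshold is an illustrative choice, not the repo's setting.
    """
    rms_l = np.sqrt(np.mean(left ** 2))
    rms_r = np.sqrt(np.mean(right ** 2))
    ild_db = 20 * np.log10((rms_l + eps) / (rms_r + eps))
    if ild_db > 3.0:
        side = "left"
    elif ild_db < -3.0:
        side = "right"
    else:
        side = "center"
    return {"rms_left": rms_l, "rms_right": rms_r, "ild_db": ild_db, "side": side}

# Synthetic shot that is louder in the left channel:
t = np.linspace(0, 0.1, 4410)
shot = np.sin(2 * np.pi * 800 * t)
feats = stereo_features(1.0 * shot, 0.2 * shot)
```

In practice such hand-built features would be fed (together with TDOA and spectral centroid) into the classifier rather than thresholded directly.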
  • Event Classification PT/
    PyTorch/SVM experiments focused on side and serve detection.

    • court_side.ipynb: Explores stereo channel waveforms and RMS values to validate simple heuristics for determining which side of the court a shot originates from. Demonstrates that alternating shots in a long rally can often be captured with per-channel RMS thresholds, and also shows cases where RMS alone is insufficient for event vs no-event separation.
    • serve_detection.ipynb: Binary SVM classifier for detecting serve events. Compares different MFCC configurations and performs hyperparameter tuning via cross-validation. The Models/ folder contains the best-performing SVM checkpoints (optimized separately for recall and F1).
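The cross-validated SVM tuning described above follows a standard scikit-learn pattern. The sketch below uses synthetic stand-ins for the MFCC summary vectors (the real notebook extracts them from audio), so the data, grid values, and 13-dimensional feature size are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for per-window MFCC summary vectors (e.g. 13 mean coefficients):
# non-serves centred at zero, serves simulated with a shifted mean.
X = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
               rng.normal(1.5, 1.0, (200, 13))])
y = np.array([0] * 200 + [1] * 200)

# Cross-validated hyperparameter search over C and gamma, scored on F1,
# mirroring the tuning strategy described above.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(pipe,
                    {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
                    cv=5, scoring="f1")
grid.fit(X, y)
```

Separate searches scored on recall and on F1 would yield the two kinds of checkpoints stored in Models/.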
  • Event Detetction/
    (Directory name preserved as-is.) Pipelines for event detection: deciding whether a given short audio window contains any tennis event (serve/shot) or just background.

    • create_detection_dataset.ipynb: Builds labeled detection datasets from longer session recordings, centering windows around signal peaks and sampling negatives at a fixed ratio (typically 1:5 positives-to-negatives). Certain time intervals are excluded to preserve clean rally segments for testing; these are documented in processed_data_description.json.
    • EDA.ipynb: Exploratory data analysis and visualization of audio segments used for detection, including waveform plots and feature distributions.
    • binary_classification.ipynb: Trains an SVM-based binary classifier for event vs non-event detection using MFCC features. Performs MFCC configuration sweeps and cross-validated hyperparameter search. Serves are intentionally excluded here because serve detection is handled separately in Event Classification PT/serve_detection.ipynb. The Models/ folder stores best SVM models tuned for recall and F1 on the detection task.
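The peak-centred windowing with a fixed negatives-per-positive ratio can be sketched as follows. The window length, peak threshold, and exclusion margin are illustrative assumptions, not the repository's actual settings, and the real notebook additionally excludes held-out rally spans.

```python
import numpy as np
from scipy.signal import find_peaks

def build_detection_windows(audio, sr, win_s=0.2, neg_per_pos=5, seed=0):
    """Cut fixed-size windows around amplitude peaks (positives) and sample
    background windows at a fixed negatives-per-positive ratio (here 1:5)."""
    half = int(win_s * sr / 2)
    peaks, _ = find_peaks(np.abs(audio), height=0.5, distance=2 * half)
    peaks = peaks[(peaks >= half) & (peaks < len(audio) - half)]
    pos = [audio[p - half:p + half] for p in peaks]

    # Sample negatives away from any detected peak.
    rng = np.random.default_rng(seed)
    forbidden = set()
    for p in peaks:
        forbidden.update(range(p - 2 * half, p + 2 * half))
    neg = []
    while len(neg) < neg_per_pos * len(pos):
        c = int(rng.integers(half, len(audio) - half))
        if c not in forbidden:
            neg.append(audio[c - half:c + half])

    X = np.stack(pos + neg)
    y = np.array([1] * len(pos) + [0] * len(neg))
    return X, y

sr = 4410
audio = 0.01 * np.random.default_rng(1).standard_normal(10 * sr)
for centre in (2 * sr, 5 * sr, 8 * sr):   # three synthetic "shots"
    audio[centre] = 1.0
X, y = build_detection_windows(audio, sr)
```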
  • Rally/
    Rally-level processing and commentary generation.

    • tennis_pipeline.ipynb: Implements a finite state machine (FSM) to track points and events across an input rally audio clip. It uses outputs from the detection and classification models (including optional net detection) to produce a structured JSON transcription of each rally: serves, shots, net events, outcomes, and timing.
      • Supports two operating modes: with net detection activated, or with it disabled.
      • The JSON outputs for sample rallies are stored in rallies_with_net.json and rallies_without_net.json.
    • commentary_generation_pipeline.ipynb: Takes the JSON rally transcription (e.g., from tennis_pipeline.ipynb) plus the underlying test audio and generates natural-language commentary. It constructs a template description of the point and then uses an LLM (via OpenAI API) to create polished broadcast-style commentary, which can be overlaid on the rally audio. The notebook can also incorporate TTS to produce a finalized audio track.
    • data_split.ipynb: Exploratory notebook for splitting rally data into train/test sets and performing initial EDA on rally configurations; it is primarily internal experimentation and can safely be skipped.
    • flash_new.wav: Example rally audio used in presentation/demo materials.
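The rally-tracking FSM can be sketched with a simplified event stream. This is a hedged illustration: the real notebook consumes detector/classifier outputs and emits a richer JSON schema than the dictionary below, whose field names are assumptions.

```python
def transcribe_rally(events):
    """Track one point through a minimal state machine.

    events: list of (time_s, kind) tuples, kind in {"serve", "shot", "net", "end"}.
    Returns a structured dict similar in spirit to the rally JSON outputs.
    """
    state = "waiting"
    rally = {"serve_time": None, "shots": [], "net_events": [], "outcome": None}
    for t, kind in events:
        if state == "waiting" and kind == "serve":
            rally["serve_time"] = t          # point begins on the serve
            state = "in_play"
        elif state == "in_play":
            if kind == "shot":
                rally["shots"].append(t)
            elif kind == "net":
                rally["net_events"].append(t)
            elif kind == "end":
                rally["outcome"] = {"time": t, "n_shots": len(rally["shots"])}
                state = "done"
    return rally

rally = transcribe_rally([(0.4, "serve"), (1.2, "shot"), (2.1, "shot"),
                          (2.2, "net"), (3.0, "end")])
```

Running with net detection disabled simply means no "net" events ever reach the machine.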

Conceptual end-to-end flow

  • 1. Raw audio & annotations (external to repo): Match and practice session recordings, plus manual event/rally annotations, live outside this repo. They conceptually reside in Data/audio and annotations/, which is empty here because the audio is not checked in.
  • 2. Event detection datasets: Long sessions are windowed into short segments centered on peaks, with labels for event vs non-event and documented in Data/Processed/Detection/.
  • 3. Event & attribute classifiers:
    • Event detection SVMs (Event Detetction/) decide whether a window contains a tennis event.
    • Serve detection (Event Classification PT/) and net / spin / slice / spatial classifiers (Event Classification/) specialize on particular event types and shot attributes.
  • 4. Rally-level transcription: The FSM in Rally/tennis_pipeline.ipynb combines model outputs into a point-by-point JSON description of rallies, with timing and outcomes.
  • 5. Automated commentary: LLM-based notebooks (Event Classification commentary prototype and Rally/commentary_generation_pipeline.ipynb) turn the structured rally JSON into human-like commentary text and, optionally, synthesized speech overlaid on the rally audio.
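The template step that precedes LLM rewriting in stage 5 can be sketched as plain string construction from a rally dictionary. The field names below are assumptions for illustration, not the repository's exact schema, and the LLM/TTS stages are omitted.

```python
def template_commentary(rally):
    """Turn a structured rally dict into flat template text, the input that
    an instruction-tuned LLM would later rewrite into broadcast style."""
    parts = [f"Serve at {rally['serve_time']:.1f}s."]
    if rally["shots"]:
        parts.append(f"A {len(rally['shots'])}-shot exchange follows.")
    if rally["net_events"]:
        parts.append("The ball clips the net mid-rally.")
    if rally.get("outcome"):
        parts.append(f"The point ends at {rally['outcome']['time']:.1f}s.")
    return " ".join(parts)

text = template_commentary({"serve_time": 0.4, "shots": [1.2, 2.1],
                            "net_events": [2.2],
                            "outcome": {"time": 3.0, "n_shots": 2}})
```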

How to use this repository

  • Exploration and reproduction: Open the notebooks in Jupyter (or VS Code / Colab) to inspect the full pipelines, experiments, and evaluation plots. Most notebooks assume a Python environment with common scientific and audio libraries (e.g., NumPy, pandas, scikit-learn, librosa, PyTorch/Transformers, and an LLM/TTS client such as OpenAI or gTTS).
  • Working with your own data: Place your raw audio and annotations in a local mirror of Data/audio and annotations/, follow the dataset creation notebooks (Event Detetction/create_detection_dataset.ipynb, classification dataset builders), then retrain or fine-tune the models as needed before running the rally pipeline and commentary generators.
