vcaml

An end-to-end pipeline for estimating quality-of-experience (QoE) metrics (frame rate, bitrate, frame jitter, resolution) for WebRTC-based video conferencing without using application-layer headers. Published at IMC 2023.

Architecture

flowchart TD
    subgraph Collect["Data Collection"]
        A([VCA Session\nMeet · Teams · Webex]) -->|"tcpdump / tshark"| P[".pcap files"]
        A -->|"chrome://webrtc-internals"| C["WebRTC dump (.json)"]
        P -->|pcap2csv| B["Network trace (.csv)"]
    end

    subgraph Prepare["File Preparation"]
        B --> D["FileProcessor\nLinks CSV ↔ JSON pairs"]
        C --> D
        D --> E["FileValidator\nFilters anomalous traces"]
        E --> F["KfoldCVOverFiles\n5-fold CV splits"]
    end

    subgraph Train["Training & Evaluation"]
        F --> G["ModelRunner"]
        G --> H["FeatureExtractor\nLSTATS · TSTATS · SIZE · IAT"]
        C --> N["WebRTCReader\nGround truth labels"]
        H --> I{"Estimation\nMethod"}
        N --> I
        I -->|ip-udp-ml| J["IP_UDP_ML\nRandom Forest"]
        I -->|rtp-ml| K["RTP_ML\nRandom Forest + RTP features"]
        I -->|ip-udp-heuristic| L["IP_UDP_Heuristic\nFrame grouping"]
        I -->|rtp-heuristic| M["RTP_Heuristic\nRTP timestamp grouping"]
        J & K & L & M --> O["MLflow run\nmetrics · models · predictions"]
    end

    subgraph Downstream["Downstream Use Cases"]
        O --> Q["Resource Allocation\nAdaptive bitrate · bandwidth management"]
        O --> R["Traffic Engineering\nQoE-aware routing · prioritisation"]
        O --> S["Network Monitoring\nPassive QoE inference at scale"]
    end

Supported Configurations

Metrics

Metric                   Description                           Unit        Heuristic  ML
framesReceivedPerSecond  Inbound video frame rate              frames/sec
bitrate                  Inbound video bitrate                 bits/sec
frame_jitter             Inter-frame delay standard deviation  ms
frameHeight              Inbound video resolution height       px

framesReceivedPerSecond is evaluated with MAE and with accuracy under a ±2 frames/sec tolerance. frameHeight is evaluated with classification accuracy.
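As a rough illustration of the two frame-rate scores, the sketch below computes MAE and a tolerance accuracy in plain Python. It assumes tolerance accuracy means the fraction of predictions that fall within ±2 frames/sec of the ground truth (boundary included); the function names and sample values are made up for this example and are not taken from the repository.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between ground truth and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def tolerance_accuracy(y_true: Sequence[float], y_pred: Sequence[float],
                       tolerance: float = 2.0) -> float:
    """Fraction of predictions within +/- tolerance of the ground truth.

    Assumption: a prediction exactly `tolerance` away still counts as a hit.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


truth = [30, 30, 24, 15]   # made-up ground-truth frame rates
pred = [29, 33, 24, 16]    # made-up predictions
print(mean_absolute_error(truth, pred))  # 1.25
print(tolerance_accuracy(truth, pred))   # 0.75
```

The two scores answer different questions: MAE reports the typical size of the error, while tolerance accuracy reports how often the estimate is close enough to be useful.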

Estimation Methods

Method            Uses RTP headers  frameHeight  Description
ip-udp-ml         No                             Random Forest over IP/UDP packet features
rtp-ml            Yes                            Random Forest over IP/UDP + RTP-specific features
ip-udp-heuristic  No                             Groups packets into frames by size similarity
rtp-heuristic     Yes                            Groups packets into frames by RTP timestamp
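To make "groups packets into frames by size similarity" concrete, here is a minimal sketch of one way such grouping could work. It assumes that consecutive packets of the same frame differ in size by at most a small threshold; the threshold of 2 bytes, the function name, and the sample sizes are all hypothetical, and the repository's IP_UDP_Heuristic may use a different rule.

```python
from typing import List


def group_by_size_similarity(sizes: List[int], max_diff: int = 2) -> List[List[int]]:
    """Put consecutive packets in the same frame while their sizes stay similar.

    A new frame starts whenever a packet's size differs from the previous
    packet's size by more than `max_diff` bytes (a hypothetical threshold).
    """
    frames: List[List[int]] = []
    for size in sizes:
        if frames and abs(size - frames[-1][-1]) <= max_diff:
            frames[-1].append(size)
        else:
            frames.append([size])
    return frames


packet_sizes = [1100, 1101, 1100, 640, 641, 980]  # made-up packet sizes in bytes
frames = group_by_size_similarity(packet_sizes)
print(len(frames))  # 3
```

Counting the groups that start within a one-second window would then give a frame-rate estimate, and summing the bytes in each group would give per-frame sizes.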

Feature Subsets (ML methods)

Subset  Description
LSTATS  Packet length statistics per window: mean, std, min, max, Q1/Q2/Q3, count, total bytes, unique sizes
TSTATS  Inter-arrival time statistics per window: mean, std, min, max, Q1/Q2/Q3, burst count
SIZE    Raw per-packet sizes padded/truncated to a fixed-length vector
IAT     Raw inter-arrival times padded/truncated to a fixed-length vector
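The feature subsets above can be sketched with the standard library alone. This is an illustration, not the repository's FeatureExtractor: the function names, the use of population standard deviation, the inclusive quartile method, and zero padding are assumptions, and burst count is left out because its definition is not given here.

```python
import statistics
from typing import Dict, List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Gaps between consecutive packet arrival times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def summary_stats(values: Sequence[float]) -> Dict[str, float]:
    """Mean, std, min, max and quartiles shared by LSTATS and TSTATS."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population std; an assumption
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


def pad_or_truncate(values: Sequence[float], length: int, fill: float = 0) -> List[float]:
    """Fixed-length vector for the SIZE and IAT subsets (zero padding assumed)."""
    return (list(values) + [fill] * length)[:length]


sizes = [100, 200, 300, 400]          # packet lengths in one window (made up)
arrivals = [0.00, 0.01, 0.03, 0.06]   # arrival times in seconds (made up)

lstats = {**summary_stats(sizes), "count": len(sizes),
          "total_bytes": sum(sizes), "unique_sizes": len(set(sizes))}
tstats = summary_stats(inter_arrival_times(arrivals))
print(lstats["q2"], lstats["total_bytes"])  # 250.0 1000
print(pad_or_truncate(sizes, 6))            # [100, 200, 300, 400, 0, 0]
```

LSTATS and TSTATS yield a fixed number of features per window regardless of packet count, whereas SIZE and IAT need padding or truncation because the number of packets per window varies.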

rtp-ml additionally extracts RTP-specific features per window: buffer time statistics, unique RTP timestamps, out-of-order sequence number count, RTP lag statistics.
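Two of these RTP-specific features are simple enough to sketch directly. The example below counts unique RTP timestamps and out-of-order sequence numbers in one window. The function name and sample values are hypothetical, "out of order" is taken to mean a sequence number lower than its predecessor, and 16-bit sequence number wraparound is ignored for brevity.

```python
from typing import Dict, Sequence


def rtp_window_features(seq_nums: Sequence[int],
                        rtp_timestamps: Sequence[int]) -> Dict[str, int]:
    """Count unique RTP timestamps and out-of-order packets in one window."""
    # A packet is treated as out of order when its sequence number is lower
    # than that of the packet that arrived just before it (assumption).
    out_of_order = sum(
        1 for prev, cur in zip(seq_nums, seq_nums[1:]) if cur < prev
    )
    return {
        "unique_rtp_timestamps": len(set(rtp_timestamps)),
        "out_of_order_count": out_of_order,
    }


seqs = [10, 11, 13, 12, 14]                    # made-up sequence numbers
stamps = [9000, 9000, 12000, 12000, 15000]     # made-up RTP timestamps
print(rtp_window_features(seqs, stamps))
# {'unique_rtp_timestamps': 3, 'out_of_order_count': 1}
```

Since packets of one video frame share an RTP timestamp, the unique-timestamp count in a window is closely tied to the number of frames seen in that window.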

Supported Platforms and Datasets

VCA In-lab Real-world
Google Meet
Microsoft Teams
Webex

Prerequisites

Tool    Purpose                       Install
uv      Python dependency management  curl -LsSf https://astral.sh/uv/install.sh | sh
tshark  PCAP → CSV conversion         brew install wireshark (macOS) · apt install tshark (Debian/Ubuntu)

tshark must be on PATH before running pcap2csv. If you are working from pre-converted CSVs, tshark is not required.
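A quick way to confirm the PATH requirement before converting any captures is to look the executable up with the standard library. The helper name below is made up for this example; shutil.which is the only library call involved.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)


tshark_path = find_tool("tshark")
if tshark_path is None:
    print("tshark not found on PATH; install it or work from pre-converted CSVs")
else:
    print(f"tshark found at {tshark_path}")
```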

1. Download Datasets

Use the download script (requires gdown, included in dependencies):

make download-data                               # download both datasets to data_root from config.yaml
make download-data DATAROOT=data                 # download to ./data/ instead
make download-data ARGS='--dataset in_lab_data'  # download only one dataset

Datasets are extracted automatically to <DATAROOT>/:

<DATAROOT>/
├── in_lab_data/
└── real_world_data/

Alternatively, download manually from Google Drive: In-Lab · Real World
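Whichever download route you take, it can help to confirm that both dataset directories ended up under the data root. The sketch below only checks that the two directory names shown above exist; the function name and the "data" path in the usage line are illustrative, and nothing is assumed about the files inside each dataset.

```python
from pathlib import Path
from typing import List

# Directory names taken from the layout shown above.
EXPECTED_DATASETS = ("in_lab_data", "real_world_data")


def missing_datasets(data_root: str) -> List[str]:
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


# Example usage with a relative data root (adjust to your own setup).
print(missing_datasets("data"))
```

An empty list means both datasets are in place; otherwise the list names what still needs to be downloaded.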

2. Install Dependencies

make install   # or: uv sync

PCAP → CSV conversion requires tshark to be on PATH. See src/vcaml/io/pcap2csv.py. For data collection dependencies, see src/data_collection/real-world/README.md.

3. Configure

Edit config.yaml in the project root. The most commonly changed fields are:

data_root: /data/taveesh/vca   # root for datasets; used by make download-data and make train
mlflow_db: mlruns.db           # SQLite file for MLflow tracking (resolved relative to data_root)

training:
  metrics: [framesReceivedPerSecond, bitrate, frame_jitter, frameHeight]
  estimation_methods: [ip-udp-heuristic, rtp-heuristic, ip-udp-ml, rtp-ml]
  feature_subsets: [[LSTATS, TSTATS]]
  k_folds: 5

data_root defaults to /data/taveesh/vca — change it to wherever you placed the datasets (or override at the command line with make train DATASET=<path>).
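To get a feel for how the training fields combine, the sketch below enumerates the full cross-product of metrics, estimation methods, and feature subsets from the example configuration. This is an upper bound under the assumption that every combination is run; the actual pipeline may skip combinations (for example, heuristics may not use feature subsets). The dictionary mirrors the YAML above and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Mirrors the `training` section of the example config.yaml above.
training: Dict[str, object] = {
    "metrics": ["framesReceivedPerSecond", "bitrate", "frame_jitter", "frameHeight"],
    "estimation_methods": ["ip-udp-heuristic", "rtp-heuristic", "ip-udp-ml", "rtp-ml"],
    "feature_subsets": [["LSTATS", "TSTATS"]],
    "k_folds": 5,
}


def experiment_grid(cfg: Dict[str, object]) -> List[Tuple[str, str, Tuple[str, ...]]]:
    """All (metric, method, feature subset) combinations in the config."""
    return [
        (metric, method, tuple(subset))
        for metric, method, subset in product(
            cfg["metrics"], cfg["estimation_methods"], cfg["feature_subsets"]
        )
    ]


grid = experiment_grid(training)
print(len(grid))                        # 16 combinations at most
print(len(grid) * training["k_folds"])  # 80 per-fold evaluations at most
```

Adding a second entry to feature_subsets, such as [LSTATS], would double the number of combinations, so trimming the lists is the simplest way to shorten a run.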

4. Train and Evaluate Models

# In-lab dataset (default)
make train

# Real-world dataset
make train-rw

# Custom dataset path
make train DATASET=data/my_dataset

# Restrict to specific metrics or methods
make train ARGS='--metrics framesReceivedPerSecond --methods ip-udp-ml rtp-ml'
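For readers curious how space-separated values such as those passed via ARGS could be parsed, here is a hypothetical argparse sketch. Only the flag names --metrics and --methods come from the command above; the parser itself, its defaults, and the idea that an omitted flag falls back to config.yaml are assumptions rather than the repository's actual entry point.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting one or more values per flag."""
    parser = argparse.ArgumentParser(description="Illustrative training CLI")
    # nargs="+" collects every value up to the next flag into a list.
    parser.add_argument("--metrics", nargs="+", default=None)
    parser.add_argument("--methods", nargs="+", default=None)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse(["--metrics", "framesReceivedPerSecond",
              "--methods", "ip-udp-ml", "rtp-ml"])
print(args.metrics)  # ['framesReceivedPerSecond']
print(args.methods)  # ['ip-udp-ml', 'rtp-ml']
```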

Progress and per-experiment results (MAE, accuracy) are printed to the terminal. Each run is tracked in MLflow (<data_root>/mlruns.db, SQLite) under an experiment named after the dataset (e.g. in_lab_data). Runs are structured as:

  • Parent run per (metric, method, featureSubset) — aggregated mean MAE/accuracy across folds
  • Child run per CV fold — per-fold MAE/accuracy, model pickle, predictions, and feature importances
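The parent/child relationship can be pictured as follows: files are split into folds, each fold produces its own score, and the parent value is the mean of the fold scores. The round-robin split, the function names, the file names, and the per-fold MAE values below are placeholders for illustration; how the repository's KfoldCVOverFiles actually assigns files to folds is not shown here.

```python
from statistics import fmean
from typing import List, Sequence


def split_into_folds(files: Sequence[str], k: int) -> List[List[str]]:
    """Assign files to k folds round-robin (one of many possible schemes)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [list(files[i::k]) for i in range(k)]


def aggregate_parent_metric(fold_scores: Sequence[float]) -> float:
    """Parent-run value: the mean of the per-fold (child-run) scores."""
    return fmean(fold_scores)


trace_files = [f"trace_{i}.csv" for i in range(12)]   # placeholder file names
folds = split_into_folds(trace_files, 5)
print([len(fold) for fold in folds])                  # [3, 3, 2, 2, 2]

fold_maes = [1.5, 2.0, 2.5, 2.0, 2.0]                 # placeholder per-fold MAE
print(aggregate_parent_metric(fold_maes))             # 2.0
```

Splitting at the file level keeps all windows from one trace in the same fold, so a model is never evaluated on windows from a session it has already seen during training.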

Launch the UI with:

make mlflow-ui   # then open http://localhost:5000

5. Analyze Results

Open and run the notebooks in notebooks/ (In_Lab_Analysis, Real_World_Analysis, Sensitivity_Analysis). Each notebook loads results from the local MLflow store via vcaml.io.mlflow_loader.load_results, passing the experiment name (e.g. in_lab_data) to retrieve predictions and feature importances. Artifacts are cached under ~/.cache/vcaml/mlflow_artifacts/ so repeated notebook runs do not re-fetch from MLflow.

6. Collect Additional Data

Refer to In-Lab Data Collection and Real-World Data Collection for more details.

7. Cite

@inproceedings{10.1145/3618257.3624828,
    author = {Sharma, Taveesh and Mangla, Tarun and Gupta, Arpit and Jiang, Junchen and Feamster, Nick},
    title = {Estimating WebRTC Video QoE Metrics Without Using Application Headers},
    year = {2023},
    publisher = {Association for Computing Machinery},
    doi = {10.1145/3618257.3624828},
    booktitle = {Proceedings of the 2023 ACM Internet Measurement Conference},
    series = {IMC '23}
}
