WhoSays

WhoSays is a real-time pipeline for multi-speaker diarization and transcription that answers three practical questions: who spoke, when, and what did they say?

It combines voice activity detection (VAD), speaker change/overlap handling, speaker embedding, clustering and recognition, and automatic speech recognition (ASR) into a streaming workflow. The project is optimized primarily for speaker recognition quality; transcription accuracy is a secondary goal.

The following diagram illustrates the complete pipeline flow, showing how audio input is processed through speaker segmentation, diarization, and ASR components to produce the final transcribed output with speaker labels:

For further insight, see the demo or the final presentation.

Note:
The pipeline design is the key point of this project. The code quality does not fully match the intended design because the project was finished prematurely.

Setup, Build, and Run

Create a .env file at the root of the repo containing:

HF_TOKEN=yourToken
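Before building, it can help to confirm that the file is readable and that the placeholder has been replaced. The following is a minimal sketch, not part of the repository: `read_env_value` is a hypothetical helper, and it assumes the file uses plain `KEY=VALUE` lines.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value for `key` in a simple KEY=VALUE file, or None if absent."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    token = read_env_value(".env", "HF_TOKEN")
    if token is None or token == "yourToken":
        print("HF_TOKEN is missing or still set to the placeholder.")
    else:
        print("HF_TOKEN found.")
```

Run it from the root of the repo so that the relative path `.env` resolves correctly.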

You can create an HF token with read access on HuggingFace and must also accept the model terms for:

Once the token is set in .env, add execution rights to run_docker.sh and run it:

chmod +x run_docker.sh
./run_docker.sh

This builds a Docker image containing the pipeline, backend, and frontend.

Once it’s up, you can open the app at localhost:8000.
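If it is unclear whether the container has finished starting, a quick reachability check can help. This is a sketch under assumptions: `app_is_reachable` is a hypothetical helper, and it assumes the app answers plain HTTP at `http://localhost:8000`, which this README does not state explicitly.

```python
import urllib.error
import urllib.request


def app_is_reachable(url="http://localhost:8000", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Ignore any configured proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar failures.
        return False


if __name__ == "__main__":
    print("reachable" if app_is_reachable() else "not reachable yet")
```

A `False` result right after starting the container usually just means the services are still loading; checking the logs is the more reliable way to see what is happening.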

Note:
Docker Desktop, or at least the Docker daemon, must be running in the background. Because everything runs locally, low-end hardware makes the build slow and the app's performance highly unreliable, so a modern graphics card is recommended if available.

To stop the container:

docker stop whosays-container

To follow logs:

docker logs -f whosays-container

Compare Pipeline Models

Benchmark/analysis CLIs live under scripts/.

Host: python -m scripts.compare --help
Container: docker exec -it whosays-container python -m scripts.compare --help
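The two invocations differ only in the `docker exec` prefix. When scripting comparisons, that can be captured in a small helper. The sketch below is illustrative: `compare_command` is a hypothetical function, not something provided by the repo, and it only builds the argument list without running anything.

```python
import shlex


def compare_command(args, container=None):
    """Build the argv list for scripts.compare on the host or inside a container."""
    base = ["python", "-m", "scripts.compare", *args]
    if container is None:
        return base
    # Prefix with `docker exec -it <container>` to run inside the container.
    return ["docker", "exec", "-it", container, *base]


if __name__ == "__main__":
    # Prints: python -m scripts.compare --help
    print(shlex.join(compare_command(["--help"])))
    # Prints: docker exec -it whosays-container python -m scripts.compare --help
    print(shlex.join(compare_command(["--help"], container="whosays-container")))
```

The resulting list can be passed to `subprocess.run` as-is if you want to execute it from Python.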

Compare Models with Benchmark Datasets

See python -m scripts.compare --help for all options and components. A typical end-to-end run looks like:

# Base comparison (WhoSays + Pyannote 3.1)
python -m scripts.compare --component e2e \
    --audio-dir data/benchmark/chunks \
    --annotation-dir data/benchmark/annotations \
    --language english

# Include WhisperX. It runs in a separate environment because its dependencies
# conflict with the main pipeline (only the English version of WhisperX is used for now)
python -m scripts.compare --component e2e \
    --audio-dir data/benchmark/chunks \
    --annotation-dir data/benchmark/annotations \
    --language english \
    --include-whisperx

By default, comparison JSON files and plots are written under results/. This folder is intentionally gitignored and won’t exist in a fresh clone until you run a comparison. You can alternatively choose your own directory with --output-dir.

Regenerate E2E Plots From Existing JSON Results:

If you've made changes to plot styling or want to regenerate plots without re-running the entire comparison (which can take time), you can use the plot regeneration script:

python -m scripts.e2e_plot_result_from_json \
    --json-file "results/comparison/english/e2e_comparison_*.json"
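Since the `--json-file` value above is a quoted glob pattern, it can be useful to preview which result files it matches before regenerating plots. The following sketch is an assumption-laden convenience, not part of the repo: `matching_results` is a hypothetical helper, and how the plot script itself handles multiple matches is not described here.

```python
import glob
import os


def matching_results(pattern):
    """Return the paths matching `pattern`, newest modification time first."""
    paths = glob.glob(pattern)
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    pattern = "results/comparison/english/e2e_comparison_*.json"
    matches = matching_results(pattern)
    if not matches:
        print("No result files found; run a comparison first.")
    for path in matches:
        print(path)
```

An empty list is expected in a fresh clone, because results/ is only created once a comparison has been run.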

