Project submitted for Practical Assignment #1 - Speech in IPFL, at the University of Aveiro.
Miguel Neto | NºMec 119302
Real-time keyword spotting + emotion classification for instruction-critical environments. Built for a driving school context: detects when an instructor delivers a command keyword and scores their emotional state relative to a neutral baseline.
Pipeline: mic -> sliding window -> Whisper KWS -> keyword hit -> emotion + speaker analysis -> score
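A minimal sketch of the KWS stage under the pipeline above, assuming 16 kHz mono float32 windows; the keyword list, threshold, and helper names are hypothetical (the actual matching logic lives in the project code):

```python
# Sketch: transcribe one sliding window with Whisper and fuzzy-match keywords.
import difflib
import numpy as np
import whisper

KEYWORDS = ["travar", "acelerar", "parar"]  # hypothetical command keywords
model = whisper.load_model("base")

def fuzzy_hit(transcript: str, threshold: float = 0.8) -> str | None:
    """Return the first keyword similar enough to any transcribed word."""
    for word in transcript.lower().split():
        for kw in KEYWORDS:
            if difflib.SequenceMatcher(None, word, kw).ratio() >= threshold:
                return kw
    return None

def scan_window(window: np.ndarray) -> str | None:
    """Transcribe one 16 kHz float32 audio window in Portuguese, check for a hit."""
    result = model.transcribe(window, language="pt", fp16=False)
    return fuzzy_hit(result["text"])
```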
| Component | Model / Library |
|---|---|
| Keyword spotting | Whisper base (pt) + fuzzy match |
| Emotion classifier | wav2vec2-large-robust (audeering, MSP-Podcast) |
| Speaker deviation | w2v-bert-2.0 (Facebook, x-vectors) |
| Scratch classifier | MelCNN (3-layer CNN on log-mel, trained on EmoProsodyPort) |
| Backend | FastAPI + WebSocket (uvicorn) |
| Frontend | Plain HTML/JS |
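The speaker-deviation row above compares an utterance embedding against a neutral-baseline embedding. A minimal sketch, assuming mean-pooled w2v-bert-2.0 hidden states stand in for the x-vector-style embedding and cosine distance measures the shift; `embed` and `speaker_deviation` are hypothetical names, and the project's actual extraction may differ:

```python
# Sketch: utterance embedding via w2v-bert-2.0 + cosine distance from baseline.
# Requires a recent transformers release that ships Wav2Vec2BertModel.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
encoder = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0").eval()

@torch.no_grad()
def embed(audio_16k):
    """Return one utterance-level embedding (mean over time frames)."""
    inputs = extractor(audio_16k, sampling_rate=16000, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state  # (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0)

def speaker_deviation(utterance, baseline_embedding):
    """Cosine distance from the neutral baseline; higher = more deviation."""
    sim = torch.nn.functional.cosine_similarity(
        embed(utterance), baseline_embedding, dim=0
    )
    return float(1.0 - sim)
```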
Configured score = 30% wav2vec2 emotion shift + 40% MelCNN emotion shift + 30% speaker deviation from neutral baseline.
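A guess at the corresponding `compute_score()` in `scoring.py`; only the weights come from the configuration above, and the argument names are hypothetical:

```python
# Default weights per the configured score above.
W_WAV2VEC2, W_MELCNN, W_SPEAKER = 0.30, 0.40, 0.30

def compute_score(wav2vec2_shift, melcnn_shift, speaker_deviation):
    """Weighted combination of the three per-utterance signals, each in [0, 1]."""
    return (W_WAV2VEC2 * wav2vec2_shift
            + W_MELCNN * melcnn_shift
            + W_SPEAKER * speaker_deviation)
```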
```bash
pip install torch transformers fastapi uvicorn openai-whisper numpy scipy
```

ffmpeg must be on PATH (used by Whisper and `infer.py`).
```bash
python server.py
# open http://localhost:8000
```

Pretrained models download automatically on first run. To train the scratch MelCNN:
```bash
# prepare the dataset (extract EmoProsodyPort audio into sents/ and pseudosents/)
python prepare_emoprosodyport.py

# train from scratch
python train_scratch.py

# fine-tune the wav2vec2 head on EmoProsodyPort
python train.py

# stage-2 fine-tune (unfreeze the last N encoder layers)
python finetune.py
```

Checkpoints are saved to `checkpoints/` and `checkpoints_scratch/`.
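For reference, a plausible shape for the scratch MelCNN: three conv blocks over a log-mel spectrogram, globally pooled into a 7-way head (matching the component table and the 7 EmoProsodyPort emotions). Exact layer sizes here are assumptions, not the trained architecture:

```python
# Sketch: 3-layer CNN emotion classifier over log-mel spectrograms.
import torch
import torch.nn as nn

class MelCNN(nn.Module):
    def __init__(self, n_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool over (mel, time)
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, 1, n_mels, frames)
        x = self.features(log_mel).flatten(1)
        return self.classifier(x)
```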
Edit `scoring.py` to customize `compute_score()`. Default weights are in the file header.
EmoProsodyPort - Castro & Lima (2010). 368 clips, 7 emotions, 2 native European Portuguese speakers.