Speech Emotion Recognition using Machine Learning and Deep Learning
EmoSense is a comprehensive speech emotion recognition system that explores both traditional machine learning and deep learning approaches to classify emotions from audio recordings. The project demonstrates the strengths, limitations, and trade-offs of different modeling techniques for this challenging audio classification task.
Emotion recognition from speech is a complex problem that sits at the intersection of audio signal processing, machine learning, and human psychology. EmoSense implements and compares two distinct approaches to tackle this challenge:
- Multi-Layer Perceptron (MLP) - A traditional neural network with handcrafted features
- Convolutional Neural Network (CNN) - A deep learning approach with automatic feature learning
The project uses the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains emotional speech recordings from professional actors.
Dataset Characteristics:
- 24 professional actors (12 male, 12 female)
- 8 emotions: neutral, calm, happy, sad, angry, fearful, disgust, surprised
- Multiple intensities and statement variations
- High-quality audio recordings
MLP Methodology:
- Extracts handcrafted audio features: MFCC, Chroma, and Mel Spectrogram
- Averages features over time to create fixed-size feature vectors
- Trains a shallow neural network (300 hidden units) using scikit-learn
- Focuses on 4 emotions: calm, happy, fearful, disgust
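The pipeline above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the random frames stand in for features that the real pipeline would extract with librosa (`librosa.feature.mfcc`, `chroma_stft`, `melspectrogram`), and the feature sizes (40 MFCCs, 12 chroma bins, 128 mel bands) are assumed, not taken from the source.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def fixed_size_vector(frames):
    """Average time-varying features (n_coeffs, n_frames) into one fixed vector."""
    return frames.mean(axis=1)

# Assumed feature layout: 40 MFCCs + 12 chroma + 128 mel bands = 180 dims
n_features = 40 + 12 + 128
X = np.stack([fixed_size_vector(rng.normal(size=(n_features, 130)))
              for _ in range(32)])
y = rng.integers(0, 4, size=32)   # 4 emotion labels: calm/happy/fearful/disgust

# Shallow network with a single 300-unit hidden layer, as described above
clf = MLPClassifier(hidden_layer_sizes=(300,), max_iter=200)
clf.fit(X, y)
print(X.shape)
```

Averaging over time is what keeps the model lightweight, but it also discards the temporal dynamics that the CNN approach below tries to exploit.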
Key Findings:
- Achieves moderate accuracy with interpretable features
- Performs well on low-arousal emotions (calm)
- Struggles with high-arousal emotion discrimination (disgust, fearful, happy)
- Feature overlap between similar emotions limits performance
- Lightweight and fast to train
Performance Insights:
- Strong performance on acoustically distinct emotions
- Confusion between emotions with similar arousal levels
- Model confidence correlates with prediction accuracy
- Feature distributions reveal fundamental classification challenges
CNN Methodology:
- Preserves temporal structure of MFCC features (13 coefficients, 3-second segments)
- Uses 1D convolutional layers to automatically learn temporal patterns
- Multi-layer architecture: Conv1D → Dropout → MaxPooling → Conv1D → Dense
- Trained on all 8 RAVDESS emotions with RMSprop optimizer
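To make the Conv1D → MaxPooling → Conv1D → Dense shape flow concrete, here is a pure-numpy sketch of the forward pass (the real model would be built in a deep learning framework and trained with RMSprop). The kernel size (5), filter count (64), and pooling width (4) are illustrative assumptions; only the 13 MFCC coefficients, ~3-second input, and 8 emotion classes come from the description above.

```python
import numpy as np

def conv1d(x, w):
    """Valid 1D convolution over time. x: (T, C_in), w: (K, C_in, C_out)."""
    T, _ = x.shape
    K, _, C_out = w.shape
    out = np.empty((T - K + 1, C_out))
    for t in range(T - K + 1):
        out[t] = np.einsum("kc,kcf->f", x[t:t + K], w)
    return out

def maxpool1d(x, pool=4):
    """Non-overlapping max pooling over the time axis."""
    T = (x.shape[0] // pool) * pool
    return x[:T].reshape(-1, pool, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(130, 13))                                # ~3 s of 13 MFCCs
h = np.maximum(conv1d(x, rng.normal(size=(5, 13, 64))), 0)    # Conv1D + ReLU
h = maxpool1d(h, pool=4)                                      # MaxPooling
h = np.maximum(conv1d(h, rng.normal(size=(5, 64, 64))), 0)    # Conv1D + ReLU
logits = h.mean(axis=0) @ rng.normal(size=(64, 8))            # Dense → 8 emotions
print(logits.shape)
```

Because the convolution slides over the frame axis, the model sees the MFCC sequence rather than a time-averaged summary, which is exactly what the MLP gives up.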
Key Findings:
- Demonstrates the data requirements of deep learning approaches
- Severe overfitting with limited training samples (~800 total)
- Training accuracy reaches 60% while validation plateaus at 35%
- Architecture capable of learning but constrained by dataset size
- Requires data augmentation or transfer learning for production use
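One cheap way to ease the data bottleneck noted above is waveform-level augmentation. The sketch below shows two common transforms (noise injection and random time shift) with purely illustrative parameters; pitch shifting and time stretching would need an audio library such as librosa.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(wave, scale=0.005):
    """Inject low-level Gaussian noise into a waveform."""
    return wave + scale * rng.normal(size=wave.shape)

def time_shift(wave, max_frac=0.1):
    """Roll the waveform by a random offset up to max_frac of its length."""
    limit = int(max_frac * len(wave))
    return np.roll(wave, rng.integers(-limit, limit + 1))

wave = np.sin(np.linspace(0, 100, 48000))   # stand-in for a 3 s, 16 kHz clip
augmented = [f(wave) for f in (add_noise, time_shift) for _ in range(2)]
print(len(augmented), augmented[0].shape)
```

Each transform yields a new training example with the same label, multiplying the ~800-sample dataset several times over before any transfer learning is considered.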
Performance Insights:
- CNNs need substantially more data than traditional ML (thousands vs hundreds of samples)
- Small batch sizes and conservative learning rates hinder convergence
- Low recall across all classes indicates feature learning failure
- Minimal confidence separation between correct/incorrect predictions
- Dataset insufficient for training CNN from scratch
```bash
# Clone the repository
git clone https://github.com/yourusername/emosense.git
cd emosense

# Install dependencies
pip install -r requirements.txt

# Download RAVDESS dataset
# Place audio files in samples/Actor_*/ directories
```
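Once the files are in place, labels can be recovered directly from RAVDESS filenames, which encode metadata in seven hyphen-separated fields; the third field is the emotion code. The helper names below are hypothetical, and the `samples/Actor_*` layout follows the setup step above.

```python
from pathlib import Path

# RAVDESS emotion codes (third field of the filename)
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def emotion_from_filename(path):
    """Map a RAVDESS name like 03-01-06-01-02-01-12.wav to its emotion label."""
    return EMOTIONS[Path(path).stem.split("-")[2]]

def load_labels(root="samples"):
    """Pair every .wav under samples/Actor_*/ with its emotion label."""
    return [(p, emotion_from_filename(p))
            for p in sorted(Path(root).glob("Actor_*/*.wav"))]

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))   # fearful
```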