MOHD-OMER/emotion-echo
Python · FastAPI · Gemini · PyTorch · HuggingFace

Beyond emotion detection — an agentic AI system that listens, understands, and responds like a companion.


🧠 What is EmotionEcho?

Most emotion AI systems detect and stop. EmotionEcho detects and responds.

It reads your emotional state in real time through your face and voice, fuses both signals through a trained multimodal MLP, then engages you in a genuinely empathetic conversation via a Gemini-powered agent, adapting its tone and strategy dynamically until your emotional state actually improves.

If you're sad → it comforts. If you're anxious → it grounds you. If you're stressed → it reframes. If you're happy → it celebrates with you.

This is not a chatbot with emotion labels. This is a closed-loop agentic system where perception, reasoning, and response work together continuously.


🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      EmotionEcho System                          │
│                                                                 │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │   Webcam Input   │         │  Microphone Input │             │
│  └────────┬─────────┘         └────────┬──────────┘             │
│           │                            │                         │
│           ▼                            ▼                         │
│  ┌──────────────────┐         ┌──────────────────┐             │
│  │  Face Emotion    │         │  Audio Emotion    │             │
│  │  ViT Model       │         │  Wav2Vec2 Model   │             │
│  │  (google/vit-    │         │  (facebook/       │             │
│  │  base-patch16)   │         │  wav2vec2-base)   │             │
│  │                  │         │                   │             │
│  │  Trained on      │         │  Trained on       │             │
│  │  FER2013         │         │  RAVDESS          │             │
│  │  7 emotions      │         │  8 emotions       │             │
│  │  → probs [7]     │         │  → probs [7]      │             │
│  └────────┬─────────┘         └────────┬──────────┘             │
│           │                            │                         │
│           └──────────────┬─────────────┘                         │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │   Fusion MLP Layer    │                           │
│              │  Input: [7] + [7] = 14│                           │
│              │  Hidden: 64 → 32      │                           │
│              │  Output: 7 emotions   │                           │
│              │  + confidence score   │                           │
│              └───────────┬───────────┘                           │
│                          │                                       │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │    Gemini Agent       │                           │
│              │  Conversational       │                           │
│              │  Reasoning Backbone   │                           │
│              │  + Session Memory     │                           │
│              │  + Strategy Adaptation│                           │
│              └───────────┬───────────┘                           │
│                          │                                       │
│                          ▼                                       │
│              ┌───────────────────────┐                           │
│              │   FastAPI Backend     │                           │
│              │   REST Endpoints      │                           │
│              │   Real-time inference │                           │
│              └───────────────────────┘                           │
└─────────────────────────────────────────────────────────────────┘
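The Fusion MLP Layer in the diagram (14 → 64 → 32 → 7, plus a confidence score) can be sketched in PyTorch roughly as follows. This is a minimal sketch based on the dimensions above; class and layer names are illustrative and may differ from the repository's `fusion_model.py`.

```python
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    """Sketch of the 14 -> 64 -> 32 -> 7 fusion head described above."""

    def __init__(self, n_face: int = 7, n_audio: int = 7, n_classes: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_face + n_audio, 64),  # 14-dim fused input -> 64
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),         # 7 unified emotion logits
        )

    def forward(self, face_probs: torch.Tensor, audio_probs: torch.Tensor):
        # Concatenate the two 7-dim probability vectors into one 14-dim input.
        x = torch.cat([face_probs, audio_probs], dim=-1)
        probs = torch.softmax(self.net(x), dim=-1)
        # Confidence = probability mass on the top predicted class.
        confidence, label = probs.max(dim=-1)
        return label, confidence
```

The confidence score lets the downstream agent skip low-certainty detections, matching the signal-validation step listed under Responsible AI below.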

✨ Key Features

Feature                  Description
🎭 Vision Transformer    ViT fine-tuned on FER2013 — 7 emotion classes, ~35K images
🎙️ Wav2Vec2 Audio        Speech emotion recognition fine-tuned on RAVDESS — 8 emotional states
🔀 Fusion MLP            Custom neural network combining face + audio probabilities
🤖 Gemini Agent          Conversational backbone with session memory and dynamic strategy
🔄 Closed-Loop           Re-detects emotion after each response and adjusts accordingly
🛡️ Responsible AI        Signal validation before triggering responses
Low Latency              Sub-300ms end-to-end response time
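The closed-loop behavior can be summarized as a detect → validate → respond → re-detect cycle. The sketch below is a hypothetical illustration of that control flow; the function names (`detect_emotion`, `generate_reply`), the confidence threshold, and the target states are all assumptions, not the repository's actual API.

```python
def companion_loop(detect_emotion, generate_reply,
                   target=("happy", "neutral"), max_turns=10):
    """Illustrative closed loop: keep responding until mood improves."""
    history = []
    for _ in range(max_turns):
        emotion, confidence = detect_emotion()   # fused face + voice label
        if confidence < 0.5:
            continue                             # signal validation gate
        if emotion in target:
            break                                # emotional state improved; stop
        reply = generate_reply(emotion, history) # Gemini-backed response
        history.append((emotion, reply))
    return history
```

Re-running detection after every reply is what distinguishes this from a one-shot classifier bolted onto a chatbot: the agent observes whether its strategy worked and adapts.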

📦 Datasets

Modality  Dataset  Size                     Format           Emotions
Face      FER2013  ~35,000 images           48×48 grayscale  7
Audio     RAVDESS  1,440 files · 24 actors  .wav             8
Face  → FER2013  : angry · disgust · fear · happy · sad · surprise · neutral
Audio → RAVDESS  : neutral · calm · happy · sad · angry · fearful · disgust · surprised
Fused → 7 unified emotion classes via FusionMLP
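Since RAVDESS has 8 labels but the fused output has 7, the audio probabilities must be collapsed onto the unified label set before fusion. One plausible mapping is shown below (it is an assumption, not confirmed by the source: RAVDESS's extra "calm" class likely folds into "neutral", and "fearful"/"surprised" align with "fear"/"surprise").

```python
# Assumed 8 -> 7 label mapping; the repository may map differently.
RAVDESS_TO_UNIFIED = {
    "neutral": "neutral", "calm": "neutral", "happy": "happy", "sad": "sad",
    "angry": "angry", "fearful": "fear", "disgust": "disgust",
    "surprised": "surprise",
}

UNIFIED = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def remap_audio_probs(probs_8, ravdess_labels):
    """Collapse an 8-dim RAVDESS probability vector to the 7 unified classes."""
    out = {label: 0.0 for label in UNIFIED}
    for p, label in zip(probs_8, ravdess_labels):
        out[RAVDESS_TO_UNIFIED[label]] += p  # merged classes sum their mass
    return [out[label] for label in UNIFIED]
```

Summing the probability mass of merged classes keeps the remapped vector a valid distribution, so it can be concatenated directly with the face model's 7-dim output.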

📊 Results

Emotion Classification Accuracy : 87%
Emotional States Detected       : 7
  → Angry · Disgust · Fear · Happy · Sad · Surprise · Neutral
End-to-End Response Latency     : < 300ms
Fusion Architecture             : FusionMLP (14 → 64 → 32 → 7)
Face Model trained on           : FER2013 (~35,000 images)
Audio Model trained on          : RAVDESS (1,440 files, 24 actors)

🛠️ Tech Stack

Layer        Technology                          Detail
Face Model   ViT (google/vit-base-patch16-224)   Fine-tuned on FER2013, 7-class
Audio Model  Wav2Vec2 (facebook/wav2vec2-base)   Fine-tuned on RAVDESS, 8-class
Fusion       Custom FusionMLP (PyTorch)          Concatenates face + audio probs
Agent        Gemini API                          Conversational reasoning + memory
Backend      FastAPI                             REST inference endpoints
ML Libs      HuggingFace Transformers, PyTorch   Model loading + inference
Datasets     FER2013 + RAVDESS                   Face (35K imgs) + Audio (1,440 files)
Environment  Python 3.9+, Conda                  Dependency management

📁 Project Structure

emotion-echo/
│
├── backend/                  # FastAPI application
├── frontend/                 # User interface
├── ml/
│   ├── face_model/           # ViT fine-tuned on FER2013
│   ├── audio_model/          # Wav2Vec2 fine-tuned on RAVDESS
│   └── fusion_model/
│       └── fusion_model.py   # FusionMLP — combines both modalities
├── deployment/               # Deployment configs
├── docs/                     # Documentation
├── scripts/
│   ├── train_face.py         # ViT training on FER2013
│   ├── train_audio_simple.py # Wav2Vec2 training on RAVDESS
│   ├── train_audio_continue.py
│   └── trai

About

Real-time multimodal AI companion — detects emotion via face + voice, responds conversationally until your mood improves | Gemini · FastAPI · OpenCV · Librosa
