28 commits
c9847b3
nah need to rewrite later ugh
kwon-encored Sep 4, 2025
59915ab
new datasaet on onigiri code
kwon-encored Sep 4, 2025
4beb923
new modle for onigiri
kwon-encored Sep 4, 2025
7df0073
new module to data process the new onigiri
kwon-encored Sep 4, 2025
ecdab55
to fit the new value
kwon-encored Sep 4, 2025
0ebf631
predict and implement the result
kwon-encored Sep 4, 2025
74e09dd
wrote github copilot instruction by markdown
kwon-encored Sep 4, 2025
e85791b
Update src/utils/separate_date_articulator_that_is_new.py
kwon-encored Sep 4, 2025
982e7ae
unnecessary comment
kwon-encored Sep 4, 2025
4a27990
data type
kwon-encored Sep 4, 2025
5949ee2
list, dict to numpy
kwon-encored Sep 4, 2025
ee822ab
Merge remote-tracking branch 'origin/copilot-002' into copilot-002
kwon-encored Sep 4, 2025
a593cb8
emotional label
kwon-encored Sep 4, 2025
2fa75ec
functional change
kwon-encored Sep 4, 2025
c0753bc
onigiri --- dataload
kwon-encored Sep 4, 2025
40d98f1
inputshape debug
kwon-encored Sep 4, 2025
8d590ed
commentary
kwon-encored Sep 4, 2025
58e7cb6
Merge remote-tracking branch 'origin/copilot-002' into copilot-002
kwon-encored Sep 4, 2025
98f1ebd
label
kwon-encored Sep 5, 2025
6dcedfc
changled label
kwon-encored Sep 5, 2025
e66d90b
readme bro
kwon-encored Sep 5, 2025
0360362
fixed error
kwon-encored Sep 5, 2025
39e3c48
on using ML parameters
kwon-encored Sep 5, 2025
359f3d0
about leadtim 9000
kwon-encored Sep 5, 2025
c424253
index erroring
kwon-encored Sep 5, 2025
a64247d
csv file
kwon-encored Sep 5, 2025
055fec9
int to str
kwon-encored Sep 5, 2025
b65891c
instruction for GitHub CoPilot
kwon-encored Sep 5, 2025
7 changes: 7 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,7 @@
description: 'Python coding conventions and guidelines'
applyTo: '**/*.py'
---
- When performing a code review, ensure that variable and function names use snake_case, and class names use CamelCase, following PEP 8 style guidelines.
- When reviewing functions, check if loops or conditionals can be simplified with built-in or vectorized methods (e.g., numpy, pandas, datetime, itertools) while preserving clarity and behavior.
- When reviewing a function, check that its name is appropriate and clearly describes its purpose.
- When reviewing a function, check that its name clearly describes its purpose and that variable names are appropriate and descriptive.
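As an illustration of the second guideline above (not part of the committed instructions file), here is a minimal sketch of a hand-written loop and an equivalent built-in version. The function names are hypothetical and follow the snake_case convention the first guideline asks for.

```python
from itertools import accumulate


def running_total_loop(values):
    """Running totals written as an explicit loop."""
    totals = []
    current = 0
    for value in values:
        current += value
        totals.append(current)
    return totals


def running_total_builtin(values):
    """Same behavior, expressed with itertools.accumulate."""
    return list(accumulate(values))


sample_values = [1, 2, 3, 4]  # hypothetical input
loop_result = running_total_loop(sample_values)
builtin_result = running_total_builtin(sample_values)
```

Both versions return the same list; the built-in form is shorter, and a reviewer would still check that it stays readable in context.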
86 changes: 30 additions & 56 deletions README.md
@@ -1,68 +1,42 @@
# Multi-Modal-Image-Sentiment-Analysis
Final Year Project

Python version used : 3.6.0
## Overview
This PR introduces a new **multidimensional 3D CNN model** within the Onigiri project.
The model leverages a large-scale dataset (~18TB) capturing regression relationships between **mood, emotion, and facial expressions**, along with **gender attributes**.

# To perform Sentiment Analysis of Text present in Image.
> python3 OCRSentiment.py
# Face classification and detection.
Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a keras CNN model and openCV.
* IMDB gender classification test accuracy: 96%.
* fer2013 emotion classification test accuracy: 66%.
The goal is to extend the multimodal project by enabling **mood determination** from image and face data, integrated with contextual metadata.

---

### Run real-time emotion demo:
> python3 video_emotion_color_demo.py
## Key Features
- **New Data Integration**
  - Added ~18TB of data on mood, emotion, and facial expression, alongside gender.
  - Preprocessing pipeline supports sequence-based image and embedding fusion.

### Make inference on single images:
> python3 image_emotion_gender_demo.py <image_path>
- **3D CNN Model Implementation**
  - Input: `data_input` (sequence of facial image tensors).
  - Auxiliary Input: `site_id_input` for contextual weather embedding.
  - Weather embedding reshaped into a **weather map** and concatenated as an additional channel.
  - Temporal-spatial Conv3D layers with ELU activations.
  - Dense fully connected layers leading to mood prediction outputs.

e.g.
- **Output**
  - Predicts **mood state** given image and contextual inputs.
  - Designed to integrate seamlessly with existing multimodal architecture.

> python3 image_emotion_gender_demo.py ../images/test_image.jpg
---

### Steps to run the final application UI.exe
Steps to run project:-
Step 1:- Download project from https://github.com/AnkurKarmakar/Multi-Modal-Image-Sentiment-Analysis
Extract the zip folder and place the entire project folder in any drive except C drive.
## Motivation
This implementation expands Onigiri’s capability:
- Moves beyond **basic sentiment analysis** to deeper **mood-level understanding**.
- Bridges the gap between **visual emotion recognition** and **context-aware multimodal inference**.
- Scales to massive datasets, aligning with the multimodal project’s growth roadmap.

---

Step 2:- Install Python 3.6.0 64 bit from https://www.python.org/downloads/release/python-360/(Note:- Other versions will cause problems with the tensorflow version used)
## Next Steps
- Train and benchmark the new model on curated dataset splits.
- Compare performance against existing CNN and multimodal baselines.
- Integrate evaluation metrics for mood detection accuracy and generalization.


Step 3:- Download site-packages.rar from https://drive.google.com/file/d/1yBVfiMuq6DI8gIF4z__E_gCmwSwEL4uu/view?usp=sharing and extract it into C:\Users\<UserName>\AppData\Local\Programs\Python\Python36\Lib\


Step 4:- Go to project folder where requirements.txt is present.Then open cmd there and type pip install -r requirements.txt


Step 5:- Download Tesseract from https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download and then install it


Step 6:- Go to project folder. Inside src folder there is UI.exe. Run it and program will run. After the UI pops up click on Browse to select image and then click on Analyze.


### To train previous/new models for emotion classification:


* Download the fer2013.tar.gz file from [here](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)

* Move the downloaded file to the datasets directory inside this repository.

* Untar the file:
> tar -xzf fer2013.tar

* Run the train_emotion_classification.py file
> python3 train_emotion_classifier.py

### To train previous/new models for gender classification:

* Download the imdb_crop.tar file from [here](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) (It's the 7GB button with the tittle Download faces only).

* Move the downloaded file to the datasets directory inside this repository.

* Untar the file:
> tar -xfv imdb_crop.tar

* Run the train_gender_classification.py file
> python3 train_gender_classifier.py
---
12 changes: 12 additions & 0 deletions src/image_emotion_gender_demo.py
@@ -18,24 +18,29 @@
detection_model_path = '../trained_models/detection_models/haarcascade_frontalface_default.xml'
emotion_model_path = '../trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5'
gender_model_path = '../trained_models/gender_models/simple_CNN.81-0.96.hdf5'
onigiri_model_path = '../trained_models/onigiri_models/onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5'
emotion_labels = get_labels('fer2013')
gender_labels = get_labels('imdb')
mood_labels = get_labels('onigiri')
font = cv2.FONT_HERSHEY_SIMPLEX

# hyper-parameters for bounding boxes shape
gender_offsets = (30, 60)
gender_offsets = (10, 10)
emotion_offsets = (20, 40)
emotion_offsets = (0, 0)
mood_offsets = (5, 9)

# loading models
face_detection = load_detection_model(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)
gender_classifier = load_model(gender_model_path, compile=False)
mood_classifier = load_model(onigiri_model_path, compile=False)

# getting input model shapes for inference
emotion_target_size = emotion_classifier.input_shape[1:3]
gender_target_size = gender_classifier.input_shape[1:3]
mood_target_size = mood_classifier.input_shape[1:3]
Copilot AI Sep 5, 2025

The mood classifier is a 3D CNN and expects 5D input (batch, sequence, height, width, channels), but this code extracts only two dimensions with `input_shape[1:3]`. This will cause shape mismatch errors during inference. The mood model needs different preprocessing than the 2D CNN models.

Suggested change
mood_target_size = mood_classifier.input_shape[1:3]
mood_target_size = mood_classifier.input_shape[1:4] # (sequence, height, width) for 3D CNN

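To make the shape concern above concrete, here is a minimal sketch of splitting a 5D input shape into the pieces the preprocessing would need. The helper names are hypothetical and not part of this PR. It assumes a channels-last layout, as in the `tf.keras.Input` call in `src/models/cnn.py`. It also reflects that `cv2.resize` takes its target size as (width, height).

```python
def split_conv3d_input_shape(input_shape):
    """Split a channels-last 5D shape (batch, sequence, height, width, channels).

    Hypothetical helper: returns (sequence_length, (height, width), channels).
    """
    if len(input_shape) != 5:
        raise ValueError("expected a 5D shape, got %d dimensions" % len(input_shape))
    _, sequence_length, height, width, channels = input_shape
    return sequence_length, (height, width), channels


def resize_dsize(frame_size):
    """Convert (height, width) to the (width, height) order used by cv2.resize."""
    height, width = frame_size
    return (width, height)


# Hypothetical input shape; the batch dimension is None in Keras.
example_shape = (None, 8, 48, 64, 3)
sequence_length, frame_size, channels = split_conv3d_input_shape(example_shape)
dsize = resize_dsize(frame_size)
```

Each face crop would be resized to `dsize`, and `sequence_length` frames would then be stacked before a batch dimension is added. How a single still image should fill a multi-frame sequence is a design question this sketch does not answer.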

# loading images
rgb_image = load_image(image_path, grayscale=False)
@@ -48,11 +53,15 @@
    x1, x2, y1, y2 = apply_offsets(face_coordinates, gender_offsets)
    rgb_face = rgb_image[y1:y2, x1:x2]

    x1, x2, y1, y2 = apply_offsets(face_coordinates, mood_offsets)
    moody_face = rgb_image[y1:y2, x1:x2]  # cropped from the same RGB image as rgb_face, using mood_offsets

    x1, x2, y1, y2 = apply_offsets(face_coordinates, emotion_offsets)
    gray_face = gray_image[y1:y2, x1:x2]

    try:
        rgb_face = cv2.resize(rgb_face, (gender_target_size))
        moody_face = cv2.resize(moody_face, (mood_target_size))
        gray_face = cv2.resize(gray_face, (emotion_target_size))
    except:
        continue
@@ -67,7 +76,9 @@
    gray_face = np.expand_dims(gray_face, 0)
    gray_face = np.expand_dims(gray_face, -1)
    emotion_label_arg = np.argmax(emotion_classifier.predict(gray_face))
    mood_label_arg = np.argmax(mood_classifier.predict(moody_face))
    emotion_text = emotion_labels[emotion_label_arg]
    mood_text = mood_labels[mood_label_arg]

    if gender_text == gender_labels[0]:
        color = (0, 0, 255)
@@ -77,6 +88,7 @@
    draw_bounding_box(face_coordinates, rgb_image, color)
    draw_text(face_coordinates, rgb_image, gender_text, color, 0, -20, 1, 2)
    draw_text(face_coordinates, rgb_image, emotion_text, color, 0, -50, 1, 2)
    draw_text(face_coordinates, rgb_image, mood_text, color, 0, -35, 1, 2)

bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
cv2.imwrite('../images/predicted_test_image.png', bgr_image)
61 changes: 61 additions & 0 deletions src/models/cnn.py
@@ -9,6 +9,7 @@
from keras.layers import SeparableConv2D
from keras import layers
from keras.regularizers import l2
import tensorflow as tf


def simple_CNN(input_shape, num_classes):
@@ -344,6 +345,66 @@ def big_XCEPTION(input_shape, num_classes):
    model = Model(img_input, output)
    return model

def model_allofasudden_that_uses_tensorflow(
    sequence_length,
    face_front_pixel,
    face_back_pixel,
    in_channels,
    out_features,
    number_of_conv3d_layers,
    conv3d_channels=32,
    fc_features=128,
    spatial_kernel_size=3,
    temporal_kernel_size=3,
):
    """
    Build a 3D-CNN-based neural network model that encodes multimodal data.

    Features are extracted from multiple time-series inputs with Conv3D layers, and
    fully connected layers then produce a compressed feature representation.

    :param int sequence_length: length of the time sequence of the input data
    :param int face_front_pixel: number of pixels along the latitude direction of the input data
    :param int face_back_pixel: number of pixels along the longitude direction of the input data
    :param int in_channels: number of channels of the input data
    :param int out_features: size of the final output feature
    :param int number_of_conv3d_layers: number of Conv3D layers to use
    :param int conv3d_channels: number of Conv3D filters (default: 32)
    :param int fc_features: hidden feature size used in the fully connected layer (default: 128)
    :param int spatial_kernel_size: kernel size of the spatial dimensions in Conv3D (default: 3)
    :param int temporal_kernel_size: kernel size of the temporal dimension in Conv3D (default: 3)

    :return: 3D-CNN-based Keras model that processes multimodal data.
    :rtype: tf.keras.Model
    """

    # Only the main input
    data_input = tf.keras.Input(
        shape=(sequence_length, face_front_pixel, face_back_pixel, in_channels),
        name="data_input"
    )

    # Conv3D stack
    x = data_input
    for _ in range(number_of_conv3d_layers):
        x = tf.keras.layers.ZeroPadding3D(padding=((1, 1), (0, 0), (0, 0)))(x)  # pad time only
        x = tf.keras.layers.Conv3D(
            filters=conv3d_channels,
            kernel_size=(temporal_kernel_size, spatial_kernel_size, spatial_kernel_size),
            strides=(1, 1, 1),
            padding="valid"
        )(x)
        x = tf.keras.layers.ELU()(x)

    # Flatten + FC
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(fc_features, activation="elu")(x)
    outputs = tf.keras.layers.Dense(out_features, activation="elu")(x)

    model = tf.keras.Model(inputs=data_input, outputs=outputs)
    return model

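The Conv3D stack above pads only the time axis by one frame on each side and uses "valid" convolutions with stride 1. Each layer therefore shrinks an axis by `kernel_size - 1`, offset on the time axis by the two padded frames. The following is a minimal sketch of that shape arithmetic in plain Python. The helper name and the example sizes are hypothetical, and TensorFlow is deliberately not imported.

```python
def conv3d_stack_output_shape(sequence_length, height, width, num_layers,
                              filters=32, spatial_kernel_size=3,
                              temporal_kernel_size=3, time_padding=1):
    """Return (time, height, width, channels) after the padded Conv3D stack.

    Assumes stride 1, 'valid' convolution, and zero padding on the time axis only.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    t, h, w = sequence_length, height, width
    for _ in range(num_layers):
        t = t + 2 * time_padding - (temporal_kernel_size - 1)
        h = h - (spatial_kernel_size - 1)
        w = w - (spatial_kernel_size - 1)
        if min(t, h, w) <= 0:
            raise ValueError("input is too small for this many Conv3D layers")
    return (t, h, w, filters)


# Hypothetical sizes: 8 frames of 48x48 pixels, 3 Conv3D layers.
stack_shape = conv3d_stack_output_shape(8, 48, 48, num_layers=3)
flattened_size = stack_shape[0] * stack_shape[1] * stack_shape[2] * stack_shape[3]
```

With a temporal kernel of 3 the time length is preserved, while each spatial axis loses 2 pixels per layer. With other temporal kernel sizes the fixed padding of 1 no longer keeps the sequence length constant. The flattened size feeds the first Dense layer, so it grows quickly with the input resolution.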


if __name__ == "__main__":
    input_shape = (64, 64, 1)
68 changes: 65 additions & 3 deletions src/train_gender_classifier.py
@@ -1,9 +1,10 @@


import numpy as np
import pandas as pd
from utils.separate_date_articulator_that_is_new import return_emotions_mood_weather_mixer_combinations
from keras.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping
from keras.callbacks import ReduceLROnPlateau
from utils.datasets import DataManager
from models.cnn import mini_XCEPTION
from models.cnn import mini_XCEPTION, model_allofasudden_that_uses_tensorflow
from utils.data_augmentation import ImageGenerator
from utils.datasets import split_imdb_data

@@ -36,6 +37,67 @@
grayscale=grayscale,
do_random_crop=do_random_crop)

# onigiri - as of 2025
df_weather_mood = pd.read_csv('../datasets/onigiri/sfj_weir_392834.csv')
all_possible_combinations_input, y_true = return_emotions_mood_weather_mixer_combinations(df_weather_mood, batch_size, num_epochs, patience)
Copilot AI Sep 5, 2025

The variables 'batch_size', 'num_epochs', and 'patience' are used before being defined. These variables need to be defined earlier in the file or imported from a configuration module.

all_possible_combinations_input = np.array(all_possible_combinations_input)
y_true = np.array(list(y_true.values())) if isinstance(y_true, dict) else np.array(y_true)

mood_model = model_allofasudden_that_uses_tensorflow(
    sequence_length=all_possible_combinations_input.shape[1],
    face_front_pixel=all_possible_combinations_input.shape[2],
    face_back_pixel=all_possible_combinations_input.shape[3],
    in_channels=all_possible_combinations_input.shape[4],
    out_features=y_true.shape[1] if y_true.ndim > 1 else 1,
    number_of_conv3d_layers=3,
    conv3d_channels=32,
    fc_features=128,
    spatial_kernel_size=3,
    temporal_kernel_size=3
)

mood_model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["mae"]
)

mood_model.summary()
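The compile step above pairs `categorical_crossentropy` with a model whose last layer uses an ELU activation. ELU outputs can be negative and need not sum to 1, whereas cross-entropy is defined over a probability distribution. The following is a minimal pure-Python sketch of the difference, assuming mood is treated as a 5-class classification target. The scores are made up for illustration. If mood is instead a regression target, a loss such as mean squared error would be the matching choice.

```python
import math


def elu(x, alpha=1.0):
    """ELU activation: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(scores):
    """Map raw scores to probabilities that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scores]
    total = sum(exps)
    return [value / total for value in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Cross-entropy between a one-hot target and a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t > 0)


raw_scores = [2.0, -1.0, 0.5, -3.0, 0.0]  # hypothetical outputs for 5 mood classes
elu_outputs = [elu(value) for value in raw_scores]
probabilities = softmax(raw_scores)
target = [1, 0, 0, 0, 0]
loss = categorical_crossentropy(target, probabilities)
```

Here `elu_outputs` contains negative values, so it cannot be read as class probabilities, while `probabilities` can. In Keras terms this would correspond to a `softmax` activation on the final Dense layer, which is a suggestion rather than something this PR does.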

# callbacks
log_file_path = "mood_train_log.csv"
trained_models_path = "checkpoints/cnn3d_gsp"  # path prefix only; the model file name is appended below

early_stop = EarlyStopping(monitor="val_loss", patience=patience, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=max(1, patience // 2), verbose=1)
csv_logger = CSVLogger(log_file_path, append=False)

# The checkpoint monitors val_loss and keeps only the best model.
# val_mae is also logged because "mae" is in the compiled metrics.
model_names = trained_models_path + ".onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5"
model_checkpoint = ModelCheckpoint(
    filepath=model_names,
    monitor="val_loss",
    verbose=1,
    save_best_only=True,
    save_weights_only=False
)

callbacks = [model_checkpoint, csv_logger, early_stop, reduce_lr]

# fit on the in-memory arrays
history = mood_model.fit(
    all_possible_combinations_input, y_true,
    epochs=num_epochs,
    batch_size=batch_size,
    validation_split=validation_split,
    callbacks=callbacks,
    verbose=1
)



# model parameters/compilation
model = mini_XCEPTION(input_shape, num_classes)
model.compile(optimizer='adam',
4 changes: 4 additions & 0 deletions src/utils/datasets.py
@@ -21,6 +21,8 @@ def __init__(self, dataset_name='imdb',
            self.dataset_path = '../datasets/imdb_crop/imdb.mat'
        elif self.dataset_name == 'fer2013':
            self.dataset_path = '../datasets/fer2013/fer2013.csv'
        elif self.dataset_name == 'onigiri':
            self.dataset_path = '../datasets/onigiri/sfj_weir_392834.csv'
        elif self.dataset_name == 'KDEF':
            self.dataset_path = '../datasets/KDEF/'
        else:
@@ -110,6 +112,8 @@ def get_labels(dataset_name):
        return {0: 'woman', 1: 'man'}
    elif dataset_name == 'KDEF':
        return {0: 'AN', 1: 'DI', 2: 'AF', 3: 'HA', 4: 'SA', 5: 'SU', 6: 'NE'}
    elif dataset_name == 'onigiri':
        return {0: 'A021', 1: 'JSI', 2: 'SOMD', 3: 'KOBS', 4: 'SSOP'}
    else:
        raise Exception('Invalid dataset name')

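The `onigiri` label map above is consumed in `src/image_emotion_gender_demo.py` by taking the argmax of the classifier output and looking up the resulting index. The following is a minimal sketch of that lookup in plain Python. The helper name and the score vector are hypothetical, and it assumes the model emits one score per label.

```python
MOOD_LABELS = {0: 'A021', 1: 'JSI', 2: 'SOMD', 3: 'KOBS', 4: 'SSOP'}


def label_from_scores(scores, labels):
    """Return the label whose index has the highest score."""
    if len(scores) != len(labels):
        raise ValueError("expected %d scores, got %d" % (len(labels), len(scores)))
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index]


example_scores = [0.05, 0.10, 0.60, 0.20, 0.05]  # hypothetical model output
mood_text = label_from_scores(example_scores, MOOD_LABELS)
```

The length check matters here because the training script sets `out_features` to 1 when `y_true` is one-dimensional. In that case the argmax would always be 0 and every face would receive the first label.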