kwon-encored · Sep 4, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,7 @@
+description: 'Python coding conventions and guidelines'
+applyTo: '**/*.py'
+---
+- When performing a code review, ensure that variable and function names use snake_case, and class names use CamelCase, following PEP 8 style guidelines.
+- When reviewing functions, check if loops or conditionals can be simplified with built-in or vectorized methods (e.g., numpy, pandas, datetime, itertools) while preserving clarity and behavior.
+- When reviewing a function, check that its name clearly describes its purpose.
+- When reviewing a function, check that its variable names are appropriate and descriptive.
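Review note: the guideline about replacing loops with built-ins is easier to apply with a concrete before/after pair in mind. The sketch below is illustrative only; the function names are hypothetical and do not come from this repository. It shows a nested loop and an `itertools` equivalent that preserves behavior.

```python
from itertools import chain


def flatten_with_loop(batches):
    """Flatten a list of lists using explicit nested loops."""
    flat = []
    for batch in batches:
        for item in batch:
            flat.append(item)
    return flat


def flatten_with_builtin(batches):
    """Same behavior, expressed with itertools.chain.from_iterable."""
    return list(chain.from_iterable(batches))


def count_positive_with_loop(values):
    """Count values greater than zero with a manual counter."""
    count = 0
    for value in values:
        if value > 0:
            count += 1
    return count


def count_positive_with_builtin(values):
    """Same behavior, expressed with sum over a generator expression."""
    return sum(1 for value in values if value > 0)


sample_batches = [[1, 2], [3], []]
sample_values = [-2, 0, 5, 7]
flat_result = flatten_with_builtin(sample_batches)
positive_count = count_positive_with_builtin(sample_values)
```

A reviewer can accept the shorter form only when both versions agree on edge cases such as empty inputs, which is what "preserving clarity and behavior" asks for.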
diff --git a/README.md b/README.md
@@ -1,68 +1,42 @@
 # Multi-Modal-Image-Sentiment-Analysis
-Final Year Project
 
-Python version used : 3.6.0
+## Overview
+This PR introduces a new **multidimensional 3D CNN model** within the Onigiri project.  
+The model leverages a large-scale dataset (~18 TB) capturing regression relationships between **mood, emotion, and facial expressions**, along with **gender attributes**.  
 
-# To perform Sentiment Analysis of Text present in Image.
-> python3 OCRSentiment.py
-# Face classification and detection.
-Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a keras CNN model and openCV.
-* IMDB gender classification test accuracy: 96%.
-* fer2013 emotion classification test accuracy: 66%.
+The goal is to extend the multimodal project by enabling **mood determination** from image and face data, integrated with contextual metadata.
 
+---
 
-### Run real-time emotion demo:
-> python3 video_emotion_color_demo.py
+## Key Features
+- **New Data Integration**
+  - Added ~18 TB of data on mood, emotion, and facial expression, alongside gender attributes.
+  - Preprocessing pipeline supports sequence-based image and embedding fusion.
 
-### Make inference on single images:
-> python3 image_emotion_gender_demo.py <image_path>
+- **3D CNN Model Implementation**
+  - Input: `data_input` (sequence of facial image tensors).
+  - Auxiliary Input: `site_id_input` for contextual weather embedding.
+  - Weather embedding reshaped into a **weather map** and concatenated as an additional channel.
+  - Temporal-spatial Conv3D layers with ELU activations.
+  - Dense fully connected layers leading to mood prediction outputs.
 
-e.g.
+- **Output**
+  - Predicts **mood state** given image and contextual inputs.
+  - Designed to integrate seamlessly with existing multimodal architecture.
 
-> python3 image_emotion_gender_demo.py ../images/test_image.jpg
+---
 
-### Steps to run the final application UI.exe
-Steps to run project:-
-Step 1:- Download project from https://github.com/AnkurKarmakar/Multi-Modal-Image-Sentiment-Analysis
-Extract the zip folder and place the entire project folder in any drive except C drive.
+## Motivation
+This implementation expands Onigiri’s capability:
+- Moves beyond **basic sentiment analysis** to deeper **mood-level understanding**.
+- Bridges the gap between **visual emotion recognition** and **context-aware multimodal inference**.
+- Scales to massive datasets, aligning with the multimodal project’s growth roadmap.
 
+---
 
-Step 2:- Install Python 3.6.0 64 bit from https://www.python.org/downloads/release/python-360/(Note:- Other versions will cause problems with the tensorflow version used)
+## Next Steps
+- Train and benchmark the new model on curated dataset splits.
+- Compare performance against existing CNN and multimodal baselines.
+- Integrate evaluation metrics for mood detection accuracy and generalization.
 
-
-Step 3:- Download site-packages.rar from https://drive.google.com/file/d/1yBVfiMuq6DI8gIF4z__E_gCmwSwEL4uu/view?usp=sharing and extract it into C:\Users\<UserName>\AppData\Local\Programs\Python\Python36\Lib\
-
-
-Step 4:- Go to project folder where requirements.txt is present.Then open cmd there and type pip install -r requirements.txt
-
-
-Step 5:- Download Tesseract from https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download  and then install it
-
-
-Step 6:- Go to project folder. Inside src folder there is UI.exe. Run it and program will run. After the UI pops up click on Browse to select image and then click on Analyze.
-
-
-### To train previous/new models for emotion classification:
-
-
-* Download the fer2013.tar.gz file from [here](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)
-
-* Move the downloaded file to the datasets directory inside this repository.
-
-* Untar the file:
-> tar -xzf fer2013.tar
-
-* Run the train_emotion_classification.py file
-> python3 train_emotion_classifier.py
-
-### To train previous/new models for gender classification:
-
-* Download the imdb_crop.tar file from [here](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) (It's the 7GB button with the tittle Download faces only).
-
-* Move the downloaded file to the datasets directory inside this repository.
-
-* Untar the file:
-> tar -xfv imdb_crop.tar
-
-* Run the train_gender_classification.py file
-> python3 train_gender_classifier.py
+---
diff --git a/src/image_emotion_gender_demo.py b/src/image_emotion_gender_demo.py
@@ -18,24 +18,29 @@
 detection_model_path = '../trained_models/detection_models/haarcascade_frontalface_default.xml'
 emotion_model_path = '../trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5'
 gender_model_path = '../trained_models/gender_models/simple_CNN.81-0.96.hdf5'
+onigiri_model_path = '../trained_models/onigiri_models/onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5'
 emotion_labels = get_labels('fer2013')
 gender_labels = get_labels('imdb')
+mood_labels = get_labels('onigiri')
 font = cv2.FONT_HERSHEY_SIMPLEX
 
 # hyper-parameters for bounding boxes shape
 gender_offsets = (30, 60)
 gender_offsets = (10, 10)
 emotion_offsets = (20, 40)
 emotion_offsets = (0, 0)
+mood_offsets = (5, 9)
 
 # loading models
 face_detection = load_detection_model(detection_model_path)
 emotion_classifier = load_model(emotion_model_path, compile=False)
 gender_classifier = load_model(gender_model_path, compile=False)
+mood_classifier = load_model(onigiri_model_path, compile=False)
 
 # getting input model shapes for inference
 emotion_target_size = emotion_classifier.input_shape[1:3]
 gender_target_size = gender_classifier.input_shape[1:3]
+mood_target_size = mood_classifier.input_shape[1:4]  # (sequence, height, width) for 3D CNN
 
 # loading images
 rgb_image = load_image(image_path, grayscale=False)
@@ -48,11 +53,15 @@
     x1, x2, y1, y2 = apply_offsets(face_coordinates, gender_offsets)
     rgb_face = rgb_image[y1:y2, x1:x2]
 
+    x1, x2, y1, y2 = apply_offsets(face_coordinates, mood_offsets)
+    moody_face = rgb_image[y1:y2, x1:x2]  # cropped from the same RGB image as rgb_face, using mood_offsets
+
     x1, x2, y1, y2 = apply_offsets(face_coordinates, emotion_offsets)
     gray_face = gray_image[y1:y2, x1:x2]
 
     try:
         rgb_face = cv2.resize(rgb_face, (gender_target_size))
+        moody_face = cv2.resize(moody_face, mood_target_size[1:])  # (height, width) only; index 0 is the sequence length
         gray_face = cv2.resize(gray_face, (emotion_target_size))
     except:
         continue
@@ -67,7 +76,9 @@
     gray_face = np.expand_dims(gray_face, 0)
     gray_face = np.expand_dims(gray_face, -1)
     emotion_label_arg = np.argmax(emotion_classifier.predict(gray_face))
+    mood_label_arg = np.argmax(mood_classifier.predict(moody_face))
     emotion_text = emotion_labels[emotion_label_arg]
+    mood_text = mood_labels[mood_label_arg]
 
     if gender_text == gender_labels[0]:
         color = (0, 0, 255)
@@ -77,6 +88,7 @@
     draw_bounding_box(face_coordinates, rgb_image, color)
     draw_text(face_coordinates, rgb_image, gender_text, color, 0, -20, 1, 2)
     draw_text(face_coordinates, rgb_image, emotion_text, color, 0, -50, 1, 2)
+    draw_text(face_coordinates, rgb_image, mood_text, color, 0, -35, 1, 2)
 
 bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
 cv2.imwrite('../images/predicted_test_image.png', bgr_image)
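Review note: the mood model is a Conv3D network, so its Keras `input_shape` has five entries, `(batch, sequence, height, width, channels)`, whereas `cv2.resize` expects a two-element `(width, height)` size and returns a single `(height, width, channels)` frame. The sketch below only does the shape bookkeeping. The helper names and the example shape are hypothetical, and how the sequence axis should be filled from one still image (for instance by repeating the frame) is an assumption this PR does not settle.

```python
def split_conv3d_input_shape(input_shape):
    """Split a Conv3D input shape into sequence length, cv2 size, and channels.

    input_shape is assumed to be (batch, sequence, height, width, channels),
    as reported by a Keras model whose first layer takes 5D input.
    """
    if len(input_shape) != 5:
        raise ValueError("expected a 5D shape, got %r" % (input_shape,))
    _, sequence_length, height, width, channels = input_shape
    # cv2.resize takes its target size as (width, height).
    return sequence_length, (width, height), channels


def expected_batch_shape(input_shape, batch_size=1):
    """Shape an array must have before it is passed to predict()."""
    sequence_length, (width, height), channels = split_conv3d_input_shape(input_shape)
    return (batch_size, sequence_length, height, width, channels)


# Hypothetical example: 8 frames of 64x48 RGB faces.
example_input_shape = (None, 8, 64, 48, 3)
sequence_length, resize_size, channels = split_conv3d_input_shape(example_input_shape)
batch_shape = expected_batch_shape(example_input_shape)
```

A single resized crop would therefore still need a sequence axis and a batch axis added before `mood_classifier.predict` can accept it.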

diff --git a/src/models/cnn.py b/src/models/cnn.py
@@ -9,6 +9,7 @@
 from keras.layers import SeparableConv2D
 from keras import layers
 from keras.regularizers import l2
+import tensorflow as tf
 
 
 def simple_CNN(input_shape, num_classes):
@@ -344,6 +345,66 @@ def big_XCEPTION(input_shape, num_classes):
     model = Model(img_input, output)
     return model
 
+def model_allofasudden_that_uses_tensorflow(
+    sequence_length,
+    face_front_pixel,
+    face_back_pixel,
+    in_channels,
+    out_features,
+    number_of_conv3d_layers,
+    conv3d_channels=32,
+    fc_features=128,
+    spatial_kernel_size=3,
+    temporal_kernel_size=3,
+):
+    """
+    Build a 3D CNN-based neural network model that encodes multimodal data.
+
+    Features are extracted from several time-series inputs through Conv3D layers, and
+    fully connected layers then produce a compressed feature representation.
+
+    :param int sequence_length: length of the time sequence of the input data
+    :param int face_front_pixel: number of pixels along the latitude direction of the input data
+    :param int face_back_pixel: number of pixels along the longitude direction of the input data
+    :param int in_channels: number of channels of the input data
+    :param int out_features: size of the final output feature
+    :param int number_of_conv3d_layers: number of Conv3D layers to use
+    :param int conv3d_channels: number of Conv3D filters (default: 32)
+    :param int fc_features: hidden feature size used in the fully connected layer (default: 128)
+    :param int spatial_kernel_size: Conv3D kernel size along the spatial dimensions (default: 3)
+    :param int temporal_kernel_size: Conv3D kernel size along the temporal dimension (default: 3)
+
+    :return: a 3D CNN-based Keras model that processes multimodal data.
+    :rtype: tf.keras.Model
+    """
+
+    # Only the main input
+    data_input = tf.keras.Input(
+        shape=(sequence_length, face_front_pixel, face_back_pixel, in_channels),
+        name="data_input"
+    )
+
+    # Conv3D stack
+    x = data_input
+    for _ in range(number_of_conv3d_layers):
+        x = tf.keras.layers.ZeroPadding3D(padding=((1, 1), (0, 0), (0, 0)))(x)  # pad time only
+        x = tf.keras.layers.Conv3D(
+            filters=conv3d_channels,
+            kernel_size=(temporal_kernel_size, spatial_kernel_size, spatial_kernel_size),
+            strides=(1, 1, 1),
+            padding="valid"
+        )(x)
+        x = tf.keras.layers.ELU()(x)
+
+    # Flatten + FC
+    x = tf.keras.layers.Flatten()(x)
+    x = tf.keras.layers.Dense(fc_features, activation="elu")(x)
+    outputs = tf.keras.layers.Dense(out_features, activation="elu")(x)
+
+    model = tf.keras.Model(inputs=data_input, outputs=outputs)
+    return model
+
+
 
 if __name__ == "__main__":
     input_shape = (64, 64, 1)
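Review note: each block in the Conv3D stack pads only the time axis by one frame on each side and then applies a `"valid"` convolution. With the default kernel size of 3 the sequence length is preserved, while height and width shrink by 2 per layer, so the `Flatten` output can get large. The sketch below works through that arithmetic in plain Python. The helper names and the 8-frame, 64x64 example are hypothetical and chosen only for illustration.

```python
def conv3d_stack_output_shape(sequence_length, height, width, num_layers,
                              conv3d_channels=32, spatial_kernel_size=3,
                              temporal_kernel_size=3, time_padding=1):
    """Shape (time, height, width, channels) after the padded Conv3D stack."""
    t, h, w = sequence_length, height, width
    for _ in range(num_layers):
        # ZeroPadding3D adds time_padding frames before and after the sequence.
        t = t + 2 * time_padding - temporal_kernel_size + 1
        # "valid" convolution with stride 1 shrinks each spatial axis.
        h = h - spatial_kernel_size + 1
        w = w - spatial_kernel_size + 1
        if min(t, h, w) <= 0:
            raise ValueError("input is too small for this many layers")
    return t, h, w, conv3d_channels


def flattened_size(shape):
    """Number of values produced by Flatten for the given shape."""
    size = 1
    for dim in shape:
        size *= dim
    return size


# Hypothetical input: 8 frames of 64x64 pixels, 3 Conv3D layers.
stack_shape = conv3d_stack_output_shape(8, 64, 64, num_layers=3)
flat = flattened_size(stack_shape)
# Weights plus biases of the first Dense layer with fc_features=128.
first_dense_params = flat * 128 + 128
```

For this example the stack output is `(8, 58, 58, 32)`, which flattens to 861,184 values and gives the first `Dense` layer about 110 million parameters. Note that a `temporal_kernel_size` other than 3 changes the sequence length at every layer, because the time padding is fixed at 1.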

diff --git a/src/train_gender_classifier.py b/src/train_gender_classifier.py
@@ -1,9 +1,10 @@
-
-
+import numpy as np
+import pandas as pd
+from utils.separate_date_articulator_that_is_new import return_emotions_mood_weather_mixer_combinations
 from keras.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping
 from keras.callbacks import ReduceLROnPlateau
 from utils.datasets import DataManager
-from models.cnn import mini_XCEPTION
+from models.cnn import mini_XCEPTION, model_allofasudden_that_uses_tensorflow
 from utils.data_augmentation import ImageGenerator
 from utils.datasets import split_imdb_data
 
@@ -36,6 +37,67 @@
                                  grayscale=grayscale,
                                  do_random_crop=do_random_crop)
 
+# onigiri - as of 2025
+df_weather_mood = pd.read_csv('../datasets/onigiri/sfj_weir_392834.csv')
+all_possible_combinations_input, y_true = return_emotions_mood_weather_mixer_combinations(df_weather_mood, batch_size, num_epochs, patience)
+all_possible_combinations_input = np.array(all_possible_combinations_input)
+y_true = np.array(list(y_true.values())) if isinstance(y_true, dict) else np.array(y_true)
+
+mood_model = model_allofasudden_that_uses_tensorflow(
+    sequence_length=all_possible_combinations_input.shape[1],
+    face_front_pixel=all_possible_combinations_input.shape[2],
+    face_back_pixel=all_possible_combinations_input.shape[3],
+    in_channels=all_possible_combinations_input.shape[4],
+    out_features=y_true.shape[1] if y_true.ndim > 1 else 1,
+    number_of_conv3d_layers=3,
+    conv3d_channels=32,
+    fc_features=128,
+    spatial_kernel_size=3,
+    temporal_kernel_size=3
+)
+
+mood_model.compile(
+    optimizer="adam",
+    loss="categorical_crossentropy",
+    metrics=["mae"]
+)
+
+mood_model.summary()
+
+# ---- Callbacks ----
+# log file and checkpoint locations
+log_file_path = "mood_train_log.csv"
+trained_models_path = "checkpoints/cnn3d_gsp"  # filename prefix; the model suffix and extension are appended below
+
+early_stop = EarlyStopping(monitor="val_loss", patience=patience, restore_best_weights=True)
+reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=max(1, patience // 2), verbose=1)
+csv_logger = CSVLogger(log_file_path, append=False)
+
+# Checkpoint on val_loss, the same quantity that EarlyStopping and ReduceLROnPlateau monitor.
+# To checkpoint on a different logged metric, change `monitor` below.
+model_names = trained_models_path + ".onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5"
+model_checkpoint = ModelCheckpoint(
+    filepath=model_names,
+    monitor="val_loss",
+    verbose=1,
+    save_best_only=True,
+    save_weights_only=False
+)
+
+callbacks = [model_checkpoint, csv_logger, early_stop, reduce_lr]
+
+# ---- Fit with in-memory arrays ----
+history = mood_model.fit(
+    all_possible_combinations_input, y_true,
+    epochs=num_epochs,
+    batch_size=batch_size,
+    validation_split=validation_split,
+    callbacks=callbacks,
+    verbose=1
+)
+
+
+
 # model parameters/compilation
 model = mini_XCEPTION(input_shape, num_classes)
 model.compile(optimizer='adam',
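Review note: the mood model is compiled with `categorical_crossentropy`, but its last `Dense` layer uses an ELU activation and `out_features` falls back to 1 when `y_true` is one-dimensional. Categorical cross-entropy expects a one-hot target and a per-class probability vector. The plain-Python sketch below illustrates the difference, assuming mood is a 5-class label (matching the five entries in `get_labels('onigiri')`); the logits are made-up numbers.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if index == label else 0.0 for index in range(num_classes)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def elu(value, alpha=1.0):
    """ELU activation: identity for positives, bounded negative otherwise."""
    return value if value > 0 else alpha * (math.exp(value) - 1.0)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))


# Made-up raw scores for 5 mood classes; the true class is index 2.
logits = [2.0, -1.0, 0.5, -3.0, 0.0]
target = one_hot(2, 5)
probabilities = softmax(logits)
elu_outputs = [elu(value) for value in logits]
loss = categorical_crossentropy(target, probabilities)
```

The softmax outputs are non-negative and sum to 1, so the loss is well defined, whereas the ELU outputs include negative values and cannot be read as class probabilities. If mood is meant to be a regression target instead, a regression loss would be the matching choice for the `mae` metric.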

diff --git a/src/utils/datasets.py b/src/utils/datasets.py
@@ -21,6 +21,8 @@ def __init__(self, dataset_name='imdb',
             self.dataset_path = '../datasets/imdb_crop/imdb.mat'
         elif self.dataset_name == 'fer2013':
             self.dataset_path = '../datasets/fer2013/fer2013.csv'
+        elif self.dataset_name == 'onigiri':
+            self.dataset_path = '../datasets/onigiri/sfj_weir_392834.csv'
         elif self.dataset_name == 'KDEF':
             self.dataset_path = '../datasets/KDEF/'
         else:
@@ -110,6 +112,8 @@ def get_labels(dataset_name):
         return {0: 'woman', 1: 'man'}
     elif dataset_name == 'KDEF':
         return {0: 'AN', 1: 'DI', 2: 'AF', 3: 'HA', 4: 'SA', 5: 'SU', 6: 'NE'}
+    elif dataset_name == 'onigiri':
+        return {0: 'A021', 1: 'JSI', 2: 'SOMD', 3: 'KOBS', 4: 'SSOP'}
     else:
         raise Exception('Invalid dataset name')
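Review note: the demo script looks up `mood_labels[mood_label_arg]`, where the index comes from `np.argmax` over the model output. That lookup is only safe when the number of model outputs equals the number of labels. The sketch below mirrors the lookup in plain Python using the label dictionary added in this diff; the helper name and the scores are hypothetical.

```python
# Label mapping copied from the 'onigiri' branch of get_labels in this diff.
MOOD_LABELS = {0: 'A021', 1: 'JSI', 2: 'SOMD', 3: 'KOBS', 4: 'SSOP'}


def label_from_scores(scores, labels):
    """Return the label whose index has the highest score."""
    if len(scores) != len(labels):
        raise ValueError(
            "model produced %d scores but there are %d labels"
            % (len(scores), len(labels))
        )
    # Equivalent to np.argmax for a flat list of scores.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index]


# Made-up scores for one face.
example_scores = [0.10, 0.20, 0.60, 0.05, 0.05]
predicted_label = label_from_scores(example_scores, MOOD_LABELS)
```

Checking the lengths up front turns a silent mislabel or a `KeyError` at draw time into an explicit error about the model/label mismatch.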