diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..c5b662f
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,7 @@
+description: 'Python coding conventions and guidelines'
+applyTo: '**/*.py'
+---
+- When performing a code review, ensure that variable and function names use snake_case, and class names use CamelCase, following PEP 8 style guidelines.
+- When reviewing functions, check if loops or conditionals can be simplified with built-in or vectorized methods (e.g., numpy, pandas, datetime, itertools) while preserving clarity and behavior.
+- When reviewing a function, check that its name is appropriate and clearly describes its purpose.
+- When reviewing a function, check that its variable names are appropriate and descriptive.
diff --git a/README.md b/README.md
index ecb74e7..4ae7f82 100644
--- a/README.md
+++ b/README.md
@@ -1,68 +1,42 @@
 # Multi-Modal-Image-Sentiment-Analysis
-Final Year Project
-Python version used : 3.6.0
+## Overview
+This PR introduces a new **multidimensional 3D CNN model** within the Onigiri project.
+The model leverages a large-scale dataset (~18 TB) capturing regression relationships between **mood, emotion, and facial expressions**, along with **gender attributes**.
 
-# To perform Sentiment Analysis of Text present in Image.
-> python3 OCRSentiment.py
 
-# Face classification and detection.
-Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a keras CNN model and openCV.
-* IMDB gender classification test accuracy: 96%.
-* fer2013 emotion classification test accuracy: 66%.
+The goal is to extend the multimodal project by enabling **mood determination** from image and face data, integrated with contextual metadata.
 
+---
 
-### Run real-time emotion demo:
-> python3 video_emotion_color_demo.py
+## Key Features
+- **New Data Integration**
+  - Added ~18 TB of data on mood, emotion, and facial expression, alongside gender.
+  - Preprocessing pipeline supports sequence-based image and embedding fusion.
 
-### Make inference on single images:
-> python3 image_emotion_gender_demo.py
+- **3D CNN Model Implementation**
+  - Input: `data_input` (sequence of facial image tensors).
+  - Auxiliary Input: `site_id_input` for contextual weather embedding.
+  - Weather embedding reshaped into a **weather map** and concatenated as an additional channel.
+  - Temporal-spatial Conv3D layers with ELU activations.
+  - Dense fully connected layers leading to mood prediction outputs.
 
-e.g.
+- **Output**
+  - Predicts **mood state** given image and contextual inputs.
+  - Designed to integrate seamlessly with existing multimodal architecture.
 
-> python3 image_emotion_gender_demo.py ../images/test_image.jpg
+---
 
-### Steps to run the final application UI.exe
-Steps to run project:-
-Step 1:- Download project from https://github.com/AnkurKarmakar/Multi-Modal-Image-Sentiment-Analysis
-Extract the zip folder and place the entire project folder in any drive except C drive.
 
+## Motivation
+This implementation expands Onigiri’s capability:
+- Moves beyond **basic sentiment analysis** to deeper **mood-level understanding**.
+- Bridges the gap between **visual emotion recognition** and **context-aware multimodal inference**.
+- Scales to massive datasets, aligning with the multimodal project’s growth roadmap.
+---
 
-Step 2:- Install Python 3.6.0 64 bit from https://www.python.org/downloads/release/python-360/(Note:- Other versions will cause problems with the tensorflow version used)
+## Next Steps
+- Train and benchmark the new model on curated dataset splits.
+- Compare performance against existing CNN and multimodal baselines.
+- Integrate evaluation metrics for mood detection accuracy and generalization.
 
-
-Step 3:- Download site-packages.rar from https://drive.google.com/file/d/1yBVfiMuq6DI8gIF4z__E_gCmwSwEL4uu/view?usp=sharing and extract it into C:\Users\\AppData\Local\Programs\Python\Python36\Lib\
-
-
-Step 4:- Go to project folder where requirements.txt is present.Then open cmd there and type pip install -r requirements.txt
-
-
-Step 5:- Download Tesseract from https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download and then install it
-
-
-Step 6:- Go to project folder. Inside src folder there is UI.exe. Run it and program will run. After the UI pops up click on Browse to select image and then click on Analyze.
-
-
-### To train previous/new models for emotion classification:
-
-
-* Download the fer2013.tar.gz file from [here](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data)
-
-* Move the downloaded file to the datasets directory inside this repository.
-
-* Untar the file:
-> tar -xzf fer2013.tar
-
-* Run the train_emotion_classification.py file
-> python3 train_emotion_classifier.py
-
-### To train previous/new models for gender classification:
-
-* Download the imdb_crop.tar file from [here](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) (It's the 7GB button with the tittle Download faces only).
-
-* Move the downloaded file to the datasets directory inside this repository.
-
-* Untar the file:
-> tar -xfv imdb_crop.tar
-
-* Run the train_gender_classification.py file
-> python3 train_gender_classifier.py
+---
diff --git a/src/image_emotion_gender_demo.py b/src/image_emotion_gender_demo.py
index 684d236..6c42d2f 100644
--- a/src/image_emotion_gender_demo.py
+++ b/src/image_emotion_gender_demo.py
@@ -18,8 +18,10 @@
 detection_model_path = '../trained_models/detection_models/haarcascade_frontalface_default.xml'
 emotion_model_path = '../trained_models/emotion_models/fer2013_mini_XCEPTION.102-0.66.hdf5'
 gender_model_path = '../trained_models/gender_models/simple_CNN.81-0.96.hdf5'
+onigiri_model_path = '../trained_models/onigiri_models/onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5'
 emotion_labels = get_labels('fer2013')
 gender_labels = get_labels('imdb')
+mood_labels = get_labels('onigiri')
 font = cv2.FONT_HERSHEY_SIMPLEX
 
 # hyper-parameters for bounding boxes shape
@@ -27,15 +29,18 @@
 gender_offsets = (10, 10)
 emotion_offsets = (20, 40)
 emotion_offsets = (0, 0)
+mood_offsets = (5, 9)
 
 # loading models
 face_detection = load_detection_model(detection_model_path)
 emotion_classifier = load_model(emotion_model_path, compile=False)
 gender_classifier = load_model(gender_model_path, compile=False)
+mood_classifier = load_model(onigiri_model_path, compile=False)
 
 # getting input model shapes for inference
 emotion_target_size = emotion_classifier.input_shape[1:3]
 gender_target_size = gender_classifier.input_shape[1:3]
+mood_target_size = mood_classifier.input_shape[1:3]
 
 # loading images
 rgb_image = load_image(image_path, grayscale=False)
@@ -48,11 +53,15 @@
     x1, x2, y1, y2 = apply_offsets(face_coordinates, gender_offsets)
     rgb_face = rgb_image[y1:y2, x1:x2]
 
+    x1, x2, y1, y2 = apply_offsets(face_coordinates, mood_offsets)
+    moody_face = rgb_image[y1:y2, x1:x2]  # cropped from the same RGB image, since mood is read from the face
+
     x1, x2, y1, y2 = apply_offsets(face_coordinates, emotion_offsets)
     gray_face = gray_image[y1:y2, x1:x2]
 
     try:
         rgb_face = cv2.resize(rgb_face, (gender_target_size))
+        moody_face = cv2.resize(moody_face, (mood_target_size))
         gray_face = cv2.resize(gray_face, (emotion_target_size))
     except:
         continue
@@ -67,7 +76,9 @@
     gray_face = np.expand_dims(gray_face, 0)
     gray_face = np.expand_dims(gray_face, -1)
     emotion_label_arg = np.argmax(emotion_classifier.predict(gray_face))
+    mood_label_arg = np.argmax(mood_classifier.predict(moody_face))
     emotion_text = emotion_labels[emotion_label_arg]
+    mood_text = mood_labels[mood_label_arg]
 
     if gender_text == gender_labels[0]:
         color = (0, 0, 255)
@@ -77,6 +88,7 @@
     draw_bounding_box(face_coordinates, rgb_image, color)
     draw_text(face_coordinates, rgb_image, gender_text, color, 0, -20, 1, 2)
     draw_text(face_coordinates, rgb_image, emotion_text, color, 0, -50, 1, 2)
+    draw_text(face_coordinates, rgb_image, mood_text, color, 0, -35, 1, 2)
 
 bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
 cv2.imwrite('../images/predicted_test_image.png', bgr_image)
diff --git a/src/models/cnn.py b/src/models/cnn.py
index cbdbb08..ceb00b3 100644
--- a/src/models/cnn.py
+++ b/src/models/cnn.py
@@ -9,6 +9,7 @@
 from keras.layers import SeparableConv2D
 from keras import layers
 from keras.regularizers import l2
+import tensorflow as tf
 
 
 def simple_CNN(input_shape, num_classes):
@@ -344,6 +345,66 @@ def big_XCEPTION(input_shape, num_classes):
     model = Model(img_input, output)
     return model
 
 
+def model_allofasudden_that_uses_tensorflow(
+    sequence_length,
+    face_front_pixel,
+    face_back_pixel,
+    in_channels,
+    out_features,
+    number_of_conv3d_layers,
+    conv3d_channels=32,
+    fc_features=128,
+    spatial_kernel_size=3,
+    temporal_kernel_size=3,
+):
+    """
+    Build a 3D CNN-based neural network model that encodes multimodal data.
+
+    Features are extracted from multiple time-series inputs by Conv3D layers, then
+    compressed into a feature representation by fully connected layers.
+
+    :param int sequence_length: length of the input time sequence
+    :param int face_front_pixel: number of input pixels along the latitude direction
+    :param int face_back_pixel: number of input pixels along the longitude direction
+    :param int in_channels: number of input channels
+    :param int out_features: size of the final output feature
+    :param int number_of_conv3d_layers: number of Conv3D layers to use
+    :param int conv3d_channels: number of Conv3D filters (default: 32)
+    :param int fc_features: hidden feature size of the fully connected layer (default: 128)
+    :param int spatial_kernel_size: Conv3D kernel size along the spatial dimensions (default: 3)
+    :param int temporal_kernel_size: Conv3D kernel size along the temporal dimension (default: 3)
+
+    :return: a 3D CNN-based Keras model that processes multimodal data.
+    :rtype: tf.keras.Model
+    """
+
+    # Only the main input
+    data_input = tf.keras.Input(
+        shape=(sequence_length, face_front_pixel, face_back_pixel, in_channels),
+        name="data_input"
+    )
+
+    # Conv3D stack
+    x = data_input
+    for _ in range(number_of_conv3d_layers):
+        x = tf.keras.layers.ZeroPadding3D(padding=((1, 1), (0, 0), (0, 0)))(x)  # pad time only
+        x = tf.keras.layers.Conv3D(
+            filters=conv3d_channels,
+            kernel_size=(temporal_kernel_size, spatial_kernel_size, spatial_kernel_size),
+            strides=(1, 1, 1),
+            padding="valid"
+        )(x)
+        x = tf.keras.layers.ELU()(x)
+
+    # Flatten + FC
+    x = tf.keras.layers.Flatten()(x)
+    x = tf.keras.layers.Dense(fc_features, activation="elu")(x)
+    outputs = tf.keras.layers.Dense(out_features, activation="elu")(x)
+
+    model = tf.keras.Model(inputs=data_input, outputs=outputs)
+    return model
+
+
 if __name__ == "__main__":
     input_shape = (64, 64, 1)
diff --git a/src/train_gender_classifier.py b/src/train_gender_classifier.py
index df5fc8d..7f7eb59 100644
--- a/src/train_gender_classifier.py
+++ b/src/train_gender_classifier.py
@@ -1,9 +1,10 @@
-
-
+import numpy as np
+import pandas as pd
+from utils.separate_date_articulator_that_is_new import return_emotions_mood_weather_mixer_combinations
 from keras.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping
 from keras.callbacks import ReduceLROnPlateau
 from utils.datasets import DataManager
-from models.cnn import mini_XCEPTION
+from models.cnn import mini_XCEPTION, model_allofasudden_that_uses_tensorflow
 from utils.data_augmentation import ImageGenerator
 from utils.datasets import split_imdb_data
 
@@ -36,6 +37,67 @@
                                  grayscale=grayscale,
                                  do_random_crop=do_random_crop)
 
+# onigiri - as of 2025
+df_weather_mood = pd.read_csv('../datasets/onigiri/sfj_weir_392834.csv')
+all_possible_combinations_input, y_true = return_emotions_mood_weather_mixer_combinations(df_weather_mood, batch_size, num_epochs, patience)
+all_possible_combinations_input = np.array(all_possible_combinations_input)
+y_true = np.array(list(y_true.values())) if isinstance(y_true, dict) else np.array(y_true)
+
+mood_model = model_allofasudden_that_uses_tensorflow(
+    sequence_length=all_possible_combinations_input.shape[1],
+    face_front_pixel=all_possible_combinations_input.shape[2],
+    face_back_pixel=all_possible_combinations_input.shape[3],
+    in_channels=all_possible_combinations_input.shape[4],
+    out_features=y_true.shape[1] if y_true.ndim > 1 else 1,
+    number_of_conv3d_layers=3,
+    conv3d_channels=32,
+    fc_features=128,
+    spatial_kernel_size=3,
+    temporal_kernel_size=3
+)
+
+mood_model.compile(
+    optimizer="adam",
+    loss="categorical_crossentropy",
+    metrics=["mae"]
+)
+
+mood_model.summary()
+
+# ---- 3) Callbacks ----
+# output paths for the training log and the model checkpoints
+log_file_path = "mood_train_log.csv"
+trained_models_path = "checkpoints/cnn3d_gsp"  # path prefix; the model file name is appended below
+
+early_stop = EarlyStopping(monitor="val_loss", patience=patience, restore_best_weights=True)
+reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=max(1, patience // 2), verbose=1)
+csv_logger = CSVLogger(log_file_path, append=False)
+
+# Monitor a quantity that is actually logged; the checkpoint below monitors val_loss.
+# To monitor val_mae instead, change `monitor` accordingly.
+model_names = trained_models_path + ".onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5"
+model_checkpoint = ModelCheckpoint(
+    filepath=model_names,
+    monitor="val_loss",
+    verbose=1,
+    save_best_only=True,
+    save_weights_only=False
+)
+
+callbacks = [model_checkpoint, csv_logger, early_stop, reduce_lr]
+
+# ---- 4) Fit with in-memory arrays ----
+history = mood_model.fit(
+    all_possible_combinations_input, y_true,
+    epochs=num_epochs,
+    batch_size=batch_size,
+    validation_split=validation_split,
+    callbacks=callbacks,
+    verbose=1
+)
+
+
+
 # model parameters/compilation
 model = mini_XCEPTION(input_shape, num_classes)
 model.compile(optimizer='adam',
diff --git a/src/utils/datasets.py b/src/utils/datasets.py
index aa445a7..b3d875f 100644
--- a/src/utils/datasets.py
+++ b/src/utils/datasets.py
@@ -21,6 +21,8 @@ def __init__(self, dataset_name='imdb',
             self.dataset_path = '../datasets/imdb_crop/imdb.mat'
         elif self.dataset_name == 'fer2013':
             self.dataset_path = '../datasets/fer2013/fer2013.csv'
+        elif self.dataset_name == 'onigiri':
+            self.dataset_path = '../datasets/onigiri/sfj_weir_392834.csv'
         elif self.dataset_name == 'KDEF':
             self.dataset_path = '../datasets/KDEF/'
         else:
@@ -110,6 +112,8 @@ def get_labels(dataset_name):
         return {0: 'woman', 1: 'man'}
     elif dataset_name == 'KDEF':
         return {0: 'AN', 1: 'DI', 2: 'AF', 3: 'HA', 4: 'SA', 5: 'SU', 6: 'NE'}
+    elif dataset_name == 'onigiri':
+        return {0: 'A021', 1: 'JSI', 2: 'SOMD', 3: 'KOBS', 4: 'SSOP'}
     else:
         raise Exception('Invalid dataset name')
 
diff --git a/src/utils/separate_date_articulator_that_is_new.py b/src/utils/separate_date_articulator_that_is_new.py
new file mode 100644
index 0000000..6e5ac68
--- /dev/null
+++ b/src/utils/separate_date_articulator_that_is_new.py
@@ -0,0 +1,98 @@
+from dasenima import SelectDateSpatialSlice, base_and_issue_time_declaration, generate_hourly_timestep
+from utils.datasets import DataManager
+from datetime import timedelta, datetime
+import numpy as np
+
+date_spatial_slice = SelectDateSpatialSlice()
+def accumulated_value_update(var_name, df):
+    """
+    Compute the change in the given variable (var_name) and add it as a '_delta' column.
+    Parameters:
+    - var_name (str): name of the variable whose change is computed.
+    - df (pd.DataFrame): dataframe to transform.
+    Returns:
+    - pd.DataFrame: dataframe with the change column added.
+    """
+
+    var_name_delta = var_name + "_delta"
+    df[var_name_delta] = (df[var_name].shift(-1) - df[var_name]).fillna(0)
+    df[var_name_delta] = np.where(df["leadtime"] == 9000, 0, df[var_name_delta])
+
+    return df
+
+def selected_time_slice(df_weatherMood):
+    chosen_t0 = df_weatherMood[df_weatherMood["leadtime"]==9000]["t0"]
+    (
+        basetime_t0_hr_int,
+        issued_time_hr_int,
+        issued_date_date_int,
+        issued_month_int,
+        issued_time_hr_str,
+    ) = base_and_issue_time_declaration(chosen_t0)
+
+    # keep only the rows whose issued date and hour match
+    df_weatherMood_issueddate_all_filtered_cleanly = df_weatherMood[
+        (df_weatherMood.issueddate == str(issued_date_date_int))
+        & (df_weatherMood.issuedhour == issued_time_hr_str)
+    ].reset_index(drop=True)
+
+    time_slice_index_t0 = df_weatherMood_issueddate_all_filtered_cleanly[
+        (df_weatherMood_issueddate_all_filtered_cleanly.basetime == chosen_t0)
+    ].index[0]
+
+    df_weatherMood_issueddate_all_filtered_cleanly = df_weatherMood_issueddate_all_filtered_cleanly.iloc[
+        time_slice_index_t0 - 2 : time_slice_index_t0 + 9
+    ]
+
+    return df_weatherMood_issueddate_all_filtered_cleanly
+
+
+def return_emotions_mood_weather_mixer_combinations(df_weatherMood, batch_size,num_epochs,patience ):
+    df = selected_time_slice(df_weatherMood)
+    sroe_code_values = df[f"{batch_size}_{num_epochs}_{patience}"]
+    mood_types = df[f"{batch_size}_mood"].unique()
+    weather_types = df[f"{batch_size}_weather"].unique()
+    start = sroe_code_values // 100
+    end = sroe_code_values // 10
+
+    regional_coords = date_spatial_slice(sroe_code_values)
+    timestamps = generate_hourly_timestep(start, end)
+
+    combined = [
+        [mood, weather, codeNum]
+        for codeNum in sroe_code_values
+        for mood in mood_types
+        for weather in weather_types
+    ]
+
+    all_timesteps = generate_monthly_timestamps(start, end)
+
+    return combined, all_timesteps
+
+def generate_monthly_timestamps(start_timestamp, end_timestamp):
+    # convert to datetime
+    start_timestamp = str(start_timestamp)
+    end_timestamp = str(end_timestamp)
+    start_date = datetime.strptime(start_timestamp, "%Y%m%d%H")
+    end_date = datetime.strptime(end_timestamp, "%Y%m%d%H")
+
+    # timestamps grouped by month
+    monthly_timestamps = {}
+
+    # loop over every hour
+    current_time = start_date
+    while current_time <= end_date:
+        month_key = current_time.strftime("%Y%m")  # YYYYMM
+        timestamp_int = int(current_time.strftime("%Y%m%d%H%M%S"))  # YYYYMMDDHHMMSS
+
+        # when a new month starts, create a new key in the dict
+        if month_key not in monthly_timestamps:
+            monthly_timestamps[month_key] = []
+
+        # append the timestamp under its month key
+        monthly_timestamps[month_key].append(timestamp_int)
+
+        # advance to the next hour
+        current_time += timedelta(hours=1)
+
+    return monthly_timestamps
\ No newline at end of file