Onigiri - Add 3DCNN Model for Mood, Emotion, and Facial Expression Analysis #3
Open
kwon-encored wants to merge 28 commits into master from copilot-003
Commits (all authored by kwon-encored):
c9847b3 nah need to rewrite later ugh
59915ab new datasaet on onigiri code
4beb923 new modle for onigiri
7df0073 new module to data process the new onigiri
ecdab55 to fit the new value
0ebf631 predict and implement the result
74e09dd wrote github copilot instruction by markdown
e85791b Update src/utils/separate_date_articulator_that_is_new.py
982e7ae unnecessary comment
4a27990 data type
5949ee2 list, dict to numpy
ee822ab Merge remote-tracking branch 'origin/copilot-002' into copilot-002
a593cb8 emotional label
2fa75ec functional change
c0753bc onigiri --- dataload
40d98f1 inputshape debug
8d590ed commentary
58e7cb6 Merge remote-tracking branch 'origin/copilot-002' into copilot-002
98f1ebd label
6dcedfc changled label
e66d90b readme bro
0360362 fixed error
39e3c48 on using ML parameters
359f3d0 about leadtim 9000
c424253 index erroring
a64247d csv file
055fec9 int to str
b65891c instruction for GitHub CoPilot
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| description: 'Python coding conventions and guidelines' | ||
| applyTo: '**/*.py' | ||
| --- | ||
| - When performing a code review, ensure that variable and function names use snake_case, and class names use CamelCase, following PEP 8 style guidelines. | ||
| - When reviewing functions, check if loops or conditionals can be simplified with built-in or vectorized methods (e.g., numpy, pandas, datetime, itertools) while preserving clarity and behavior. | ||
| - When reviewing a function, check that its name is appropriate and corresponds to and clearly describes its purpose. | ||
| - When reviewing a function, check that its name clearly describes its purpose and that variable names are appropriate and descriptive. |
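
The review guidelines above can be illustrated with a small sketch. All names here (`running_totals_loop`, `running_totals`, `MoodScoreSummary`) are hypothetical and do not appear in this PR. They only show what "simplify a loop with a built-in while preserving behavior" and the snake_case/CamelCase rule might look like in practice.

```python
from itertools import accumulate


def running_totals_loop(values):
    """Explicit loop: correct, but more state to keep track of."""
    totals = []
    current = 0
    for value in values:
        current += value
        totals.append(current)
    return totals


def running_totals(values):
    """Same behavior using itertools.accumulate from the standard library."""
    return list(accumulate(values))


class MoodScoreSummary:
    """CamelCase class name; snake_case attribute and method names (PEP 8)."""

    def __init__(self, scores):
        self.scores = list(scores)

    def has_negative_score(self):
        # any() replaces a flag-and-break loop without changing the result.
        return any(score < 0 for score in self.scores)
```

A reviewer following these instructions would accept the second function only if it returns the same values as the loop for the same inputs, including the empty list.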
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,68 +1,42 @@ | ||
| # Multi-Modal-Image-Sentiment-Analysis | ||
| Final Year Project | ||
|
|
||
| Python version used : 3.6.0 | ||
| ## Overview | ||
| This PR introduces a new **multidimensional 3D CNN model** within the Onigiri project. | ||
| The model leverages a large-scale dataset (~18TB) capturing regressional relationships between **mood, emotion, and facial expressions**, along with **gender attributes**. | ||
|
|
||
| # To perform Sentiment Analysis of Text present in Image. | ||
| > python3 OCRSentiment.py | ||
| # Face classification and detection. | ||
| Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a keras CNN model and openCV. | ||
| * IMDB gender classification test accuracy: 96%. | ||
| * fer2013 emotion classification test accuracy: 66%. | ||
| The goal is to extend the multimodal project by enabling **mood determination** from image and face data, integrated with contextual metadata. | ||
|
|
||
| --- | ||
|
|
||
| ### Run real-time emotion demo: | ||
| > python3 video_emotion_color_demo.py | ||
| ## Key Features | ||
| - **New Data Integration** | ||
| - Added ~18TB of mass data on mood, emotion, and facial expression alongside gender. | ||
| - Preprocessing pipeline supports sequence-based image and embedding fusion. | ||
|
|
||
| ### Make inference on single images: | ||
| > python3 image_emotion_gender_demo.py <image_path> | ||
| - **3D CNN Model Implementation** | ||
| - Input: `data_input` (sequence of facial image tensors). | ||
| - Auxiliary Input: `site_id_input` for contextual weather embedding. | ||
| - Weather embedding reshaped into a **weather map** and concatenated as an additional channel. | ||
| - Temporal-spatial Conv3D layers with ELU activations. | ||
| - Dense fully connected layers leading to mood prediction outputs. | ||
|
|
||
| e.g. | ||
| - **Output** | ||
| - Predicts **mood state** given image and contextual inputs. | ||
| - Designed to integrate seamlessly with existing multimodal architecture. | ||
|
|
||
| > python3 image_emotion_gender_demo.py ../images/test_image.jpg | ||
| --- | ||
|
|
||
| ### Steps to run the final application UI.exe | ||
| Steps to run project:- | ||
| Step 1:- Download project from https://github.com/AnkurKarmakar/Multi-Modal-Image-Sentiment-Analysis | ||
| Extract the zip folder and place the entire project folder in any drive except C drive. | ||
| ## Motivation | ||
| This implementation expands Onigiri’s capability: | ||
| - Moves beyond **basic sentiment analysis** to deeper **mood-level understanding**. | ||
| - Bridges the gap between **visual emotion recognition** and **context-aware multimodal inference**. | ||
| - Scales to massive datasets, aligning with the multimodal project’s growth roadmap. | ||
|
|
||
| --- | ||
|
|
||
| Step 2:- Install Python 3.6.0 64 bit from https://www.python.org/downloads/release/python-360/(Note:- Other versions will cause problems with the tensorflow version used) | ||
| ## Next Steps | ||
| - Train and benchmark the new model on curated dataset splits. | ||
| - Compare performance against existing CNN and multimodal baselines. | ||
| - Integrate evaluation metrics for mood detection accuracy and generalization. | ||
|
|
||
|
|
||
| Step 3:- Download site-packages.rar from https://drive.google.com/file/d/1yBVfiMuq6DI8gIF4z__E_gCmwSwEL4uu/view?usp=sharing and extract it into C:\Users\<UserName>\AppData\Local\Programs\Python\Python36\Lib\ | ||
|
|
||
|
|
||
| Step 4:- Go to project folder where requirements.txt is present.Then open cmd there and type pip install -r requirements.txt | ||
|
|
||
|
|
||
| Step 5:- Download Tesseract from https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download and then install it | ||
|
|
||
|
|
||
| Step 6:- Go to project folder. Inside src folder there is UI.exe. Run it and program will run. After the UI pops up click on Browse to select image and then click on Analyze. | ||
|
|
||
|
|
||
| ### To train previous/new models for emotion classification: | ||
|
|
||
|
|
||
| * Download the fer2013.tar.gz file from [here](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data) | ||
|
|
||
| * Move the downloaded file to the datasets directory inside this repository. | ||
|
|
||
| * Untar the file: | ||
| > tar -xzf fer2013.tar | ||
|
|
||
| * Run the train_emotion_classification.py file | ||
| > python3 train_emotion_classifier.py | ||
|
|
||
| ### To train previous/new models for gender classification: | ||
|
|
||
| * Download the imdb_crop.tar file from [here](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) (It's the 7GB button with the tittle Download faces only). | ||
|
|
||
| * Move the downloaded file to the datasets directory inside this repository. | ||
|
|
||
| * Untar the file: | ||
| > tar -xfv imdb_crop.tar | ||
|
|
||
| * Run the train_gender_classification.py file | ||
| > python3 train_gender_classifier.py | ||
| --- |
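
The README says the weather embedding is reshaped into a weather map and concatenated as an additional channel. The following NumPy sketch shows one way that shape manipulation could work. It is an assumption-laden illustration, not the PR's implementation: the function name `add_weather_channel`, the lookup table `embedding_table`, and the choice of one embedding value per pixel are all hypothetical.

```python
import numpy as np


def add_weather_channel(data_input, site_ids, embedding_table):
    """Append a per-site weather map as one extra channel.

    data_input:      (batch, sequence, height, width, channels)
    site_ids:        (batch,) integer site indices
    embedding_table: (num_sites, height * width), assumed layout
    """
    batch, seq_len, height, width, _ = data_input.shape
    # Look up one embedding vector per sample: (batch, height * width).
    weather_vec = embedding_table[site_ids]
    # Reshape each vector into a 2D map with singleton sequence/channel axes.
    weather_map = weather_vec.reshape(batch, 1, height, width, 1)
    # Repeat the same map for every frame in the sequence.
    weather_map = np.broadcast_to(
        weather_map, (batch, seq_len, height, width, 1)
    )
    # Concatenate along the channel axis: channels -> channels + 1.
    return np.concatenate([data_input, weather_map], axis=-1)


frames = np.zeros((2, 4, 8, 8, 3), dtype=np.float32)
table = np.arange(3 * 64, dtype=np.float32).reshape(3, 64)
fused = add_weather_channel(frames, np.array([0, 2]), table)
print(fused.shape)  # (2, 4, 8, 8, 4)
```

In a Keras model the same steps would typically be expressed with layers so that the embedding is trainable; the NumPy version only makes the tensor shapes explicit.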
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
|
|
||
|
|
||
| import numpy as np | ||
| import pandas as pd | ||
| from utils.separate_date_articulator_that_is_new import return_emotions_mood_weather_mixer_combinations | ||
| from keras.callbacks import CSVLogger, ModelCheckpoint, EarlyStopping | ||
| from keras.callbacks import ReduceLROnPlateau | ||
| from utils.datasets import DataManager | ||
| from models.cnn import mini_XCEPTION | ||
| from models.cnn import mini_XCEPTION, model_allofasudden_that_uses_tensorflow | ||
| from utils.data_augmentation import ImageGenerator | ||
| from utils.datasets import split_imdb_data | ||
|
|
||
|
|
@@ -36,6 +37,67 @@ | |
| grayscale=grayscale, | ||
| do_random_crop=do_random_crop) | ||
|
|
||
| # onigiri - as of 2025 | ||
| df_weather_mood = pd.read_csv('../datasets/onigiri/sfj_weir_392834.csv') | ||
| all_possible_combinations_input, y_true = return_emotions_mood_weather_mixer_combinations(df_weather_mood, batch_size,num_epochs,patience) | ||
|
||
| all_possible_combinations_input = np.array(all_possible_combinations_input) | ||
| y_true = np.array(list(y_true.values())) if isinstance(y_true, dict) else np.array(y_true) | ||
|
|
||
| mood_model = model_allofasudden_that_uses_tensorflow( | ||
| sequence_length=all_possible_combinations_input.shape[1], | ||
| face_front_pixel=all_possible_combinations_input.shape[2], | ||
| face_back_pixel=all_possible_combinations_input.shape[3], | ||
| in_channels=all_possible_combinations_input.shape[4], | ||
| out_features=y_true.shape[1] if y_true.ndim > 1 else 1, | ||
| number_of_conv3d_layers=3, | ||
| conv3d_channels=32, | ||
| fc_features=128, | ||
| spatial_kernel_size=3, | ||
| temporal_kernel_size=3 | ||
| ) | ||
|
|
||
| mood_model.compile( | ||
| optimizer="adam", | ||
| loss="categorical_crossentropy", | ||
| metrics=["mae"] | ||
| ) | ||
|
|
||
| mood_model.summary() | ||
|
|
||
| # ---- 3) Callbacks (match the style from your example) ---- | ||
| # fill these in (same variable names you used before) | ||
| log_file_path = "mood_train_log.csv" | ||
| trained_models_path = "checkpoints/cnn3d_gsp" # no extension; we'll format epochs/metrics into the filename | ||
|
|
||
| early_stop = EarlyStopping(monitor="val_loss", patience=patience, restore_best_weights=True) | ||
| reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=max(1, patience // 2), verbose=1) | ||
| csv_logger = CSVLogger(log_file_path, append=False) | ||
|
|
||
| # Monitor a metric that Keras actually logs; val_loss is used below (val_mae is also logged because "mae" is in metrics). | ||
| model_names = trained_models_path + ".onigiri_df2j3i_dif982183nfdsfuh982h312jkhkdsahbadyfgasdfr234.hdf5" | ||
| model_checkpoint = ModelCheckpoint( | ||
| filepath=model_names, | ||
| monitor="val_loss", | ||
| verbose=1, | ||
| save_best_only=True, | ||
| save_weights_only=False | ||
| ) | ||
|
|
||
| callbacks = [model_checkpoint, csv_logger, early_stop, reduce_lr] | ||
|
|
||
| # ---- 4A) Fit with arrays / tf.data (recommended) ---- | ||
| history = mood_model.fit( | ||
| all_possible_combinations_input, y_true, | ||
| epochs=num_epochs, | ||
| batch_size=batch_size, | ||
| validation_split=validation_split, | ||
| callbacks=callbacks, | ||
| verbose=1 | ||
| ) | ||
|
|
||
|
|
||
|
|
||
| # model parameters/compilation | ||
| model = mini_XCEPTION(input_shape, num_classes) | ||
| model.compile(optimizer='adam', | ||
|
|
||
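
A comment in the diff above mentions formatting epochs and metrics into the checkpoint filename, while the filename actually passed to `ModelCheckpoint` is a fixed string. Keras documents that `filepath` may contain named formatting options that are filled in from the epoch number and the logged metrics. The sketch below reproduces that expansion with plain `str.format`, so it runs without TensorFlow. The helper `checkpoint_name` and the template are hypothetical examples, not the names used in this PR.

```python
def checkpoint_name(template, epoch, logs):
    """Expand a checkpoint filename template from epoch and logged metrics."""
    # Unused keys in `logs` (e.g. val_mae) are simply ignored by str.format.
    return template.format(epoch=epoch, **logs)


template = "checkpoints/cnn3d_gsp.{epoch:02d}-{val_loss:.2f}.hdf5"
example = checkpoint_name(template, 3, {"val_loss": 0.4567, "val_mae": 0.31})
print(example)  # checkpoints/cnn3d_gsp.03-0.46.hdf5
```

With `save_best_only=True` a fixed filename overwrites a single best checkpoint, whereas a templated filename keeps one file per improving epoch; which is preferable depends on available disk space.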
The mood classifier expects 5D input (batch, sequence, height, width, channels) for the 3D CNN, but this code extracts only 2D dimensions. This will cause shape-mismatch errors during inference. The mood model needs different preprocessing than the 2D CNN models.
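
One possible way to address the shape mismatch described in this comment is to normalize inputs to five dimensions before calling the mood model. The helper below is a sketch under assumptions: the name `to_5d_batch` is hypothetical, a 2D array is assumed to be a single grayscale face crop, and the 48x48 size is only illustrative.

```python
import numpy as np


def to_5d_batch(frames):
    """Return frames shaped (batch, sequence, height, width, channels)."""
    array = np.asarray(frames)
    if array.ndim == 2:
        # Single grayscale image (height, width): add batch, sequence
        # and channel axes of length 1.
        array = array[np.newaxis, np.newaxis, :, :, np.newaxis]
    elif array.ndim == 4:
        # One sequence (sequence, height, width, channels): add batch axis.
        array = array[np.newaxis, ...]
    elif array.ndim != 5:
        # 3D input is ambiguous: (height, width, channels) or
        # (sequence, height, width). Ask the caller to be explicit.
        raise ValueError(
            f"expected a 2D, 4D or 5D array, got shape {array.shape}"
        )
    return array


single_face = np.zeros((48, 48), dtype=np.float32)
print(to_5d_batch(single_face).shape)  # (1, 1, 48, 48, 1)
```

Rejecting 3D input instead of guessing keeps the failure close to its cause, which is easier to debug than a shape error raised inside the first Conv3D layer.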