Open
32 commits
bce79ed
Create reportreport
lilinSTART Apr 3, 2024
4dff444
Rename reportreport to report.md
lilinSTART Apr 3, 2024
cc7e2bf
Update report.md
lilinSTART Apr 5, 2024
f177ea3
Rename report.md to REPORT.md
lilinSTART Apr 5, 2024
cd0221e
Add files via upload
lilinSTART Apr 5, 2024
f1126a2
Update train.py
lilinSTART Apr 5, 2024
b4c8337
Update train.py
lilinSTART Apr 5, 2024
bf30831
Update train.py
lilinSTART Apr 5, 2024
5048c9e
Add files via upload
lilinSTART Apr 5, 2024
f64f6e6
Create logs_0.log
lilinSTART Apr 5, 2024
6dd6c8a
Add files via upload
lilinSTART Apr 5, 2024
67c5fb7
Delete RESULTS/git/logs_0.log
lilinSTART Apr 5, 2024
6450c74
Delete RESULTS/git/logs_0.txt
lilinSTART Apr 5, 2024
9892a0c
Add files via upload
lilinSTART Apr 5, 2024
3f53455
Create 1
lilinSTART Apr 5, 2024
4d5b097
Update REPORT.md
lilinSTART Apr 5, 2024
7e22191
Update demo_train.sh
lilinSTART Apr 6, 2024
6daaf0f
Update demo_test.sh
lilinSTART Apr 6, 2024
876dc1a
Update cnnlstm_train.sh
lilinSTART Apr 6, 2024
b6e7e15
Update cnnlstm_test.sh
lilinSTART Apr 6, 2024
1d0d7d1
Update cnnlstm_train.sh
lilinSTART Apr 6, 2024
5b2c62a
Update README.md
lilinSTART Apr 7, 2024
a3979b8
Update README.md
lilinSTART Apr 7, 2024
abd8202
Update README.md
lilinSTART Apr 7, 2024
1ca18b6
Delete REPORT.md
lilinSTART Apr 7, 2024
5012d01
Update README.md
lilinSTART Apr 7, 2024
e38897a
Delete results/1
lilinSTART Apr 7, 2024
52696c2
Update README.md
lilinSTART Apr 7, 2024
3565a89
Update README.md
lilinSTART Apr 7, 2024
4b2c867
Update README.md
lilinSTART Apr 7, 2024
2a7ef4f
Update README.md
lilinSTART Apr 7, 2024
4644078
Update README.md
lilinSTART Apr 7, 2024
127 changes: 42 additions & 85 deletions README.md
@@ -1,119 +1,76 @@
# DS598 DL4DS Midterm Project

## Introduction
For this project, you will train a network to generate captions for the
[VizWiz Image Captioning dataset](https://vizwiz.org/tasks-and-datasets/image-captioning/).
The images are taken by people who are blind and typically rely on
human-based image captioning services. Your objective will be to beat a
baseline score on the [test set leaderboard](https://eval.ai/web/challenges/challenge-page/739/leaderboard/2006).

## Developer Setup
This project aims to provide image captioning for people who are blind, using a Transformer-based model. It fine-tunes the [blip-image-captioning-base model](https://huggingface.co/Salesforce/blip-image-captioning-base) on the [VizWiz Image Captioning dataset](https://vizwiz.org/tasks-and-datasets/image-captioning/). The optimizer is [AdamW](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html) with a learning rate of 2e-5 and a weight decay of 5e-4. Training was configured for up to 16 epochs but was stopped early at epoch 7 because the model began to overfit after that point. The training and validation batch sizes are 6 and 32, respectively. The model achieved a CIDEr-D score of 75.37 on the [test dataset](https://eval.ai/web/challenges/challenge-page/739/leaderboard/2006).
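
The README does not say whether the stop at epoch 7 was automated or decided by hand. As a minimal sketch of one way to automate it, the function below applies a patience-based rule to per-epoch validation scores. The function name, the `patience` parameter, and the scores in the usage example are illustrative assumptions, not values from this project.

```python
from typing import Optional, Sequence


def best_epoch_with_patience(val_scores: Sequence[float], patience: int = 2) -> Optional[int]:
    """Return the 1-based epoch whose checkpoint should be kept.

    Scanning stops once the validation score has failed to improve
    for `patience` consecutive epochs (hypothetical stopping rule).
    """
    best_epoch = None
    best_score = float("-inf")
    epochs_without_gain = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # New best: remember it and reset the counter.
            best_epoch, best_score = epoch, score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # stop here; later epochs would not be run
    return best_epoch


# Made-up validation scores that peak at epoch 7, for illustration only.
example_scores = [0.60, 0.65, 0.68, 0.70, 0.71, 0.72, 0.73, 0.72, 0.71, 0.70]
print(best_epoch_with_patience(example_scores, patience=2))
```

In a real training loop the same counter would be updated after each validation pass, and the checkpoint saved whenever a new best score is seen.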

Clone this repo to your directory on the SCC DS598 project space, e.g.
`/projectnb/ds598/students/<userid>`.

Once you have a training script setup, create a shell script, e.g. `train.sh`,
that loads and activates a conda environment and then runs your training
script. An example shell script is below.

```sh
#!/bin/bash -l

# Set SCC project
#$ -P ds598

# load and activate the academic-ml conda environment on SCC
module load miniconda
module load academic-ml/spring-2024
conda activate spring-2024-pyt

# Add the path to your source project directory to the python search path
# so that the local `import` commands will work.
export PYTHONPATH="/projectnb/ds598/students/<userid>/<yourdir>:$PYTHONPATH"

# Update this path to point to your training file
python path/to/train.py

# After updating the two paths above, run the command below from an SCC
# command prompt in the same directory as this file to submit this as a
# batch job.
### qsub -pe omp 4 -P ds598 -l gpus=1 train.sh
```

Note that there are train and test scripts for the two folders already.

## Run Example Scripts

When you run the example scripts, make sure to add the path to the repo
folder before running the script.
## Dataset

```export PYTHONPATH="/projectnb/ds598/path/to/folder:$PYTHONPATH"```
The dataset used in this project is the VizWiz-Captions dataset, which includes 39,181 images sourced from individuals who are blind. Each image is accompanied by 5 descriptive captions.

The example shell scripts include this command.
Download the dataset from the [VizWiz Image Captioning dataset](https://vizwiz.org/tasks-and-datasets/image-captioning/) website and update the `annotation_file` and `image_folder` paths in `src/base/dataset.py`.

## Evaluation

Set the paths in `src/base/constants.py` to the correct paths on your system.
The VizWiz challenge evaluation refers to five different evaluation metrics, although it uses CIDEr-D as its primary metric.

Follow the .sh files to run the code. As an example, to run the `cnnlstm_train.sh`
script, you would run at the command prompt from the base of your local repo
folder:
The challenge also references the BLEU metric, but that metric has limitations, as described in [2] below.
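
To make the BLEU limitation concrete, here is a minimal sketch of the clipped (modified) n-gram precision at the core of BLEU. It is not the evaluation code used by the challenge: the full metric also combines several n-gram orders and applies a brevity penalty. The function names and the example sentences are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Clipped n-gram precision of a candidate against reference captions."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# A degenerate caption still gets half credit on unigrams.
print(modified_precision("the the the the".split(), ["the cat is on the mat".split()], 1))  # 0.5
```

The example shows why BLEU alone can be misleading: a caption with no meaningful content still earns a nonzero unigram score because it only measures surface n-gram overlap.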

```sh
$ qsub -pe omp 4 -P ds598 -l gpus=1 cnnlstm_train.sh
Your job 5437870 ("cnnlstm_train.sh") has been submitted
```
As shown, you should get a notification that your job was submitted, along with a
job ID number.
### Validation Results

You can check your job status by typing:
At epoch 7, the training loss was 1.3944. The validation scores for this epoch, on a 0 to 1 scale, are as follows:

```sh
$ qstat -u <userid>
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
5437870 0.00000 cnnlstm_tr tgardos qw 03/14/2024 09:40:24
```
| Metric | Score |
|---------|---------|
| BLEU-1 | 0.6757 |
| BLEU-2 | 0.4938 |
| BLEU-3 | 0.3489 |
| BLEU-4 | 0.2419 |
| **CIDEr** | **0.7261** |

The above shows example output for user `tgardos`.
Here are two examples of the model's predictions:

## Dataset
Good example:

The dataset is downloaded to
`/projectnb/ds598/materials/datasets/vizwiz/captions`. There is no need to
download the dataset again and the path has already been defined in the
accompanying code.
![good example](https://i.postimg.cc/HWbHNZyJ/good-example.png)

## Evaluation
Bad example:

In the VizWiz challenge evaluation they refer to five different evaluation
metrics although they use CIDEr-D as their primary evaluation.
![bad example](https://i.postimg.cc/qqcTCqTc/bad-example.png)

They reference the BLEU metric, but there are limitations to that metric as
described in [2] below.
### Test Results

### Validation Results
I submitted my test predictions to the VizWiz Image Captioning [Evaluation Server](https://eval.ai/web/challenges/challenge-page/739/overview). Here are the resulting scores, which the server reports on a 0 to 100 scale:

Validation set results are reported in the CNN-LSTM example, and code for reporting validation results is in the demo model code.
| Metric | Score |
|---------|-------|
| BLEU-1 | 68.49 |
| BLEU-2 | 50.20 |
| BLEU-3 | 35.68 |
| BLEU-4 | 24.89 |
| ROUGE-L | 48.51 |
| METEOR | 22.06 |
| **CIDEr** | **75.37** |
| SPICE | 17.48 |
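
The test script in this change collects one `{"image_id": ..., "caption": ...}` entry per image and dumps the list to a JSON file. As a minimal sketch of that step, the helper below builds the same structure from a plain dictionary. The name `build_submission`, the caption stripping, the sorting by image ID, and the example captions are assumptions for illustration; check the evaluation server's instructions for the exact format it expects.

```python
import json
from typing import Dict


def build_submission(captions: Dict[int, str]) -> str:
    """Serialize predicted captions as a JSON list of image_id/caption records."""
    records = [
        {"image_id": int(image_id), "caption": caption.strip()}
        for image_id, caption in sorted(captions.items())
    ]
    return json.dumps(records, indent=4)


# Hypothetical predictions, for illustration only.
print(build_submission({2: " a cup on a table ", 1: "a dog on a couch"}))

# To write the file (the path is an assumption):
# with open("test_captions.json", "w") as f:
#     f.write(build_submission(predictions))
```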

### Test Results
## Implementation Suggestions

As is typically the case, the test dataset labels are withheld, and so the only way to get test results is to produce predicted captions and
then submit them to the VizWiz Image Captioning [Evaluation Server](https://eval.ai/web/challenges/challenge-page/739/overview). There are
scripts in both model directories to create the test submission file, although the demo model test script will have to be updated with model
information.
1. Explore trending image-to-text models on the [Hugging Face model hub](https://huggingface.co/models?pipeline_tag=image-to-text&sort=trending) for alternatives, and feed dataset images into the inference API to evaluate the pre-trained models' outputs.

Create an account on the [Evaluation Server](https://eval.ai/web/challenges/challenge-page/739/overview) and submit your test predictions
to get your result.
2. The default learning rates of optimizers such as SGD, Adam, and AdamW (for example, 1e-3 for Adam and AdamW in PyTorch) are too high for fine-tuning and can cause the model to produce similar outputs for different inputs. A learning rate between 1e-5 and 5e-5 is recommended.

Step-by-step instructions will be added here shortly.
## Limitations and Reflection
1. Faced with challenges such as debugging empty predictions, CUDA version mismatches, limited computational resources, and long training times, I limited my fine-tuning experiments to a few models: the [blip-image-captioning-base model](https://huggingface.co/Salesforce/blip-image-captioning-base), the [blip-image-captioning-large model](https://huggingface.co/Salesforce/blip-image-captioning-large), and [git-base](https://huggingface.co/microsoft/git-base).

The state-of-the-art CIDEr-D score on VizWiz Image Captioning is ~125. We're asking that you get a **minimum CIDEr-D test score of 50**.
2. I did not try methods such as data augmentation or dropout, which could have improved the model's robustness and generalization.

## References

1. [CIDEr: Consensus-based image description evaluation](https://ieeexplore.ieee.org/document/7299087)
2. [BLEU: A Misunderstood Metric from Another Age](https://towardsdatascience.com/bleu-a-misunderstood-metric-from-another-age-d434e18f1b37), Medium Post
3. [BLEU Metric](https://huggingface.co/spaces/evaluate-metric/bleu), HuggingFace space
4. [image-to-text models](https://huggingface.co/models?pipeline_tag=image-to-text&sort=trending)
5. [image_captioning](https://huggingface.co/docs/transformers/main/en/tasks/image_captioning)
6. [BlipForConditionalGeneration](https://huggingface.co/docs/transformers/en/model_doc/blip#transformers.BlipForConditionalGeneration)



2 changes: 1 addition & 1 deletion cnnlstm_test.sh
Expand Up @@ -9,7 +9,7 @@ module load academic-ml/spring-2024
conda activate spring-2024-pyt

# Change this path to point to your project directory
export PYTHONPATH="/projectnb/ds598/admin/tgardos/sp2024_midterm:$PYTHONPATH"
export PYTHONPATH="/projectnb/ds598/students/lilinj/sp2024_midterm:$PYTHONPATH"

#python -m spacy download en_core_web_sm # download spacy model
python src/cnn_lstm/test.py
4 changes: 2 additions & 2 deletions cnnlstm_train.sh
Expand Up @@ -9,9 +9,9 @@ module load academic-ml/spring-2024
conda activate spring-2024-pyt

# Change this path to point to your project directory
export PYTHONPATH="/projectnb/ds598/admin/tgardos/sp2024_midterm:$PYTHONPATH" # Set this!!!
export PYTHONPATH="/projectnb/ds598/students/lilinj/sp2024_midterm:$PYTHONPATH" # Set this!!!

python -m spacy download en_core_web_sm # download spacy model
#python -m spacy download en_core_web_sm # download spacy model
python src/cnn_lstm/train.py

### The command below is used to submit the job to the cluster
7 changes: 4 additions & 3 deletions demo_test.sh
Expand Up @@ -9,9 +9,10 @@ module load academic-ml/spring-2024
conda activate spring-2024-pyt

# Change this path to point to your project directory
export PYTHONPATH="/projectnb/ds598/admin/tgardos/sp2024_midterm:$PYTHONPATH" # Set this!!!
export PYTHONPATH="/projectnb/ds598/students/lilinj/sp2024_midterm:$PYTHONPATH" # Set this!!!

python src/demo_model/test.py

### The command below is used to submit the job to the cluster
### qsub -pe omp 4 -P ds598 -l gpus=1 git_test.sh
### The commands below are used to submit the job to the cluster
### qsub -pe omp 4 -P ds598 -l gpus=1 demo_test.sh
### qsub -l gpus=1 -l gpu_c=7.0 -pe omp 8 demo_test.sh
6 changes: 4 additions & 2 deletions demo_train.sh
Expand Up @@ -9,9 +9,11 @@ module load academic-ml/spring-2024
conda activate spring-2024-pyt

# Change this path to point to your project directory
export PYTHONPATH="/projectnb/ds598/admin/tgardos/sp2024_midterm:$PYTHONPATH"
export PYTHONPATH="/projectnb/ds598/students/lilinj/sp2024_midterm:$PYTHONPATH"

#python -m spacy download en_core_web_sm # download spacy model
python src/demo_model/train.py

### The command below is used to submit the job to the cluster
### The commands below are used to submit the job to the cluster
### qsub -pe omp 4 -P ds598 -l gpus=1 demo_train.sh
### qsub -l gpus=1 -l gpu_c=7.0 -pe omp 8 demo_train.sh
2 changes: 1 addition & 1 deletion src/base/constants.py
Expand Up @@ -5,7 +5,7 @@
import spacy

# set this path to where you want to save results
BASE_DIR = "/projectnb/ds598/projects/tgardos/sp2024_midterm/"
BASE_DIR = "/projectnb/ds598/students/lilinj/sp2024_midterm/"

# Do not edit. This points to the dataset folder
DATA_BASE_DIR = "/projectnb/ds598/materials/datasets/vizwiz/captions/"
15 changes: 9 additions & 6 deletions src/demo_model/test.py
Expand Up @@ -6,8 +6,8 @@
from src.base.vizwiz_eval_cap.eval import VizWizEvalCap
from dataset import DemoDataset
from tqdm import tqdm
from transformers import AutoProcessor
from transformers import AutoModelForCausalLM
from transformers import BlipProcessor
from transformers import BlipForConditionalGeneration
from PIL import Image
import matplotlib.pyplot as plt
import os
Expand All @@ -20,10 +20,11 @@
create_directory(DEMO_SAVE_PATH + "/examples")

# The path below points to the location where the model was saved
MODEL_PATH = f"{DEMO_SAVE_PATH}/best_model"
MODEL_PATH = f"{DEMO_SAVE_PATH}/best_model_0"

# Load your fine tuned model
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, cache_dir=CACHE_DIR)
#model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, cache_dir=CACHE_DIR)
model = BlipForConditionalGeneration.from_pretrained(MODEL_PATH, cache_dir=CACHE_DIR)

## TODO
# You can use the AutoProcessor.from_pretrained() method to load the HuggingFace
Expand All @@ -33,7 +34,9 @@
#
# Of course you should use the same model you trained with.
try:
    processor = AutoProcessor.from_pretrained("replace-with-model-choice", cache_dir=CACHE_DIR)
    #processor = AutoProcessor.from_pretrained("replace-with-model-choice", cache_dir=CACHE_DIR)
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base", cache_dir=CACHE_DIR)

except Exception as e:
    print("You need to pick a pre-trained model from HuggingFace.")
    print("Exception: ", e)
Expand Down Expand Up @@ -70,7 +73,7 @@
{"image_id": img_id.item(), "caption": caption}
) # Used for VizWizEvalCap

with open(DEMO_SAVE_PATH + "/test_captions.json", "w") as f:
with open(DEMO_SAVE_PATH + "/test_captions_0.json", "w") as f:
    json.dump(caption_val, f, indent=4)

print("Test captions saved to disk!!")