Making Better Mistakes in CLIP-Based Zero-Shot Classification with Hierarchy-Aware Language Prompts

The proposed CLIP-based zero-shot classification process

Our approach queries an LLM to generate descriptions of the classes in each downstream task, conditioned on that task's label hierarchy. These descriptions are then used as language prompts for the subsequent CLIP-based zero-shot classification.
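To illustrate what "conditioned on the label hierarchy" can mean, the sketch below builds an LLM query that mentions a class's ancestors. It is a hypothetical illustration: the child-to-parent map, the query wording, and the names `ancestors`, `build_query`, and `TOY_PARENT` are assumptions, not the prompts used by the gen_*_prompts.py scripts.

```python
from __future__ import annotations

# Hypothetical sketch: the child -> parent map, the query wording, and the
# function names are illustrative assumptions, not this repository's prompts.

def ancestors(label: str, parent: dict[str, str]) -> list[str]:
    """Return the ancestors of `label`, nearest first, by following parent links."""
    chain = []
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain


def build_query(label: str, parent: dict[str, str]) -> str:
    """Compose an LLM query that places the class inside its label hierarchy."""
    query = f"Describe what a {label} looks like in a photo."
    path = ancestors(label, parent)
    if path:
        query += f" A {label} is a kind of " + ", which is a kind of ".join(path) + "."
    return query


# Toy hierarchy for illustration only.
TOY_PARENT = {"tabby cat": "cat", "cat": "mammal", "mammal": "animal"}
```

For example, `build_query("mammal", TOY_PARENT)` yields "Describe what a mammal looks like in a photo. A mammal is a kind of animal." The LLM's answers would then serve as the text prompts that CLIP scores against each image.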

Installation

Our code requires Python 3.10 and Ubuntu 20.04.6 LTS (the Gemini API requires Python >= 3.10 to function properly).

pip install -r requirements.txt

Setup configuration file: data_paths.yml

Replace "/path/to/code" in data_paths.yml with the path of the directory that contains main.py (the root of this repository).
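If you prefer not to edit the file by hand, a minimal sketch like the following performs the substitution as plain text, without parsing the YAML. The function names are hypothetical; run `fill_config(Path("data_paths.yml"), Path.cwd())` from the directory that contains main.py.

```python
from pathlib import Path

# Placeholder string that data_paths.yml ships with (per this README).
PLACEHOLDER = "/path/to/code"


def fill_placeholder(text: str, code_dir: str) -> str:
    """Return `text` with every occurrence of the placeholder replaced by `code_dir`."""
    return text.replace(PLACEHOLDER, code_dir)


def fill_config(config_file: Path, code_dir: Path) -> None:
    """Rewrite `config_file` in place, using the absolute path of `code_dir`."""
    text = config_file.read_text()
    config_file.write_text(fill_placeholder(text, str(code_dir.resolve())))
```

Because this is a blind text replacement, check the result afterwards; it also rewrites the placeholder inside longer paths such as "/path/to/code/logs", which is usually what you want.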

Set openai-key, anthropic-key, and google-key in data_paths.yml to the paths of the text files that store your API keys for OpenAI, Anthropic, and Google AI Studio, respectively.
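A common pitfall with key files is a trailing newline left by the editor. The hypothetical helper below reads a key file and strips surrounding whitespace; stripping is our assumption about how such files should be handled, not necessarily what the repository's scripts do.

```python
from pathlib import Path


def read_api_key(key_file: Path) -> str:
    """Read an API key from a text file, dropping surrounding whitespace/newlines.

    Hypothetical helper: raises ValueError if the file holds no key at all.
    """
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```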

Set dataset-home-dir* in data_paths.yml to the directory where the original datasets will be downloaded and stored.

Set food-101, ucf-101, cub-200, sun-324, and imagenet in data_paths.yml to the directories that will store the val/test partitions of the datasets used in our experiments.

Datasets

Go to custom_datasets and run the corresponding dataset preparation scripts. Some datasets must first be downloaded manually to the configured dataset-home-dir, as noted below. The original datasets occupy ~138 GB of storage; our processed val/test partitions take ~27 GB.

cd ./custom_datasets

# food-101
python food101.py

# ucf-101: requires manual download of the original dataset; detailed instructions in ucf101.py
python ucf101.py

# cub-200
python cub200.py

# sun-324
python sun324.py

# imagenet-1k: requires manual download of the original dataset; detailed instructions in imagenet.py
python imagenet.py
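Given the storage figures above (~138 GB of original data plus ~27 GB of processed partitions, roughly 165 GB if both live on the same disk), it can help to check free space before starting. A minimal sketch using the standard library; treating "GB" as 10^9 bytes is our assumption.

```python
import shutil

# Approximate storage figures quoted in this README (GB assumed to mean 1e9 bytes).
ORIGINAL_GB = 138   # original datasets under dataset-home-dir
PROCESSED_GB = 27   # processed val/test partitions


def has_enough_space(path: str, needed_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1e9
```

For example, `has_enough_space(".", ORIGINAL_GB + PROCESSED_GB)` checks the current filesystem for both parts at once; pass each configured directory separately if they sit on different disks.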

Label hierarchies

The label hierarchy (tree) of each dataset and the associated hierarchical distance files are stored at:

./trees

Pretty-printed versions of the tree structures are stored at:

./tree_viz

The pretty-printed label hierarchies are wide text files; zoom out in your text viewer for the tree structure to display correctly.
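Hierarchical distance between two classes is commonly measured as the height of their lowest common ancestor (LCA) in the label tree: 0 for the same class, 1 for siblings, and so on. The sketch below computes that quantity on a toy tree given as a child-to-parent map. That representation and the function names are assumptions for illustration; they are not the file format used in ./trees.

```python
from __future__ import annotations

# Toy balanced tree (every leaf at depth 3) as a child -> parent map.
# Hypothetical example data; not one of the hierarchies in ./trees.
TOY_TREE = {
    "tabby cat": "cat", "siamese cat": "cat", "beagle": "dog",
    "cat": "mammal", "dog": "mammal",
    "sparrow": "passerine", "passerine": "bird",
    "mammal": "animal", "bird": "animal",
}


def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lca_height(a: str, b: str, parent: dict[str, str]) -> int:
    """Height of the lowest common ancestor of leaves `a` and `b`.

    Counts steps up from `a`, which equals the LCA height when all leaves
    sit at the same depth (an assumption of this sketch).
    """
    ancestors_b = set(path_to_root(b, parent))
    for height, node in enumerate(path_to_root(a, parent)):
        if node in ancestors_b:
            return height
    raise ValueError(f"{a!r} and {b!r} do not share a root")
```

With this toy tree, a tabby cat confused with a siamese cat has distance 1, with a beagle 2, and with a sparrow 3, so "better mistakes" are those with smaller values.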

Query LLM to generate image prompts

Once the configuration file contains valid API keys and the datasets are prepared, run the following bash scripts. This may take more than 20 hours, depending on your request rate limits for the associated APIs. Note: to avoid overwriting the image prompts we provide in ./image_prompts, rename the existing ./image_prompts folder first.
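The rename can be done with any file manager or shell command; for completeness, here is a minimal Python sketch. The backup name image_prompts_provided is an arbitrary choice, and the helper refuses to overwrite an existing backup.

```python
from pathlib import Path


def backup_prompts(repo_root: Path, backup_name: str = "image_prompts_provided") -> Path:
    """Rename <repo_root>/image_prompts to <repo_root>/<backup_name> and return the new path."""
    src = repo_root / "image_prompts"
    dst = repo_root / backup_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; choose another name")
    return src.rename(dst)  # raises FileNotFoundError if src is missing
```

Call `backup_prompts(Path.cwd())` from the repository root before running the query scripts.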

# generate CuPL image prompts, querying gpt-3.5-turbo
bash bash_LLM_query_cupl.sh

# generate VCD image prompts, querying gpt-3.5-turbo
bash bash_LLM_query_vcd.sh

# generate HIE (HieC and HieT) image prompts, querying gpt-3.5-turbo
bash bash_LLM_query_hie.sh 

# generate the proposed image prompts; this involves querying gpt-3.5-turbo, claude-3.5-sonnet, and gemini-1.5-flash
bash bash_LLM_query_ours.sh

CLIP (image-text contrastive loss) vs. ViT (cross-entropy loss) error structure comparison

To generate the prediction results of CLIP and of the pretrained ViTs provided by PyTorch, you first need to download the ImageNet validation set and the associated development toolkit (see the comments in pretrained_vit.py for details). The ImageNet validation set is used only to obtain the class-name-to-numeric-label mapping of the PyTorch ViT models. To generate the prediction results of the ViTs, run:

bash bash_vits_inference.sh

To generate the histogram of mistake severities, run the Jupyter notebook ./notebooks/CLIP_vs_ViTs_severity.ipynb.
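The data behind such a histogram is simply a count of how often each hierarchical distance occurs among the misclassified samples. A minimal sketch, assuming a `severity(predicted, true)` function such as an LCA-height lookup; averaging over mistakes only follows the common "average mistake severity" definition and may differ from the notebook's exact computation.

```python
from collections import Counter


def severity_histogram(preds, labels, severity):
    """Return a Counter {severity: count} over misclassified samples only.

    `severity(predicted, true)` is any hierarchical distance function (assumed).
    """
    return Counter(severity(p, t) for p, t in zip(preds, labels) if p != t)


def average_mistake_severity(hist):
    """Mean severity over mistakes; 0.0 if there are no mistakes."""
    total = sum(hist.values())
    return sum(s * c for s, c in hist.items()) / total if total else 0.0
```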

Tuning the score offset hyperparameter of the HIE method on the validation set

To find the score offset hyperparameter used by the HIE method in our comparison, run:

bash bash_hie_finetune.sh
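Conceptually, this kind of tuning is a grid search: evaluate each candidate offset on the validation set and keep the best one. The sketch below shows only that search loop. How the offset enters HIE's scoring is not described here, so `evaluate(offset)` is a hypothetical callable returning validation accuracy, and the candidate grid is up to you.

```python
def tune_offset(candidates, evaluate):
    """Grid search: return (best_offset, best_score).

    `evaluate(offset)` is a hypothetical callable returning a validation
    metric (higher is better). Ties keep the earliest candidate.
    """
    best_offset, best_score = None, float("-inf")
    for offset in candidates:
        score = evaluate(offset)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

Tuning on the validation split rather than the test split keeps the reported test results free of hyperparameter selection bias.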

Zero-shot classification

To produce the evaluation results in our main comparison, run:

bash bash_comp_results.sh

Ablation study

To produce the ablation study results, run:

bash bash_ablation.sh

Ensemble methods

To produce the comparison results of two different ensemble methods, run:

bash bash_ensemble_method.sh
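When each class has several text prompts, two widespread ways to combine them are (a) averaging the normalized prompt embeddings into one class embedding before scoring, and (b) scoring the image against every prompt and averaging the scores. The sketch below implements both with plain Python lists standing in for CLIP embeddings. These are illustrative; they are not necessarily the two methods compared by bash_ensemble_method.sh.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embedding_average_scores(image, class_prompts):
    """(a) Average each class's normalized prompt embeddings, re-normalize, then score."""
    image = normalize(image)
    scores = []
    for prompts in class_prompts:
        prompts = [normalize(p) for p in prompts]
        mean = [sum(xs) / len(prompts) for xs in zip(*prompts)]
        scores.append(dot(image, normalize(mean)))
    return scores


def score_average_scores(image, class_prompts):
    """(b) Score the image against every prompt, then average the scores per class."""
    image = normalize(image)
    return [sum(dot(image, normalize(p)) for p in prompts) / len(prompts)
            for prompts in class_prompts]
```

The two can rank classes differently: for an image embedding [1, 0], a class with prompts [1, 0] and [0, 1] ties with a class whose single prompt is [1, 1] under (a), but scores 0.5 versus about 0.71 under (b).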

Transferability of language prompts

To produce the classification results of image prompts acquired from different language models, run:

bash bash_lang_prompts_transfer.sh

Acknowledgement

This codebase is partially refactored from the following GitHub repositories:
