Open vocabulary semantic segmentation

Project for the Trends and Applications of Computer Vision course at the University of Trento, A.Y. 2024/2025

Developed by: Gheser Amir, Roman Simone and Mascherin Matteo

Project Description
- SAN
- SAM
- Experiments
Installation
Running the project
- Evaluating SAM with Our Pipeline
- Evaluating SAN

Project Description

For this project, our goal is to explore advanced methods for open-vocabulary semantic segmentation (OVSS), which aims to segment images into regions defined by arbitrary textual concepts. We investigate two main approaches, SAN (Side Adapter Network) and SAM (Segment Anything Model), and compare their performance and adaptability on OVSS tasks.

SAN

The Side Adapter Network (SAN) is a lightweight framework for open-vocabulary semantic segmentation that builds on CLIP's pre-trained vision-language representations. SAN models segmentation as a region recognition task: a side network is attached to a frozen CLIP model and has two branches, one that predicts mask proposals and one that predicts attention biases, which are applied inside CLIP to recognize the class of each mask. Training is end-to-end, so the side network adapts fully to the frozen CLIP model. According to the SAN authors, it achieves state-of-the-art results on several benchmarks while using up to 18x fewer trainable parameters and running up to 19x faster at inference than competing methods.
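To make the attention-bias idea more concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention with an additive bias on the logits, softmax(QK^T / sqrt(d) + B)V. It is a simplified illustration, not SAN's implementation: the shapes, the function name `biased_attention`, the single "recognition" query, and the use of a large negative bias to block attention outside a mask proposal are assumptions made for the example.

```python
import numpy as np


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits.

    q: (n_q, d) queries, k: (n_k, d) keys, v: (n_k, d_v) values,
    bias: (n_q, n_k) added to the attention logits before the softmax.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + bias
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
n_patches, dim = 6, 8
keys = rng.normal(size=(n_patches, dim))
values = rng.normal(size=(n_patches, dim))
query = rng.normal(size=(1, dim))  # one hypothetical recognition token

# Hypothetical mask proposal covering patches 0-2: a large negative bias
# outside the mask effectively stops the query from attending there.
mask = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
bias = np.where(mask, 0.0, -1e9)[None, :]

out, attn = biased_attention(query, keys, values, bias)
print(attn.round(3))  # weights on patches 3-5 are ~0
```

The bias leaves the attention weights themselves untouched inside the mask and only reshapes where the query can look, which is why a separate branch can steer a frozen attention layer without retraining it.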

SAM

Segment Anything Model (SAM) is a promptable foundation model for image segmentation that generalizes across diverse domains. To adapt SAM to the OVSS task, we propose a two-stage approach: SAM acts as a class-agnostic mask generator, and Alpha-CLIP classifies each mask. Post-processing techniques, such as bounding-box (BBox) filtering and background adjustment, then refine the mask proposals to improve segmentation accuracy in open-vocabulary settings.
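The mask classification stage can be sketched as follows: each class-agnostic mask gets an image-side embedding (in the pipeline, from Alpha-CLIP conditioned on that mask), each vocabulary term gets a text embedding, and the mask is assigned the term with the highest cosine similarity. A per-pixel label map is then built by letting higher-scoring masks overwrite lower-scoring ones. This is a NumPy sketch under stated assumptions, not the repository's code: the random embeddings stand in for real Alpha-CLIP features, and `classify_masks`, `paint_label_map`, the temperature of 100, and the use of -1 for unlabeled pixels are hypothetical choices.

```python
import numpy as np


def classify_masks(mask_emb, text_emb, temperature=100.0):
    """Assign each mask the vocabulary index with the highest cosine similarity.

    mask_emb: (n_masks, d) image-side embeddings, one per mask.
    text_emb: (n_classes, d) text embeddings, one per vocabulary term.
    Returns (labels, scores): best class index and its softmax probability.
    """
    m = mask_emb / np.linalg.norm(mask_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = temperature * (m @ t.T)  # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)


def paint_label_map(masks, labels, scores, background=-1):
    """Build an (H, W) label map; higher-scoring masks are painted last so they win overlaps."""
    label_map = np.full(masks.shape[1:], background, dtype=int)
    for i in np.argsort(scores):  # ascending score order
        label_map[masks[i]] = labels[i]
    return label_map


vocabulary = ["cat", "sofa", "floor"]
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 16))
# Hypothetical mask embeddings: noisy copies of the "sofa" and "cat" text embeddings.
mask_emb = text_emb[[1, 0]] + 0.05 * rng.normal(size=(2, 16))

masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :, :2] = True  # left half
masks[1, 1:3, 1:3] = True  # centre square, overlapping mask 0

labels, scores = classify_masks(mask_emb, text_emb)
print([vocabulary[i] for i in labels])
print(paint_label_map(masks, labels, scores))
```

Pixels left at -1 are not covered by any mask; a background-adjustment step, as mentioned above, would decide how to treat them.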

Experiments

We conducted the following experiments to evaluate the performance of our pipeline:

Pipeline Post-Processing Analysis:
- We tested the pipeline with different post-processing techniques to measure their impact on overall performance.
Model Evaluation with Different Datasets and Vocabularies:
- We evaluated SAN (Side Adapter Network) and SAM (Segment Anything Model) across multiple datasets.
- For each dataset, we used two vocabulary sources:
  - Caption-generated vocabulary: derived from captions generated by the BLIP-2 model.
  - Label-based vocabulary: built from the predefined dataset labels.
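As a rough illustration of how a caption-generated vocabulary could be assembled, the sketch below counts candidate words across captions and keeps the frequent ones. It is a simplified stand-in, not the repository's process_caption.py: a real pipeline would typically use a part-of-speech tagger to keep only nouns, whereas here a small hand-written stopword list, the `min_count` threshold, and the function name `build_vocabulary` are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical stopword list; in practice a POS tagger would keep only nouns.
STOPWORDS = {
    "a", "an", "the", "of", "on", "in", "with", "and", "is", "are",
    "sitting", "standing", "next", "to", "two", "some",
}


def build_vocabulary(captions, min_count=2):
    """Return sorted words that appear in at least `min_count` captions."""
    counts = Counter()
    for caption in captions:
        words = set(re.findall(r"[a-z]+", caption.lower()))
        counts.update(w for w in words if w not in STOPWORDS)
    return sorted(w for w, c in counts.items() if c >= min_count)


captions = [
    "a cat sitting on a sofa",
    "two cats on the sofa next to a lamp",
    "a lamp on a table in the living room",
]
print(build_vocabulary(captions))  # ['lamp', 'sofa']
```

Note that "cat" and "cats" are counted separately here; a lemmatizer would merge them before counting.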

Installation

Warning

To run the experiments you need Python 3.10.
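To fail early on the wrong interpreter, a version guard like the one below can be called at the top of a script. It is a hypothetical helper, not part of the repository, and only checks the Python 3.10 requirement stated above.

```python
import sys

REQUIRED = (3, 10)  # Python version required by the experiments


def check_python_version(version_info=None):
    """Raise RuntimeError unless the interpreter matches REQUIRED (major, minor)."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) != REQUIRED:
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(
            f"Python {REQUIRED[0]}.{REQUIRED[1]} is required, found {found}"
        )


# Usage: call check_python_version() before importing heavy dependencies.
```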

Install dependencies

To install all dependencies, run:

sh setup.sh

Download dataset

First, configure the 'datasets.yaml' file appropriately. Then run the commands below; anything the download script cannot fetch (the missing values) must be obtained manually before running the preprocessing step:

python download_dataset.py
# After manually getting the missing values
sh preprocess_dataset.sh

Download models

SAN -- model zoo -- default
SAM -- model zoo -- default
AlphaCLIP -- model zoo -- default

Note

Only the Google Drive link for AlphaCLIP currently works, so you need to download the checkpoint manually and place it in the 'models' folder.
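Because the AlphaCLIP checkpoint must be placed by hand, a small check like the following can confirm it is in the 'models' folder before running the pipeline. This is a hypothetical helper: it assumes the checkpoint has a .pth extension and that its filename contains "clip" (case-insensitive), neither of which is stated in this README, so adjust the pattern to the file you actually downloaded.

```python
from pathlib import Path


def find_checkpoints(models_dir="models", keyword="clip", suffix=".pth"):
    """Return checkpoint files in `models_dir` whose name contains `keyword`."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix == suffix and keyword in p.name.lower()
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        print("Found:", ", ".join(p.name for p in found))
    else:
        print("No AlphaCLIP checkpoint found in 'models/'; download it manually.")
```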

Running the project

You can find examples of how to use SAN (Side Adapter Network) and SAM (Segment Anything Model) in the notebooks directory. These notebooks walk through practical workflows for applying each model.

Evaluating SAM with Our Pipeline

To evaluate SAM using our pipeline, follow these steps:

Browse the configs directory and select the configuration file that matches your dataset and vocabulary.
Launch the pipeline with the following command:

python sam_pipeline.py

Evaluating SAN

To evaluate SAN on the ADE20K dataset with a custom vocabulary, follow these steps:

First, slightly modify the inference_on_dataset function in detectron2's evaluator.py so that the SAN model performs predictions with a custom vocabulary.

The file is located at your_venv/lib/python3.10/site-packages/detectron2/evaluation/evaluator.py. Change the inference_on_dataset function as shown in the following images:

Browse the datasets/captions_val directory and select the vocabulary you want to test (e.g., nouns_ade_filtered.pkl or nouns_coco_filtered.pkl, containing nouns extracted and filtered from the ADE20K and COCO datasets, respectively).
Launch the SAN evaluation from the SAN directory using the following command:

cd SAN
python eval_net.py --eval-only --config-file configs/san_clip_vit_res4_coco.yaml --vocabulary ../datasets/captions_val/nouns_ade_filtered.pkl  OUTPUT_DIR ../output/[Name of the output folder] MODEL.WEIGHTS ../checkpoints/san_vit_b_16.pth DATASETS.TEST "('ade20k_full_sem_seg_val',)"

Set --vocabulary to the desired vocabulary file and OUTPUT_DIR to the desired output folder name. See the official SAN repository for more information.
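Before launching an evaluation, it can help to inspect the chosen vocabulary file. The sketch below loads a .pkl vocabulary with the standard pickle module and returns its terms; it assumes the file holds an iterable of strings (for example a list of nouns), which this README does not confirm, and `load_vocabulary` is a hypothetical helper. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_vocabulary(path):
    """Load a pickled vocabulary and return it as a list of strings.

    Assumes the pickle contains an iterable of strings (e.g. a list or set of nouns).
    """
    with Path(path).open("rb") as f:
        vocab = pickle.load(f)
    return [str(term) for term in vocab]
```

For example, `terms = load_vocabulary("datasets/captions_val/nouns_ade_filtered.pkl")` from the repository root, then `print(len(terms), terms[:10])`. The path there is relative to the root, whereas the SAN command above uses ../datasets because it runs from the SAN directory.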

Contacts

For any inquiries, feel free to contact:

Simone Roman - simone.roman@studenti.unitn.it
Amir Gheser - amir.gheser@studenti.unitn.it
Matteo Mascherin - matteo.mascherin@studenti.unitn.it
