rogergheser/VF-SemanticSegmentation

Python PyTorch OpenCV

Open vocabulary semantic segmentation

Project for the course Trends and Applications of Computer Vision at the University of Trento, A.Y. 2024/2025

Developed by: Gheser Amir, Roman Simone and Mascherin Matteo


Project Description

This project explores advanced methods for open-vocabulary semantic segmentation (OVSS), which aims to segment images into regions defined by arbitrary textual concepts. We investigate two main approaches, SAN (Side Adapter Network) and SAM (Segment Anything Model), and compare their performance and adaptability on OVSS tasks.

OVSS

SAN

The Side Adapter Network (SAN) is a lightweight framework for open-vocabulary semantic segmentation that leverages CLIP's pre-trained vision-language capabilities. SAN models segmentation as a region recognition task by attaching a side network to a frozen CLIP model. The side network has two branches: one predicts mask proposals, and the other predicts attention biases that are applied inside CLIP to recognize the class of each mask. The side network is trained end-to-end so that it adapts to CLIP and its predictions stay CLIP-aware. According to the SAN authors, it outperforms comparable methods with up to 18x fewer trainable parameters and 19x faster inference.
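As a rough illustration of what "region recognition" means in practice, the sketch below fuses per-proposal mask scores with per-proposal class scores into a per-pixel prediction, in the style of MaskFormer-like models. Random NumPy arrays stand in for real network outputs; the function name `combine_proposals`, the array shapes, and the sigmoid/softmax choices are assumptions made for illustration, not code taken from the SAN repository.

```python
import numpy as np

def combine_proposals(mask_logits, class_logits):
    """Fuse Q mask proposals and their class scores into a label map.

    mask_logits:  (Q, H, W) raw mask scores, one map per proposal.
    class_logits: (Q, C) raw class scores, one row per proposal.
    Returns an (H, W) array of class indices.
    """
    # Sigmoid turns mask logits into per-pixel membership probabilities.
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))
    # Softmax over classes (numerically stabilised) for each proposal.
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    class_probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Per-pixel class score: sum over proposals of class prob * mask prob.
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)

# Toy inputs (assumed sizes): 5 proposals, 3 classes, a 4x6 image.
rng = np.random.default_rng(0)
label_map = combine_proposals(rng.normal(size=(5, 4, 6)), rng.normal(size=(5, 3)))
print(label_map.shape)  # (4, 6)
```

Because every proposal contributes to every pixel, overlapping proposals are resolved softly instead of by a hard "last mask wins" rule.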

SAM

Segment Anything (SAM) is a state-of-the-art framework for object segmentation across diverse domains. To adapt SAM to the OVSS task, we propose a two-stage approach in which SAM acts as a class-agnostic mask generator and Alpha-CLIP classifies each mask. Post-processing techniques, such as BBox filtering and background adjustments, refine the mask proposals to improve segmentation accuracy in open-vocabulary settings.

SAM pipeline
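The two-stage idea can be sketched as follows: each class-agnostic mask is matched to the vocabulary word whose text embedding is most similar to the mask's image embedding, and the labels are painted into a single map. This is a minimal sketch under stated assumptions. The embeddings are plain NumPy arrays standing in for what Alpha-CLIP would produce, and the function name `label_masks`, the `background_id` value, and the largest-first painting order are hypothetical choices, not the behaviour of `sam_pipeline.py`.

```python
import numpy as np

def label_masks(masks, mask_embeds, text_embeds, background_id=-1):
    """Assign each binary mask the best-matching vocabulary entry.

    masks:       (N, H, W) boolean masks from a class-agnostic generator.
    mask_embeds: (N, D) one image embedding per mask (stand-in for Alpha-CLIP).
    text_embeds: (V, D) one text embedding per vocabulary word.
    Returns an (H, W) label map; uncovered pixels keep background_id.
    """
    # Cosine similarity between every mask and every vocabulary word.
    m = mask_embeds / np.linalg.norm(mask_embeds, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    best_word = (m @ t.T).argmax(axis=1)

    label_map = np.full(masks.shape[1:], background_id, dtype=np.int64)
    # Paint large masks first so smaller masks on top are not erased.
    order = np.argsort(-masks.reshape(len(masks), -1).sum(axis=1))
    for i in order:
        label_map[masks[i]] = best_word[i]
    return label_map

# Toy example: one large mask and one small mask nested inside it.
masks = np.zeros((2, 2, 4), dtype=bool)
masks[0, :, :3] = True
masks[1, 0, 0] = True
mask_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
text_embeds = np.array([[0.0, 2.0], [3.0, 0.0]])  # word 0, word 1
result = label_masks(masks, mask_embeds, text_embeds)
print(result)
```

Post-processing steps such as BBox filtering would sit between mask generation and this labelling step; they are left out here to keep the sketch short.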

Experiments

We conducted the following experiments to evaluate the performance of our pipeline:

  1. Pipeline Post-Processing Analysis:
    We tested the pipeline using various types of post-processing techniques to determine their impact on the overall performance.

  2. Model Evaluation with Different Datasets and Vocabularies:

    • We explored the effectiveness of SAN (Side Adapter Network) and SAM (Segment Anything Model) across multiple datasets.
    • For each dataset, we used two different vocabulary sources:
      • Caption-generated Vocabulary: Derived from captions generated by the BLIP-2 model.
      • Label-based Vocabulary: Created from predefined dataset labels.
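To make the two vocabulary sources concrete, the sketch below builds both from toy data using only the standard library. It assumes the nouns have already been extracted from the BLIP-2 captions. The function names, the `min_count` frequency filter, and the use of a pickled list of strings as storage format are illustrative assumptions, not a description of the vocabulary files shipped with this repository.

```python
import pickle
from collections import Counter

def caption_vocabulary(nouns_per_image, min_count=2):
    """Keep nouns that occur in at least min_count images, sorted by name."""
    counts = Counter()
    for nouns in nouns_per_image:
        # Count each noun once per image, ignoring case and whitespace.
        counts.update({n.strip().lower() for n in nouns})
    return sorted(n for n, c in counts.items() if c >= min_count)

def label_vocabulary(dataset_labels):
    """Normalise and de-duplicate predefined dataset labels."""
    return sorted({label.strip().lower() for label in dataset_labels})

# Toy data: nouns per caption and a label list with a duplicate.
nouns_per_image = [["Dog", "grass"], ["dog", "frisbee"], ["grass", "sky", "dog"]]
caption_vocab = caption_vocabulary(nouns_per_image)
label_vocab = label_vocabulary(["sky", "Grass", "person", "sky"])

# Round-trip through pickle, one possible on-disk format for a vocabulary.
restored = pickle.loads(pickle.dumps(caption_vocab))
print(caption_vocab, label_vocab, restored == caption_vocab)
```

A caption-generated vocabulary built this way can contain words that are absent from the dataset labels and miss labels that never appear in a caption, which is the gap the second experiment is designed to measure.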

Installation

Warning

To run the experiments you need Python 3.10.
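A quick way to confirm the interpreter version before installing anything is sketched below. Treating the requirement as an exact major.minor match is an assumption, and `is_supported` is a hypothetical helper, not part of this repository.

```python
import sys

REQUIRED = (3, 10)  # version named in the warning above

def is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED

if not is_supported():
    print(f"Python {REQUIRED[0]}.{REQUIRED[1]} expected, "
          f"found {sys.version_info.major}.{sys.version_info.minor}")
```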

Install dependencies

To install all the dependencies, run:

sh setup.sh

Download dataset

First, configure the 'datasets.yaml' file appropriately. Then run the following commands, manually downloading the missing values before the preprocessing step:

python download_dataset.py
# After manually getting the missing values
sh preprocess_dataset.sh

Download models

Note

Only the Google Drive link for Alpha-CLIP currently works, so you need to download the model manually and place it in the 'models' folder.

Running the project

You can find examples of how to use SAN (Side Adapter Network) and SAM (Segment Anything Model) in the notebook directory. These examples demonstrate practical implementations and workflows for applying these models effectively.

Evaluating SAM with Our Pipeline

To evaluate SAM using our pipeline, follow these steps:

  1. Browse the configs directory and select the preferred configuration file that suits your dataset and vocabulary requirements.
  2. Launch the pipeline using the following command:
 python sam_pipeline.py

Evaluating SAN

To evaluate SAN on the ADE20K dataset using a custom vocabulary, follow these steps:

  1. First, you need to slightly change the inference_on_dataset function in the evaluator.py file inside detectron2 in order to perform predictions with a custom vocabulary using the SAN model.

The file is located at your_venv/lib/python3.10/site-packages/detectron2/evaluation/evaluator.py. Change the inference_on_dataset function as shown in the following images:

Code changes

  2. Browse the datasets/captions_val directory and select the vocabulary you want to test (e.g., nouns_ade_filtered.pkl or nouns_coco_filtered.pkl, which contain nouns extracted and filtered from the ADE20K and COCO datasets, respectively).

  3. Launch the SAN evaluation from the SAN directory using the following command:

cd SAN
python eval_net.py --eval-only --config-file configs/san_clip_vit_res4_coco.yaml --vocabulary ../datasets/captions_val/nouns_ade_filtered.pkl OUTPUT_DIR ../output/[Name of the output folder] MODEL.WEIGHTS ../checkpoints/san_vit_b_16.pth DATASETS.TEST "('ade20k_full_sem_seg_val',)"

Set the --vocabulary parameter to the desired vocabulary file and the OUTPUT_DIR parameter to the desired output folder name. See the official SAN repository for more information.
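When several vocabularies need to be evaluated, the command above can be assembled programmatically. The sketch below only builds and prints the commands; it reuses the flags and paths from the command above, while the helper name `san_eval_command` and the per-vocabulary output folder names are hypothetical.

```python
import shlex

def san_eval_command(vocabulary, output_dir,
                     config="configs/san_clip_vit_res4_coco.yaml",
                     weights="../checkpoints/san_vit_b_16.pth",
                     dataset="ade20k_full_sem_seg_val"):
    """Build the argv list for eval_net.py, mirroring the command above."""
    return [
        "python", "eval_net.py", "--eval-only",
        "--config-file", config,
        "--vocabulary", vocabulary,
        "OUTPUT_DIR", output_dir,
        "MODEL.WEIGHTS", weights,
        "DATASETS.TEST", f"('{dataset}',)",
    ]

vocab_dir = "../datasets/captions_val"
for name in ["nouns_ade_filtered", "nouns_coco_filtered"]:
    # Output folder named after the vocabulary (an illustrative choice).
    cmd = san_eval_command(f"{vocab_dir}/{name}.pkl", f"../output/{name}")
    # Printed only; to execute, pass cmd to subprocess.run(cmd, cwd="SAN").
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the DATASETS.TEST tuple, which otherwise needs the double quotes shown in the command above.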

Contacts

For any inquiries, feel free to contact:


About

Vocabulary-Free Semantic Segmentation
