Skip to content

lab-sun/LearningSA

Repository files navigation

LearningSA

| Paper | Data|


LearningSA: Learning Semantic Alignment using Global Features and Multi-scale Confidence
Huaiyuan Xu, Jing Liao, Huaping Liu, and Yuxiang Sun*

Introduction

Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization, etc. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, such as appearance, size, etc. In other words, contents with the same semantics tend to be significantly different in vision. To address this issue, we propose a novel deep neural network of which the core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer for enhancing semantic features with global awareness; a probabilistic correlation module for adaptively fusing multi-scale information based on the learned confidence scores. We use the unified network architecture to achieve two types of semantic alignment, namely, cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks, and outperforms the state of the arts in cross-domain semantic alignment.

Method

Method Pipeline:

Samples of the CroDom Dataset

Category: person

Get Started

Installation and Data Preparation

step 1. Please prepare environment as that in Docker.

step 2. Prepare LearningSA repo by.

git clone https://github.com/lab-sun/LearningSA.git
cd LearningSA

step 3. Download data and arrange the folder as:

LearningSA/
└── data
    ├── code 
    ├── crodom
    ├── PF_Pascal
    └── PF-dataset
    bbox_test_pairs_pf_pascal.csv
    bbox_test_pairs_pf.csv
    bbox_val_pairs_pf_pascal.csv
    train_pairs_pf_pascal.csv

Train model

python train.py
# if specify gpu:xx for training
python train.py --gpu xx

Test model

# PF-PASCAL
python image.py
# PF-WILLOW
python image_willow.py

Visualize the predicted result

python image.py --vis

Checkpoints

Download checkpoints and save them to the folder weights:

Config PCK(0.05) PCK(0.10) Model
best_checkpoint_pascal 81.4 93.4 gdrive
best_checkpoint_willow 55.6 80.4 gdrive

Applications

Style transfer, which transfers local styles from an exemplar image to regions of the input image with the same semantics:

Semantic alignment-based image morphing:

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entries.

@ARTICLE{xu2024learning,
  author={Huaiyuan Xu and Jing Liao and Huaping Liu and Yuxiang Sun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Semantic Alignment Using Global Features and Multi-Scale Confidence}, 
  year={2024},
  volume={34},
  number={2},
  pages={897-910},
  doi={10.1109/TCSVT.2023.3288370}}

About

[TSCVT 2024] Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors