LearningSA: Learning Semantic Alignment using Global Features and Multi-scale Confidence
Huaiyuan Xu, Jing Liao, Huaping Liu, and Yuxiang Sun*
Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, for example in appearance and size. In other words, contents with the same semantics can look very different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer that enhances semantic features with global awareness, and a probabilistic correlation module that adaptively fuses multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
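To make the baseline mentioned above concrete, here is a minimal sketch of matching by cosine similarity between local features. It is not the LearningSA network: the function names (`cosine_correlation`, `best_matches`) and the toy feature vectors are hypothetical, and real methods operate on dense CNN feature maps rather than short Python lists.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_correlation(src_feats, tgt_feats):
    """Return corr[i][j], the cosine similarity of source feature i and target feature j."""
    src = [l2_normalize(f) for f in src_feats]
    tgt = [l2_normalize(f) for f in tgt_feats]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]


def best_matches(corr):
    """For each source location, pick the target location with the highest similarity."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]


# Hypothetical 2-D local features at three locations per image.
src_feats = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
tgt_feats = [[0.1, 3.0], [2.0, 2.1], [5.0, 0.2]]

corr = cosine_correlation(src_feats, tgt_feats)
matches = best_matches(corr)
print(matches)  # [2, 0, 1]
```

Because each match depends only on a pair of local descriptors, two regions with the same semantics but different appearance can receive a low similarity, which is the limitation the abstract describes.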
[Figure: Method pipeline]
[Figure: Category: person]
step 1. Prepare the environment as described in Docker.

step 2. Clone the LearningSA repo:

    git clone https://github.com/lab-sun/LearningSA.git
    cd LearningSA

step 3. Download data and arrange the folder as:

    LearningSA/
    └── data
        ├── code
        ├── crodom
        ├── PF_Pascal
        └── PF-dataset
    bbox_test_pairs_pf_pascal.csv
    bbox_test_pairs_pf.csv
    bbox_val_pairs_pf_pascal.csv
    train_pairs_pf_pascal.csv

    python train.py
    # to specify the GPU for training, replace xx
    python train.py --gpu xx

    # PF-PASCAL
    python image.py
    # PF-WILLOW
    python image_willow.py

    python image.py --vis

Download the checkpoints and save them to the folder weights:
| Config | PCK(0.05) | PCK(0.10) | Model |
|---|---|---|---|
| best_checkpoint_pascal | 81.4 | 93.4 | gdrive |
| best_checkpoint_willow | 55.6 | 80.4 | gdrive |
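
The PCK columns above report the percentage of correct keypoints: a transferred keypoint counts as correct when it lies within a threshold alpha (0.05 or 0.10) times a reference size of the ground-truth location. The sketch below illustrates the metric under assumptions: the function name `pck`, the keypoints, and the choice of reference size (for example the longer side of the image or of the object bounding box) are hypothetical and may differ from what `image.py` does.

```python
import math


def pck(pred_pts, gt_pts, ref_size, alpha):
    """Fraction of predicted keypoints within alpha * ref_size of the ground truth."""
    if len(pred_pts) != len(gt_pts):
        raise ValueError("pred_pts and gt_pts must have the same length")
    if not gt_pts:
        return 0.0
    threshold = alpha * ref_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred_pts, gt_pts)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt_pts)


# Hypothetical ground-truth and predicted keypoints (x, y) in pixels.
gt = [(10.0, 10.0), (50.0, 40.0), (80.0, 90.0), (30.0, 70.0)]
pred = [(12.0, 11.0), (58.0, 40.0), (80.0, 97.0), (60.0, 70.0)]
ref_size = 100.0  # assumed reference size, e.g. max(height, width)

pck_005 = pck(pred, gt, ref_size, 0.05)
pck_010 = pck(pred, gt, ref_size, 0.10)
print(pck_005, pck_010)  # 0.25 0.75
```

A larger alpha tolerates larger localization errors, which is why the PCK(0.10) values in the table are higher than the PCK(0.05) values.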
[Figure: Style transfer, which transfers local styles from an exemplar image to regions of the input image with the same semantics]
[Figure: Semantic alignment-based image morphing]
If this work is helpful for your research, please consider citing the following BibTeX entry.
@ARTICLE{xu2024learning,
author={Huaiyuan Xu and Jing Liao and Huaping Liu and Yuxiang Sun},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Learning Semantic Alignment Using Global Features and Multi-Scale Confidence},
year={2024},
volume={34},
number={2},
pages={897-910},
doi={10.1109/TCSVT.2023.3288370}}









