Training-free open-vocabulary semantic segmentation (OVSS) promises rapid adaptation to new label sets without retraining. Yet many methods rely on heavy post-processing or handle text and vision in isolation, leaving cross-modal geometry underutilized. Others introduce auxiliary vision backbones or multi-model pipelines, which increase complexity and latency while compromising design simplicity. We present PEARL, Procrustes alignment with text-aware Laplacian propagation, a compact two-step inference procedure that follows an align-then-propagate principle. The Procrustes alignment step performs an orthogonal projection inside the last self-attention block, rotating keys toward the query subspace via a stable polar iteration. The text-aware Laplacian propagation step then refines per-pixel logits on a small grid through a confidence-weighted, text-guided graph solve: text provides both a data-trust signal and neighbor gating, while image gradients preserve boundaries. Our method is fully training-free, plug-and-play, and uses only fixed constants, adding minimal latency with a small per-head projection and a few conjugate-gradient steps. PEARL sets a new state of the art in training-free OVSS without extra data or auxiliary backbones across standard benchmarks, achieving superior performance under both with-background and without-background protocols.
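The two steps above can be sketched numerically. The snippet below is a minimal illustration of the underlying ideas, not PEARL's actual implementation: it orthogonalizes a key-to-query map with a Newton-Schulz polar iteration, then smooths a toy 1-D logit signal with a confidence-weighted graph-Laplacian solve via plain conjugate gradient. All variable names, sizes, weights, and constants here are illustrative assumptions, not values from the paper or the code.

```python
import numpy as np

# --- Step 1 (sketch): Procrustes alignment via polar iteration ---
def polar_orthogonal(M, iters=50):
    """Newton-Schulz iteration converging to the orthogonal polar factor of M."""
    X = M / np.linalg.norm(M, 2)      # scale so all singular values are <= 1
    for _ in range(iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
d = 8                                  # toy feature dimension
Q = rng.standard_normal((32, d))       # stand-in "query" features
K = rng.standard_normal((32, d))       # stand-in "key" features
R = polar_orthogonal(K.T @ Q)          # orthogonal Procrustes solution for min ||KR - Q||
K_aligned = K @ R                      # keys rotated toward the query subspace

# --- Step 2 (sketch): confidence-weighted Laplacian propagation via CG ---
def cg_solve(A, b, iters=200, tol=1e-10):
    """Plain conjugate gradient for a symmetric positive-definite system A x = b."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 16                                          # a 1-D "grid" of pixels for brevity
z0 = np.where(np.arange(n) < n // 2, 1.0, 0.0)  # initial per-pixel logits
conf = np.full(n, 0.5)                          # data-trust weights (e.g., text confidence)
w = np.ones(n - 1)                              # edge weights (e.g., gradient/text gated)
L = np.diag(np.r_[w, 0] + np.r_[0, w])          # graph Laplacian of the chain: degrees...
L -= np.diag(w, 1) + np.diag(w, -1)             # ...minus adjacency
lam = 1.0                                       # smoothing strength (illustrative)
z = cg_solve(np.diag(conf) + lam * L, conf * z0)  # smoothed, confidence-anchored logits
```

The closed form solved here, (diag(conf) + lam * L) z = diag(conf) z0, balances trusting the initial logits where confidence is high against smoothness along strongly connected graph edges; the actual method additionally gates the edge weights with text and image gradients.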
This project is built upon PyTorch 2.4.1 and MMSegmentation. Please follow the steps below to set up the environment.
We recommend using Anaconda to create a new virtual environment:
conda create -n pearl python=3.10 -y
conda activate pearl

Please install the dependencies with the specified versions to ensure proper functionality:
Install PyTorch (CUDA 11.8):
pip install torch==2.4.1+cu118 torchvision==0.19.1+cu118 --index-url https://download.pytorch.org/whl/cu118

Install OpenMMLab Libraries:
# Install MMEngine
pip install mmengine==0.10.7
# Install MMCV (Pre-built for PyTorch 2.4.0 and CUDA 11.8)
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.4.0/index.html
# Install MMSegmentation
pip install mmsegmentation==1.2.2

Please download and extract the corresponding datasets (following the MMSegmentation data preparation document). This project supports the following benchmarks:
- PASCAL VOC (voc20, voc21)
- PASCAL Context (context59, context60)
- COCO (coco_object, coco_stuff164k)
- Cityscapes (city_scapes)
- ADE20K (ade20k)
After downloading and extracting the data, you need to modify the data_root path in the corresponding configuration files within the configs folder.
Open the configuration file (e.g., configs/cfg_voc21.py), find the data_root variable, and change it to your local dataset directory:
# Example
data_root = '/path/to/your/dataset/directory'

We provide a one-click script run.sh that automatically evaluates on all supported datasets.
Run the script directly to use the default configuration (PEARL, TLP enabled, using 1 GPU):
bash run.sh

The script accepts arguments to modify the runtime settings:

bash run.sh [ATTN_TYPE] [TLP_SWITCH] [NGPUS] [LOGFILE]

- ATTN_TYPE: Attention/Method type (Default: "pearl", options: ['pearl', 'vanilla'])
- TLP_SWITCH: Whether to enable TLP (Default: "on", options: ['on', 'off'])
- NGPUS: Number of GPUs to use (Default: 1)
- LOGFILE: Log output file (Default: "results.txt")
Running Examples:

- Run PEARL with accelerated inference using 4 GPUs:
  bash run.sh pearl on 4 results_pearl.txt

- Disable TLP and enable PA only:
  bash run.sh pearl off 1 results_pa.txt

- Disable PA (use vanilla attention) and enable TLP:
  bash run.sh vanilla on 1 results_tlp.txt
After the script finishes running, summarize_seg_metrics.py will be automatically called to summarize the segmentation metrics for all test datasets.
We thank the authors of the following excellent open-source projects, which were instrumental in this work:
- SCLIP: https://github.com/wangf3014/SCLIP
- NACLIP: https://github.com/sinahmr/NACLIP
- MMSegmentation: https://github.com/open-mmlab/mmsegmentation
If you find this repository useful, please consider citing:
@article{pei2026pearl,
title={PEARL: Geometry Aligns Semantics for Training-Free Open-Vocabulary Semantic Segmentation},
author={Pei, Gensheng and Jiang, Xiruo and Cai, Xinhao and Chen, Tao and Yao, Yazhou and Jeon, Byeungwoo},
year={2026},
journal={arXiv preprint arXiv:2603.21528},
}



