
dsb-ifi/dHT


$\text{Differentiable Hierarchical Visual Tokenization}$

Marius Aasan$^1$, Martine Hjelkrem-Tan$^1$, Nico Catalano$^2$, Changkyu Choi$^3$, Adín Ramírez Rivera$^1$

${}^1\underset{\text{Department of Informatics}}{\text{University of Oslo}}$ $\hspace{1em}$ ${}^2\underset{\text{Artificial Intelligence and Robotics Lab}}{\text{Polytechnic University of Milan}}$ $\hspace{1em}$ ${}^3\underset{\text{Department of Physics and Technology}}{\text{UiT The Arctic University of Norway}}$

Website | arXiv Paper | NeurIPS Paper | NeurIPS Spotlight | R2V Notebook

dHT Figure 1

$\text{Abstract}$

Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remaining backward-compatible with existing architectures for retrofitting pretrained models. Our method uses hierarchical model selection with information criteria to provide competitive performance in both image-level classification and dense-prediction tasks, and even supports out-of-the-box raster-to-vector conversion.
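To make "hierarchical model selection with information criteria" concrete, here is a generic sketch, not the dHT implementation. It assumes a hierarchy is given as a list of partitions from fine to coarse (each a label array over pixels), models every region as a constant color, and scores each level with a Bayesian Information Criterion (BIC) under an assumed known noise variance. The criterion, the piecewise-constant model, the global choice of a single level, and the helper names (`region_means`, `bic_score`, `select_level`) are all illustrative assumptions; the paper's actual criterion and selection procedure may differ.

```python
import numpy as np


def region_means(x, labels):
    """Mean feature per region. Assumes labels use every id in 0..K-1."""
    k = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=k).astype(float)  # pixels per region
    sums = np.zeros((k, x.shape[1]))
    np.add.at(sums, labels, x)  # scatter-add each pixel's features into its region
    return sums / counts[:, None]


def bic_score(x, labels, noise_var):
    """BIC (up to a constant) of a piecewise-constant Gaussian fit; lower is better."""
    means = region_means(x, labels)
    rss = float(((x - means[labels]) ** 2).sum())  # residual sum of squares
    n_params = means.size  # one mean per region per channel
    n_obs = x.size  # number of scalar observations
    return rss / noise_var + n_params * np.log(n_obs)


def select_level(x, hierarchy, noise_var):
    """Index of the partition in `hierarchy` with the lowest BIC, plus all scores."""
    scores = [bic_score(x, labels, noise_var) for labels in hierarchy]
    return int(np.argmin(scores)), scores


# Toy "image": 8 single-channel pixels forming two flat halves with small noise.
x = np.array([[0.0], [0.1], [0.0], [0.1], [1.0], [1.1], [1.0], [1.1]])
hierarchy = [
    np.arange(8),                        # every pixel is its own region
    np.array([0, 0, 1, 1, 2, 2, 3, 3]),  # pairs of pixels
    np.array([0, 0, 0, 0, 1, 1, 1, 1]),  # the two halves
    np.zeros(8, dtype=int),              # the whole image as one region
]
best, scores = select_level(x, hierarchy, noise_var=0.01)
print(best)  # 2
```

The penalty term `n_params * log(n_obs)` is what stops the finest partition from winning by fitting the noise exactly: here the two-region level balances fit and complexity best.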

$\partial\text{HT}$: Differentiable Hierarchical Visual Tokenization

This repo contains code for Differentiable Hierarchical Visual Tokenization, accepted as a spotlight paper at NeurIPS 2025.

For an introduction to our work, visit the project webpage.

Installation

The repo can currently be installed as a package via:

# HTTPS
pip install git+https://github.com/dsb-ifi/dHT.git

# SSH
pip install git+ssh://git@github.com/dsb-ifi/dHT.git

Loading models

You can load the Superpixel Transformer models easily via torch.hub:

import torch

# Example with raster-to-vector model
model = torch.hub.load(
    'dsb-ifi/dht',
    'dht_ras2vec',
    pretrained=True,
    source='github',
)

This loads the model and downloads the pretrained weights, which are cached in your local torch.hub directory.
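A small sketch for locating those cached files. `torch.hub.get_dir()`, `torch.hub.set_dir()` and `torch.hub.list()` are standard PyTorch functions (the `trust_repo` argument needs PyTorch 1.12 or newer). The helper names are hypothetical, and the `checkpoints` subfolder is only where dHT's weights end up if its `hubconf.py` downloads them via `torch.hub.load_state_dict_from_url`, which is an assumption.

```python
import os

import torch


def hub_dir() -> str:
    # Root of the torch.hub cache: $TORCH_HOME/hub (by default ~/.cache/torch/hub),
    # unless it was changed with torch.hub.set_dir().
    return torch.hub.get_dir()


def checkpoint_dir() -> str:
    # Default location used by torch.hub.load_state_dict_from_url() for weight files.
    # Assumption: dHT's hubconf.py downloads its pretrained weights through that helper.
    return os.path.join(hub_dir(), "checkpoints")


def list_entrypoints(repo: str = "dsb-ifi/dHT") -> list:
    # Fetches hubconf.py from GitHub (requires network access) and lists the
    # model entrypoints it exposes; trust_repo=True skips the interactive prompt.
    return torch.hub.list(repo, trust_repo=True)


if __name__ == "__main__":
    print("hub cache:", hub_dir())
    print("weights:  ", checkpoint_dir())
```

To keep downloads somewhere else (for example on a shared drive), call `torch.hub.set_dir(path)` before `torch.hub.load`, or set the `TORCH_HOME` environment variable.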

Citation

If you find our work useful, please consider citing our paper.

@inproceedings{aasan2025dht,
  title={Differentiable Hierarchical Visual Tokenization},
  author={Aasan, Marius and Hjelkrem-Tan, Martine and Catalano, Nico and Choi, Changkyu and Ram\'irez Rivera, Ad\'in},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=y8VWYf5cVI}
}

🚧 NOTE: hubconf.py is still under construction and will be updated with classification models soon.
