Tuan Anh Tran¹,
Duy Minh Ho Nguyen²,³,
Hoai-Chau Tran⁴,
Michael Barz¹,
Khoa D. Doan⁴,
Roger Wattenhofer⁵,
Vien Anh Ngo⁶,
Mathias Niepert²,³,
Daniel Sonntag¹,⁷,
Paul Swoboda⁸
¹ German Research Centre for Artificial Intelligence (DFKI),
² Max Planck Research School for Intelligent Systems (IMPRS-IS),
³ University of Stuttgart,
⁴ College of Engineering and Computer Science, VinUniversity,
⁵ ETH Zurich,
⁶ VinRobotics, Hanoi, Vietnam,
⁷ University of Oldenburg,
⁸ Heinrich Heine University Düsseldorf
GitMerge3D enables merging 80-95% of tokens, substantially reducing computational and memory costs while preserving model performance.
Feature PCA visualizations across merging rates from 0.15 to 0.95, showing that learned representations remain distinctive or unchanged despite aggressive token reduction.
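To make the notion of a merging rate concrete, here is a minimal, dependency-free sketch of similarity-based token merging. It is not the algorithm implemented in `token_merging_algos.py`. The function name `merge_tokens`, the evenly spaced choice of destination tokens, and the use of cosine similarity are all illustrative assumptions.

```python
import math


def merge_tokens(tokens, merge_rate):
    """Illustrative only: merge a fraction `merge_rate` of tokens by averaging.

    `tokens` is a list of equal-length feature vectors (lists of floats).
    Evenly spaced tokens are kept as destinations (an assumption of this
    sketch); every other token is averaged into its most similar destination.
    """
    n = len(tokens)
    n_keep = max(1, round(n * (1.0 - merge_rate)))

    # Evenly spaced destination indices; distinct because n >= n_keep.
    dst_idx = [i * n // n_keep for i in range(n_keep)]
    slot_of = {idx: slot for slot, idx in enumerate(dst_idx)}

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b + 1e-8)

    # Group every token under one destination slot.
    groups = [[] for _ in range(n_keep)]
    for i, tok in enumerate(tokens):
        if i in slot_of:
            slot = slot_of[i]  # a destination token always stays in its own slot
        else:
            slot = max(range(n_keep), key=lambda s: cosine(tok, tokens[dst_idx[s]]))
        groups[slot].append(tok)

    # Each output token is the mean of the tokens assigned to its slot.
    dim = len(tokens[0])
    return [[sum(t[d] for t in g) / len(g) for d in range(dim)] for g in groups]
```

With a merge rate of 0.9, a sequence of 100 tokens is reduced to 10 output tokens, and a rate of 0.0 returns the input unchanged. Because attention cost grows with the number of tokens, this is the reduction that the savings come from.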
- Token Merging Evaluation Guide: How to evaluate models and measure GFLOPs
- Token Merging Training Guide: Fine-tuning with token merging
- Training Guide: Baseline training without token merging
- Evaluation Scripts: Ready-to-use evaluation tools
GitMerge3D/
├── Sonata/                                  # Main codebase
│   ├── configs/sonata/                      # Token merging configurations
│   ├── pointcept/
│   │   └── models/point_transformer_v3/
│   │       └── token_merging_algos.py       # Core merging algorithms
│   ├── tools/                               # Training and testing scripts
│   ├── token_merging_evaluation/            # Evaluation scripts
│   ├── TOKEN_MERGING_EVALUATION_GUIDE.md
│   ├── FINETUNING_TOKEN_MERGING_GUIDE.md
│   └── README.md                            # Base installation guide
└── README.md                                # This file
- Code for Sonata (Pointcept v1.6.0) - Token merging implementation with Sonata backbone
- Training scripts and configurations
- Evaluation tools and documentation
- Model Checkpoints - Pre-trained weights for all retention ratios (r=0.7, 0.8, 0.9, 0.95)
- SpatialLM Integration - Code and checkpoints for SpatialLM with token merging
- PTv3 (Pointcept v1.5.1) - Code and checkpoints for Point Transformer V3 baseline
Stay tuned for updates!
- Ubuntu: 18.04 and above
- CUDA: 11.3 and above
- PyTorch: 1.10.0 and above
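The PyTorch and CUDA minimums listed above can be checked from Python before installing the rest of the stack. The sketch below is an assumption-laden convenience, not part of the repository: `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helper names. It reads `torch.__version__` and `torch.version.cuda`, where the latter reports the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import importlib.util
import re

# Minimums taken from the requirements list above.
MIN_TORCH = "1.10.0"
MIN_CUDA = "11.3"


def version_tuple(text):
    """Turn a version string such as '2.5.0+cu124' into (2, 5, 0)."""
    public = text.split("+")[0]  # drop local build suffixes like '+cu124'
    return tuple(int(part) for part in re.findall(r"\d+", public))


def meets_minimum(found, minimum):
    """Compare versions numerically, so that '1.9.1' < '1.10.0'."""
    return version_tuple(found) >= version_tuple(minimum)


def check_environment():
    """Hypothetical helper: report whether the installed PyTorch is new enough."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda": None}  # PyTorch is not installed
    import torch

    cuda = torch.version.cuda  # None for CPU-only builds
    return {
        "torch": meets_minimum(torch.__version__, MIN_TORCH),
        "cuda": cuda is not None and meets_minimum(cuda, MIN_CUDA),
    }


if __name__ == "__main__":
    print(check_environment())
```

Comparing parsed integer tuples rather than raw strings matters here, since a plain string comparison would rank `1.9.1` above `1.10.0`.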
- Clone the repository
      git clone https://github.com/anhtuanhsgs/GitMerge3D.git
      cd GitMerge3D/Sonata
- Create conda environment
  Follow the installation instructions in Sonata/README.md:
      # Option 1: Using environment.yml
      conda env create -f environment.yml --verbose
      conda activate pointcept-torch2.5.0-cu12.4
      # Option 2: Manual installation
      # See Sonata/README.md for detailed manual setup
- Install additional dependencies
      # PTv3 dependencies
      cd libs/pointops
      python setup.py install
      cd ../..
- Prepare datasets
  Follow the data preparation instructions in Sonata/README.md for:
  - ScanNet v2
  - S3DIS
  - ScanNet200
Fine-tune a pre-trained model with token merging enabled:
cd Sonata
# ScanNet with r=0.9 (90% token retention)
sh scripts/train.sh -g 4 -d scannet \
-c gitmerge3d-patch-0c-scannet-ft-finetune-100epochs-r0.9-s10 \
-n gitmerge3d-patch-r0.9-scannet-ft \
-w exp/sonata/your-pretrained-model/model/model_best.pth

Evaluate a trained model:
# Using evaluation script
cd Sonata
bash token_merging_evaluation/eval_scannet.sh

Or use the Python script directly:
python tools/test.py \
--config-file configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
--options weight=exp/sonata/your-checkpoint/model/model_best.pth

Measure computational efficiency across multiple retention ratios:
cd Sonata/token_merging_evaluation
python run_cal_flops_sweep.py \
--config ../configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
--merge-rates 0.7 0.8 0.9 0.95 \
--max-scene-count 10 \
--gpu 0

Results on ScanNet semantic segmentation:
| Retention Ratio | Tokens Retained | mIoU (%) | GFLOPs | Checkpoint |
|---|---|---|---|---|
| Baseline (r=1.0) | 100% | 78.9 | 206 | To be uploaded |
| r=0.95 | 95% | 78.5 | 50 | To be uploaded |
| r=0.90 | 90% | 78.8 | 53 | To be uploaded |
| r=0.80 | 80% | 79.2 | 59 | To be uploaded |
| r=0.70 | 70% | 79.5 | 67 | To be uploaded |
Up to 21% reduction in computational cost with minimal accuracy loss
If you find this work useful in your research, please cite:
@inproceedings{gitmerge3d2025,
title={How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?},
author={Tuan Anh Tran and Duy Minh Ho Nguyen and Hoai-Chau Tran and Michael Barz and Khoa D. Doan and Roger Wattenhofer and Vien Anh Ngo and Mathias Niepert and Daniel Sonntag and Paul Swoboda},
booktitle={Advances in Neural Information Processing Systems},
year={2025}
}

Also cite the base Pointcept and Sonata work:
@misc{pointcept2023,
title={Pointcept: A Codebase for Point Cloud Perception Research},
author={Pointcept Contributors},
howpublished = {\url{https://github.com/Pointcept/Pointcept}},
year={2023}
}

This project is built upon:
- Pointcept - Point cloud perception codebase
- Point Transformer V3 - Efficient point cloud backbone
- Sonata - Self-supervised learning for point clouds
This project is released under the MIT License. See LICENSE for details.
