anhtuanhsgs/GitMerge3D

How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?

Tuan Anh Tran¹, Duy Minh Ho Nguyen²,³, Hoai-Chau Tran⁴, Michael Barz¹, Khoa D. Doan⁴, Roger Wattenhofer⁵, Vien Anh Ngo⁶, Mathias Niepert²,³, Daniel Sonntag¹,⁷, Paul Swoboda⁸
¹German Research Centre for Artificial Intelligence (DFKI), ²Max Planck Research School for Intelligent Systems (IMPRS-IS), ³University of Stuttgart
⁴College of Engineering and Computer Science, VinUniversity, ⁵ETH Zurich, ⁶VinRobotics, Hanoi, Vietnam
⁷University of Oldenburg, ⁸Heinrich Heine University Düsseldorf

[Figure: GitMerge3D teaser]

GitMerge3D can merge 80-95% of tokens, substantially reducing computational and memory costs while preserving model performance.
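
To make the idea of merging concrete, here is a minimal sketch of similarity-based token merging in plain Python. It is a generic illustration and not the algorithm implemented in token_merging_algos.py: the function names `cosine` and `merge_tokens`, the evenly strided choice of destination tokens, and the unweighted averaging are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, merge_rate):
    """Reduce `tokens` to about (1 - merge_rate) of their number.

    Hypothetical scheme for illustration: evenly strided tokens are kept as
    destinations, and every other token is averaged into the destination it
    is most similar to.
    """
    n = len(tokens)
    if n == 0:
        return []
    n_keep = min(n, max(1, round(n * (1.0 - merge_rate))))
    dst_idx = [(k * n) // n_keep for k in range(n_keep)]  # distinct because n_keep <= n
    dst_set = set(dst_idx)
    sums = [list(tokens[i]) for i in dst_idx]
    counts = [1] * n_keep
    for i, tok in enumerate(tokens):
        if i in dst_set:
            continue
        # Index of the destination token with the highest cosine similarity.
        j = max(range(n_keep), key=lambda k: cosine(tok, tokens[dst_idx[k]]))
        sums[j] = [a + b for a, b in zip(sums[j], tok)]
        counts[j] += 1
    # Each output token is the mean of the destination and everything merged into it.
    return [[x / c for x in s] for s, c in zip(sums, counts)]


if __name__ == "__main__":
    demo = [[float(i), 1.0] for i in range(20)]
    for rate in (0.7, 0.8, 0.9, 0.95):
        print(f"merge_rate={rate}: {len(demo)} -> {len(merge_tokens(demo, rate))} tokens")
```

The sketch only shows how a merge rate maps to a smaller token set; the repository's own merging logic should be read from the source file listed under Project Structure.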


[Figures: Feature PCA, Layer 5-21; Feature PCA, Layer 5-20]
Feature PCA visualizations across merging rates from 0.15 to 0.95, showing that the learned representations remain distinctive, or unchanged, despite aggressive token reduction.

A. Documentation

See Sonata/TOKEN_MERGING_EVALUATION_GUIDE.md for evaluation and Sonata/FINETUNING_TOKEN_MERGING_GUIDE.md for fine-tuning.

B. Project Structure

GitMerge3D/
├── Sonata/                                    # Main codebase
│   ├── configs/sonata/                       # Token merging configurations
│   ├── pointcept/
│   │   └── models/point_transformer_v3/
│   │       └── token_merging_algos.py        # Core merging algorithms
│   ├── tools/                                # Training and testing scripts
│   ├── token_merging_evaluation/             # Evaluation scripts
│   ├── TOKEN_MERGING_EVALUATION_GUIDE.md
│   ├── FINETUNING_TOKEN_MERGING_GUIDE.md
│   └── README.md                             # Base installation guide
└── README.md                                  # This file

C. Roadmap

Current Release

  • Code for Sonata (Pointcept v1.6.0) - Token merging implementation with Sonata backbone
  • Training scripts and configurations
  • Evaluation tools and documentation

Coming Soon

  • Model Checkpoints - Pre-trained weights for all retention ratios (r=0.7, 0.8, 0.9, 0.95)
  • SpatialLM Integration - Code and checkpoints for SpatialLM with token merging
  • PTv3 (Pointcept v1.5.1) - Code and checkpoints for Point Transformer V3 baseline

Stay tuned for updates!

D. Installation

Requirements

  • Ubuntu: 18.04 and above
  • CUDA: 11.3 and above
  • PyTorch: 1.10.0 and above
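
A quick way to check an existing environment against these minimums is sketched below. The helper names `parse_version` and `meets_minimum` are made up for this example; the only PyTorch calls used are `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`. Versions are compared numerically, since a plain string comparison would rank "1.9.1" above "1.10.0".

```python
import re


def parse_version(text):
    """Turn a version string such as '2.5.0+cu124' into a 3-tuple like (2, 5, 0)."""
    parts = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(parts + [0] * (3 - len(parts)))  # pad '11.3' to (11, 3, 0)


def meets_minimum(found, required):
    """True if `found` is at least `required`, compared component by component."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        cuda = torch.version.cuda  # None on CPU-only builds
        print("PyTorch", torch.__version__, "ok:", meets_minimum(torch.__version__, "1.10.0"))
        print("CUDA", cuda, "ok:", cuda is not None and meets_minimum(cuda, "11.3"))
        print("GPU visible:", torch.cuda.is_available())
```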

Setup

  1. Clone the repository

    git clone https://github.com/anhtuanhsgs/GitMerge3D.git
    cd GitMerge3D/Sonata
  2. Create conda environment

    Follow the installation instructions in Sonata/README.md:

    # Option 1: Using environment.yml
    conda env create -f environment.yml --verbose
    conda activate pointcept-torch2.5.0-cu12.4
    
    # Option 2: Manual installation
    # See Sonata/README.md for detailed manual setup
  3. Install additional dependencies

    # PTv3 dependencies
    cd libs/pointops
    python setup.py install
    cd ../..
  4. Prepare datasets

    Follow the data preparation instructions in Sonata/README.md for:

    • ScanNet v2
    • S3DIS
    • ScanNet200

E. Quick Start

Training with Token Merging

Fine-tune a pre-trained model with token merging enabled:

cd Sonata

# ScanNet with r=0.9 (90% token retention)
sh scripts/train.sh -g 4 -d scannet \
    -c gitmerge3d-patch-0c-scannet-ft-finetune-100epochs-r0.9-s10 \
    -n gitmerge3d-patch-r0.9-scannet-ft \
    -w exp/sonata/your-pretrained-model/model/model_best.pth

Evaluation

Evaluate a trained model:

# Using evaluation script
cd Sonata
bash token_merging_evaluation/eval_scannet.sh

Or use the Python script directly:

python tools/test.py \
    --config-file configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
    --options weight=exp/sonata/your-checkpoint/model/model_best.pth

Measuring GFLOPs

Measure computational efficiency across multiple retention ratios:

cd Sonata/token_merging_evaluation
python run_cal_flops_sweep.py \
    --config ../configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py \
    --merge-rates 0.7 0.8 0.9 0.95 \
    --max-scene-count 10 \
    --gpu 0
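
When the sweep needs to be scripted, for example to try a different set of merge rates, the same command can be assembled from Python. This is a small sketch: `build_sweep_command` is a hypothetical helper, and it passes only the flags shown in the command above.

```python
import shlex


def build_sweep_command(config, merge_rates, max_scene_count=10, gpu=0):
    """Return the argument list for run_cal_flops_sweep.py, using the flags shown above."""
    return [
        "python", "run_cal_flops_sweep.py",
        "--config", config,
        "--merge-rates", *[str(r) for r in merge_rates],
        "--max-scene-count", str(max_scene_count),
        "--gpu", str(gpu),
    ]


cmd = build_sweep_command(
    "../configs/sonata/gitmerge3d-wpatch-0c-scannet-ft-finetune-100epochs-r0.9-s10.py",
    [0.7, 0.8, 0.9, 0.95],
)
print(shlex.join(cmd))
# To execute it, run from Sonata/token_merging_evaluation:
#   import subprocess; subprocess.run(cmd, check=True)
```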

F. Performance

Results on ScanNet semantic segmentation:

| Retention Ratio | Tokens Retained | mIoU (%) | GFLOPs | Checkpoint |
|---|---|---|---|---|
| Baseline (r=1.0) | 100% | 78.9 | 206 | To be uploaded |
| r=0.95 | 95% | 78.5 | 50 | To be uploaded |
| r=0.90 | 90% | 78.8 | 53 | To be uploaded |
| r=0.80 | 80% | 79.2 | 59 | To be uploaded |
| r=0.70 | 70% | 79.5 | 67 | To be uploaded |

Per the table above, merging cuts GFLOPs from 206 at baseline to 50-67, a reduction of roughly 67-76%, while mIoU stays within 0.6 points of the baseline.
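
The relative savings can be recomputed directly from the GFLOPs column. The sketch below uses only the numbers in the table; the names `reduction_percent` and `speedup` are chosen for this example.

```python
# GFLOPs values copied from the results table, keyed by r.
BASELINE_GFLOPS = 206
GFLOPS_BY_R = {0.95: 50, 0.90: 53, 0.80: 59, 0.70: 67}


def reduction_percent(baseline, value):
    """Share of the baseline cost that is saved, in percent."""
    return 100.0 * (baseline - value) / baseline


def speedup(baseline, value):
    """How many times fewer GFLOPs are needed compared with the baseline."""
    return baseline / value


for r, gflops in sorted(GFLOPS_BY_R.items()):
    saved = reduction_percent(BASELINE_GFLOPS, gflops)
    factor = speedup(BASELINE_GFLOPS, gflops)
    print(f"r={r:.2f}: {gflops} GFLOPs, {saved:.1f}% below baseline, {factor:.2f}x fewer")
```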

Citation

If you find this work useful in your research, please cite:

@inproceedings{gitmerge3d2025,
    title={How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?},
    author={Tuan Anh Tran and Duy Minh Ho Nguyen and Hoai-Chau Tran and Michael Barz and Khoa D. Doan and Roger Wattenhofer and Vien Anh Ngo and Mathias Niepert and Daniel Sonntag and Paul Swoboda},
    booktitle={Advances in Neural Information Processing Systems},
    year={2025}
}

Also cite the base Pointcept and Sonata work:

@misc{pointcept2023,
    title={Pointcept: A Codebase for Point Cloud Perception Research},
    author={Pointcept Contributors},
    howpublished = {\url{https://github.com/Pointcept/Pointcept}},
    year={2023}
}

Acknowledgements

This project is built upon Pointcept and Sonata.

License

This project is released under the MIT License. See LICENSE for details.

About

[NeurIPS 2025] How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
