Hi, thanks for the great work!
I found this paper highly inspiring, especially in how it reveals the token redundancy in point cloud architectures and shows the potential of token merging for efficient 3D perception.
I'm trying to reproduce the latency results reported in the paper (table reproduced from the paper below):
| Method | mIoU (%) | Latency (ms) |
| --- | --- | --- |
| PTv3 (base model) | 77.6 | 266 |
| Ours | 77.0 | 203 |
The paper reports PTv3 baseline latency as 266 ms and GitMerge3D as 203 ms, showing that GitMerge3D is faster.
My Reproduction
I benchmarked on the ScanNet val set using the same single-sample forward latency protocol (batch_size=1, warmup=1, excluding data loading and post-processing from the timed region) with an NVIDIA GPU. Here are my results:
| Method | Latency (ms) |
| --- | --- |
| PTv3 (enable_flash=True, original config) | 93.495 |
| PTv3 (enable_flash=False) | 161.131 |
| GitMerge3D (patch, r=0.8, no weighted) | 150.553 |
| GitMerge3D (patch, r=0.8, weighted) | 152.749 |
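A minimal sketch of the timing protocol described above, for anyone who wants to compare numbers. The function name `measure_latency_ms` and the `sync` hook are hypothetical and not taken from either repository. Reusing the first `warmup` samples for warmup is also an assumption. On a GPU, `sync` would typically be `torch.cuda.synchronize`, so that asynchronous kernels have finished before the clock is read.

```python
import time


def measure_latency_ms(forward, samples, warmup=1, sync=None):
    """Return the mean per-sample forward latency in milliseconds.

    forward: callable that runs one forward pass on one sample (batch_size=1).
    samples: already-loaded inputs, so data loading stays outside the timed region.
    warmup:  number of untimed passes run first (assumption: reuse the first samples).
    sync:    optional callable that blocks until pending device work is done.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")

    # Untimed warmup passes (kernel selection, caches, lazy initialisation).
    for sample in samples[:warmup]:
        forward(sample)
    if sync is not None:
        sync()

    timings = []
    for sample in samples:
        if sync is not None:
            sync()  # make sure nothing from the previous pass is still running
        start = time.perf_counter()
        forward(sample)
        if sync is not None:
            sync()  # wait for this pass to finish before stopping the clock
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Dummy forward pass that just sleeps, standing in for a real model.
    mean_ms = measure_latency_ms(lambda s: time.sleep(0.005), range(5))
    print(f"mean latency: {mean_ms:.3f} ms")
```

Post-processing is excluded simply by not calling it inside `forward`.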
The Problem
In the PTv3 original config (semseg-pt-v3m1-0-base.py), Flash Attention is enabled (enable_flash=True). However, in the GitMerge3D config, Flash Attention must be disabled (enable_flash=False) because the code explicitly asserts this:

    # point_transformer_v3m1_gitmerge3d.py
    if tome_mode in VALID_TOME_MODES:
        assert self.enable_flash is False, "Flash attention is not supported with GitMerge3D token merging"

This is because self_attn() needs to add size.log() to the attention scores after token merging, which is not possible with Flash Attention's fused kernel.
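To illustrate why the size.log() term matters, here is a toy, pure-Python sketch of attention with a log-size bias. It is not the repository's self_attn(), and the helper name `attention` and all numbers are made up for illustration. It assumes the usual size-weighted formulation, in which log(size) is added to each key's score before the softmax. When two identical tokens are merged into one token of size 2, the bias makes the merged result match the unmerged one.

```python
import math


def attention(q, keys, values, sizes=None):
    """Single-query softmax attention over a list of keys and values.

    If sizes is given, log(size) is added to each key's score before the
    softmax, so a token standing for n merged tokens gets n times the weight.
    """
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    if sizes is not None:
        scores = [s + math.log(n) for s, n in zip(scores, sizes)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]


q = [1.0, 0.5]

# Unmerged sequence: tokens 0 and 1 are identical.
full = attention(
    q,
    keys=[[0.2, 0.1], [0.2, 0.1], [-0.3, 0.8]],
    values=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
)

# After merging tokens 0 and 1 into one token of size 2.
merged_keys = [[0.2, 0.1], [-0.3, 0.8]]
merged_values = [[1.0, 0.0], [0.0, 1.0]]
with_bias = attention(q, merged_keys, merged_values, sizes=[2, 1])
without_bias = attention(q, merged_keys, merged_values)

print("unmerged:           ", full)
print("merged + log(size): ", with_bias)
print("merged, no bias:    ", without_bias)
```

The bias is a per-key additive term on the score matrix, which is the step the assertion above says cannot be combined with the fused kernel.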
As a result:
- PTv3 with Flash Attention (93.5 ms) is significantly faster than GitMerge3D (150.6 ms)
- Only when I disable Flash Attention for PTv3 (161.1 ms) does GitMerge3D (150.6 ms) become faster
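Put as ratios, using only the numbers quoted above: the paper's table corresponds to roughly a 1.31x speedup over PTv3, my run without Flash Attention gives roughly 1.07x, and against PTv3 with Flash Attention the ratio drops to roughly 0.62x, which is a slowdown. The helper `speedup` below is just illustrative arithmetic.

```python
def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline latency to candidate latency (>1 means the candidate is faster)."""
    return baseline_ms / candidate_ms


# Latencies in milliseconds, copied from the tables in this issue.
paper = speedup(266.0, 203.0)          # paper: PTv3 vs. GitMerge3D
no_flash = speedup(161.131, 150.553)   # my run: PTv3 (enable_flash=False) vs. GitMerge3D
with_flash = speedup(93.495, 150.553)  # my run: PTv3 (enable_flash=True) vs. GitMerge3D

print(f"paper:               {paper:.2f}x")
print(f"PTv3 without flash:  {no_flash:.2f}x")
print(f"PTv3 with flash:     {with_flash:.2f}x")
```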
Questions
- In the paper's latency benchmark, was PTv3 baseline tested with or without Flash Attention?
- If Flash Attention was disabled for PTv3 to get the 266 ms result, could you clarify this in the paper? Since PTv3's official config uses Flash Attention by default, readers would naturally assume the baseline uses it.
Thank you for your time!