Question about PTv3 latency benchmark settings #2

@euitbly-cyber

Hi, and thank you for the excellent work! I found this paper highly inspiring, especially in how it reveals token redundancy in point cloud architectures and shows the potential of token merging for efficient 3D perception.

I'm trying to reproduce the latency results reported in the paper (table reproduced below):

| Method | mIoU (%) | Latency (ms) |
|---|---|---|
| PTv3 (base model) | 77.6 | 266 |
| Ours | 77.0 | 203 |

The paper reports a PTv3 baseline latency of 266 ms and a GitMerge3D latency of 203 ms, i.e. GitMerge3D is reported with about 24% lower latency than the baseline.

My Reproduction

I benchmarked on the ScanNet val set on an NVIDIA GPU, using the same single-sample forward latency protocol for every configuration (batch_size=1, warmup=1, with data loading and post-processing excluded from the timed region). Here are my results:

| Method | Latency (ms) |
|---|---|
| PTv3 (enable_flash=True, original config) | 93.495 |
| PTv3 (enable_flash=False) | 161.131 |
| GitMerge3D (patch, r=0.8, no weighted) | 150.553 |
| GitMerge3D (patch, r=0.8, weighted) | 152.749 |
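
For anyone comparing numbers, here is a minimal sketch of the kind of single-sample timing protocol described above. It is not the script used in the paper or in the repository: `measure_latency_ms`, `forward`, and `sync` are hypothetical names. The `sync` hook is an assumption for GPU runs, where it would typically be `torch.cuda.synchronize`, because CUDA kernels are launched asynchronously and the host clock would otherwise stop before the work has finished.

```python
import time
from statistics import mean
from typing import Callable, Optional, Sequence


def measure_latency_ms(
    forward: Callable[[object], object],
    samples: Sequence[object],
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean per-sample forward latency in milliseconds.

    `samples` must already be loaded (and on the target device), so that
    data loading stays outside the timed region. `sync` is a hypothetical
    hook that blocks until pending device work is done.
    """
    if not samples:
        raise ValueError("need at least one sample to time")
    sync = sync or (lambda: None)

    # Warm-up passes run the model but are not timed.
    for sample in samples[:warmup]:
        forward(sample)
    sync()

    timings_ms = []
    for sample in samples:
        sync()  # make sure earlier work does not leak into this timing
        start = time.perf_counter()
        forward(sample)
        sync()  # wait for this forward pass to actually finish
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)
```

Only the forward call sits between the two clock reads, so post-processing such as metric computation would have to happen after the function returns.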

The Problem

In the PTv3 original config (semseg-pt-v3m1-0-base.py), Flash Attention is enabled (enable_flash=True). However, in the GitMerge3D config, Flash Attention must be disabled (enable_flash=False) because the code explicitly asserts this:

# point_transformer_v3m1_gitmerge3d.py
if tome_mode in VALID_TOME_MODES:
    assert self.enable_flash is False, "Flash attention is not supported with GitMerge3D token merging"

This is because self_attn() needs to add size.log() to the attention scores after token merging, which is not possible with Flash Attention's fused kernel.
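
To illustrate why the log-size term matters, here is a small framework-free sketch (plain Python rather than the repository's PyTorch code, so every name in it is hypothetical). It assumes the usual motivation for this kind of bias in the token-merging literature, where it is often called proportional attention: a merged token that stands for `n` original tokens should receive the attention weight that those `n` tokens would have received together. Adding `log(n)` to the pre-softmax score achieves exactly that when the merged tokens were identical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, sizes=None):
    """Single-query scaled dot-product attention over scalar values.

    If `sizes` is given, log(size) is added to each score before the
    softmax, which is the step a fused attention kernel would need to
    support as an additive bias.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if sizes is not None:
        scores = [s + math.log(n) for s, n in zip(scores, sizes)]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))


query = [0.5, -1.0]

# Three original tokens; the first two are identical.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
values = [3.0, 3.0, -1.0]

# After merging the two identical tokens into one token of size 2.
merged_keys = [[1.0, 0.0], [0.0, 2.0]]
merged_values = [3.0, -1.0]
merged_sizes = [2, 1]

full = attend(query, keys, values)
merged_biased = attend(query, merged_keys, merged_values, merged_sizes)
merged_plain = attend(query, merged_keys, merged_values)

print(full, merged_biased, merged_plain)
```

With the bias, the merged result matches attention over the unmerged tokens; without it, the merged token is underweighted and the output shifts.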

As a result:

- PTv3 with Flash Attention (93.5 ms) is significantly faster than GitMerge3D (150.6 ms).
- Only when I disable Flash Attention for PTv3 (161.1 ms) does GitMerge3D (150.6 ms) become faster.
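
To put the comparison in numbers, the following short snippet computes the speedup ratios from the figures quoted in this issue. The `speedup` helper and the dictionary keys are just illustrative names; the latencies are the ones from the paper's table and from my measurements above.

```python
def speedup(baseline_ms: float, method_ms: float) -> float:
    """Ratio > 1 means the method is faster than the baseline."""
    return baseline_ms / method_ms


paper = {"ptv3": 266.0, "gitmerge3d": 203.0}
repro = {
    "ptv3_flash": 93.495,
    "ptv3_no_flash": 161.131,
    "gitmerge3d": 150.553,  # patch, r=0.8, no weighted
}

paper_speedup = speedup(paper["ptv3"], paper["gitmerge3d"])
vs_no_flash = speedup(repro["ptv3_no_flash"], repro["gitmerge3d"])
vs_flash = speedup(repro["ptv3_flash"], repro["gitmerge3d"])

print(f"paper:                  {paper_speedup:.2f}x")  # 1.31x
print(f"repro vs PTv3 no flash: {vs_no_flash:.2f}x")    # 1.07x
print(f"repro vs PTv3 flash:    {vs_flash:.2f}x")       # 0.62x
```

So in my runs GitMerge3D is only about 7% faster than PTv3 without Flash Attention, and it is slower than PTv3 with Flash Attention, while the paper reports roughly a 1.3x speedup.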

Questions

  1. In the paper's latency benchmark, was the PTv3 baseline measured with or without Flash Attention?
  2. If Flash Attention was disabled for PTv3 to get the 266 ms result, could you clarify this in the paper? Since PTv3's official config uses Flash Attention by default, readers would naturally assume the baseline uses it.

Thank you for your time!
