Hey there,
Some people at NVIDIA recently proposed a "Native Segmentation Vision Transformer" (https://arxiv.org/abs/2505.16993), where they say they use na2d_qk and na2d_av in their content-aware spatial grouping algorithm (see Algorithm 2 in Appendix E.1). These ops appear to be essential to making their model practical, as shown in the runtime and memory analysis in Appendix E.2.
Since I saw in the changelog for release 0.20.0 that "unfused kernels may be revisited depending on demand and use case", I figured I would let you know that some people (like me) would be interested in this.
Appreciate all the great work — thanks!