ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Cited: 0
Authors
He, Chenhang [1]
Li, Ruihuang [1,2]
Zhang, Guowen [1]
Zhang, Lei [1,2]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] OPPO Res, Shenzhen, Peoples R China
Keywords
3D Object Detection; Voxel Transformer
DOI
10.1007/978-3-031-73397-0_5
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Window-based transformers excel at large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a localized manner. However, the sparse nature of point clouds leads to significant variance in the number of voxels per window. Existing methods group the voxels in each window into fixed-length sequences through extensive sorting and padding operations, resulting in non-negligible computational and memory overhead. In this paper, we introduce ScatterFormer, which, to the best of our knowledge, is the first work to apply attention directly to voxels across different windows as a single sequence. The key to ScatterFormer is a Scattered Linear Attention (SLA) module, which leverages the pre-computation of key-value pairs in linear attention to enable parallel computation over the variable-length voxel sequences partitioned by windows. Leveraging the hierarchical structure of GPUs and their shared memory, we propose a chunk-wise algorithm that reduces the SLA module's latency to less than 1 millisecond on moderate GPUs. Furthermore, we develop a cross-window interaction module that improves the locality and connectivity of voxel features across different windows, eliminating the need for extensive window shifting. Our proposed ScatterFormer achieves 73.8 mAP (L2) on the Waymo Open Dataset and 72.4 NDS on the NuScenes dataset, running at an outstanding detection rate of 23 FPS. The code is available at https://github.com/skyhehe123/ScatterFormer.
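
The abstract hinges on one algebraic property of linear attention: within each window, attention factorizes into per-window key-value and key-sum statistics, so voxels from all windows can be processed as one flat sequence with a scatter-add, avoiding the sorting and padding of fixed-length grouping. The PyTorch sketch below illustrates this idea; it is a minimal reference version, not the paper's chunk-wise shared-memory kernel, and the ReLU feature map and the name scattered_linear_attention are illustrative assumptions.

import torch

def scattered_linear_attention(q, k, v, win_ids, num_windows):
    """Minimal sketch of scattered linear attention over variable-length windows.

    q, k, v : (N, H, D) features of all N voxels, flattened across windows.
    win_ids : (N,) long tensor giving the window index of each voxel.
    Per window w, linear attention factorizes as
        out_i = phi(q_i) @ KV_w / (phi(q_i) . z_w),
        KV_w  = sum_{j in w} phi(k_j) v_j^T,   z_w = sum_{j in w} phi(k_j),
    so the window sums can be formed with one scatter-add, with no padding.
    """
    phi_q, phi_k = torch.relu(q), torch.relu(k)  # assumed positive feature map
    _, H, D = q.shape

    # Accumulate per-window key-value outer products: (num_windows, H, D, D).
    kv = q.new_zeros(num_windows, H, D, D)
    kv.index_add_(0, win_ids, torch.einsum('nhd,nhe->nhde', phi_k, v))

    # Accumulate per-window key sums for the normalizer: (num_windows, H, D).
    z = q.new_zeros(num_windows, H, D)
    z.index_add_(0, win_ids, phi_k)

    # Gather each voxel's window statistics and attend, fully in parallel.
    num = torch.einsum('nhd,nhde->nhe', phi_q, kv[win_ids])
    den = torch.einsum('nhd,nhd->nh', phi_q, z[win_ids]).clamp_min(1e-6)
    return num / den.unsqueeze(-1)

For example, with 1000 voxels, 4 heads of dimension 32, and 50 occupied windows, calling scattered_linear_attention(torch.randn(1000, 4, 32), torch.randn(1000, 4, 32), torch.randn(1000, 4, 32), torch.randint(0, 50, (1000,)), 50) returns a (1000, 4, 32) tensor in one pass, regardless of how unevenly the voxels are distributed across windows.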
Pages: 74-92
Number of Pages: 19