FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Cited by: 28
Authors
Liu, Zhijian [1 ]
Yang, Xinyu [1 ,2 ]
Tang, Haotian [1 ]
Yang, Shang [1 ,3 ]
Han, Song [1 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Tsinghua Univ, Beijing, Peoples R China
Funding
US National Science Foundation;
Keywords
VISION;
DOI
10.1109/CVPR52729.2023.00122
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer, as an alternative to the CNN, has proven effective in many modalities (e.g., text and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind that of sparse convolution-based models (about 3x slower), hindering their use in resource-constrained, latency-sensitive applications such as autonomous driving. This inefficiency stems from the sparse and irregular nature of point clouds, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer, which closes this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition the points into groups of equal sizes rather than windows of equal shapes, which effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over (transformer-based) SST and a 1.4x speedup over (sparse-convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
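The flatten-and-group idea in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `window_size` and `group_size` values, the tail-padding of the final group, and the use of raw coordinates as attention features are hypothetical choices for demonstration only.

```python
import numpy as np

def flatten_and_group(points, window_size=4.0, group_size=4, axis_order=(0, 1)):
    """Window-based sorting followed by equal-size grouping.

    Points are quantized into windows, lexicographically sorted by window
    coordinates (primary key = first axis in `axis_order`), then split into
    groups of exactly `group_size` points. Equal-size groups trade spatial
    proximity for computational regularity: every group has the same shape,
    so no per-window padding is needed (only the last group may be padded).
    """
    win = np.floor(points[:, list(axis_order)] / window_size).astype(int)
    order = np.lexsort((win[:, 1], win[:, 0]))  # last key is primary
    sorted_pts = points[order]
    pad = (-len(sorted_pts)) % group_size       # pad only the tail group
    if pad:
        sorted_pts = np.vstack([sorted_pts, np.repeat(sorted_pts[-1:], pad, 0)])
    return sorted_pts.reshape(-1, group_size, sorted_pts.shape[1])

def group_self_attention(groups):
    """Toy scaled dot-product self-attention applied independently within
    each group (features = raw coordinates, no learned projections)."""
    scores = np.einsum('gid,gjd->gij', groups, groups) / np.sqrt(groups.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return np.einsum('gij,gjd->gid', weights, groups)

pts = np.random.default_rng(0).uniform(0, 20, size=(10, 2))
groups = flatten_and_group(pts, group_size=4)  # 10 points -> 3 groups of 4
out = group_self_attention(groups)             # same shape as groups
```

Alternating `axis_order` between calls mimics gathering features along different directions, and offsetting the window origin before quantization mimics the window-shifting step that exchanges features across groups.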
Pages: 1200-1211
Page count: 12
Related papers
50 entries in total
  • [31] SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting
    Li, Yiduo
    Qi, Shiyi
    Li, Zhe
    Rao, Zhongwen
    Pan, Lujia
    Xu, Zenglin
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2169 - 2177
  • [32] Efficient transformer tracking with adaptive attention
    Xiao, Dingkun
    Wei, Zhenzhong
    Zhang, Guangjun
    IET COMPUTER VISION, 2024,
  • [33] An Efficient 3-D Point Cloud Place Recognition Approach Based on Feature Point Extraction and Transformer
    Ye, Tao
    Yan, Xiangming
    Wang, Shouan
    Li, Yunwang
    Zhou, Fuqiang
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [35] PointFaceFormer: local and global attention based transformer for 3D point cloud face recognition
    Gao, Ziqi
    Li, Qiufu
    Wang, Gui
    Shen, Linlin
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [36] Wave-PCT: Wavelet point cloud transformer for point cloud quality assessment
    Guo, Ziyou
    Huang, Zhen
    Gong, Wenyong
    Wu, Tieru
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 257
  • [37] Point attention network for point cloud semantic segmentation
    Ren, Dayong
    Wu, Zhengyi
    Li, Jiawei
    Yu, Piaopiao
    Guo, Jie
    Wei, Mingqiang
    Guo, Yanwen
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (09): 99-112
  • [39] FPTNet: Full Point Transformer Network for Point Cloud Completion
    Wang, Chunmao
    Yan, Xuejun
    Wang, Jingjing
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426 : 142 - 154
  • [40] PVT: Point-voxel transformer for point cloud learning
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (12) : 11985 - 12008