FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Cited by: 28
Authors
Liu, Zhijian [1 ]
Yang, Xinyu [1 ,2 ]
Tang, Haotian [1 ]
Yang, Shang [1 ,3 ]
Han, Song [1 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Tsinghua Univ, Beijing, Peoples R China
Funding
US National Science Foundation;
Keywords
VISION;
DOI
10.1109/CVPR52729.2023.00122
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer, as an alternative to the CNN, has proven effective in many modalities (e.g., text and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind that of sparse convolution-based models (about 3x slower), hindering their use in resource-constrained, latency-sensitive applications such as autonomous driving. This inefficiency stems from the sparse and irregular nature of point clouds, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer, which closes this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition the points into groups of equal sizes rather than windows of equal shapes, which effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over (transformer-based) SST and a 1.4x speedup over (sparse-convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
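The flatten-and-group idea in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `window_size` and `group_size` values, the tail-padding of the final group, and the use of raw coordinates as attention features are hypothetical choices for demonstration only.

```python
import numpy as np

def flatten_and_group(points, window_size=4.0, group_size=4, axis_order=(0, 1)):
    """Window-based sorting followed by equal-size grouping.

    Points are quantized into windows, lexicographically sorted by window
    coordinates (primary key = first axis in `axis_order`), then split into
    groups of exactly `group_size` points. Equal-size groups trade spatial
    proximity for computational regularity: every group has the same shape,
    so no per-window padding is needed (only the last group may be padded).
    """
    win = np.floor(points[:, list(axis_order)] / window_size).astype(int)
    order = np.lexsort((win[:, 1], win[:, 0]))  # last key is primary
    sorted_pts = points[order]
    pad = (-len(sorted_pts)) % group_size       # pad only the tail group
    if pad:
        sorted_pts = np.vstack([sorted_pts, np.repeat(sorted_pts[-1:], pad, 0)])
    return sorted_pts.reshape(-1, group_size, sorted_pts.shape[1])

def group_self_attention(groups):
    """Toy scaled dot-product self-attention applied independently within
    each group (features = raw coordinates, no learned projections)."""
    scores = np.einsum('gid,gjd->gij', groups, groups) / np.sqrt(groups.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return np.einsum('gij,gjd->gid', weights, groups)

pts = np.random.default_rng(0).uniform(0, 20, size=(10, 2))
groups = flatten_and_group(pts, group_size=4)  # 10 points -> 3 groups of 4
out = group_self_attention(groups)             # same shape as groups
```

Alternating `axis_order` between calls mimics gathering features along different directions, and offsetting the window origin before quantization mimics the window-shifting step that exchanges features across groups.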
Pages: 1200-1211
Page count: 12
Related papers
50 entries in total
  • [31] SMARTformer: Semi-Autoregressive Transformer with Efficient Integrated Window Attention for Long Time Series Forecasting
    Li, Yiduo
    Qi, Shiyi
    Li, Zhe
    Rao, Zhongwen
    Pan, Lujia
    Xu, Zenglin
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2169 - 2177
  • [32] Efficient transformer tracking with adaptive attention
    Xiao, Dingkun
    Wei, Zhenzhong
    Zhang, Guangjun
    IET COMPUTER VISION, 2024,
  • [33] An Efficient 3-D Point Cloud Place Recognition Approach Based on Feature Point Extraction and Transformer
    Ye, Tao
    Yan, Xiangming
    Wang, Shouan
    Li, Yunwang
    Zhou, Fuqiang
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [35] PointFaceFormer: local and global attention based transformer for 3D point cloud face recognition
    Gao, Ziqi
    Li, Qiufu
    Wang, Gui
    Shen, Linlin
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [36] Wave-PCT: Wavelet point cloud transformer for point cloud quality assessment
    Guo, Ziyou
    Huang, Zhen
    Gong, Wenyong
    Wu, Tieru
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 257
  • [37] Point attention network for point cloud semantic segmentation
    Ren, Dayong
    Wu, Zhengyi
    Li, Jiawei
    Yu, Piaopiao
    Guo, Jie
    Wei, Mingqiang
    Guo, Yanwen
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (09): 99-112
  • [39] FPTNet: Full Point Transformer Network for Point Cloud Completion
    Wang, Chunmao
    Yan, Xuejun
    Wang, Jingjing
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426 : 142 - 154
  • [40] PVT: Point-voxel transformer for point cloud learning
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (12) : 11985 - 12008