Image attention transformer network for indoor 3D object detection

Cited by: 0
Authors
Ren, Keyan [1 ]
Yan, Tong [1 ]
Hu, Zhaoxin [1 ]
Han, Honggui [1 ]
Zhang, Yunlu [1 ]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D object detection; transformer; attention mechanism;
DOI
10.1007/s11431-023-2552-x
CLC Classification Number
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and achieve remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature via an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, while generating and refining 3D detection results with a transformer model. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
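The abstract describes attention-weighted fusion: each RGB feature receives a weight from an attention mechanism before being combined with point cloud features. The record does not give the LI-Attention equations, so the following is only an illustrative sketch of one common form of such fusion (scaled dot-product attention with a residual connection); the function name `li_attention_fusion` and all shapes are assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def li_attention_fusion(point_feats, rgb_feats):
    """Illustrative attention-weighted fusion of point-cloud and RGB features.

    point_feats: (N, D) per-point features, used as queries.
    rgb_feats:   (M, D) RGB features (e.g. from 2D detection results),
                 used as keys and values.
    Returns (N, D): each point feature augmented with a weighted sum of
    RGB features, the weights coming from scaled dot-product attention.
    """
    d = point_feats.shape[-1]
    scores = point_feats @ rgb_feats.T / np.sqrt(d)  # (N, M) similarities
    weights = softmax(scores, axis=-1)               # per-point weights over RGB feats
    attended = weights @ rgb_feats                   # (N, D) weighted RGB context
    return point_feats + attended                    # residual fusion

# Toy example with random features.
rng = np.random.default_rng(0)
pts = rng.normal(size=(4, 8))   # 4 point features, dim 8
rgb = rng.normal(size=(3, 8))   # 3 RGB features, dim 8
fused = li_attention_fusion(pts, rgb)
print(fused.shape)
```

The residual form keeps the original point cloud geometry intact while the attention weights decide how much each RGB feature contributes to each point; the actual LI-Attention model may use learned projections and a different combination rule.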
Pages: 2176 - 2190
Number of pages: 15
Related Papers
50 records total
  • [21] RADIANT: Radar-Image Association Network for 3D Object Detection
    Long, Yunfei
    Kumar, Abhinav
    Morris, Daniel
    Liu, Xiaoming
    Castro, Marcos
    Chakravarty, Punarjay
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 1808 - 1816
  • [22] High-order multilayer attention fusion network for 3D object detection
    Zhang, Baowen
    Zhao, Yongyong
    Su, Chengzhi
    Cao, Guohua
    [J]. ENGINEERING REPORTS, 2024,
  • [23] Multimodal Transformer for Automatic 3D Annotation and Object Detection
    Liu, Chang
    Qian, Xiaoyan
    Huang, Binxiao
    Qi, Xiaojuan
    Lam, Edmund
    Tan, Siew-Chong
    Wong, Ngai
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 657 - 673
  • [24] SEFormer: Structure Embedding Transformer for 3D Object Detection
    Feng, Xiaoyu
    Du, Heming
    Fan, Hehe
    Duan, Yueqi
    Liu, Yongpan
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 632 - 640
  • [25] Monocular 3D object detection for an indoor robot environment
    Kim, Jiwon
    Lee, GiJae
    Kim, Jun-Sik
    Kim, Hyunwoo J.
    Kim, KangGeon
    [J]. 2020 29TH IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN), 2020, : 438 - 445
  • [26] GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection
    Zhou, Feng
    Dai, Ju
    Pan, Junjun
    Zhu, Mengxiao
    Cai, Xingquan
    Huang, Bin
    Wang, Chen
    [J]. ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT III, 2024, 14497 : 119 - 136
  • [27] TPAFNet: Transformer-Driven Pyramid Attention Fusion Network for 3D Medical Image Segmentation
    Li, Zheng
    Zhang, Jinhui
    Wei, Siyi
    Gao, Yueyang
    Cao, Chengwei
    Wu, Zhiwei
    [J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28 (11) : 6803 - 6814
  • [28] PointGAT: Graph attention networks for 3D object detection
    Zhou H.
    Wang W.
    Liu G.
    Zhou Q.
    [J]. Intelligent and Converged Networks, 2022, 3 (02): 204 - 216
  • [29] PTA-Det: Point Transformer Associating Point Cloud and Image for 3D Object Detection
    Wan, Rui
    Zhao, Tianyun
    Zhao, Wei
    [J]. SENSORS, 2023, 23 (06)
  • [30] 3D Object Detection Based on Sparse Self-Attention Graph Neural Network
    Peng, Zhichen
    Feng, Ansong
    Wang, Tianzhu
    Shao, Xinzhe
    Ku, Tao
    [J]. Computer Engineering and Applications, 61 (03): 295 - 305