Image attention transformer network for indoor 3D object detection

Cited by: 0
Authors
Ren, Keyan [1]
Yan, Tong [1]
Hu, Zhaoxin [1]
Han, Honggui [1]
Zhang, Yunlu [1]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D object detection; transformer; attention mechanism
DOI
10.1007/s11431-023-2552-x
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Point clouds and RGB images are both critical data for 3D object detection. Recent multi-modal methods combine them directly and achieve remarkable performance, but they ignore the distinct forms of the two data types. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model, the LI-Attention model, which considers both RGB features and point cloud features and assigns a weight to each RGB feature through an attention mechanism. Building on the LI-Attention model, we propose a 3D object detection method called the image attention transformer network (IAT-Net), designed for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
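The abstract's core idea, assigning a weight to each RGB feature by an attention mechanism, can be illustrated with a minimal sketch. This is not the authors' LI-Attention implementation: it assumes plain scaled dot-product attention with no learned projections, assumes the point feature and the RGB features share one dimensionality, and uses hypothetical names (`attention_fuse`, `point_feat`, `rgb_feats`) and toy values chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_fuse(point_feat, rgb_feats):
    """Weight each RGB feature by its scaled dot-product similarity to one
    point feature, then append the weighted RGB summary to the point feature.
    Hypothetical helper, not taken from the IAT-Net code base."""
    scale = math.sqrt(len(point_feat))
    weights = softmax([dot(point_feat, f) / scale for f in rgb_feats])
    dim = len(rgb_feats[0])
    # Attention-weighted sum of the RGB features, one value per dimension.
    pooled = [sum(w * f[i] for w, f in zip(weights, rgb_feats)) for i in range(dim)]
    return weights, point_feat + pooled

# Toy data: one 2-D point feature and three 2-D RGB features.
point_feat = [1.0, 0.0]
rgb_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = attention_fuse(point_feat, rgb_feats)
```

In this toy setting the RGB feature most aligned with the point feature receives the largest weight, and the fused vector is the point feature followed by the weighted RGB summary. A trained model would typically learn projections for the queries, keys, and values rather than compare raw features.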
Pages: 2176-2190
Page count: 15
Related papers
50 in total
  • [1] Image attention transformer network for indoor 3D object detection
    REN KeYan
    YAN Tong
    HU ZhaoXin
    HAN HongGui
    ZHANG YunLu
    [J]. Science China(Technological Sciences), 2024, (07) : 2176 - 2190
  • [2] Image attention transformer network for indoor 3D object detection
    REN KeYan
    YAN Tong
    HU ZhaoXin
    HAN HongGui
    ZHANG YunLu
    [J]. Science China(Technological Sciences), 2024, 67 (07) : 2176 - 2190
  • [3] TBFNT3D: Two-Branch Fusion Network With Transformer for Multimodal Indoor 3D Object Detection
    Cheng, Jun
    Zhang, Sheng
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6523 - 6530
  • [4] Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection From Point Clouds
    Yin, Junbo
    Shen, Jianbing
    Gao, Xin
    Crandall, David J.
    Yang, Ruigang
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 9822 - 9835
  • [5] Voxel Transformer for 3D Object Detection
    Mao, Jiageng
    Xue, Yujing
    Niu, Minzhe
    Bai, Haoyue
    Feng, Jiashi
    Liang, Xiaodan
    Xu, Hang
    Xu, Chunjing
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3144 - 3153
  • [6] ARPNET: attention region proposal network for 3D object detection
    Yangyang Ye
    Chi Zhang
    Xiaoli Hao
    [J]. Science China Information Sciences, 2019, 62
  • [7] ARPNET: attention region proposal network for 3D object detection
    Yangyang YE
    Chi ZHANG
    Xiaoli HAO
    [J]. Science China(Information Sciences), 2019, 62 (12) : 44 - 52
  • [8] ARPNET: attention region proposal network for 3D object detection
    Ye, Yangyang
    Zhang, Chi
    Hao, Xiaoli
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (12)
  • [9] SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection
    Zhu, Yun
    Hui, Le
    Shen, Yaqi
    Xie, Jin
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7811 - 7819
  • [10] Voxel Transformer with Density-Aware Deformable Attention for 3D Object Detection
    Kim, Taeho
    Kim, Joohee
    [J]. SENSORS, 2023, 23 (16)