MuTrans: Multiple Transformers for Fusing Feature Pyramid on 2D and 3D Object Detection

Cited by: 1
Authors
Xie, Bangquan [1, 2]
Yang, Liang [3]
Wei, Ailin [4]
Weng, Xiaoxiong [5]
Li, Bing [2]
Affiliations
[1] South China Univ Technol, Sch Civil Engn & Transportat, Guangzhou 510641, Peoples R China
[2] Clemson Univ, Dept Automot Engn, Int Ctr Automot Res CU ICAR, Greenville, SC 29607 USA
[3] Apple Inc, Sunnyvale, CA 95014 USA
[4] Clemson Univ, Dept Bioengn, Clemson, SC 29631 USA
[5] South China Univ Technol, Sch Civil Engn & Transportat, Guangzhou 510641, Peoples R China
Keywords
Transformers; feature pyramid; sensor fusion; object detection
DOI
10.1109/TIP.2023.3299190
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As one of the major components of a neural network, the feature pyramid plays a vital part in perception tasks such as object detection in autonomous driving. However, fusing multi-level and multi-sensor feature pyramids for object detection remains a challenge. This paper proposes a simple yet effective framework named MuTrans (Multiple Transformers) to fuse feature pyramids in a single-stream 2D detector or a two-stream 3D detector. Built on an encoder-decoder structure, MuTrans focuses on the significant features via multiple Transformers. The MuTrans encoder uses three innovative self-attention mechanisms: Spatial-wise BoxAlign attention (SB) for low-level spatial locations, Context-wise Affinity attention (CA) for high-level context information, and high-level attention for multi-level features. The MuTrans decoder then processes these significant proposals, including the RoI and context affinity. In addition, the Low and High-level Fusion (LHF) in the encoder reduces the number of computational parameters, and Pre-LN is used to accelerate training convergence. LHF and Pre-LN are shown to mitigate, respectively, the computational complexity of self-attention and its slow training convergence. The results demonstrate that MuTrans achieves higher detection accuracy than the baseline method, particularly in small object detection: an AP(S) gain of 2.1 on MS-COCO 2017 with a ResNeXt-101 backbone, a gain of 2.18 in 3D detection accuracy (moderate difficulty) for the small-object pedestrian class on KITTI, and a gain of 6.85 in the RC index (Town05 Long) on the CARLA urban driving simulator platform.
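The abstract credits Pre-LN with faster training convergence but does not say how it is wired into MuTrans. As a rough illustration only, the sketch below contrasts the generic Pre-LN arrangement (normalize inside the residual branch, before the sublayer) with the Post-LN arrangement (normalize after the residual sum). The names `layer_norm`, `pre_ln_block`, `post_ln_block`, and `toy_sublayer` are hypothetical, and `toy_sublayer` is a stand-in for an attention or feed-forward layer, not anything from the paper.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_block(x, sublayer):
    """Pre-LN: normalize first, apply the sublayer, then add the untouched residual."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def post_ln_block(x, sublayer):
    """Post-LN: apply the sublayer, add the residual, then normalize the sum."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def toy_sublayer(x):
    """Hypothetical stand-in for self-attention or a feed-forward layer."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre_out = pre_ln_block(x, toy_sublayer)
post_out = post_ln_block(x, toy_sublayer)
```

In the Pre-LN form the input `x` reaches the output through an identity path that is never normalized, which is the property usually associated with more stable gradients early in training. In the Post-LN form every block's output is re-normalized.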
Pages: 4407-4415
Page count: 9