FocusTR: Focusing on Valuable Feature by Multiple Transformers for Fusing Feature Pyramid on Object Detection

Cited by: 1
Authors
Xie, Bangquan [1, 3]
Yang, Liang [2]
Yang, Zongming [3]
Wei, Ailin [4]
Weng, Xiaoxiong [1]
Li, Bing [3]
Affiliations
[1] South China Univ Technol, Sch Civil Engn & Transportat, Guangzhou 510641, Peoples R China
[2] CUNY City Coll, 140 Convent Ave, New York, NY 10031 USA
[3] Clemson Univ, Dept Automot Engn, Int Ctr Automot Res CU ICAR, Greenville, SC 29607 USA
[4] Clemson Univ, Dept Bioengn, Clemson, SC 29631 USA
DOI
10.1109/IROS47612.2022.9981047
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The feature pyramid, a vital component of convolutional neural networks, plays a significant role in several perception tasks, including object detection for autonomous driving. However, how best to fuse multi-level and multi-sensor feature pyramids remains a significant challenge, especially for object detection. This paper presents FocusTR (Focusing on the valuable features by multiple Transformers), a simple yet effective architecture for fusing feature pyramids in a single-stream 2D detector and a two-stream 3D detector. Specifically, FocusTR encompasses several novel self-attention mechanisms: spatial-wise boxAlign attention (SB) for low-level spatial locations, context-wise affinity attention (CA) for high-level context information, and level-wise attention for multi-level features. To alleviate the computational complexity and slow training convergence of self-attention, FocusTR introduces a low- and high-level fusion (LHF) to reduce the computational parameters, and Pre-LN [1] to accelerate training convergence.
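As a rough illustration of the Pre-LN idea mentioned in the abstract, the sketch below contrasts a Pre-LN residual self-attention block (normalize, attend, then add the residual) with the Post-LN ordering (attend, add the residual, then normalize). This is a generic single-head NumPy sketch, not FocusTR's SB, CA, or level-wise attention; the function names, the shapes, and the omission of learnable LayerNorm parameters and of the feed-forward sublayer are all assumptions made for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension (no learnable scale/shift here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Plain single-head scaled dot-product self-attention (hypothetical stand-in).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def pre_ln_block(x, w_q, w_k, w_v):
    # Pre-LN: LayerNorm sits inside the residual branch, so the skip path is an identity.
    return x + self_attention(layer_norm(x), w_q, w_k, w_v)

def post_ln_block(x, w_q, w_k, w_v):
    # Post-LN: LayerNorm is applied after the residual sum.
    return layer_norm(x + self_attention(x, w_q, w_k, w_v))

rng = np.random.default_rng(0)
tokens, dim = 6, 8  # assumed toy sizes
x = rng.normal(size=(tokens, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

y_pre = pre_ln_block(x, w_q, w_k, w_v)
y_post = post_ln_block(x, w_q, w_k, w_v)
```

In the Pre-LN ordering the input reaches the output through an unnormalized skip connection, which is the property usually credited with easier optimization; whether FocusTR applies it exactly this way is not stated in the abstract.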
Pages: 518-525
Page count: 8
Related papers
50 in total
  • [1] MuTrans: Multiple Transformers for Fusing Feature Pyramid on 2D and 3D Object Detection
    Xie, Bangquan
    Yang, Liang
    Wei, Ailin
    Weng, Xiaoxiong
    Li, Bing
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4407 - 4415
  • [2] Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
    Huang, Zhou
    Dai, Hang
    Xiang, Tian-Zhu
    Wang, Shuo
    Chen, Huai-Xin
    Qin, Jie
    Xiong, Huan
    [J]. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023, 2023-June : 5557 - 5566
  • [3] Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
    Huang, Zhou
    Dai, Hang
    Xiang, Tian-Zhu
    Wang, Shuo
    Chen, Huai-Xin
    Qin, Jie
    Xiong, Huan
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 5557 - 5566
  • [4] Centralized Feature Pyramid for Object Detection
    Quan, Yu
    Zhang, Dong
    Zhang, Liyan
    Tang, Jinhui
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4341 - 4354
  • [5] Feature Pyramid Networks for Object Detection
    Lin, Tsung-Yi
    Dollar, Piotr
    Girshick, Ross
    He, Kaiming
    Hariharan, Bharath
    Belongie, Serge
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 936 - 944
  • [6] FEATURE FUSING OF FEATURE PYRAMID NETWORK FOR MULTI-SCALE PEDESTRIAN DETECTION
    Tesema, Fiseha B.
    Lin, Junpeng
    Ou, Jie
    Wu, Hong
    Zhu, William
    [J]. 2018 15TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2018, : 10 - 13
  • [7] Adaptive Feature Pyramid Networks for Object Detection
    Wang, Chengyang
    Zhong, Caiming
    [J]. IEEE ACCESS, 2021, 9 : 107024 - 107032
  • [8] An improved feature pyramid network for object detection
    Zhu, Linxiang
    Lee, Feifei
    Cai, Jiawei
    Yu, Hongliu
    Chen, Qiu
    [J]. NEUROCOMPUTING, 2022, 483 : 127 - 139
  • [9] Deep Feature Pyramid Reconfiguration for Object Detection
    Kong, Tao
    Sun, Fuchun
    Huang, Wenbing
    Liu, Huaping
    [J]. COMPUTER VISION - ECCV 2018, PT V, 2018, 11209 : 172 - 188
  • [10] Parallel Feature Pyramid Network for Object Detection
    Kim, Seung-Wook
    Kook, Hyong-Keun
    Sun, Jee-Young
    Kang, Mun-Cheon
    Ko, Sung-Jea
    [J]. COMPUTER VISION - ECCV 2018, PT V, 2018, 11209 : 239 - 256