RESC: REfine the SCore with adaptive transformer head for end-to-end object detection

Authors
Wang, Honglie [1]
Jiang, Rong [2, 3]
Xu, Jian [4]
Sun, Shouqian [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 321000, Zhejiang, Peoples R China
[2] Zhejiang Univ, Ctr Balance Architecture, Hangzhou 321000, Zhejiang, Peoples R China
[3] Zhejiang Univ, Architectural Design & Res Inst, Zhejiang Univ Co Ltd, Hangzhou 321000, Zhejiang, Peoples R China
[4] Tsinghua Univ, Tsinghua Berkeley Shenzhen Inst, Shenzhen 518000, Guangdong, Peoples R China
Source
Neural Computing & Applications | 2022 / Vol. 34 / No. 14
Keywords
Detection; Transformer; End-to-end; Attention;
DOI
10.1007/s00521-022-07089-5
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most detection models employ many detection heads that output their predictions independently. However, the locality of convolutional neural networks (CNNs) makes the features extracted by adjacent convolution kernels very similar, which leads to duplicate predictions. The hand-designed non-maximum suppression (NMS) procedure was introduced to remove these duplicates, but NMS cannot be applied in certain scenarios, such as crowded scenes, and it requires careful tuning of hyper-parameters. End-to-end training is therefore necessary to improve detection ability across more scenarios. To this end, we propose a model that enables the network to adaptively identify duplicate objects and output non-repetitive results, effectively replacing the hand-designed NMS procedure. By adding differentiated priors to image features and using Multi-Head Attention to enhance global communication between features, our model detects objects in an end-to-end manner. It can be easily applied to traditional one-stage detectors, e.g., FCOS and RetinaNet. It achieves fast convergence and a high recall rate, and its accuracy is significantly better than the baseline and outperforms many one-stage and two-stage methods. It also achieves performance comparable to traditional detectors on the dense-scene dataset CrowdHuman. Evaluation results demonstrate that our model with ResNet-50 achieves 40.5% AP on the COCO dataset and 89.2% AP(50) on the CrowdHuman dataset.
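For context, the hand-designed NMS step that the abstract says the model replaces can be sketched as below. This is a minimal illustration of standard greedy NMS in plain Python, not code from the paper: the names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the example boxes and thresholds are all assumptions chosen for illustration. The second example hints at why crowded scenes are hard: two distinct but heavily overlapping objects can be merged or kept depending only on the IoU threshold.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with any already-kept box
    exceeds iou_threshold (the hand-tuned hyper-parameter).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Duplicate predictions of one object plus one separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]

# Two distinct, heavily overlapping objects (hypothetical crowd case, IoU ~ 0.54).
crowd = [(0, 0, 10, 20), (3, 0, 13, 20)]
crowd_scores = [0.9, 0.85]
print(greedy_nms(crowd, crowd_scores, iou_threshold=0.5))  # [0]
print(greedy_nms(crowd, crowd_scores, iou_threshold=0.6))  # [0, 1]
```

In the crowd example, a threshold of 0.5 suppresses a genuine second object while 0.6 keeps it, which is the kind of hyper-parameter sensitivity the abstract refers to.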
Pages: 12017-12028
Number of pages: 12