RESC: REfine the SCore with adaptive transformer head for end-to-end object detection

Authors
Wang, Honglie [1]
Jiang, Rong [2,3]
Xu, Jian [4]
Sun, Shouqian [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 321000, Zhejiang, Peoples R China
[2] Zhejiang Univ, Ctr Balance Architecture, Hangzhou 321000, Zhejiang, Peoples R China
[3] Zhejiang Univ, Architectural Design & Res Inst, Zhejiang Univ Co Ltd, Hangzhou 321000, Zhejiang, Peoples R China
[4] Tsinghua Univ, Tsinghua Berkeley Shenzhen Inst, Shenzhen 518000, Guangdong, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / Issue 14
Keywords
Detection; Transformer; End-to-end; Attention;
DOI
10.1007/s00521-022-07089-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most detection models employ many detection heads that output their predictions independently. However, the locality of convolutional neural networks (CNNs) makes the features extracted by adjacent convolution kernels very similar, which leads to duplicate predictions. The hand-designed non-maximum suppression (NMS) procedure was introduced to remove these duplicates. However, NMS cannot be applied to certain scenarios, such as crowded scenes, and requires careful tuning of hyper-parameters. End-to-end training is therefore necessary to improve detection ability across more scenarios. To this end, we propose a model that enables the network to adaptively identify duplicate objects and output non-repetitive results, which can effectively replace the hand-designed NMS procedure. By adding differentiated priors to image features and using Multi-Head Attention to enhance global communication between features, our model can detect objects in an end-to-end manner. Our model can be easily applied to traditional one-stage detectors, e.g., FCOS and RetinaNet. It achieves fast convergence and a high recall rate, and its accuracy is significantly better than the baseline and outperforms many one-stage and two-stage methods. It also achieves performance comparable to traditional detectors on the dense-scene dataset CrowdHuman. Evaluation results demonstrate that our model with ResNet-50 achieves 40.5% AP on the COCO dataset and 89.2% AP50 on the CrowdHuman dataset.
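For context, the hand-designed NMS step that the abstract proposes to replace is commonly implemented as a greedy loop over score-sorted boxes. The following is a minimal pure-Python sketch of that classic procedure, not of the RESC model itself; the names `iou` and `greedy_nms`, the box layout `(x1, y1, x2, y2)`, and the default threshold of 0.5 are illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hand-tuned hyper-parameter
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores, iou_threshold=0.5))
    print(greedy_nms(boxes, scores, iou_threshold=0.7))
```

In this toy input the first two boxes overlap with an IoU of about 0.68, so the second one is dropped at a threshold of 0.5 but kept at 0.7. If those two boxes were in fact two distinct people standing close together, the lower threshold would suppress a correct detection, which is the crowded-scene limitation and threshold sensitivity the abstract describes.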
Pages: 12017 - 12028
Number of pages: 12