Combining transformer global and local feature extraction for object detection

Cited by: 6
Authors
Li, Tianping [1]
Zhang, Zhenyi [1]
Zhu, Mengdi [1]
Cui, Zhaotong [1]
Wei, Dongmei [1]
Affiliation
[1] Shandong Normal Univ, Sch Phys & Elect, Jinan, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Attention mechanism; Transformer; Anchor-free; Detector head
DOI
10.1007/s40747-024-01409-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutional neural network (CNN)-based object detectors perform well but extract global features poorly and cannot establish global dependencies between object pixels. The Transformer can compensate for this, but it does not incorporate the advantages of convolution, so it captures too little detail about local features, runs slowly, and has a large number of parameters. In addition, the Feature Pyramid Network (FPN) lacks information interaction across layers, which limits the acquisition of feature context information. To solve these problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, the segmented channel extraction feature attention (SCEFA) module is designed to improve the model's extraction of local multiscale channel features and to enhance the discrimination of pixels in the object region. Second, the aggregated feature hybrid transformer (AFHTrans) module, combined with convolution, is designed to enhance the model's extraction of global and local feature information and to establish dependencies between distant object pixels. This module also compensates for the shortcomings of the FPN through multilayer information aggregation and transmission. Compared with a transformer, these methods have obvious advantages. Finally, the feature extraction head (FE-Head) is designed to extract full-text information based on the features of different tasks. The method achieves an accuracy of 47.0% on COCO2017 and 82.76% on PASCAL VOC2007 + 2012, and the experimental results validate its effectiveness.
Pages: 4897-4920
Number of pages: 24