Combining transformer global and local feature extraction for object detection

Cited by: 6
Authors
Li, Tianping [1 ]
Zhang, Zhenyi [1 ]
Zhu, Mengdi [1 ]
Cui, Zhaotong [1 ]
Wei, Dongmei [1 ]
Affiliations
[1] Shandong Normal Univ, Sch Phys & Elect, Jinan, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Attention mechanism; Transformer; Anchor-free; Detector head;
DOI
10.1007/s40747-024-01409-z
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Convolutional neural network (CNN)-based object detectors perform well but lack global feature extraction and cannot establish long-range dependencies between object pixels. Although the Transformer can compensate for this, it does not exploit the advantages of convolution, so it captures insufficient local detail and suffers from slow inference and a large number of parameters. In addition, the Feature Pyramid Network (FPN) lacks cross-layer information interaction, which limits the contextual information that can be obtained from features. To address these problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, a segmented channel extraction feature attention (SCEFA) module is designed to improve the extraction of local multiscale channel features and to sharpen the discrimination of pixels in object regions. Second, an aggregated feature hybrid transformer (AFHTrans) module combined with convolution is designed to strengthen the extraction of both global and local feature information and to establish dependencies between the pixels of distant objects; it also compensates for the shortcomings of the FPN through multilayer information aggregation and transmission. Compared with a plain transformer, these modules offer clear advantages in speed and parameter count. Finally, a feature extraction head (FE-Head) is designed to extract full-image information according to the features required by different tasks. The detector achieves accuracies of 47.0% and 82.76% on the COCO2017 and PASCAL VOC2007 + 2012 datasets, respectively, and the experimental results validate the effectiveness of the method.
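To make the idea described in the abstract more concrete, the following Python (PyTorch) sketch illustrates one way to fuse a convolutional local-feature branch with a transformer-style global self-attention branch on a single feature map. It is a minimal illustration of the general global-plus-local fusion principle only; the module name GlobalLocalFusion, its layer layout, and all hyperparameters are assumptions made for this sketch and do not reproduce the paper's SCEFA, AFHTrans, or FE-Head implementations.

# Hypothetical sketch: fuse a convolutional (local) branch with a
# multi-head self-attention (global) branch over one feature map.
# Names and layer choices are illustrative, not the paper's modules.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise + pointwise convolution captures fine detail.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: self-attention over flattened spatial positions
        # establishes long-range dependencies between object pixels.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution projects the concatenated branches back to C channels.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.local(x)
        # (B, C, H, W) -> (B, H*W, C) token sequence for attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

if __name__ == "__main__":
    fm = torch.randn(1, 64, 32, 32)            # dummy FPN-level feature map
    print(GlobalLocalFusion(64)(fm).shape)     # torch.Size([1, 64, 32, 32])

Concatenating the two branches and projecting with a 1x1 convolution is one simple fusion choice; weighted addition or attention-guided gating would serve the same purpose in this kind of hybrid block.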
Pages: 4897-4920
Number of pages: 24
Related papers
50 items in total
  • [31] LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching
    Zhong, Wenhao
    Jiang, Jie
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 270
  • [32] Combining Local and Global Cues for Closed Contour Extraction
    Movahedi, Vida
    Elder, James H.
    PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2013, 2013,
  • [33] Moving object detection via feature extraction and classification
    Li, Yang
    OPEN COMPUTER SCIENCE, 2024, 14 (01):
  • [34] Object detection of VisDrone by stronger feature extraction FasterRCNN
    Zhang, Xiangxiang
    Wang, Chunyuan
    Jin, Jie
    Huang, Li
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [35] Feature extraction and fusion network for salient object detection
    Dai, Chao
    Pan, Chen
    He, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (23) : 33955 - 33969
  • [36] Lazy Feature Extraction and Boosted Classifiers for Object Detection
    Varga, Robert
    Nedevschi, Sergiu
    2017 13TH IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2017, : 325 - 330
  • [38] Object detection based on feature extraction and morphological operations
    Wang, LY
    Li, XP
    Fang, K
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 1001 - 1003
  • [39] MAFPN: a mixed local-global attention feature pyramid network for aerial object detection
    Ma, Tengfei
    Yin, Haitao
    REMOTE SENSING LETTERS, 2024, 15 (09) : 907 - 918
  • [40] LGF2: Local and Global Feature Fusion for Text-Guided Object Detection
    Miao, Shuyu
    Zheng, Hexiang
    Zheng, Lin
    Jin, Hong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 124 - 135