Combining transformer global and local feature extraction for object detection

Times Cited: 6
Authors
Li, Tianping [1]
Zhang, Zhenyi [1]
Zhu, Mengdi [1]
Cui, Zhaotong [1]
Wei, Dongmei [1]
Affiliations
[1] Shandong Normal Univ, Sch Phys & Elect, Jinan, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Attention mechanism; Transformer; Anchor-free; Detector head;
DOI
10.1007/s40747-024-01409-z
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Convolutional neural network (CNN)-based object detectors perform excellently but lack global feature extraction and cannot establish global dependencies between object pixels. Although the Transformer can compensate for this, it does not incorporate the advantages of convolution, so it captures insufficient detail in local features and suffers from slow speed and a large number of parameters. In addition, the Feature Pyramid Network (FPN) lacks information interaction across layers, which reduces the acquisition of feature context information. To solve these problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, the segmented channel extraction feature attention (SCEFA) module is designed to improve the extraction of local multiscale channel features and to enhance the discrimination of pixels in the object region. Second, the aggregated feature hybrid transformer (AFHTrans) module, combined with convolution, is designed to enhance the extraction of both global and local feature information and to establish dependencies between pixels of distant objects; it compensates for the shortcomings of the FPN through multilayer information aggregation and transmission. Compared with a standard transformer, these methods offer clear advantages. Finally, the feature extraction head (FE-Head) is designed to extract full-image information tailored to the features of different tasks. Accuracies of 47.0% and 82.76% were achieved on the COCO2017 and PASCAL VOC2007 + 2012 datasets, respectively, and the experimental results validate the effectiveness of our method.
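The core idea the abstract describes, fusing a transformer-style global self-attention branch with a convolution-style local branch, can be sketched as follows. This is a minimal NumPy illustration under simplifying assumptions, not the paper's GLFT/AFHTrans implementation: the local branch is a window average standing in for a learned convolution, and the attention uses identity query/key/value projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(x):
    # x: (N, C) flattened pixel tokens; single-head self-attention
    # with identity Q/K/V projections (a simplifying assumption)
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x  # every token attends to all others

def local_conv(x, k=3):
    # x: (N, C); average over a local window of k tokens as a
    # stand-in for a learned convolution (a simplifying assumption)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + k].mean(axis=0) for i in range(x.shape[0])])

def hybrid_block(x):
    # fuse global (long-range dependencies) and local (detail) features
    return global_attention(x) + local_conv(x)

feats = np.random.default_rng(0).normal(size=(16, 8))
out = hybrid_block(feats)
print(out.shape)
```

The additive fusion here is one common choice; a real detector would interleave such blocks with learned projections, normalization, and FPN-level aggregation.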
Pages: 4897-4920
Page count: 24