HA-Transformer: Harmonious aggregation from local to global for object detection

Cited by: 4
Authors
Chen, Yang [1]
Chen, Sihan [1]
Deng, Yongqiang [2]
Wang, Kunfeng [1]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
[2] VanJee Technol, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformer; multi-head self-attention; global interaction; transition module;
DOI
10.1016/j.eswa.2023.120539
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the Vision Transformer (ViT), with its global modeling capability, has shown excellent performance on classification tasks, opening a new direction for a range of vision tasks. However, because multi-head self-attention is computationally expensive, reducing the computational cost while retaining the capability of global interaction remains a major challenge. In this paper, we propose a new architecture that establishes an end-to-end connection from local to global via bridge tokens, so that global interaction is completed at the window level, effectively solving the quadratic complexity problem of the transformer. In addition, we consider a hierarchy of information from short-distance to long-distance and add a transition module from local to global to achieve a more harmonious aggregation of information. We name the proposed method HA-Transformer. Experimental results on the COCO dataset show excellent performance of HA-Transformer for object detection, outperforming several state-of-the-art methods.
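The complexity argument in the abstract can be illustrated with a rough count of query-key pairs. The sketch below is not the authors' implementation: it assumes non-overlapping windows and exactly one bridge token per window, and the function names and the token and window sizes are hypothetical choices made only for this illustration.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Query-key pairs when every token attends to every token."""
    return num_tokens * num_tokens


def windowed_with_bridge_pairs(num_tokens: int, window_size: int) -> int:
    """Query-key pairs for local window attention plus window-level global attention.

    Assumption (not taken from the paper): tokens are split into
    non-overlapping windows of `window_size` tokens, and each window is
    summarized by a single bridge token that attends to all other bridge tokens.
    """
    if num_tokens % window_size != 0:
        raise ValueError("num_tokens must be divisible by window_size")
    num_windows = num_tokens // window_size
    local_pairs = num_windows * window_size * window_size  # attention inside each window
    global_pairs = num_windows * num_windows               # bridge token to bridge token
    return local_pairs + global_pairs


# Hypothetical setting: a 56 x 56 feature map (3136 tokens) with 7 x 7 windows (49 tokens).
tokens, window = 56 * 56, 7 * 7
print(full_attention_pairs(tokens))                # 9834496
print(windowed_with_bridge_pairs(tokens, window))  # 157760
```

Under these assumptions the global step scales with the number of windows rather than the number of tokens, which is the sense in which interaction "at the window level" reduces the cost.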
Pages: 9