HA-Transformer: Harmonious aggregation from local to global for object detection

Cited by: 4
Authors
Chen, Yang [1]
Chen, Sihan [1]
Deng, Yongqiang [2]
Wang, Kunfeng [1]
Affiliations
[1] Beijing University of Chemical Technology, College of Information Science and Technology, Beijing 100029, People's Republic of China
[2] VanJee Technology, Beijing 100193, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformer; Multi-head self-attention; Global interaction; Transition module
DOI
10.1016/j.eswa.2023.120539
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the Vision Transformer (ViT), with its global modeling capability, has shown excellent performance on classification tasks, opening a new direction for a range of vision tasks. However, because multi-head self-attention is computationally expensive, reducing the cost while retaining the capability of global interaction remains a major challenge. In this paper, we propose a new architecture that establishes an end-to-end connection from local to global via bridge tokens, so that global interaction is completed at the window level, effectively addressing the quadratic complexity of the Transformer. In addition, we consider a hierarchy of information from short-distance to long-distance and add a transition module from local to global to aggregate information more harmoniously. We name the proposed method HA-Transformer. Experimental results on the COCO dataset show that HA-Transformer performs strongly on object detection, outperforming several state-of-the-art methods.
Pages: 9