HA-Transformer: Harmonious aggregation from local to global for object detection

Times cited: 4
Authors
Chen, Yang [1]
Chen, Sihan [1]
Deng, Yongqiang [2]
Wang, Kunfeng [1]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
[2] VanJee Technol, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformer; multi-head self-attention; global interaction; transition module
DOI
10.1016/j.eswa.2023.120539
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the Vision Transformer (ViT), with its global modeling capability, has shown excellent performance on image classification and has opened a new direction of development for a range of vision tasks. However, because multi-head self-attention is computationally expensive, reducing this cost while retaining the capability for global interaction remains a major challenge. In this paper, we propose a new architecture that establishes an end-to-end connection from local to global through bridge tokens, so that global interaction is completed at the window level, effectively resolving the quadratic complexity problem of the transformer. In addition, we consider a hierarchy of information from short range to long range and add a transition module between local and global interaction, which makes the aggregation of information more harmonious. We name the proposed method HA-Transformer. Experimental results on the COCO dataset show that HA-Transformer performs well on object detection, outperforming several state-of-the-art methods.
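The abstract describes the bridge-token mechanism only at a high level. The following pure-Python sketch illustrates one possible reading of it and is not the authors' implementation: it assumes one bridge token per window obtained by mean pooling, global attention restricted to the bridge tokens, and window tokens that attend to their own window plus that window's bridge token. Learned projections, multiple heads, and the transition module are omitted, and the function names `bridge_window_attention` and `attention_pairs` are hypothetical. Under these assumptions, N tokens split into windows of w tokens cost (N/w)² + N(w + 1) query-key products instead of the N² of full self-attention.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query (no learned projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def bridge_window_attention(tokens, window):
    """Hypothetical local-to-global block; the bridge design here is an assumption."""
    assert len(tokens) % window == 0, "token count must be a multiple of the window size"
    windows = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    # 1) One bridge token per window, assumed here to be the mean of its tokens.
    bridges = [mean(w) for w in windows]
    # 2) Global interaction happens only among the M bridge tokens (M x M pairs).
    bridges = [attend(b, bridges, bridges) for b in bridges]
    # 3) Each token attends to its own window plus that window's updated bridge token.
    out = []
    for win, bridge in zip(windows, bridges):
        context = win + [bridge]
        out.extend(attend(t, context, context) for t in win)
    return out


def attention_pairs(n_tokens, window):
    """Count query-key products: full self-attention vs. the bridge scheme above."""
    m = n_tokens // window
    full = n_tokens ** 2
    bridged = m * m + n_tokens * (window + 1)
    return full, bridged


if __name__ == "__main__":
    random.seed(0)
    tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
    out = bridge_window_attention(tokens, window=4)
    print(len(out), len(out[0]))  # 16 8
    print(attention_pairs(4096, 64))  # (16777216, 270336)
```

For a 64 × 64 feature map (N = 4096) with 8 × 8 windows (w = 64), the count drops from 16,777,216 to 270,336 query-key pairs. Because the bridge tokens exchange information before the window tokens read from them, perturbing a token in the first window still changes the outputs of the last window, which plain window attention cannot achieve.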
Pages: 9
Related papers
50 in total
  • [31] Local-Global Self-Attention for Transformer-Based Object Tracking
    Chen, Langkun
    Gao, Long
    Jiang, Yan
    Li, Yunsong
    He, Gang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12316 - 12329
  • [32] A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
    Huang, Youxiang
    Jiao, Donglai
    Huang, Xingru
    Tang, Tiantian
    Gui, Guan
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 241 - 254
  • [33] Multi-Source Aggregation Transformer for Concealed Object Detection in Millimeter-Wave Images
    Sun, Peng
    Liu, Ting
    Chen, Xiaotong
    Zhang, Shiyin
    Zhao, Yao
    Wei, Shikui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6148 - 6159
  • [34] Global Context-Aware Progressive Aggregation Network for Salient Object Detection
    Chen, Zuyao
    Xu, Qianqian
    Cong, Runmin
    Huang, Qingming
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10599 - 10606
  • [35] Global to Local: Clip-LSTM-Based Object Detection From Remote Sensing Images
    Teng, Zhu
    Duan, Yani
    Liu, Yan
    Zhang, Baopeng
    Fan, Jianping
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [36] Dynamic Object-Aware Monocular Visual Odometry with Local and Global Information Aggregation
    Wan, Yiming
    Gao, Wei
    Han, Sheng
    Wu, Yihong
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020: 603 - 607
  • [37] GLFormer: Global and Local Context Aggregation Network for Temporal Action Detection
    He, Yilong
    Zhong, Yong
    Wang, Lishun
    Dang, Jiachen
    APPLIED SCIENCES-BASEL, 2022, 12 (17)
  • [38] FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting
    Yan, Weiqing
    Sun, Yiqiu
    Yue, Guanghui
    Zhou, Wei
    Liu, Hantao
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (02) : 235 - 244
  • [39] A Convolution with Transformer Attention Module Integrating Local and Global Features for Object Detection in Remote Sensing Based on YOLOv8n
    Lang, Kaiqi
    Cui, Jie
    Yang, Mingyu
    Wang, Hanyu
    Wang, Zilong
    Shen, Honghai
    REMOTE SENSING, 2024, 16 (05)
  • [40] Dual Network Structure With Interweaved Global-Local Feature Hierarchy for Transformer-Based Object Detection in Remote Sensing Image
    Xue, Jingqian
    He, Da
    Liu, Mengwei
    Shi, Qian
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 6856 - 6866