HA-Transformer: Harmonious aggregation from local to global for object detection

Cited by: 4
|
Authors
Chen, Yang [1 ]
Chen, Sihan [1 ]
Deng, Yongqiang [2 ]
Wang, Kunfeng [1 ]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
[2] VanJee Technol, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Transformer; multi-head self-attention; global interaction; transition module;
DOI
10.1016/j.eswa.2023.120539
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the Vision Transformer (ViT), with its global modeling capability, has shown excellent performance in classification tasks, opening a new development direction for a range of vision tasks. However, because of the enormous cost of multi-head self-attention, reducing computational cost while retaining the capability for global interaction remains a major challenge. In this paper, we propose a new architecture that establishes an end-to-end connection from local to global via bridge tokens, so that global interaction is completed at the window level, effectively alleviating the quadratic complexity of the transformer. In addition, we consider a hierarchy of information from short-distance to long-distance, adding a transition module from local to global to aggregate information more harmoniously. We name the proposed method HA-Transformer. Experimental results on the COCO dataset show the excellent performance of HA-Transformer for object detection, outperforming several state-of-the-art methods.
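The complexity argument in the abstract can be illustrated with a back-of-the-envelope count of token-pair interactions. The sketch below is an assumption-laden simplification, not the paper's actual attention scheme: it assumes a fixed window size and one bridge token per window that handles all cross-window interaction, and merely contrasts the quadratic cost of full global self-attention with the roughly linear cost of window-local attention plus window-level global interaction.

```python
def full_attention_cost(n_tokens):
    # Full global self-attention: every token attends to every token,
    # so the interaction count grows quadratically with n_tokens.
    return n_tokens * n_tokens

def windowed_bridge_cost(n_tokens, window=49):
    # Hypothetical simplification of the bridge-token idea: tokens attend
    # only within their window, and one bridge token per window carries
    # global interaction across windows.
    n_windows = n_tokens // window
    local = n_windows * window * window   # intra-window attention
    bridge = n_windows * n_windows        # bridge tokens interact globally
    return local + bridge

# Example: a feature map of 12,544 tokens split into 7x7 (=49-token) windows.
n = 49 * 256
print(full_attention_cost(n))      # quadratic in n
print(windowed_bridge_cost(n))     # roughly linear in n for fixed window size
```

For a fixed window size, the dominant term is linear in the token count, while the residual quadratic term acts only on the (much smaller) number of windows; this is the sense in which window-level global interaction sidesteps the quadratic complexity of full self-attention.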
Pages: 9