A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 24

Authors:

Lu, Wanjie [1]

Lan, Chaozhen [2]

Niu, Chaoyang [1]

Liu, Wei [1]

Lyu, Liang [2]

Shi, Qunshan [2]

Wang, Shiju [1]

Affiliations:

[1] PLA Strategic Support Force Information Engineering University, Institute of Data and Target Engineering, Zhengzhou 450001, People's Republic of China

[2] PLA Strategic Support Force Information Engineering University, Institute of Geospatial Information, Zhengzhou 450001, People's Republic of China

Source:

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2023 / Vol. 16

Funding:

National Natural Science Foundation of China;

Keywords:

Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;

DOI:

10.1109/JSTARS.2023.3234161

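Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes: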

0808 ; 0809 ;

Abstract:

Object detection in unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex backgrounds, diverse scales, and uneven distribution of objects in UAV images make it a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model for efficient object detection in UAV images, with three advantages that contribute to improved detection performance. First, the efficient and effective cross-shaped window (CSWin) transformer is used as a backbone to obtain image features at different levels, and these features are fed into a feature pyramid network to achieve a multiscale representation, which supports multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method is constructed to fuse the inference results of the original image and sliced images, which improves small-object detection accuracy without modifying the original network. Experimental results on public datasets show that the proposed method improves performance more effectively than several popular and state-of-the-art object detection methods.
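The slicing-based inference idea in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the paper's implementation: the abstract does not give the slice size, the overlap ratio, or the fusion rule, so this sketch uses overlapping square windows and plain non-maximum suppression (NMS) as a stand-in for fusion. The names `make_slices`, `nms`, `sliced_inference`, and `toy_detect` are hypothetical, and the detector is a toy function that receives a window instead of pixels so that the example stays self-contained.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
Window = Tuple[int, int, int, int]              # x0, y0, x1, y1


def make_slices(width: int, height: int, size: int, overlap: float) -> List[Window]:
    """Cover the image with square windows that overlap by the given ratio."""
    step = max(1, int(size * (1.0 - overlap)))
    xs = list(range(0, max(width - size, 0) + 1, step))
    ys = list(range(0, max(height - size, 0) + 1, step))
    # Add a final window so the right and bottom borders are covered.
    if xs[-1] + size < width:
        xs.append(width - size)
    if ys[-1] + size < height:
        ys.append(height - size)
    return [(x, y, min(x + size, width), min(y + size, height))
            for y in ys for x in xs]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (score field ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box among overlapping ones."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


def sliced_inference(detect: Callable[[Window], List[Box]],
                     width: int, height: int,
                     size: int = 64, overlap: float = 0.25,
                     iou_thr: float = 0.5) -> List[Box]:
    """Fuse full-image detections with detections from each slice.

    `detect` is assumed to return boxes in window-local coordinates.
    """
    merged = list(detect((0, 0, width, height)))  # pass over the original image
    for x0, y0, x1, y1 in make_slices(width, height, size, overlap):
        for bx1, by1, bx2, by2, score in detect((x0, y0, x1, y1)):
            # Shift slice-local boxes back into full-image coordinates.
            merged.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0, score))
    return nms(merged, iou_thr)


def toy_detect(window: Window) -> List[Box]:
    """Toy detector (assumption): one small object, scored higher in small windows."""
    x0, y0, x1, y1 = window
    gx1, gy1, gx2, gy2 = 40.0, 40.0, 50.0, 50.0
    if gx1 >= x0 and gy1 >= y0 and gx2 <= x1 and gy2 <= y1:
        score = 0.9 if (x1 - x0) <= 64 else 0.4
        return [(gx1 - x0, gy1 - y0, gx2 - x0, gy2 - y0, score)]
    return []


detections = sliced_inference(toy_detect, width=100, height=100)
print(detections)
```

In this toy setup the small object is found with low confidence in the full image and with higher confidence in each slice; after the slice boxes are shifted back and the duplicates are suppressed, a single high-confidence box remains. A real pipeline would pass cropped pixels to a trained detector and might use a different fusion rule.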

Pages: 1211-1231

Number of pages: 21
