A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection

Cited by: 40

Authors:

Lu, Wanjie [1]

Lan, Chaozhen [2]

Niu, Chaoyang [1]

Liu, Wei [1]

Lyu, Liang [2]

Shi, Qunshan [2]

Wang, Shiju [1]

Affiliations:

[1] PLA Strategic Support Force Information Engineering University, Institute of Data and Target Engineering, Zhengzhou 450001, People's Republic of China

[2] PLA Strategic Support Force Information Engineering University, Institute of Geospatial Information, Zhengzhou 450001, People's Republic of China

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2023 / Vol. 16

Funding:

National Natural Science Foundation of China

Keywords:

Object detection; Transformers; Feature extraction; Detectors; Autonomous aerial vehicles; Computational modeling; Training; Convolutional neural network (CNN); hybrid network; object detection; transformer; unmanned aerial vehicle (UAV) image; NETWORK;

DOI:

10.1109/JSTARS.2023.3234161

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Object detection in unmanned aerial vehicle (UAV) images has widespread applications in numerous fields; however, the complex backgrounds, diverse scales, and uneven distribution of objects in UAV images make it a challenging task. This study proposes a convolutional neural network (CNN)-transformer hybrid model for efficient object detection in UAV images. The model has three advantages that contribute to improved detection performance. First, the efficient cross-shaped window (CSWin) transformer is used as a backbone to obtain image features at different levels, and these features are input into a feature pyramid network to achieve a multiscale representation, which benefits multiscale object detection. Second, a hybrid patch embedding module is constructed to extract and utilize low-level information such as the edges and corners of the image. Finally, a slicing-based inference method is constructed to fuse the inference results of the original image and sliced images, which improves small object detection accuracy without modifying the original network. Experimental results on public datasets show that the proposed method improves detection performance more effectively than several popular and state-of-the-art object detection methods.
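The slicing-based inference step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the slices are laid out or how the results are fused, so everything below is an assumption: overlapping square tiles, greedy non-maximum suppression (NMS) as the fusion rule, and a hypothetical `detect` callable that takes a window `(x0, y0, x1, y1)` and returns boxes in window-local coordinates. None of these names come from the paper.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]               # x0, y0, x1, y1 in image coordinates
Box = Tuple[float, float, float, float, float]   # x1, y1, x2, y2, score


def _starts(size: int, tile: int, stride: int) -> List[int]:
    """Tile start offsets along one axis, with a final tile flush to the border."""
    starts = list(range(0, max(size - tile, 0) + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)
    return starts


def slice_windows(width: int, height: int, tile: int, overlap: float) -> List[Window]:
    """Overlapping tile windows covering the image (assumed layout, not the paper's)."""
    stride = max(1, int(tile * (1 - overlap)))
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (scores are ignored)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


def sliced_inference(
    detect: Callable[[Window], List[Box]],  # hypothetical detector interface
    width: int,
    height: int,
    tile: int = 512,
    overlap: float = 0.2,
    iou_thr: float = 0.5,
) -> List[Box]:
    """Fuse detections from the full image and from each slice."""
    # Detections on the original (unsliced) image are already in image coordinates.
    boxes = list(detect((0, 0, width, height)))
    for x0, y0, x1, y1 in slice_windows(width, height, tile, overlap):
        for bx1, by1, bx2, by2, score in detect((x0, y0, x1, y1)):
            # Shift slice-local boxes back into full-image coordinates.
            boxes.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0, score))
    # Assumed fusion rule: NMS over the pooled detections.
    return nms(boxes, iou_thr)
```

Because the detector is only called through `detect`, the network itself is left untouched, which matches the abstract's statement that the method works without modifying the original network. A small object that is missed on the full image but found in several overlapping slices collapses to one box after the NMS step.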

Citation

Pages: 1211-1231

Number of pages: 21

50 items in total

[1] A hybrid CNN-Transformer model for Historical Document Image Binarization
Rezanezhad, Vahid
Baierer, Konstantin
Neudecker, Clemens
PROCEEDINGS OF THE 2023 INTERNATIONAL WORKSHOP ON HISTORICAL DOCUMENT IMAGING AND PROCESSING, HIP 2023, 2023, : 79 - 84
[2] Image harmonization with Simple Hybrid CNN-Transformer Network
Li, Guanlin
Zhao, Bin
Li, Xuelong
NEURAL NETWORKS, 2024, 180
[3] HCformer: Hybrid CNN-Transformer for LDCT Image Denoising
Yuan, Jinli
Zhou, Feng
Guo, Zhitao
Li, Xiaozeng
Yu, Hengyong
JOURNAL OF DIGITAL IMAGING, 2023, 36 (05) : 2290 - 2305
[4] HCformer: Hybrid CNN-Transformer for LDCT Image Denoising
Jinli Yuan
Feng Zhou
Zhitao Guo
Xiaozeng Li
Hengyong Yu
Journal of Digital Imaging, 2023, 36 (5) : 2290 - 2305
[5] CNN-Transformer Hybrid Architecture for Early Fire Detection
Yang, Chenyue
Pan, Yixuan
Cao, Yichao
Lu, Xiaobo
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 570 - 581
[6] Remote sensing image change detection based on CNN-Transformer structure
Pan, Mengyang
Yang, Hang
Fan, Xianghui
CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2024, 39 (10) : 1361 - 1379
[7] GhostFormer: Efficiently amalgamated CNN-transformer architecture for object detection
Xie, Xin
Wu, Dengquan
Xie, Mingye
Li, Zixi
PATTERN RECOGNITION, 2024, 148
[8] Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion
Yang Chen
Hou Zhiqiang
Li Xinyue
Ma Sugang
Yang Xiaobao
ACTA PHOTONICA SINICA, 2024, 53 (03)
[9] TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation
Li, Zihan
Li, Dihan
Xu, Cangbai
Wang, Weice
Hong, Qingqi
Li, Qingde
Tian, Jie
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 781 - 792
[10] Hybrid CNN-Transformer Feature Fusion for Single Image Deraining
Chen, Xiang
Pan, Jinshan
Lu, Jiyang
Fan, Zhentao
Li, Hao
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 378 - 386
