CCTseg: A cascade composite transformer semantic segmentation network for UAV visual perception

Times cited: 12

Authors:

Yi, Shi [1,3,4]

Li, Junjie [1]

Jiang, Gang [1]

Liu, Xi [1]

Chen, Ling [2,5]

Affiliations:

[1] Chengdu Univ Technol, Coll Mech & Elect Engn, Chengdu 610059, Peoples R China

[2] Chengdu Univ Technol, Coll Math & Phys, Chengdu 610059, Peoples R China

[3] Vehicle Measurement Control & Safety Key Lab Sichu, Chengdu 610039, Peoples R China

[4] Chongqing Univ Posts & Telecommun, Minist Educ, Key Lab Ind Internet Things & Networked Control, Chongqing 400065, Peoples R China

[5] Chengdu Univ Technol, Geomath Key Lab Sichuan Prov, Chengdu 610059, Peoples R China

Source:

MEASUREMENT | 2023 / Vol. 211

Keywords:

Semantic segmentation; UAV image; Visual perception; Composite encoder; Transformer block

DOI:

10.1016/j.measurement.2023.112612

CLC number (Chinese Library Classification):

T [Industrial technology]

Subject classification code:

08

Abstract:

Semantic segmentation provides pixel-level classification of the surrounding environment and is an essential visual perception task for autonomous vehicles and mobile robots. Most existing semantic segmentation networks focus on the visual perception of autonomous vehicles, and little attention has been paid to semantic segmentation for UAV (Unmanned Aerial Vehicle) visual perception, which is crucial to UAV autonomous flight and landing spot searching. Compared with views from autonomous vehicles, UAV-based views are more challenging for semantic segmentation because images captured by UAVs contain large variations in object size caused by different flight altitudes and viewing angles. Existing semantic segmentation networks designed for autonomous vehicles are generally unable to effectively extract representative features from UAV images, which must capture context information and local information simultaneously. This study proposes a cascade composite transformer-based semantic segmentation network for UAV visual perception. A cascade composite encoder is designed that consists of three transformer-based feature extraction backbones and cascade-fuses low-level, middle-level and high-level features to achieve better feature representation capacity. A spatial enhanced transformer block serves as the basic feature extraction block of each backbone, so that the extracted features contain both context information about the environment and local information about objects. A symmetric rhombus decoder is proposed to integrate multi-stage features and make full use of the middle-stage features, which contain abundant useful information, so that accurate pixel-level predictions can be obtained. Ablation studies and comparison experiments for the proposed CCTseg have been conducted on two public UAV imagery datasets suitable for UAV autonomous flight and landing spot observation. Experimental results demonstrate the effectiveness of the proposed network structure and the superiority of the proposed network over other state-of-the-art methods for semantic segmentation in UAV visual perception.
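The abstract does not say how the three backbones of the cascade composite encoder exchange features. The toy sketch below only illustrates the general data flow of a composite encoder under one assumption: CBNet-style same-level fusion, where each stage output of the previous backbone is added to the matching stage output of the next backbone before it moves on to the following stage. All names (`make_stage`, `run_backbone`, `cascade_composite_encoder`) are hypothetical, plain lists stand in for C x H x W tensors, and the "stage" is a placeholder, not the paper's spatial enhanced transformer block.

```python
from typing import Callable, List, Optional

# Toy 1-D "feature map"; a real network would use C x H x W tensors.
Feature = List[float]
Stage = Callable[[Feature], Feature]


def downsample(x: Feature) -> Feature:
    """Halve the length by averaging adjacent pairs (stand-in for patch merging)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def make_stage(scale: float) -> Stage:
    """Hypothetical stage: downsample, then a per-element scaling that
    stands in for a stack of transformer blocks."""
    def stage(x: Feature) -> Feature:
        return [scale * v for v in downsample(x)]
    return stage


def run_backbone(x: Feature, stages: List[Stage],
                 prev_feats: Optional[List[Feature]] = None) -> List[Feature]:
    """Run one backbone. If the previous backbone's stage outputs are given,
    add them stage by stage (assumed same-level fusion); the fused result
    is what feeds the next stage."""
    feats: List[Feature] = []
    for i, stage in enumerate(stages):
        x = stage(x)
        if prev_feats is not None:
            x = [a + b for a, b in zip(x, prev_feats[i])]
        feats.append(x)
    return feats


def cascade_composite_encoder(image: Feature,
                              backbones: List[List[Stage]]) -> List[Feature]:
    """Run the backbones in sequence, each one fused with its predecessor.
    Returns the low-, middle- and high-level features of the last backbone."""
    prev: Optional[List[Feature]] = None
    for stages in backbones:
        prev = run_backbone(image, stages, prev)
    return prev


if __name__ == "__main__":
    # Three backbones, each with three stages (low, middle, high level).
    backbones = [[make_stage(1.0) for _ in range(3)] for _ in range(3)]
    print(cascade_composite_encoder([1.0] * 8, backbones))
    # prints [[3.0, 3.0, 3.0, 3.0], [6.0, 6.0], [10.0]]
```

Each later backbone re-reads the input but also inherits the earlier backbones' features at every level, which is one way to realise the "cascade fused low-level, middle-level and high-level features" described above; the paper may use a different fusion operator or connection pattern.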

Pages: 18
