CCTseg: A cascade composite transformer semantic segmentation network for UAV visual perception

Cited by: 12
Authors
Yi, Shi [1, 3, 4]
Li, Junjie [1]
Jiang, Gang [1]
Liu, Xi [1]
Chen, Ling [2, 5]
Affiliations
[1] Chengdu Univ Technol, Coll Mech & Elect Engn, Chengdu 610059, Peoples R China
[2] Chengdu Univ Technol, Coll Math & Phys, Chengdu 610059, Peoples R China
[3] Vehicle Measurement Control & Safety Key Lab Sichu, Chengdu 610039, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Minist Educ, Key Lab Ind Internet Things & Networked Control, Chongqing 400065, Peoples R China
[5] Chengdu Univ Technol, Geomath Key Lab Sichuan Prov, Chengdu 610059, Peoples R China
Keywords
Semantic segmentation; UAV image; Visual perception; Composite encoder; Transformer block;
DOI
10.1016/j.measurement.2023.112612
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Semantic segmentation provides pixel-level classification of the surrounding environment, an essential task for the visual perception of autonomous vehicles and mobile robots. Most existing semantic segmentation networks focus on the visual perception of autonomous vehicles. Little attention has been paid to semantic segmentation for UAV (Unmanned Aerial Vehicle) visual perception, which is crucial to UAV autonomous flight and landing spot searching. Compared with views from autonomous vehicles, UAV-based views are more challenging for semantic segmentation because images captured by UAVs contain large variations in object scale caused by differing altitudes and viewing angles. Existing semantic segmentation networks designed for autonomous vehicles are generally inadequate for extracting representative features from UAV images, which must contain context information and local information simultaneously. This study proposes a cascade composite transformer-based semantic segmentation network, CCTseg, for UAV visual perception. A cascade composite encoder is designed that consists of three transformer-based feature extraction backbones and cascade-fuses low-level, middle-level, and high-level features to achieve better feature representation capacity. A spatial enhanced transformer block serves as the basic feature extraction block of each backbone so that the extracted features contain both context information about the environment and local information about objects. A symmetric rhombus decoder is proposed to integrate multi-stage features and make full use of the middle-stage features, which contain abundant useful information, so that accurate pixel-level predictions can be obtained. Ablation studies and comparison experiments for the proposed CCTseg were conducted on two public UAV imagery datasets suitable for UAV autonomous flight and landing spot observation. Experimental results demonstrate the effectiveness of the proposed network structure and its superiority over other state-of-the-art methods for semantic segmentation in UAV visual perception.
Pages: 18