Scene-level buildings damage recognition based on Cross Conv-Transformer

Cited by: 1
Authors
Shi, Lingfei [1]
Zhang, Feng [1, 2, 5]
Xia, Junshi [3]
Xie, Jibo [4]
Affiliations
[1] Zhejiang Univ, Sch Earth Sci, Hangzhou, Peoples R China
[2] Zhejiang Prov Key Lab Geog Informat Sci, Hangzhou, Peoples R China
[3] RIKEN Ctr Adv Intelligence Project, Geoinformat Unit, Tokyo, Japan
[4] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China
[5] Zhejiang Univ, Sch Earth Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China
Keywords
Scene recognition; damaged buildings; aerial images; transformer
DOI
10.1080/17538947.2023.2261770
Chinese Library Classification (CLC) number
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
Unlike pixel-based and object-based image recognition, a scene-level perspective can improve the efficiency of assessing large-scale building damage. However, the complexity of disaster scenes and the scarcity of datasets are major challenges in identifying building damage. To address these challenges, the Cross Conv-Transformer model is proposed to classify and evaluate the degree of building damage using aerial images taken after earthquakes. We employ Conv-Embedding and Conv-Projection to extract features from the images. Integrating convolution with the Transformer reduces the computational burden of the model while enhancing its feature extraction capabilities. Furthermore, a two-branch Conv-Transformer architecture with global and local attention is designed, so that one branch focuses on global features and the other on local features. The cross-attention fusion module merges feature information from the two branches to enrich the classification features. Finally, we use aerial images captured during the Beichuan and Yushu earthquakes as the training and test sets to assess the model. The proposed Cross Conv-Transformer model improved classification accuracy by 4.7% and 2.1% compared with ViT and EfficientNet, respectively. The results show that the Cross Conv-Transformer model significantly reduces misclassification between the severely and moderately damaged categories.
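The cross-attention fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the layer details, so the single-head attention, the residual update, the mean pooling, the token counts, and the names `cross_attention`, `fuse_branches`, and `init_params` are all assumptions made for illustration, and random matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries from one branch, keys/values from the other."""
    q = query_tokens @ w_q                    # (n_q, d_k)
    k = context_tokens @ w_k                  # (n_c, d_k)
    v = context_tokens @ w_v                  # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_c) scaled dot products
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


def init_params(rng, d, d_k):
    """Hypothetical random projections standing in for learned weights."""
    return (
        rng.normal(size=(d, d_k)),  # query projection
        rng.normal(size=(d, d_k)),  # key projection
        rng.normal(size=(d, d)),    # value projection (keeps token width d)
    )


def fuse_branches(global_tokens, local_tokens, params_g, params_l):
    """Assumed fusion: each branch attends to the other, then features are pooled."""
    g_update, _ = cross_attention(global_tokens, local_tokens, *params_g)
    l_update, _ = cross_attention(local_tokens, global_tokens, *params_l)
    g_fused = global_tokens + g_update        # residual update of the global branch
    l_fused = local_tokens + l_update         # residual update of the local branch
    # Mean-pool each branch and concatenate into one classification feature.
    return np.concatenate([g_fused.mean(axis=0), l_fused.mean(axis=0)])


rng = np.random.default_rng(0)
d, d_k = 8, 4                                 # illustrative sizes only
global_tokens = rng.normal(size=(4, d))       # few coarse tokens (global branch)
local_tokens = rng.normal(size=(16, d))       # many fine tokens (local branch)
params_g = init_params(rng, d, d_k)
params_l = init_params(rng, d, d_k)
feature = fuse_branches(global_tokens, local_tokens, params_g, params_l)
print(feature.shape)  # (16,)
```

In this sketch the fused vector has width `2 * d` because the pooled global and local features are concatenated; a classifier head over the damage categories would take that vector as input.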
Pages: 3987-4007
Page count: 21