Scene-level buildings damage recognition based on Cross Conv-Transformer

Cited by: 1

Authors:

Shi, Lingfei ^{[1]}

Zhang, Feng ^{[1,2,5]}

Xia, Junshi ^{[3]}

Xie, Jibo ^{[4]}

Affiliations:

[1] Zhejiang Univ, Sch Earth Sci, Hangzhou, Peoples R China

[2] Zhejiang Prov Key Lab Geog Informat Sci, Hangzhou, Peoples R China

[3] RIKEN Ctr Adv Intelligence Project, Geoinformat Unit, Tokyo, Japan

[4] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China

[5] Zhejiang Univ, Sch Earth Sci, 38 Zheda Rd, Hangzhou 310027, Peoples R China

Source:

INTERNATIONAL JOURNAL OF DIGITAL EARTH | 2023 / Vol. 16 / Issue 02

Keywords:

Scene recognition; damaged buildings; aerial images; transformer

DOI:

10.1080/17538947.2023.2261770

Chinese Library Classification (CLC) number:

P9 [Physical Geography]

Discipline classification codes:

0705; 070501

Abstract:

Unlike pixel-based and object-based image recognition, a larger, scene-based perspective can improve the efficiency of assessing large-scale building damage. However, the complexity of disaster scenes and the scarcity of datasets are major challenges in identifying building damage. To address these challenges, the Cross Conv-Transformer model is proposed to classify and evaluate the degree of damage to buildings using aerial images taken after earthquakes. We employ Conv-Embedding and Conv-Projection to extract features from the images. The integration of convolution and Transformer reduces the computational burden of the model while enhancing its feature extraction capabilities. Furthermore, a two-branch Conv-Transformer architecture with global and local attention is designed, allowing the branches to focus on global and local features, respectively. The cross-attention fusion module merges feature information from the two branches to enrich the classification features. Finally, we use aerial images captured during the Beichuan and Yushu earthquakes as both the training and test sets to assess the model. The proposed Cross Conv-Transformer model improves classification accuracy by 4.7% and 2.1% compared with ViT and EfficientNet, respectively. The results show that the Cross Conv-Transformer model significantly reduces misclassification between the severely and moderately damaged categories.
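As a rough illustration of the cross-attention fusion idea mentioned in the abstract, the sketch below lets the tokens of each branch attend over the tokens of the other branch using plain scaled dot-product attention. It is not the authors' implementation: the abstract does not specify learned projections, head counts, or how the fused features are combined, so the residual addition, the function names (`cross_attention`, `fuse`), and the toy token values are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where the queries come from one branch
    and the keys/values come from the other branch.
    No learned projection matrices are used (an illustrative simplification)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(global_tokens, local_tokens):
    """Hypothetical fusion step: each branch queries the other branch,
    and the attended features are added back residually (an assumption)."""
    g_att = cross_attention(global_tokens, local_tokens, local_tokens)
    l_att = cross_attention(local_tokens, global_tokens, global_tokens)
    fused_g = [[a + b for a, b in zip(t, u)] for t, u in zip(global_tokens, g_att)]
    fused_l = [[a + b for a, b in zip(t, u)] for t, u in zip(local_tokens, l_att)]
    return fused_g, fused_l


# Toy inputs: 2 global-branch tokens and 3 local-branch tokens, each of dimension 2.
global_tokens = [[1.0, 0.0], [0.0, 1.0]]
local_tokens = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
fused_g, fused_l = fuse(global_tokens, local_tokens)
```

Each branch keeps its own token count after fusion, so in this sketch the two outputs could then be pooled and concatenated before a classification head; that downstream step is likewise an assumption rather than something stated in the abstract.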


Pages: 3987-4007

Page count: 21

