Building Precision: Efficient Encoder-Decoder Networks for Remote Sensing Based on Aerial RGB and LiDAR Data

Cited by: 0

Authors:

Sulaiman, Muhammad [1]

Finnesand, Erik [1]

Farmanbar, Mina [1]

Belbachir, Ahmed Nabil [2]

Rong, Chunming [1,2]

Affiliations:

[1] Univ Stavanger, Dept Elect Engn & Comp Sci, N-4021 Stavanger, Norway

[2] NORCE Norwegian Res Ctr, N-5008 Bergen, Norway

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Building precision; deep learning; LiDAR; remote sensing; semantic segmentation; U-Net; context-transfer U-Net; CONVOLUTIONAL NEURAL-NETWORK; EXTRACTION; IMAGES;

DOI:

10.1109/ACCESS.2024.3391416

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Precise building delineation plays a pivotal role in population data analysis, city management, policy making, and disaster management. Computer vision technologies, particularly deep learning models for semantic segmentation, have proven instrumental in achieving accurate automatic building segmentation in remote sensing applications. However, current state-of-the-art (SOTA) techniques are not optimized for precisely extracting building footprints and, specifically, building boundaries. This deficiency highlights the need to combine Light Detection and Ranging (LiDAR) data with aerial RGB images and streamlined deep learning models for improved precision. This work uses the MapAI dataset, which includes a variety of objects beyond buildings, such as trees, electricity lines, solar panels, vehicles, and roads. These objects show diverse colors and structures, mirroring the rooftops in Denmark and Norway. To address these problems, this study modifies UNet and CT-UNet to segment buildings from LiDAR data and RGB images, using Intersection over Union (IoU) to evaluate building overlap and Boundary Intersection over Union (BIoU) to evaluate the precision of building boundaries and shapes. The proposed work changes the configuration of these networks to accommodate LiDAR data for efficient segmentation. Training batches are augmented to improve model generalization and reduce overfitting; batch normalization further reduces overfitting. Four backbones with transfer learning are employed to enhance convergence and parameter efficiency of segmentation: ResNet50V2, DenseNet201, EfficientNetB4, and EfficientNetV2S. Test-Time Augmentation (TTA) is employed to improve the predicted mask. Experiments are performed using single and ensemble models, with and without augmentation. The ensemble model outperforms the single model, and TTA also improves the results. LiDAR data with RGB improves the combined score (average of IoU and BIoU) by 13.33% compared to only RGB images.
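The two metrics named in the abstract, and the combined score defined there as their average, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes binary masks given as nested lists of 0/1, and it assumes the boundary band is obtained by eroding the mask with 4-connectivity for a hypothetical `width` parameter (in pixels), a value the abstract does not specify. Plain Python sets are used only to keep the example self-contained; a real pipeline would normally use an array library.

```python
def to_pixels(mask):
    """Convert a 2D list of 0/1 values into a set of (row, col) foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def iou(a, b):
    """Intersection over Union of two pixel sets (taken as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def boundary_band(pixels, width=1):
    """Foreground pixels removed by `width` erosion steps (4-connectivity).

    Assumption: anything outside the pixel set, including beyond the image
    edge, counts as background.
    """
    eroded = set(pixels)
    for _ in range(width):
        eroded = {
            (r, c)
            for (r, c) in eroded
            if {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)} <= eroded
        }
    return pixels - eroded


def boundary_iou(a, b, width=1):
    """IoU restricted to the boundary bands of both masks."""
    return iou(boundary_band(a, width), boundary_band(b, width))


def combined_score(a, b, width=1):
    """Average of IoU and boundary IoU, as described in the abstract."""
    return 0.5 * (iou(a, b) + boundary_iou(a, b, width))


# Illustrative data only: a 4x4 building and a prediction shifted one column right.
truth = to_pixels([[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
                   for r in range(6)])
pred = to_pixels([[1 if 1 <= r <= 4 and 2 <= c <= 5 else 0 for c in range(6)]
                  for r in range(6)])

print(iou(truth, pred), boundary_iou(truth, pred), combined_score(truth, pred))
```

In this toy case the one-pixel shift leaves 12 of 20 pixels overlapping (IoU 0.6) but only 6 of 18 boundary pixels (BIoU about 0.33), which shows why a boundary-restricted metric penalizes small misalignments more strongly than plain IoU.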


Pages: 60329-60346

Page count: 18

50 items in total

[41] HeteroNet: a heterogeneous encoder-decoder network for sea-land segmentation of remote sensing images
Ji, Xun
Tang, Longbin
Liu, Tianhe
Guo, Hui
[J]. JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (05)
[42] Automatic 3D Multiple Building Change Detection Model Based on Encoder-Decoder Network Using Highly Unbalanced Remote Sensing Datasets
Gomroki, Masoomeh
Hasanlou, Mahdi
Chanussot, Jocelyn
[J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 10311 - 10325
[43] Cascaded deep convolutional encoder-decoder neural networks for efficient liver tumor segmentation
Budak, Umit
Guo, Yanhui
Tanyildizi, Erkan
Sengur, Abdulkadir
[J]. MEDICAL HYPOTHESES, 2020, 134
[44] Deep convolutional encoder-decoder networks based on ensemble learning for semantic segmentation of high-resolution aerial imagery
Zhu, Huming
Liu, Chendi
Li, Qiuming
Zhang, Lingyun
Wang, Libing
Li, Sifan
Jiao, Licheng
Hou, Biao
[J]. CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2024, 6 (04) : 408 - 424
[45] Automatic Building Extraction on High-Resolution Remote Sensing Imagery Using Deep Convolutional Encoder-Decoder With Spatial Pyramid Pooling
Liu, Yaohui
Gross, Lutz
Li, Zhiqiang
Li, Xiaoli
Fan, Xiwei
Qi, Wenhua
[J]. IEEE ACCESS, 2019, 7 : 128774 - 128786
[46] Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi
Mishra, Santosh Kumar
Rai, Gaurav
Saha, Sriparna
Bhattacharyya, Pushpak
[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
[47] Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks
Cho, Kyunghyun
Courville, Aaron
Bengio, Yoshua
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (11) : 1875 - 1886
[48] Extracting Buildings from Remote Sensing Images Using a Multitask Encoder-Decoder Network with Boundary Refinement
Xu, Hao
Zhu, Panpan
Luo, Xiaobo
Xie, Tianshou
Zhang, Liqiang
[J]. REMOTE SENSING, 2022, 14 (03)
[49] Image restoration of finger-vein networks based on encoder-decoder model
Guo, Xiao-jing
Li, Dan
Zhang, Hai-gang
Yang, Jin-feng
[J]. OPTOELECTRONICS LETTERS, 2019, 15 (06) : 463 - 467
[50] An anomaly detection method based on double encoder-decoder generative adversarial networks
Liu, Hui
Tang, Tinglong
Luo, Jake
Zhao, Meng
Zheng, Baole
Wu, Yirong
[J]. INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION, 2021, 48 (05) : 643 - 648
